This document describes a simple AI system consisting of a Large Language Model with Retrieval Augmented Generation (RAG) using an external SaaS for inference. This is being used as a vehicle for developing a governance framework for on-boarding GenAI technology, and should be considered an early stage draft document.
Distributed under the CC0 1.0 Universal.
Introduction
The rapid advancements in Artificial Intelligence (AI), particularly Generative AI, are set to revolutionize both business operations and personal lives. In the financial services sector, these innovations present immense opportunities that span product offerings, client interactions, employee productivity, and organizational operations. Few technologies have promised such a broad and transformative impact.
However, these advancements also bring significant challenges. Issues like hallucinations, prompt injections, and model unpredictability introduce unique complexities to safely integrate and deploy AI technologies. The pace of technological change means that today’s solutions may become obsolete tomorrow, necessitating a flexible yet robust governance approach.
Financial institutions (in particular) are eager to onboard, experiment with, and deploy AI technologies to stay both competitive and innovative. Yet, the risk landscape and regulatory environment of the financial services industry necessitates proper governance. Existing processes/frameworks may not be adequately equipped to address the novel challenges posed by AI, particularly Generative AI. Therefore, there is a critical need for an adaptive governance framework that promotes the safe, trustworthy, and compliant adoption of AI technologies.
Value Proposition
The AI Readiness Governance Framework aims to bridge the gap between the transformative potential of AI and the stringent requirements of the financial services industry. By providing a structured approach to identifying, assessing, and mitigating risks associated with AI systems, the framework empowers organizations to harness AI’s benefits while maintaining compliance with regulatory standards and safeguarding against operational risks.
- For Technical Teams, the AI Readiness Governance Framework can offer developers a clear set of guidelines and best practices for integrating AI technologies into existing systems. This can improve risk mitigation by identifying potential risks early in the development cycle, promoting robust and secure AI solutions.
- For CISOs and Risk Management teams, the AI Readiness Governance Framework can work to establish an early-stage ‘toolbox’ to evaluate the security implications of AI technologies. This helps to ensure that AI implementations align with organizational policies and regulatory requirements.
- For Heads of AI and CTO/CIO Offices, the AI Readiness Governance Framework provides benefit to aligning AI initiatives with the organization’s strategic objectives and technological roadmap. In turn, this can improve resource optimization and ROI with respect to investments in AI development/adoption.
- For Vendors and Service Providers, the AI Readiness Governance Framework can aide in understanding the governance expectations of financial institutions and build trust with clients by demonstrating adherence with an established framework.
Intended Audience
As briefly touched upon in the Value Proposition, this framework is designed for a broad range of stakeholders involved in the adoption and governance of AI technologies within the financial services industry. See the list below for a high-level outline of potential stakeholders.
- Financial Services CISOs and Risk Management Teams: Responsible for assessing and mitigating security risks, ensuring data confidentiality, integrity, and availability.
- Policy Control Offices: Tasked with developing and enforcing policies governing AI technology use in compliance with regulatory standards.
- CTO/CIO Offices (Architecture and Development Teams, Data/Model Acquisition and Management Teams): Overseeing the integration of AI systems into the organization’s technological infrastructure.
- Vendor Solution Teams (Third-Party Vendors): Aligning AI products and services with the governance standards required by financial institutions.
- Vendor Purchasing Teams: Evaluating and procuring AI technologies that meet governance and compliance criteria.
- Model Risk Management Teams: (May not be in direct scope but considered for future roadmap) Assessing risks associated with AI models and their deployment.
- Banking Legal Teams: Understanding the legal implications of deploying AI technologies and ensuring adherence to laws and regulations.
- Industry Regulators: Gaining insights into industry best practices for AI governance within financial services, facilitating informed regulatory oversight.
- Cloud Service Providers (CSPs) and Model Providers: Tailoring offerings to meet the stringent governance requirements of financial institutions.
- Other Open Source Providers: Including collaborative projects like Common Cloud Controls (CCC) and others, to collaborate and integrate governance best practices across platforms.
Initial scope - RAG-based Q&A
While the ultimate goal is to develop a comprehensive governance framework that can accommodate a wide variety of use cases, starting with a narrowly focused, well-defined initial scope offers several advantages. By selecting a common, high-impact use case that financial organizations frequently encounter, we can create immediate value and demonstrate the effectiveness of our approach. This smaller, more manageable scope will allow for a quicker implementation and iteration process, enabling us to refine our framework based on real-world experience. Additionally, focusing on a use case that can be open-sourced not only fosters collaboration but also ensures that the framework is adaptable and beneficial to a broader community. This initial effort will serve as a foundation for developing a robust governance structure that can be scaled and expanded to cover a wide range of applications in the future.
In Scope (for now):
- An architecture based on a Generative AI Large Language Model.
- System composed of a SaaS inference endpoint external to the organization.
- Usage of Retrieval Augmented Generation (RAG), to provide a knowledge base custom to the organization.
- Users are internal to the organization. They interact with the system through a UI and are responsible for how they use the provided information.
Out of Scope (for now):
- Pre-trained model (open source or not) that the organization deploys on its own infrastructure.
- Fine tunning of a model (be it open source, or SaaS).
- AI agents collaborating.
- User interacting with the model is external to the organization.
- User is not first responsible for action, but the AI model, being it completely independent, or having light supervision.
- Most safety/bias consideration are out of scope for the first version of the governance framework, but will be considered later as it is one of the objectives for the group in general.
- Small models that are very specialized, and are updated continuosly depending on the data being feeded (example models for organizing cache in a bigger system).
Assumptions:
- We can rely upon the operational controls in place in the organisaton already. Financial industry standard existing governance is assumed for the infrastructure, supply chain, and application code.
- The system has an auditor, and its result/actions has potential legal implications.
- Users of this system have responsibility for their actions.
- Mitigation steps for ethical and responsibility concerns are the remit of the application developer and its data’s classifications.
- The system’s metadata/metrics/logs to audit the accuracy and bias of this system are known and have an existing process.
- Carbon outputs and system cost are intertwined, but these are the responsibility of app/system developers.
Metadata
Each Threat or Control has a status which indicates its current level of maturity:
Status | Description |
---|---|
Pre-Draft | Initial brainstorming or collection of ideas, notes, and outlines. The document is not yet formally written. |
Draft | The document is in the early stages of development, with content being written and structured but not yet polished. |
Review | The document is being reviewed by others for feedback. It may undergo multiple iterations at this stage. |
Approved | The document is in its final form and has been approved by the AI Readiness team. |
Contents
Threats
ID | Status | Title |
---|---|---|
TR-1 | Draft | Information Leaked to Hosted Model |
TR-2 | Draft | Insufficient access control with vector store |
TR-3 | Pre-Draft | Lack of source data access controls |
TR-4 | Draft | Hallucination |
TR-5 | Draft | Instability in foundation model behaviour |
TR-6 | Draft | Non-deterministic behaviour |
TR-7 | Draft | Availability of foundational model |
TR-8 | Draft | Tampering with the foundational model |
TR-9 | Draft | Tampering with the vector store |
TR-10 | Draft | Prompt injection |
TR-11 | Draft | Lack of foundation model versioning |
TR-12 | Draft | Ineffective storage and encryption |
TR-13 | Draft | Testing and monitoring |
TR-14 | Draft | Inadequate system alignment |
Controls
ID | Status | Title |
---|---|---|
CT-1 | Draft | Data Leakage Prevention and Detection |
CT-2 | Draft | Data filtering from Confluence into the samples |
CT-3 | Draft | User/app/model firewalling/filtering |
CT-4 | Draft | System observability |
CT-5 | Draft | System acceptance testing |
CT-6 | Pre-Draft | Data quality & classification/sensitivity |
CT-7 | Pre-Draft | Legal/contractual agreements |
CT-8 | Pre-Draft | QoS/Firewall/DDoS prev |
CT-9 | Pre-Draft | Alerting / DoW spend alert |
CT-10 | Draft | Version (pinning) of the foundational model |
CT-11 | Pre-Draft | Human feedback loop |
CT-12 | Pre-Draft | Role-based data access |
CT-13 | Pre-Draft | Provide citations |
CT-14 | Draft | Encrypt data at rest |
CT-15 | Draft | LLM-as-a-Judge |
CT-16 | Draft | Preserving access controls in the ingested data |
Threats
TR-1 - Information Leaked to Hosted Model
- Document Status
- Draft
- Threat Type
- Confidentiality
In the provided system architecture, sensitive data is transmitted to a SaaS-based Generative AI platform for inference, posing a risk of information leakage. Sensitive organizational data, proprietary algorithms, and confidential information may be unintentionally exposed due to inadequate control measures within the hosted model. This can occur through several mechanisms unique to Large Language Models (LLMs), as outlined in OWASP’s LLM06, such as overfitting, memorization, and prompt-based attacks.
LLMs can retain data from training processes or user interactions, potentially recalling sensitive information during unrelated sessions, a phenomenon known as “memorization” When data such as Personally Identifiable Information (PII) or proprietary financial strategies enter the model, the risk of inadvertent disclosure rises, particularly when insufficient data sanitization or filtering mechanisms are in place. Additionally, adversarial actors could exploit prompt injection attacks to manipulate the model into revealing sensitive data.
Furthermore, data retention policies or model fine-tuning can exacerbate these risks. When fine-tuning is done on proprietary data without strict access control, sensitive information may inadvertently be disclosed to lower-privileged users, violating principles of least privilege. Without clear Terms of Use, data sanitization, and input validation, the organization loses visibility into how sensitive information is processed by the LLM and where it may be disclosed.
It is, however, important to understand distinct risk vectors between commercial/enterprise-grade and free hosted LLMs. For instance, commercial LLMs like ChatGPT offer a “Memory” setting to manage what the system is allowed to memorize from your conversations and Data controls to restrict what can be used to train their models. Additionally, enterprise-grade LLMs will usually sanitize sensitive data when used in organizational environments and often include stringent terms of use related to the handling of your data inputs and outputs that must first be accepted before interacting with the model. Free hosted LLMs, on the other hand, may use your data to train their models without you explicitly knowing that it is happening. Thus, you must always exercise due diligence when interacting with hosted LLM services to better understand how your input and output data is being used behind the scenes.
Key Risks
- Two-Way Trust Boundary: The client-to-LLM interaction introduces a two-way trust boundary where neither input nor output can be fully trusted. This makes it critical to assume the output could leak sensitive information unintentionally, even when the input appears benign.
- Model Overfitting and Memorization: LLMs may retain sensitive data introduced during training, leading to unintentional data leakage in future interactions. This includes potential cross-user leakage, where one user’s sensitive data might be disclosed to another.
- External Inference Endpoint Risks: Hosted models may not provide transparent mechanisms for how input data is processed, retained, or sanitized, increasing the risk of persistent exposure of proprietary data.
This risk is aligned with OWASP’s LLM06: Sensitive Information Disclosure, which highlights the dangers of exposing proprietary or personally identifiable information (PII) through large-scale, externally hosted AI systems.
Links
- https://ithandbook.ffiec.gov/
- https://www.ffiec.gov/%5C/press/PDF/FFIEC_Appendix_J.pdf
- Scalable Extraction of Training Data from (Production) Language Models
- https://arxiv.org/abs/2311.17035
TR-2 - Insufficient access control with vector store
- Document Status
- Draft
- Threat Type
- Confidentiality
Vector stores are specialized databases designed to store and manage ‘vector embeddings’—dense numerical representations of data such as text, images, or other complex data types. According to OpenAI, “An embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness. Small distances suggest high relatedness and large distances suggest low relatedness.” These embeddings capture the semantic meaning of the input data, enabling advanced operations like semantic search, similarity comparisons, and clustering.
In the context of Retrieval-Augmented Generation (RAG) models, vector stores play a critical role. When a user query is received, it’s converted into an embedding, and the vector store is queried to find the most semantically similar embeddings, which correspond to relevant pieces of data or documents. These retrieved data are then used to generate responses using Large Language Models (LLMs).
Threat Description
In the described system architecture, where a LLM employing RAG relies on a vector store to retrieve relevant organizational knowledge (e.g., from Confluence), the immaturity of current vector store technologies poses significant confidentiality and integrity risks. Vector stores, which hold embeddings of sensitive internal data, may lack enterprise-grade security controls such as robust access control mechanisms, encryption at rest, and audit logging. Misconfigurations or incomplete implementations can lead to unauthorized access to sensitive embeddings, enabling data tampering, theft, or unintentional disclosure.
While embeddings are not directly interpretable by humans, recent research has demonstrated that embeddings can reveal substantial information about the original data. For instance, embedding inversion attacks can reconstruct sensitive information from embeddings, potentially exposing proprietary or personally identifiable information (PII). The paper “Text Embeddings Reveal (Almost) as Much as Text” illustrates this very point, discussing how embeddings can be used to recover the content of the original text with high fidelity. If you are interested in learning more about how an embedding inversion attack works in practice, check-out the corresponding GitHub repository related to the above paper.
Moreover, embeddings can be subject to membership inference attacks, where an adversary determines whether a particular piece of data is included in the embedding store. This is particularly problematic in sensitive domains where the mere presence of certain information (e.g., confidential business transactions or properitary data) is sensitive. For example, if embeddings are created over a document repository for investment bankers, an adversary could generate various embeddings corresponding to speculative or confidential scenarios like “Company A to acquire Company B.” By probing the vector store to see how many documents are similar to that embedding, they could infer whether such a transaction is being discussed internally, effectively uncovering confidential corporate activities.
As related to insufficient access control, one of the primary threats involves data poisoning, where an attacker with access to the vector store injects malicious or misleading embeddings into the system (see: PoisonedRag for a related example). Compromised embeddings could degrade the quality or accuracy of the LLM’s responses, leading to integrity issues that are difficult to detect. Since embeddings are dense numerical representations, spotting malicious alterations is not as straightforward as with traditional data.
Given the nascent nature of vector store products, they may not adhere to enterprise security standards, leaving gaps that could be exploited by malicious actors or internal users. For example:
- Misconfigured Access Controls: Lack of role-based access control (RBAC) or overly permissive settings may allow unauthorized internal or external users to retrieve sensitive embeddings, bypassing intended security measures.
- Encryption Failures: Without encryption at rest, embeddings that contain sensitive or proprietary information may be exposed to anyone with access to the storage layer, leading to data breaches or tampering.
- Audit Deficiencies: The absence of robust audit logging makes it difficult to detect unauthorized access, modifications, or data exfiltration, allowing breaches to go unnoticed for extended periods.
This risk aligns with OWASP’s LLM06: Sensitive Information Disclosure, which highlights the dangers of exposing proprietary or PII through large-scale, externally hosted AI systems.
TR-3 - Lack of source data access controls
- Document Status
- Pre-Draft
- Threat Type
- Confidentiality
The system allows access to corporate data sources (e.g. Confluence, employee databases), however given these sources may have different access control policies, there is the threat that when accessed via the application a user can access data they are not authorized to see at the source, due to the architecture not honoring source access controls.
The threat posed here is the loss of data access controls in building a traditional RAG application. The access control restrictions in place in Confluence, for example, will not be replicated in the Vector database or at the model level.
Examples of important things to consider:
- loss of access control data when data is ingested
- limitations of system prompt designed to obey corporate access controls
TR-4 - Hallucination
- Document Status
- Draft
- Threat Type
- Integrity
LLM hallucinations refer to instances when a large language model (LLM) generates incorrect or nonsensical information that seems plausible but is not based on factual data or reality. These “hallucinations” occur because the model generates text based on patterns in its training data rather than true understanding or access to current, verified information.
The likelihood of hallucination can be minimised by using RAG techniques, providing the LLM with facts provided directly via the prompt. However, the response provided by the model is a synthesis of the information within the input prompt and information retained within the model. There is no reliable way to ensure the response is restricted to the facts provided via the prompt, and as such, RAG-based applications still hallucinate.
There is currently no reliable method for removing hallucinations, with this being an active area of research.
Links
- [2305.14292] WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia - “WikiChat achieves 97.9% factual accuracy in conversations with human users about recent topics, 55.0% better than GPT-4, while receiving significantly higher user ratings and more favorable comments.”
- [2401.11817] Hallucination is Inevitable: An Innate Limitation of Large Language Models
Severity
The severity of this threat depends on the specific use case. The failure scenario involves either an employee or a client making a decision based on erroneous information provided by the system. With some use cases, the potential impact of ill-informed decisions may be minimal, whereas in other cases, the impact could be significant. Regardless, this failure mode needs consideration.
TR-5 - Instability in foundation model behaviour
- Document Status
- Draft
- Threat Type
- Integrity
Instability in foundation model behaviour would manifest itself as deviations in the output (i.e during inferencing), when supplied with the same prompt.
If you completely rely on the model provider for its evaluation, for example because it’s of a kind that changes very quickly, like a code generation model (that can change several times a day), you have to acknowledge that you are putting all the responsibility on the outcome of the architecture on that provider, and you trust what it does whenever the model version is changed.
If you need to implement strict tests on your whole architecture, the foundation model behaviour may change over time if the third-party doesn’t have rigorous version control (covered in Threat 11), you don’t pin the version you are using, or they silently change model versions without notifying the consumer. This can lead to instability in the model’s behaviour, which may impact on the client’s business operations and actions taken upon model output.
The provider may change the model without explicit customer knowledge. This could lead to unexpected response from the model that it has not been tested for by the corporation. The provider might also provide a mechanism to pin the model version. In both cases non-determinism can lead to instability in model behaviour.
Another mechanism to induce instability is around perturbations. There is recent research into using prompt perturbations to attack grounding and hence weaken system defences against malicious attacks.
Links
- https://www.arxiv.org/abs/2408.14595
- https://www.bbc.co.uk/news/technology-68025677
TR-6 - Non-deterministic behaviour
- Document Status
- Draft
- Threat Type
- Integrity
A fundamental property of LLMs is the non-determinism of their response. This is because LLMs generate responses by predicting the probability of the next word or token in a given context, meaning different prompts equating to the same request can produce different responses from the models. This method also means that LLMs can tend towards winding or unintelligible outputs when the outputs being produced are larger. LLMs also make use of sampling methods like top-k sampling during text generation or have an internal state or seed mechanisms which can cause distinct responses from the same prompt used in different requests.This can make it difficult or impossible to reproduce results, and may result in different (and potentially incorrect) results being returned occasionally and in a hard to predict manner.
One danger of this is as users get differing responses when they use the system at different times or get different answers than a colleague asking the same question leading to the confusion in the user and the overall trust in the system may degrade until users no longer want to use the tool at all.
This behaviour can also increase the difficulty of evaluating your system as you may be unable to reproduce what you thought was a bug, or consistently produce evaluation metrics to see if you are improving or degrading the system as you make changes.
Severity
This is not a hugely severe risk in the context of this RAG chatbot as it is presenting information that the user can access elsewhere, and may already have a passing familiarity with which should help with their understanding of the system’s outputs.
However even though the non-determinism risk of LLMs is reduced in RAG systems, there is a risk of not providing the same response or information source to similar queries due to the internal data source having intersecting information about the wanted topic and the system may produce different results with different citations.
Links
The Non-determinism of ChatGPT in Code Generation
TR-7 - Availability of foundational model
- Document Status
- Draft
- Threat Type
- Availability
RAG systems are proliferating due to the low barrier of entry compared to traditional chatbot technologies. RAG has applicability for internal users as well as supporting customer-related use cases.
Many LLMs require GPU compute infrastructure to provide an acceptable level of responsiveness. Furthermore, some of the best-performing LLMs are proprietary. Therefore, the path of least resistance is to call out to a Technology Service Provider (TSP).
Key Risks
- Denial of Wallet (DoW): Over-usage that results in slowdowns,
unexpected costs, or outages. Examples include:
- Unexpectedly long prompts due to large chunk sizes or many
chunks packed into the prompt.
- This could be exacerbated by systems that work with multimedia content.
- LLM attacks to exfiltrate training data may also generate the maximum number of tokens.
- Scripts which repeatedly hit API endpoints without built-in
throttling.
- This may result from Strats searching for new sources of “signal”.
- Agentic systems which may generate additional LLM / search calls that were not taken into account in the initial design / capacity planning.
- Unexpectedly long prompts due to large chunk sizes or many
chunks packed into the prompt.
- TSP Outages:
- Some LLM providers might not have sufficient technical maturity to provide uptime to meet target SLA.
- Inability to fail over to an equivalent LLM back end due to proprietary model / lack of capacity at other TSPs. (FFIEC Appendix J).
- VRAM Exhaustion:
- VRAM utilization can increase for a variety of reasons:
- new release of LLM library has a memory leak
- new caching techniques to improve throughput at the cost of VRAM
- setting a longer context length
- etc.
- VRAM utilization can increase for a variety of reasons:
Links
- https://www.prompt.security/blog/denial-of-wallet-on-genai-apps-ddow
- https://ithandbook.ffiec.gov/
- https://www.ffiec.gov/%5C/press/PDF/FFIEC_Appendix_J.pdf
TR-8 - Tampering with the foundational model
- Document Status
- Draft
- Threat Type
- Integrity, Confidentiality, Availability
The SaaS-based LLM provider is a 3rd party supplier and as such is subject to all typical supply chain, insider, and software integrity threats. Supply chain attacks on infrastructure and OSS are covered in other frameworks, however supply chains for foundational models, training data, model updates, and model retraining are all potential targets for attackers. The underlying firmware of GPUs, the operating system, and the machine learning libraries should be considered as part of the supply chain too.
Whilst fine tuning is out of scope of this framework; it is worth noting that adversarial attacks can be built upon model weights for open source models. There could be a possibility of malicious actors using this information to induce unsafe behaviour in the model.
Similarly, back doors engineered into the model can be triggered through malicious means. This could result in unsafe behaviour or such tampering could result in removing safe guards around the model.
Severity
Given that the LLM is a hosted SaaS the expectation is that the API vendor would have gone through extensive 3rd party checks before being on-boarded into the financial institution. Therefore, they would have demonstrated adequate quality control measures in their supply chain. So this is a low-risk threat.
TR-9 - Tampering with the vector store
- Document Status
- Draft
- Threat Type
- Integrity, Confidentiality
A malicious actor may tamper with the vectors stored in the client’s vector store. This data is made available to application users, resulting in unauthorized data access or proliferation of falsities. There is a possibility of tampering with the client vector store. This could be happen during ingesting Confluence by leveraging a known back door in the ingest pipeline by a malicious actor. An adversary could use this back door to poison the vector store, or use it in other innovative ways to introduce poison data into the vector store.
Severity
Low-risk as requires access to clients vector store. This is assumed to be adequately protected as any SaaS would be required to demonstrate as a 3rd party vendor to a financial institution.
TR-10 - Prompt injection
- Document Status
- Draft
- Threat Type
- Integrity, Confidentiality
Users of the application or malitious internal agents can craft prompts that are sent to the SaaS-based LLM and potentially cause damaging responses. This is one of the most popular attack vectors as privilage requirements for this attach vector are the lowest [^1][^2]. Unlike SQL injection the scope of attack in prompt injection is wider and can result in incorrect answers, toxic responses, information disclosure, denial of service, unethical and biased responses. A good example of an incident in public is the DPD chatbot [^3].
There are two popular approachs of Direct Prompt Injection (also known as “Jailbreaking”) where a user tries to escape the underlying behaviour of the system to get access to data store or inappropriate responses putting the reputation of the company at risk. Indirect Prompt Injection seeks to hijack other user’s sessions and direct them or other systems (if part of a component) to actions or impact critical decision making that might result in other attack vectors including privilage esculation. Please note that this risk exists even without an active prompt injection attack as badly designed systems hallucinate directing users to actions that can be harmful to the organization. This is particlally invisible if the prompt is part of a component of the decision making system and not directly interacting with users where there is no human in the loop.
Additionally attacks can profile systems due to access to internal data and inner working with a model inversion to reveal the model’s internal workings, which can be used to steal the model’s training or RAG data.
Severity
As per the internal user RAG use case, access to the system should only be within the boundary of the organization and Direct Prompt Injections have low-medium risk as exposures can be controlled, however in the case of the Indirect Prompt Injection based where the system can be medium to high depending on the critical of the subject domain being queried.
References
- [^1] OWASP Top 10 for LLM Applications - https://owasp.org/www-project-top-10-for-large-language-model-applications/assets/PDF/OWASP-Top-10-for-LLMs-2023-v1_1.pdf
- [^2] Mitre Prompt Injection - https://attack.mitre.org/techniques/T1055/
- [^3] DPD Chatbot Swears at Customer - BBC - https://www.bbc.co.uk/news/technology-68025677
TR-11 - Lack of foundation model versioning
- Document Status
- Draft
- Threat Type
- Integrity
Inadequate or unpublished API versioning and/or model version control may result in response instability, due to changes in the foundation model and the client’s absence of opportunity to benchmark and test the new model.
This can lead to instability in the model’s behaviour, which may impact on the client’s business operations.
Challenges with versioning
There are unique challenges to versioning large language models (LLMs) and consequently, the APIs that provide LLM capabilities. Unlike traditional software, where versioning is tracking changes in source code, version LLMs must account for a range of factors such as model behavior, training data, architecture, and computational requirements. Some key challenges include:
- Model size and complexity:
- Models are incredibly large, comprising of a large number of parameters. Managing and tracking changes and their impacts across such massive models can be very complex. It is also challenging to quantify or summarize changes in a meaningful way.
- Dynamic nature of LLMs
- Some LLMs are designed to learn and adapt over time, while others are updated through fine-tuning and customizations. This makes it difficult to keep track of changes or define discrete versions, as the model is constantly being updated.
- Non-deterministic behavior:
- LLMs can produce different outputs for the same input due to factors like temperature settings, making it difficult to define a “new version”.
- Multidimensional changes:
- Updates to LLMs might involve changes to the model architecture, training data, fine-tuning process, and inference parameters. Capturing all these dimensions in a version number or identifier is challenging.
- Changes to LLMs can range from minor tweaks (e.g., adjusting hyperparaemters) to significant changes (e.g., retraining with new data), making it challenging to define the proper granularity of versioning.
- Training data versioning:
- LLMs are trained on massive amounts of data, making it difficult to track and manage changes in the training corpus.
- Resource management:
- Running multiple versions of LLMs simultaneously can strain computational resources and challenge infrastructure.
- Lack of standardization:
- There are no widely accepted standard for versioning LLMs, leading to inconsistent practices across different organizations.
Problems caused by inadequate versioning:
- Inconsistent output
- LLM may product different responses to the same prompt, leading to inconsistent user experiences or decision-making
- Reproducibility/Traceability
- Inability to replicate or trace past outputs, which may be required in some business context or during testing and debugging
- Performance variability:
- Unexpected changes in model performance, even introducing regression in some areas (e.g., more bias)
- Assessing improvements or regression becomes challenging
- Compliance and auditing:
- Inability to track and explain model changes can lead to compliance problems and difficulty in auditing decisions
- Integration and compatibility/backward compatibility:
- Other systems or APIs may depend on specific behaviors of an LLM
- Testing and quality assurance:
- Difficulty in identifying root cause of errors or bugs
- Inability to replicate issues or isolate model changes that are causing issues
- Security and privacy:
- Difficult to track security vulnerabilities or privacy issues
- New security or privacy issues may be introduced
Severity
The severity of the risk will depend on the LLM vendor’s versioning practices (e.g., if a new version is released, are users still using the previous version by default or are they automatically updated). If users are automatically redirected to the new version, then the severity will be higher, and it will then depend on the magnitude of changes made to the existing LLM, and how different the new version’s behavior is.
TR-12 - Ineffective storage and encryption
- Document Status
- Draft
- Threat Type
Lacking sufficient data storage security leaves your project vulnerable to outside actors gaining access to the data used for your model potentially exposing your data to be taken and used or tampered with to corrupt and misuse your model. Many regulations covering within industry also have requirements for the secure storage of data, especially in cases where it can be used to identify people, meaning improper security can put you at risk of regulatory and legal issues alongside the risk of losing your users trust in the system you have developed.
Severity
The severity of this risk depends largely on the data you are storing, if the RAG system is simply creating a more usable and accessible access point to information that is already public there is no real risk associated with the data being accessed by an outside party. However if you have information that you want to restrict and control access to, it is important to both have strong access controls and to properly secure the data at all points in the system.
TR-13 - Testing and monitoring
- Document Status
- Draft
- Threat Type
- Integrity
The evaluation and monitoring of LLM systems is integral to their correct utilisation, while also being difficult to do effectively and efficiently. Effective evaluation and monitoring can help ensure the accuracy and reliability of LLM-generated outputs while improving the trustworthiness of the model. Along with helping to stay on top of potential issues, such as inaccurate retrieval of data, misunderstanding of queries, or loss of efficiency as the system scales.
Evaluation of an LLM is the assessment of the accuracy, reliability and precision of the outputs of a model accumulating into the effectiveness of the system produced. RAG models are often assessed with the use of ‘ground truth’ which is the use of existing answers to queries used as a standard of comparison for the output of the model to determine the level of correctness and similarity.
Without proper evaluation in place you will not be able to understand whether your model is effective or not, or whether changes made in hopes of improving model performance have had a positive or negative effect. Without continuous evaluation in place it is possible that during the lifetime of a system as the input data used for the model gradually changes the effectiveness of the model fluctuates, progressively deteriorating until your model is either unhelpful at reaching your goals. With the correct evaluation in place it would be simpler to find when this is occurring allowing you to address problems faster and more effectively.
Severity
This risk has a medium severity, as it being an issue is primarily downstream from something else in your system going wrong, but good monitoring and evaluation will allow you to catch the issues faster and to make the appropriate changes to the system and is therefore essential in any RAG system architecture.
TR-14 - Inadequate system alignment
- Document Status
- Draft
- Threat Type
- Integrity
Alignment
There is a specific goal you want to achieve when using an AI system in a successful way. It may be an overarching project goal, o a specific requirement for single queries to the AI system.
This risk describes when the AI system behaves in a way that doesn’t align with the intended goal.
What can go wrong
In the more basic instance, it already has been described when we talked about hallucinations that the model may not be giving useful output to queries. But even when the model does seem to be performing well in the short term for individual queries, it may be putting the emphasis on a specific topic that, when using the AI model on scale, will in turn have undesirable consequences.
For example, an AI with a goal to maximize a company’s profit could suggest exploiting regulatory loopholes or ignoring the social responsibility for the impact of implementing its solutions on population. In another example, an AI tasked with selecting candidates for jobs positions may be choosing people that perform quite well in the roles, but may be biased against specific kind of population in an unfair way. You could even think that making some processes completely AI automated could pose a risk, for removing completely responsibility about something from humans may end up making nobody knowing anything at all about the automated task, having no accountability or responsibility when it misbehaves.
Also an AI system that is aligned at first may become misaligned in future situations given its non-deterministic behavior, when new versions of the model are deployed, or it uses different contextual information (system prompt, RAG database, etc).
In general we can summarize that the AI system may optimize for a goal in a way that causes unintended or harmful side effects, not only for its immediate goals, but for society in general and the long term.
Responsible AI
The concept of responsible AI defines the practice of developing and deploying AI systems in a way that we make sure they are aligned with human values, ensure safety, fairness, and accountability while minimizing risks and unintended consequences.
Links
- Other threats
- Controls
- AI vendor providers
- Research
Controls
CT-1 - Data Leakage Prevention and Detection
- Document Status
- Draft
- Control Type
- Detective
- Mitigates
Preventing Leakage of Session Data
The use of Third-Party Service Providers (TSPs) for LLM-powered services and raw endpoints will be attractive for a variety of reasons:
-
The best-peforming LLMs may be proprietary with model weights unavailable for local deployment.
-
GPU compute to power LLMs can require upgrades in data center power and cooling with long lead times.
-
GPUs are expensive and may be limited in supply. TSPs may be required for “burst capacity”.
Given that leakage of session data may have adverse impact to clients and risk fines from regulators, preventing leakage thereof is a priority. Traditional risks and mitigations include:
-
Network Taps / Man-In-The-Middle Attacks
-
Ensure that communication with the TSP is encrypted following industry best practices.
-
Prefer architectures where the LLM provider hosts their service in your cloud tenant to avoid transmitting data outside your system boundaries.
-
-
Filesystem
- Ensure that the LLM provider defaults to “zero persistence” of logs, session data, and core dumps except where agreed upon in writing.
-
Improper Disposal of Storage Medium
- Ensure that the vendor follows Data Lifecycle Management best practices that are compatible with that of your organization.
-
Multi-Tenant Architecture
- Have your SecDesign team review the TSPs system architecture to ensure that no data can be leaked to other sessions / users.
LLM-specific attacks:
-
Memorization
- Verify that your legal agreement with the LLM provider explicitly states that no API inputs/outputs will be used for model training.
-
LLM Caching
- Given the high cost of GPUs, the TSP may be incentivized to reduce redundant computation through caching of activations / prefixes. Ensure that the TSP provides information about any such optimizations for your ML team to review.
Preventing Leakage of Training Data
If you have fine-tuned a model with proprietary data, there is the potential for it to be extracted as described in:
Have your ML team ensure that there are guardrails in place to detect attempts at training data extraction through prompting.
Detecting Leakage of Session Data
To address the potential for data leakage when proprietary information is processed by an external LLM (typically a SaaS-based solution), a detective control using data leakage canaries.
This technique aims to monitor and detect unauthorized disclosure of sensitive information by embedding uniquely identifiable markers (canaries) into the data streams or queries sent to the hosted model.
Key components of this control include:
-
Canary Data Injection: Canary data consists of artificial or uniquely identifiable tokens embedded within the proprietary data sent to the hosted model. These tokens do not have legitimate business value but are crafted to detect leaks when exposed. For instance, queries containing unique strings can be planted during interactions with the LLM, and if these tokens appear in unauthorized contexts (e.g., responses from the model, external logs, or internet forums), it would indicate a potential breach.
-
Data Fingerprinting: Proprietary data can be fingerprinted using cryptographic hashing techniques to create unique signatures. By monitoring the hosted service or conducting external reconnaissance, organizations can detect if these fingerprints appear in unauthorized locations, signaling a data breach or leakage.
-
Plugin Architecture Integration: Implementing canary and fingerprint detection mechanisms into the plugin architecture of the SaaS-based LLM ensures that data leakage detection is embedded into the system at multiple touchpoints, providing continuous monitoring across interactions.
-
Detection and Response Workflow: Once a canary token is detected in an unauthorized environment, the system triggers an immediate alert, initiating the incident response process. This includes identifying the source of the breach, determining the extent of the leakage, and implementing remedial actions, such as ceasing further interaction with the compromised model.
This control is particularly effective in detecting unauthorized use or exposure of internal proprietary information processed by external systems. By embedding canaries and fingerprints, organizations can maintain a degree of oversight and control, even when data is processed beyond their direct infrastructure.
Detecting Leakage of Model Weights
In situations where detecting leakage of model weights is necessary, there are approaches to “fingerprinting” that are being explored in the research community. On example based on knowledge-injection is:
CT-2 - Data filtering from Confluence into the samples
- Document Status
- Draft
- Control Type
- Preventative
- Mitigates
To mitigate the risk of sensitive data leakage and tampering in the vector store, the data filtering control ensures that sensitive information from internal knowledge sources, such as Confluence, is anonymized and/or entirely excluded before being processed by the model. This control aims to limit the exposure of sensitive organizational knowledge when creating embeddings that feed into the vector store, thus reducing the likelihood of confidential information being accessible or manipulated.
- Anonymization and Data Sanitization: Data extracted from Confluence or other internal knowledge repositories is filtered through anonymization and sanitization processes before it is converted into embeddings for the vector store. This prevents direct or identifiable sensitive data (e.g., PII, proprietary algorithms) from entering the model or vector store in the first place. Techniques could include, but are not limited to:
- Masking personally identifiable information (PII).
- Redacting sensitive financial or operational data.
- Scrubbing proprietary corporate knowledge that may be exposed to unauthorized users or malicious actors.
- Segregation of Sensitive Models: For data that cannot be adequately sanitized, a separate model and vector store instance can be used to enforce stronger access controls. For example, highly sensitive data might be stored in a restricted vector store with more granular RBAC and encryption measures, ensuring only authorized users can query the sensitive data.
- Response Filtering Based on Confluence Heuristics: The model’s responses are also filtered against the same heuristics used to sanitize the data at the ingestion point. This ensures that sensitive data that may have bypassed sanitization is detected and scrubbed before being returned to the user. Response filtering acts as a secondary layer of protection, preventing sensitive or proprietary data from leaking through model outputs.
This control emphasizes a preventive focus on the entry of sensitive information into the vector store, which minimizes the attack surface and significantly reduces the risk of unauthorized access or poisoning. This is especially important given the immaturity of vector store technologies.
CT-3 - User/app/model firewalling/filtering
As in any information system component, you can monitor and filter interactions between the model, inputs from the user, queries to RAG databases or other sources of information, and outputs.
A simple analogy is a Web Access Firewall detecting URLs trying to exploit known vulnerabilities in specific server versions, and detecting and filtering from the output malicious Javascript code embedded in the returned web page.
Not only user input and LLM output should be considered. To populate the RAG database with custom information, it has to use the same tokenizer that follows the exact same embeddings format (converting words to vectors) that the LLM uses. This means that for an LLM in a SaaS where the tokenizer is available as an endpoint, you have to send the full information to the embeddings endpoint to convert it to vectors, and store the returned data into your RAG database.
Things to monitor/filter
Monitoring This would make possible to potentially detect and block undesired behavior, like:
- RAG data ingestion
- Before you send the full custom information to the embeddings endpoint of a SaaS LLM, you should carefully filter any potential private information that shouldn’t be disclosed.
- User input
- Detecting and blocking abuse from the user to the LLM, like prompt Injection
- Detecting disclosing potentially private information to an LLM hosted externally to the organization as SaaS, and filtering it out or anonymizing.
- LLM output
- Detecting an answer that is too long, maybe because a user has tricked the LLM to do so, to cause a [denial of service] or make the LLM behave erratically, potentially disclosing private information.
- Detecting if the answer is not in the right format, for example if you always expect a formatted JSON output, or when it’s in a different language than the LLM use case, a known attack technique.
- Detecting the LLM is giving known answers for when it is resisting abuse, that may not have been blocked on input, and could add up to constitute an attack in progress looking for vulnerabilities in guardrails.
- Detecting private information being disclosed, coming from the RAG database, or the data used to pre-train the model.
- Detecting and blocking possible foul language that the LLM has been forced to use by the user, avoiding reputational loss for the organization.
- Repopulating in the answer an anonymized question, for example if the user included his email, it was first replaced by a generic one, then repopulated after the LLM answered.
A tool like this could combine with detecting queries or responses that go beyond certain size, as described in CT-8 - QoS/Firewall/DDoS prevention, which could be part of a denial of wallet attack, or trying to destabilize the LLM so it provides possible private information contained in the training data.
Ideally if possible, not only user and LLM but all interactions in components for the AI system should be monitored, logged and include safety block mechanism. When deciding where to focus, when information crosses boundaries is when filtering should happen.
Challenges
RAG database
It’s more practical to pre-process the data for RAG previously to send it to the embeddings endpoint, which can run more complex filtering for a long time. But you could add some in-line filters.
When data is stored as embeddings, a vector search is done converting the user query as vectors using the embeddings endpoint that may be external to you. That means the information in the vector database is more or less opaque, and you can’t easily analyze it, add filters in or out of it, or specify different access levels for the information stored.
Filtering
Applying static filters is easy for known patterns computer-related, as emails or domains, and well known terms like company names. Picking up more generic things, as names of people that may be common knowledge, or private information, is not. Same happens for detecting prompt injections, which by its very nature are designed to bypass security, or foul language. That is why a popular technique is using “LLM as a judge”, that we describe later in this document.
Streaming
When users send interactive requests to an LLM, streaming allows them to progressively receive the result word by word. This is a great improvement to usability, as users can see progress, read the answer while it’s being generated, and act before it’s complete, just taking the first part if it’s all they need, or canceling the generation if it’s not in the right direction.
But when any kind of filtering has to be applied, you need to disconnect streaming as you need the full answer to process it, or risk exposing the information to the user before you know it’s safe. For a complex slow system that can take long to answer that has a huge impact on usability.
An alternative approach can be to start providing a streamed answer to the user while detection is done on the fly, and revert to cancel the answer and remove all shown text when a problem is detected. But the risk of exposing the wrong information to the user has to be weighted depending on the criticality of the information and the final user that receives it.
Remediation techniques
As mentioned, you could just implement static checks for blocklists and regular expressions, that would just detect the most simple situations.
Adding a system prompt with instructions of what to block is not recommended, as it’s easy to not only bypass it, but expose the prompt and see exactly what is the logic for things that the system shouldn’t show.
A common technique is LLM as a judge, when a secondary LLM analyzes the query and the response, trained not to give proper answers to users, but on categorization of different situations (prompt injection, abuse, foul language). You could use this also as a SaaS product, or a local instance that would have a non-trivial computational cost, as for each query (no matter if the main LLM is SaaS), you would also run an LLM evaluation.
For private information disclosure, when in a critical situation you may want to train your own LLM Judge to be able to categorize that information in a bespoke way to your organization.
Having humans provide feedback voluntarily on responses when on production, where there is an easy way to warn if the system is failing into any of the behaviors described, is a complementary control that would allow verifying these guardrails are working as expected.
Additional considerations
A full API monitoring solution will put in place observability and security benefits not only for AI security, but general system security. For example, a security proxy setup could also ensure all communications between different components are encrypted using TLS.
Logging would allow not only to understand better how users and the system are behaving, but also detect situations that can only be understood when looking at data at a statistical level, for example a coordinated denial of service on the AI system.
Links
- Tooling
- LLM Guard: Open source LLM filter for sanitization, detection of harmful language, prevention of data leakage, and resistance against prompt injection attacks.
- deberta-v3-base-prompt-injection-v2: Open source LLM model, a fine-tuned version of microsoft/deberta-v3-base specifically developed to detect and classify prompt injection attacks which can manipulate language models into producing unintended outputs.
- ShieldLM: open source bilingual (Chinese and English) safety detector that mainly aims to help to detect safety issues in LLMs’ generations.
CT-4 - System observability
- Document Status
- Draft
- Control Type
- Detective
- Mitigates
What to log/monitor
When talking about observability, these are the main things to log and monitor:
- Inputs from the user
- Output from the model
- APIs calls
- Information crossing trust boundaries
It is always good to log and monitor all, or as much as possible. But when there are limitations because of the bandwidth of the ingested data, consider at least the previous sources. Every application using the models should be included.
Consider employing a solution for horizontal monitoring, across several inputs/outputs at the same time for the whole architecture.
Why
The following reasons explain why we want to log and monitor, and what threats we tackle or how we benefit from it:
- Anomaly detection, something that other controls are not explicitly looking into.
- Audit and compliance. Although we are not aware of current regulation that requires it, it could become mandatory in the future.
-
Version control drift from changes, including model, system architecture, and data. Detecting performance degradation that can affect the stability and safety of the whole system.
-
Set clear SLA for availability and performance metrics, specially when multiple tenants are involved. This and cost monitoring is related to security because of denial of wallet, for that we look into API calls, CPU-lock/scale out, front-end DOS. Once we monitor it, we can rate limit or throttle when reaching limits (Finops), and alerting as described in CT-9 - Alerting / DoW spend alert.
-
The recommendation to capture inputs including user prompt, can be used to discover external misuse of your system, even if individually blocked, by individuals or organized campaigns that may require severe measures to completely block while maintaining stability.
-
Being able to detect data bleeding (information from a user in a session being available for other users/session), and data persistence (information from one finished session being available into another session). Also known as data pollution, related CT-3 - User/app/model firewalling/filtering, CT-1 - Draft Data Leakage Prevention and Detection, CT-2 - Draft Data filtering from Confluence into the samples.
-
Assurances that responsible AI metrics are met, need logging, monitoring and processing of the whole system behavior against the specific metrics for this topic.
- When in the future you are experiencing security issues, you need to have started collecting information in the past for a full fledged analysis and troubleshooting.
CT-5 - System acceptance testing
System Acceptance Testing is the final phase of the software testing process where the complete system is tested against the specified requirements to ensure it meets the criteria for deployment. For non-AI systems, this will involve creating a number of test cases which are executed, with an expectation that when all tests pass the system is guaranteed to meet its requirements.
With LLM applications System Acceptance Testing has a similar form, where the complete system is tested via a set of defined test cases. However, there are a couple of notable differences when compared to non-AI systems:
- LLM-based applications exhibit variability in their output, where the same response could be phrased differently despite exactly the same preconditions. The acceptance criteria needs to accommodate this variability, using techniques to validate a given response contains (or excludes) certain information, rather than giving an exact match.
- For non-AI systems often the goal is to achieve a 100% pass rate for test cases. Whereas for LLM-based applications, it is likely that lower pass rate is acceptable. The overall quality of the system is considered a sliding scale rather than a fixed bar.
For example, a test harness for a RAG-based chat application would likely require a test data store which contains known ‘facts’. The test suite would comprise a number of test cases covering a wide variety of questions and responses, where the test framework asserts the factual accuracy of the response from the system under test. The suite should also include test cases that explore the various failure modes of this system, exploring bias, prompt injection, hallucination and more.
System Acceptance Testing is a highly effective control for understanding the overall quality of an LLM-based application. While the system is under development it quantifies quality, allowing for more effective and efficient development. And when the system becomes ready for production it allows risks to be quantified.
Links
- GitHub - openai/evals: Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
- Evaluation / 🦜️🔗 LangChain
- Promptfoo
- Inspect
CT-6 - Data quality & classification/sensitivity
- Data is classified within the Confluence data store, and filtered prior to ingestion
- Manual quality assurance on the data at the necessary scale? Can this be automated?
- Data governance for source data? Scope and context for input data?
CT-7 - Legal/contractual agreements
- Document Status
- Pre-Draft
- Control Type
- Preventative
- Mitigates
- Understand contractually what the SaaS provider does with any data you send them
- Terms of service, use of data for training other models, privacy policy, etc.
- Ensure that foundational models don’t drift or change in unexpected ways
- Data sovereignty / location where your data is stored
CT-8 - QoS/Firewall/DDoS prev
LLM endpoints may be abused due to:
- attacks to exfiltrate data
- jobs coded without the understanding that GPUs are a finite resource
Controls should be in place to ensure that “noisy neighbors” do not interfere with the availability of critical systems.
API Gateways and Keys
LLM endpoints should require authentication to ensure that only approved use cases can access the LLM. A common approach is to deploy an API gateway and generate API keys specific to each use case. The assignments of keys allows:
- revocation of keys on a per use case basis to block misbehaving applications
- attribution of cost at the use case level to ensure shared infrastructure receives necessary funding and to allow ROI to be measured
- priortizing access of LLM requests when capacity has been saturated and SLAs across all consumers cannot be satisfied
Modeling and Monitoring
Prioritizing access requires understanding:
- expected utilization at various times of day
- how to contact the owners when SLAs cannot be met
Systems should be in place to:
- detect anomalous loads
- throttle loads on a per use case basis to fit the assigned capacity
- allow intraday reprioritization
Further reading
- TR-7 Availability of foundational model
- TR-10 Prompt injection
- CT-9 Alerting / DoW spend alert
CT-9 - Alerting / DoW spend alert
- Document Status
- Pre-Draft
- Control Type
- Detective
- Mitigates
- Add usage limits and alerts
- Context length limiting (avoid an attacker from flooding your context)
- Response length monitoring (avoid an attacker from consistently flooding your responses)
CT-10 - Version (pinning) of the foundational model
Supplier Controls:
Ensure the supplier is contractually obligated to provide enough of the below practices to allow for the development of an upgrade strategy:
- Version Numbers and Release Notes: Establish a clear versioning system (e.g., major/minor/patch updates) and detailed release notes to help users understand the scope and impact of updates. Proper versioning includes documenting key updates in model behavior, architecture, and training data. With each new version, document what’s changed (e.g., increased accuracy, better handling of certain tasks).
- User Notifications: Implement proactive notifications to inform users when new versions are available or when an older version will be deprecated.
- Documentation, API Flexibility, and Backward Compatibility Options: Provide extensive documentation and API options to allow users to choose which version they want to use, along with allowing users to continue using previous versions until they’re ready to migrate to a new version. The version of the model in use should not be impacted until migration to the new version is complete.
- Monitoring and Feedback Loops: Collect user feedback on new versions and monitoring any potential regressions or issues that arise in specific use cases.
- Rigorous Testing: Companies need to test new versions comprehensively before deploying them, especially in production environments where accuracy and reliability are crucial.
- Testing and Validation: Offering users tools to test new versions in sandbox environments before transitioning to them in production, minimizing disruptions in workflows.
Organization Controls:
Organizations using LLMs via APIs may also implement controls for effective versioning:
- Establish procedures to test new versions of models
- Establish procedures to deploy application to migrate a model version with rollback procedures
- Regular audits of deployed model versions
- Integration of version information in model outputs for traceability
CT-11 - Human feedback loop
Human Feedback Loop
Implementing a human feedback loop is crucial for the effective deployment and continuous improvement of Generative AI solutions. Without appropriate feedback mechanisms, the solution may not perform optimally, and opportunities for enhancement over time could be missed. The following considerations should be kept in mind when designing a feedback loop:
- Alignment with Key Performance Indicators (KPIs)
- Define a robust set of KPIs and evaluation metrics for the solution. For example, measure how many queries are answered correctly as annotated by subject matter experts (SMEs).
- Feedback should include both quantitative and qualitative data to adequately assess if the solution is meeting established KPIs and/or objectives.
- Intended Use of Feedback Data Solution developers should clearly define and document how the feedback data will be utilized:
- Prompt Fine-tuning: Analyze user feedback in relation to the LLM’s responses. Identify gaps between the solution’s purpose and the responses generated. Adjust prompts to bridge these gaps.
- RAG Document Update: Examine low-rated responses to identify content improvement opportunities where the quality of the underlying data is the root cause.
- Model/Data Drift Detection: Use feedback data to quantitatively detect model or data drift due to changes in the foundational model version or data quality over time.
- Advanced Usages: Consider using feedback data for Reinforcement Learning from Human Feedback (RLHF) or fine-tuning the model itself to enhance performance.
Defining these goals ensures the long-term viability and continuous improvement of the solution.
- User Experience
- Survey the target audience to gauge their willingness to provide feedback.
- Ensure that the feedback mechanism does not hamper the effectiveness or usability of the solution.
- Wide vs. Narrow Feedback
- In scenarios where user experience is paramount, and feedback might interfere with usability, consider creating a smaller group of tester SMEs.
- These SMEs can collaborate closely with the development team to provide continuous, detailed feedback without impacting the broader user base.
Types of Feedback Mechanisms
As stated earlier, there are several ways one can collect feedback data. The choice you make will ultimately depend on the intention and use-case of feedback data usage. The two major categories of feedback are:
- Quantitative Feedback
- Involves questions that can be answered with categorical responses or numerical ratings. For example, asking users to rate the chatbot’s response on a scale of 1-5 with defined parameters.
- Quantitative data can be effectively used alongside other metrics to assess KPIs.
- Qualitative Feedback
- Consists of open-ended questions where users provide free-form text feedback.
- Allows users to offer insights not captured in quantitative metrics.
- Natural Language Processing (NLP) or additional LLMs can be employed to analyze and derive insights from this feedback.
Reinforcement Learning from Human Feedback (RLHF)
It is important to briefly discuss RLHF given its importance and prevelance in the GenAI space, but first, you might be asking: What is RLHF?
Reinforcement Learning from Human Feedback (RLHF) is a machine learning technique that incorporates human evaluations to optimize models for more efficient self-learning. Traditional Reinforcement Learning (RL) trains software agents to make decisions that maximize rewards, improving accuracy over time. RLHF integrates human feedback into the reward function, enabling the model to perform tasks more aligned with human goals, preferences, and ethical considerations.
How RLHF Works
- State Space and Action Space
- State Space: Represents all relevant information about the task at hand.
- Action Space: Contains all possible decisions the AI agent can make.
- Reward Function
- Human evaluators compare the model’s responses to desired outcomes, scoring them based on criteria like correctness, helpfulness, and alignment with human values.
- The reward function incorporates these human evaluations, providing positive reinforcement for desirable outputs and penalties for undesirable ones.
- Policy Optimization
- The model adjusts its policy (decision-making strategy) to maximize cumulative rewards based on human feedback.
- Over time, the model learns to generate responses that are more accurate, contextually appropriate, and aligned with user expectations.
Implement RLHF in the Feedback Loop
- Data Collection
- Gather human feedback systematically, ensuring a diverse and representative set of evaluations.
- Both quantitative ratings and qualitative comments can be used to capture the nuances of user expectations.
- Model Training
- Incorporate the collected feedback into the model’s training process using RL algorithms.
- Regularly update the model to reflect new insights and changing user needs.
- Monitoring and Evaluation
- Continuously assess the model’s performance against defined KPIs.
- Monitor for unintended behaviors or biases, adjusting the training process as necessary.
Benefits of RLHF
RLHF helps mitigate risks associated with unaligned AI behavior, such as generating harmful or biased content. By incorporating human judgments, the model becomes better at avoiding inappropriate or unsafe outputs. Models trained with RLHF are more likely to produce outputs that are ethical, fair, and respectful of social norms. RLHF allows for ongoing refinement of the model as it interacts with users and receives more feedback, leading to sustained performance improvements.
Integration with LLM-as-a-Judge (CT-15)
Another aspect of Human Feeback which one should consider is related to the growing field of LLM-as-a-Judge (see: CT-15). When we use LLM-as-a-Judge, one should consider the same set of Quantitative and Qaulitative feedback from the LLM as they will from humans. This will provide them an opportunity to compare feedback between machine and humans. Also it will allow to rate the effectiveness of using LLM-as-a-Judge. One should also consider Narrow feedback on atleast a sample of the LLM-as-a-Judge results. The sample size (in terms of %) and the methodology is use-case dependent. For instance, if the use case is business-critical, then one needs to verify a higher number of samples
Wrap Up
A well-designed human feedback loop, enhanced with techniques like RLHF, is essential for the success of Generative AI solutions. By aligning the feedback mechanism with KPIs, clearly defining the intended use of feedback data, and ensuring a positive user experience, organizations can significantly improve the performance, safety, and reliability of their AI models. Incorporating RLHF, for instance, not only refines the model’s outputs but also ensures that the AI system remains aligned with human values and organizational objectives over time.
CT-12 - Role-based data access
- Document Status
- Pre-Draft
- Control Type
- Preventative
- Mitigates
- Ensure data provided by Confluence is aligned with the end-user role
- For data stored in encrypted file system, before a model/user/system tries to process and train/retrive the data, it has to prove its authentication, either through hardware-based or software-based attestation
CT-13 - Provide citations
- Document Status
- Pre-Draft
- Control Type
- Detective
- Mitigates
- Provide citations / linkage to the source data in Confluence
CT-14 - Encrypt data at rest
- Document Status
- Draft
- Control Type
- Prevantative
- Mitigates
Encrypting data at rest involves converting stored information into a secure format using encryption algorithms, making it inaccessible without the proper decryption key. This process protects sensitive data from unauthorized access or breaches, even if the storage medium is compromised. It is considered standard practice, with many tools and organizations turning this feature on by default across their IT estate and third-party tools.
Despite this being standard practice, it is still a notable control within the context of LLM applications. New technologies and techniques move at pace, driven initially by features. Sometimes suitable security measures are lacking. More specifically, vector data stores, which are a relatively recent area of research and development, may lack this feature.
Potential Tools and Approaches
This can be done in many ways, either as part of an existing cloud infrastructure utilising tools like Azure’s AI Search vector store or AWS OpenSearch. Both provide an extensive range of services well suited to the implementation of a RAG system.
It is also possible to turn this from an issue of managing secure access at rest to utilising a service that turns the problem into making a secure API call such as with Pinecone. Which provides high level security and authentication services alongside serverless, low latency hosting.
In a case where you don’t want to introduce a cloud infrastructure service or use an API system you could use a self hosted system like FTP using a service such as Redis to create the vector database for your RAG system to access. In such a setup it is important to make sure that your private server is set up securely, e.g, using adequate authentication through ssh keys. In the case of using a larger cloud service you are able to delegate this responsibility.
Another tool available is FAISS which provides in runtime storage and access to various algorithms which provide efficient similarity searches across your vectors. Even though FAISS runs in-memory and is limited to the system’s available memory, it also offers persistence export options as well. However, If FAISS is used in-memory mode only without persistence, this reduces the data breach risk significantly.
CT-15 - LLM-as-a-Judge
- Document Status
- Draft
- Control Type
- Detective
- Mitigates
Testing (evaluating model responses against a set of test cases) and monitoring (continuous evaluation in production) are vital elements in the process of the development and continued deployment of an LLM System, they ensure that your system is functioning properly and that your changes to the system bring a positive improvement, and more.
As this is such an important and large subject there are a wide range of approaches and tools available, one of which is the use of LLMs-as-a-Judge. This is the use of an LLM to evaluate the quality of a response generated by an LLM. This has become a popular area of research due to the expensive nature of evaluation by humans and by the improved ability of LLMs since the advent of GPT4.
For example in our RAG use case, you may present an LLM evaluator with a text input explaining the companies policy explaining what an employee’s responsibilities are with regard to document control, followed by a sentence stating: “Employees must follow a clean desk policy and ensure they have no confidential information present and visible on their desk when they are not present there”. You would then ask the evaluator if this statement is true and for an explanation of why it is or is not, given the article explaining employee responsibilities.
The effectiveness of an evaluator can be measured either using classification or correlation metrics, with the latter being more difficult to use than the former. Examples of classification metrics include: Accuracy - measures the proportion of outputs which are correct; Precision - measures what proportion of the retrieved information is relevant information; Recall - measures the proportion of relevant information out of all the relevant information in the corpus. An explanation and evaluation of correlation metrics can be found here.
A range of research has found that LLMs can be used as effective evaluators of the outputs of other LLMs, even showing that while a fine-tuned in-domain model can achieve higher accuracy on specific in-domain tests generalised LLMs can be more “generalised and fair”, meaning depending on the specific use case it may be less effective to create a bespoke evaluator. The literature has shown a range of effective approaches at this evaluation, and given the recent nature of this approach it is probable that more effective approaches will be found, along with existing approaches improving as state-of-the-art models improve. Given all this, it is highly recommended to introduce an LLM based evaluator to a testing procedure, but it is important to have human oversight in the use of this evaluation to verify its results as with all things stemming from LLMs there is a lot of scope for error and mistake. This can mean not just taking evaluation scores but looking into the confusion matrix and getting an understanding of what the evaluation is telling you, as this is a tool to make it easier to find potential issues not something that you can just set and no longer have to worry about your system.
Potential Tools and Approaches
- Cross Examination
- Zero-Resource Black-Box Hallucination Detection
- Pairwise preference search
- Fairer preference optimisation
- Relevance assessor
- LLMs-as-juries
- Summarisation Evaluation
- NLG Evaluation
- MT-Bench and Chatbot arena
Links
- https://eugeneyan.com/writing/llm-evaluators/
- https://www.databricks.com/blog/LLM-auto-eval-best-practices-RAG
- https://www.databricks.com/blog/announcing-mlflow-28-llm-judge-metrics-and-best-practices-llm-evaluation-rag-applications-part
- https://medium.com/thedeephub/evaluation-metrics-for-rag-systems-5b8aea3b5478
- https://www.databricks.com/blog/enhancing-llm-as-a-judge-with-grading-notes
CT-16 - Preserving access controls in the ingested data
- Document Status
- Draft
- Control Type
- Detective
- Mitigates
When ingesting data to be queried using RAG architecture, the source may have defined different access levels boundaries that are lost in the destination vector storage.
The vector storage may support different access controls, which you should verify that they replicate the original source data ones.
Source data can be segregated in different replicated RAG architectures corresponding to different access level boundaries. Users of a specific domain can then only be permissioned to access the instance of the RAG that holds the data that they should have access to. If there are many different access levels, and resource consumption is a concern, you should consider concentrating them to a few flat levels that still serve the original security restrictions.
System prompt access control techniques are available, whereby a system prompt may be designed to build access controls into the RAG. This is ineffecient and has been to proved to be easily bypassed. So this should be avoided as a mechanism of developing access controls into the RAG architecture.
- segregating storage into diffent stores
- future integration points that may come in foundation model or vector store
-
access control restrictions on the application
- If the vector storage supports access controls, you should verify that they replicate the original source data ones.
- Source data can be segregated in different repliacted RAG arquitectures, where users of a specific access domain only can access the version of the RAG that holds the data they should have access to.
- System prompt is an inneficient access control that can be bypassed, and should be avoided.
- In any case, a review of data ingested in the RAG and match with the users that have access to the RAG should yield there is no exfiltration possible.
In trying to preserve access controls in the ingested data the corporate access control should be replicated.