AI Readiness

AI Readiness Governance Framework

This document describes a simple AI system consisting of a Large Language Model with Retrieval Augmented Generation (RAG) using an external SaaS for inference. This is being used as a vehicle for developing a governance framework for on-boarding GenAI technology, and should be considered an early stage draft document.

Distributed under the CC0 1.0 Universal license.

Introduction

The rapid advancements in Artificial Intelligence (AI), particularly Generative AI, are set to revolutionize both business operations and personal lives. In the financial services sector, these innovations present immense opportunities that span product offerings, client interactions, employee productivity, and organizational operations. Few technologies have promised such a broad and transformative impact.

However, these advancements also bring significant challenges. Issues like hallucinations, prompt injections, and model unpredictability introduce unique complexities into safely integrating and deploying AI technologies. The pace of technological change means that today’s solutions may become obsolete tomorrow, necessitating a flexible yet robust governance approach.

Financial institutions (in particular) are eager to onboard, experiment with, and deploy AI technologies to stay both competitive and innovative. Yet the risk landscape and regulatory environment of the financial services industry necessitate proper governance. Existing processes and frameworks may not be adequately equipped to address the novel challenges posed by AI, particularly Generative AI. Therefore, there is a critical need for an adaptive governance framework that promotes the safe, trustworthy, and compliant adoption of AI technologies.

Value Proposition

The AI Readiness Governance Framework aims to bridge the gap between the transformative potential of AI and the stringent requirements of the financial services industry. By providing a structured approach to identifying, assessing, and mitigating risks associated with AI systems, the framework empowers organizations to harness AI’s benefits while maintaining compliance with regulatory standards and safeguarding against operational risks.

Intended Audience

As briefly touched upon in the Value Proposition, this framework is designed for a broad range of stakeholders involved in the adoption and governance of AI technologies within the financial services industry. See the list below for a high-level outline of potential stakeholders.

Initial scope - RAG-based Q&A

While the ultimate goal is to develop a comprehensive governance framework that can accommodate a wide variety of use cases, starting with a narrowly focused, well-defined initial scope offers several advantages. By selecting a common, high-impact use case that financial organizations frequently encounter, we can create immediate value and demonstrate the effectiveness of our approach. This smaller, more manageable scope will allow for a quicker implementation and iteration process, enabling us to refine our framework based on real-world experience. Additionally, focusing on a use case that can be open-sourced not only fosters collaboration but also ensures that the framework is adaptable and beneficial to a broader community. This initial effort will serve as a foundation for developing a robust governance structure that can be scaled and expanded to cover a wide range of applications in the future.

In Scope (for now):

Out of Scope (for now):

Assumptions:

LLM using RAG threats

Metadata

Each Threat or Control has a status which indicates its current level of maturity:

Pre-Draft: Initial brainstorming or collection of ideas, notes, and outlines. The document is not yet formally written.
Draft: The document is in the early stages of development, with content being written and structured but not yet polished.
Review: The document is being reviewed by others for feedback. It may undergo multiple iterations at this stage.
Approved: The document is in its final form and has been approved by the AI Readiness team.

Contents

Threats

TR-1 (Draft): Information Leaked to Hosted Model
TR-2 (Draft): Insufficient access control with vector store
TR-3 (Pre-Draft): Lack of source data access controls
TR-4 (Draft): Hallucination
TR-5 (Draft): Instability in foundation model behaviour
TR-6 (Draft): Non-deterministic behaviour
TR-7 (Draft): Availability of foundational model
TR-8 (Draft): Tampering with the foundational model
TR-9 (Draft): Tampering with the vector store
TR-10 (Draft): Prompt injection
TR-11 (Draft): Lack of foundation model versioning
TR-12 (Draft): Ineffective storage and encryption
TR-13 (Draft): Testing and monitoring
TR-14 (Draft): Inadequate system alignment

Controls

CT-1 (Draft): Data Leakage Prevention and Detection
CT-2 (Draft): Data filtering from Confluence into the samples
CT-3 (Draft): User/app/model firewalling/filtering
CT-4 (Draft): System observability
CT-5 (Draft): System acceptance testing
CT-6 (Pre-Draft): Data quality & classification/sensitivity
CT-7 (Pre-Draft): Legal/contractual agreements
CT-8 (Pre-Draft): QoS/Firewall/DDoS prevention
CT-9 (Pre-Draft): Alerting / DoW spend alert
CT-10 (Draft): Version (pinning) of the foundational model
CT-11 (Pre-Draft): Human feedback loop
CT-12 (Pre-Draft): Role-based data access
CT-13 (Pre-Draft): Provide citations
CT-14 (Draft): Encrypt data at rest
CT-15 (Draft): LLM-as-a-Judge
CT-16 (Draft): Preserving access controls in the ingested data

Threats

TR-1 - Information Leaked to Hosted Model

Document Status
Draft
Threat Type
Confidentiality

In the provided system architecture, sensitive data is transmitted to a SaaS-based Generative AI platform for inference, posing a risk of information leakage. Sensitive organizational data, proprietary algorithms, and confidential information may be unintentionally exposed due to inadequate control measures within the hosted model. This can occur through several mechanisms unique to Large Language Models (LLMs), as outlined in OWASP’s LLM06, such as overfitting, memorization, and prompt-based attacks.

LLMs can retain data from training processes or user interactions, potentially recalling sensitive information during unrelated sessions, a phenomenon known as “memorization”. When data such as Personally Identifiable Information (PII) or proprietary financial strategies enter the model, the risk of inadvertent disclosure rises, particularly when insufficient data sanitization or filtering mechanisms are in place. Additionally, adversarial actors could exploit prompt injection attacks to manipulate the model into revealing sensitive data.

Furthermore, data retention policies or model fine-tuning can exacerbate these risks. When fine-tuning is done on proprietary data without strict access control, sensitive information may inadvertently be disclosed to lower-privileged users, violating principles of least privilege. Without clear Terms of Use, data sanitization, and input validation, the organization loses visibility into how sensitive information is processed by the LLM and where it may be disclosed.

It is, however, important to understand distinct risk vectors between commercial/enterprise-grade and free hosted LLMs. For instance, commercial LLMs like ChatGPT offer a “Memory” setting to manage what the system is allowed to memorize from your conversations and Data controls to restrict what can be used to train their models. Additionally, enterprise-grade LLMs will usually sanitize sensitive data when used in organizational environments and often include stringent terms of use related to the handling of your data inputs and outputs that must first be accepted before interacting with the model. Free hosted LLMs, on the other hand, may use your data to train their models without you explicitly knowing that it is happening. Thus, you must always exercise due diligence when interacting with hosted LLM services to better understand how your input and output data is being used behind the scenes.

Key Risks

This risk is aligned with OWASP’s LLM06: Sensitive Information Disclosure, which highlights the dangers of exposing proprietary or personally identifiable information (PII) through large-scale, externally hosted AI systems.

TR-2 - Insufficient access control with vector store

Document Status
Draft
Threat Type
Confidentiality

Vector stores are specialized databases designed to store and manage ‘vector embeddings’—dense numerical representations of data such as text, images, or other complex data types. According to OpenAI, “An embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness. Small distances suggest high relatedness and large distances suggest low relatedness.” These embeddings capture the semantic meaning of the input data, enabling advanced operations like semantic search, similarity comparisons, and clustering.
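To make the notion of distance concrete, the following minimal Python sketch computes the cosine similarity between two embedding vectors; the vectors shown are illustrative placeholders rather than real model output.

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        # Cosine similarity: values near 1.0 indicate high relatedness,
        # values near 0.0 indicate the vectors are unrelated.
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Illustrative 4-dimensional embeddings; real embedding models produce
    # hundreds or thousands of dimensions.
    query_embedding = np.array([0.12, -0.45, 0.88, 0.03])
    document_embedding = np.array([0.10, -0.40, 0.91, 0.05])

    print(cosine_similarity(query_embedding, document_embedding))

In a vector store, the same comparison is performed at scale: the query embedding is compared against many stored document embeddings, and the closest matches are returned.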

In the context of Retrieval-Augmented Generation (RAG) models, vector stores play a critical role. When a user query is received, it’s converted into an embedding, and the vector store is queried to find the most semantically similar embeddings, which correspond to relevant pieces of data or documents. These retrieved data are then used to generate responses using Large Language Models (LLMs).

Threat Description

In the described system architecture, where an LLM employing RAG relies on a vector store to retrieve relevant organizational knowledge (e.g., from Confluence), the immaturity of current vector store technologies poses significant confidentiality and integrity risks. Vector stores, which hold embeddings of sensitive internal data, may lack enterprise-grade security controls such as robust access control mechanisms, encryption at rest, and audit logging. Misconfigurations or incomplete implementations can lead to unauthorized access to sensitive embeddings, enabling data tampering, theft, or unintentional disclosure.

While embeddings are not directly interpretable by humans, recent research has demonstrated that embeddings can reveal substantial information about the original data. For instance, embedding inversion attacks can reconstruct sensitive information from embeddings, potentially exposing proprietary or personally identifiable information (PII). The paper “Text Embeddings Reveal (Almost) as Much as Text” illustrates this very point, discussing how embeddings can be used to recover the content of the original text with high fidelity. If you are interested in learning more about how an embedding inversion attack works in practice, check out the corresponding GitHub repository for the above paper.

Moreover, embeddings can be subject to membership inference attacks, where an adversary determines whether a particular piece of data is included in the embedding store. This is particularly problematic in sensitive domains where the mere presence of certain information (e.g., confidential business transactions or proprietary data) is sensitive. For example, if embeddings are created over a document repository for investment bankers, an adversary could generate various embeddings corresponding to speculative or confidential scenarios like “Company A to acquire Company B.” By probing the vector store to see how many documents are similar to that embedding, they could infer whether such a transaction is being discussed internally, effectively uncovering confidential corporate activities.

As related to insufficient access control, one of the primary threats involves data poisoning, where an attacker with access to the vector store injects malicious or misleading embeddings into the system (see: PoisonedRag for a related example). Compromised embeddings could degrade the quality or accuracy of the LLM’s responses, leading to integrity issues that are difficult to detect. Since embeddings are dense numerical representations, spotting malicious alterations is not as straightforward as with traditional data.

Given the nascent nature of vector store products, they may not adhere to enterprise security standards, leaving gaps that could be exploited by malicious actors or internal users. For example:

This risk aligns with OWASP’s LLM06: Sensitive Information Disclosure, which highlights the dangers of exposing proprietary or PII through large-scale, externally hosted AI systems.

TR-3 - Lack of source data access controls

Document Status
Pre-Draft
Threat Type
Confidentiality

The system allows access to corporate data sources (e.g. Confluence, employee databases). However, because these sources may have different access control policies, there is a threat that, when accessing data via the application, a user can see data they are not authorized to see at the source, because the architecture does not honor source access controls.

The threat posed here is the loss of data access controls in building a traditional RAG application. The access control restrictions in place in Confluence, for example, will not be replicated in the Vector database or at the model level.

Examples of important things to consider:

TR-4 - Hallucination

Document Status
Draft
Threat Type
Integrity

LLM hallucinations refer to instances when a large language model (LLM) generates incorrect or nonsensical information that seems plausible but is not based on factual data or reality. These “hallucinations” occur because the model generates text based on patterns in its training data rather than true understanding or access to current, verified information.

The likelihood of hallucination can be minimised by using RAG techniques, supplying the LLM with relevant facts directly via the prompt. However, the response provided by the model is a synthesis of the information within the input prompt and information retained within the model. There is no reliable way to ensure the response is restricted to the facts provided via the prompt, and as such, RAG-based applications still hallucinate.
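As an illustration of how grounding is typically applied, the sketch below assembles a prompt from retrieved passages and instructs the model to answer only from them; the helper function and passage text are hypothetical, and this reduces rather than eliminates hallucination.

    def build_grounded_prompt(question: str, retrieved_passages: list[str]) -> str:
        # Concatenate retrieved passages as explicit context and instruct the
        # model to refuse when the answer is not present. This reduces, but
        # does not guarantee the absence of, hallucinated content.
        context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(retrieved_passages))
        return (
            "Answer the question using ONLY the numbered passages below. "
            "If the passages do not contain the answer, reply 'I don't know'. "
            "Cite passage numbers in your answer.\n\n"
            f"Passages:\n{context}\n\nQuestion: {question}\nAnswer:"
        )

    prompt = build_grounded_prompt(
        "What is the clean desk policy?",
        ["Employees must lock confidential documents away when leaving their desk."],
    )
    print(prompt)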

There is currently no reliable method for removing hallucinations, with this being an active area of research.

Severity

The severity of this threat depends on the specific use case. The failure scenario involves either an employee or a client making a decision based on erroneous information provided by the system. With some use cases, the potential impact of ill-informed decisions may be minimal, whereas in other cases, the impact could be significant. Regardless, this failure mode needs consideration.

TR-5 - Instability in foundation model behaviour

Document Status
Draft
Threat Type
Integrity

Instability in foundation model behaviour would manifest itself as deviations in the output (i.e. during inference) when the model is supplied with the same prompt.

If you rely entirely on the model provider for evaluation, for example because the model is of a kind that changes very quickly, like a code generation model (which can change several times a day), you have to acknowledge that you are placing all responsibility for the outcome of the architecture on that provider, and that you are trusting whatever it does whenever the model version is changed.

If you need to implement strict tests on your whole architecture, be aware that the foundation model behaviour may change over time if the third party doesn’t have rigorous version control (covered in TR-11), if you don’t pin the version you are using, or if they silently change model versions without notifying the consumer. This can lead to instability in the model’s behaviour, which may impact the client’s business operations and actions taken upon model output.

The provider may change the model without explicit customer knowledge, which could lead to unexpected responses from the model that the organization has not tested for. The provider might also provide a mechanism to pin the model version. In either case, non-determinism can lead to instability in model behaviour.

Another mechanism that can induce instability is perturbation. There is recent research into using prompt perturbations to attack grounding and hence weaken system defences against malicious attacks.

TR-6 - Non-deterministic behaviour

Document Status
Draft
Threat Type
Integrity

A fundamental property of LLMs is the non-determinism of their responses. LLMs generate responses by predicting the probability of the next word or token in a given context, so different prompts equating to the same request can produce different responses from the models. This method also means that LLMs can tend towards winding or unintelligible output when producing longer responses. LLMs also make use of sampling methods such as top-k sampling during text generation, and may have internal state or seed mechanisms, which can cause distinct responses to the same prompt across different requests. This can make it difficult or impossible to reproduce results, and may result in different (and potentially incorrect) results being returned occasionally and in a hard-to-predict manner.
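Where reproducibility matters, some hosted APIs expose parameters such as a temperature and a seed value that narrow, though do not remove, this variance. The sketch below shows the general shape of such a request, assuming an OpenAI-compatible client; the model name is illustrative.

    from openai import OpenAI  # assumes an OpenAI-compatible SDK is available

    client = OpenAI()

    # Greedy decoding (temperature=0) plus a fixed seed narrows, but does not
    # eliminate, run-to-run variation in the response.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": "Summarise our expenses policy."}],
        temperature=0,
        seed=1234,            # supported by some providers; not a guarantee
    )
    print(response.choices[0].message.content)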

One danger is that users get differing responses when they use the system at different times, or get different answers than a colleague asking the same question. This leads to confusion, and overall trust in the system may degrade until users no longer want to use the tool at all.

This behaviour can also increase the difficulty of evaluating your system as you may be unable to reproduce what you thought was a bug, or consistently produce evaluation metrics to see if you are improving or degrading the system as you make changes.

Severity

This is not a hugely severe risk in the context of this RAG chatbot, as it presents information that the user can access elsewhere and may already have a passing familiarity with, which should help them interpret the system’s outputs.

However, even though the non-determinism risk of LLMs is reduced in RAG systems, there is still a risk of not providing the same response or information source for similar queries: the internal data source may contain overlapping information about the requested topic, and the system may produce different results with different citations.

The Non-determinism of ChatGPT in Code Generation

TR-7 - Availability of foundational model

Document Status
Draft
Threat Type
Availability

RAG systems are proliferating due to the low barrier of entry compared to traditional chatbot technologies. RAG has applicability for internal users as well as supporting customer-related use cases.

Many LLMs require GPU compute infrastructure to provide an acceptable level of responsiveness. Furthermore, some of the best-performing LLMs are proprietary. Therefore, the path of least resistance is to call out to a Technology Service Provider (TSP).

Key Risks

TR-8 - Tampering with the foundational model

Document Status
Draft
Threat Type
Integrity, Confidentiality, Availability

The SaaS-based LLM provider is a 3rd party supplier and as such is subject to all typical supply chain, insider, and software integrity threats. Supply chain attacks on infrastructure and OSS are covered in other frameworks, however supply chains for foundational models, training data, model updates, and model retraining are all potential targets for attackers. The underlying firmware of GPUs, the operating system, and the machine learning libraries should be considered as part of the supply chain too.

Whilst fine-tuning is out of scope for this framework, it is worth noting that adversarial attacks can be built upon the model weights of open-source models. Malicious actors could use this information to induce unsafe behaviour in the model.

Similarly, back doors engineered into the model can be triggered through malicious means. This could result in unsafe behaviour, or such tampering could remove safeguards around the model.

Severity

Given that the LLM is a hosted SaaS, the expectation is that the API vendor would have gone through extensive third-party checks before being onboarded by the financial institution, and would thus have demonstrated adequate quality control measures in their supply chain. As a result, this is a low-risk threat.

TR-9 - Tampering with the vector store

Document Status
Draft
Threat Type
Integrity, Confidentiality

A malicious actor may tamper with the vectors stored in the client’s vector store. This data is made available to application users, resulting in unauthorized data access or the proliferation of falsities. Such tampering could happen during Confluence ingestion, for example by a malicious actor leveraging a known back door in the ingest pipeline. An adversary could use this back door to poison the vector store, or use it in other innovative ways to introduce poisoned data.

Severity

Low risk, as this requires access to the client’s vector store, which is assumed to be adequately protected, since any SaaS provider would be required to demonstrate such protection to a financial institution as a third-party vendor.

TR-10 - Prompt injection

Document Status
Draft
Threat Type
Integrity, Confidentiality

Users of the application or malicious internal agents can craft prompts that are sent to the SaaS-based LLM and potentially cause damaging responses. This is one of the most popular attack vectors, as the privilege requirements for this attack vector are the lowest [^1][^2]. Unlike SQL injection, the scope of attack in prompt injection is wider and can result in incorrect answers, toxic responses, information disclosure, denial of service, and unethical or biased responses. A good public example of such an incident is the DPD chatbot [^3].

There are two popular approaches. Direct Prompt Injection (also known as “Jailbreaking”) is where a user tries to escape the underlying behaviour of the system to gain access to the data store or elicit inappropriate responses, putting the reputation of the company at risk. Indirect Prompt Injection seeks to hijack other users’ sessions and direct them, or other systems (if the LLM is part of a larger component), towards actions or critical decisions, which might enable other attack vectors including privilege escalation. Note that this risk exists even without an active prompt injection attack, as badly designed systems hallucinate and can direct users to actions that are harmful to the organization. This is particularly invisible when the prompt is part of a component of a decision-making system that does not interact directly with users and there is no human in the loop.

Additionally, attackers can profile systems, exploiting access to internal data and using model inversion to reveal the model’s internal workings, which can be used to steal the model’s training or RAG data.

Severity

For the internal-user RAG use case, access to the system should only be within the boundary of the organization, so Direct Prompt Injection carries low-to-medium risk as exposures can be controlled. Indirect Prompt Injection, however, can be medium-to-high risk depending on the criticality of the subject domain being queried.

References

TR-11 - Lack of foundation model versioning

Document Status
Draft
Threat Type
Integrity

Inadequate or unpublished API versioning and/or model version control may result in response instability, due to changes in the foundation model and the client’s absence of opportunity to benchmark and test the new model.

This can lead to instability in the model’s behaviour, which may impact the client’s business operations.

Challenges with versioning

There are unique challenges to versioning large language models (LLMs) and, consequently, the APIs that provide LLM capabilities. Unlike traditional software, where versioning tracks changes in source code, versioning LLMs must account for a range of factors such as model behavior, training data, architecture, and computational requirements. Some key challenges include:

  1. Model size and complexity:
    • Models are incredibly large, comprising a large number of parameters. Managing and tracking changes and their impacts across such massive models can be very complex. It is also challenging to quantify or summarize changes in a meaningful way.
  2. Dynamic nature of LLMs
    • Some LLMs are designed to learn and adapt over time, while others are updated through fine-tuning and customizations. This makes it difficult to keep track of changes or define discrete versions, as the model is constantly being updated.
  3. Non-deterministic behavior:
    • LLMs can produce different outputs for the same input due to factors like temperature settings, making it difficult to define a “new version”.
  4. Multidimensional changes:
    • Updates to LLMs might involve changes to the model architecture, training data, fine-tuning process, and inference parameters. Capturing all these dimensions in a version number or identifier is challenging.
    • Changes to LLMs can range from minor tweaks (e.g., adjusting hyperparameters) to significant changes (e.g., retraining with new data), making it challenging to define the proper granularity of versioning.
  5. Training data versioning:
    • LLMs are trained on massive amounts of data, making it difficult to track and manage changes in the training corpus.
  6. Resource management:
    • Running multiple versions of LLMs simultaneously can strain computational resources and challenge infrastructure.
  7. Lack of standardization:
    • There is no widely accepted standard for versioning LLMs, leading to inconsistent practices across different organizations.

Problems caused by inadequate versioning:

  1. Inconsistent output
    • LLMs may produce different responses to the same prompt, leading to inconsistent user experiences or decision-making
  2. Reproducibility/Traceability
    • Inability to replicate or trace past outputs, which may be required in some business contexts or during testing and debugging
  3. Performance variability:
    • Unexpected changes in model performance, even introducing regression in some areas (e.g., more bias)
    • Assessing improvements or regression becomes challenging
  4. Compliance and auditing:
    • Inability to track and explain model changes can lead to compliance problems and difficulty in auditing decisions
  5. Integration and compatibility/backward compatibility:
    • Other systems or APIs may depend on specific behaviors of an LLM
  6. Testing and quality assurance:
    • Difficulty in identifying root cause of errors or bugs
    • Inability to replicate issues or isolate model changes that are causing issues
  7. Security and privacy:
    • Difficult to track security vulnerabilities or privacy issues
    • New security or privacy issues may be introduced

Severity

The severity of the risk will depend on the LLM vendor’s versioning practices (e.g., if a new version is released, are users still using the previous version by default or are they automatically updated). If users are automatically redirected to the new version, then the severity will be higher, and it will then depend on the magnitude of changes made to the existing LLM, and how different the new version’s behavior is.

TR-12 - Ineffective storage and encryption

Document Status
Draft
Threat Type

Lacking sufficient data storage security leaves your project vulnerable to outside actors gaining access to the data used by your model, potentially exposing that data to theft or to tampering that corrupts and misuses your model. Many regulations covering the industry also have requirements for the secure storage of data, especially where it can be used to identify people, meaning improper security can put you at risk of regulatory and legal issues, alongside the risk of losing your users’ trust in the system you have developed.

Severity

The severity of this risk depends largely on the data you are storing. If the RAG system is simply creating a more usable and accessible access point to information that is already public, there is no real risk associated with the data being accessed by an outside party. However, if you have information that you want to restrict and control access to, it is important both to have strong access controls and to properly secure the data at all points in the system.

TR-13 - Testing and monitoring

Document Status
Draft
Threat Type
Integrity

The evaluation and monitoring of LLM systems is integral to their correct utilisation, while also being difficult to do effectively and efficiently. Effective evaluation and monitoring can help ensure the accuracy and reliability of LLM-generated outputs, improve the trustworthiness of the model, and help you stay on top of potential issues such as inaccurate retrieval of data, misunderstanding of queries, or loss of efficiency as the system scales.

Evaluation of an LLM is the assessment of the accuracy, reliability, and precision of a model’s outputs, which together determine the effectiveness of the system produced. RAG models are often assessed with the use of ‘ground truth’: existing answers to queries used as a standard of comparison for the output of the model, to determine its correctness and similarity.

Without proper evaluation in place, you will not be able to understand whether your model is effective, or whether changes made in the hope of improving model performance have had a positive or negative effect. Without continuous evaluation, it is possible that over the lifetime of a system, as the input data gradually changes, the effectiveness of the model fluctuates and progressively deteriorates until it is no longer helpful in reaching your goals. With the correct evaluation in place, it is simpler to detect when this is occurring, allowing you to address problems faster and more effectively.

Severity

This risk has medium severity: it becomes an issue primarily downstream of something else in your system going wrong, but good monitoring and evaluation allow you to catch such issues faster and make the appropriate changes to the system, and are therefore essential in any RAG system architecture.

TR-14 - Inadequate system alignment

Document Status
Draft
Threat Type
Integrity

Alignment

When using an AI system, there is a specific goal you want to achieve. It may be an overarching project goal, or a specific requirement for single queries to the AI system.

This risk describes when the AI system behaves in a way that doesn’t align with the intended goal.

What can go wrong

In the most basic instance, as already described in the discussion of hallucinations, the model may not give useful output to queries. But even when the model does seem to perform well in the short term for individual queries, it may place emphasis on a specific topic in a way that, when the AI model is used at scale, has undesirable consequences.

For example, an AI with a goal to maximize a company’s profit could suggest exploiting regulatory loopholes or ignoring the social responsibility for the impact of its solutions on the population. In another example, an AI tasked with selecting candidates for job positions may choose people who perform well in the roles, but may be unfairly biased against specific groups of people. Even making some processes completely AI-automated could pose a risk: removing responsibility for something entirely from humans may end with nobody knowing anything at all about the automated task, and with no accountability or responsibility when it misbehaves.

An AI system that is aligned at first may also become misaligned in future situations, given its non-deterministic behavior, when new versions of the model are deployed, or when it uses different contextual information (system prompt, RAG database, etc.).

In general we can summarize that the AI system may optimize for a goal in a way that causes unintended or harmful side effects, not only for its immediate goals, but for society in general and the long term.

Responsible AI

The concept of responsible AI describes the practice of developing and deploying AI systems in a way that ensures they are aligned with human values and that they are safe, fair, and accountable, while minimizing risks and unintended consequences.

Controls

CT-1 - Data Leakage Prevention and Detection

Document Status
Draft
Control Type
Detective
Mitigates

Preventing Leakage of Session Data

The use of Third-Party Service Providers (TSPs) for LLM-powered services and raw endpoints will be attractive for a variety of reasons:

  1. The best-performing LLMs may be proprietary, with model weights unavailable for local deployment.

  2. GPU compute to power LLMs can require upgrades in data center power and cooling with long lead times.

  3. GPUs are expensive and may be limited in supply. TSPs may be required for “burst capacity”.

Given that leakage of session data may have adverse impact to clients and risk fines from regulators, preventing leakage thereof is a priority. Traditional risks and mitigations include:

LLM-specific attacks:

Preventing Leakage of Training Data

If you have fine-tuned a model with proprietary data, there is the potential for it to be extracted as described in:

Have your ML team ensure that there are guardrails in place to detect attempts at training data extraction through prompting.

Detecting Leakage of Session Data

To address the potential for data leakage when proprietary information is processed by an external LLM (typically a SaaS-based solution), a detective control using data leakage canaries can be applied.

This technique aims to monitor and detect unauthorized disclosure of sensitive information by embedding uniquely identifiable markers (canaries) into the data streams or queries sent to the hosted model.

Key components of this control include:

This control is particularly effective in detecting unauthorized use or exposure of internal proprietary information processed by external systems. By embedding canaries and fingerprints, organizations can maintain a degree of oversight and control, even when data is processed beyond their direct infrastructure.
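As a minimal illustration of the canary idea, the sketch below generates a unique marker, seeds it into a document, and scans observed text for it; how canaries are stored, distributed, and alerted on is assumed to be organization-specific.

    import secrets

    def make_canary(label: str) -> str:
        # A unique, searchable token; its appearance anywhere outside approved
        # systems indicates the seeded document has leaked.
        return f"CANARY-{label}-{secrets.token_hex(8)}"

    canary = make_canary("confluence-hr-policies")
    seeded_document = f"Internal HR policy summary. Reference: {canary}"

    def contains_canary(observed_text: str, canaries: list[str]) -> bool:
        # Run this over model outputs, logs, or external sources being monitored.
        return any(c in observed_text for c in canaries)

    print(contains_canary("... Reference: " + canary + " ...", [canary]))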

Detecting Leakage of Model Weights

In situations where detecting leakage of model weights is necessary, there are approaches to “fingerprinting” that are being explored in the research community. One example, based on knowledge injection, is:

CT-2 - Data filtering from Confluence into the samples

Document Status
Draft
Control Type
Preventative
Mitigates

To mitigate the risk of sensitive data leakage and tampering in the vector store, the data filtering control ensures that sensitive information from internal knowledge sources, such as Confluence, is anonymized and/or entirely excluded before being processed by the model. This control aims to limit the exposure of sensitive organizational knowledge when creating embeddings that feed into the vector store, thus reducing the likelihood of confidential information being accessible or manipulated.

This control emphasizes a preventive focus on the entry of sensitive information into the vector store, which minimizes the attack surface and significantly reduces the risk of unauthorized access or poisoning. This is especially important given the immaturity of vector store technologies.
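A minimal sketch of such a filter is shown below, using regular expressions to redact obvious identifiers before text is embedded; the patterns are illustrative only, and real deployments would typically combine this with a dedicated PII-detection service.

    import re

    REDACTION_PATTERNS = {
        "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
        "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    }

    def redact(text: str) -> str:
        # Replace matches with a typed placeholder so downstream embeddings
        # never contain the raw identifier.
        for label, pattern in REDACTION_PATTERNS.items():
            text = pattern.sub(f"[{label} REDACTED]", text)
        return text

    page = "Contact jane.doe@example.com or +44 20 7946 0958 for access."
    print(redact(page))  # identifiers removed before the embedding step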

CT-3 - User/app/model firewalling/filtering

Document Status
Draft
Control Type
Preventative
Mitigates

As in any information system component, you can monitor and filter interactions between the model, inputs from the user, queries to RAG databases or other sources of information, and outputs.

A simple analogy is a Web Application Firewall that detects URLs trying to exploit known vulnerabilities in specific server versions, and detects and filters out malicious JavaScript code embedded in the returned web page.

It is not only user input and LLM output that should be considered. To populate the RAG database with custom information, the system has to use the same tokenizer and the exact same embeddings format (converting words to vectors) that the LLM uses. This means that, for a SaaS LLM where the tokenizer is available as an endpoint, you have to send the full information to the embeddings endpoint to convert it to vectors, and store the returned data in your RAG database.

Things to monitor/filter

Monitoring would make it possible to detect and block undesired behavior, such as:

A tool like this could be combined with detection of queries or responses that exceed a certain size, as described in CT-8 - QoS/Firewall/DDoS prevention; oversized traffic could be part of a denial of wallet attack, or an attempt to destabilize the LLM so that it reveals private information contained in the training data.

Ideally, not only user and LLM interactions but all interactions between components of the AI system should be monitored and logged, and should include a safety blocking mechanism. When deciding where to focus, filtering should happen where information crosses trust boundaries.

Challenges

RAG database

It’s more practical to pre-process the data for RAG before sending it to the embeddings endpoint, as offline processing can run more complex filtering over a longer period. But you could also add some in-line filters.

When data is stored as embeddings, a vector search is performed by converting the user query into vectors using the embeddings endpoint, which may be external to you. That means the information in the vector database is more or less opaque: you can’t easily analyze it, add filters on the way in or out, or specify different access levels for the stored information.

Filtering

Applying static filters is easy for known computer-related patterns, such as emails or domains, and for well-known terms like company names. Picking up more generic things, such as names of people that may be common knowledge, or private information, is not. The same applies to detecting prompt injections, which by their very nature are designed to bypass security, or foul language. That is why a popular technique is using an “LLM as a judge”, which we describe later in this document.

Streaming

When users send interactive requests to an LLM, streaming allows them to progressively receive the result word by word. This is a great improvement to usability, as users can see progress, read the answer while it’s being generated, and act before it’s complete, just taking the first part if it’s all they need, or canceling the generation if it’s not in the right direction.

But when any kind of filtering has to be applied, you need to disable streaming, as you need the full answer in order to process it, or risk exposing the information to the user before you know it’s safe. For a complex, slow system that takes a long time to answer, that has a huge impact on usability.

An alternative approach can be to start providing a streamed answer to the user while detection is done on the fly, and then cancel the answer and remove all shown text when a problem is detected. But the risk of exposing the wrong information to the user has to be weighed against the criticality of the information and the final user who receives it.

Remediation techniques

As mentioned, you could simply implement static checks using blocklists and regular expressions, but these only detect the simplest situations.
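For illustration, a static pre-filter of this kind might look like the following sketch; the blocked terms and patterns are placeholders and would need to reflect the organization's own data classification.

    import re

    BLOCKLIST = ["Project Falcon", "internal use only"]   # illustrative terms
    PATTERNS = [re.compile(r"\b\d{16}\b")]                # e.g. card-like numbers

    def violates_policy(text: str) -> bool:
        # True when the text matches a blocked term or a sensitive pattern.
        lowered = text.lower()
        if any(term.lower() in lowered for term in BLOCKLIST):
            return True
        return any(p.search(text) for p in PATTERNS)

    user_prompt = "Summarise the Project Falcon roadmap."
    if violates_policy(user_prompt):
        print("Request blocked by static filter")  # or route to review / LLM judge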

Adding a system prompt with instructions about what to block is not recommended: it is easy not only to bypass it, but also to expose the prompt and see exactly what logic governs what the system shouldn’t show.

A common technique is LLM-as-a-judge, where a secondary LLM analyzes the query and the response; it is trained not to give answers to users, but to categorize different situations (prompt injection, abuse, foul language). This can be consumed as a SaaS product, or run as a local instance with a non-trivial computational cost, since for each query (regardless of whether the main LLM is SaaS) you would also run an LLM evaluation.

For private information disclosure in critical situations, you may want to train your own LLM judge to categorize that information in a way bespoke to your organization.

Having humans voluntarily provide feedback on responses in production, with an easy way to flag when the system exhibits any of the behaviors described, is a complementary control that helps verify these guardrails are working as expected.

Additional considerations

A full API monitoring solution provides observability and security benefits not only for AI security, but for general system security. For example, a security proxy setup could also ensure all communications between different components are encrypted using TLS.

Logging allows you not only to better understand how users and the system are behaving, but also to detect situations that can only be understood by looking at data at a statistical level, for example a coordinated denial of service against the AI system.

CT-4 - System observability

Document Status
Draft
Control Type
Detective
Mitigates

What to log/monitor

When talking about observability, these are the main things to log and monitor:

It is always good to log and monitor everything, or as much as possible. But when there are limitations due to the bandwidth of the ingested data, consider at least the sources above. Every application using the models should be included.

Consider employing a solution for horizontal monitoring, across several inputs/outputs at the same time for the whole architecture.
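As a small illustration of what a single structured log record might capture per interaction, consider the sketch below; the field names are illustrative, and raw prompts or responses may need redaction before logging, depending on data classification.

    import json
    import logging
    import time

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("rag-observability")

    def log_interaction(user_id: str, prompt: str, response: str,
                        latency_ms: float, model: str) -> None:
        # Structured, machine-parseable record per interaction.
        logger.info(json.dumps({
            "ts": time.time(),
            "user": user_id,
            "model": model,
            "prompt_chars": len(prompt),
            "response_chars": len(response),
            "latency_ms": latency_ms,
        }))

    log_interaction("u123", "What is the expenses policy?", "The policy says ...",
                    840.0, "gpt-4o-2024-08-06")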

Why

The following reasons explain why we want to log and monitor, and what threats we tackle or how we benefit from it:

CT-5 - System acceptance testing

Document Status
Draft
Control Type
Preventative
Mitigates

System Acceptance Testing is the final phase of the software testing process where the complete system is tested against the specified requirements to ensure it meets the criteria for deployment. For non-AI systems, this will involve creating a number of test cases which are executed, with an expectation that when all tests pass the system is guaranteed to meet its requirements.

With LLM applications System Acceptance Testing has a similar form, where the complete system is tested via a set of defined test cases. However, there are a couple of notable differences when compared to non-AI systems:

  1. LLM-based applications exhibit variability in their output, where the same response could be phrased differently despite exactly the same preconditions. The acceptance criteria need to accommodate this variability, using techniques to validate that a given response contains (or excludes) certain information, rather than requiring an exact match.
  2. For non-AI systems the goal is often to achieve a 100% pass rate for test cases, whereas for LLM-based applications it is likely that a lower pass rate is acceptable. The overall quality of the system is considered a sliding scale rather than a fixed bar.

For example, a test harness for a RAG-based chat application would likely require a test data store which contains known ‘facts’. The test suite would comprise a number of test cases covering a wide variety of questions and responses, where the test framework asserts the factual accuracy of the response from the system under test. The suite should also include test cases that explore the various failure modes of this system, exploring bias, prompt injection, hallucination and more.
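A minimal sketch of one such test case, written with pytest, is shown below; it asserts that the response contains known facts rather than matching an exact string, and ask_rag_system is a placeholder for the application under test.

    # test_acceptance.py -- run with `pytest`
    def ask_rag_system(question: str) -> str:
        # Placeholder for the real system under test (e.g. an HTTP call to the app).
        return "Our clean desk policy requires desks to be cleared of confidential material."

    def test_clean_desk_policy_answer_contains_known_facts():
        answer = ask_rag_system("What does the clean desk policy require?").lower()
        # Assert the presence of key facts instead of an exact string match,
        # to accommodate variability in phrasing.
        assert "clean desk" in answer
        assert "confidential" in answer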

System Acceptance Testing is a highly effective control for understanding the overall quality of an LLM-based application. While the system is under development it quantifies quality, allowing for more effective and efficient development. And when the system becomes ready for production it allows risks to be quantified.

CT-6 - Data quality & classification/sensitivity

Document Status
Pre-Draft
Control Type
Preventative
Mitigates

CT-7 - Legal/contractual agreements

Document Status
Pre-Draft
Control Type
Preventative
Mitigates

CT-8 - QoS/Firewall/DDoS prev

Document Status
Pre-Draft
Control Type
Preventative
Mitigates

LLM endpoints may be abused due to:

  1. attacks to exfiltrate data
  2. jobs coded without the understanding that GPUs are a finite resource

Controls should be in place to ensure that “noisy neighbors” do not interfere with the availability of critical systems.

API Gateways and Keys

LLM endpoints should require authentication to ensure that only approved use cases can access the LLM. A common approach is to deploy an API gateway and generate API keys specific to each use case. The assignment of keys allows:

  1. revocation of keys on a per use case basis to block misbehaving applications
  2. attribution of cost at the use case level to ensure shared infrastructure receives necessary funding and to allow ROI to be measured
  3. prioritizing access to LLM requests when capacity is saturated and SLAs across all consumers cannot be satisfied
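As a rough illustration of the gateway behaviour described above, the following sketch checks an API key per use case and applies a simple per-key request budget; the keys, limits, and use-case names are illustrative, and a production deployment would normally rely on an off-the-shelf API gateway.

    import time
    from collections import defaultdict

    API_KEYS = {"key-hr-bot": "hr-assistant", "key-research": "research-summariser"}  # illustrative
    REQUESTS_PER_MINUTE = 60

    _request_log: dict[str, list[float]] = defaultdict(list)

    def authorise(api_key: str) -> str:
        # Reject unknown/revoked keys and enforce a per-key rate limit.
        if api_key not in API_KEYS:
            raise PermissionError("Unknown or revoked API key")
        now = time.time()
        window = [t for t in _request_log[api_key] if now - t < 60]
        if len(window) >= REQUESTS_PER_MINUTE:
            raise RuntimeError("Rate limit exceeded for this use case")
        window.append(now)
        _request_log[api_key] = window
        return API_KEYS[api_key]  # use case name, e.g. for cost attribution

    print(authorise("key-hr-bot"))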

Modeling and Monitoring

Prioritizing access requires understanding:

  1. expected utilization at various times of day
  2. how to contact the owners when SLAs cannot be met

Systems should be in place to:

Further reading

CT-9 - Alerting / DoW spend alert

Document Status
Pre-Draft
Control Type
Detective
Mitigates

CT-10 - Version (pinning) of the foundational model

Document Status
Draft
Control Type
Preventative
Mitigates

Supplier Controls:

Ensure the supplier is contractually obligated to provide enough of the below practices to allow for the development of an upgrade strategy:

Organization Controls:

Organizations using LLMs via APIs may also implement controls for effective versioning:
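For example, one practical control is to pin an explicit, dated model identifier in every request and log the version actually served; the sketch below assumes an OpenAI-compatible chat completions API, and the version string shown is illustrative.

    from openai import OpenAI  # assumes an OpenAI-compatible SDK

    # Pin a dated model snapshot in configuration rather than relying on a
    # floating alias; the identifier below is illustrative.
    PINNED_MODEL = "gpt-4o-2024-08-06"

    client = OpenAI()
    response = client.chat.completions.create(
        model=PINNED_MODEL,
        messages=[{"role": "user", "content": "What is our travel policy?"}],
    )
    # Record the model/version actually used so behaviour changes can be traced.
    print(response.model, response.choices[0].message.content)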

CT-11 - Human feedback loop

Document Status
Pre-Draft
Control Type
Detective
Mitigates

Human Feedback Loop

Implementing a human feedback loop is crucial for the effective deployment and continuous improvement of Generative AI solutions. Without appropriate feedback mechanisms, the solution may not perform optimally, and opportunities for enhancement over time could be missed. The following considerations should be kept in mind when designing a feedback loop:

  1. Alignment with Key Performance Indicators (KPIs)
  2. Intended Use of Feedback Data: Solution developers should clearly define and document how the feedback data will be utilized:

Defining these goals ensures the long-term viability and continuous improvement of the solution.

  3. User Experience
  4. Wide vs. Narrow Feedback

Types of Feedback Mechanisms

As stated earlier, there are several ways to collect feedback data. The choice you make will ultimately depend on the intended use case for the feedback data. The two major categories of feedback are:

  1. Quantitative Feedback
  2. Qualitative Feedback

Reinforcement Learning from Human Feedback (RLHF)

It is important to briefly discuss RLHF given its importance and prevalence in the GenAI space, but first, you might be asking: What is RLHF?

Reinforcement Learning from Human Feedback (RLHF) is a machine learning technique that incorporates human evaluations to optimize models for more efficient self-learning. Traditional Reinforcement Learning (RL) trains software agents to make decisions that maximize rewards, improving accuracy over time. RLHF integrates human feedback into the reward function, enabling the model to perform tasks more aligned with human goals, preferences, and ethical considerations.

How RLHF Works

  1. State Space and Action Space
  2. Reward Function
  3. Policy Optimization

Implement RLHF in the Feedback Loop

  1. Data Collection
  2. Model Training
  3. Monitoring and Evaluation

Benefits of RLHF

RLHF helps mitigate risks associated with unaligned AI behavior, such as generating harmful or biased content. By incorporating human judgments, the model becomes better at avoiding inappropriate or unsafe outputs. Models trained with RLHF are more likely to produce outputs that are ethical, fair, and respectful of social norms. RLHF allows for ongoing refinement of the model as it interacts with users and receives more feedback, leading to sustained performance improvements.

Integration with LLM-as-a-Judge (CT-15)

Another aspect of human feedback to consider relates to the growing field of LLM-as-a-Judge (see: CT-15). When using LLM-as-a-Judge, one should collect the same set of quantitative and qualitative feedback from the LLM as one would from humans. This provides an opportunity to compare feedback between machine and humans, and allows the effectiveness of using LLM-as-a-Judge to be rated. One should also consider narrow feedback on at least a sample of the LLM-as-a-Judge results. The sample size (in percentage terms) and the methodology are use-case dependent; for instance, if the use case is business-critical, then a higher number of samples needs to be verified.

Wrap Up

A well-designed human feedback loop, enhanced with techniques like RLHF, is essential for the success of Generative AI solutions. By aligning the feedback mechanism with KPIs, clearly defining the intended use of feedback data, and ensuring a positive user experience, organizations can significantly improve the performance, safety, and reliability of their AI models. Incorporating RLHF, for instance, not only refines the model’s outputs but also ensures that the AI system remains aligned with human values and organizational objectives over time.

CT-12 - Role-based data access

Document Status
Pre-Draft
Control Type
Preventative
Mitigates

CT-13 - Provide citations

Document Status
Pre-Draft
Control Type
Detective
Mitigates

CT-14 - Encrypt data at rest

Document Status
Draft
Control Type
Preventative
Mitigates

Encrypting data at rest involves converting stored information into a secure format using encryption algorithms, making it inaccessible without the proper decryption key. This process protects sensitive data from unauthorized access or breaches, even if the storage medium is compromised. It is considered standard practice, with many tools and organizations turning this feature on by default across their IT estate and third-party tools.

Despite this being standard practice, it is still a notable control within the context of LLM applications. New technologies and techniques move at pace, driven initially by features. Sometimes suitable security measures are lacking. More specifically, vector data stores, which are a relatively recent area of research and development, may lack this feature.

Potential Tools and Approaches

This can be done in many ways, for example as part of an existing cloud infrastructure utilising tools like Azure’s AI Search vector store or AWS OpenSearch; both provide an extensive range of services well suited to the implementation of a RAG system.

It is also possible to turn this from an issue of managing secure storage at rest into one of making a secure API call, by utilising a service such as Pinecone, which provides high-level security and authentication services alongside serverless, low-latency hosting.

In a case where you don’t want to introduce a cloud infrastructure service or use an API-based system, you could use a self-hosted system, such as FTP, with a service such as Redis to create the vector database for your RAG system to access. In such a setup it is important to make sure that your private server is configured securely, e.g. using adequate authentication through SSH keys. When using a larger cloud service, you are able to delegate this responsibility.

Another available tool is FAISS, which provides in-memory storage and access to various algorithms for efficient similarity searches across your vectors. Although FAISS runs in memory and is limited by the system’s available memory, it also offers persistence and export options. If FAISS is used in in-memory mode only, without persistence, the data breach risk is reduced significantly.
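Where a chosen store does not encrypt at rest by default, one pragmatic pattern is to encrypt the serialized index before writing it to disk. The sketch below combines FAISS with the cryptography library's Fernet primitive; it is a simplified illustration and omits real key management (rotation, KMS/HSM storage).

    import faiss
    import numpy as np
    from cryptography.fernet import Fernet

    # Build a small illustrative index.
    vectors = np.random.rand(100, 384).astype("float32")
    index = faiss.IndexFlatL2(384)
    index.add(vectors)

    # Serialize and encrypt before persisting; key management is out of scope
    # for this sketch and must be handled by the organization.
    key = Fernet.generate_key()
    ciphertext = Fernet(key).encrypt(faiss.serialize_index(index).tobytes())
    with open("index.enc", "wb") as f:
        f.write(ciphertext)

    # Decrypt and restore when the index is needed.
    with open("index.enc", "rb") as f:
        restored = faiss.deserialize_index(
            np.frombuffer(Fernet(key).decrypt(f.read()), dtype="uint8"))
    print(restored.ntotal)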

CT-15 - LLM-as-a-Judge

Document Status
Draft
Control Type
Detective
Mitigates

Testing (evaluating model responses against a set of test cases) and monitoring (continuous evaluation in production) are vital elements in the development and continued deployment of an LLM system; they ensure that your system is functioning properly and that your changes to the system bring positive improvements, among other benefits.

As this is such an important and large subject, there is a wide range of approaches and tools available, one of which is the use of LLMs-as-a-Judge: the use of an LLM to evaluate the quality of a response generated by another LLM. This has become a popular area of research due to the expensive nature of human evaluation and the improved ability of LLMs since the advent of GPT-4.

For example, in our RAG use case, you may present an LLM evaluator with a text input explaining the company’s policy on employees’ responsibilities with regard to document control, followed by a sentence stating: “Employees must follow a clean desk policy and ensure they have no confidential information present and visible on their desk when they are not present there”. You would then ask the evaluator whether this statement is true, and for an explanation of why it is or is not, given the article explaining employee responsibilities.
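A minimal sketch of how such an evaluator could be prompted is shown below, assuming an OpenAI-compatible API; the judge model name, prompt wording, and passage are illustrative.

    from openai import OpenAI  # assumes an OpenAI-compatible SDK

    client = OpenAI()

    JUDGE_PROMPT = """You are an evaluator. Given a source passage and a claim,
    answer with 'SUPPORTED' or 'NOT SUPPORTED', followed by a one-sentence reason.

    Passage: {passage}
    Claim: {claim}
    Verdict:"""

    def judge(passage: str, claim: str) -> str:
        # Ask a secondary model to grade the claim against the passage.
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative judge model
            messages=[{"role": "user",
                       "content": JUDGE_PROMPT.format(passage=passage, claim=claim)}],
            temperature=0,
        )
        return response.choices[0].message.content

    print(judge(
        "Employees must clear confidential material from their desks when leaving.",
        "Employees must follow a clean desk policy.",
    ))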

The effectiveness of an evaluator can be measured either using classification or correlation metrics, with the latter being more difficult to use than the former. Examples of classification metrics include: Accuracy - measures the proportion of outputs which are correct; Precision - measures what proportion of the retrieved information is relevant information; Recall - measures the proportion of relevant information out of all the relevant information in the corpus. An explanation and evaluation of correlation metrics can be found here.

A range of research has found that LLMs can be used as effective evaluators of the outputs of other LLMs. Some work even shows that while a fine-tuned in-domain model can achieve higher accuracy on specific in-domain tests, generalised LLMs can be more “generalised and fair”, meaning that depending on the specific use case it may be less effective to create a bespoke evaluator. The literature has shown a range of effective approaches to this evaluation, and given how recent the approach is, it is probable that more effective approaches will be found, and that existing approaches will improve as state-of-the-art models improve. Given all this, it is highly recommended to introduce an LLM-based evaluator into a testing procedure, but it is important to have human oversight of this evaluation to verify its results; as with all things stemming from LLMs, there is plenty of scope for error. This can mean not just taking evaluation scores at face value, but looking into the confusion matrix and understanding what the evaluation is telling you: this is a tool to make it easier to find potential issues, not something you can set and then no longer worry about your system.

Potential Tools and Approaches

CT-16 - Preserving access controls in the ingested data

Document Status
Draft
Control Type
Detective
Mitigates

When ingesting data to be queried using a RAG architecture, the source may have defined different access-level boundaries that are lost in the destination vector storage.

The vector storage may support its own access controls; you should verify that these replicate those of the original source data.
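One common pattern is to copy the source system's access metadata onto each ingested chunk and filter retrieval results against the caller's entitlements before anything reaches the prompt; the sketch below shows the idea with an in-memory structure, and the group names are illustrative.

    from dataclasses import dataclass

    @dataclass
    class Chunk:
        text: str
        allowed_groups: set[str]  # copied from the source system's ACLs at ingest time

    def filter_by_entitlement(results: list[Chunk], user_groups: set[str]) -> list[Chunk]:
        # Drop any retrieved chunk the caller is not entitled to see before it
        # reaches the prompt; this mirrors the source system's access controls.
        return [c for c in results if c.allowed_groups & user_groups]

    retrieved = [
        Chunk("Public holiday calendar", {"all-staff"}),
        Chunk("M&A pipeline notes", {"corp-finance"}),
    ]
    print([c.text for c in filter_by_entitlement(retrieved, {"all-staff"})])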

Alternatively, source data can be segregated into different replicated RAG architectures corresponding to different access-level boundaries. Users of a specific domain can then only be permissioned to access the RAG instance that holds the data they should have access to. If there are many different access levels and resource consumption is a concern, consider consolidating them into a few flat levels that still honour the original security restrictions.

System prompt access control techniques are available, whereby a system prompt is designed to build access controls into the RAG. This is inefficient and has been proven to be easily bypassed, so it should be avoided as a mechanism for implementing access controls in the RAG architecture.

In trying to preserve access controls in the ingested data, the corporate access controls should be replicated.