Inadequate or unpublished API versioning and model version control may result in response instability: the foundation model can change without the client having any opportunity to benchmark and test the new version. The resulting instability in the model's behavior may affect the client's business operations.
Challenges with versioning
There are unique challenges to versioning large language models (LLMs) and, consequently, the APIs that provide LLM capabilities. Unlike traditional software, where versioning tracks changes in source code, versioning LLMs must account for a range of factors, such as model behavior, training data, architecture, and computational requirements. Some key challenges include:
- Model size and complexity:
- LLMs are incredibly large, comprising an enormous number of parameters. Managing and tracking changes, and their impacts, across such massive models can be very complex. It is also challenging to quantify or summarize changes in a meaningful way.
- Dynamic nature of LLMs:
- Some LLMs are designed to learn and adapt over time, while others are updated through fine-tuning and customization. This makes it difficult to track changes or define discrete versions, as the model is constantly being updated.
- Non-deterministic behavior:
- LLMs can produce different outputs for the same input due to factors like temperature settings, making it difficult to distinguish a genuine "new version" from ordinary sampling variance.
- Multidimensional changes:
- Updates to LLMs might involve changes to the model architecture, training data, fine-tuning process, and inference parameters. Capturing all these dimensions in a single version number or identifier is challenging (a manifest that records each dimension, sketched after this list, is one approach).
- Changes to LLMs can range from minor tweaks (e.g., adjusting hyperparameters) to significant changes (e.g., retraining with new data), making it challenging to define the proper granularity of versioning.
- Training data versioning:
- LLMs are trained on massive amounts of data, making it difficult to track and manage changes in the training corpus.
- Resource management:
- Running multiple versions of an LLM simultaneously can strain computational resources and infrastructure.
- Lack of standardization:
- There is no widely accepted standard for versioning LLMs, leading to inconsistent practices across organizations.
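As an illustration of the multidimensional-changes point above, the following is a minimal sketch (all names are hypothetical, not a specific vendor's scheme) of a version manifest that records each dimension an LLM release can change along and derives a stable identifier from them, so that a change in any dimension produces a new version ID:

```python
from dataclasses import dataclass, asdict
import hashlib
import json


@dataclass(frozen=True)
class ModelManifest:
    """Records the dimensions an LLM release can change along."""
    model_name: str               # e.g. "acme-llm" (illustrative)
    architecture: str             # architecture revision
    training_data_snapshot: str   # identifier of the training corpus used
    fine_tune_id: str             # identifier of the fine-tuning run, if any
    inference_defaults: str       # serialized default inference parameters

    def version_id(self) -> str:
        # Hash all dimensions so any change yields a new identifier.
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()[:12]


manifest = ModelManifest(
    model_name="acme-llm",
    architecture="transformer-v2",
    training_data_snapshot="corpus-2024-03",
    fine_tune_id="ft-0007",
    inference_defaults=json.dumps({"temperature": 0.7, "top_p": 0.9}),
)
print(manifest.version_id())  # changes whenever any dimension changes
```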
Problems caused by inadequate versioning:
- Inconsistent output:
- An LLM may produce different responses to the same prompt, leading to inconsistent user experiences or decision-making
- Reproducibility/traceability:
- Inability to replicate or trace past outputs, which may be required in some business contexts or during testing and debugging (see the audit-log sketch at the end of this section)
- Performance variability:
- Unexpected changes in model performance, potentially introducing regressions in some areas (e.g., increased bias)
- Assessing improvements or regressions becomes challenging
- Compliance and auditing:
- Inability to track and explain model changes can lead to compliance problems and difficulty in auditing decisions
- Integration and compatibility/backward compatibility:
- Other systems or APIs may depend on specific behaviors of an LLM; an unannounced model change can silently break those integrations
- Testing and quality assurance:
- Difficulty in identifying the root cause of errors or bugs
- Inability to replicate issues or isolate the model changes causing them (see the regression-test sketch at the end of this section)
- Security and privacy:
- Difficulty in tracking security vulnerabilities or privacy issues to a specific model version
- New security or privacy issues may be introduced
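Several of the problems above (inconsistent output, reproducibility, backward compatibility) can be mitigated by pinning an explicit model version and logging enough request metadata to replay or audit any past call. The following is a minimal sketch assuming a hypothetical `client` object with a `complete` method; all identifiers are illustrative, not a specific vendor's API:

```python
import hashlib
import json
import time

PINNED_MODEL = "acme-llm-2024-03-15"  # a dated identifier, not a floating alias


def call_with_audit_log(client, prompt: str, log_path: str = "llm_audit.jsonl") -> str:
    # Fix every parameter that influences the output, so the call can be replayed.
    params = {"model": PINNED_MODEL, "temperature": 0.0, "seed": 42}
    response = client.complete(prompt=prompt, **params)  # hypothetical client method
    # Log hashes rather than raw text if prompts or outputs are sensitive.
    record = {
        "timestamp": time.time(),
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "params": params,
        "response_sha256": hashlib.sha256(response.encode()).hexdigest(),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return response
```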
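For the testing and quality-assurance problem, a golden-output regression suite can flag behavioral drift before a new model version is adopted. The sketch below assumes the same hypothetical `client` interface and deterministic settings; in practice a semantic comparison may be needed where exact matching is too strict:

```python
def run_suite(client, model: str, prompts: list[str]) -> dict[str, str]:
    """Replay each prompt deterministically against one model version."""
    return {
        p: client.complete(prompt=p, model=model, temperature=0.0, seed=42)
        for p in prompts
    }


def diff_versions(baseline: dict[str, str], candidate: dict[str, str]) -> list[str]:
    """Return the prompts whose output changed between versions."""
    return [p for p in baseline if baseline[p] != candidate.get(p)]


# Usage sketch: capture a baseline on the pinned version, then rerun the
# same suite on the candidate version and review every diff before migrating.
# baseline = run_suite(client, "acme-llm-2024-03-15", prompts)
# candidate = run_suite(client, "acme-llm-2024-06-01", prompts)
# for prompt in diff_versions(baseline, candidate):
#     print(f"output changed for: {prompt!r}")
```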