Did you know that ChatGPT and OpenAI's other models experience "behaviour drift"? If you're not aware of this potential problem, it can become a real pain for your project.

Why does drift occur? OpenAI's models are black boxes, and OpenAI is constantly changing how they work under the hood. Behaviour drift occurs when OpenAI makes updates or architectural changes to its base models without telling us exactly what changed.

This means that if you build an LLM application today with OpenAI's API as the endpoint, in a couple of months it may not perform the same as it did when you first ran it.
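One way to catch this early is a small regression suite: record the model's answers to a fixed set of prompts today, then re-run them periodically and flag any that change. The sketch below illustrates the idea; `get_completion` is a hypothetical stub standing in for a real API call, so swap in your own client.

```python
# Minimal drift-check sketch. `get_completion` is a placeholder for a
# real LLM call (e.g. via the OpenAI client); here it returns canned
# answers so the example is self-contained and runnable.
def get_completion(prompt: str) -> str:
    canned = {"Classify the sentiment of: 'I love this!'": "positive"}
    return canned.get(prompt, "")

# "Golden" answers recorded when the application was first built.
GOLDEN = {"Classify the sentiment of: 'I love this!'": "positive"}

def check_for_drift() -> list:
    """Return the prompts whose output no longer matches the recorded answer."""
    drifted = []
    for prompt, expected in GOLDEN.items():
        if get_completion(prompt) != expected:
            drifted.append(prompt)
    return drifted

print(check_for_drift())  # an empty list means no drift was detected
```

In practice you would compare against a tolerance (embedding similarity, a grader model) rather than exact string equality, since temperature and sampling make outputs non-deterministic even without drift.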

You can combat model drift by fine-tuning your own open-source base model such as Llama 2, checkpointing it, and deploying the checkpointed model to the cloud. That way you have total control over your model's behaviour, and you can even keep version control over the series of models you develop over time, which is essential for proper testing and behaviour tracking.
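A lightweight way to get that version control is to fingerprint each frozen checkpoint and record it under a tag, so any observed behaviour can be traced back to exact weights. This is a minimal sketch using only the standard library; the file name `llama2-finetuned-v1.bin` is a made-up stand-in for real checkpoint weights.

```python
import hashlib
import json
from pathlib import Path

def fingerprint_checkpoint(path: Path) -> str:
    """SHA-256 of the checkpoint bytes: a stable identity for a frozen model."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def register_version(registry: dict, tag: str, path: Path) -> dict:
    """Record a tagged checkpoint so behaviour can be traced to a version."""
    registry[tag] = {"file": path.name, "sha256": fingerprint_checkpoint(path)}
    return registry

# Demo with a dummy file standing in for real fine-tuned Llama 2 weights.
ckpt = Path("llama2-finetuned-v1.bin")
ckpt.write_bytes(b"dummy weights")
registry = register_version({}, "v1", ckpt)
print(json.dumps(registry, indent=2))
```

At deployment time, re-hash the weights you are serving and check them against the registry entry: if the digests match, you know exactly which model version is answering, regardless of any changes upstream.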

Evidence of the GPT models' shifting performance is provided in this linked article; the end of the article, in particular, discusses behaviour drift.