Das LLM: Model Ownership
I want to discuss why I prefer owning the means of my token production rather than relying on cheaper, easier and sometimes faster proprietary models. I will start by outlining the inevitable product-model coupling to show why model ownership is an important consideration. Then I will outline the benefits of controlling your own models, and discuss the often persuasive counterpoints that push technical leads down the proprietary-model route. Hopefully this forms a cohesive argument for investing time and money in onshoring your model API calls from external providers into your own environments.
We have nothing to lose except our chains! (and a reasonable amount of time, money and human resources).
Product - Model Coupling
AI products become coupled to their underlying (v)LLM models as a standard part of taking a tool to production. During MVP, frameworks like https://www.langchain.com/ make swapping models trivial, and experimenting with a number of model providers is good practice. Once a model is chosen and the tool matures, changes to the underlying model become more difficult.
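To illustrate that plug-and-change promise, here is a minimal sketch in plain Python of the provider-agnostic interface that frameworks like LangChain offer at the MVP stage. The `OpenAIStub` and `AnthropicStub` classes are hypothetical stand-ins for real SDK clients, not actual library code:

```python
from typing import Protocol


class ChatModel(Protocol):
    """Minimal provider-agnostic chat interface, in the spirit of LangChain."""

    def invoke(self, prompt: str) -> str: ...


class OpenAIStub:
    """Hypothetical stand-in for a real OpenAI SDK client."""

    def invoke(self, prompt: str) -> str:
        return f"[gpt] {prompt}"


class AnthropicStub:
    """Hypothetical stand-in for a real Anthropic SDK client."""

    def invoke(self, prompt: str) -> str:
        return f"[claude] {prompt}"


def summarise(model: ChatModel, text: str) -> str:
    # Application code depends only on the interface, so at MVP stage
    # swapping providers is a one-argument change.
    return model.invoke(f"Summarise: {text}")


print(summarise(OpenAIStub(), "quarterly report"))    # [gpt] Summarise: quarterly report
print(summarise(AnthropicStub(), "quarterly report"))  # [claude] Summarise: quarterly report
```

The trouble described below begins when `summarise` accumulates provider-specific parsing, retries and feature flags, and the clean interface stops being the only dependency.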
When I recently tried to move an application from GPT-4o to Claude 3.7 to bring it under the AWS umbrella, the returned tokens were incompatible with the rest of the application codebase and cost developer time to debug. Ultimately the switch was scrapped, meaning a dependency external to the application deployment environment must be maintained.
There are also specific features available in some models that are not supported by default by all providers. Obvious examples include structured outputs and early-stage MCP support. As the AI product moves towards production and the codebase surrounding the API calls becomes more complex, the product becomes more tightly coupled to the model.
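To make the coupling concrete, here is a sketch of the kind of defensive parsing that tends to accumulate around one provider's structured-output behaviour. The quirk handled here, a model prefixing chatty text before its JSON, is an assumption about a hypothetical model, not any specific provider's documented behaviour:

```python
import json
import re


def parse_structured_reply(raw: str) -> dict:
    """Pull a JSON object out of a model reply.

    Assumes a hypothetical model that sometimes prefixes chatty text
    before its JSON-mode output -- exactly the kind of workaround that
    quietly couples a codebase to one model's behaviour.
    """
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model reply")
    return json.loads(match.group(0))


print(parse_structured_reply('{"sentiment": "positive", "score": 0.9}'))
# {'sentiment': 'positive', 'score': 0.9}
print(parse_structured_reply('Sure! Here it is: {"sentiment": "positive", "score": 0.9}'))
# {'sentiment': 'positive', 'score': 0.9}
```

Swap the model and these small workarounds stop matching the new model's habits, which is where the debugging time goes.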
On top of the technical coupling, the most valuable use-cases for LLMs are often in highly regulated environments that require layers of bureaucracy and approvals before models are signed off for production. Moving from, say, OpenAI to Grok can trigger a full repeat of that process, disincentivising developers from chasing minor performance gains. This is more tedious in some sectors than others: with health data, for example, rigorous bias analysis often needs to be replicated, adding long time frames to delivery.
As an AI product matures it becomes more tightly coupled to the underlying model, taking us further away from the plug-and-change utopia that LLM frameworks appear to offer. Ceding control of the most important element of a codebase to an external party is a high-risk strategy, especially when technical and bureaucratic barriers make changing that dependency hard work.
Owning our Own
Why should we bother owning our models? Does it really matter that we offshore the most valuable component of our AI product to a provider we have no control over?
I propose that yes, this is important, for three reasons:
- Control the means of your token production
Allowing an external provider to own and sell your token processing means you have no ability to prevent them from ceasing to serve certain models or their configurations.
Providers are businesses, and they sell products they can make money on. The AWS Bedrock Claude models are offered on demand while they are popular enough for AWS to dedicate the hardware to them. Once they are no longer popular enough to be worth running, the models will be deprecated and replaced, creating performance risk for tools in production. Depending on the use-case this could be extremely harmful to an organisation, its customers, or both.
Providers also charge a cost per token for use of their models. Suppose an energy cost spike caused by an oil embargo hits the deployed region; it is not impossible to imagine that cost being passed on to customers. Expand this to a national-security scenario where providers are no longer legally allowed to serve the API to certain regions. What happens then?
Owning the means of your token production by hosting your own models nullifies both the real and the imaginary provider risks. Azure can bump unprofitable models, and states can make it illegal to process your data, but I’d like to see them try to take down a hosted Llama fine-tune, with local backups, running on your own NVIDIA H100s.
Owning your own models means that you can engage in some free-market capitalism to source GPUs from a range of competitive options. This leads me to my next point…
- Portability
Open models like Meta’s Llama, Alibaba’s Qwen and Google’s Gemma are widely accessible and can be run from almost anywhere. These models can be tuned and deployed to any cloud environment or any on-premises server your heart (or data governance structures) desires. Should an issue arise that threatens your existing deployments, no worries: there is a wealth of online and offline options for re-deploying in an instant.
This control over where your model is deployed means you can meet almost any regulatory requirement.
Finally, this brings me on to…
- Model Customisation
There are almost 2,000,000 models on Hugging Face alone to select from. Many are fine-tunes of existing models for specific use-cases, and there is evidence that these models can outperform a general, non-fine-tuned proprietary model, provided you own good data to do the tuning with.
OpenAI and Anthropic offer the ability to fine-tune models too, but they won’t provide you with the files to deploy them wherever you like. If you want the abilities of a fine-tuned model without being forced onto particular infrastructure, open models are the obvious choice.
If you exist in a space where model tuning must be completed in certain regions, beware: AWS Bedrock, for example, will only let you tune in the US. Owning the model allows you to fine-tune wherever you can get your hands on the required GPU capacity.
In Defence of Proprietary Models
So why do so many choose proprietary models instead of owning their own? I think there are three key reasons we fall back on GPT/Claude/Gemini as tool foundations, and they are hard to overlook, especially for MVPs.
- Up-front cost
Getting started with any of the large providers is extremely straightforward. With a tiny budget you can generate an API key and start running queries for pennies, developing an MVP in hours or days. Because most initial versions of tools are about demonstrating that something works on a small sample of pseudo-production data, the API costs are small and barely need to be accounted for. The large providers are obvious choices for getting something off the ground to demonstrate value to users, because of how easily they can be spun up.
As with most production systems, though, the end use-case is unlikely to stay on small datasets. Input data will rise dramatically in production, and API costs will grow linearly with it.
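A back-of-envelope model makes that scaling visible. The request volumes and the price per million tokens below are illustrative assumptions, not any provider's real pricing:

```python
def monthly_api_cost(requests_per_day: int,
                     tokens_per_request: int,
                     price_per_million_tokens: float) -> float:
    """Rough monthly API bill: tokens consumed scaled by a per-token price.

    All numbers fed into this are illustrative assumptions.
    """
    tokens_per_month = requests_per_day * tokens_per_request * 30
    return tokens_per_month / 1_000_000 * price_per_million_tokens


# MVP: 100 requests/day on a small sample looks negligible...
print(monthly_api_cost(100, 2_000, 10.0))      # 60.0 (dollars/month)

# ...production: 1,000x the traffic means 1,000x the bill, linearly.
print(monthly_api_cost(100_000, 2_000, 10.0))  # 60000.0
```

Owned infrastructure has a very different shape: a large fixed cost for GPUs that barely moves as token volume grows, which is exactly why the comparison flips at production scale.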
- Infrastructure Effort
Proprietary models come behind user-friendly APIs that generally require just a key, and off you go! The provider handles everything else; your only focus needs to be on engineering your prompt to reliably get the outputs you need.
Infrastructure can be a hassle to maintain. It requires a combination of thought, process and delivery to run yourself. These problems are exacerbated if you own the infrastructure outright rather than offshoring it to a cloud provider. Containers, connections and GPUs all require maintenance by somebody.
Advocating for cloud infrastructure for a moment, a big bonus of these providers is that they do this management on our behalf. If a GPU overheats or goes offline, you are not responsible for physically fixing it - SLAs are agreed and engineering experts handle these problems instead.
However, if the model is tied to the infrastructure, as with AWS Bedrock’s Claude models or Azure’s OpenAI Foundry, then when infrastructure costs spike there is not much that can be done to move the model elsewhere. Owning your own models means they can be bumped to whichever GPU provider you fancy, or even to on-premises solutions.
- Risk Management
Proprietary models are generally well suited to almost any generic task. They have been trained on a large corpus of pre- and post-training data to work well for most coding, data-mining and agentic tasks. Despite what Meta may say, they can outperform most open models, and you can be confident they will work out of the box.
They are inherently “lower risk” because of this.
Model distillation techniques, as well as my own experience, have demonstrated that smaller, cheaper open models can be trained to perform at similar levels given the required labelled training data. If your aim is simply to match GPT’s performance, get GPT to generate the training data and see similar performance metrics on your own models!
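That distillation loop can be sketched as follows. The teacher here is a stub standing in for an API call to the large proprietary model, and the JSONL prompt/completion format is one common convention for fine-tuning data, not the only one:

```python
import json
from typing import Callable


def build_distillation_set(inputs: list[str],
                           teacher: Callable[[str], str]) -> list[str]:
    """Label raw inputs with a teacher model to create fine-tuning data
    for a smaller open model, as JSONL prompt/completion records."""
    records = []
    for text in inputs:
        records.append(json.dumps({
            "prompt": text,
            # In real use this would be an API call to the teacher model.
            "completion": teacher(text),
        }))
    return records


def toy_teacher(text: str) -> str:
    """Stub teacher; a real one would query the proprietary model's API."""
    return text.upper()


for line in build_distillation_set(["label me", "and me too"], toy_teacher):
    print(line)
```

Once the labelled set exists, the proprietary model has done its job: the fine-tuned open model is yours to deploy wherever you like.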
Conclusion
Owning your own is clearly not as easy as using the large providers, who rely on that fact to embed their products deep within value-generating tools. They do not want you to look for alternatives; they want you to believe that their general-purpose, on-demand solutions are the only option for your AI needs.
However, this is not the case. We can reduce the external dependencies we have little or no control over, and embed highly portable, endlessly customisable, personally owned models to service our use cases.
There will always be a need for cheap, quick solutions to demonstrate tool value; however, when these tools go into production, the risks of relying on external providers must be taken into account.
What happens if the cost per token spikes? What happens if the provider turns your model off, or the model is no longer accessible?
Considering owned models means considering solutions that are cheaper and lower risk in the long term, and that ultimately lead to better value generation by our own hands.