B2B AI will require self-hosted LLMs
Business-to-business AI solutions that handle sensitive or large volumes of business customer data will have to host their own LLM infrastructure, for both security and performance reasons.
Many B2B solutions handle not only the data of their direct business customer but potentially sensitive information belonging to that business's own customers. If your AI solution could place any of that information into the plaintext of an LLM context window at a third-party sub-processor like OpenAI, you could be taking on tremendous risk. Depending on your agreement with the third party, that data could be retained, trained on, or otherwise used internally to improve their models, and thus potentially leak to users of future models. Additionally, your customer data is fully exposed in the event of a data breach at the sub-processor.
Trusting a third-party sub-processor with your customer data is not new, but the data patterns with LLMs are. LLMs can require a LOT of text-based input in order to operate. This input is called the Context Window and can range from roughly 1,000 to 32,000 tokens, where a token is a word or part of a word. To get the output you want from an LLM on behalf of a customer, you may be curating a very long and detailed Context Window: a convenient, human-readable representation of data that may bring together historical and sensitive personally identifiable information about a particular person. A data leak of these Context Windows could be very damaging.
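To make that concrete, here is a minimal sketch of what assembling such a Context Window can look like. The customer record and prompt template are illustrative assumptions rather than any particular product's code; the point is how much readable PII ends up in the payload sent to the sub-processor, and how quickly the token count adds up.

```python
import tiktoken

# Illustrative customer record -- in a real B2B product this might be pulled
# from your database and include your customer's customers' data.
customer = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "account_history": "Opened 2019; disputed charge in 2021; premium tier.",
    "support_notes": "Prefers phone contact; mentioned upcoming surgery in last call.",
}

# The Context Window is plaintext: every field above is readable as-is by
# whoever stores, logs, or trains on this prompt.
prompt = (
    "You are a support assistant. Use the customer profile below to draft a reply.\n\n"
    f"Name: {customer['name']}\n"
    f"Email: {customer['email']}\n"
    f"History: {customer['account_history']}\n"
    f"Notes: {customer['support_notes']}\n\n"
    "Customer question: Why was my card charged twice?"
)

# Rough token count using OpenAI's tiktoken tokenizer (cl100k_base is the
# encoding used by the GPT-3.5/GPT-4 chat models).
encoding = tiktoken.get_encoding("cl100k_base")
print(f"Prompt is {len(encoding.encode(prompt))} tokens of plaintext customer data")
```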
An AI feature may require a long, multi-turn conversation with an LLM, sending and receiving large amounts of text that takes significant time to generate. A lot of time is spent transmitting and waiting on information, and these calls take even longer when they have to travel over the public internet to a third-party sub-processor. Many teams are using OpenAI this way right now. The benefit is time to market: wire up a few OpenAI calls and you have an AI product. The downside is all of that time spent on input/output and waiting on responses, plus, in my experience, totally unpredictable performance and throughput.
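As a rough illustration, here is a sketch of timing a single turn of such a conversation against a hosted API. It assumes the openai Python client and an OPENAI_API_KEY in the environment; the model name and messages are placeholders. Every turn pays this round trip over the public internet, and every prior turn is resent with each request.

```python
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A multi-turn conversation: the full history (and the customer data inside it)
# is resent with every request, so payloads grow as the chat goes on.
messages = [
    {"role": "system", "content": "You are a billing support assistant."},
    {"role": "user", "content": "Why was my card charged twice?"},
    {"role": "assistant", "content": "I can look into that. Which charge dates?"},
    {"role": "user", "content": "March 3rd and March 4th, both for $49."},
]

start = time.perf_counter()
response = client.chat.completions.create(
    model="gpt-4",  # placeholder model name
    messages=messages,
)
elapsed = time.perf_counter() - start

print(f"One turn took {elapsed:.1f}s over the public internet")
print(response.choices[0].message.content)
```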
The solution to both of these problems is self-hosting LLMs. Customer data stays secure within your own internal network and never has to travel to a third-party processor. The time spent in transit over the public internet is eliminated by keeping every call in-house. And because you control the resources involved, you can deliver throughput guarantees to your users.
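In practice, this can be the same chat call pointed at an inference server you run inside your own network. The sketch below assumes a self-hosted server exposing an OpenAI-compatible endpoint (as servers like vLLM can); the internal hostname and model name are hypothetical.

```python
from openai import OpenAI

# Point the same client at an inference server on your private network.
# "llm.internal" is a hypothetical internal hostname -- the request never
# leaves your infrastructure, so customer data is not shared with a sub-processor.
client = OpenAI(
    base_url="http://llm.internal:8000/v1",
    api_key="not-needed-internally",  # many self-hosted servers ignore the key
)

response = client.chat.completions.create(
    model="llama-2-13b-chat",  # whichever open model you choose to serve
    messages=[
        {"role": "system", "content": "You are a billing support assistant."},
        {"role": "user", "content": "Why was my card charged twice?"},
    ],
)
print(response.choices[0].message.content)
```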
The challenge at the moment is standing up this solution. Running your own large server instances is possible but complex to stand up and scale. Managed services like AWS Bedrock have the potential to be the solution we need to deliver on both security and performance.
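For reference, here is a minimal sketch of invoking a model through Bedrock with boto3; the region, model ID, and the Claude-style request body shown are illustrative assumptions, not a recommendation of a specific model.

```python
import json
import boto3

# Bedrock runtime client -- the managed service handles hosting the model
# so you don't have to stand up and scale your own inference servers.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Claude-on-Bedrock prompt format (illustrative model ID and parameters).
body = {
    "prompt": "\n\nHuman: Why was my card charged twice?\n\nAssistant:",
    "max_tokens_to_sample": 300,
}

response = bedrock.invoke_model(
    modelId="anthropic.claude-v2",
    contentType="application/json",
    accept="application/json",
    body=json.dumps(body),
)
result = json.loads(response["body"].read())
print(result["completion"])
```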