Implementing Generative AI Within Budget
Today, almost every business wants to leverage the power of Generative AI (Gen AI) using Large Language Models (LLMs). Pricing is often the dealbreaker, because LLM costs climb quickly on resource-intensive AI use cases. If you’re wondering which LLMs are the right fit for your business use case, this article is a must-read.
Two Ways of Integrating Gen AI
Integrating Gen AI into organizational processes can be done in two ways: using external APIs or self-hosting LLMs in the cloud. Each method has distinct advantages and disadvantages, and the choice depends on your business use case and organizational constraints.
1. Using External APIs:
Providers such as OpenAI offer pre-trained models through APIs. Some of these are embedding models, which convert text into numerical vectors called embeddings; the embeddings can be stored in a vector database for querying and analysis. Companies usually opt for external APIs for a quicker time to market, but every API request has a cost, so frequent use means higher bills. LLM providers charge per token rather than per word: the model splits text into tokens, which are often whole words or fragments of words. You pay for both the tokens you send (input) and the tokens the model generates (output). Setting usage limits helps prevent unexpected expenses.
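To make input/output billing and usage limits more concrete, here is a minimal Python sketch of a client-side spending cap. The class name `TokenBudget` and the prices used are hypothetical placeholders, and the sketch does not call any provider API. Because the real output length is only known after a response arrives, it assumes you pass the request’s maximum allowed output tokens, which makes the check conservative.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBudget:
    """Hypothetical client-side spending cap for a token-billed LLM API.

    Prices are placeholder values in USD per million tokens.
    """
    monthly_cap_usd: float
    input_price_per_m: float
    output_price_per_m: float
    spent_usd: float = field(default=0.0)

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        # Input (prompt) and output (generated) tokens are billed at separate rates.
        return (input_tokens * self.input_price_per_m
                + output_tokens * self.output_price_per_m) / 1_000_000

    def try_charge(self, input_tokens: int, max_output_tokens: int) -> bool:
        """Record the request's worst-case cost if it fits under the cap; otherwise refuse it."""
        request_cost = self.cost(input_tokens, max_output_tokens)
        if self.spent_usd + request_cost > self.monthly_cap_usd:
            return False  # Over budget: skip or queue the request instead of sending it.
        self.spent_usd += request_cost
        return True


budget = TokenBudget(monthly_cap_usd=100.0, input_price_per_m=10.0, output_price_per_m=30.0)
allowed = budget.try_charge(input_tokens=2_000, max_output_tokens=500)
print(allowed, round(budget.spent_usd, 3))  # True 0.035
```

A cap like this complements, rather than replaces, whatever limits your provider lets you configure on its side.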
As for API token costs, premium models such as OpenAI’s GPT-4 Turbo or Anthropic’s Claude 3 Opus cost companies roughly $20 to $50 per million tokens. Mid-range models from Mistral cost around $10 per million tokens, and the most cost-effective options, such as Google’s Gemini Pro or OpenAI’s GPT-3.5 Turbo, are priced at around $1 per million tokens. Prices vary with the accuracy, speed, and features your use case requires. (Prices are as of H1 2024.)
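Since token prices differ by more than an order of magnitude, a quick estimate of monthly spend per tier can help with shortlisting. The sketch below uses the article’s approximate H1 2024 figures as single blended rates; real providers price input and output tokens separately, and the workload (10,000 requests per day at 1,500 tokens each) is a hypothetical assumption.

```python
# Approximate blended prices in USD per million tokens, based on the
# article's H1 2024 ranges. Treat these as placeholders, not quotes.
PRICE_PER_MILLION_TOKENS = {
    "premium_low": 20.0,   # low end of the premium tier
    "premium_high": 50.0,  # high end of the premium tier
    "mid_range": 10.0,
    "budget": 1.0,
}


def monthly_api_cost(requests_per_day: int, tokens_per_request: int,
                     price_per_million: float, days: int = 30) -> float:
    """Estimate monthly API spend; tokens_per_request counts input plus output."""
    total_tokens = requests_per_day * tokens_per_request * days
    return total_tokens / 1_000_000 * price_per_million


# Hypothetical workload: 10,000 requests/day, 1,500 tokens each (450M tokens/month).
for tier, price in PRICE_PER_MILLION_TOKENS.items():
    print(f"{tier}: ${monthly_api_cost(10_000, 1_500, price):,.0f}/month")
# premium_low: $9,000/month
# premium_high: $22,500/month
# mid_range: $4,500/month
# budget: $450/month
```

At this volume the gap between the premium and budget tiers is about $22,000 per month, which is why matching model quality to the task matters.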
Depending on the use case, companies must also decide whether to fine-tune the model or use it as is, and which prompt engineering techniques to apply (prompt engineering is the practice of designing and refining prompts to get the desired model output). This can add fine-tuning, prompt engineering, or embedding costs.
Recently, we worked with a retail company to enable personalized marketing and improve product recommendations, a good example of the external API approach. The top providers’ LLMs are trained on large datasets, which makes them surprisingly good at understanding what customers want and need. In cases like this, where the workload is not expensive to run, querying the provider’s models through the API keeps responses cheap, flexible, and up to date.
2. Self-Hosting LLMs:
Before you opt for self-hosting, recognize that LLMs are computationally intensive: you will need cloud compute capacity, and long prompts add to the load. Assess the computing and maintenance costs and compare them with the cost of an external provider. More and more companies are choosing to build and run their own use cases on self-hosted LLMs, which gives them more control, freedom, and flexibility. They also need to make decisions about fine-tuning, infrastructure requirements, and maintenance.
Self-hosting involves no per-token API fees. Instead, costs depend on how long the hardware running the model is reserved and on the region where it is hosted. Because LLMs have billions of parameters, companies need at least a CPU and a GPU, sized to the complexity of the use case, and more computation means higher costs. For example, if Meta’s Llama-2 is deployed on GCP N4 in South Carolina with an NVIDIA V100 GPU, it will cost the company approximately $4.1 per hour. The number of conversations per day then determines how much capacity, and therefore how many hours of reserved hardware, the company needs. The GPU is a major part of the cost: if your use case demands high-performance GPUs such as the A100 or H800, costs can rise to around $10 per hour.
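Comparing the two approaches comes down to a break-even calculation: the monthly token volume at which API spend equals the cost of reserving hardware. The sketch below uses the article’s $4.1 and $10 per hour examples. It assumes a single instance can serve the whole workload (throughput limits are not modeled), and `monthly_maintenance_usd` is a hypothetical parameter for engineering and upkeep costs.

```python
def monthly_self_hosting_cost(hourly_rate_usd: float, hours_per_day: float = 24.0,
                              days: int = 30, monthly_maintenance_usd: float = 0.0) -> float:
    """Self-hosted cost scales with reserved hardware time, not with tokens."""
    return hourly_rate_usd * hours_per_day * days + monthly_maintenance_usd


def break_even_tokens_per_month(hourly_rate_usd: float, api_price_per_million: float,
                                **hosting_kwargs) -> float:
    """Monthly token volume at which API spend equals the self-hosting cost."""
    hosting = monthly_self_hosting_cost(hourly_rate_usd, **hosting_kwargs)
    return hosting / api_price_per_million * 1_000_000


# Around-the-clock reservation at the article's example rates.
for rate in (4.1, 10.0):
    print(f"${rate}/hour -> ${monthly_self_hosting_cost(rate):,.0f}/month")
    for api_price in (1.0, 20.0):
        tokens = break_even_tokens_per_month(rate, api_price)
        print(f"  break-even vs ${api_price}/M tokens: {tokens / 1e6:,.1f}M tokens/month")
```

Below the break-even volume the API is cheaper; above it, self-hosting starts to pay off, provided the hardware can actually keep up with the load.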
On the self-hosting side, we have helped manufacturing clients implement sales document copilots. These copilots let their sales teams extract insights from documents, leading to stronger client meetings and more deals closed without relying on domain experts. This was a good fit for self-hosting because LLMs are already well trained on sales data and carry relevant manufacturing knowledge.
Beyond Costs: Strategic Considerations
While cost plays a crucial role, it’s not the only factor for success. Assessing ROI is one of the most important, and most difficult, parts of any Gen AI use case. Carefully weigh the net profit the LLM generates against the total cost of the implementation. A high-end model might be overkill for simple tasks, while an affordable option might lack the power for complex applications. Some models demand significant training data, which can incur additional costs. Also, consider the in-house expertise needed for integration and ongoing maintenance.
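One way to structure this assessment is to total every cost the article mentions (inference, fine-tuning, prompt engineering, embeddings, training data, and in-house expertise) and apply a simple ROI formula, (value generated − total cost) / total cost. The sketch below is a minimal illustration; the `GenAICosts` fields and all figures are hypothetical, and estimating the value a use case generates is the hard part that no formula solves.

```python
from dataclasses import dataclass


@dataclass
class GenAICosts:
    """Hypothetical annual cost breakdown for one Gen AI use case, in USD."""
    inference: float                    # API tokens or reserved GPU hours
    fine_tuning: float = 0.0
    prompt_engineering: float = 0.0
    embeddings: float = 0.0
    training_data: float = 0.0
    staff_and_maintenance: float = 0.0  # in-house expertise and upkeep

    def total(self) -> float:
        return (self.inference + self.fine_tuning + self.prompt_engineering
                + self.embeddings + self.training_data + self.staff_and_maintenance)


def roi(annual_value_usd: float, costs: GenAICosts) -> float:
    """Simple ROI as a fraction: (value - total cost) / total cost."""
    total = costs.total()
    if total <= 0:
        raise ValueError("total cost must be positive")
    return (annual_value_usd - total) / total


costs = GenAICosts(inference=36_000, fine_tuning=10_000, staff_and_maintenance=54_000)
print(f"ROI: {roi(150_000, costs):.0%}")  # ROI: 50%
```

Keeping the cost components separate makes it easy to see whether a cheaper model or a different hosting option would actually move the result.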
We have helped various leaders assess the ROI and true business impact of their Gen AI initiatives. Partnering with a pharmaceutical company, we implemented a Copilot to dig into a corpus of hundreds of research papers and extract key insights. We helped them quantify the costs and benefits of their AI use cases in accelerating new product development.
Want to know more about Gen AI? Read our comprehensive Generative AI for Executives guide which is packed with practical insights and real-world examples of Gen AI. Also, check out our detailed eBook on Large Language Models if you want to know about the top LLMs in the industry and which one you should opt for.