Until recently, much of the discussion surrounding the transformative power of generative AI has centered on Large Language Models (LLMs) trained on massive volumes of data. However, in his recent keynote at IBM Think 2025, Arvind Krishna hailed the benefits to the enterprise of small language models (SLMs) like IBM’s own Granite family.
The Case for Smaller Language Models
Smaller models, Krishna said, are “incredibly accurate; they are much faster, more cost-effective to run, and you can choose to run them where you want”. He suggested that smaller, fit-for-purpose models tailored to specific use cases are more appropriate for business:
“To win, you are going to need to build special-purpose models, much smaller, that are tailored for a particular use-case that can ingest the enterprise data and then work.”
Underlining these sentiments, IT industry analyst Gartner has predicted that by 2027, enterprises will use small, task-specific models three times more than general-purpose large language models.
So, what exactly are SLMs, and why are they starting to attract greater interest?
While there’s no formal definition, AI models with fewer than 30 billion parameters are generally considered SLMs. Compare smaller models such as IBM Granite 3.3 8B (8 billion parameters) or Mistral 7B (7 billion parameters) with massive LLMs such as Llama 4 Maverick (400 billion total parameters, of which 17 billion are active per token thanks to its Mixture of Experts architecture) or OpenAI’s GPT-4, estimated to contain 1.8 trillion parameters.
Parameters are the internal variables, such as weights and biases, that form the essential building blocks of AI models; they encode everything a model can recall and reason about. Broadly speaking, models with more parameters are more capable: they can memorize more facts, handle more languages, and conduct more complex reasoning.
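To make the size gap concrete, here is a back-of-envelope calculation (a rough sketch, not vendor figures) of how parameter counts translate into the memory needed just to hold a model’s weights, using the parameter counts mentioned above:

```python
# Back-of-envelope memory needed just to hold a model's weights.
# Rule of thumb: bytes = parameters * bytes-per-parameter
# (2 bytes for 16-bit weights; quantization shrinks this further).
# Parameter counts are from the article; everything else is a rough
# estimate, not a vendor specification.

def weight_memory_gb(params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate GB of memory to store the weights alone."""
    return params * bytes_per_param / 1e9

for name, params in [
    ("Mistral 7B", 7e9),
    ("Granite 3.3 8B", 8e9),
    ("Llama 4 Maverick (400B total)", 4e11),
    ("GPT-4 (est. 1.8T)", 1.8e12),
]:
    print(f"{name:32s} ~{weight_memory_gb(params):,.0f} GB at 16-bit")
```

An 8-billion-parameter model fits on a single modern GPU at roughly 16 GB, while a trillion-plus-parameter model needs a cluster just to hold its weights.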
Most businesses, however, are unlikely ever to need the broad, generalized knowledge and capabilities that huge LLMs provide.
Real-World Use Cases
If you’re a bank, a mobile phone company or a pharmaceutical business, you probably don’t need a model trained on vast swathes of generalized data that can write sonnets in the style of Shakespeare or tell you where the game of soccer originated. Your needs are more likely to be met by focused models, trained on relevant enterprise data and fine-tuned to specific use cases tied closely to your own business and industry sector.
For example, enterprise AI applications such as a corporate AI assistant or chatbot for internal staff can be well served by smaller models that use retrieval-augmented generation (RAG) frameworks to dynamically access up-to-date domain-specific data from a company or industry database.
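As a rough illustration of the RAG pattern, the following minimal Python sketch shows the retrieve-then-generate flow. The toy document store, word-overlap scoring and final generation step are illustrative placeholders, not any particular vendor’s framework:

```python
# Minimal sketch of the RAG pattern: retrieve relevant enterprise
# documents, then pass them to a small model as grounding context.

from collections import Counter

# Stand-in for a company knowledge base (assumed example content).
DOCS = {
    "refund-policy": "Refunds are issued within 14 days of purchase...",
    "support-hours": "The support desk is staffed 08:00-18:00 on weekdays...",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query.
    Production systems would use vector embeddings instead."""
    q = Counter(query.lower().split())
    scored = sorted(
        DOCS.items(),
        key=lambda kv: -sum((q & Counter(kv[1].lower().split())).values()),
    )
    return [text for _, text in scored[:k]]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return prompt  # in practice: send `prompt` to the SLM's inference endpoint

print(answer("When are refunds issued?"))
```

Because the model sees fresh, retrieved context at query time, the underlying SLM itself never needs to memorize the whole knowledge base.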
Success with SLMs requires high-quality, domain-specific training data, which enterprises must carefully curate and maintain. Fortunately, it is becoming easier for companies to integrate their corporate data into AI models, making the creation of focused, company-specific SLMs a more realistic option. For example, InstructLab, an open-source initiative introduced by IBM and Red Hat, simplifies the process of infusing enterprise data into a model. Smaller models can be customized using far fewer computing resources than traditional model retraining demands, allowing businesses to create SLMs tailored to their specific use cases much more easily and cost-effectively.
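InstructLab has its own taxonomy-driven workflow, but the general idea of low-resource customization can be sketched with a generic parameter-efficient technique such as LoRA. The snippet below uses Hugging Face’s transformers and peft libraries; the model ID and hyperparameters are illustrative assumptions, not a prescribed recipe:

```python
# Generic parameter-efficient fine-tuning (LoRA) sketch -- not
# InstructLab's actual pipeline. Assumes the Hugging Face
# `transformers` and `peft` libraries are installed.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "ibm-granite/granite-3.3-8b-instruct"  # assumed Hugging Face model ID
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# LoRA trains small adapter matrices instead of all 8B weights,
# which is why far less compute is needed than full retraining.
config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,     # assumed hyperparameters
    target_modules=["q_proj", "v_proj"],         # assumed attention layer names
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of the base model
```

The trainable fraction is what makes the economics work: updating adapters on an 8B model is a workload a single GPU can handle, rather than a data-center job.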
Cost and Latency Advantages
The primary advantage of smaller models is the overall cost reduction in deploying and running AI. Larger models need more compute power, energy and memory, requiring more graphics processing units (GPUs) and other data center resources, all of which ratchet up the cost.
This very likely explains why research suggests nearly 9 in 10 companies now struggle to execute and scale AI initiatives on time, often citing budget constraints, limited compute availability, and GPU shortages among the main reasons.
In general, larger models cost more both to train and to run in production. In his IBM Think keynote, for example, Arvind Krishna suggested that LLMs can incur inference costs 30 times higher than Granite’s.
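Taking that 30x figure at face value, a quick calculation shows how it compounds with volume. The per-token price and monthly volume below are made-up assumptions for the sake of the arithmetic, not published prices:

```python
# Illustrative inference-cost arithmetic. The 30x ratio comes from the
# keynote quoted above; the absolute price and volume are assumptions.

slm_cost_per_m_tokens = 0.20                         # assumed USD per million tokens
llm_cost_per_m_tokens = slm_cost_per_m_tokens * 30   # "30 times higher"
monthly_tokens_m = 500                               # assumed volume: 500M tokens/month

slm_monthly = slm_cost_per_m_tokens * monthly_tokens_m
llm_monthly = llm_cost_per_m_tokens * monthly_tokens_m
print(f"SLM: ${slm_monthly:,.0f}/month   LLM: ${llm_monthly:,.0f}/month")
# SLM: $100/month   LLM: $3,000/month -- the same 30x gap at any volume
```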
Cost is not the only factor, however. Another benefit of smaller models is lower latency. Having fewer parameters means faster processing times, which typically enables SLMs to respond more quickly, thereby driving productivity improvements. In enterprises where AI is used in customer-facing applications, such as customer service chatbots, lower latency can result in faster service and enhanced customer satisfaction.
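The latency advantage can also be estimated with a common rule of thumb (an approximation, not a benchmark): at batch size one, generating a token is typically memory-bandwidth bound, because every weight must be read once per token. Parameter count therefore sets a floor on response time:

```python
# Back-of-envelope decode latency at batch size 1:
# time-per-token ~= weight_bytes / memory_bandwidth.
# The 2 TB/s bandwidth figure is an assumed GPU spec, not a benchmark.
# For MoE models, count active parameters rather than total.

def ms_per_token(params: float, bytes_per_param: float = 2.0,
                 bandwidth_bytes_s: float = 2e12) -> float:
    return params * bytes_per_param / bandwidth_bytes_s * 1000

print(f"8B model:         ~{ms_per_token(8e9):.0f} ms/token")
print(f"400B dense model: ~{ms_per_token(4e11):.0f} ms/token")
```

Under these assumptions an 8B model generates a token in roughly 8 ms where a 400B dense model would take around 400 ms, which is the difference between a chatbot that feels instant and one that visibly lags.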
More Control over Data Protection and Security
For enterprises that prioritize data privacy and security, SLMs are likely to be a more natural choice. Their smaller size means they can be deployed in private cloud environments or even on-premises behind firewalls, giving organizations greater control over how their data is protected, including how it is guarded against cybersecurity threats.
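As a sketch of what on-premises deployment can look like in practice, the snippet below runs a small open model entirely locally with Hugging Face’s transformers library, so prompts and data never leave the machine. The Granite model ID is an assumption; any small open model mirrored into a private registry works the same way:

```python
# Fully local inference sketch: weights are downloaded (or pulled from a
# private mirror) and all prompts stay behind the firewall.

from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="ibm-granite/granite-3.3-8b-instruct",  # assumed model ID
    device_map="auto",  # place weights on local GPU(s) if present
)
out = generator(
    "Summarize our data-retention policy in one sentence.",
    max_new_tokens=64,
)
print(out[0]["generated_text"])
```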
Many LLMs are trained on massive volumes of data gathered from a wide variety of disparate sources, including the web, and the exact data sets they use are often never made public. Many of the new generation of smaller models, such as IBM Granite, prioritize transparency: they are built on cleaned and filtered enterprise datasets to minimize the risks of bias and inappropriate output. That makes them trustworthy models that enterprises can confidently fine-tune with, or connect to, their proprietary data, unlocking AI’s full potential.
Using SLMs at the Edge
Because SLMs require less memory and computational power, they are also well-suited to resource-constrained environments such as edge devices like sensors, IoT devices, or mobile apps.
For example, in industrial applications, manufacturing machinery equipped with SLMs could be used to identify potential failures before they occur, enabling faster interventions and less downtime. In medical settings, SLMs could enable patient monitoring and diagnostics through wearable devices or medical sensors, allowing for real-time health assessments while ensuring patient information remains secure on local devices.
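A standard technique for squeezing an SLM onto constrained edge hardware, not specific to any vendor, is weight quantization. The hedged sketch below uses transformers’ bitsandbytes integration to load 4-bit weights, cutting an 8-billion-parameter model’s footprint from roughly 16 GB to around 5 GB; the model ID is again an assumption:

```python
# Quantized loading sketch for constrained hardware. Assumes the
# `transformers` and `bitsandbytes` libraries are installed.

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant = BitsAndBytesConfig(load_in_4bit=True)  # 4-bit weights: ~4x smaller
model = AutoModelForCausalLM.from_pretrained(
    "ibm-granite/granite-3.3-8b-instruct",  # assumed model ID
    quantization_config=quant,
)
```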
An additional benefit of SLMs that is gaining importance is their reduced environmental impact. Because smaller models use less data center energy and fewer resources, both in training and in inferencing, they also have a smaller carbon footprint than LLMs.
Conclusion: Small Often Makes Better Sense for the Enterprise
For enterprises looking to implement generative AI, small language models are often a more practical, cost-effective, and trustworthy way forward. Their lower cost, lower latency, greater control over data protection and security, and reduced environmental impact make them a better fit for many real-world enterprise needs. It’s worth noting, however, that SLMs may struggle with some complex reasoning tasks that require the broader knowledge and capabilities of larger models.
A useful nuance is that smaller models are intended to complement, rather than replace, larger ones. Arvind Krishna stressed, for example, that small models are “not a substitute for larger [AI] models, it’s an ‘and’ with the larger models you can now tailor … to enterprise needs”. In other words, enterprises will likely adopt a hybrid approach: LLMs for broad tasks and SLMs for many specialized tasks.
It’s true that LLMs have dominated the conversation about generative AI until now. However, we are starting to see that smaller is often the more intelligent choice for many enterprise applications.