The emergence of ChatGPT has generated significant market excitement around Large Language Models (LLMs) and the myriad of new possibilities they bring. However, another term has also been mentioned frequently since it was coined by researchers at the Stanford Institute for Human-Centered Artificial Intelligence (HAI): Foundation Models.
Although it is a highly technical topic, I present a didactic and straightforward explanation of the subject.
Official Definition
The Stanford researchers' official definition is:
"Models trained on broad data (generally using self-supervision at scale) that can be adapted to a wide range of downstream tasks".
In other words, it is a model trained on a huge amount of unlabeled data that can be used for different tasks, with minimal human supervision.
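To make the "adapted to a wide range of downstream tasks" part concrete, here is a minimal sketch. It assumes the Hugging Face transformers library and the publicly available facebook/bart-large-mnli checkpoint (my choice for illustration, not something prescribed by the Stanford paper): one pretrained model is reused for a classification task, with no labeled examples provided by us.

```python
from transformers import pipeline

# One pretrained model, reused for a downstream task without any fine-tuning.
# The checkpoint below is just an illustrative, publicly available choice.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "The new graphics card renders 4K games at 120 frames per second.",
    candidate_labels=["technology", "sports", "cooking"],
)
print(result["labels"][0])  # most likely label, expected: "technology"
```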
And what does it mean in real life?
Okay, what does it mean? In practice, it's not exactly something new. It uses the same principles of the good old artificial intelligence that we've known for decades: train models (supervised or unsupervised) on historical data and use them to respond or predict something based on inputs, e.g. texts, images, or raw data in general. What we have been seeing over the years is a series of evolutions that bring scale and, consequently, more exciting results.
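For readers less familiar with that classic recipe, this is what "train on historical data and predict from new inputs" looks like in its simplest form (a generic scikit-learn sketch with made-up toy data, purely for illustration):

```python
from sklearn.linear_model import LogisticRegression

# "Historical data": inputs we have seen before and the outcomes we already know.
X_train = [[25, 0], [47, 1], [33, 0], [58, 1]]  # e.g. [age, has_subscription]
y_train = [0, 1, 0, 1]                          # e.g. churned (1) or not (0)

model = LogisticRegression().fit(X_train, y_train)

# Use the trained model to predict something about a new, unseen input.
print(model.predict([[40, 1]]))
```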
Years ago, when we started talking about deep learning for the first time, it also represented a huge leap over the traditional machine learning architectures used at the time. That's because we started using larger datasets, a much more powerful computational capacity (we started using GPUs at that time), and because we started to really use the concept of "transfer learning" (which enables learning a new task by transferring knowledge from a related task that was already learned in the past). When we started using deep learning, there was a buzz similar to today's, as we started using AI to answer questions about different subjects, in domains different from those the algorithms were trained for.
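To illustrate the transfer learning idea mentioned above, here is a minimal sketch using PyTorch/torchvision (an assumption about tooling on my part; the concept is framework-agnostic): a network pretrained on ImageNet is reused for a new, hypothetical 5-class task by training only a small new head.

```python
import torch.nn as nn
from torchvision import models

# Load a network pretrained on ImageNet (knowledge learned on a related task)...
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# ...freeze what it already knows...
for param in backbone.parameters():
    param.requires_grad = False

# ...and attach a new head for a different, much smaller task (5 classes here).
backbone.fc = nn.Linear(backbone.fc.in_features, 5)

# Only the new head gets trained, so far less data and compute are needed.
```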
That said, let's get back to the main subject of the blog. Foundation Models can, simplistically, be considered a new leap, an evolution of deep learning. That's because we're talking about even bigger datasets (incredibly bigger) and an even bigger computational capacity. This significantly enhances the model's ability to learn context, and thus meaning, by tracking relationships in the input data (like the words in this sentence).
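The mechanism that usually does this relationship tracking in today's foundation models is attention, the heart of the Transformer architecture (an assumption on my part, since not every foundation model is a Transformer). A toy PyTorch sketch of the idea:

```python
import torch
import torch.nn.functional as F

tokens = torch.randn(6, 64)             # 6 "words", each represented by a 64-dim vector
scores = tokens @ tokens.T / 64 ** 0.5  # pairwise relationship scores between words
weights = F.softmax(scores, dim=-1)     # how strongly each word attends to the others
contextual = weights @ tokens           # each word's vector now reflects its context
print(contextual.shape)                 # torch.Size([6, 64])
```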
Why Foundation Models?
In the Stanford paper, they explain that the name "foundation models" was chosen to go beyond the "technical dimension" of these models and capture the significance of the paradigm shift in AI and its sociological impacts. But the explanation I liked the most was:
"We also chose the term "foundation" to connote the significance of architectural stability, safety, and security: poorly-constructed foundations are a recipe for disaster and well-executed foundations are a reliable bedrock for future applications."
Foundation models and Large Language Models
Large Language Models (LLMs) are foundation models applied to text: they are trained on enormous volumes of language data, with an insane amount of parameters and an extremely powerful computational capacity (GPT-3, for example, has 175 billion parameters and was trained on 570 gigabytes of text). In addition to ChatGPT, the most famous of the moment, we have many other examples of state-of-the-art NLP models, such as BERT, GPT-4, T5, BLOOM, LaMDA, among others.
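If you want to get a feel for this family of models locally, here is a hedged example using the Hugging Face transformers library and the publicly available GPT-2 checkpoint (my choice for illustration; it is far smaller than the models named above but follows the same "predict the next token" recipe):

```python
from transformers import pipeline

# GPT-2 is tiny compared to GPT-3, but it is built on the same principle.
generator = pipeline("text-generation", model="gpt2")

print(generator("Foundation models are", max_new_tokens=30)[0]["generated_text"])
```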
Other foundation model examples
Foundation Models go beyond NLP. The concept can be used for several other applications. When we talk about computer vision, we have several famous examples like CLIP, ALIGN, FLORENCE, Wu Dao 2, and also generative vision models like DALLE and GLIDE. The paper from the Stanford researchers also mentions the technology's potential to create foundation models for other applications, such as robotics, task specification, task learning, and reasoning and search.
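As a taste of the computer vision side, here is a hedged sketch of zero-shot image classification with CLIP, going through the Hugging Face transformers library and the publicly available openai/clip-vit-base-patch32 checkpoint (one possible way to try it; the image path is a placeholder):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # placeholder path: any local image you like
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

# The model scores how well the image matches each text description.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(labels[probs.argmax().item()])
```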
Conclusion
Foundation Models represent a new way of using Artificial Intelligence in the real world, in real life. These large-scale models allow you to use AI to understand, synthesize, transform, create, predict, and perform many other tasks, in different domains, with minimal supervision. It is necessary, however, to understand that models like these still require responsibility in their use because, like any machine learning model, they are subject to failures, inaccurate information, and biased content. However, when used correctly, these models will completely transform society, since we have a wide range of real-life applications that can have a significant impact. And it's just the beginning.