Hello IBM Community,
I am trying to arrive at a clear definition of each of the following three concepts: Language Model, Large Language Model, and Foundation Model.
I have noticed lately that people confuse these terms or use them interchangeably, so I thought it would be a good idea for our community, at least, to agree on a clear definition for each of them.
I will give the definitions that seem most reasonable to me, but I invite you to challenge or refine them as you see fit.
LM: a language model is usually a model with a transformer-based architecture that is trained on a certain corpus. For instance, BERT and CamemBERT are examples of LMs. We may add the number of parameters and the size of the training corpus (in tokens) to the definition; the sketch below shows one way to measure the latter.
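To make the "size in tokens" part of the definition concrete, here is a minimal sketch. It assumes the Hugging Face transformers package is installed, and it uses a toy two-document corpus as a stand-in for a real training set:

```python
from transformers import AutoTokenizer

# Toy stand-in for a training corpus; a real LM is trained on billions
# of tokens, but the counting logic is the same.
corpus = ["A first toy document.", "A second toy document."]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# add_special_tokens=False so [CLS]/[SEP] markers do not inflate the count.
n_tokens = sum(
    len(tokenizer.encode(doc, add_special_tokens=False)) for doc in corpus
)
print(f"Corpus size: {n_tokens} tokens")
```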
LLM: a large language model is also a language model, but one that contains a larger number of parameters and/or is trained on a larger corpus (more tokens). For example, CamemBERT-large can be considered an LLM according to this definition (see the comparison sketch below).
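As a rough way to quantify that base-versus-large comparison, here is a minimal sketch, again assuming transformers (plus PyTorch) is installed and that the model ids below resolve on the Hugging Face Hub:

```python
from transformers import AutoModel

# Compare the parameter counts of the base and large CamemBERT
# checkpoints, i.e. the quantity the LM-vs-LLM distinction hinges on.
for model_id in ["camembert-base", "camembert/camembert-large"]:
    model = AutoModel.from_pretrained(model_id)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{model_id}: {n_params / 1e6:.0f}M parameters")
```

If I am not mistaken, that should report roughly 110M parameters for the base model and roughly 335M for the large one, i.e. still well under the billion-parameter scale, which is exactly what prompts my question below.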
FM: a foundation model is a generalization of a language model. It is usually large, since it has to encode a wide background of knowledge, and it is not necessarily specialized in text (NLP): it can also model audio, images, time series, etc.
What bugs me the most is the boundary between an LM and an LLM, since it remains vague what distinguishes one from the other. Is it the number of parameters or the size of the training corpus? If either, where is the cutoff? Does a model need 1B parameters to be considered an LLM? What are the common practices globally, and more specifically for us at IBM?
Thank you for your input.
#FM #LLM #LM #transformers #foundation-model
------------------------------
Bourhan DERNAYKA
------------------------------