Hey, I am the tech lead at our company. We are using IBM WatsonX as a core component. I have looked through the docs and many articles, but I still do not have a clear picture; I just need to be pointed in the right direction.
How do we take a foundation model, feed it pages upon pages of data from our database to train it, and then save it? Afterwards, when that same model is prompted, it should answer using its LLM capabilities while drawing on the data and information we trained it on.
I know that fine-tuning and embeddings exist for similar use cases. For fine-tuning, please correct me if I am wrong, but I do not think simply dumping in information is possible; we would need good prompt/answer sets, right? For embeddings, even though they can selectively pick up relevant data and put only that into the model context, in our use case even that might not be enough.
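To make the prompt/answer-sets point concrete, here is a rough sketch of what fine-tuning data typically looks like: raw documents rewritten into explicit input/output pairs, one JSON object per line. The field names (`input`/`output`) and the example records are illustrative assumptions, not a specific watsonx schema.

```python
import json

# Illustrative instruction-tuning records (field names are generic,
# not a watsonx-specific schema): each raw document must be turned
# into explicit prompt/answer pairs before fine-tuning.
records = [
    {
        "input": "What is our standard return window for electronics?",
        "output": "Electronics can be returned within 30 days of purchase.",
    },
    {
        "input": "Which database backs the orders service?",
        "output": "The orders service stores its data in PostgreSQL.",
    },
]

# Write one standalone JSON object per line (JSONL).
with open("tuning_data.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

with open("tuning_data.jsonl", encoding="utf-8") as f:
    lines = f.readlines()
print(len(lines))  # 2
```

This is the shape of the work I suspect "dumping in information" cannot skip: someone (or some pipeline) has to turn each document into question/answer pairs first.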
We are currently using the embeddings approach, but to give the model a wider range of context, we want it to be trained on our whole database, without training it from scratch.
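For anyone answering: this is the retrieval step I mean by "embeddings selectively pick up relevant data". A minimal sketch, with toy 3-d vectors standing in for real embeddings (the chunk names and query are made up), showing why only the top-ranked chunks ever reach the model's context:

```python
import math

# Toy document chunks with hand-made 3-d "embeddings" (real systems
# use high-dimensional vectors from an embedding model).
chunks = {
    "refund policy": [0.9, 0.1, 0.0],
    "db schema":     [0.1, 0.8, 0.1],
    "office hours":  [0.0, 0.2, 0.9],
}

def cosine(a, b):
    # Cosine similarity: dot product over the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy embedding of a query like "how do refunds work?"
query_vec = [0.85, 0.15, 0.05]

# Rank chunks by similarity; only the best few go into the prompt,
# which is exactly why the rest of the database stays out of context.
ranked = sorted(chunks, key=lambda c: cosine(query_vec, chunks[c]), reverse=True)
top_chunk = ranked[0]
print(top_chunk)  # refund policy
```

The limitation we are hitting is visible here: everything outside the top-k chunks is invisible to the model on that turn, no matter how relevant it might be in aggregate.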
tldr: how do we feed tons of data (exceeding the context length by a very large margin) into a foundation model and have it use that data when it answers in the future? We don't want to use embeddings; if it's possible using fine-tuning, please guide us on how.
Thank you.
#watsonx.ai
swayam shree