
Context length of the models in ibm watsonx.ai

  • 1.  Context length of the models in ibm watsonx.ai

    Posted Wed February 21, 2024 09:55 AM

    Hi All,

    Could you please help with a couple of queries regarding the context-length parameter for LLMs on watsonx.ai?
    Is there a way to extend the context length for the models hosted on the IBM watsonx.ai platform, or do we have to adhere to the context lengths mentioned in this document?
    While testing, we may need to provide inputs that exceed the models' context-length limits, hence we would like to clarify.
    If we cannot exceed the context lengths of the current models on IBM watsonx.ai, then from what I have read online, we would need to chunk the larger data. If you know of any good online documentation or resources on context lengths and chunking strategies, kindly let us know. Thank you for your help!


    Tanuja Bhide

  • 2.  RE: Context length of the models in ibm watsonx.ai
    Best Answer

    Posted Mon February 26, 2024 05:47 PM
    Edited by Catherine Cao Wed February 28, 2024 11:15 AM

    Hi Tanuja,

    You are right: the context length is not imposed by IBM but by the model itself. The only exception might be the new Mixtral 8x7B model, which provides a long context window (32k tokens); if you are on a Lite plan, however, the limit will be ~4k. On a paid plan you should be able to use the full context length.
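    To check whether a prompt is likely to fit within a given window before sending it, a rough heuristic of ~4 characters per token is often used as a first approximation (a sketch only; the actual count depends on the model's tokenizer, and the 4096/200 limits below are assumed values for illustration):

    ```python
    def rough_token_count(text: str, chars_per_token: int = 4) -> int:
        # Heuristic estimate only; real counts come from the model's tokenizer.
        return len(text) // chars_per_token + 1

    def fits_context(prompt: str, context_limit: int = 4096,
                     max_new_tokens: int = 200) -> bool:
        # The generated tokens share the same window, so reserve room for them.
        return rough_token_count(prompt) + max_new_tokens <= context_limit

    print(fits_context("hello " * 100))   # short prompt, fits easily
    print(fits_context("x" * 40000))      # far too long for a ~4k window
    ```

    For anything beyond a quick sanity check, use the model's own tokenizer to count tokens exactly.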

    And here are two options you could try to work around the context restrictions, depending on your use case.

    1. Chunking: For RAG use cases, as you mentioned, you can split larger documents into smaller chunks, store them in a vector database, and provide only the relevant chunks that fit into the context window for augmented generation. Here is an example, and once you are familiar with it, there are more chunking strategies you can explore.
    2. Prompt tuning: If you need a longer context to include more examples, you can try tuning your model. This saves token space at inference time, since you no longer need to include the examples in your prompt for the tuned model.
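    The chunking idea in option 1 can be sketched as a simple overlapping splitter (a minimal character-based example; production pipelines usually split by tokens or sentences, and the chunk sizes here are arbitrary illustrations):

    ```python
    def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
        """Split text into overlapping chunks so that context spanning a
        chunk boundary is not lost entirely."""
        if overlap >= chunk_size:
            raise ValueError("overlap must be smaller than chunk_size")
        chunks = []
        start = 0
        while start < len(text):
            chunks.append(text[start:start + chunk_size])
            start += chunk_size - overlap  # step forward, keeping the overlap
        return chunks

    doc = "word " * 300  # stand-in for a larger document
    chunks = chunk_text(doc, chunk_size=400, overlap=40)
    print(len(chunks), len(chunks[0]))  # → 5 400
    ```

    Each chunk would then be embedded and stored in the vector database, and only the chunks most relevant to a query are placed into the prompt.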

    Hope it helps!

    Catherine CAO