LLM Observability with Instana

By JINSONG WANG posted Mon June 17, 2024 12:48 PM

  

Overview of Solution

Large Language Models (LLMs) are revolutionizing many industries, from natural language processing to AI-driven applications. Observability of these models is crucial to ensure their performance, reliability, and scalability. Instana, a powerful observability platform, can be used to monitor LLMs, providing insights into their operations, detecting anomalies, and ensuring optimal performance. This blog will guide you through the process of setting up LLM observability with Instana.

Reference Architecture

Figure: Architecture of monitoring LLMs

Installing OTel Data Collector for LLM (ODCL)

To collect OpenTelemetry metrics for various LLMs and LLM applications, you need to install ODCL. All implementations are based on predefined OpenTelemetry Semantic Conventions.

Pre-requisites

Verify that the following prerequisites are met for ODCL:

Install and Configure the Data Collector

  • Install the collector:
    • Download the installation package:
      wget https://github.com/instana/otel-dc/releases/download/v1.0.0/otel-dc-llm-1.0.0.tar
    • Extract the package to the preferred deployment location:
      tar xf otel-dc-llm-1.0.0.tar
  • Modify the configuration file:
    • Open the config.yaml file:
      cd otel-dc-llm-1.0.0
      vi config/config.yaml
      • Update the following fields in the config.yaml file:
        • otel.backend.url: The OTel gRPC address of the Instana agent, for example: http://<instana-agent-host>:4317.
        • otel.service.name: The Data Collector name, which can be any string that you choose.
        • <ai-system>.price.prompt.tokens.per.kilo: The unit price per thousand prompt tokens.
        • <ai-system>.price.complete.tokens.per.kilo: The unit price per thousand completion tokens.
      • Open the logging.properties file:
        vi config/logging.properties

    Configure the Java logging settings in the file logging.properties according to your needs.

    • Run the Data Collector with the following command according to your current system:
      nohup ./bin/otel-dc-llm >/dev/null 2>&1 &

    You can also use tools like tmux or screen to run this program in the background.
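The two price fields in config.yaml are unit prices per thousand tokens, so the cost of a single call can be estimated as shown in this minimal sketch. The function name `estimate_cost` and the sample prices are hypothetical, and the formula is an assumption about how such prices are typically applied, not a description of the Data Collector's internal code.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  price_prompt_per_kilo: float,
                  price_complete_per_kilo: float) -> float:
    """Estimate the cost of one LLM call from per-thousand-token unit prices."""
    prompt_cost = prompt_tokens / 1000 * price_prompt_per_kilo
    completion_cost = completion_tokens / 1000 * price_complete_per_kilo
    return prompt_cost + completion_cost


if __name__ == "__main__":
    # Hypothetical prices, for illustration only.
    cost = estimate_cost(1200, 300,
                         price_prompt_per_kilo=0.03,
                         price_complete_per_kilo=0.06)
    print(f"Estimated cost: {cost:.4f}")
```

Keeping prompt and completion prices separate matters because many providers charge different rates for the two token types.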

    Instrumenting LLM applications

    Instrumentation is the process of injecting code into LLM applications to get detailed information about the LLM API calls that they make. For more information about supported LLMs, see Instrumenting LLM applications.

    The instrumentation collects both trace and metric data. The trace data is sent to the Instana agent directly. The metrics data is sent to the LLM Data Collector first for aggregation, and then the collector sends it to the Instana agent.
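Because traces and metrics go to two different endpoints, it can help to confirm that both are reachable before you instrument an application. The following is a minimal sketch that only tests whether a TCP connection can be opened. The helper `is_reachable` and the `localhost` host names are hypothetical placeholders; ports 4317 and 8000 are the ones used in the examples in this post.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder hosts; replace them with your agent and collector hosts.
    endpoints = {
        "Instana agent (traces)": ("localhost", 4317),
        "LLM Data Collector (metrics)": ("localhost", 8000),
    }
    for label, (host, port) in endpoints.items():
        status = "reachable" if is_reachable(host, port) else "NOT reachable"
        print(f"{label} at {host}:{port}: {status}")
```

A successful connection shows only that something is listening on the port; it does not verify that the listener speaks OTLP.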

    Pre-requisites

    Verify that Python 3.10 or later is installed. Run the following command to check the installed version:

    python3 -V
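The same check can be made from inside Python, for example at the top of an application's entry point. This is a small sketch; the helper name `python_is_supported` is hypothetical.

```python
import sys

MIN_VERSION = (3, 10)  # minimum Python version stated in the prerequisites


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_is_supported():
        raise SystemExit(f"Python {MIN_VERSION[0]}.{MIN_VERSION[1]}+ is required")
    print("Python version is supported")
```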

    (Optional) It is recommended to create a virtual environment for your applications. This helps keep your dependencies organized and prevents conflicts with other applications. To create a virtual environment, run:

    pip install virtualenv
    virtualenv traceloop
    source traceloop/bin/activate

    Instrument LLM application:

    Install the SDK and dependencies:

    a. Run the following command in your terminal:
    pip install traceloop-sdk==0.18.2

    Install the dependencies for the instrumentations that you need. The following are examples for watsonx, OpenAI, Anthropic, and LangChain.

    • Install dependencies for watsonx:
      pip install ibm-watsonx-ai==1.0.5 ibm-watson-machine-learning==1.0.357 langchain-ibm==0.1.7
    • Install dependency for OpenAI:
      pip install openai==1.31.0
    • Install dependency for Anthropic:
      pip install anthropic==0.25.7
    • Install dependency for LangChain:
      pip install langchain==0.2.2 langchain-community==0.2.3 langchain-core==0.2.4 langchain-text-splitters==0.2.1
    b. In your LLM app, initialize the Traceloop tracer:
    from traceloop.sdk import Traceloop
    Traceloop.init()

    If you’re running this locally, you can disable the batch sending to see the traces immediately:

    Traceloop.init(disable_batch=True)

    If you have complex workflows or chains, you can annotate them to get a better understanding of what is going on. You can see the complete trace of your workflow on Traceloop or any other dashboard that you use. You can use decorators to make this easier. For example, if you have a function that renders a prompt and calls an LLM, add @workflow (or for asynchronous methods, use @aworkflow).

    If you use an LLM framework like Haystack, Langchain, or LlamaIndex, you do not need to add any annotations to your code.

    from traceloop.sdk.decorators import workflow

    @workflow(name="suggest_answers")
    def suggest_answers(question: str):
        ...  # render the prompt and call the LLM here

    For more information, see annotations.
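To illustrate what a workflow annotation does conceptually, here is a pure-Python stand-in that records a name and a duration for each call. It is not Traceloop's implementation and it exports nothing; `traced_workflow` and `SPANS` are hypothetical names used only for this sketch.

```python
import functools
import time

SPANS = []  # hypothetical in-memory record of finished "spans"


def traced_workflow(name: str):
    """Stand-in decorator: record the workflow name and duration of each call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Recorded even if the wrapped function raises.
                SPANS.append({"name": name,
                              "duration_s": time.perf_counter() - start})
        return wrapper
    return decorator


@traced_workflow(name="suggest_answers")
def suggest_answers(question: str) -> str:
    # A real workflow would render a prompt and call an LLM here.
    return f"Echo: {question}"


if __name__ == "__main__":
    print(suggest_answers("What is AIOps?"))
    print(SPANS)
```

With the real SDK, LLM calls made inside the annotated function are grouped under the named workflow in the resulting trace.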

    (Optional) To quickly verify the installation and configuration, you can use the following code to generate a sample application named sample-app.py:
    • The following is an example for watsonx:
      import os, types, time, random
      from ibm_watsonx_ai.metanames import GenTextParamsMetaNames
      from ibm_watsonx_ai.foundation_models import ModelInference
      from pprint import pprint
      from traceloop.sdk import Traceloop
      from traceloop.sdk.decorators import workflow
      from langchain_ibm import WatsonxLLM
      
      Traceloop.init(app_name="watsonx_llm_langchain_question")
      
      def watsonx_llm_init() -> WatsonxLLM:
          watsonx_llm_parameters = {
              GenTextParamsMetaNames.DECODING_METHOD: "sample",
              GenTextParamsMetaNames.MAX_NEW_TOKENS: 100,
              GenTextParamsMetaNames.MIN_NEW_TOKENS: 1,
              GenTextParamsMetaNames.TEMPERATURE: 0.5,
              GenTextParamsMetaNames.TOP_K: 50,
              GenTextParamsMetaNames.TOP_P: 1,
          }
          models = ['ibm/granite-13b-chat-v2', 'ibm/granite-13b-instruct-v2']
          model = random.choice(models)
          watsonx_llm = WatsonxLLM(
              model_id=model,
              url="https://us-south.ml.cloud.ibm.com",
              apikey=os.getenv("IAM_API_KEY"),
              project_id=os.getenv("PROJECT_ID"),
              params=watsonx_llm_parameters,
          )
          return watsonx_llm
      
      @workflow(name="watsonx_llm_langchain_question")
      def watsonx_llm_generate(question):
          watsonx_llm = watsonx_llm_init()
          return watsonx_llm.invoke(question)
      
      for i in range(10):
          question_multiple_responses = [ "What is AIOps?", "What is GitOps?"]
          question = random.choice(question_multiple_responses)
          response = watsonx_llm_generate(question)
          if isinstance(response, types.GeneratorType):
              # Streaming response: print each chunk as it arrives.
              for chunk in response:
                  print(chunk, end='')
          else:
              pprint(response)
          time.sleep(3)
    • The following is an example for OpenAI:
      import os, time, random
      from openai import OpenAI
      from traceloop.sdk import Traceloop
      from traceloop.sdk.decorators import workflow
      
      client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
      
      Traceloop.init(app_name="openai_sample_service", disable_batch=True)
      
      @workflow(name="streaming_ask")
      def ask_workflow():     
          models = [ "gpt-3.5-turbo", "gpt-4-turbo-preview" ]
          mod = random.choice(models)     
          questions = [ "What is AIOps?", "What is GitOps?" ]
          question = random.choice(questions)     
          stream = client.chat.completions.create(
              model=mod,
              messages=[{"role": "user", "content": question}],
              stream=True,
          )
          for part in stream:
              print(part.choices[0].delta.content or "", end="")

      for i in range(10):
          ask_workflow()
          time.sleep(3)

    Configuration for Traceloop Instrumentation

    Set environment variables to export your traces and metrics and to supply the credentials for watsonx, OpenAI, or Anthropic. For Traceloop, see the following information. For other options, see Exporting.

    • Required environment variables:

      export TRACELOOP_BASE_URL=<agent-host>:4317
      export TRACELOOP_METRICS_ENABLED="true"
      export TRACELOOP_METRICS_ENDPOINT=<otel-dc-llm-host>:8000
      export TRACELOOP_HEADERS="api-key=DUMMY_KEY"
      export OTEL_EXPORTER_OTLP_INSECURE=true
      export OTEL_METRIC_EXPORT_INTERVAL=10000
      The host name <otel-dc-llm-host> is the host on which ODCL is installed.

    • Credentials required to access OpenAI, watsonx and Anthropic:

      • Only for OpenAI instrumentation:

        export OPENAI_API_KEY=<openai-api-key>

        To create an API key to access the OpenAI API or use the existing one, see OpenAI.

      • Only for watsonx instrumentation:

        export IAM_API_KEY=<watsonx-iam-api-key>
        export PROJECT_ID=<watsonx-project-id>

        To create the Project ID and IAM API key to access watsonx or to use the existing one, see IBM watsonx and IBM Cloud.

      • Only for Anthropic instrumentation:

        export ANTHROPIC_API_KEY=<anthropic-api-key>

        To create an API key to access the Anthropic API or use the existing one, see Anthropic.

    • (Optional) Run the sample application to verify installation and configuration:
      python3 ./sample-app.py
    • Run the LLM application:
      python3 ./<instrumented-llm-app>.py
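Before running the application, a small pre-flight check can confirm that the environment variables listed in this section are set. The following is a minimal sketch; the helper `missing_vars` and the provider keys `openai`, `watsonx`, and `anthropic` are hypothetical names chosen for this example.

```python
import os

# Variables listed as required for Traceloop in this post.
REQUIRED_VARS = [
    "TRACELOOP_BASE_URL",
    "TRACELOOP_METRICS_ENABLED",
    "TRACELOOP_METRICS_ENDPOINT",
    "TRACELOOP_HEADERS",
    "OTEL_EXPORTER_OTLP_INSECURE",
    "OTEL_METRIC_EXPORT_INTERVAL",
]

# Credentials needed per provider, as listed in this post.
PROVIDER_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "watsonx": ["IAM_API_KEY", "PROJECT_ID"],
    "anthropic": ["ANTHROPIC_API_KEY"],
}


def missing_vars(provider: str, env=None) -> list:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    wanted = REQUIRED_VARS + PROVIDER_VARS.get(provider, [])
    return [name for name in wanted if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars("openai")
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set")
```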

    View metrics and traces

    After you install the OpenTelemetry (OTel) Data Collector and instrument the LLM applications, you can view the metrics in the Instana UI.

    1. Open the Instana UI, and click Infrastructure. Then, click Analyze Infrastructure.
    2. Select OTEL LLMonitor from the list of entity types.
    3. Click an entity instance of the OTEL LLMonitor entity type to open the associated dashboard.

    You can view the following LLM observability metrics:

    • Total Tokens
    • Total Cost
    • Total Request
    • Average Duration
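To make these four metrics concrete, the following sketch shows how such aggregates could be derived from per-call records. It is an illustration only, not how Instana or the Data Collector computes them; `LLMCall`, `summarize`, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LLMCall:
    tokens: int         # prompt + completion tokens for one call
    cost: float         # cost of the call, in your billing currency
    duration_ms: float  # wall-clock duration of the call


def summarize(calls):
    """Aggregate per-call records into dashboard-style totals."""
    total_requests = len(calls)
    total_duration = sum(c.duration_ms for c in calls)
    return {
        "total_tokens": sum(c.tokens for c in calls),
        "total_cost": sum(c.cost for c in calls),
        "total_requests": total_requests,
        # Avoid division by zero when no calls were recorded.
        "average_duration_ms": total_duration / total_requests if total_requests else 0.0,
    }


if __name__ == "__main__":
    sample = [LLMCall(100, 0.01, 200.0), LLMCall(300, 0.03, 400.0)]
    print(summarize(sample))
```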

    For more information about viewing traces, see Analyzing traces and calls.

    Sandbox Trial

    Take a tour of LLM application monitoring with Instana in the IBM Instana Sandbox trial.

    Conclusion

    Implementing LLM observability with Instana helps maintain optimal performance and reliability of your LLM applications. By following the steps outlined in this blog, you can ensure that your applications are monitored effectively, potential issues are identified early, and overall system performance is optimized.
