Instana

Monitoring traces of Amazon SageMaker AI using Instana

By Swathi Kannan posted Mon June 09, 2025 09:51 AM

Amazon SageMaker is a fully managed service provided by Amazon Web Services (AWS) that enables data scientists, developers, and machine learning (ML) engineers to build, train, and deploy machine learning models at scale. You can now monitor Amazon SageMaker traces using Instana.

Instana is an application performance monitoring (APM) and observability platform designed for cloud-native, microservices-based architectures. It offers real-time, end-to-end visibility into distributed systems by automatically discovering services, tracing requests across service boundaries, and collecting metrics, logs, and traces.

To monitor SageMaker traces using Instana, you can use the SageMaker Instrumentor, which is part of the Python-based SDK (software development kit) provided by Traceloop. This SDK enables automated tracing and observability for AWS SageMaker model training and inference pipelines. It acts as a middleware layer that hooks into your code to collect telemetry data such as metrics, traces, and metadata, without requiring major changes to your existing workflows.

The SageMaker Instrumentor from Traceloop automatically records and monitors model predictions, allowing users to observe key details such as:

What was transmitted
The duration of the prediction
Whether the prediction was successful

All of this can be achieved without writing any additional code.
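To make these three details concrete, the following is a minimal conceptual sketch of what an instrumentor does around a prediction call. It is not Traceloop's implementation, and you do not need to write it yourself; the names `record_call`, `CALL_LOG`, and `fake_predict` are hypothetical and exist only for illustration.

```python
import functools
import time

# Hypothetical in-memory store; a real instrumentor exports spans to a backend instead.
CALL_LOG = []


def record_call(func):
    """Record the input, duration, and outcome of each call to func."""
    @functools.wraps(func)
    def wrapper(payload):
        start = time.perf_counter()
        entry = {"payload": payload, "success": False}
        try:
            result = func(payload)
            entry["success"] = True
            return result
        finally:
            # Runs on success and on failure, so every call is logged.
            entry["duration_s"] = time.perf_counter() - start
            CALL_LOG.append(entry)
    return wrapper


@record_call
def fake_predict(payload):
    # Stand-in for a real endpoint call; fails on empty input.
    if not payload:
        raise ValueError("empty payload")
    return "0.42"
```

Each entry in `CALL_LOG` then holds what was transmitted, how long the call took, and whether it succeeded, which mirrors the details listed above.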

To view more detailed telemetry, such as the SageMaker model's traces and metrics, on Instana dashboards, configure the Instana host as the Traceloop backend URL.

Prerequisites

Before you start monitoring your SageMaker model's performance with Traceloop and Instana, make sure that the following prerequisite is in place:

The SageMaker AI environment is set up and configured with a trained model and an endpoint exposed.

Steps to run a Python application to interact with the model and collect the traces

1. Create the virtual environment and activate it.

$ python3 -m venv .sagemaker
$ source .sagemaker/bin/activate

2. Install the required packages.

$ pip install -r requirements.txt

The requirements.txt file contains the following basic requirements:

boto3==1.38.7
traceloop-sdk==0.38.7

# add these if you are using the below example code
numpy==1.25.2
pandas==1.5.3

After creating and training the model using SageMaker AI, you can expose the trained model as an endpoint. In AWS SageMaker, an endpoint is a fully managed, real-time REST API that hosts your machine learning model, allowing it to be used for inference (predictions). When you deploy a trained model, SageMaker sets up an endpoint which:

Is always active and ready to receive requests.
Accepts input data (such as a JSON payload).
Runs the model.
Returns the predictions to the client.

When you call the endpoint with the payload that the model requires, the model returns its predictions.
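As an illustration of preparing such a payload, here is a minimal sketch that serializes feature rows to CSV bytes by using only the Python standard library. It assumes that the model accepts `text/csv` input without a header row, as the sample application later in this post does; the helper name `to_csv_bytes` is hypothetical.

```python
import csv
import io


def to_csv_bytes(rows):
    """Serialize rows of feature values to UTF-8 CSV bytes without a header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


# One row with three features, for illustration only.
payload = to_csv_bytes([[1, 8.2, 10.5]])
```

The resulting bytes can be passed as the request body of an endpoint call. The payload format that a given model expects depends on how that model was built.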

Sample application code is shown below. This code calls the SageMaker-trained model, and Instana monitors the traces because the Traceloop backend URL is set to Instana. For the detailed architecture of this integration, refer to AI Observability architecture. The model used in this example was built by following the Tutorial for building models with Notebook Instances document.

Export the environment variables before running the sample application code.

To export traces and metrics directly to the Instana backend (agentless mode), run the following commands:

export TRACELOOP_BASE_URL=<instana-otlp-endpoint>:4317
export TRACELOOP_HEADERS="x-instana-key=<agent-key>,x-instana-host=<instana-host>"
export OTEL_EXPORTER_OTLP_INSECURE=false

To export traces and metrics to Instana by using an Instana agent, run the following commands:

export TRACELOOP_BASE_URL=<instana-agent-host>:4317
export OTEL_EXPORTER_OTLP_INSECURE=true
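Before you run the application, it can help to confirm that these variables are set as intended. The following is a minimal sketch of such a check; the helper names `parse_headers` and `check_export_config` are hypothetical, and the rule that agentless mode needs an `x-instana-key` header is an assumption based on the commands above.

```python
import os


def parse_headers(raw):
    """Parse 'k1=v1,k2=v2' into a dict, the format used for TRACELOOP_HEADERS."""
    headers = {}
    for pair in raw.split(","):
        if not pair.strip():
            continue
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"malformed header entry: {pair!r}")
        headers[key.strip()] = value.strip()
    return headers


def check_export_config(env=None):
    """Return a list of problems found in the export-related variables."""
    env = os.environ if env is None else env
    problems = []
    base_url = env.get("TRACELOOP_BASE_URL", "")
    if not base_url:
        problems.append("TRACELOOP_BASE_URL is not set")
    elif "<" in base_url:
        problems.append("TRACELOOP_BASE_URL still contains a placeholder")
    insecure = env.get("OTEL_EXPORTER_OTLP_INSECURE", "").lower()
    # Assumption: agentless mode (insecure=false) must carry the Instana key header.
    if insecure == "false":
        headers = parse_headers(env.get("TRACELOOP_HEADERS", ""))
        if "x-instana-key" not in headers:
            problems.append("TRACELOOP_HEADERS lacks x-instana-key")
    return problems
```

Calling `check_export_config()` with no arguments inspects the current process environment; an empty list means that no problems were found by these particular checks.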

After setting the environment variables, create a Python file and add this sample application code.

import boto3
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow
import pandas as pd

# This code assumes a sample model that expects a 12-field payload.

# The details are available under this app_name in the Instana UI.
Traceloop.init(app_name="sagemaker_endpoint", disable_batch=True)  # Initialize Traceloop

# Creates a SageMaker Runtime client using Boto3. Allows you to interact with deployed SageMaker endpoints.
runtime = boto3.Session().client('sagemaker-runtime', region_name="us-east-1")


@workflow(name="sagemaker_endpoint")
def invoke_sagemaker_endpoint(endpoint):
    # 1 row with 12 features
    X_test = [[1, 8.2, 10.5, 1.5, 9.6, 1.3, 11.2, 2.9, 100, 11, 12.0, 13]]  # payload being passed

    # Convert the data into CSV format
    df = pd.DataFrame(X_test)
    csv_input = df.to_csv(index=False, header=False)
    csv_bytes = csv_input.encode('utf-8')  # Convert to bytes, as expected by SageMaker

    # Sends a POST request to the SageMaker endpoint using the invoke_endpoint() method
    response = runtime.invoke_endpoint(
        EndpointName=endpoint,
        ContentType='text/csv',
        Body=csv_bytes
    )

    # Reads the model’s prediction result from the response and decodes it from bytes to string.
    return response['Body'].read().decode('utf-8')

invoke_sagemaker_endpoint('<sagemaker-endpoint>') # Endpoint passed
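The sample function returns the model's response as a string. As a possible follow-up step, the sketch below converts that string into numbers. It assumes that the model returns comma-separated or newline-separated numeric text, which depends on the model; the helper name `parse_predictions` is hypothetical.

```python
import re


def parse_predictions(body):
    """Parse comma- or newline-separated numeric text into a list of floats."""
    tokens = [token for token in re.split(r"[,\s]+", body.strip()) if token]
    return [float(token) for token in tokens]
```

For example, you might call `parse_predictions(invoke_sagemaker_endpoint('<sagemaker-endpoint>'))` to work with the predictions as floats. A response that is not numeric text raises a `ValueError`.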

SageMaker metrics and traces in Instana

The monitoring and trace details of the model are available under Applications > Service > "app_name", where "app_name" is the application name that you passed when you initialized Traceloop.

This dashboard provides a summary of the SageMaker endpoint calls.

Here, you can see the metrics and details under "sagemaker_endpoint", which is the app_name in the sample code. The dashboard shows the number of calls per second made to the endpoint, the number of erroneous calls, and the latency details.
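To show how such summary figures relate to individual calls, here is a minimal sketch that derives them from a list of call records. Instana computes these values itself; the function `summarize_calls`, the record layout, and the sample numbers are hypothetical and for illustration only.

```python
from statistics import mean


def summarize_calls(calls, window_seconds):
    """Summarize calls given as (latency_ms, is_error) tuples over a time window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    latencies = [latency for latency, _ in calls]
    return {
        "calls_per_second": len(calls) / window_seconds,
        "erroneous_calls": sum(1 for _, is_error in calls if is_error),
        "mean_latency_ms": mean(latencies) if latencies else 0.0,
    }


# Three made-up calls observed over a 60-second window.
summary = summarize_calls([(120, False), (80, False), (400, True)], window_seconds=60)
```

With these made-up records, the summary reports 0.05 calls per second, one erroneous call, and a mean latency of 200 ms.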

The following image shows the real-time flow of data, which helps you check the performance.

The following image shows the trace details of the call.

The sample application code demonstrates that traces are automatically exported to Instana through Traceloop because the Traceloop backend URL is set to the Instana backend. When the sample application calls the SageMaker endpoint, the SageMaker instrumentor, which is part of the Traceloop SDK, captures the telemetry data and delivers it to the Instana backend. Instana then displays this data in its dashboards as tables and traces, which shows how Instana can be used to monitor Amazon SageMaker traces.

#Stan'sCorner
#Tracing
