Amazon SageMaker is a fully managed service provided by Amazon Web Services (AWS) that enables data scientists, developers, and machine learning (ML) engineers to build, train, and deploy machine learning models at scale. You can now monitor Amazon SageMaker traces using Instana.
Instana is an application performance monitoring (APM) and observability platform designed for cloud-native, microservices-based architectures. It offers real-time, end-to-end visibility into distributed systems by automatically discovering services, tracing requests across service boundaries, and collecting metrics, logs, and traces.
To monitor SageMaker traces using Instana, you can use the SageMaker Instrumentor, a Python-based SDK (software development kit) provided by Traceloop. This SDK enables automated tracing and observability for AWS SageMaker model training and inference pipelines. It acts as a middleware layer that hooks into your code to collect telemetry data such as metrics, traces, and metadata, without requiring major changes to your existing workflows.
The SageMaker Instrumentor from Traceloop automatically records and monitors model predictions, capturing key details about each endpoint invocation. All of this can be achieved without writing any additional code.
To access more detailed telemetry, such as the SageMaker model's traces and metrics, you can view them on Instana dashboards by configuring the Instana host as the Traceloop backend URL.
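One way to wire this up, sketched below, is to set the same exporter environment variables programmatically before initializing the SDK. This is only a sketch: the angle-bracket values are placeholders for your own Instana agent host and key, and it assumes the Traceloop SDK reads these variables at init time.

```python
import os

# Point the Traceloop exporter at the Instana agent instead of the default
# Traceloop backend. Values in angle brackets are placeholders.
os.environ["TRACELOOP_BASE_URL"] = "<instana-agent-host>:4317"
os.environ["TRACELOOP_HEADERS"] = "x-instana-key=<agent-key>,x-instana-host=<instana-host>"
os.environ["OTEL_EXPORTER_OTLP_INSECURE"] = "true"  # "false" if the agent uses TLS

# Initialize Traceloop after the variables are set, so it picks them up.
from traceloop.sdk import Traceloop

Traceloop.init(app_name="sagemaker_endpoint", disable_batch=True)
```

Exporting the variables in the shell before launching the application, as shown later in this article, achieves the same result.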
Prerequisites
Before you start monitoring your SageMaker model's performance with Traceloop and Instana, make sure you have the following prerequisites in place:
- The SageMaker AI environment is set up and configured with a trained model and an endpoint exposed.
Steps to run the Python application to interact with the model and collect traces
1. Create the virtual environment and activate it.
$ python3 -m venv .sagemaker
$ source .sagemaker/bin/activate
2. Install the required packages.
$ pip install -r requirements.txt
The requirements file contains the following basic requirements:
boto3==1.38.7
traceloop-sdk==0.38.7
# add these if you are using the below example code
numpy==1.25.2
pandas==1.5.3
3. After creating and training the model using SageMaker AI, expose the trained model as an endpoint. In AWS SageMaker, an endpoint is a fully managed, real-time REST API that hosts your machine learning model, allowing it to be used for inference (predictions). When you deploy a trained model, SageMaker sets up an endpoint which:
- Is always active and ready to receive requests.
- Accepts input data (such as a JSON payload).
- Runs the model.
- Returns the predictions back to the client.
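Conceptually, an endpoint is a function from a serialized payload to a prediction. The following pure-Python stub makes no AWS calls; the function name and the "model" logic are made up purely to illustrate that request/response contract:

```python
def fake_endpoint(csv_bytes: bytes) -> str:
    """Illustrative stand-in for a deployed SageMaker endpoint:
    accepts a serialized payload, "runs the model", and returns a prediction."""
    # Accept input data: parse the CSV payload into a row of floats
    row = [float(v) for v in csv_bytes.decode("utf-8").strip().split(",")]
    # Run the "model": here, a made-up predictor that returns the feature mean
    prediction = sum(row) / len(row)
    # Return the prediction back to the client as text
    return str(prediction)

# A client serializes its features and sends them, much like invoke_endpoint() does
payload = "1.0,2.0,3.0".encode("utf-8")
print(fake_endpoint(payload))  # → 2.0
```

The real endpoint differs in that it is hosted, always on, and reached over HTTPS, but the input/output shape is the same.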
4. Export the environment variables before running the sample application code.
If TLS is enabled on the Instana agent:
export TRACELOOP_BASE_URL=<instana-agent-host>:4317
export TRACELOOP_HEADERS="x-instana-key=<agent-key>,x-instana-host=<instana-host>"
export OTEL_EXPORTER_OTLP_INSECURE=false
If TLS is not enabled on the Instana agent:
export TRACELOOP_BASE_URL=<instana-agent-host>:4317
export TRACELOOP_HEADERS="x-instana-key=<agent-key>,x-instana-host=<instana-host>"
export OTEL_EXPORTER_OTLP_INSECURE=true
Then run the following sample application code:

import boto3
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import task, workflow
import numpy as np
import pandas as pd

# This code expects a sample model that accepts a 12-field payload.
# The details will be available under the app_name in the Instana UI.
Traceloop.init(app_name="sagemaker_endpoint", disable_batch=True)  # initialize Traceloop

# Create a SageMaker Runtime client using Boto3 to interact with deployed SageMaker endpoints.
runtime = boto3.Session().client('sagemaker-runtime', region_name="us-east-1")

@workflow(name="sagemaker_endpoint")
def invoke_sagemaker_endpoint(endpoint):
    # 1 row with 12 features: the payload being passed
    X_test = [[1, 8.2, 10.5, 1.5, 9.6, 1.3, 11.2, 2.9, 100, 11, 12.0, 13]]

    # Convert the data into CSV format
    df = pd.DataFrame(X_test)
    csv_input = df.to_csv(index=False, header=False)
    csv_bytes = csv_input.encode('utf-8')  # convert to bytes, as expected by SageMaker

    # Send a POST request to the SageMaker endpoint using the invoke_endpoint() method
    response = runtime.invoke_endpoint(
        EndpointName=endpoint,
        ContentType='text/csv',
        Body=csv_bytes
    )

    # Read the model's prediction result from the response and decode it from bytes to string.
    return response['Body'].read().decode('utf-8')

invoke_sagemaker_endpoint('<sagemaker-endpoint>')  # pass your endpoint name
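For reference, the pandas to_csv conversion in the sample above can be reproduced with the standard library, which makes the wire format of the payload explicit. This snippet is standalone and illustrative; it does not call SageMaker:

```python
import csv
import io

# The same 1-row, 12-feature payload as in the sample code
X_test = [[1, 8.2, 10.5, 1.5, 9.6, 1.3, 11.2, 2.9, 100, 11, 12.0, 13]]

# Serialize to CSV with no header and no index column, as the endpoint expects
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(X_test)
csv_bytes = buf.getvalue().encode("utf-8")

print(csv_bytes)  # → b'1,8.2,10.5,1.5,9.6,1.3,11.2,2.9,100,11,12.0,13\n'
```

Each request body is therefore just one comma-separated line of feature values per row, encoded as UTF-8 bytes.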
SageMaker metrics and traces in Instana
The monitoring and trace details of the model are available under Applications > Services > "app_name", where "app_name" is the name of the application that you initialized in Traceloop.
This dashboard provides a summary of the SageMaker endpoint calls.

Here, you can see the metrics and details under "sagemaker_endpoint", which is the app_name in the sample code. It provides information about the calls per second being made to the endpoint, the number of erroneous calls, and the latency details.
The following image shows the real-time flow of data, which helps you to check performance.

The following image shows the trace details of the call.

The sample application demonstrates that traces are automatically exported to Instana via Traceloop, because the Traceloop backend URL is set to the Instana backend. When the application invokes the SageMaker endpoint, the SageMaker instrumentor, part of the Traceloop SDK, captures the telemetry and delivers it to the backend, which is Instana. This data is then displayed in Instana's dashboards as tables and traces, showing how Instana can be used to monitor Amazon SageMaker traces.
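As a closing note on the sample code, response['Body'] is a streaming object, which is why the code calls read() before decoding. That final step can be illustrated with an in-memory stand-in (io.BytesIO here is purely a mock; a real response body comes from invoke_endpoint):

```python
import io

# Stand-in for the streaming body returned by invoke_endpoint();
# io.BytesIO exposes the same read() interface used in the sample code.
fake_body = io.BytesIO(b"0.87\n")

# The same read-and-decode step as in invoke_sagemaker_endpoint()
prediction = fake_body.read().decode("utf-8")
print(prediction.strip())  # → 0.87
```

For a text/csv model, the decoded result is typically a numeric string (or one value per row) that the client parses itself.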
#Stan'sCorner
#Tracing