Introduction
In this blog, we’ll walk through how to use Milvus' bulk-import feature to seamlessly load large-scale datasets into collections. Whether you're dealing with millions of entries or simply looking for a faster, more efficient way to import data, this blog covers how to import data from CSV files using the Import API.
Prerequisites
Before starting, ensure you have a running Milvus instance and its connection details.
If you have a Milvus service running in watsonx.data, you can get the Milvus host details from the Infrastructure manager (navigate to Infrastructure manager → click the Milvus service → obtain the host information from the HTTPS host field).
Fig. 1: Milvus service host details in the web console
Overview of Milvus Bulk Import API
Using the Import API involves three main steps. Let’s break them down:
1. Set up the Milvus collection
Ensure that your target collection is already created in Milvus with the correct schema. The fields in your CSV file must match the schema defined in this collection (including vector and scalar fields).
Sample collection schema:
from pymilvus import FieldSchema, DataType

field1 = FieldSchema(name="emp_id", dtype=DataType.INT64, description="int64", is_primary=True, auto_id=False)
field2 = FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, description="float vector", dim=384, is_primary=False)
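If you prefer to create the collection from Python rather than over REST, a minimal PyMilvus sketch might look like the following. The host, port, and token are placeholders you'd replace with your own service details, and depending on your PyMilvus version you may authenticate with user/password instead of a token:

from pymilvus import connections, CollectionSchema, Collection

# Connect to the Milvus instance; substitute your service's HTTPS host details
# (see Fig. 1). Host, port, and token below are placeholders.
connections.connect(host="<https_host>", port="<port>", secure=True, token="<token>")

# Assemble the two fields defined in the snippet above into a schema
# and create the collection.
schema = CollectionSchema(fields=[field1, field2], description="Employee search")
collection = Collection(name="employees", schema=schema)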
Field Name | Data Type | Description | Is Primary | Auto ID | Dim / Max Length
---------- | ------------ | ------------ | ---------- | ------- | ----------------
emp_id | INT64 | int64 | Yes | No | -
embeddings | FLOAT_VECTOR | float vector | No | - | 384
Sample create collection request:
curl --location --request POST '{host}:{port}/api/v1/collection' \
--header 'Authorization: Bearer {{token}}' \
--header 'Content-Type: application/json' \
--data '{
"collection_name": "employees",
"consistency_level": 1,
"schema": {
"autoID": false,
"description": "Employee search",
"fields": [
{
"name": "emp_id",
"description": "emp_id id",
"is_primary_key": true,
"autoID": false,
"data_type": 5
},
{
"name": "embeddings",
"description": "embedded vector of employee details",
"autoID": false,
"data_type": 101,
"is_primary_key": false,
"type_params": [
{
"key": "dim",
"value": "384"
}
]
}
],
"name": "employees"
}
}'
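In this request, data_type 5 corresponds to INT64 and 101 to FLOAT_VECTOR in Milvus' DataType enum.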
Note:
- Host and port details can be obtained from the web console (refer to Fig. 1).
- If you are connecting to the Milvus server on Cloud Pak for Data (CPD), you also need to include the self-signed certificate from the Milvus server with your request.
Command to retrieve the certificate:
echo QUIT | openssl s_client -showcerts -connect <host>:443 | awk '/-----BEGIN CERTIFICATE-----/ {p=1}; p; /-----END CERTIFICATE-----/ {p=0}' > host.cert
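The resulting host.cert file can then be supplied with subsequent API calls, for example via curl's --cacert host.cert option.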
2. Prepare CSV Files
Ensure that your CSV file is formatted correctly. Each file should contain the data for every field required for insertion; typically, each column corresponds to a field name in the Milvus collection.
For example, for the collection above, the CSV file may look like the sketch below.
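An illustrative snippet with hypothetical values; the embeddings column holds the full 384-dimensional vector, abbreviated here with an ellipsis for readability:

emp_id,embeddings
1,"[0.12, 0.87, 0.45, ..., 0.33]"
2,"[0.91, 0.24, 0.66, ..., 0.57]"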
Upload the CSV file to the external storage bucket that your Milvus instance is configured to use.
3. Invoke the Import API
Once the file is uploaded, trigger the import using the /api/v1/import endpoint. The request references the file path in the bucket and the name of the collection into which the data should be imported.
Sample request:
curl --location --request POST '{https_host}:{port}/api/v1/import' \
--header 'Authorization: Bearer {{token}}' \
--header 'Content-Type: application/json' \
--data '{
    "collection_name": "employees",
    "files": [
        "bulk/employee_file.csv"
    ]
}'
Response:
200 - {'status': {}, 'tasks': [456807833073099946]}
We can monitor the import job's progress using the task ID returned in the response via the /api/v1/import/state endpoint.
Sample request:
curl --location --request GET '{{https_host}}:{{port}}/api/v1/import/state' \
--header 'Authorization: Bearer {{token}}' \
--header 'Content-Type: application/json' \
--data '{
"task": 456807833073116226
}'
Response:
{'status': {}, 'state': 2, 'row_count': 100, 'infos': [{'key': 'failed_reason'}, {'key': 'progress_percent', 'value': '70'}], 'create_ts': 1744473242}
{'status': {}, 'state': 6, 'row_count': 100, 'infos': [{'key': 'failed_reason'}, {'key': 'progress_percent', 'value': '100'}], 'create_ts': 1744473242}
We can check row_count and progress_percent to get the number of rows inserted and the job's progress, respectively. In the sample responses above, the job moves from state 2 (import in progress) to state 6 (import completed).
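If you'd rather script the monitoring step, here is a minimal polling sketch using Python's requests library. The host, port, token, and certificate path are assumptions to replace with your own values:

import time
import requests

# Placeholders; substitute your own service details (see Fig. 1).
BASE_URL = "https://<https_host>:<port>"
HEADERS = {"Authorization": "Bearer <token>", "Content-Type": "application/json"}

def wait_for_import(task_id, poll_interval=5):
    """Poll /api/v1/import/state until the task completes."""
    while True:
        resp = requests.get(
            f"{BASE_URL}/api/v1/import/state",
            headers=HEADERS,
            json={"task": task_id},
            verify="host.cert",  # self-signed certificate fetched earlier (CPD only)
        )
        resp.raise_for_status()
        body = resp.json()
        # In the sample responses above, state 6 marks a completed import.
        if body.get("state") == 6:
            return body
        print(f"progress: {body.get('infos')}")
        time.sleep(poll_interval)

result = wait_for_import(456807833073099946)
print(f"imported {result.get('row_count')} rows")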
Conclusion
Using the Import API, you can seamlessly load massive amounts of data into a Milvus collection with minimal effort. By defining your collection schema, uploading your CSV files to a supported storage bucket, and triggering the import via API calls, you can quickly populate your collections with millions of records.
Note: Milvus also supports NumPy (.npy) and JSON (.json) file types in place of .csv for bulk import.
#watsonx.data