Introduction
In this blog, we’ll walk through how to use Milvus' bulk-import feature to seamlessly load large-scale datasets into collections. Whether you're dealing with millions of entries or simply looking for a faster, more efficient way to import data, this blog covers how to import data from CSV files using the Import API.
Prerequisites
Before starting, ensure you have a running Milvus instance and its connection details.
If you have a Milvus service running in watsonx.data, you can get the Milvus host details from the Infrastructure manager (navigate to Infrastructure manager → click the Milvus service → obtain the host information from the HTTPS host field).
Fig. 1: Milvus service host details in the web console
Overview of Milvus Bulk Import API
Using the Import API involves three main steps. Let’s break them down:
1. Set up the Milvus collection
Ensure that your target collection is already created in Milvus with the correct schema. The fields in your CSV file must match the schema defined in this collection (including vector and scalar fields).
Sample collection schema:
from pymilvus import FieldSchema, DataType

field1 = FieldSchema(name="emp_id", dtype=DataType.INT64, description="int64", is_primary=True, auto_id=False)
field2 = FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, description="float vector", dim=384, is_primary=False)
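If you prefer to create the collection from Python rather than over REST, a minimal PyMilvus sketch might look like the following. The host, port, and token are placeholders you'd replace with your own service details, and depending on your PyMilvus version you may authenticate with user/password instead of a token:

from pymilvus import connections, CollectionSchema, Collection

# Connect to the Milvus instance; substitute your service's HTTPS host details
# (see Fig. 1). Host, port, and token below are placeholders.
connections.connect(host="<https_host>", port="<port>", secure=True, token="<token>")

# Assemble the two fields defined in the snippet above into a schema
# and create the collection.
schema = CollectionSchema(fields=[field1, field2], description="Employee search")
collection = Collection(name="employees", schema=schema)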
Field Name | Data Type | Description | Is Primary | Auto ID | Dim / Max Length
---------- | ------------ | ------------ | ---------- | ------- | ----------------
emp_id | INT64 | int64 | Yes | No | -
embeddings | FLOAT_VECTOR | float vector | No | - | 384
Sample create collection request:
curl --location --request POST '{host}:{port}/api/v1/collection' \
--header 'Authorization: Bearer {{token}}' \
--header 'Content-Type: application/json' \
--data '{
"collection_name": "employees",
"consistency_level": 1,
"schema": {
"autoID": false,
"description": "Employee search",
"fields": [
{
"name": "emp_id",
"description": "emp_id id",
"is_primary_key": true,
"autoID": false,
"data_type": 5
},
{
"name": "embeddings",
"description": "embedded vector of employee details",
"autoID": false,
"data_type": 101,
"is_primary_key": false,
"type_params": [
{
"key": "dim",
"value": "384"
}
]
}
],
"name": "employees"
}
}'
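In this request, data_type 5 corresponds to INT64 and 101 to FLOAT_VECTOR in Milvus' DataType enum.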
Note:
- Host and port details can be obtained from the web console (refer to Fig. 1).
- If you are connecting to the Milvus server on Cloud Pak for Data (CPD), you also need to include the self-signed certificate from the Milvus server with your request.
Command to retrieve the certificate:
echo QUIT | openssl s_client -showcerts -connect <host>:443 | awk '/-----BEGIN CERTIFICATE-----/ {p=1}; p; /-----END CERTIFICATE-----/ {p=0}' > host.cert
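The resulting host.cert file can then be supplied with subsequent API calls, for example via curl's --cacert host.cert option.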
2. Prepare CSV Files
Ensure that your CSV file is formatted correctly. Each file should contain the data for every field required for insertion; typically, each column corresponds to a field name in the Milvus collection.
For example, for the collection above, the CSV file may look like the sketch below.
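An illustrative snippet with hypothetical values; the embeddings column holds the full 384-dimensional vector, abbreviated here with an ellipsis for readability:

emp_id,embeddings
1,"[0.12, 0.87, 0.45, ..., 0.33]"
2,"[0.91, 0.24, 0.66, ..., 0.57]"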
Upload the CSV file to the external storage bucket that your Milvus instance is configured to use.
3. Invoke the Import API
Once the file is uploaded, trigger the import using the /api/v1/import endpoint. The request references the file path in the bucket and the name of the collection into which the data should be imported.
Sample request:
curl --location --request POST '{https_host}:{port}/api/v1/import' \
--header 'Authorization: Bearer {{token}}' \
--header 'Content-Type: application/json' \
--data '{
    "collection_name": "employees",
    "files": [
        "bulk/employee_file.csv"
    ]
}'
Response:
200 - {'status': {}, 'tasks': [456807833073099946]}
We can monitor the import job's progress using the task ID returned in the response via the /api/v1/import/state endpoint.
Sample request:
curl --location --request GET '{{https_host}}:{{port}}/api/v1/import/state' \
--header 'Authorization: Bearer {{token}}' \
--header 'Content-Type: application/json' \
--data '{
"task": 456807833073116226
}'
Response:
{'status': {}, 'state': 2, 'row_count': 100, 'infos': [{'key': 'failed_reason'}, {'key': 'progress_percent', 'value': '70'}], 'create_ts': 1744473242}
{'status': {}, 'state': 6, 'row_count': 100, 'infos': [{'key': 'failed_reason'}, {'key': 'progress_percent', 'value': '100'}], 'create_ts': 1744473242}
We can check row_count and progress_percent to get the number of rows inserted and the job's progress, respectively. In the sample responses above, the job moves from state 2 (import in progress) to state 6 (import completed).
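If you'd rather script the monitoring step, here is a minimal polling sketch using Python's requests library. The host, port, token, and certificate path are assumptions to replace with your own values:

import time
import requests

# Placeholders; substitute your own service details (see Fig. 1).
BASE_URL = "https://<https_host>:<port>"
HEADERS = {"Authorization": "Bearer <token>", "Content-Type": "application/json"}

def wait_for_import(task_id, poll_interval=5):
    """Poll /api/v1/import/state until the task completes."""
    while True:
        resp = requests.get(
            f"{BASE_URL}/api/v1/import/state",
            headers=HEADERS,
            json={"task": task_id},
            verify="host.cert",  # self-signed certificate fetched earlier (CPD only)
        )
        resp.raise_for_status()
        body = resp.json()
        # In the sample responses above, state 6 marks a completed import.
        if body.get("state") == 6:
            return body
        print(f"progress: {body.get('infos')}")
        time.sleep(poll_interval)

result = wait_for_import(456807833073099946)
print(f"imported {result.get('row_count')} rows")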
Conclusion
Using the Import API, you can seamlessly load massive amounts of data into a Milvus collection with minimal effort. By defining your collection schema, uploading your CSV files to a supported storage bucket, and triggering the import via API calls, you can quickly populate your collections with millions of records.
Note: Milvus also supports NumPy (.npy) and JSON (.json) file types in place of .csv for bulk import.
#watsonx.data