watsonx.data

Exploring VTS (Vector Transport Service): An Open-Source Tool for Moving Vector Data

By Gifi Siby posted 21 days ago

  

Introduction

Vector Transport Service (VTS) is an open-source tool designed to simplify the migration and synchronisation of vector data across a wide range of platforms. It supports moving data from popular sources like Elasticsearch, Qdrant, PostgreSQL, Pinecone, and more into vector databases such as Milvus and Zilliz Cloud. VTS offers both real-time streaming and offline batch import modes, making it adaptable to different use cases.

GitHub Address: https://github.com/zilliztech/vts

Core Capabilities of VTS

VTS inherits the high throughput and low latency characteristics of Apache SeaTunnel, while extending support to vector and unstructured data. This makes it a powerful tool for building AI application data pipelines, enabling real-time synchronisation, transformation, and loading of vector data efficiently.

Key capabilities include:

  1. Rich, extensible connectors
  2. Unified stream and batch processing for real-time synchronisation and offline batch imports
  3. Distributed snapshot support for ensuring data consistency
  4. High performance, low latency, and scalability
  5. Real-time monitoring and visual management

Primary Use Cases

  • Vector database migration: A core strength of VTS is its ability to migrate vector data, essential for AI and machine learning applications that handle large volumes of high-dimensional data.
  • AI application data pipelines: Build scalable pipelines tailored to AI workloads.
  • Real-time vector data synchronisation
  • Text-to-vector ingestion: VTS ingests raw or semi-structured text data (e.g., JSON, CSV) and can convert it into vectors using embedding model plugins.
  • Cross-platform data integration: VTS enables seamless data migration between traditional relational databases and modern vector databases.

Vector Transport Service also introduces vector-specific capabilities such as:

  • Support for multiple data sources

  • Schema matching

  • Basic data validation
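
To make schema matching and basic validation more concrete, here is a minimal Python sketch. It is not VTS code: the function name `validate_record`, the schema layout, and the field names are all assumptions chosen for illustration. It checks that a record carries the fields the target expects and that its vector has the expected dimension.

```python
from typing import Any


def validate_record(record: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return a list of problems found; an empty list means the record passes."""
    problems = []
    # Schema matching: every field the target expects must be present.
    for field in schema["fields"]:
        if field not in record:
            problems.append(f"missing field: {field}")
    # Basic validation: the vector must have the dimension the target expects.
    vector = record.get(schema["vector_field"])
    if vector is not None and len(vector) != schema["dim"]:
        problems.append(
            f"dimension mismatch: got {len(vector)}, expected {schema['dim']}"
        )
    return problems


# Hypothetical target schema and records, for illustration only.
schema = {"fields": ["id", "embedding"], "vector_field": "embedding", "dim": 4}
good = {"id": 1, "embedding": [0.1, 0.2, 0.3, 0.4]}
bad = {"id": 2, "embedding": [0.1, 0.2]}

print(validate_record(good, schema))
print(validate_record(bad, schema))
```

Running a check of this kind before a migration surfaces mismatched records early, instead of partway through a long load.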

Supported Connectors

VTS supports a wide range of connectors, making it compatible with various data sources and storage systems. Currently supported connectors include (but are not limited to):

  • Milvus

  • Pinecone

  • Qdrant

  • PostgreSQL

  • Elasticsearch

  • Tencent Vector DB

Supported Transforms

VTS provides flexible data transformation operations, allowing users to preprocess or restructure data before migration. Examples include:

  • TablePathMapper – for renaming tables or changing table paths

  • FieldMapper – for adding or deleting columns

  • Embedding – for applying text vectorisation or generating vector representations of text
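
As a rough illustration of what the first two transforms do to data in flight, the Python sketch below renames a table and adds or drops columns on a single row. It is a conceptual model only, not VTS's implementation or its configuration syntax; the function names `map_table_path` and `map_fields` and the sample row are hypothetical. The Embedding transform is left out because it depends on an external embedding model.

```python
from typing import Optional


def map_table_path(table: str, mapping: dict[str, str]) -> str:
    """Rename a table if it appears in the mapping; otherwise keep its name."""
    return mapping.get(table, table)


def map_fields(row: dict, keep: dict[str, str],
               defaults: Optional[dict] = None) -> dict:
    """Keep only the fields listed in `keep` (source name -> target name),
    dropping every other column, then add new columns from `defaults`."""
    out = {target: row[source] for source, target in keep.items() if source in row}
    out.update(defaults or {})
    return out


# Hypothetical row: `internal_flag` is dropped, `title` is renamed to `text`,
# and a new `source` column is added.
row = {"id": 7, "title": "hello", "internal_flag": True}
mapped = map_fields(row, keep={"id": "id", "title": "text"},
                    defaults={"source": "legacy"})

print(map_table_path("articles_v1", {"articles_v1": "articles"}))
print(mapped)
```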

Supported Data Types

VTS can handle a variety of complex data types and operations, including:

  • Float Vectors

  • Sparse Float Vectors

  • Multi-vector columns

  • Dynamic columns

  • Upsert and Bulk Insert (optimised for large offline batches)

These capabilities enhance its effectiveness in managing sophisticated data migration workflows, especially in AI and vector-based systems.
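
For readers less familiar with the difference between float vectors and sparse float vectors, the following Python sketch shows one common way to represent each. The representation (a list for dense vectors, an index-to-value mapping for sparse ones) and the helper names are assumptions for illustration, not a description of how VTS stores them internally.

```python
def to_sparse(dense: list[float]) -> dict[int, float]:
    """Keep only the non-zero entries, keyed by their position."""
    return {i: v for i, v in enumerate(dense) if v != 0.0}


def to_dense(sparse: dict[int, float], dim: int) -> list[float]:
    """Expand a sparse mapping back to a fixed-length list of floats."""
    return [sparse.get(i, 0.0) for i in range(dim)]


dense = [0.0, 0.5, 0.0, 0.0, 1.5]
sparse = to_sparse(dense)

print(sparse)
print(to_dense(sparse, dim=5) == dense)
```

The sparse form stores two entries instead of five here; for vectors with many dimensions and few non-zero values the saving is much larger.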

Supported Deployments

The tool is compatible with both SaaS and on-premises deployment environments.

How does Vector Transport Service (VTS) work?

Prerequisites

  • Docker installed
  • Access to source and target databases
  • Required credentials and permissions
  • Milvus Version >= 2.3.6

Obtain VTS

  1. Pull the VTS Image
    docker pull zilliz/vector-transport-service:latest
    docker run -it zilliz/vector-transport-service:latest /bin/bash

  2. Configure Your Migration
    Create a configuration file (e.g., migration.conf):
    
    env {
      parallelism = 1
      job.mode = "BATCH"
    }
    
    source {
      # Source configuration (e.g., Milvus, Elasticsearch, etc.)
      Milvus {
        url = "https://your-source-url:19530"
        token = "your-token"
        database = "default"
        collections = ["your-collection"]
        batch_size = 100
      }
    }
    
    sink {
      # Target configuration
      Milvus {
        url = "https://your-target-url:19530"
        token = "your-token"
        database = "default"
        batch_size = 10
      }
    } 
  3. Run the Migration
    Cluster Mode (Recommended): Runs in a distributed environment using the SeaTunnel cluster. Supports parallel execution for large-scale or production migrations.
    # Start the cluster
    mkdir -p ./logs
    ./bin/seatunnel-cluster.sh -d
    
    # Submit the job
    ./bin/seatunnel.sh --config ./migration.conf

    Local Mode: Runs the migration locally on a single machine. Simple to set up and use—ideal for development, testing, or small-scale migrations.
    ./bin/seatunnel.sh --config ./migration.conf -m local
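
When several collections need to be migrated, it can be convenient to generate the configuration file rather than edit it by hand. The Python sketch below is one possible approach: it renders a configuration using only the keys from the example above and builds the job-submission command. The helper names `render_config` and `seatunnel_command` are hypothetical, and the URLs and tokens are the same placeholders used in the example.

```python
def render_config(source_url: str, target_url: str, collections: list[str],
                  source_token: str, target_token: str) -> str:
    """Render a migration.conf body using the keys from the example config."""
    quoted = ", ".join(f'"{name}"' for name in collections)
    return f'''env {{
  parallelism = 1
  job.mode = "BATCH"
}}

source {{
  Milvus {{
    url = "{source_url}"
    token = "{source_token}"
    database = "default"
    collections = [{quoted}]
    batch_size = 100
  }}
}}

sink {{
  Milvus {{
    url = "{target_url}"
    token = "{target_token}"
    database = "default"
    batch_size = 10
  }}
}}
'''


def seatunnel_command(config_path: str, local: bool = False) -> list[str]:
    """Build the job-submission command; `-m local` selects local mode."""
    command = ["./bin/seatunnel.sh", "--config", config_path]
    if local:
        command += ["-m", "local"]
    return command


conf = render_config("https://your-source-url:19530",
                     "https://your-target-url:19530",
                     ["your-collection"], "your-token", "your-token")
print(conf)
print(" ".join(seatunnel_command("./migration.conf", local=True)))
```

The rendered text can be written to migration.conf inside the container, and the command list can be passed to `subprocess.run` from the VTS installation directory.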
    

Usage Overview

Category               | Description
Deployment             | Self-hosted and user-managed
Ease of Use            | Requires manual deployment and ongoing maintenance
Supported Data Sources | Milvus, Elasticsearch, OpenSearch, Pinecone, Qdrant, PostgreSQL, and other major vector databases
Real-time Sync         | Supported (configuration required manually)
Network Requirement    | Compatible with private networks
Cost                   | Free and open-source; users bear infrastructure and operational costs
Best for               | Organizations with existing infrastructure that prefer on-premises, self-managed solutions

Performance

In a real-world demo (a Pinecone to Milvus migration), VTS reportedly achieved:

  • Sync rate: 2,961 vectors/sec

  • Total vectors: 100 million

  • Time taken: ~9.5 hours

  • Environment: 4 CPU cores, 8 GB RAM
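
As a quick sanity check, the reported figures can be compared against each other. The short Python calculation below uses only the numbers quoted above and assumes the sync rate is an average held over the whole run.

```python
total_vectors = 100_000_000   # total vectors reported in the demo
rate_per_second = 2_961       # reported sync rate, vectors per second

seconds = total_vectors / rate_per_second
hours = seconds / 3600

print(f"{hours:.2f} hours")
```

This works out to roughly 9.4 hours, which is close to the reported figure of about 9.5 hours.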

Future Support: Unstructured Data Sources

VTS is actively expanding its support for unstructured data. Currently supported:

  • Shopify data types

Planned support:

  • PDFs

  • Google Docs

  • Slack data

  • Images and text

FAQs:

  1. How does VTS work? 
    -->  VTS automates the migration process by extracting data from your source system, transforming it to match the target schema, and then loading it into your destination vector database.
  2. Does VTS support zero downtime migration?
    -->  Yes, VTS supports real-time, zero-downtime migration by creating an initial snapshot of your data and continuously synchronizing changes. This ensures your applications remain operational throughout the migration process.
  3. Are there any limitations or requirements for zero downtime migration with VTS?
    -->  Currently, zero downtime migration is only supported for data migration from Milvus to Zilliz Cloud. To enable this feature, you need to manually deploy Milvus CDC (Change Data Capture) for continuous data synchronization.
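
The snapshot-plus-continuous-sync idea behind zero downtime migration can be sketched in a few lines of Python. This is a conceptual model only: it does not use the Milvus CDC API, and the function name, the in-memory dictionaries, and the change-log format are assumptions made for illustration.

```python
def migrate_with_snapshot(source: dict, change_log: list[tuple]) -> dict:
    """Copy a snapshot of `source`, then replay changes recorded after it."""
    # Initial snapshot: bulk copy of the records that already exist.
    target = dict(source)
    # Continuous sync: apply later changes in the order they were captured.
    for op, key, value in change_log:
        if op == "upsert":
            target[key] = value
        elif op == "delete":
            target.pop(key, None)
    return target


# Hypothetical data: two existing vectors, then three changes made
# while the migration was running.
source = {1: [0.1, 0.2], 2: [0.3, 0.4]}
changes = [
    ("upsert", 3, [0.5, 0.6]),
    ("delete", 1, None),
    ("upsert", 2, [0.9, 0.9]),
]

result = migrate_with_snapshot(source, changes)
print(result)
```

Because the source keeps accepting writes while the change log is replayed, applications can stay online until the target has caught up.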

