Data Integration

Creating your first DataSynchronization job

By Harsh Mittal posted Mon January 08, 2024 04:34 PM

  

Co-authors: James Cho, Katie Le

In this article, we walk through creating a DataSynchronization job from scratch and running it.

What is the DataSynchronization Community Edition tool?

DataSynchronization Community Edition is a free, high-performance data integration tool that runs as a single-node Docker container. It supports both real-time and offline synchronization of large data volumes out of the box. It combines Apache SeaTunnel, which provides the data integration engine, with Carbon UI, which provides the web interface, so enterprises and individual users can handle their data integration and data synchronization needs with one tool.

NOTE: Currently, DataSynchronization Community Edition is supported on Windows and macOS only.

Prerequisites

To use DataSynchronization Community Edition, you first need to pull its Docker image from the GitHub Container Registry (ghcr.io). A complete step-by-step tutorial for doing that is available here.

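Once the container is running, a job defined in a configuration file can be run either from the host machine or from inside the container.

From the host machine: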

Make sure the configuration file exists inside the container. If it does not, copy it in with docker cp or place it in a volume shared between the host and the container. Then run the following command:

docker exec -it datasynchronization bash -c '$SEATUNNEL_HOME/bin/seatunnel.sh --config /path/to/config/file'

where $SEATUNNEL_HOME is /opt/seatunnel inside the container. The single quotes stop the host shell from expanding the variable, so it is resolved by the shell inside the container.
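The two host-side steps (copy the file in, then run it) can be wrapped in a small shell function. This is a sketch, not part of the product: the function name run_job_from_host and the /tmp destination inside the container are hypothetical, and it assumes the container is named datasynchronization and SeaTunnel lives at /opt/seatunnel, as stated above.

```shell
# Hypothetical helper. Assumes the container is named "datasynchronization"
# and that SeaTunnel is installed at /opt/seatunnel inside it, as stated above.
run_job_from_host() {
  local config container dest
  config="$1"
  container="${2:-datasynchronization}"
  if [ ! -f "$config" ]; then
    echo "Config file not found on the host: $config" >&2
    return 1
  fi
  # Copy the config into the container's /tmp (destination path is an assumption).
  dest="/tmp/$(basename "$config")"
  docker cp "$config" "$container:$dest" || return 1
  # Run the SeaTunnel launcher inside the container against the copied file.
  # -it is omitted so the function also works from non-interactive scripts.
  docker exec "$container" /opt/seatunnel/bin/seatunnel.sh --config "$dest"
}
```

For example, run_job_from_host ./my_job.conf (a hypothetical file name) copies the file into the container and starts the job, streaming the launcher's output back to your terminal. The function checks for the file on the host first, so a typo in the path fails fast instead of producing a confusing error from inside the container.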

From inside the container:

  • Run the following command to get an interactive bash shell in the container:

docker exec -it datasynchronization bash

  • Create a config file if it doesn't exist.

  • Run the following command to submit the job defined in the configuration file:

$SEATUNNEL_HOME/bin/seatunnel.sh --config /path/to/config/file

where $SEATUNNEL_HOME is /opt/seatunnel
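For the "create a config file" step, here is a minimal job you could write from the shell inside the container. It is modeled on the FakeSource-to-Console example in the Apache SeaTunnel documentation; option names such as result_table_name, source_table_name, and job.mode follow SeaTunnel 2.3.x and may differ in the SeaTunnel version bundled with this image. The path /tmp/fake_to_console.conf is hypothetical.

```shell
# Hypothetical location for the sample job config.
CONFIG=/tmp/fake_to_console.conf

# Write a minimal batch job: generate 16 fake rows and print them to the console.
cat > "$CONFIG" <<'EOF'
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  FakeSource {
    result_table_name = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

sink {
  Console {
    source_table_name = "fake"
  }
}
EOF

# Launch the job only if the SeaTunnel launcher is present at the expected path.
LAUNCHER="${SEATUNNEL_HOME:-/opt/seatunnel}/bin/seatunnel.sh"
if [ -x "$LAUNCHER" ]; then
  "$LAUNCHER" --config "$CONFIG"
else
  echo "SeaTunnel launcher not found at $LAUNCHER; config written to $CONFIG" >&2
fi
```

Because this job only generates test rows and prints them, it is a convenient way to confirm the engine works before pointing a job at real databases.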


How to create a DataSynchronization job using the UI?

Step 1: Access the UI using the following link: http://<host_ip>:8801/ui/

where <host_ip> is the IP address of the host machine.
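If the UI does not load right after the container starts, a quick reachability check from a terminal can help. The sketch below assumes curl is installed; the helper names ui_url and wait_for_ui are hypothetical. Since it is not documented whether /ui/ answers with a page or with a redirect to the login screen, any 2xx or 3xx status is treated as "up".

```shell
# Build the UI URL from the host IP (port 8801 and path /ui/ as given above).
ui_url() {
  printf 'http://%s:8801/ui/\n' "$1"
}

# Poll the UI until it answers with a 2xx/3xx status or the attempts run out.
wait_for_ui() {
  local url tries i code
  url="$(ui_url "$1")"
  tries="${2:-30}"
  i=1
  while [ "$i" -le "$tries" ]; do
    # curl prints 000 when it cannot connect; --max-time bounds each attempt.
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url" 2>/dev/null || true)"
    case "$code" in
      2??|3??)
        echo "UI is up at $url (HTTP $code)"
        return 0
        ;;
    esac
    i=$((i + 1))
    # Wait between attempts, but not after the last one.
    [ "$i" -le "$tries" ] && sleep 2
  done
  echo "No answer from $url after $tries attempt(s)" >&2
  return 1
}
```

For example, wait_for_ui 192.168.1.20 (an example address) tries up to 30 times, about two seconds apart, before giving up.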

The login credentials to access the UI are:

  • Username: admin
  • Password: IBM@DATA#SYNC@2024

Step 2: Navigate the home page

The left panel of the home screen contains tabs for the tool's different functions. The first thing to do is add your datasources.

Step 3: Add datasources by clicking the Datasource tab on the left panel. You can choose from a wide range of connectors.

Once you select a connector, enter its details, such as the name, URL, and driver.
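Before entering a URL in the datasource form, it can be worth checking that the database is reachable from where the jobs actually run. The sketch below, meant for a shell inside the container, pulls the host and port out of a JDBC URL and tries a TCP connection. That the form takes a standard JDBC URL is an assumption, the helpers jdbc_host_port and db_reachable are hypothetical, and the connection test relies on bash's /dev/tcp feature and coreutils timeout being available.

```shell
# Print "host port" for URLs shaped like jdbc:<vendor>://<host>:<port>...
# (PostgreSQL, MySQL and SQL Server URLs with an explicit port all match).
jdbc_host_port() {
  printf '%s\n' "$1" |
    sed -n 's#^jdbc:[A-Za-z0-9][A-Za-z0-9]*://\([^:/;][^:/;]*\):\([0-9][0-9]*\).*#\1 \2#p'
}

# Return 0 if a TCP connection to the URL's host and port succeeds within 3 seconds.
db_reachable() {
  local hp host port
  hp="$(jdbc_host_port "$1")"
  if [ -z "$hp" ]; then
    echo "Could not find host and port in: $1" >&2
    return 2
  fi
  host="${hp% *}"
  port="${hp#* }"
  # bash treats /dev/tcp/<host>/<port> in a redirection as a TCP connection.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

For example, db_reachable 'jdbc:sqlserver://10.0.0.7:1433;databaseName=sales' && echo reachable (example values). Running the check inside the container matters: with Docker's default bridge networking, localhost there refers to the container itself, not to the host machine, so a URL that works from the host may fail from the container.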

Step 4: Next, create a DataSynchronization task. Click "Task Synchronization" on the left panel, then click "Create Synchronization Task" and enter the details.

 

This opens a canvas. Add datasources by dragging them from the left panel and dropping them onto the canvas.

Transformations are added the same way, by dragging them onto the canvas.

When the task is complete, click Save.
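Because the tool builds on Apache SeaTunnel, a task with a source, a transformation, and a target corresponds roughly to a SeaTunnel job with source, transform, and sink sections. Whether the UI generates exactly such a file is not stated here; the sketch below is a hand-written equivalent in SeaTunnel 2.3.x config syntax, using the Jdbc connector and the Sql transform. The file path, database hosts, names, tables, and credentials are placeholders, and the matching JDBC driver jar must be available to SeaTunnel.

```shell
# Hypothetical path; writes a Jdbc -> Sql transform -> Jdbc batch job.
CONFIG=/tmp/customers_copy.conf

cat > "$CONFIG" <<'EOF'
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  # Read rows from the source database (placeholder connection details).
  Jdbc {
    result_table_name = "src"
    url = "jdbc:postgresql://source-db:5432/sales"
    driver = "org.postgresql.Driver"
    user = "etl_user"
    password = "change_me"
    query = "select id, name, age from customers"
  }
}

transform {
  # Filter and reshape rows with SQL before they reach the sink.
  Sql {
    source_table_name = "src"
    result_table_name = "cleaned"
    query = "select id, upper(name) as name, age from src where age >= 18"
  }
}

sink {
  # Insert the transformed rows into the target database.
  Jdbc {
    source_table_name = "cleaned"
    url = "jdbc:postgresql://target-db:5432/warehouse"
    driver = "org.postgresql.Driver"
    user = "etl_user"
    password = "change_me"
    query = "insert into customers_copy(id, name, age) values(?, ?, ?)"
  }
}
EOF

echo "Wrote $CONFIG"
```

The result_table_name and source_table_name pairs play the role of the edges drawn on the canvas: each step names its output, and the next step reads it. The file can be run with the seatunnel.sh command shown earlier.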

Step 5: Run and monitor the task

Once the task runs, you can follow its status until it shows as executed and finished successfully.

Conclusion

DataSynchronization aims to be user-friendly and intuitive: datasources, tasks, and transformations are set up in the web UI by drag and drop, so data engineers, analysts, and business users alike can build and run synchronization jobs.

Further reading

Links:

Community blog

GitHub Registry link

Comments

Wed March 20, 2024 06:10 AM

Hi, it supports Windows and Mac only, as per the documentation above.

Wed March 20, 2024 04:01 AM

Hi Sanjay,

I'm on RHEL 9 with Docker, so Mac or Windows is of no use to me. Logically it should work the same, on the principle of containers. I am trying a synchronization between two SQL Server databases. If you want, I can send you screenshots.

jerzy

Tue March 19, 2024 12:54 AM

Hi Jerzy,

Which OS are you trying it on? Please try with Mac or Windows and let me know.

Mon February 12, 2024 12:45 PM

Hi,

The installation goes well, and so does declaring the source and target databases. We can do the mapping, but during execution there is always an "Unknown exception" message. So it is unusable :(