Co-authors: James Cho, Katie Le
In this article, we walk through creating a DataSynchronization job from scratch and running it.
What is the DataSynchronization Community Edition tool?
DataSynchronization Community Edition is a free, easy-to-use, high-performance data integration tool that runs as a single-node Docker container and supports real-time and offline synchronization of large data volumes out of the box. It combines the features of Apache SeaTunnel and Carbon UI into a single data integration tool, which makes it a strong option for enterprises and users who need to handle the complexities of modern data management. In short, DataSynchronization Community Edition is a one-stop solution for your data integration and data synchronization needs.
NOTE: Currently, DataSynchronization Community Edition is supported on Windows and macOS only.
Prerequisites
To use DataSynchronization Community Edition, you need to pull the DataSynchronization Docker image from the GitHub Container Registry (ghcr.io). A complete step-by-step tutorial for doing that is available here.
How to run a DataSynchronization job from the command line?
From the host machine:
Make sure the configuration file exists inside the container. If it does not, copy it in with docker cp, or place it in a volume shared between the host and the container. Then run the following command:
docker exec -it datasynchronization bash -c '$SEATUNNEL_HOME/bin/seatunnel.sh --config /path/to/config/file'
where $SEATUNNEL_HOME is /opt/seatunnel. The single quotes ensure that $SEATUNNEL_HOME is expanded inside the container rather than by the host shell.
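As a minimal sketch of the host-side flow, the script below writes a small batch job config, copies it into the container with docker cp, and submits it with docker exec. It assumes the container is named datasynchronization (as in the command above) and that the image runs a SeaTunnel 2.3.x engine; the file name fake_to_console.conf, the /tmp target path, and the function names are hypothetical, and config key names can differ between SeaTunnel versions.

```shell
#!/usr/bin/env bash
# Sketch of the host-side flow. Assumptions: the container is named
# "datasynchronization" (as in this article) and the image runs a SeaTunnel
# 2.3.x engine. The file name, /tmp target path and function names are
# hypothetical.
set -euo pipefail

CONTAINER="datasynchronization"
CONFIG_FILE="fake_to_console.conf"

# Write a tiny batch job: FakeSource generates 16 rows, Console prints them.
write_config() {
  cat > "$1" <<'EOF'
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  FakeSource {
    result_table_name = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

sink {
  Console {
    source_table_name = "fake"
  }
}
EOF
}

# Copy the config into the container, then submit it there.
run_from_host() {
  local file="$1"
  local name
  name="$(basename "$file")"
  docker cp "$file" "$CONTAINER:/tmp/$name"
  # \$ is escaped so SEATUNNEL_HOME expands inside the container, falling
  # back to /opt/seatunnel if it is unset. No -it: no terminal is needed here.
  docker exec "$CONTAINER" bash -c \
    "\${SEATUNNEL_HOME:-/opt/seatunnel}/bin/seatunnel.sh --config /tmp/$name"
}

write_config "$CONFIG_FILE"

# Only talk to Docker when explicitly asked, e.g. `bash run_job.sh run`.
if [ "${1:-}" = "run" ]; then
  run_from_host "$CONFIG_FILE"
fi
```

Saved as, say, run_job.sh, running bash run_job.sh only writes the config file, while bash run_job.sh run also copies it into the container and submits it. Keeping the Docker calls behind an explicit argument lets you inspect the generated config before anything is submitted.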
From inside the container:
- Run the following command to get an interactive bash shell in the container:
docker exec -it datasynchronization bash
- Then run the job from that shell:
$SEATUNNEL_HOME/bin/seatunnel.sh --config /path/to/config/file
where $SEATUNNEL_HOME is /opt/seatunnel.
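To catch the most common mistake, a config path that does not exist inside the container, before the engine starts, you could wrap the command in a small helper such as the hypothetical run_job below. It is not part of the image; it simply checks the path and falls back to /opt/seatunnel when $SEATUNNEL_HOME is unset.

```shell
#!/usr/bin/env bash
# Hypothetical helper for use inside the container; not shipped with the image.

# Fail early with a clear message if the config file is missing, then run it.
run_job() {
  local config="$1"
  local home="${SEATUNNEL_HOME:-/opt/seatunnel}"
  if [ ! -f "$config" ]; then
    echo "config file not found: $config (copy it in with docker cp first)" >&2
    return 1
  fi
  "$home/bin/seatunnel.sh" --config "$config"
}
```

After defining the function in the container's shell, run_job /path/to/config/file behaves like the direct command, except that a wrong path is reported immediately.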
How to create a DataSynchronization job using the UI?
Step 1: Access the UI at http://<host_ip>:8801/ui/, where <host_ip> is the IP address of the host machine.
The login credentials to access the UI are:
Username: admin
Password: IBM@DATA#SYNC@2024
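If the page does not load, you can check from a terminal whether anything is answering on the UI port. The sketch below uses curl and treats any HTTP response as "up"; the helper name check_ui and the 5-second timeout are assumptions, and you pass your own host IP in place of <host_ip>.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if anything answers HTTP at http://<host>:8801/ui/.
check_ui() {
  local host="$1"
  local code
  # -s: silent, -o /dev/null: discard the body, -w: print only the status code.
  # A connection failure makes curl exit non-zero, so we return 1.
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "http://$host:8801/ui/")" || return 1
  # "000" means no HTTP response was received.
  [ "$code" != "000" ]
}
```

For example, check_ui 192.0.2.10 && echo "UI is reachable" prints the message only when the UI port responds; note that a 4xx or 5xx response still counts as reachable.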
Step 2: Navigate the home page
On the home screen, the left panel contains several tabs, each providing different functionality. The first thing to do is add data sources.
Step 3: Add data sources by clicking the Datasource tab on the left panel. You can choose from a wide range of connectors.
Once you select a connector, enter its details, such as the name, URL, and driver.
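For a JDBC-based connector, the URL and driver fields typically take the same values you would pass to SeaTunnel's Jdbc source in a config file. The script below writes an illustrative MySQL fragment to show what such values look like; the host, database, credentials, query, and table name are hypothetical placeholders, the option names assume SeaTunnel 2.3.x, and it assumes the MySQL JDBC driver jar is available to the engine.

```shell
#!/usr/bin/env bash
# Illustrative values only: host, port, database, user, password, query and
# table name are placeholders. Option names follow SeaTunnel's Jdbc source (2.3.x).
cat > mysql_source.conf <<'EOF'
source {
  Jdbc {
    url = "jdbc:mysql://192.0.2.10:3306/demo_db"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "demo_user"
    password = "demo_password"
    query = "select id, name from demo_table"
    result_table_name = "mysql_rows"
  }
}
EOF
```

In the UI form, the url and driver values above correspond to the URL and driver fields; the exact field labels depend on the connector you pick.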
Step 4: Next, create a DataSynchronization task. Click "Task Synchronization" on the left panel, then click "Create Synchronization Task" and enter the details.
This opens a canvas where you can add data sources by dragging them from the left panel and dropping them onto the canvas.
You can add transformations using the same drag-and-drop mechanism.
Click Save once the task is complete.
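Conceptually, the canvas builds a source → transform → sink pipeline, the same shape a SeaTunnel config file describes. As a rough, hypothetical equivalent of dropping a fake source, a SQL transform, and a console sink onto the canvas, the script below writes such a file. It assumes SeaTunnel 2.3.x option names (result_table_name and source_table_name wire the stages together), and the UI may generate something different; the file and table names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical file and table names; assumes SeaTunnel 2.3.x, where
# result_table_name / source_table_name connect one stage to the next.
cat > fake_sql_console.conf <<'EOF'
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  FakeSource {
    result_table_name = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

transform {
  Sql {
    source_table_name = "fake"
    result_table_name = "adults"
    query = "select name, age from fake where age >= 18"
  }
}

sink {
  Console {
    source_table_name = "adults"
  }
}
EOF
```

Each stage names the table it reads and the table it produces, which is roughly what the edges between nodes on the canvas express.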
Step 5: Run and monitor the task
Once the task runs, you can track its status; in this example, the task executed and finished successfully.
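If you prefer the command line, and assuming the container runs a SeaTunnel Engine (Zeta) cluster, the SeaTunnel client's -l (list jobs) and -j (job status by ID) options can be used for monitoring. The wrappers below are hypothetical helpers around those options; whether tasks created in the UI show up in this list depends on how the image is set up, which this article does not cover.

```shell
#!/usr/bin/env bash
# Assumes the container runs a SeaTunnel Engine (Zeta) cluster; -l (--list)
# and -j (--job-id) are Zeta client options. Function names are hypothetical.
CONTAINER="datasynchronization"

# List the jobs known to the engine, with their status.
list_jobs() {
  docker exec "$CONTAINER" bash -c \
    '"${SEATUNNEL_HOME:-/opt/seatunnel}/bin/seatunnel.sh" -l'
}

# Show the status of one job by its ID (for example, one shown by list_jobs).
job_status() {
  if [ -z "${1:-}" ]; then
    echo "usage: job_status <jobId>" >&2
    return 2
  fi
  # The job ID is passed as $1 of the inner shell to avoid quoting problems.
  docker exec "$CONTAINER" bash -c \
    '"${SEATUNNEL_HOME:-/opt/seatunnel}/bin/seatunnel.sh" -j "$1"' _ "$1"
}
```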
Conclusion
DataSynchronization is user-friendly and intuitive. Whether you are a data engineer, analyst, or business user, it lets you achieve your desired results easily and efficiently.
Further reading
Links:
Community blog
GitHub Container Registry link