
Initializing data in a PostgreSQL Container from JSON file using jq

By Hanna Sabu Mampallil posted 4 days ago

  

Containerized development has made it easy to spin up PostgreSQL instances in seconds. But an empty instance is rarely useful on its own: most databases need to be populated with data. There are several ways to do this; the simplest is plain INSERT queries, but writing INSERT statements by hand becomes painful once there is a lot of data. When the data is large and structured, JSON is a natural format to keep it in.

How PostgreSQL Initialization Works in Docker:

When a PostgreSQL container starts with an empty data directory, it automatically runs any .sql, .sql.gz, and .sh files (in alphabetical order) found in the location:

/docker-entrypoint-initdb.d/

Note: These files run only when the database is created for the first time. If the database already exists, you must clear the volume for the initialization files to run again.
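For example, during development you can force re-initialization by removing the container together with its anonymous data volume. A minimal sketch, assuming a container named test_db (the name used in step 6 below):

docker rm -f -v test_db

The -v flag also removes the anonymous volume that the postgres image creates for its data directory, so the next run starts from a fresh database.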

However, PostgreSQL will not import a JSON file placed in that directory. The JSON data has to be transformed into something the container can execute automatically.

The primary approach is to convert the JSON data into SQL INSERT queries and place them inside an SQL file, which PostgreSQL will run automatically. But as the data increases, the conversion becomes tedious and difficult to maintain.
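For a handful of rows this is manageable; the generated SQL file would simply contain statements like the following (values illustrative, matching the Sample table defined in step 2 below):

INSERT INTO Sample (name,email) VALUES ('Data1','data1@email.com');
INSERT INTO Sample (name,email) VALUES ('Data2','data2@email.com');
-- ...one line per row, regenerated by hand every time the JSON changes

Keeping such a file in sync with the source JSON is exactly the maintenance burden we want to avoid.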

Using jq for JSON Processing:

jq is a command-line JSON processor. It works like sed or grep for JSON data. This tool can be used to slice, filter, map, and transform values from JSON data.
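To get a feel for it, here is a quick standalone example (the JSON is inline, so it runs anywhere jq is installed):

echo '[{"name":"Data1","email":"data1@email.com"}]' | jq -r '.[0].email'
# prints: data1@email.com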

We can proceed by creating a shell script that uses jq.

1. Create a JSON file data.json and add the data given below:

[
    {
        "name": "Data1",
        "email": "data1@email.com"
    },
    {
        "name": "Data2",
        "email": "data2@email.com"
    },
    {
        "name": "Data3",
        "email": "data3@email.com"
    },
    {
        "name": "Data4",
        "email": "data4@email.com"
    }
]

2. Create a SQL file init.sql and add the query below:

CREATE TABLE Sample (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) NOT NULL UNIQUE
);

This defines a table Sample with an auto-incrementing id column plus name and email columns.

3. Create a shell script script.sh in the same directory as the JSON file:

#!/bin/bash
set -e

# Emit each array element as a compact, single-line JSON object
jq -c '.[]' /docker-entrypoint-initdb.d/data.json | while read -r row;
do
    name=$(echo "$row" | jq -r '.name')
    email=$(echo "$row" | jq -r '.email')
    psql -U "postgres" -d "postgres" -c \
    "INSERT INTO Sample (name,email) VALUES ('$name','$email');"
done

The absolute path /docker-entrypoint-initdb.d/data.json is used because the script runs inside the Docker container, where the Dockerfile (step 4) copies the JSON file to that location.

Using jq -c '.[]', each JSON element is extracted onto its own line as:

{"name":"Data1","email":"data1@email.com"}

Each extracted element is stored in row, and the loop repeats until all JSON objects are processed.

In each iteration, name and email are extracted and embedded in an INSERT query, which is executed with the psql command.
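One caveat: because the values are spliced directly into the SQL string, a name containing a single quote would break the statement (or worse, inject SQL). If that matters for your data, psql variables pass the values safely. A hedged variant of the loop body, where the variable names n and e are arbitrary:

psql -U "postgres" -d "postgres" \
    -v n="$name" -v e="$email" \
    -c "INSERT INTO Sample (name,email) VALUES (:'n', :'e');"

psql expands :'n' as a correctly quoted SQL literal, so no manual escaping is needed.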

4. Create a Dockerfile in the same directory:

FROM postgres
# jq is required by script.sh when the init scripts run
RUN apt-get update && apt-get install -y jq
# Place data.json, init.sql, and script.sh in the init directory
COPY ./ /docker-entrypoint-initdb.d/

The jq package must be installed in the image for the shell script to work. Make sure the shell script, data.json, and init.sql are in the same directory as the Dockerfile.
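With everything in place, the build context should look like this:

.
├── Dockerfile
├── data.json
├── init.sql
└── script.sh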

5. Build the image using the Dockerfile

docker build -t postgres-db .

6. Run the container

docker run --name test_db -e POSTGRES_PASSWORD=your-password -p 5432:5432 --rm postgres-db

Now, after running the container, you should see four INSERT 0 1 entries in your Docker logs, one for each row in data.json.
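To confirm the data was loaded, you can query the table from another terminal while the container is running (container name test_db as above):

docker exec -it test_db psql -U postgres -d postgres -c "SELECT * FROM Sample;"

This should list the four rows from data.json.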

Conclusion

Populating a containerized PostgreSQL database can be challenging when the data is large or stored as JSON. Since PostgreSQL cannot import JSON data directly at initialization, a shell script that uses jq to convert the JSON into INSERT queries provides a solution. This approach automates database initialization, reduces manual effort, and is reusable.
