watsonx.data


How to create a table by uploading a file in watsonx.data

By Ahmad Muzaffar Bin Baharudin posted Wed November 15, 2023 04:28 AM

  


In this tutorial, we'll walk through the process of creating a new data table in watsonx.data by uploading a file. The supported file formats for this feature are .csv, .txt, .parquet, and .json. Please note that the tutorial is based on watsonx.data version 1.0.3, and features may vary in future versions.

Step 1: Confirm or Create Schema
Navigate to your 'Data Manager' and confirm the schema under which you want to create a new data table.
If the schema doesn't exist, create a new schema.

For this demo, I will create a new table in my catalog called 'iceberg_data', under a schema called 'sample_data'.

Step 2: Upload File
In the 'Data Manager', locate and click on 'Create table from file' to initiate the file upload process.
 

Select the data file you want to upload. Ensure it is in one of the supported formats: .csv, .txt, .parquet, or .json.
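Before opening the upload dialog, the format check above can be sketched in a few lines of Python. This is only a local convenience check: the function name is hypothetical, and treating the extension as case-insensitive is an assumption, not something confirmed for watsonx.data.

```python
from pathlib import Path

# Extensions listed in this tutorial for watsonx.data version 1.0.3
SUPPORTED_EXTENSIONS = {".csv", ".txt", ".parquet", ".json"}


def is_supported_upload(path: str) -> bool:
    """Return True if the file extension is one of the supported formats.

    Assumption: the comparison ignores case (e.g. 'cars.CSV' is accepted).
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, is_supported_upload('cars.csv') returns True, while is_supported_upload('cars.xlsx') returns False.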


Confirm your data and click Next to proceed.


Select your target schema and name your table.


Step 3: Completion
Once the upload is complete, go to 'Data Manager' and verify that the new table has been created in the target schema and that it is populated with your data.
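To have something concrete to compare against, you can summarize the source file locally before checking the table. The sketch below is a minimal example for a .csv file with a header row; the function name is hypothetical and it only uses the Python standard library, not any watsonx.data API.

```python
import csv


def summarize_csv(path: str) -> tuple[list[str], int]:
    """Return (column names, number of data rows) for a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        # Skip blank lines so a trailing newline is not counted as a row
        row_count = sum(1 for row in reader if row)
    return header, row_count
```

The column names and row count returned for the file can then be checked by eye against what 'Data Manager' shows for the new table.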


Cautions

1. When uploading a Parquet file
In watsonx.data version 1.0.3, the BigInt data type is not yet supported. Uploading a Parquet file that contains a BigInt column might therefore raise an error.

If you are converting a CSV file to a Parquet file, you may use the Python code below. I'm using the pyarrow package for the conversion because this method can avoid int32 being automatically converted into int64 (BigInt).
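import pyarrow.csv as csv
import pyarrow.parquet as pq

# Read the CSV file into a pyarrow Table
tbl = csv.read_csv('cars.csv')

# Print the column types so any int64 (BigInt) column is visible before writing
print(tbl.schema)

# Write the table as an unpartitioned Parquet dataset in the 'cars_parquet' directory
pq.write_to_dataset(tbl, root_path='cars_parquet')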
You can get this sample dataset and the Python code from this GitHub repository: https://github.com/muzibm/watsonx.data/tree/main/create_table_using_file
 
2. When uploading a JSON file
Ensure that your .json file has the correct structure: the records must be enclosed in [ ], so that the file is a single array of objects. Refer to the watsonx.data documentation for details.
It should follow a structure like this:
[
 {
   "Car": "Chevrolet Chevelle Malibu",
   "MPG": 18,
   "Cylinders": 8,
   "Displacement": 307,
   "Horsepower": 130,
   "Weight": 3504,
   "Acceleration": 12,
   "Model": 70,
   "Origin": "US"
 },
 {
   "Car": "Buick Skylark 320",
   "MPG": 15,
   "Cylinders": 8,
   "Displacement": 350,
   "Horsepower": 165,
   "Weight": 3693,
   "Acceleration": 11.5,
   "Model": 70,
   "Origin": "US"
 }
]
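A quick local check of that shape can catch a malformed file before the upload fails. The sketch below uses only the Python standard library; the function name is hypothetical, and it checks just the outer structure (an array whose elements are objects), not whatever additional rules watsonx.data may apply.

```python
import json


def check_json_array_of_objects(text: str) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(data, list):
        return ["top-level value is not an array; enclose the records in [ ]"]
    problems = []
    for i, record in enumerate(data):
        if not isinstance(record, dict):
            problems.append(f"element {i} is not an object")
    return problems
```

Read the file into a string and pass it in: an empty result means the content is an array of objects like the sample above.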

Creating a new table through file upload is a straightforward process in watsonx.data. Several other techniques for data ingestion are available in watsonx.data, and I'll delve into them in my upcoming blog post.


Muz
Ecosystem Technical Enablement Specialist | Data & AI
IBM APAC Ecosystem Technical Enablement Team

