watsonx.data

 View Only

make queries againsts watsonx.data (using presto engine) using API

By YUXIANG CAO posted Sun March 31, 2024 12:44 AM

  

In watsonx.data, you can connect to the Presto server in multiple ways. They are console, presto-cli, Java, and python. You can find details from this link. In this blog, I would like to demonstrate an API method to run queries against Presto server.

The environment is setup as bellow:

  1.  a watonx.data environment ( version 1.1.3). This environment is setup based on standalone installation method 
  2.  username: cpadmin
  3.  password: passw0rd
  4.  catalog: tpch
  5.  schema:tiny
  6.  query: select count(*) from tpch.tiny.nation;

to make success API call , we would need gather information about presto server hostname. This can be done via infrastructure manager or checking Openshift route. 

to use Openshift route , from  output of command (oc get route -A | grep -i presto)we could find our presto server hostname is ibm-lh-lakehouse-presto-01-presto-svc-cpd-instance.apps.6607c3d638faef001f926dd2.cloud.techzone.ibm.com

bash-3.2$ oc get route -A | grep -i presto
cpd-instance               ibm-lh-lakehouse-presto-01-presto-svc       ibm-lh-lakehouse-presto-01-presto-svc-cpd-instance.apps.6607c3d638faef001f926dd2.cloud.techzone.ibm.com                                        ibm-lh-lakehouse-presto-01-presto-svc              8443                   reencrypt              None

Or you can open your infrastructure manage page and click the engine you want to query against , and you can find same details from external host

Next, we are going to see how we can query against presto. The presto documentation is quite useful https://prestodb.io/docs/current/develop/client-protocol.html. So from this page, we understand that to make query against presto and we would need endpoint /v1/statement  with POST method 
Also this page mentioned that X-Presto-User must be presented in the header. To check query status , it will be GET method and nextURI .This  should be based previous POST response.
Since the response from server will be in JSON format, I am going to use jq command to make output easy to understand 
bash-3.2$ export presto_user=cpadmin
bash-3.2$ export presto_password=passw0rd
bash-3.2$ export sql="select count(*) from tpch.tiny.nation"
bash-3.2$  curl -X POST \
> https://ibm-lh-lakehouse-presto-01-presto-svc-cpd-instance.apps.6607c3d638faef001f926dd2.cloud.techzone.ibm.com/v1/statement \
>  -u "$presto_user:$presto_password" \
>  -H "X-Presto-User: cpadmin" \
>  -H "X-Presto-Catalog: tpch" \
>  -H "X-Presto-Schema: tiny" \
> -d "$sql" | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   904  100   867  100    37    919     39 --:--:-- --:--:-- --:--:--   958
{
  "id": "20240331_041813_00023_kivd2",
  "infoUri": "https://ibm-lh-lakehouse-presto-01-presto-svc-cpd-instance.apps.6607c3d638faef001f926dd2.cloud.techzone.ibm.com/ui/query.html?20240331_041813_00023_kivd2",
  "nextUri": "https://ibm-lh-lakehouse-presto-01-presto-svc-cpd-instance.apps.6607c3d638faef001f926dd2.cloud.techzone.ibm.com/v1/statement/queued/20240331_041813_00023_kivd2/1?slug=x2389511ebb074b55a44819eb06ce41f2",
  "stats": {
    "state": "WAITING_FOR_PREREQUISITES",
    "waitingForPrerequisites": true,
    "queued": false,
    "scheduled": false,
    "nodes": 0,
    "totalSplits": 0,
    "queuedSplits": 0,
    "runningSplits": 0,
    "completedSplits": 0,
    "cpuTimeMillis": 0,
    "wallTimeMillis": 0,
    "waitingForPrerequisitesTimeMillis": 0,
    "queuedTimeMillis": 0,
    "elapsedTimeMillis": 0,
    "processedRows": 0,
    "processedBytes": 0,
    "peakMemoryBytes": 0,
    "peakTotalMemoryBytes": 0,
    "peakTaskTotalMemoryBytes": 0,
    "spilledBytes": 0
  },
  "warnings": []
}
we have successfully submit a quey via API and to check response ,we will need to use GET method to check with nextUri 
the command we are going to run in this will be following: 
curl -X GET  https://ibm-lh-lakehouse-presto-01-presto-svc-cpd-instance.apps.6607c3d638faef001f926dd2.cloud.techzone.ibm.com/v1/statement/queued/20240331_041813_00023_kivd2/1?slug=x2389511ebb074b55a44819eb06ce41f2 -u "$presto_user:$presto_password" | jq .
Note that, depends on query nature and resource, we will need a few round of this command to check the status periodically and eventually once it is completed, it will show output like below and in the "data" section it show 25 which is the actual query result. 
bash-3.2$ curl -X GET  https://ibm-lh-lakehouse-presto-01-presto-svc-cpd-instance.apps.6607c3d638faef001f926dd2.cloud.techzone.ibm.com/v1/statement/queued/20240331_041813_00023_kivd2/1?slug=x2389511ebb074b55a44819eb06ce41f2 -u "$presto_user:$presto_password" | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4028  100  4028    0     0   4284      0 --:--:-- --:--:-- --:--:--  4285
{
  "id": "20240331_041813_00023_kivd2",
  "infoUri": "https://ibm-lh-lakehouse-presto-01-presto-svc-cpd-instance.apps.6607c3d638faef001f926dd2.cloud.techzone.ibm.com/ui/query.html?20240331_041813_00023_kivd2",
  "partialCancelUri": "https://ibm-lh-lakehouse-presto-01-single-blue-0.ibm-lh-presto-headless-svc.cpd-instance.svc.cluster.local:8443/v1/stage/20240331_041813_00023_kivd2.0",
  "nextUri": "https://ibm-lh-lakehouse-presto-01-presto-svc-cpd-instance.apps.6607c3d638faef001f926dd2.cloud.techzone.ibm.com/v1/statement/executing/20240331_041813_00023_kivd2/1?slug=x2389511ebb074b55a44819eb06ce41f2",
  "columns": [
    {
      "name": "_col0",
      "type": "bigint",
      "typeSignature": {
        "rawType": "bigint",
        "typeArguments": [],
        "literalArguments": [],
        "arguments": []
      }
    }
  ],
  "data": [
    [
      25
    ]
  ],
  "stats": {
    "state": "RUNNING",
    "waitingForPrerequisites": false,
    "queued": false,
    "scheduled": true,
    "nodes": 1,
    "totalSplits": 20,
    "queuedSplits": 0,
    "runningSplits": 17,
    "completedSplits": 3,
    "cpuTimeMillis": 4481,
    "wallTimeMillis": 4549,
    "waitingForPrerequisitesTimeMillis": 1,
    "queuedTimeMillis": 0,
    "elapsedTimeMillis": 5515,
    "processedRows": 25,
    "processedBytes": 0,
    "peakMemoryBytes": 18,
    "peakTotalMemoryBytes": 228,
    "peakTaskTotalMemoryBytes": 228,
    "spilledBytes": 0,
    "rootStage": {
      "stageId": "0",
      "state": "RUNNING",
      "done": false,
      "nodes": 1,
      "totalSplits": 17,
      "queuedSplits": 0,
      "runningSplits": 17,
      "completedSplits": 0,
      "cpuTimeMillis": 2,
      "wallTimeMillis": 2,
      "processedRows": 2,
      "processedBytes": 62,
      "subStages": [
        {
          "stageId": "1",
          "state": "FINISHED",
          "done": true,
          "nodes": 1,
          "totalSplits": 3,
          "queuedSplits": 0,
          "runningSplits": 0,
          "completedSplits": 3,
          "cpuTimeMillis": 4479,
          "wallTimeMillis": 4547,
          "processedRows": 25,
          "processedBytes": 0,
          "subStages": []
        }
      ]
    },
    "runtimeStats": {
      "getColumnHandleTimeNanos": {
        "name": "getColumnHandleTimeNanos",
        "unit": "NANO",
        "sum": 271169,
        "count": 1,
        "max": 271169,
        "min": 271169
      },
      "S0-taskScheduledTimeNanos": {
        "name": "S0-taskScheduledTimeNanos",
        "unit": "NANO",
        "sum": 1731919,
        "count": 1,
        "max": 1731919,
        "min": 1731919
      },
      "S1-taskScheduledTimeNanos": {
        "name": "S1-taskScheduledTimeNanos",
        "unit": "NANO",
        "sum": 4547235748,
        "count": 1,
        "max": 4547235748,
        "min": 4547235748
      },
      "optimizerTimeNanos": {
        "name": "optimizerTimeNanos",
        "unit": "NANO",
        "sum": 46459601,
        "count": 1,
        "max": 46459601,
        "min": 46459601
      },
      "getColumnMetadataTimeNanos": {
        "name": "getColumnMetadataTimeNanos",
        "unit": "NANO",
        "sum": 75635,
        "count": 1,
        "max": 75635,
        "min": 75635
      },
      "getMaterializedViewTimeNanos": {
        "name": "getMaterializedViewTimeNanos",
        "unit": "NANO",
        "sum": 21438,
        "count": 1,
        "max": 21438,
        "min": 21438
      },
      "S1-driverCountPerTask": {
        "name": "S1-driverCountPerTask",
        "unit": "NONE",
        "sum": 3,
        "count": 1,
        "max": 3,
        "min": 3
      },
      "getLayoutTimeNanos": {
        "name": "getLayoutTimeNanos",
        "unit": "NANO",
        "sum": 979542,
        "count": 2,
        "max": 854800,
        "min": 124742
      },
      "S0-taskBlockedTimeNanos": {
        "name": "S0-taskBlockedTimeNanos",
        "unit": "NANO",
        "sum": 51834439243,
        "count": 1,
        "max": 51834439243,
        "min": 51834439243
      },
      "S1-taskElapsedTimeNanos": {
        "name": "S1-taskElapsedTimeNanos",
        "unit": "NANO",
        "sum": 4568648909,
        "count": 1,
        "max": 4568648909,
        "min": 4568648909
      },
      "fragmentPlanTimeNanos": {
        "name": "fragmentPlanTimeNanos",
        "unit": "NANO",
        "sum": 6227556,
        "count": 1,
        "max": 6227556,
        "min": 6227556
      },
      "S0-taskQueuedTimeNanos": {
        "name": "S0-taskQueuedTimeNanos",
        "unit": "NANO",
        "sum": 19967541,
        "count": 1,
        "max": 19967541,
        "min": 19967541
      },
      "S1-getSplitsTimeNanos": {
        "name": "S1-getSplitsTimeNanos",
        "unit": "NANO",
        "sum": 9278,
        "count": 1,
        "max": 9278,
        "min": 9278
      },
      "logicalPlannerTimeNanos": {
        "name": "logicalPlannerTimeNanos",
        "unit": "NANO",
        "sum": 6259265,
        "count": 1,
        "max": 6259265,
        "min": 6259265
      },
      "S0-driverCountPerTask": {
        "name": "S0-driverCountPerTask",
        "unit": "NONE",
        "sum": 17,
        "count": 1,
        "max": 17,
        "min": 17
      },
      "S0-taskElapsedTimeNanos": {
        "name": "S0-taskElapsedTimeNanos",
        "unit": "NANO",
        "sum": 0,
        "count": 1,
        "max": 0,
        "min": 0
      },
      "S1-taskQueuedTimeNanos": {
        "name": "S1-taskQueuedTimeNanos",
        "unit": "NANO",
        "sum": 22107437,
        "count": 1,
        "max": 22107437,
        "min": 22107437
      },
      "getViewTimeNanos": {
        "name": "getViewTimeNanos",
        "unit": "NANO",
        "sum": 286030,
        "count": 1,
        "max": 286030,
        "min": 286030
      },
      "getTableHandleTimeNanos": {
        "name": "getTableHandleTimeNanos",
        "unit": "NANO",
        "sum": 35069,
        "count": 1,
        "max": 35069,
        "min": 35069
      }
    },
    "progressPercentage": 15
  },
  "warnings": []
}
in this output , we get another nextUri 
https://ibm-lh-lakehouse-presto-01-presto-svc-cpd-instance.apps.6607c3d638faef001f926dd2.cloud.techzone.ibm.com/v1/statement/executing/20240331_041813_00023_kivd2/1?slug=x2389511ebb074b55a44819eb06ce41f2
we could use this one with GET method to check the query status and as below output shows, the query is completed 
bash-3.2$ curl -X GET https://ibm-lh-lakehouse-presto-01-presto-svc-cpd-instance.apps.6607c3d638faef001f926dd2.cloud.techzone.ibm.com/v1/statement/executing/20240331_041813_00023_kivd2/1?slug=x2389511ebb074b55a44819eb06ce41f2 -u "$presto_user:$presto_password" | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3659  100  3659    0     0   3811      0 --:--:-- --:--:-- --:--:--  3811
{
  "id": "20240331_041813_00023_kivd2",
  "infoUri": "https://ibm-lh-lakehouse-presto-01-presto-svc-cpd-instance.apps.6607c3d638faef001f926dd2.cloud.techzone.ibm.com/ui/query.html?20240331_041813_00023_kivd2",
  "columns": [
    {
      "name": "_col0",
      "type": "bigint",
      "typeSignature": {
        "rawType": "bigint",
        "typeArguments": [],
        "literalArguments": [],
        "arguments": []
      }
    }
  ],
  "stats": {
    "state": "FINISHED",
    "waitingForPrerequisites": false,
    "queued": false,
    "scheduled": true,
    "nodes": 1,
    "totalSplits": 20,
    "queuedSplits": 0,
    "runningSplits": 0,
    "completedSplits": 20,
    "cpuTimeMillis": 4487,
    "wallTimeMillis": 4560,
    "waitingForPrerequisitesTimeMillis": 1,
    "queuedTimeMillis": 0,
    "elapsedTimeMillis": 5517,
    "processedRows": 25,
    "processedBytes": 0,
    "peakMemoryBytes": 18,
    "peakTotalMemoryBytes": 228,
    "peakTaskTotalMemoryBytes": 228,
    "spilledBytes": 0,
    "rootStage": {
      "stageId": "0",
      "state": "FINISHED",
      "done": true,
      "nodes": 1,
      "totalSplits": 17,
      "queuedSplits": 0,
      "runningSplits": 0,
      "completedSplits": 17,
      "cpuTimeMillis": 8,
      "wallTimeMillis": 13,
      "processedRows": 3,
      "processedBytes": 93,
      "subStages": [
        {
          "stageId": "1",
          "state": "FINISHED",
          "done": true,
          "nodes": 1,
          "totalSplits": 3,
          "queuedSplits": 0,
          "runningSplits": 0,
          "completedSplits": 3,
          "cpuTimeMillis": 4479,
          "wallTimeMillis": 4547,
          "processedRows": 25,
          "processedBytes": 0,
          "subStages": []
        }
      ]
    },
    "runtimeStats": {
      "getColumnHandleTimeNanos": {
        "name": "getColumnHandleTimeNanos",
        "unit": "NANO",
        "sum": 271169,
        "count": 1,
        "max": 271169,
        "min": 271169
      },
      "S0-taskScheduledTimeNanos": {
        "name": "S0-taskScheduledTimeNanos",
        "unit": "NANO",
        "sum": 13363877,
        "count": 1,
        "max": 13363877,
        "min": 13363877
      },
      "S1-taskScheduledTimeNanos": {
        "name": "S1-taskScheduledTimeNanos",
        "unit": "NANO",
        "sum": 4547235748,
        "count": 1,
        "max": 4547235748,
        "min": 4547235748
      },
      "optimizerTimeNanos": {
        "name": "optimizerTimeNanos",
        "unit": "NANO",
        "sum": 46459601,
        "count": 1,
        "max": 46459601,
        "min": 46459601
      },
      "getColumnMetadataTimeNanos": {
        "name": "getColumnMetadataTimeNanos",
        "unit": "NANO",
        "sum": 75635,
        "count": 1,
        "max": 75635,
        "min": 75635
      },
      "getMaterializedViewTimeNanos": {
        "name": "getMaterializedViewTimeNanos",
        "unit": "NANO",
        "sum": 21438,
        "count": 1,
        "max": 21438,
        "min": 21438
      },
      "S1-driverCountPerTask": {
        "name": "S1-driverCountPerTask",
        "unit": "NONE",
        "sum": 3,
        "count": 1,
        "max": 3,
        "min": 3
      },
      "getLayoutTimeNanos": {
        "name": "getLayoutTimeNanos",
        "unit": "NANO",
        "sum": 979542,
        "count": 2,
        "max": 854800,
        "min": 124742
      },
      "S0-taskBlockedTimeNanos": {
        "name": "S0-taskBlockedTimeNanos",
        "unit": "NANO",
        "sum": 77259328664,
        "count": 1,
        "max": 77259328664,
        "min": 77259328664
      },
      "S1-taskElapsedTimeNanos": {
        "name": "S1-taskElapsedTimeNanos",
        "unit": "NANO",
        "sum": 4568648909,
        "count": 1,
        "max": 4568648909,
        "min": 4568648909
      },
      "fragmentPlanTimeNanos": {
        "name": "fragmentPlanTimeNanos",
        "unit": "NANO",
        "sum": 6227556,
        "count": 1,
        "max": 6227556,
        "min": 6227556
      },
      "S0-taskQueuedTimeNanos": {
        "name": "S0-taskQueuedTimeNanos",
        "unit": "NANO",
        "sum": 19967541,
        "count": 1,
        "max": 19967541,
        "min": 19967541
      },
      "S1-getSplitsTimeNanos": {
        "name": "S1-getSplitsTimeNanos",
        "unit": "NANO",
        "sum": 9278,
        "count": 1,
        "max": 9278,
        "min": 9278
      },
      "logicalPlannerTimeNanos": {
        "name": "logicalPlannerTimeNanos",
        "unit": "NANO",
        "sum": 6259265,
        "count": 1,
        "max": 6259265,
        "min": 6259265
      },
      "S0-driverCountPerTask": {
        "name": "S0-driverCountPerTask",
        "unit": "NONE",
        "sum": 17,
        "count": 1,
        "max": 17,
        "min": 17
      },
      "S0-taskElapsedTimeNanos": {
        "name": "S0-taskElapsedTimeNanos",
        "unit": "NANO",
        "sum": 5428745689,
        "count": 1,
        "max": 5428745689,
        "min": 5428745689
      },
      "S1-taskQueuedTimeNanos": {
        "name": "S1-taskQueuedTimeNanos",
        "unit": "NANO",
        "sum": 22107437,
        "count": 1,
        "max": 22107437,
        "min": 22107437
      },
      "getViewTimeNanos": {
        "name": "getViewTimeNanos",
        "unit": "NANO",
        "sum": 286030,
        "count": 1,
        "max": 286030,
        "min": 286030
      },
      "getTableHandleTimeNanos": {
        "name": "getTableHandleTimeNanos",
        "unit": "NANO",
        "sum": 35069,
        "count": 1,
        "max": 35069,
        "min": 35069
      }
    },
    "progressPercentage": 100
  },
  "warnings": []
}
In addition, we could confirm this query from Presto UI

#watsonx.data
#PrestoEngine
0 comments
9 views

Permalink