watsonx.data

watsonx.data

Put your data to work, wherever it resides, with the hybrid, open data lakehouse for AI and analytics

 View Only

Presto: Observation "system.runtime" schema provided by System Connecter

By MIKA SHIMOGAWA posted 2 days ago

  

Presto is one engine of watsonx.data. Presto System Connector is implemented in the Open Source Presto and provides information and metrics for queries running on Presto clusters. It also includes "kill_query (system.runtime.kill_query)" procedure.

In this article, I want to show the observation results of "runtime" schema in "system" catalog using watsonx.data 2.1.2. Here shows the structure of each table obtained with presto-cli and the output from Presto (Java) mainly. 

To Japanese Readers : This is the translation of Presto: システム・コネクターの system.runtimeにアクセスしてみた。 in Qiita.

Presto System Connector at watsonx.data

Presto's system connector is implemented by the Open Source Presto and provides information and metrics running on the Presto cluster. When you add a Presto engine in watsonx.data, System connector is automatically configured in the Presto engine. User can access "system"  catalog provided by the Presto System Connector using regular SQL queries. 

Using the watsonx.data Data manager, "system" catalog is listed in "System monitoring data" as follows:

In this article, I want to show my observation results of  "runtime" schema in "system" catalog. Here are the structures of each table and example output from Presto (Java) using presto-cli mainly. You can get the same output using watsonx.data Query workspace.

I confirmed the table structures in system.runtime schema of Presto (C++) are same as that of Presto (Java).

system.runtime

"system.runtime" schema has 4 tables:

table explanation
nodes The nodes table contains the list of visible nodes in the Presto cluster along with their status.
queries The queries table contains information about currently and recently running queries on the Presto cluster. From this table you can find out the original query text (SQL), the identity of the user who ran the query and performance information about the query including how long the query was queued and analyzed.
tasks The tasks table contains information about the tasks involved in a Presto query including where they were executed and how many rows and bytes each task processed.
transactions The transactions table contains the list of currently open transactions and related metadata. This includes information such as the create time, idle time, initialization parameters, and accessed catalogs.

 

I am going to introduce each table in detail. 

 

system.runtime.nodes

"system.runtime.nodes" table contains the list of visible nodes in the Presto cluster along with their status.

Table structure

The table structure is as follows. I omitted  "Extra" and "Comment" columns because these columns have no value. I add  "values observed" column to show values I observed in my environment as examples.


presto> describe system.runtime.nodes; Column | Type | Precision | Scale | Length | values observed --------------+---------+-----------+-------+------------+------------- node_id | varchar | NULL | NULL | 2147483647 | http_uri | varchar | NULL | NULL | 2147483647 | https://ibm-lh-lakehouse... node_version | varchar | NULL | NULL | 2147483647 | 0.286 coordinator | boolean | NULL | NULL | NULL | true / false state | varchar | NULL | NULL | 2147483647 | active node_type | varchar | NULL | NULL | 2147483647 | NORMAL (6 rows)

 

Output of Presto (Java) engine: In my watsonx.data , Presto (Java) runs in an environment where the coordinator and worker live together in one pod. So "system.runtime.nodes" shows 1 row and coordinator column become "ture".

presto> select * from system.runtime.nodes ;
               node_id                | http_uri                                                       | node_version | coordinator | state  | node_type
--------------------------------------+----------------------------------------------------------------+--------------+-------------+--------+-----------
 536b979f-5e94-4cd2-bfab-45be1a1aee3a | https://ibm-lh-lakehouse-presto512-single-blue-0.(domain):8443 | 0.286        | true        | active | NORMAL
(1 row)

 

Output of Presto (C++) engine: In my watsonx.data, Presto (C++) consists of one Coordinator and three Workers. "system.runtime.nodes" shows 4 lines: focusing on coordinator column.

  • ture is set on 1 row.
  • false is set on other 3 rows.

presto> select * from system.runtime.nodes;
               node_id                | http_uri                                                                   | node_version | coordinator | state  | node_type
--------------------------------------+----------------------------------------------------------------------------+--------------+-------------+--------+-----------
 4846690c-05e6-439b-9a10-0712d5e515d0 | https://ibm-lh-lakehouse-prestissimo578-prestissimo-worker-1.(domain):7443 | 0.286        | false       | active | NORMAL
 a4cc17ce-158c-4e39-beaa-0cc1603219d0 | https://ibm-lh-lakehouse-prestissimo578-coordinator-blue-0.(domain):8443   | 0.286        | true        | active | NORMAL
 aa63bd77-8a92-43ad-bd7f-9844354cf6c9 | https://ibm-lh-lakehouse-prestissimo578-prestissimo-worker-0.(domain):7443 | 0.286        | false       | active | NORMAL
 beac420a-0007-4259-99e9-ff7c439fabb3 | https://ibm-lh-lakehouse-prestissimo578-prestissimo-worker-2.(domain):7443 | 0.286        | false       | active | NORMAL
(4 rows)

  

system.runtime.queries

"system.runtime.queries" table contains information about currently and recently running queries on the Presto cluster. From this table you can find out the original query text (SQL), user who ran the query , and performance information including how long the query was queued  or analyzed.

Table structure

The table structure of system.runtime.queries is as follows. I omitted  "Extra" and "Comment" columns because these columns have no value. I add  "values observed" column to show values I observed in my environment as examples.


presto> describe system.runtime.queries ; Column | Type | Precision | Scale | Length | values observed -------------------+----------------+-----------+-------+------------+--------- query_id | varchar | NULL | NULL | 2147483647 + state | varchar | NULL | NULL | 2147483647 + RUNNING / FINISHED / FAILED / PLANNING user | varchar | NULL | NULL | 2147483647 + source | varchar | NULL | NULL | 2147483647 + presto-cli / wxd-system / wxd-sql ... query | varchar | NULL | NULL | 2147483647 + resource_group_id | array(varchar) | NULL | NULL | NULL + [global] queued_time_ms | bigint | 10 | NULL | NULL + analysis_time_ms | bigint | 10 | NULL | NULL + created | timestamp | NULL | NULL | NULL + started | timestamp | NULL | NULL | NULL + last_heartbeat | timestamp | NULL | NULL | NULL + end | timestamp | NULL | NULL | NULL + (12 rows)

 

Example output : Since it is horizontally long , I will introduce it in three parts.

(1) First 4 columns query_id, state, user, source, resource_group_id


presto>  select query_id, state, user , source , resource_group_id  from system.runtime.queries order by query_id desc limit 5;
          query_id           |  state   |   user    |   source   | resource_group_id
-----------------------------+----------+-----------+------------+-------------------
 20250422_061937_00187_drami | RUNNING  | cpadmin   | presto-cli | [global]
 20250422_061848_00186_drami | FINISHED | cpadmin   | presto-cli | [global]
 20250422_061740_00184_drami | FINISHED | cpadmin   | presto-cli | [global]
 20250422_061639_00183_drami | FAILED   | prestoadm | NULL       | [global]
 20250422_061638_00182_drami | FINISHED | prestoadm | NULL       | [global]
(5 rows)

 

(2) Columns query_id, user, source, query: 2 rows of query are displayed executed by "user02" using "presto-cli".

presto> select query_id, user, source , query from  system.runtime.queries where user='user02' order by query_id desc limit 2 ;

         query_id            |  user  |   source   |                                        query
-----------------------------+--------+------------+--------------------------------------------------------------------------------------
 20250422_055253_00163_drami | user02 | presto-cli | SELECT table_name FROM information_schema.tables WHERE table_schema = 'hsf1_no_part'
 20250422_055244_00162_drami | user02 | presto-cli | select                                                                              +
                             |        |            |  l_returnflag,                                                                      +
                             |        |            |  l_linestatus,                                                                      +
                             |        |            |  sum(l_quantity) as sum_qty,                                                        +
                             |        |            |  sum(l_extendedprice) as sum_base_price,                                            +
                             |        |            |  sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,                         +
                             |        |            |  sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,               +
                             |        |            |  avg(l_quantity) as avg_qty,                                                        +
                             |        |            |  avg(l_extendedprice) as avg_price,                                                 +
                             |        |            |  avg(l_discount) as avg_disc,                                                       +
                             |        |            |  count(*) as count_order                                                            +
                             |        |            | from                                                                                +
                             |        |            |  lineitem                                                                           +
                             |        |            | where                                                                               +
(Skip)

 

(3) Columns query_id, queued_time_ms,analysis_time_ms, created, started, last_heartbeat, end: Timestamp of start or end , etc. the duration (ms) of each phase.

presto> select query_id , queued_time_ms,analysis_time_ms ,created , started , last_heartbeat, "end" from system.runtime.queries where user='user02' order by query_id desc limit 2;
          query_id           | queued_time_ms | analysis_time_ms |         created         |         started         |     last_heartbeat      |           end
-----------------------------+----------------+------------------+-------------------------+-------------------------+-------------------------+-------------------------
 20250422_055253_00163_drami |              0 |              703 | 2025-04-22 05:52:54.019 | 2025-04-22 05:52:54.123 | 2025-04-22 05:52:55.517 | 2025-04-22 05:52:55.617
 20250422_055244_00162_drami |              0 |              686 | 2025-04-22 05:52:44.631 | 2025-04-22 05:52:44.947 | 2025-04-22 05:53:02.644 | 2025-04-22 05:53:02.716
(2 rows)

  

system.runtime.tasks

"system.runtime.tasks" table contains information about the tasks involved in a Presto query including where they were executed and how many rows and bytes each task processed.

Table structure


presto> describe system.runtime.tasks ;
         Column          |   Type    | Extra | Comment | Precision | Scale |   Length
-------------------------+-----------+-------+---------+-----------+-------+------------
 node_id                 | varchar   |       |         | NULL      | NULL  | 2147483647
 task_id                 | varchar   |       |         | NULL      | NULL  | 2147483647
 stage_execution_id      | varchar   |       |         | NULL      | NULL  | 2147483647
 stage_id                | varchar   |       |         | NULL      | NULL  | 2147483647
 query_id                | varchar   |       |         | NULL      | NULL  | 2147483647
 state                   | varchar   |       |         | NULL      | NULL  | 2147483647
 splits                  | bigint    |       |         |        10 | NULL  | NULL
 queued_splits           | bigint    |       |         |        10 | NULL  | NULL
 running_splits          | bigint    |       |         |        10 | NULL  | NULL
 completed_splits        | bigint    |       |         |        10 | NULL  | NULL
 split_scheduled_time_ms | bigint    |       |         |        10 | NULL  | NULL
 split_cpu_time_ms       | bigint    |       |         |        10 | NULL  | NULL
 split_blocked_time_ms   | bigint    |       |         |        10 | NULL  | NULL
 raw_input_bytes         | bigint    |       |         |        10 | NULL  | NULL
 raw_input_rows          | bigint    |       |         |        10 | NULL  | NULL
 processed_input_bytes   | bigint    |       |         |        10 | NULL  | NULL
 processed_input_rows    | bigint    |       |         |        10 | NULL  | NULL
 output_bytes            | bigint    |       |         |        10 | NULL  | NULL
 output_rows             | bigint    |       |         |        10 | NULL  | NULL
 physical_written_bytes  | bigint    |       |         |        10 | NULL  | NULL
 created                 | timestamp |       |         | NULL      | NULL  | NULL
 start                   | timestamp |       |         | NULL      | NULL  | NULL
 last_heartbeat          | timestamp |       |         | NULL      | NULL  | NULL
 end                     | timestamp |       |         | NULL      | NULL  | NULL
(24 rows)

 

Example Output  : This is an example information about a query (query_id='20250422_065234_00233_drami'). Since it is horizontally long, I will introduce it in four parts.

(1) The first 6 columns node_id , task_id, stage_execution_id ,stage_id,query_id, state: You can see Presto executed this query in 5 tasks.

presto> select node_id , task_id , stage_execution_id ,stage_id,query_id, state from system.runtime.tasks where query_id='20250422_065234_00233_drami';
               node_id                |               task_id               |       stage_execution_id        |           stage_id            |          query_id           |  state
--------------------------------------+-------------------------------------+---------------------------------+-------------------------------+-----------------------------+----------
 536b979f-5e94-4cd2-bfab-45be1a1aee3a | 20250422_065234_00233_drami.2.0.0.0 | 20250422_065234_00233_drami.2.0 | 20250422_065234_00233_drami.2 | 20250422_065234_00233_drami | FINISHED
 536b979f-5e94-4cd2-bfab-45be1a1aee3a | 20250422_065234_00233_drami.3.0.0.0 | 20250422_065234_00233_drami.3.0 | 20250422_065234_00233_drami.3 | 20250422_065234_00233_drami | FINISHED
 536b979f-5e94-4cd2-bfab-45be1a1aee3a | 20250422_065234_00233_drami.4.0.0.0 | 20250422_065234_00233_drami.4.0 | 20250422_065234_00233_drami.4 | 20250422_065234_00233_drami | FINISHED
 536b979f-5e94-4cd2-bfab-45be1a1aee3a | 20250422_065234_00233_drami.1.0.0.0 | 20250422_065234_00233_drami.1.0 | 20250422_065234_00233_drami.1 | 20250422_065234_00233_drami | FINISHED
 536b979f-5e94-4cd2-bfab-45be1a1aee3a | 20250422_065234_00233_drami.0.0.0.0 | 20250422_065234_00233_drami.0.0 | 20250422_065234_00233_drami.0 | 20250422_065234_00233_drami | FINISHED
(5 rows)

 

(2) Column task_id ,splits, queued_splits, running_splits, completed_splits, split_cpu_time_ms, split_blocked_time_ms: Information about the splits for each task

presto> select task_id ,splits, queued_splits , running_splits , completed_splits , split_cpu_time_ms , split_blocked_time_ms from system.runtime.tasks where query_id='20250422_065234_00233_drami';
               task_id               | splits | queued_splits | running_splits | completed_splits | split_cpu_time_ms | split_blocked_time_ms
-------------------------------------+--------+---------------+----------------+------------------+-------------------+-----------------------
 20250422_065234_00233_drami.2.0.0.0 |     37 |             0 |              0 |               37 |              2258 |                148500
 20250422_065234_00233_drami.3.0.0.0 |     34 |             0 |              0 |               34 |               824 |                 59482
 20250422_065234_00233_drami.4.0.0.0 |      1 |             0 |              0 |                1 |               166 |                     0
 20250422_065234_00233_drami.1.0.0.0 |     32 |             0 |              0 |               32 |                47 |                205241
 20250422_065234_00233_drami.0.0.0.0 |     17 |             0 |              0 |               17 |                10 |                109855
(5 rows)

 

(3) Columns task_id ,raw_input_bytes , raw_input_rows , processed_input_bytes, processed_input_rows , output_bytes , output_rows : Amount of Processing for Each Task

presto> select task_id ,raw_input_bytes , raw_input_rows , processed_input_bytes, processed_input_rows , output_bytes , output_rows from system.runtime.tasks where query_id='20250422_065234_00233_drami’;
               task_id               | raw_input_bytes | raw_input_rows | processed_input_bytes | processed_input_rows | output_bytes | output_rows
-------------------------------------+-----------------+----------------+-----------------------+----------------------+--------------+-------------
 20250422_065234_00233_drami.2.0.0.0 |        52865068 |        6148341 |              53453026 |              6148341 |       372026 |       11620
 20250422_065234_00233_drami.3.0.0.0 |        11426397 |        1530142 |              11486639 |              1530142 |      4119528 |      147126
 20250422_065234_00233_drami.4.0.0.0 |          330572 |         150000 |                330572 |               150000 |       542556 |       30142
 20250422_065234_00233_drami.1.0.0.0 |          372034 |          11620 |                429940 |                11620 |         4480 |         160
 20250422_065234_00233_drami.0.0.0.0 |            5088 |            160 |                  4480 |                  160 |          280 |          10
(5 rows)

 

(4) Columns task_id , physical_written_bytes, created, start, last_heartbeat, "end": Physical write of each task and the time of start and end, etc.


presto> select task_id , physical_written_bytes, created, start , last_heartbeat , "end" from system.runtime.tasks where query_id='20250422_065234_00233_drami';
               task_id               | physical_written_bytes |         created         |          start          |     last_heartbeat      |           end
-------------------------------------+------------------------+-------------------------+-------------------------+-------------------------+-------------------------
 20250422_065234_00233_drami.2.0.0.0 |                      0 | 2025-04-22 06:52:35.521 | 2025-04-22 06:52:35.728 | 2025-04-22 06:52:41.719 | 2025-04-22 06:52:41.813
 20250422_065234_00233_drami.3.0.0.0 |                      0 | 2025-04-22 06:52:35.515 | 2025-04-22 06:52:35.630 | 2025-04-22 06:52:38.619 | 2025-04-22 06:52:38.741
 20250422_065234_00233_drami.4.0.0.0 |                      0 | 2025-04-22 06:52:35.435 | 2025-04-22 06:52:35.625 | 2025-04-22 06:52:36.313 | 2025-04-22 06:52:36.333
 20250422_065234_00233_drami.1.0.0.0 |                      0 | 2025-04-22 06:52:35.258 | 2025-04-22 06:52:35.325 | 2025-04-22 06:52:41.713 | 2025-04-22 06:52:41.923
 20250422_065234_00233_drami.0.0.0.0 |                      0 | 2025-04-22 06:52:35.314 | 2025-04-22 06:52:35.323 | 2025-04-22 06:52:42.018 | 2025-04-22 06:52:42.017
(5 rows)

 

system.runtime.transactions

"system.runtime.transactions" table contains the list of currently open transactions and related metadata. This includes information such as the create time, idle time, initialization parameters, and accessed catalogs.

Table structure


presto> describe system.runtime.transactions;
       Column        |      Type      | Extra | Comment | Precision | Scale |   Length
---------------------+----------------+-------+---------+-----------+-------+------------
 transaction_id      | varchar        |       |         | NULL      | NULL  | 2147483647
 isolation_level     | varchar        |       |         | NULL      | NULL  | 2147483647
 read_only           | boolean        |       |         | NULL      | NULL  | NULL
 auto_commit_context | boolean        |       |         | NULL      | NULL  | NULL
 create_time         | timestamp      |       |         | NULL      | NULL  | NULL
 idle_time_secs      | bigint         |       |         |        10 | NULL  | NULL
 written_catalog     | varchar        |       |         | NULL      | NULL  | 2147483647
 catalogs            | array(varchar) |       |         | NULL      | NULL  | NULL
(8 rows)

 

Example of output Since it is long horizontally, I will introduce it in two parts.

(1) Columns transaction_id , isolation_level, read_only , auto_commit_context, create_time

presto> select transaction_id , isolation_level, read_only , auto_commit_context, create_time from system.runtime.transactions order by create_time desc;
            transaction_id            | isolation_level  | read_only | auto_commit_context |       create_time
--------------------------------------+------------------+-----------+---------------------+-------------------------
 f8e84c00-21a3-40f4-8c32-81f0291356d7 | READ UNCOMMITTED | false     | true                | 2025-04-28 06:49:59.837
 b3d277b5-b99b-4323-af2d-8a91dd10b2f5 | READ UNCOMMITTED | false     | true                | 2025-04-28 06:49:55.063
(2 rows)

 

(2) Column transaction_id , create_time , idle_time_secs , written_catalog , catalogs


presto> select transaction_id , create_time , idle_time_secs , written_catalog , catalogs from system.runtime.transactions order by create_time desc;
            transaction_id            |       create_time       | idle_time_secs | written_catalog |                        catalogs
--------------------------------------+-------------------------+----------------+-----------------+--------------------------------------------------------
 229a66ea-4359-409f-bfcb-2460d4d41ab3 | 2025-04-28 06:50:05.169 |              0 | NULL            | [system, $info_schema@system, $system@system]
 b3d277b5-b99b-4323-af2d-8a91dd10b2f5 | 2025-04-28 06:49:55.063 |              0 | NULL            | [hive_data, $system@hive_data, $info_schema@hive_data]
(2 rows)

Environment

  • OCP Version : 4.16
  • CP4D 5.1.2 (watsonx.data 2.1.2)
  • Configured Presto engines
    • P01(Presto Java) - Type : Presto (Java) v0.286 / Size : Starter / Coordinator Nodes : 1  Worker Nodes : 0
    • PCpp (Presto(C++)) - Type : Presto (C++) v0.286 / Size : Starter / Coordinator Nodes : 1 Worker Nodes : 3


#watsonx.data
#PrestoEngine

0 comments
22 views

Permalink