Leading by Contribution: IBM’s Ongoing Investment in Open-Source Presto

By Ali LeClerc posted 7 hours ago

Blog Contributors:

Anant Aneja, Software Engineer at IBM

Yabin Ma, Software Developer at IBM

Ali LeClerc, Chair, Presto Community Team and Open Source at IBM

Ethan Zhang, Program Director, Open-Source Lakehouse Engines at IBM

----

At IBM, we believe open source is the engine of innovation. Presto, a fast and flexible SQL engine for interactive analytics, continues to evolve rapidly thanks to community contributions. Over the past year, IBM engineers have focused on driving Presto forward across security, performance, native execution, and modern table formats like Iceberg. We’re proud of the work we’ve put in and of how it advances the vision of Presto as the best engine for the open data lakehouse.

From upgrading critical dependencies and strengthening security to advancing Prestissimo (Presto’s native C++ execution engine) and expanding connector capabilities, our work aims to make Presto more powerful, production-ready, and enterprise-secure for everyone using it.

1. Security and Dependency Upgrades

We contributed extensively to strengthening Presto’s security posture by:

Upgrading critical dependencies (e.g. netty, okhttp, snappy-java, commons-beanutils, lucene, logback-core) to address multiple CVEs.
Fixing security vulnerabilities in connectors like Kafka, MySQL, Hive, Elasticsearch, and Accumulo.
Improving cryptographic handling (e.g. PBKDF2 hashing) and adding security headers to Presto Router and UI static resources.
Adding an access control SPI for column masking and row filtering.
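
To make the PBKDF2 item above concrete, here is a minimal Python sketch of salted PBKDF2 hashing and verification. It is an illustration of the technique only: the iteration count, the SHA-256 digest, and the "iterations:salt:hash" storage layout are assumptions chosen for the example, not a description of what Presto itself uses.

```python
import hashlib
import hmac
import os
from typing import Optional

# Illustrative parameters (assumptions, not Presto defaults).
ITERATIONS = 100_000
SALT_BYTES = 16


def hash_password(password: str, salt: Optional[bytes] = None,
                  iterations: int = ITERATIONS) -> str:
    """Derive a PBKDF2-HMAC-SHA256 key and encode it as 'iterations:salt_hex:hash_hex'."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)  # fresh random salt per password
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"{iterations}:{salt.hex()}:{derived.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and iteration count, then compare."""
    iterations_text, salt_hex, hash_hex = stored.split(":")
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations_text)
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(derived, bytes.fromhex(hash_hex))
```

Storing the salt and iteration count next to the hash lets the iteration count be raised later without invalidating existing entries.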

2. Iceberg Connector Enhancements

Key Iceberg-related contributions include:

Adding support for new metadata columns ($deleted, $delete_file_path, $data_sequence_number).
Introducing UPDATE SQL statement support for Iceberg tables.
Implementing cache invalidation procedures for Iceberg statistics and manifest files.
Supporting rollback, fast forward, and set_current_snapshot procedures.
Enhancing performance with split size configuration, manifest caching, and optimized table listing.
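
The difference between rolling back and setting the current snapshot can be sketched with a toy model. The Python below is not Presto or Iceberg code; the class and method names are hypothetical, and it assumes the distinction described in the Apache Iceberg procedure documentation: a rollback target must be an ancestor of the current snapshot, while set_current_snapshot accepts any known snapshot. Fast forward involves branches and is left out of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyTable:
    """Toy stand-in for table metadata: maps snapshot id -> parent snapshot id."""
    parents: Dict[int, Optional[int]] = field(default_factory=dict)
    current: Optional[int] = None

    def commit(self, snapshot_id: int) -> None:
        """Record a new snapshot whose parent is the current snapshot."""
        self.parents[snapshot_id] = self.current
        self.current = snapshot_id

    def ancestors(self) -> List[int]:
        """Walk parent pointers from the current snapshot back to the root."""
        chain: List[int] = []
        cursor = self.current
        while cursor is not None:
            chain.append(cursor)
            cursor = self.parents[cursor]
        return chain

    def rollback_to_snapshot(self, snapshot_id: int) -> None:
        """Only snapshots in the current lineage are valid rollback targets."""
        if snapshot_id not in self.ancestors():
            raise ValueError(f"snapshot {snapshot_id} is not an ancestor of the current state")
        self.current = snapshot_id

    def set_current_snapshot(self, snapshot_id: int) -> None:
        """Any known snapshot may become current, ancestor or not."""
        if snapshot_id not in self.parents:
            raise ValueError(f"unknown snapshot {snapshot_id}")
        self.current = snapshot_id
```

In this model, a snapshot left behind by an earlier rollback can no longer be reached with rollback_to_snapshot, but set_current_snapshot can still restore it.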

3. Native Execution (Prestissimo) Improvements

We significantly advanced Presto C++ (Prestissimo) by:

Adding support for Arrow Flight connectors.
Migrating all legacy Presto Java product tests to the native engine.
Fixing plan checker compatibility for CTAS and INSERT queries.
Adding Prometheus metrics collection and default concurrency optimizations.
Exposing APIs to clean up async data cache.

4. Connector Enhancements

We made improvements across multiple connectors:

MongoDB: Added support for JSON type and ALTER TABLE statements.
Redshift: Fixed VARBYTE handling and resolved missing dependency issues.
MySQL: Added GEOMETRY type support.
Oracle: Improved performance with fetch_size optimization.
Kafka: Added optional SASL support and upgraded versions to address security issues.
Elasticsearch: Upgraded to version 7 and improved exception handling.

5. SQL Engine and Optimizer Improvements

Notable contributions include:

Adding new optimizer rules (e.g. ReplaceRedundantJoinWithProject, RemoveRedundantJoin, Exchange before GroupId).
Enhancing array functions (array_top_n returns NULL on invalid n).
Adding overflow detection for INTERVAL operations.
Supporting the SHOW CREATE SCHEMA statement.
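
The idea behind overflow detection for INTERVAL operations can be shown with a small sketch. It assumes, for illustration only, that an interval is held as a signed 64-bit count of milliseconds; the function and exception names are hypothetical. Python integers do not overflow, so the range check is explicit here. In Java, the same effect is usually achieved with Math.addExact and Math.multiplyExact, which throw ArithmeticException on overflow.

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


class IntervalOverflowError(ArithmeticError):
    """Raised when an interval result does not fit in a signed 64-bit integer."""


def _checked(value: int) -> int:
    """Return value unchanged if it fits in int64, otherwise raise."""
    if not INT64_MIN <= value <= INT64_MAX:
        raise IntervalOverflowError(f"interval out of range: {value}")
    return value


def add_intervals(a_ms: int, b_ms: int) -> int:
    """Add two intervals expressed in milliseconds, failing loudly on overflow."""
    return _checked(a_ms + b_ms)


def multiply_interval(interval_ms: int, factor: int) -> int:
    """Scale an interval by an integer factor, failing loudly on overflow."""
    return _checked(interval_ms * factor)
```

Without such a check, a 64-bit result silently wraps around and a very large interval can turn into a negative one.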

6. Router and Scheduler Enhancements

Added a new custom router scheduler plugin for plan checking.
Enabled dynamic config refresh and improved fallback logic for user errors.
Introduced RouterRequestInfo in schedulers for better URL destination handling.

7. Documentation and Maintenance

Added and revised documentation for connector properties, optimizer debugging, Iceberg, and cache configuration.
Migrated Maven publishing to the Central Portal and maintained CI and stable release pipelines.

8. SPI and Event Listener Enhancements

Enhanced event listener APIs for data lineage tracking and added query type metadata.
Introduced SPI for delegating row expression optimizers.
Enabled multiple query event listeners for extensibility.
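
Supporting multiple event listeners comes down to fanning one event out to every registered listener. The Python sketch below shows that pattern under stated assumptions: the class, method, and field names are hypothetical and do not mirror the Presto SPI, and isolating a failing listener from the others is a design choice made for this example.

```python
from typing import Callable, Dict, List

QueryEvent = Dict[str, str]
Listener = Callable[[QueryEvent], None]


class ListenerFanOut:
    """Deliver each query event to every registered listener."""

    def __init__(self) -> None:
        self._listeners: List[Listener] = []
        self.errors: List[Exception] = []

    def register(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def query_completed(self, event: QueryEvent) -> None:
        for listener in self._listeners:
            try:
                listener(event)
            except Exception as exc:
                # One failing listener should not stop the remaining ones.
                self.errors.append(exc)
```

A lineage tracker and an audit logger could then both be registered on the same fan-out and receive identical events.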

9. General Bug Fixes and Performance Optimizations

Fixed bugs in Hive symlink table access, Iceberg statistics caching, variadic functions, and ORC cache invalidation.
Optimized reads for symlink tables and reduced resource utilization in tests.

10. Feature Highlights

Some standout features from IBM’s contributions:

Dynamic Catalog Management: Adds and removes catalogs dynamically, without restarting Presto.
Native Type Manager: Improves type system integration in Prestissimo.
Pluggable Authenticators: Adds support for custom Presto authenticators and pluggable JWT authenticators.
Arrow Flight Connector: Enables high-performance data transfer between Presto and external systems using the Apache Arrow Flight protocol.

Presto continues to evolve as a powerful, flexible engine for interactive analytics, and IBM is proud to be one of the driving forces behind its growth. Our engineering work spans everything from native execution and Iceberg support to core SQL engine improvements, security, and performance optimizations. These efforts help keep Presto a fast, reliable, and open platform for modern data and AI workloads.

If you’re curious to try Presto for yourself, check out the open-source project at prestodb.io/getting-started. And if you’re looking for the power of Presto without the complexity of managing a large-scale system, try out watsonx.data, IBM’s open data lakehouse built with Presto at its core.

#watsonx.data
#PrestoEngine

Permalink

https://community.ibm.com/community/user/blogs/ali-leclerc/2025/07/15/ibms-ongoing-investment-in-presto