Blog Contributors:
Anant Aneja, Software Engineer at IBM
Yabin Ma, Software Developer at IBM
Ali LeClerc, Chair, Presto Community Team and Open Source at IBM
Ethan Zhang, Program Director, Open-Source Lakehouse Engines at IBM
----
At IBM, we believe open source is the engine of innovation. Presto, a fast and flexible SQL engine for interactive analytics, continues to evolve rapidly thanks to community contributions. Over the past year, IBM engineers have focused on driving Presto forward across security, performance, native execution, and modern table formats like Iceberg. We’re proud not only of the work we’ve put in, but also of how we’re helping advance the vision of Presto as the best engine for the Open Data Lakehouse.
From upgrading critical dependencies and strengthening security to advancing Prestissimo (Presto’s native C++ execution engine) and expanding connector capabilities, our work aims to make Presto more powerful, production-ready, and enterprise-secure for everyone using it.
1. Security and Dependency Upgrades
We contributed extensively to strengthening Presto’s security posture by:
- Upgrading critical dependencies (e.g. netty, okhttp, snappy-java, commons-beanutils, lucene, logback-core) to address multiple CVEs.
- Fixing security vulnerabilities in connectors like Kafka, MySQL, Hive, Elasticsearch, and Accumulo.
- Improving cryptographic protocols (e.g. pbkdf2 hashing) and adding security headers to Presto Router and UI static resources.
- Adding an access control SPI for column masking and row filtering.
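To make the hashing item above a little more concrete, here is a minimal Python sketch of PBKDF2-based password hashing using only the standard library. It is not Presto's implementation: the function names, iteration count, salt length, and digest choice are all assumptions made for illustration.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Illustrative parameters only; these values are assumptions, not Presto's.
ITERATIONS = 100_000
SALT_BYTES = 16


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a PBKDF2-HMAC-SHA256 key and return (salt, derived_key)."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)  # fresh random salt per password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)
```

The salt and derived key would be stored together, and verification repeats the derivation with the stored salt. The constant-time comparison avoids leaking how many leading bytes matched.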
2. Iceberg Connector Enhancements
Key Iceberg-related contributions include:
- Adding support for new metadata columns ($deleted, $delete_file_path, $data_sequence_number).
- Introducing UPDATE SQL statement support for Iceberg tables.
- Implementing cache invalidation procedures for Iceberg statistics and manifest files.
- Supporting rollback, fast forward, and set_current_snapshot procedures.
- Enhancing performance with split size configuration, manifest caching, and optimized table listing.
3. Native Execution (Prestissimo) Improvements
We significantly advanced Presto C++ (Prestissimo) by:
- Adding support for Arrow Flight connectors.
- Migrating all legacy Presto Java product tests to the native engine.
- Fixing plan checker compatibility for CTAS and INSERT queries.
- Adding Prometheus metrics collection and default concurrency optimizations.
- Exposing APIs to clean up async data cache.
4. Connector Enhancements
We made improvements across multiple connectors:
- MongoDB: Added support for JSON type and ALTER TABLE statements.
- Redshift: Fixed VARBYTE handling and resolved missing dependency issues.
- MySQL: Added GEOMETRY type support.
- Oracle: Improved performance with fetch_size optimization.
- Kafka: Added optional SASL support and upgraded versions to address security issues.
- Elasticsearch: Upgraded to version 7 and improved exception handling.
5. SQL Engine and Optimizer Improvements
Notable contributions include:
- Adding new optimizer rules (e.g. ReplaceRedundantJoinWithProject, RemoveRedundantJoin, Exchange before GroupId).
- Enhancing array functions (for example, array_top_n now returns NULL when n is invalid).
- Adding overflow detection for INTERVAL operations.
- Supporting the SHOW CREATE SCHEMA statement.
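As a rough picture of what overflow detection for INTERVAL operations involves, the Python sketch below treats an interval as a signed 64-bit count of milliseconds and raises an error instead of silently wrapping. The millisecond representation, the function name, and the error message are assumptions for illustration, not Presto's code.

```python
# Bounds of a signed 64-bit integer, the assumed storage for an interval.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def checked_add_interval(left_ms: int, right_ms: int) -> int:
    """Add two intervals (in milliseconds), failing loudly on 64-bit overflow."""
    # Python integers are unbounded, so the sum itself never wraps;
    # the range check below stands in for a checked 64-bit addition.
    result = left_ms + right_ms
    if result < INT64_MIN or result > INT64_MAX:
        raise OverflowError("interval overflow")
    return result
```

Without a check like this, a fixed-width addition in a language such as Java or C++ could wrap around and return a plausible-looking but wrong interval.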
6. Router and Scheduler Enhancements
- Added a new custom router scheduler plugin for plan checking.
- Enabled dynamic config refresh and improved fallback logic for user errors.
- Introduced RouterRequestInfo in schedulers for better URL destination handling.
7. Documentation and Maintenance
- Added and revised documentation for connector properties, optimizer debugging, Iceberg, and cache configuration.
- Migrated Maven publishing to Central Portal and maintained CI/stable release pipelines.
8. SPI and Event Listener Enhancements
- Enhanced event listener APIs for data lineage tracking and added query type metadata.
- Introduced SPI for delegating row expression optimizers.
- Enabled multiple query event listeners for extensibility.
9. General Bug Fixes and Performance Optimizations
- Fixed bugs in Hive symlink table access, Iceberg statistics caching, variadic functions, and ORC cache invalidation.
- Optimized reads for symlink tables and reduced resource utilization in tests.
10. Feature Highlights
Some standout features from IBM’s contributions:
- Dynamic Catalog Management: Ability to dynamically add/remove catalogs without restarting Presto.
- Native Type Manager: For improved type system integration in Prestissimo.
- Pluggable Authenticators: Added support for custom Presto authenticators and pluggable JWT authenticators.
- Arrow Flight Connector: Built a connector to enable high-performance data transfer between Presto and external systems using the Apache Arrow Flight protocol.
Presto continues to evolve as a powerful, flexible engine for interactive analytics, and IBM is proud to be one of the driving forces behind its growth. Our engineering work spans everything from native execution and Iceberg support to core SQL engine improvements, security, and performance optimizations. These efforts help ensure that Presto remains a fast, reliable, and open platform for modern data and AI workloads.
If you’re curious to try Presto for yourself, check out the open-source project at prestodb.io/getting-started. And if you’re looking for the power of Presto without the complexity of managing a large-scale system, try out watsonx.data, IBM’s open data lakehouse built with Presto at its core.
#watsonx.data #PrestoEngine