watsonx.data

Why Multiple Engines Matter

By BRADLEY ROWEN posted Wed November 22, 2023 12:31 PM

  

The More Things Change, the More They Stay the Same.

If you’re responsible for data (moving it, storing it, cleansing it, or using it to make better decisions), you know you have to keep one eye on the future at all times. Even if your data is safe and sound today, you can still lie awake at night wondering where it’s headed next. What will users ask for? What new applications will they introduce? What new data formats will arise, and how will you keep them safe, organized, and well understood? One thing that won’t change is the demand for high availability, performance, and adherence to SLAs and regulatory requirements.

The struggles I’ve described above are exactly the ones watsonx.data was built to address. As they say, “Modern problems require modern solutions,” and watsonx.data is certainly that. It’s designed to run where you need it: on premises, in any cloud (thanks to OpenShift), or as a managed service in the IBM Cloud or AWS. It’s designed to be open, with IBM as a major contributor to open source projects like Presto, and to support the open file and table formats your users and developers are demanding, like Parquet, ORC, Avro, and Iceberg. It’s also designed to use object storage, whether in the cloud, on your appliance, or managed by software.

On top of all that, the Multiple Engine approach is one of watsonx.data’s most distinguishing features. It makes sense; it’s a model we insist on almost everywhere we handle data in our private lives. Think for a moment: would you ever accept an online bank that only let you connect from your desktop? Nope. Mobile access is required. Would you sign up for a streaming service that didn’t work on phones and PCs as easily as on your TV? Of course not. We expect the same shows, the same accounts, the same information, no matter how we access it. Note, too, that we don’t always need ultra-high definition. When we’re watching in the basement home theater, we want the highest performance possible. When we’re killing time at soccer practice, whatever streams quickly to our phones is good enough. In watsonx.data, you can decide what level of performance your workload needs and how to deliver it: scale a Presto engine up or down, or run the work with Spark, Db2, Netezza, or any other engine that can connect to the Hive Metastore.
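For readers wondering what “connecting to the Hive Metastore” looks like from an engine’s side, here is a minimal sketch, assuming Apache Spark with the Apache Iceberg Spark runtime on its classpath. It only builds the Spark properties that register an Iceberg catalog whose table metadata is tracked in a Hive Metastore. The catalog name `lakehouse`, the function name `iceberg_hive_catalog_conf`, and the `thrift://metastore.example.com:9083` URI are hypothetical placeholders, and the TLS and authentication settings a real watsonx.data metastore would require are deliberately left out.

```python
def iceberg_hive_catalog_conf(catalog_name: str, metastore_uri: str) -> dict:
    """Build Spark properties for an Iceberg catalog backed by a Hive Metastore.

    Hypothetical helper for illustration; security settings are omitted.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg's Spark SQL extensions (e.g. stored-procedure CALLs).
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers the catalog using Iceberg's Spark catalog implementation.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Tells Iceberg to track table metadata in a Hive Metastore.
        f"{prefix}.type": "hive",
        # Thrift endpoint of the metastore (placeholder in the example below).
        f"{prefix}.uri": metastore_uri,
    }


if __name__ == "__main__":
    conf = iceberg_hive_catalog_conf("lakehouse", "thrift://metastore.example.com:9083")
    # Print each property in the form spark-submit accepts for --conf.
    for key, value in conf.items():
        print(f"--conf {key}={value}")
```

Each printed line can be passed to `spark-submit`. Because the table metadata lives in the shared metastore rather than inside any one engine, another engine pointed at the same metastore can, in principle, see the same Iceberg tables.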

But the Multiple Engine approach also provides a hedge against the future. It means more than just reaching your data from the various tools you already have in place. It means that, as engines evolve, the lakehouse you’ve created in watsonx.data will be ready for them as it is. A growing set of connectors is a testament to the open source community’s commitment to flexibility and interoperability. With data stored in the lakehouse, you won’t need to give up existing capabilities in order to add new ones. The watsonx.data multiple engine approach means that you can run a Db2 workload against that open data as easily as a workload in Presto. You can let Spark do what it does best without preventing Netezza from reaching the same data. And as other data warehouse products are updated to use open file and table formats, they can join the party too!

Put simply, the multiple engine approach means that the more things change (applications, file formats, performance requirements), the more your data platform can stay the same: a hybrid lakehouse built with IBM’s watsonx.data.



Comments

Mon November 27, 2023 10:35 PM

Thank you for the explanation and summary, Brad! I am curious what other engines you are hearing customers request for use with watsonx.data. Are there any limits to who could enter the ecosystem?