Scaling AI and Analytics: IBM's Sonia Mezzetta Talks Next-Gen Data Solutions

By Marius Ciortea posted Mon July 15, 2024 09:00 PM

  


Are you ready to dive into the future of data management and AI? Tomorrow (or, depending on your time zone, today), July 16, 2024, IBM TechXchange Day brings you a virtual extravaganza of cutting-edge insights and expert knowledge. Among the stellar lineup of speakers, we're thrilled to showcase Sonia Mezzetta, a true luminary in the world of data management.

With over 19 years of experience, Sonia Mezzetta currently serves as a Program Executive Director in IBM's Data Fabric Product Management team. Her expertise spans a wide range of areas, including Modern Enterprise Architectures, Data Strategies, DataOps, Data Analytics, and Data Governance.

I recently had the opportunity to chat with Sonia over Slack about watsonx.data and its capabilities. Here's what she had to say:

1. How does watsonx.data (https://www.ibm.com/products/watsonx-data) handle massive datasets and complex queries?

Sonia: watsonx.data handles large data sets and complex queries through a hybrid, open data lakehouse architecture. It is built on open standards to access and share data across disparate data environments, with multiple query engines optimized for AI and analytical workloads at scale (petabytes of data). Data management is a specialized discipline that needs to be done correctly, and watsonx.data handles diverse use cases. For customers that want a data modernization path but aren’t ready to migrate all their data to the cloud, such as financial, health and government customers, we offer an on-premises data lakehouse solution. We also specialize in unlocking massive amounts of mainframe data for analytics via an easy-button approach through the integration with our Datagate product. Finally, we can help customers modernize their data workloads from data lakes to a data lakehouse, or augment data warehouse workloads with watsonx.data.

2. Can you elaborate on the fit-for-purpose query engines and their optimization strategies for different workloads?

Sonia: What makes watsonx.data unique is the flexibility and diversity offered by its multiple engines (Presto, Presto C++, and Spark), which focus on interactive SQL queries and large-scale data processing. Each engine has been optimized for use with Apache Iceberg tables, Iceberg being the leading open table format in the industry. This offers reliability and interoperability in the management of large analytical data sets.

3. Since watsonx.data offers open data access, how does it ensure data security and user permissions across the organization?

Sonia: Unlocking your data to make data sharing easy shouldn’t come at the expense of data security. Many national and state laws impose large fines if data isn’t protected from unauthorized use, which makes data security even more critical. There are two layers of metadata management in watsonx.data. The first layer is Iceberg, which enables features such as ACID transactions, time travel and others. The second layer is a unified access layer that offers a robust framework for managing access control, enforced consistently through the watsonx.data engines across data stores and access policies (which isn’t an easy undertaking).
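To illustrate the idea behind time travel, here is a minimal sketch in plain Python. It assumes the general snapshot model used by table formats like Iceberg, where each commit publishes a new immutable snapshot and older snapshots stay readable. The `SnapshotTable` class and its methods are hypothetical names for illustration only; this is not the Iceberg or watsonx.data API.

```python
class SnapshotTable:
    """Hypothetical toy table that keeps one immutable snapshot per commit."""

    def __init__(self):
        self._snapshots = [()]  # snapshot 0 is the empty table

    def commit(self, new_rows):
        """Append rows by publishing a new snapshot; returns the snapshot id."""
        current = self._snapshots[-1]
        # Tuples are immutable, so earlier snapshots can never change.
        self._snapshots.append(current + tuple(new_rows))
        return len(self._snapshots) - 1

    def read(self, snapshot_id=None):
        """Read the latest snapshot, or an older one for time travel."""
        if snapshot_id is None:
            snapshot_id = len(self._snapshots) - 1
        return list(self._snapshots[snapshot_id])


table = SnapshotTable()
first = table.commit([("alice", 10)])
second = table.commit([("bob", 20)])

latest_rows = table.read()            # sees both commits
rows_as_of_first = table.read(first)  # sees the table as of the first commit
```

Readers either see a whole snapshot or none of it, which is the intuition behind pairing ACID guarantees with time travel in the same metadata layer.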

4. How does watsonx.data integrate with AI and Machine Learning tools? Can data be directly fed into training pipelines or used for building AI models within the platform?

Sonia: Data is the foundation of AI. The effectiveness of AI is inherently tied to the ease of access to quality-driven and trustworthy data. With this in mind, watsonx.data provides access to all your data, wherever it resides, through a single point of entry. A single copy of data can be shared across your organization without needing to migrate or re-catalog it, reducing ETL and data duplication. watsonx.data manages both the raw and the transformed data used in AI. For traditional ML/AI, Spark is offered as a native engine to help process and transform data that can be used for training. For generative AI, watsonx.data provides a vector store, Milvus, to store transformed data for use in patterns such as Retrieval Augmented Generation (RAG).
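For readers new to RAG, the retrieval step a vector store performs can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the Milvus API: `TinyVectorStore`, the three-dimensional embeddings, and the sample documents are all hypothetical, and a real system would get its embeddings from a model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory stand-in for a vector store such as Milvus."""

    def __init__(self):
        self._rows = []  # list of (embedding, text) pairs

    def insert(self, embedding, text):
        self._rows.append((list(embedding), text))

    def search(self, query_embedding, top_k=1):
        """Return the top_k stored texts most similar to the query."""
        ranked = sorted(
            self._rows,
            key=lambda row: cosine_similarity(query_embedding, row[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:top_k]]


store = TinyVectorStore()
store.insert([0.9, 0.1, 0.0], "Refund policy: 30 days with receipt.")
store.insert([0.1, 0.9, 0.0], "Shipping takes 3 to 5 business days.")
store.insert([0.0, 0.2, 0.9], "Support hours are 9 to 5 on weekdays.")

# The query embedding is assumed to come from the same model as the stored ones.
context = store.search([0.8, 0.2, 0.1], top_k=1)

# The retrieved text is placed in the prompt sent to the language model.
prompt = (
    "Answer using this context:\n"
    + "\n".join(context)
    + "\nQuestion: Can I get a refund?"
)
```

The point of the pattern is that the model answers from retrieved, governed data rather than from its training data alone, which is why the quality of what sits in the vector store matters.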

5. For a data-focused audience, cost is a major concern. Can you elaborate on watsonx.data's cost optimizing strategy for large-scale deployments?

Sonia: Price-performance optimization is one of watsonx.data’s key differentiators. Growing data workloads are managed by matching each one to the best fit-for-purpose open query engine (Presto C++, Presto, Spark). Most recently we announced superior price-performance (https://www.ibm.com/blog/announcement/delivering-superior-price-performance-and-enhanced-data-management-for-ai-with-ibm-watsonx-data/) at less than 60% of the cost, based on a public 100TB TPC-DS query benchmark, to demonstrate our commitment to customers on price-performance.

At TechXchange Day, Sonia will delve deeper into these topics and more, sharing insights that could transform your organization's approach to data management and AI integration. Don't miss her session in the "Scale responsible AI with watsonx" track, titled "Manage and deliver trusted data for generative AI," scheduled for 1:00-1:30 ET.

In this talk, Sonia will explore the critical role of data quality and management in successful AI implementations. She'll discuss how to build the right data foundation for generative AI, covering essential aspects such as:

  • Enabling real-time access to high-quality, governed data
  • Securing your AI training pipeline
  • Unlocking the full potential of your proprietary data
  • Optimizing workloads for price performance

Sonia will offer valuable insights on accelerating AI implementations with a trusted and secure data foundation, addressing the growing need for responsible AI strategies that optimize operations and maximize outputs. This session promises to be an invaluable resource for organizations looking to invest in or enhance their AI capabilities while ensuring data integrity and security.

Don't miss this opportunity to learn from Sonia and other industry leaders. The IBM TechXchange Day: AI and Automation virtual event kicks off on July 16, 2024, at 11 AM ET. With over 30 live speakers and hands-on learning opportunities across three tracks - AI-powered automation, Open-source and models, and Scaling responsible AI with watsonx - this is an event you won't want to miss.
