AI-Direct Data Services

By DOUGLAS O'FLAHERTY posted Tue March 19, 2024 11:47 AM


GenAI is revolutionizing how we interact with systems. Chatbots respond in natural language or haikus; speech can be translated or turned into images; summaries can be generated in moments. GenAI can interpret and extract meaning and provide contextualization, but it can also make things up. As enterprises expand their AI usage, they are also building systems based on NVIDIA GPUs to ensure AI can be targeted, trusted, and empowering.

Developing and deploying trustworthy AI requires organizations to integrate model development, governance, and deployment in a virtuous cycle of develop, deploy, measure, and improve. IBM watsonx provides a leading platform for this paradigm. Implementing it in most enterprises requires data connectivity that is not traditionally supported.

IBM Storage is introducing AI-direct data service reference architectures to facilitate open and trusted model development. These architectures integrate into AI model development and application development flows, so that models can be deployed in targeted applications. This requires policy- and event-driven data sharing that is hybrid-by-design, auditable, and integrated with enterprise security.

Business operations tend to run one way: from systems-of-record to analytics to archive. This flow can be complex, with various intermediary stages, branches, and repositories. An IBM Data Fabric connects disparate islands of information to provide a common view for analytics and AI. A Data Fabric is critical for AI-based applications to have real-time access to important information, but it has limitations.


AI development in the enterprise has several specialized data requirements. To be effective, data science teams need performance, orchestration, and transparent connectivity. To protect the organization, they also need governance and security. Enterprise AI teams are independent, but partner closely with business units, developers, and user experience teams. They bring data into the AI Center of Excellence, transform it, and develop multiple models against the same data for different use cases. The input is data; the output is a model. The measure of excellence is feedback from the applications in which the AI is deployed.

Connecting data stores with flexible policy and performance

At the core is a high-performance data repository that is simultaneously independent of and logically connected to related data stores. IBM Storage Scale is the high-performance solution, certified by NVIDIA, to meet the scaling demands of AI. It is also part of a software-defined storage Global Data Platform that enables bi-directional sharing of any data. Active File Management (AFM) provides policy-based data caching and synchronization robust enough for global enterprise deployment.

Data Orchestration

Event-driven data movement needs to be a part of an AI-direct architecture. Whether that data is in object storage, clouds, or other vendors' on-premises storage, IBM Storage Data Catalogue can provide a unified view of the metadata, including permissions and usage information. Programmatic or manual triggers enable IBM Data Orchestrator to copy or move data to new data storage and manage access.
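To make the idea of policy- and event-driven data movement concrete, here is a minimal Python sketch. It is not IBM's API: the names DataEvent, Policy, and Orchestrator, and the paths and tags, are hypothetical and chosen only for illustration. The sketch shows a metadata event being matched against policies, producing a plan of copy or move actions and an audit trail.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class DataEvent:
    """Hypothetical metadata event, e.g. a new object landing in a bucket."""
    path: str
    source: str                      # e.g. "object-store", "cloud", "on-prem"
    tags: frozenset = frozenset()


@dataclass
class Policy:
    """Hypothetical policy: when `matches` is true, plan `action` to `target`."""
    name: str
    matches: Callable[[DataEvent], bool]
    action: str                      # "copy" or "move"
    target: str


@dataclass
class Orchestrator:
    """Illustrative dispatcher; it only plans actions and records them."""
    policies: List[Policy] = field(default_factory=list)
    audit_log: List[str] = field(default_factory=list)

    def handle(self, event: DataEvent) -> List[Tuple[str, str, str]]:
        planned = []
        for policy in self.policies:
            if policy.matches(event):
                planned.append((policy.action, event.path, policy.target))
                # Every planned action is recorded so it can be audited later.
                self.audit_log.append(
                    f"{policy.name}: {policy.action} {event.path} -> {policy.target}"
                )
        return planned


# Example: stage anything tagged "training" into the AI team's repository.
orch = Orchestrator(policies=[
    Policy(
        name="stage-training-data",
        matches=lambda e: "training" in e.tags,
        action="copy",
        target="ai-coe/datasets",
    )
])
plan = orch.handle(
    DataEvent(path="sales/2024/q1.parquet", source="object-store",
              tags=frozenset({"training"}))
)
```

A manual trigger would simply call handle with a hand-built event, while a programmatic trigger would call it from whatever notification mechanism the storage layer provides.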

Transparent Connectivity

Most AI teams are deploying multiple specialty systems on-premises and in the cloud. These range from NVIDIA DGX SuperPODs for training to cloud TPUs to racks of servers and even mainframes. The variety of options is only increasing. A Global Data Platform connects them with a common namespace. It eliminates the overhead, complexity, and cost of managing multiple copies. Governance breaks when models are repeatedly cloned to each system rather than served from a single source across a global namespace.

Enterprise Security

IBM Storage for Data and AI is a mature, widely deployed storage platform adopted by the most regulated industries across the world. IBM Storage Scale delivers on the enterprise features our clients need. This includes immutable SafeGuarded copies, SIEM integration, and audit logs.

Four Categories of Enterprise AI Data Service Requirements

The IBM architecture for AI-direct data services serves the needs of both the AI team and the business. It complements the Data Fabric and supports AI platforms such as watsonx. IBM Storage Scale is the foundation of high-performing clusters and high-performing teams. It reduces barriers to enterprise AI adoption and simplifies collaboration and governance.