File and Object Storage

Software-defined storage for building a global AI, HPC and analytics data platform

Is Your Enterprise AI Strategy Missing a Key Component?

By Vincent Hsu posted Mon May 19, 2025 02:01 AM

  
At the Computex exhibition in Taipei this week, NVIDIA will showcase the NVIDIA AI Data Platform reference design for agentic AI reasoning, which provides some of the key technologies used in content-aware IBM Storage Scale. IBM collaborates closely with NVIDIA on the software innovations underlying these new capabilities, and we’re excited by the use cases that clients are working on to tap into the business value of their massive stores of PDFs, presentations, and other unstructured data. 
 
At NVIDIA GTC in March, NVIDIA CEO Jensen Huang called 2025 “the year of inferencing” as AI infrastructure requirements increasingly shift from the months-long process of training AI models to the real-time process of using AI assistants and reasoning agents to infer the right answer – hence, inferencing. Open reasoning models like NVIDIA Llama Nemotron can tap into content during inference so that AI agents can provide more accurate and relevant answers based on business data.
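To make that idea concrete, here is a minimal, hypothetical sketch of how an application might fold retrieved business content into a prompt at inference time. The corpus, scoring function, and prompt format are simple stand-ins for a real vector store and inference endpoint, not part of any IBM or NVIDIA product.

```python
# Minimal retrieval-augmented prompting sketch (illustrative only).
# The corpus, relevance scoring, and prompt layout are hypothetical
# stand-ins for a production retriever and inference endpoint.

def score(query: str, passage: str) -> int:
    """Naive relevance score: count of query words appearing in the passage."""
    q_words = set(query.lower().split())
    return sum(1 for w in passage.lower().split() if w in q_words)

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k passages that best match the query."""
    return sorted(corpus, key=lambda p: score(query, p), reverse=True)[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Ground the model's answer in the retrieved passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

corpus = [
    "Q3 revenue grew 12% year over year, driven by storage software.",
    "The cafeteria menu changes every Monday.",
    "Storage Scale 5.2 adds content-aware indexing for unstructured data.",
]
question = "How did storage software affect revenue?"
prompt = build_prompt(question, retrieve(question, corpus))
print(prompt)  # This augmented prompt would then be sent to the model.
```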
 
As you might expect, an AI task this complex leverages many different aspects of the NVIDIA AI Data Platform, including GPU acceleration for real-time computation, high-speed networking with the NVIDIA Spectrum-X Ethernet networking platform and NVIDIA BlueField DPUs for low-latency data movement, plus integration with AI frameworks to efficiently deploy models across hybrid cloud and edge environments.
 
What may not be as well recognized is that data storage technology is now a crucial part of AI infrastructure – and a strategic advantage too important to overlook. 
 
Consider some of the challenges that enterprises confront in implementing AI assistants and agents:
  • A process called retrieval augmented generation (RAG) must be employed to retrieve relevant, up-to-date information from internal and external enterprise data sources and feed it into the language model to generate accurate and context-aware answers. But these data sources are often scattered across the globe, and the standard practice of copying all the data to a single repository for processing is costly, inefficient, and insecure.
  • Data is constantly changing, and it’s difficult to keep track of what has changed, so organizations may simply reprocess the whole data set in batch mode whenever necessary. That can leave answers outdated or incorrect between batch runs, and it is an inefficient use of compute resources. 
  • The key-value (KV) cache used to store context for large language models grows rapidly with each user interaction, making it increasingly difficult to manage at scale. Indeed, as the number of tokens used by an AI system increases with more complex questions and reasoning agents, the size of downstream artifacts like KV caches grows even faster (a rough sizing sketch follows this list).
  • It’s hard to maintain consistent access controls when the number of data sources multiplies, raising concerns about privacy, compliance, and unauthorized data exposure.
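To see why these caches balloon so quickly, here is a back-of-the-envelope sizing sketch. The layer count, head count, head dimension, and FP16 precision are illustrative assumptions, not the parameters of any particular model.

```python
# Back-of-the-envelope KV cache sizing (illustrative assumptions only).
# For each token in context, a transformer stores one key vector and one
# value vector per layer per attention head.

def kv_cache_bytes(tokens: int, layers: int = 32, heads: int = 32,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    """Approximate KV cache size for one sequence (FP16 values assumed)."""
    per_token = layers * heads * head_dim * 2 * bytes_per_value  # key + value
    return tokens * per_token

for tokens in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(tokens) / 2**30
    print(f"{tokens:>7} tokens -> ~{gib:.1f} GiB per sequence")
```

Under these assumptions a 4K-token exchange already needs roughly 2 GiB of cache, and a 128K-token agentic workflow needs about 64 GiB for a single sequence, which is why the cache quickly spills beyond GPU memory.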
These challenges can’t be properly addressed within the old paradigm, in which data storage systems are just repositories for dumb 0s and 1s. They call for a more modern approach in which storage technology actively enables AI capabilities on the underlying data.
 
The architecture for content-aware Storage Scale was designed expressly for this purpose:
  • Storage Scale is used as a global data platform by many of the world’s largest organizations, and a key reason is its ability to virtualize data wherever it’s located – in data centers and in public and private clouds around the globe. That’s a significant advantage for enterprises wanting to implement retrieval augmented generation capabilities like content-aware storage, because they don’t need to copy all their data to a single location for processing. 
  • Storage Scale uses proprietary “watch folder” technology that instantly detects all changes to the file system, so that new information can be incrementally processed right away and incorporated into AI tools’ responses. 
  • Efficient key-value (KV) cache storage is critical to inferencing performance – it requires a seamless hierarchy spanning GPU memory, CPU memory, local storage, and network storage. The IBM Storage research and development team is working closely with our counterparts at NVIDIA to optimize KV cache storage for real-world inferencing requirements.
  • Unlike non-storage-aware inferencing systems that might inadvertently conflate data sources with differing levels of access control, content-aware Storage Scale meticulously tracks the provenance of all source data to help ensure that responses from AI assistants and agents always respect the original access control permissions (a simplified sketch of this idea follows the list). 
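As a simplified illustration of that last point, the sketch below assumes each indexed chunk carries provenance metadata, including the access control list captured from its source file, and the retriever filters results by the requesting user’s groups before anything reaches the model’s context. All names, paths, and data here are hypothetical.

```python
# Sketch of permission-aware retrieval (illustrative; names are hypothetical).
# Each indexed chunk keeps provenance metadata, including the ACL of its
# source file, so retrieval can honor the original permissions.

from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source_path: str          # provenance: where the chunk came from
    allowed_groups: set[str]  # ACL captured when the file was ingested

def retrieve_for_user(query: str, index: list[Chunk],
                      user_groups: set[str], k: int = 3) -> list[Chunk]:
    """Return the top-k chunks the requesting user is allowed to see."""
    visible = [c for c in index if c.allowed_groups & user_groups]
    q_words = set(query.lower().split())
    visible.sort(key=lambda c: len(q_words & set(c.text.lower().split())),
                 reverse=True)
    return visible[:k]

index = [
    Chunk("FY25 salary bands by level", "/hr/comp.pdf", {"hr"}),
    Chunk("Storage Scale sizing guide for AI clusters", "/eng/sizing.pdf",
          {"engineering", "sales"}),
]
# A user in the sales group never sees HR-only content in the model's context.
for c in retrieve_for_user("How do I size a cluster?", index, {"sales"}):
    print(c.source_path, "->", c.text)
```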

If your organization’s AI strategy relies on AI agents and assistants that can provide trustworthy and accurate answers, it’s essential that your inferencing systems are up to the task. That’s why it’s worth taking a close look at your storage infrastructure to make sure it has the architecture and capabilities you’ll need to evolve into a truly AI-first enterprise. 
