Editor’s note: This is the first in a two-part series on big data analytics. It's based on content initially published in the SHARE President's Corner blog.
IBM has invested heavily in big data analytics, spending more than $14 billion on more than two dozen analytics-related acquisitions over the past five years. It expects analytics to generate $16 billion in revenue by 2015, up from $10 billion in 2010, according to published reports.
Within its broad portfolio, IBM has several major products that enable big data analytics, including IBM InfoSphere BigInsights and InfoSphere Streams. In 2010, IBM acquired Netezza, which offers a data warehouse appliance that became the basis for current analytic database platforms. InfoSphere BigInsights sits on top of Hadoop to provide text-based analytics, a spreadsheet-based data discovery and exploration tool, and administration tools. InfoSphere Streams is a platform for real-time analytic processing of structured and unstructured data.
In addition to its expert entity counting engine that helps companies turn “puzzle pieces into puzzles,” as IBM Chief Scientist and Distinguished Engineer Jeff Jonas puts it, the IBM Entity Analytics group is developing a new capability. It’s integrating SPSS, an IBM predictive analytics pattern-discovery system, into a new analytics engine, codenamed G2, that will make SPSS entity analytics capabilities available to a much wider group of customers, according to Jonas.
“In the IBM big data group, we’ve got a handful of really, really cool things,” Jonas explained. “InfoSphere Stream is truly unique. It’s meant for [streaming] a million things a second and moving big data around. We have the expert counting engine for turning puzzle pieces into puzzles that I’m doing, and then we have the Hadoop platform for deep discovery and deep introspection after the fact. It’s going to be the combination of these things together—deeply integrated—that is going to give IBM a pretty unique story.”
And IBM hasn’t shirked big data internally. In 2009, it launched Blue Insight, its internal analytics cloud that gathers information from nearly 100 information warehouses and data stores, providing analytics on more than a petabyte of data. More than 200,000 IBMers had access to the new system upon launch, including sales, product development and manufacturing process engineers. Blue Insight served as a template for IBM Smart Analytics Cloud, the private analytics cloud offering for enterprises.
Big Data Analytics and SHARE
Helping IBM and other companies understand what users need to exploit new technology and developments, like big data, is a major focus for SHARE. Throughout its history, when new technologies are introduced, this volunteer-run association for IT professionals has sponsored conferences and published educational materials. SHARE’s efforts in 2012 will include the development of task forces and discussion groups to provide a platform where users of IT can gather to explore these issues and share experiences on big data and big analytics. SHARE’s annual conferences are a major forum for discussion on these and other key enterprise IT topics.
From Mainframe to the Cloud
While conventional wisdom long has held mainframe computing to be the province of big enterprise and government workloads, new pricing structures and old-world processing power are combining to bring the mainframe into the world of big data analytics, according to researchers.
“No other platform can offer predictable response time, uptime and security to tens of thousands of concurrent users like a mainframe,” said Neil Raden, vice president and principal analyst at Constellation Research Group, who focuses on analytics and business intelligence (BI). “When you consider the fact that big data represents a sort of brutal convergence of analytical and operational processing, there will be a multitude of situations where nothing else will suffice. Big data could prove to be a big renaissance for the mainframe. Given the prevailing idea that tons of cheap hardware in a cluster solves all big data problems, this will come as quite a surprise to many.”
In an October 2011 research report by Clabby Analytics and Enterprise Computing Advisors, the authors estimated that 40 percent of the current IBM System z customer base is either piloting or has already deployed new workloads on its mainframes, including batch/transactional, Java/Linux and analytics workloads.
According to the report, titled “Choosing IBM zEnterprise for Next Gen Business Analytics Applications,” there are two main reasons customers are deploying new workloads on System z platforms. The first is superior economics, as one mainframe can cost less to acquire and operate than a group of servers handling the same work. The second reason is the platform’s processing strength.
“We both [Enterprise Computing Advisors and Clabby Analytics] were conducting research into big data, analytics and mainframes to discover the intersection point,” said Joe Clabby, president of Clabby Analytics. “And we both concluded that if you have very large databases and you have to do a lot of heavy I/O—and you’re looking for security and resiliency and high availability—and you start looking at x86 architectures and midrange architectures versus mainframes, what you’re going to find is mainframes are very well suited for very large databases, and very large user populations.”
IBM, in fact, deployed its massive Blue Insight private cloud on System z hardware, using Cognos 8 BI.
While it doesn’t have quite the history of mainframe computing, cloud computing has come into its own over the past decade. Companies like Salesforce.com have made software as a service (SaaS) a common term, and the more esoteric infrastructure as a service (IaaS) and platform as a service (PaaS) concepts have taken hold, too. (Analytic clouds are still gaining traction.)
In its big data analytics report, The Data Warehousing Institute (TDWI) found that BI professionals consistently prefer private clouds over public ones, especially for BI, data warehousing and analytics. “This helps explain why public clouds have the weakest commitment,” writes TDWI’s director of research, Philip Russom. “The preference for private clouds is mostly due to paranoia over data security and governance. Even so, some organizations experiment with analytic tools and databases on a public cloud, then move them onto a private cloud once they decide analytics is mission critical.”
What to Do
With all the methodologies and technologies available to help organizations process and understand big data analytics—and ask the right questions—where should one start?
Jonas says organizations frequently miss the straightforward step of counting as they try to spot patterns by analyzing outliers and anomalies in big data. “If you think you have three facts about three different things or three different people, and it’s really three facts about the same person—if you can’t count that, you can’t see the vector,” Jonas says. “You can’t see how fast it’s going. This is being missed by a lot of organizations because they’re bringing their data together in piles, but they’re not counting it.”
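To make the counting idea concrete, here is a minimal Python sketch. It is not IBM's entity analytics engine; it is a deliberately simplistic illustration that assumes two records describe the same person whenever they share an exact email address or phone number. The function name `count_entities`, the `match_keys` parameter and the sample records are all hypothetical.

```python
def count_entities(records, match_keys=("email", "phone")):
    """Count distinct entities among records (dicts of string values).

    Assumption for illustration only: two records belong to the same
    entity if they share a value for any attribute in match_keys.
    """
    # Union-find structure: parent[i] points toward the group root of record i.
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (attribute, normalized value) -> index of first record
    for idx, record in enumerate(records):
        for key in match_keys:
            value = record.get(key)
            if value is None:
                continue
            token = (key, value.strip().lower())
            if token in first_seen:
                union(first_seen[token], idx)
            else:
                first_seen[token] = idx

    return len({find(i) for i in range(len(records))})


# Hypothetical sample: three facts that look like three people.
records = [
    {"name": "Pat Smith", "email": "pat@example.com"},
    {"name": "Patricia Smith", "email": "PAT@example.com", "phone": "555-0100"},
    {"name": "P. Smith", "phone": "555-0100"},
]

print(len(records), count_entities(records))  # 3 1
```

A pile of three records becomes a count of one person once the shared email and phone number are taken into account. Real entity resolution has to cope with misspellings, conflicting values and shared identifiers, which this sketch ignores.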
Clabby Analytics Research Analyst Jane Clabby suggests that IT organizations start with a small pilot project and learn from there. There must be buy-in from business, IT and executives for projects to be successful. There also must be an organizational structure that enables the use of big data.
“You have to have a plan,” she says. “You start small with a project—we’re having an issue with fraudulent claims—and get started with that small problem. Tackle that and then branch out into other areas.”
It’s advice that brings the complexity of big data analytics down to its simplest level.
Renee Boucher Ferguson has more than 10 years’ experience covering enterprise tech (including ERP, data analytics, cloud computing, infrastructure, storage and mobile platforms) as both a technology journalist and senior research analyst.