In part one of the Capitalogix data science story, I focused on their strategic need for a data platform that supports speed, data variety and custom-built algorithms to find advantages for their business. A key success driver: they worked to make life better for the people on the front lines of delivering insights—their data scientists.
Capitalogix looks like a financial trading house, but it only has two traders on staff. Everyone else is a data scientist. Their search for better data systems drove their multiple data platform transformations: from Microsoft SQL Server to Netezza to the new IBM Integrated Analytics System (IAS) appliance.
A company built on data science has some strong ideas about what makes an effective data platform. In multiple interviews with key personnel, the following features stood out.
Cloud-readiness
An in-house appliance made sense for Capitalogix as the foundation for their data platform. They also wanted to retain the option to use cloud resources for backup and to shift workloads to the cloud if needed. For all its strengths, Netezza was, in the words of Capitalogix CEO Howard Getson, “an island.” Migration from Netezza to IAS was easy and gave them what they were looking for: an appliance, tuned for maximum performance, with built-in cloud readiness so that new environments can be created and workloads can be moved without changing the code base.
Chris Jordan, CEO of iOLAP, a Capitalogix IT partner, explained that cloud compatibility is also important because when data scientists develop new algorithms they’re “not having to decide how to deploy, […] on-premises or in the cloud. We create the solution once, and we can deploy it wherever we want to.” It’s easy to spin up a new environment to create and test a new model and know that it will work the same in production. And right now, “IBM is the only provider that gives you the flexibility to go between cloud and on-premise on the same code base,” said Jordan.
Built-in machine learning and AI for better model development
Machine-led improvement to algorithms. With automated systems making high-value trades at superhuman speed, accuracy is paramount. The algorithms and models need to start out robust and to improve over time. When direct human intervention is needed to assess and improve these models, that human element is a major bottleneck in the speed of improvement.
Jordan said, “Capitalogix is now using artificial intelligence to create machine learning. These are really advanced processes and the platform that they're developing on is a big part of the reason why they're able to do it.”
Data cleanup at machine speed. Real-world data often isn’t clean. Transcription errors, format mismatches, corrupted files and more can affect workflow. Capitalogix soon learned that a model that performs brilliantly in a test environment can fail when fed with real-world data. However, with machine learning and AI, Capitalogix systems can detect and correct for errors so that the algorithms can work as expected. And these error-detection functions can be done by intelligent machine systems, at machine speed.
Getson agrees: “If technology can start to figure out what it was doing wrong and help correct those errors, we have to get those systems into production faster than ever.” Machine learning and AI are essential because “we can't afford to be wrong at light speed.”
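To make the data cleanup idea concrete, here is a minimal Python sketch of automated error detection on incoming price records. It is not Capitalogix's method: the article describes machine learning and AI, while this stand-in uses simple fixed rules (format normalization plus a median-based outlier check). The function names, the date formats and the 50 percent deviation threshold are all assumptions made for illustration.

```python
from datetime import datetime
from statistics import median

# Hypothetical date formats that a mixed data feed might contain.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def parse_date(text):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_price(text):
    """Return a float, tolerating a dollar sign and thousands separators."""
    try:
        return float(text.replace("$", "").replace(",", "").strip())
    except ValueError:
        return None


def clean_records(rows, max_deviation=0.5):
    """Split (date, price) string pairs into (clean, rejected) lists.

    A row is rejected when a field cannot be parsed, or when its price
    differs from the median price by more than max_deviation (a fraction).
    """
    parsed, rejected = [], []
    for date_text, price_text in rows:
        date, price = parse_date(date_text), parse_price(price_text)
        if date is None or price is None:
            rejected.append((date_text, price_text, "unparseable"))
        else:
            parsed.append((date, price, (date_text, price_text)))
    if not parsed:
        return [], rejected

    mid = median(price for _, price, _ in parsed)
    clean = []
    for date, price, raw in parsed:
        if mid and abs(price - mid) / mid > max_deviation:
            rejected.append((*raw, "outlier"))
        else:
            clean.append((date, price))
    return clean, rejected


# Example feed with mixed formats, a bad value and a misplaced separator.
rows = [
    ("2018-03-01", "101.50"),
    ("03/02/2018", "$102.25"),
    ("20180305", "1,0150"),
    ("2018-03-06", "n/a"),
    ("not a date", "100.00"),
    ("2018-03-07", "99.75"),
]
clean, rejected = clean_records(rows)
```

In this example the three well-formed rows come back normalized to ISO dates and floats, while the unparseable rows and the mistyped price are set aside with a reason attached. A learned model would replace the fixed threshold, but the shape of the pipeline (normalize, validate, quarantine) stays the same.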
The tools that data scientists want. An embedded data science environment, including tools like Watson Studio and Spark, is all about making life easier for the highly-skilled—and highly-sought-after—data scientists who make up the bulk of the Capitalogix team.
Jordan explained the workflow advantages in a nutshell: “Having the data science environment right there and embedded on IAS enables the data scientists at Capitalogix to experiment. They don't have to move data around from environment to environment, to go build a new model, to test a new model, to back test […it’s very valuable] to be able to quickly and easily create a new model, test it and see if it has validity.” The payoff is quicker results and more agile responses to the market.
John Redding, database specialist at Capitalogix, provided the data scientist’s perspective: “A lot of our data scientists are really excited about the direct integration of Spark into the IAS appliance [because] it allows massively parallel application and allows things to be run much quicker. It also allows other languages [like] Python, Scala, Java or R to be run natively inside the box as opposed to having to pull the data out to another appliance.” With everything in one place, the data scientists can spend “90 percent of their time working on models” rather than searching for data.
At a practical level, this data scientist-friendly system helps make Capitalogix a better place for data scientists to work. In general, these pros want to use the platform in a manner they’re familiar with. Getson explains that in his experience, data scientists are purists. They don’t want to have to learn new languages or new standards to do their work and they definitely don’t want him to “dictate or constrain” what they can do in the workplace.
The IAS “gives the data scientists access to tools that they want to use on a day-to-day basis,” he said. “[Nowadays] it's a competitive advantage to be able to recruit people and say if you want to use Python, if you want to use R, of course we support that, and still have it be part of our base system.”
Data has value
Capitalogix’s experience is a terrific illustration of the idea that data has value. They’re dedicated to combining data from different sources, using it to understand economic conditions—and then moving in the market to potentially profit from opportunities that others can’t see. To achieve this, they need a robust, fast, flexible data platform. They need cloud readiness, built-in data science tools, machine learning and AI. They found all this in one place: the IBM Integrated Analytics System.
Now that you’ve heard from Capitalogix about their own transformation, see what industry analysts are saying about appliances in IDC’s report “Delivering Hybrid Analytics at the Speed of Business.” Then reach out to schedule a no-cost, one-on-one consultation with one of our data warehouse appliance experts to discuss any questions about Capitalogix, IAS, or how an appliance could benefit your business.