Global AI and Data Science

Wed June 19, 2019 03:50 PM

Michael Tamir

This newsletter is written and curated by Mike Tamir and Mike Mansour.

May 13, 2018

Hi all,

Hope you enjoy this week's ML Blueprint. This week is brought to you by fastdata.io.

Spotlight Articles

OpenAI Tracks Compute Power Used in ML Training Runs - It’s Doubling Faster than Moore’s Law

OpenAI released an analysis showing that the amount of compute used in the largest AI training runs since 2012 has grown exponentially with a 3.5-month doubling time, far quicker than Moore’s 18-month doubling period. Since advances in compute have been one of the factors driving AI progress, they suggest that it’s worth preparing for the implications of systems far beyond today’s capabilities.
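This back-of-the-envelope sketch shows how quickly the two doubling times diverge. It assumes idealized, uninterrupted exponential growth over a 5.5-year window. It only illustrates the compounding arithmetic and does not reproduce OpenAI's measured data.

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Total growth after `months` of idealized exponential growth with the given doubling time."""
    return 2 ** (months / doubling_months)


span_months = 5.5 * 12  # a 5.5-year window, i.e. 66 months

# Compare the reported 3.5-month doubling with Moore's 18-month doubling.
ai_compute = growth_factor(span_months, 3.5)
moores_law = growth_factor(span_months, 18)

print(f"3.5-month doubling over {span_months:.0f} months: {ai_compute:,.0f}x")
print(f"18-month doubling over {span_months:.0f} months: {moores_law:,.1f}x")
```

Under these assumptions the gap is roughly 475,000x versus roughly 13x. A difference of a few months in the doubling time matters a great deal once it compounds.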

Machine Learning Blueprint's Take

The 3.5-month doubling time is remarkable. However, there are several factors to consider when looking at these results. First, it is not clear that a snapshot of the past 5.5 years is representative of future growth rates. While it is almost always safe to say that past trends do not guarantee future ones, the past half decade has seen a turning point in how Deep Learning algorithms are leveraged, which may indicate a phase change rather than a continuing trend. Second, as industry use of Deep Learning has matured, techniques like transfer learning often let us sidestep extensive “AlphaGo Zero-like” training from scratch. This suggests that while certain training use cases will see significant spikes in compute, these needs may not be uniform across industry applications, especially if the trend of releasing more and more pretrained models for less computationally expensive fine-tuning continues.

[Link]

Trump Administration Finally Plays Catch Up on the AI-Front

In light of France’s, China’s, and the EU’s recently announced plans to prioritize AI research and development at the national level, the American administration reveals its intentions by inviting practitioners to a meeting to forge a plan. There, they discussed giving certain companies access to select government datasets and creating a committee to help government agencies use ML technologies. Keeping “America First” is also a priority, so they’re considering retraining programs for the eventual labor displacement and investing $200M in STEM programs to train future generations.

Machine Learning Blueprint's Take

While any progress toward keeping the United States competitive in AI is welcome, the plan lacks substance compared to other nations’ plans. There’s no mention of funding agendas or, importantly, of councils on the ethical and social implications of the technology. In fact, there is no formalized plan to unify around. Furthermore, the $200M investment in STEM programs is a pitifully small sum that should be greatly increased. Lastly, the “select” access to government data might become highly political and prone to favoritism, depending on which companies are granted access. It’s hard to imagine Trump granting Amazon special access to data in the current climate.

[Link]

Learning Machine Learning

Statistical Power

StitchFix’s Brad Klingenberg provides a friendly yet mathematically precise review of the False Discovery Rate problem in traditional significance testing. A great read for data scientists who are not familiar with the impact of statistical power on experimental inference, or with the difference between Type I (false positive) and Type II (false negative) errors.
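The textbook calculation below makes the link between power and false discoveries concrete. It is a generic sketch and not necessarily the formulation used in the linked post. Two choices in it are illustrative assumptions: `prior_true` (the assumed fraction of tested hypotheses that are real effects) and the one-sided z-test used to compute power.

```python
from statistics import NormalDist


def z_test_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Power of a one-sided, one-sample z-test for a standardized effect size (delta / sigma)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # rejection threshold under the null
    # Under the alternative, the test statistic is shifted by effect_size * sqrt(n).
    return 1 - std_normal.cdf(z_crit - effect_size * n ** 0.5)


def false_discovery_rate(prior_true: float, power: float, alpha: float = 0.05) -> float:
    """Expected share of 'significant' results that are actually false positives."""
    true_positives = power * prior_true          # real effects correctly detected
    false_positives = alpha * (1 - prior_true)   # null effects passing by chance (Type I errors)
    return false_positives / (true_positives + false_positives)


# Same effect size and alpha, different sample sizes: low power inflates the FDR.
for n in (20, 200):
    power = z_test_power(effect_size=0.2, n=n)
    fdr = false_discovery_rate(prior_true=0.1, power=power)
    print(f"n={n}: power={power:.2f}, FDR={fdr:.2f}")
```

Suppose 10% of hypotheses are true and alpha = 0.05. At 80% power the FDR is 0.36. At 20% power it rises to about 0.69, so more than two-thirds of the "discoveries" would be false.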

[Link]

The Machine Learning Behind the Cambridge Analytica Scandal

While somewhat light on technical details, this article outlines how Cambridge Analytica collected data, what its feature set looked like, and which predictors it used to psychologically profile users for effective political targeting.

Playing the Chrome 404 T-Rex Game with Reinforcement Learning, But Really End-to-End

More than just your run-of-the-mill Flappy Bird showcase, this article explains how to train and run an agent against the actual browser output by reading the canvas, converting it to an image, and sending commands over WebSockets. The author implements a Deep Q-Network using a combination of JavaScript, Python, and TensorFlow.
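A Deep Q-Network has two small pieces at its core. One is epsilon-greedy action selection. The other is the one-step temporal-difference target, y = r + gamma * max_a' Q(s', a'), which the network is regressed toward. The sketch below shows only those pieces in plain Python. It is a generic illustration, not the author's code. It assumes the network has already produced Q-values as lists indexed by action (for example, 0 = do nothing, 1 = jump; this action mapping is hypothetical).

```python
import random
from typing import List, Optional, Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: Optional[random.Random] = None) -> int:
    """Explore with probability epsilon, otherwise pick the action with the highest Q-value."""
    rng = rng or random.Random()
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # random exploratory action
    return max(range(len(q_values)), key=lambda a: q_values[a])  # greedy action


def dqn_targets(rewards: Sequence[float], next_q_values: Sequence[Sequence[float]],
                dones: Sequence[bool], gamma: float = 0.99) -> List[float]:
    """One-step TD targets: r if the episode ended, else r + gamma * max_a' Q(s', a')."""
    return [r if done else r + gamma * max(q_next)
            for r, q_next, done in zip(rewards, next_q_values, dones)]


def td_loss(q_taken: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between Q(s, a) for the actions taken and their targets."""
    return sum((q - y) ** 2 for q, y in zip(q_taken, targets)) / len(targets)


# A two-transition batch: the second transition ends the episode (the T-Rex crashed).
targets = dqn_targets(rewards=[0.1, -1.0],
                      next_q_values=[[0.5, 2.0], [1.0, 1.0]],
                      dones=[False, True])
print(targets)
print(td_loss(q_taken=[2.0, -0.5], targets=targets))
```

A full DQN usually takes the next-state Q-values from a periodically updated target network and samples transitions from a replay buffer. Both are omitted here to keep the focus on the target computation.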

[Link]

Statistical Visualization Library Guide with Altair and Jupyter Notebooks

An alternative to Matplotlib with an arguably easier-to-understand API, Altair lets you make interactive visualizations in your notebooks. This Jupyter notebook series walks through getting started and beyond.

[Link]

Moving From Model-Free to Model-Based Deep Reinforcement Learning

Machine Learning News

MLPerf: New Benchmark for Measuring Algorithm Performance Given Certain Hardware

A point of contention with current benchmarks and performance claims is that hardware infrastructure has a large impact on final metrics. On top of that, many chip manufacturers push algorithm research and then make performance claims based on their proprietary hardware. MLPerf aims to change that by making final metrics agnostic to hardware implementations. It also looks like it could be useful for comparing hardware vendors.

Machine Learning Blueprint's Take

The number of available benchmarks has been growing, each with its own specialty. It’s unlikely we’ll ever see “one benchmark to rule them all”, and having too many benchmarks could become confusing, since it adds yet another domain that businesses must understand before making choices. Might we see a new area of expertise arise among those who specialize purely in comparing hardware and algorithms? A Gartner for algorithms, perhaps.

[Link]

Faster GPU Solutions for a Wide Variety of Algorithms

Sponsored Content

Traditional storage systems that feed GPU servers for machine learning workloads can be too slow, or have insufficient throughput, to keep pace with the GPUs, resulting in GPU starvation. In a side-by-side comparison, WekaIO Matrix™ delivers 50% more performance with 66% fewer storage nodes, in 44% less space, and at 50% of the cost of Pure Storage Flashblade.

[Link]

Using Apple Watch + Algorithms to Predict Massive Heart Failures

Long QT Syndrome is a scary heart condition that can be sniffed out by pre-screening with EKGs and then treated with medication instead of emergency procedures. AliveCor has created a deep learning solution that performs this detection using just a single sensor lead that could fit on an Apple Watch band, and demonstrated it in a small 2,000-sample trial.

Machine Learning Blueprint's Take

It is surprising that they were able to build a reliable model with just 2,000 training examples (note: there is no FDA approval yet). Pharmaceutical trials don’t usually have very many subjects, but with ML that might change. The FDA should begin forming a board of ML experts to evaluate AI-powered medical advances, both to maintain device reliability and to allow forward progress efficiently.

[Link]

Hidden Sound Attacks Can Wake Your Digital Assistant

A new adversarial attack can embed wake words and instructions for popular home assistants in arbitrary audio samples. While the research paper shows that it might be hard to hijack an existing popular song this way, the attack method and the machine learning behind it are pretty interesting.

[Link]

Deep Learning-Powered Knitting

A large collection (~6,000) of knitting patterns written as textual instructions was scraped from the web and fed into a deep net. The network then generated sequences of knitting instructions for novel patterns, which human knitters carried out. The resulting instructions were admittedly difficult for the humans and produced some alien-looking patterns. See the full collection of knits here.

[Link]

Neural Nets in Artificial Agents Learn to Navigate with Grid-like Representations

Interesting Research

Per-Observation Feature Importances of Non-Linear Models with Shapley Additive Explanations

Using Shapley values from game theory, researchers have developed a method to attribute credit to each feature of an individual observation in a classification model. Compared to other methods like LIME, what’s unique about this approach is that it works on non-linear models while keeping the game-theoretic guarantees of Shapley values, such as the attributions summing to the difference between the model’s prediction and a baseline prediction. You can get a better understanding of Shapley values here, and this blog also explains the overall concept of the paper.
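To make the idea concrete, the sketch below computes exact Shapley values for a tiny non-linear model by enumerating every coalition of features. It is not the SHAP library's algorithm. It assumes that a feature "missing" from a coalition is replaced by a baseline value. Its cost also grows exponentially with the number of features, which is why practical methods approximate it. The toy model and inputs are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(f: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley attributions for one observation x, relative to a baseline input."""
    n = len(x)

    def value(coalition: set) -> float:
        # Features in the coalition take their observed values; the rest use the baseline.
        return f([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))  # marginal contribution of i
        phi.append(total)
    return phi


def model(z: Sequence[float]) -> float:
    """A toy non-linear model: an interaction term plus an additive term."""
    return z[0] * z[1] + z[2]


x, base = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(model, x, base)
print(phi)
print(sum(phi), model(x) - model(base))  # attributions sum to the prediction gap
```

For this model, the interaction term's contribution of 2 is split evenly (1 each) between the first two features, and the additive term gets 3. The three values sum to 5, the gap between the prediction and the baseline prediction.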

[Link]

Learning to See in the Dark: Boosting Low Light Performance with Algorithmic Amplification


Machine Learning Blueprint Newsletter, Edition 22, 5/13/18
