Machine Learning Blueprint Newsletter, Edition 22, 5/13/18

Wed June 19, 2019 03:50 PM

This newsletter is written and curated by Mike Tamir and Mike Mansour. 

May 13, 2018

Hi all,
Hope you enjoy this week's ML Blueprint. This week is brought to you by fastdata.io.

Spotlight Articles
OpenAI released an analysis showing that the amount of compute used in the largest AI training runs since 2012 has grown exponentially with a 3.5-month doubling time, far quicker than the 18-month doubling period of Moore’s Law. Since advances in compute have been one of the factors contributing to AI progress, they suggest that it’s worth preparing for the implications of systems far outside today’s capabilities.
Machine Learning Blueprint's Take
The 3.5-month doubling time is remarkable. However, there are several factors to consider when looking at these results. First, it is not clear that a snapshot of the past 5.5 years is representative of future growth rates. While it is almost always safe to say that past trends do not guarantee future ones, the past half decade saw a turning point in the use of Deep Learning algorithms, which may indicate a phase change rather than a continuing trend. Second, as industry use of Deep Learning has matured, techniques like transfer learning often allow us to sidestep extensive “AlphaGo Zero like” training from scratch. This suggests that while certain training use cases will see significant spikes in compute, these needs might not be uniform across industry applications, especially if the trend of releasing more and more pretrained models for less computationally expensive fine-tuning continues.
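To make the gap between the two doubling times concrete, here is a small back-of-the-envelope sketch. It assumes a perfectly constant doubling time over the whole 5.5-year window, which is an idealization for illustration and not how OpenAI derived its figures; the function and variable names are our own.

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Multiplicative growth after `months`, assuming a fixed doubling time."""
    return 2.0 ** (months / doubling_months)


# The 5.5-year snapshot discussed above, expressed in months.
WINDOW_MONTHS = 5.5 * 12

# Assumption: the doubling time stays constant across the whole window.
ai_compute_growth = growth_factor(WINDOW_MONTHS, 3.5)
moore_growth = growth_factor(WINDOW_MONTHS, 18.0)

print(f"3.5-month doubling over 5.5 years: ~{ai_compute_growth:,.0f}x")
print(f"18-month doubling over 5.5 years:  ~{moore_growth:,.1f}x")
```

Under these assumptions the first figure lands in the hundreds of thousands while the second is a little under 13, which is why the choice of window and the "phase change" caveat matter so much when extrapolating.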
In light of France’s, China’s and the EU’s recently announced plans for nationally prioritizing AI research and development, the American administration revealed its intentions by inviting practitioners to a meeting to forge a plan. There, they discussed giving certain companies access to select government datasets and creating a committee to help government agencies use ML technologies. Keeping “America First” is also a priority, so they’re considering retraining programs for the eventual labor displacement, and investing $200M in STEM programs to train future generations.
Machine Learning Blueprint's Take
While any progress toward keeping the United States competitive in the future of AI is welcome, the plan lacks substance in comparison to other nations’ plans. There’s no mention of funding agendas or, importantly, of councils on the ethical and social implications of the technology. In fact, there is no formalized plan to unify around. Furthermore, the $200M earmarked for STEM programs is a pitifully small sum that should be greatly increased. Lastly, the “select” access to government data might become highly political and prone to favoritism depending on which companies are granted access. It’d be hard to see Trump granting Amazon special access to data in this current climate.

Learning Machine Learning
StitchFix’s Brad Klingenberg provides a friendly yet mathematically precise review of the False Discovery Rate problem in traditional significance testing. A great read for data scientists not familiar with the impact of statistical power on experimental inference and the difference between type 1 and type 2 hypothesis rejection errors.
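As a rough illustration of why power matters here, the sketch below computes the expected share of "significant" results that are false discoveries under a simple model: each tested hypothesis is a real effect with some prior probability, real effects are detected with probability equal to the power, and nulls are rejected with probability alpha. The specific numbers (alpha of 0.05, a 10% prior, power of 0.8 versus 0.2) are our own assumptions for illustration and are not taken from the article.

```python
def false_discovery_rate(alpha: float, power: float, prior_true: float) -> float:
    """Expected fraction of rejected nulls that are false positives.

    alpha:      type 1 error rate (chance of rejecting a true null)
    power:      1 - type 2 error rate (chance of detecting a real effect)
    prior_true: assumed fraction of tested hypotheses with a real effect
    """
    false_positives = alpha * (1.0 - prior_true)
    true_positives = power * prior_true
    return false_positives / (false_positives + true_positives)


# Hypothetical scenario: only 10% of tested ideas have a real effect.
well_powered = false_discovery_rate(alpha=0.05, power=0.8, prior_true=0.1)
under_powered = false_discovery_rate(alpha=0.05, power=0.2, prior_true=0.1)

print(f"FDR at 80% power: {well_powered:.2f}")
print(f"FDR at 20% power: {under_powered:.2f}")
```

Even with the same significance threshold, the under-powered experiment yields a noticeably larger share of false discoveries, because fewer true effects are found to offset the fixed rate of false positives.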
While actually somewhat light on the technical details, this article outlines how Cambridge Analytica collected data, what the feature-set looked like, and what their predictors were for psychologically profiling users for effective political targeting.
More than just your run-of-the-mill Flappy Bird showcase, this author explains how to train and run an agent against the actual browser output by reading the canvas, converting it to an image, and sending commands over WebSockets. They implement a Deep Q-Network using a combination of JS, Python and TF.
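For readers new to Deep Q-Networks, the core idea is the one-step temporal-difference target that the network's prediction is regressed toward. The sketch below shows only that target computation in plain Python; it is not the author's code, and the two-action layout, reward values and discount factor are hypothetical choices for illustration.

```python
from typing import Sequence


def td_target(reward: float, next_q_values: Sequence[float],
              done: bool, gamma: float = 0.99) -> float:
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    When the episode has ended (`done`), there is no future value to
    bootstrap from, so the target is just the immediate reward.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# Hypothetical Q-value estimates for two actions, e.g. [flap, do nothing].
next_q = [1.0, 2.0]

# Bird survives this frame: small reward plus discounted best future value.
alive = td_target(reward=0.1, next_q_values=next_q, done=False, gamma=0.9)

# Bird crashes: the target is only the terminal penalty.
crashed = td_target(reward=-1.0, next_q_values=next_q, done=True, gamma=0.9)

print(alive, crashed)
```

In a full DQN the `next_q_values` would come from a neural network evaluated on the next frame (often a separate, slowly updated target network), and the loss would be the squared difference between this target and the predicted Q-value for the action actually taken.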
An alternative to Matplotlib with an arguably easier-to-understand API, Altair lets you make interactive visualizations in your notebooks. This Jupyter notebook series walks through getting started and beyond.

Machine Learning News
A point of contention with current benchmarks and performance claims is that the hardware infrastructure has a large impact on final metrics; on top of that, many chip manufacturers are pushing algorithm research and then making claims based on their proprietary hardware. MLPerf is meant to change that by making final metrics agnostic to hardware implementations. It also seems that it might be useful for comparing hardware vendors.
Machine Learning Blueprint's Take
The number of available benchmarks has been growing, each with its own specialties. It’s unlikely we’ll ever see “one benchmark to rule them all”, and having too many benchmarks might become confusing, as it opens up a new domain that businesses must understand before making choices. Might we see a new kind of domain expert who specializes in just comparing hardware and algorithms? A Gartner for algorithms.
Sponsored Content
Traditional storage systems that feed the GPU servers for machine learning workloads can be too slow or have insufficient throughput to keep pace with the GPU, resulting in GPU starvation. In a side-by-side comparison, WekaIO Matrix™ delivers 50% more performance, with 66% fewer storage nodes in 44% less space and at 50% of the cost of Pure Storage Flashblade.
Long QT Syndrome is a scary heart condition that can be sniffed out by pre-screening with EKGs and treated with medication instead of emergency procedures. AliveCor has created a deep learning solution that performs this sensing using just a single sensor lead that could fit on an Apple Watch band, and demonstrated it in a small 2,000-sample trial.
Machine Learning Blueprint's Take
It is surprising that they were able to get a reliable model with just 2,000 training examples (note: no FDA approval yet). Pharmaceutical trials don’t usually have many trial subjects, but with ML that might be different. The FDA should create an ML expert board to evaluate AI-powered medical advancements, both to maintain device reliability and to allow progress to move forward efficiently.
A new adversarial attack is able to embed wake words and instructions for popular home assistants in arbitrary audio samples. While the research paper shows that it might be hard to hijack an existing popular song to do this, the attack method and the machine learning behind it are pretty interesting.
A large collection (~6000) of knitting patterns in the form of textual instructions was scraped from the web and fed into a deep net. The network then generated sequences of knitting instructions for novel patterns, which human knitters carried out. The resulting instructions were admittedly difficult for the humans to follow and produced some alien-like patterns. See the full collection of knits here.
Interesting Research
Using Shapley values from game theory, researchers have developed a method to attribute credit to each feature of an observation in a classification model. What’s unique about this, compared to other methods like LIME, is that it also works on non-linear models. You can get a better understanding of Shapley values here, and this blog also explains the overall concept of the paper.
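For intuition, the textbook definition of a Shapley value can be computed exactly for a tiny model by averaging each feature's marginal contribution over every ordering of the features. The sketch below does that for a made-up two-feature non-linear function. It is not the paper's method (exact enumeration does not scale beyond a handful of features), and the toy model, the input values and the choice to "switch off" a feature by setting it to a baseline of 0 are all our own assumptions.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(players: Sequence[str],
                   value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition: FrozenSet[str] = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


# Hypothetical non-linear model with an interaction term.
def model(a: float, b: float) -> float:
    return 2 * a + a * b


x = {"a": 1.0, "b": 3.0}         # observation to explain (made up)
baseline = {"a": 0.0, "b": 0.0}  # assumed "feature absent" values


def coalition_value(present: FrozenSet[str]) -> float:
    """Model output with absent features replaced by their baseline."""
    inputs = {k: (x[k] if k in present else baseline[k]) for k in x}
    return model(**inputs)


phi = shapley_values(list(x), coalition_value)
print(phi)
```

The attributions sum to the difference between the model's output on the observation and on the baseline, which is the property that makes Shapley values attractive for assigning credit to features.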


