Machine Learning Blueprint Newsletter, Edition 11, 12/17/17


December 17, 2017

Hi *|FNAME|*,
Happy New Year from MLBP! Here are the greatest hits of 2017. Please forward this to your friends and help us grow our 12k-person audience as we enter the new year!
2017 Machine Learning Trends & Greatest Hits
2017 was a big year for advances in deep learning, from visual reasoning and the promise of Hinton's new capsule network techniques in computer vision to continued acceleration in text applications.
As adoption of deep learning in industry has taken hold, the landscape of open source deep learning frameworks started to crystallize in 2017. Amazon, Microsoft, and closely affiliated Facebook projects like PyTorch have banded together around the ONNX model-interchange format, challenging the dominance of the Google TensorFlow ecosystem. The popularity of PyTorch in particular has been a good thing, not just for developers, but also in spurring TensorFlow to break out of its static-graph paradigm and offer dynamic (eager) execution options.
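To illustrate what "dynamic execution" buys developers, here is a minimal NumPy sketch (function names and shapes are illustrative, not from any framework): in a define-by-run style, ordinary Python control flow shapes the computation at runtime, which a static graph must express with special graph ops.

```python
import numpy as np

def dynamic_forward(x, w, n_steps):
    # Define-by-run (eager/PyTorch-style) execution: the number of matrix
    # multiplies depends on runtime data, with no graph compiled in advance.
    h = x
    for _ in range(n_steps):          # data-dependent loop length
        h = np.tanh(h @ w)            # one "layer" per step
    return h

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
x = rng.normal(size=(1, 4))

# The same function handles different depths without rebuilding anything,
# which is awkward to express in a static-graph runtime.
shallow = dynamic_forward(x, w, 1)
deep = dynamic_forward(x, w, 5)
```

The point is not the arithmetic but the control flow: in a static-graph framework the loop itself would have to be encoded into the graph ahead of time.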
We have also seen new offerings from a number of public cloud providers for GPUs targeted at deep learning applications. This helped fuel the growth of deep learning exploration and productization, since developers no longer need to invest in hardware themselves, or can simply use an API for access to a pre-trained net.
Interestingly, NVIDIA's new EULA prohibits its consumer GPUs from being used in data-center and public cloud infrastructure. At best this could be a temporary roadblock for AWS's ambitions. Perhaps it hints at NVIDIA launching its own GPU public cloud offering in 2018, and will push developers towards the more interoperable ONNX ecosystem, which need not rely on CUDA?
What stood out here was the sunset of Theano, the deep learning framework. Its obsolescence, marked by the retirement of active development, is not entirely bad news: it was born and maintained in academia before commercial players got involved. Google and others taking the lead on open source deep learning frameworks signals that industry is taking deep learning much more seriously, as industry does not generally rely on academic-grade tools.
Adversarial ML showed us that we may not understand the tools currently being built and deployed in the wild as deeply as previously thought. Several defense mechanisms were published, but they were frequently subverted soon after. While the threat may be overplayed, since adversarial attacks appear most effective when the attacker has access to the model weights (the white-box setting), it cannot be ignored, and will hopefully push us towards deeper understanding as the best defense.
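To make the white-box point concrete, here is a minimal NumPy sketch of the fast gradient sign method (FGSM), one of the best-known attacks, run against a toy logistic model. The weights, input, and epsilon are made up for illustration; real attacks target deep networks via autodiff.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, eps):
    """Fast Gradient Sign Method against a toy logistic-regression model.

    The gradient of the logistic loss -log(sigmoid(y * w.x)) with respect
    to the input x is -y * sigmoid(-y * w.x) * w; stepping in the direction
    of its sign increases the loss. Note the attack needs w: this is the
    white-box setting mentioned above.
    """
    margin = y * np.dot(w, x)
    grad_x = -y * sigmoid(-margin) * w
    return x + eps * np.sign(grad_x)

w = np.array([1.0, -2.0, 0.5])      # known model weights (white-box)
x = np.array([0.3, -0.4, 1.0])      # a correctly classified input
y = 1.0                             # its true label

x_adv = fgsm(x, y, w, eps=0.5)
p_clean = sigmoid(y * np.dot(w, x))      # confidence on the clean input
p_adv = sigmoid(y * np.dot(w, x_adv))    # confidence after perturbation
```

A small, bounded perturbation of the input is enough to flip the model's decision, which is exactly why published defenses kept getting subverted.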

The Power of Seq2Seq in 2017
While the core techniques were explored in earlier years, 2017 was a big year for attention-enhanced LSTM encoder-decoder methods across a multitude of impressive use cases: abstractive text summarization (and enrichment), emotional chatting machines, machine translation, and even applications in organic chemistry (a NIPS 2017 award-winning paper).
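The attention mechanism at the heart of these encoder-decoder models can be sketched in a few lines of NumPy. This is the basic dot-product variant with made-up shapes, not any particular paper's formulation: score each encoder state against the current decoder state, softmax the scores, and take the weighted sum as the context vector.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attend(encoder_states, decoder_state):
    """One step of dot-product attention for an encoder-decoder model.

    encoder_states: (T, d) hidden states over the source sequence
    decoder_state:  (d,) current decoder hidden state
    Returns the context vector and the attention weights.
    """
    scores = encoder_states @ decoder_state   # (T,) alignment scores
    weights = softmax(scores)                 # attention distribution
    context = weights @ encoder_states        # (d,) weighted sum of states
    return context, weights

rng = np.random.default_rng(1)
enc = rng.normal(size=(6, 8))    # 6 source positions, hidden size 8
dec = rng.normal(size=(8,))      # current decoder hidden state

context, weights = attend(enc, dec)
```

At each decoding step the decoder gets a fresh context vector, letting it focus on different parts of the source; this is what lifted plain seq2seq LSTMs to the results above.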
Other Notable Posts and Trends in 2017
Self-learning algorithms, in particular those incorporating reinforcement learning and evolutionary algorithms to guide ML pipeline construction and deep learning architecture search, have seen a lot of research attention this year. We can likely expect these techniques to push into industry in the coming years.
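The evolutionary flavor of this search can be sketched with the standard library alone. Everything here is a toy stand-in: the "architecture" is a (layers, width) pair and the fitness function substitutes a cheap formula for the real train-and-validate step, which is the expensive part in practice.

```python
import random

def fitness(arch):
    # Stand-in for "train a model with this architecture and measure
    # validation accuracy"; this toy objective prefers ~3 layers of width ~64.
    layers, width = arch
    return -((layers - 3) ** 2 + ((width - 64) / 16.0) ** 2)

def mutate(arch, rng):
    # Small random edits to an architecture, mimicking evolutionary search.
    layers, width = arch
    return (max(1, layers + rng.choice([-1, 0, 1])),
            max(8, width + rng.choice([-16, 0, 16])))

def evolve(generations=30, pop_size=8, seed=0):
    rng = random.Random(seed)
    pop = [(rng.randint(1, 6), rng.choice([16, 32, 64, 128]))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                       # keep the fittest
        pop = parents + [mutate(rng.choice(parents), rng)    # refill by mutation
                         for _ in parents]
    return max(pop, key=fitness)

best = evolve()
```

Published systems replace the toy fitness with full training runs and add crossover, aging, or RL controllers, but the select-mutate loop is the same shape.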
After a period of dormancy following the acquisition, Apple is using Turi to make training neural networks on Apple machines more accessible, leveraging Turi's proprietary data formats and Apple's new Metal hardware provisions. Unlike the ML-as-an-API offerings from AWS and Google Cloud, this framework lets developers directly deploy algorithms that perform inference on the device. This was previously an enormous hurdle, as it required domain expertise from the disparate backgrounds of application development and machine learning.
The writing is on the wall for Python 2.7, as the NumPy dev team will not be supporting it moving forward. NumPy underpins many of the numerical, mathematical, and data-processing libraries available in Python, and without it as a catalyst for change, others might have been slow to consider dropping support as well.
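For library maintainers following NumPy's lead, the standard way to signal the drop is the `python_requires` metadata field, which pip (9 and later) honors by resolving Python 2.7 installs to the last compatible release. The package name and version below are hypothetical.

```python
# setup.py excerpt: declare that the package no longer supports Python 2,
# so pip on Python 2.7 falls back to the last compatible release instead
# of installing an incompatible one. (Illustrative package name/version.)
from setuptools import setup

setup(
    name="example-lib",
    version="1.0.0",
    python_requires=">=3.4",
)
```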
A venture capitalist from FirstMark Capital gives his viewpoint on what it takes to build a successful ML-based startup, touching on a number of dimensions and helping to define the opportunity for shrewd movers this past year and into the future.
Apache Spark already made it easy for data engineers and scientists to work with big data, and the DataFrame/Dataset paradigm made it even more approachable for those coming from data analysis. However, Spark Streaming had always relied on the traditional RDD paradigm, until this year, when streaming was brought to DataFrames. On top of this, we saw Databricks also offer the ability to run deep learning models through UDFs, and others offer the ability to distribute TensorFlow across a Spark cluster.
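The UDF idea itself is simple enough to sketch without Spark. The "model" below is a toy linear stub and all names are illustrative; real code would register the function with `pyspark.sql.functions.udf` (or a pandas UDF for batching) and load a trained network once per executor rather than a weight tuple.

```python
# Framework-free sketch of the UDF pattern: wrap model inference in a
# plain function, then map it over rows, which is what registering it
# as a Spark UDF does at cluster scale.

def make_inference_udf(weights):
    def predict(features):
        # Toy linear "model": dot product + threshold. A real UDF would
        # call into a loaded deep learning model here.
        score = sum(w * x for w, x in zip(weights, features))
        return 1 if score > 0 else 0
    return predict

rows = [(0.5, 1.0), (-2.0, 0.3), (1.5, -0.2)]   # an in-memory "column"
udf = make_inference_udf(weights=(1.0, -1.0))

# In PySpark this would be df.withColumn("pred", registered_udf(cols));
# locally we just map the function over the rows.
preds = [udf(r) for r in rows]
```

The appeal is that inference becomes just another column transformation, so the deep learning model rides along with Spark's existing distribution machinery.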

