Global AI and Data Science

Machine Learning Blueprint Newsletter, Edition 19, 4/15/18 

Wed June 19, 2019 03:44 PM

This newsletter is written and curated by Mike Tamir and Mike Mansour. 

April 15, 2018

Hi all,
Hope you enjoy this week's ML Blueprint. This week is brought to you by fastdata.io.

Spotlight Articles
AI and computing advances are undoubtedly having profound effects on society. Too often, researchers of new technologies loudly tout the positive benefits of their work while showing an embarrassing blind spot for its negative outcomes, much like a medical researcher publishing results without mentioning any harmful side effects. The ACM is changing this by requiring that submitters seriously discuss, not just hand-wave at, both the positive and negative possible implications of their research. Peer reviewers will act as the gatekeepers, now judging papers on this exercise as well. The desired effect is to guide research funding: if the net negative effects outweigh the positive ones, then public money should not be used for something that does not advance the public interest. The new directive should not stop research from being published, but submitters who continually pursue anti-social research should expect their ethical compass to be called into question.
Machine Learning Blueprint's Take
The ACM’s new directive is directly in line with one of the main suggestions of OpenAI’s “Malicious Uses of AI” report from a while back: that researchers discuss these outcomes in their findings. Without a requirement and detailed guidelines from a body like this, it’s unlikely researchers would have done so on their own. Strange incentives appear, though, if researchers believe that consistently publishing work whose negative effects outweigh the positive will hurt their academic standing; reviews could get political, or authors could put minimal effort into exploring negative outcomes for fear of reprisal. This also might not stop black-budget research, and potentially anti-social research could simply migrate to other countries.
Following up on last week’s World-Model Spotlight, this guide provides a walkthrough and Keras code for recreating the variational autoencoder, RNN, and controller in your own environment. The authors provide the requisite data and break down the steps to run the code. Some of the data comes from OpenAI Gym, and this is a good way to get your feet wet with it if you have not already.
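The variational autoencoder in that setup compresses each frame into a small latent vector, and its key trick is the reparameterization step, which keeps sampling differentiable. A minimal numpy sketch of just that step (toy dimensions chosen to match World Models' 32-dim latent; this is not the guide's Keras code):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    Writing the sample this way keeps gradients flowing through
    mu and log_var, which is what lets a VAE train end to end.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# Toy "encoder" output for one frame: a 32-dim latent vector.
mu = np.zeros(32)
log_var = np.full(32, -2.0)   # small variance
z = reparameterize(mu, log_var, rng)
print(z.shape)  # (32,)
```

In the full pipeline, `mu` and `log_var` come from a convolutional encoder and `z` feeds the RNN world model.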
A message from our sponsors...
Now witness the real-time power of this fully GPU-armed and operational stream processing engine.
FDIO Engine™ can process Spark Structured Streaming workloads 100 to 1000 times faster than Apache Spark, as it runs natively on NVIDIA GPUs and Apache Arrow.
To put 1000x performance in perspective, FDIO processes 1 Terabyte on a single AWS instance in 35 seconds, where Spark takes over 9 hours.
Sign up for a Test Flight today at fastdata.io
Learning Machine Learning
When doing audio processing for machine learning, a number of signal processing transformations are required to handle the raw data. This post explains the motivations and implementations behind these preprocessing tools, with accompanying images and code. The transformations bring the data closer to how humans perceive it and also de-correlate it, preventing obvious issues that would arise with some ML algorithms.
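As a rough illustration of those two ideas, the sketch below (numpy/scipy on a toy sine wave, not the post's code) takes a log-magnitude spectrogram, a perceptually motivated scale since hearing is roughly logarithmic in loudness, then applies a DCT to decorrelate neighboring frequency bins, the same step that turns a mel spectrogram into MFCCs:

```python
import numpy as np
from scipy.fft import dct

# Toy signal: 1 s of a 440 Hz tone at a 16 kHz sample rate.
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440 * t)

# Short-time magnitude spectrogram: 512-sample Hann-windowed frames.
frame_len, hop = 512, 256
frames = np.stack([signal[i:i + frame_len] * np.hanning(frame_len)
                   for i in range(0, len(signal) - frame_len, hop)])
spectrogram = np.abs(np.fft.rfft(frames, axis=1))

# Log compression approximates human loudness perception.
log_spec = np.log(spectrogram + 1e-8)

# DCT along the frequency axis decorrelates the bins; with a mel
# filterbank applied first, the leading coefficients would be MFCCs.
cepstra = dct(log_spec, type=2, axis=1, norm='ortho')[:, :13]
print(cepstra.shape)
```

Real pipelines add a mel filterbank between the spectrogram and the DCT, but the perceptual-scaling and decorrelation motivations are the same.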
A hacky but actually somewhat smart way to quickly aggregate images: the author leverages the Bing Search API to download images (the Google Image Search API has apparently been shut down) and uses some native OS tools to quickly prune out bad examples. They provide code for getting started very quickly with the requests and API endpoints.
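The core of such a downloader is just a paged GET against the Bing Image Search v7 endpoint. A hedged sketch (not the author's code; the endpoint and header match the API as documented at the time, and the key is a placeholder you must supply):

```python
import requests

# Bing Image Search v7 endpoint; replace the key with your own.
ENDPOINT = "https://api.cognitive.microsoft.com/bing/v7.0/images/search"
API_KEY = "YOUR_SUBSCRIPTION_KEY"

def build_request(query, count=50, offset=0):
    """Assemble headers and params for one page of image results."""
    headers = {"Ocp-Apim-Subscription-Key": API_KEY}
    params = {"q": query, "count": count, "offset": offset}
    return headers, params

def fetch_image_urls(query, pages=2):
    """Page through results, collecting each hit's contentUrl."""
    urls = []
    for page in range(pages):
        headers, params = build_request(query, offset=page * 50)
        resp = requests.get(ENDPOINT, headers=headers, params=params)
        resp.raise_for_status()
        urls.extend(item["contentUrl"] for item in resp.json()["value"])
    return urls
```

From there, downloading each URL and manually deleting bad files with your OS's file browser gets a small labeled set together quickly.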
Machine Learning Blueprint's Take
This approach might work for small pet projects where the classification task is relatively simple and the data is not overly noisy; some applications might get off the ground quickly. However, a more complex image classifier would likely require much more data, and gathering data diverse enough to teach a classifier to generalize well might be impractical this way.
Implementing published RL experiments is harder and more time-consuming than one would think. There is a wealth of knowledge and best practices that researchers build upon in their latest work, and it is often not evident in their papers. This may be further worsening the field's reproducibility crisis. For newcomers, this author shares the lessons learned over eight months of reimplementation work.
This is a useful (and rarely done) investigation into why the author's model actually didn't work. They dig into the tree's internals to identify behavior that doesn't make sense and locate patterns in the data that may have misled the model.
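For tree models, this kind of post-mortem is easy to start: scikit-learn exposes the learned structure on a fitted estimator's `tree_` attribute. A small sketch on toy data (not the author's model), where the label depends only on feature 0 and feature 1 is pure noise:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data: label depends only on feature 0; feature 1 is noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = (X[:, 0] > 0).astype(int)

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

tree = clf.tree_
for node in range(tree.node_count):
    if tree.children_left[node] == -1:      # leaf
        print(f"node {node}: leaf, class counts {tree.value[node][0]}")
    else:
        print(f"node {node}: split on feature {tree.feature[node]} "
              f"at threshold {tree.threshold[node]:.3f}")
```

A sane model should split on feature 0 near 0.0; a split on the noise feature would be exactly the kind of nonsensical internal behavior the authors went looking for.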

Machine Learning News
Thanks to new updates, TensorFlow now competes with Uber's PyTorch-based probabilistic programming library. The new TensorFlow Probability module adds statistical distributions, layers that incorporate uncertainty into their outputs, a slew of inference methods such as MCMC and variational inference, and some canned models. You can quickly calculate Gaussian copulas, too. They don't yet say whether the new module will be ONNX compatible.
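MCMC, one of the inference methods mentioned, is worth demystifying: at its core, a random-walk Metropolis sampler draws from a distribution using only its unnormalized log density. A self-contained numpy sketch of the idea (not TensorFlow Probability's API, which wraps far more efficient variants like Hamiltonian Monte Carlo):

```python
import numpy as np

def metropolis(log_prob, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis: propose x' = x + N(0, step^2) and
    accept with probability min(1, p(x') / p(x))."""
    rng = np.random.default_rng(seed)
    x, samples = x0, []
    lp = log_prob(x)
    for _ in range(n_samples):
        proposal = x + step * rng.standard_normal()
        lp_new = log_prob(proposal)
        if np.log(rng.uniform()) < lp_new - lp:   # accept/reject
            x, lp = proposal, lp_new
        samples.append(x)
    return np.array(samples)

# Target: standard normal, known only up to a constant.
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_samples=20000)
print(samples[5000:].mean(), samples[5000:].std())  # ~0.0, ~1.0
```

Discarding the first chunk as burn-in, the empirical mean and standard deviation approach those of the target.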
ARM is integrating Nvidia’s open source deep learning accelerator architectures into their chips. ARM’s main focus is on low power chips for IoT, and they’re keen on making edge devices capable of doing inference out in the field instead of sending tremendous amounts of data over the wire. Currently they have a number of hardware modules that are optimized for a limited set of tasks like object detection and NN inference.
Researchers recently strapped a number of sensors to a dog to collect data for predicting dog behavior. Using what is fundamentally a seq2seq architecture, their algorithm encodes sequences of images with a combination of ResNet stacks and an LSTM. The encoded result is then decoded into a sequence of canine movements, which were also tracked during the data-gathering stage. The authors claim: “Little has been done in terms of ‘understanding visual data to the extent that an agent can take actions and perform tasks in the visual world.’ In other words, act not as the eye, but as the thing controlling the eye.” They chose a dog because, unlike a human's, the goals and motivations of its actions are unknown to us.
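At the shape level, the encode-then-decode pattern is simple. A toy numpy sketch with random, untrained weights and vanilla RNN cells (the paper uses ResNet features and LSTMs; all dimensions here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_step(W, U, x, h):
    """One vanilla tanh RNN step (stand-in for the paper's LSTM cells)."""
    return np.tanh(W @ x + U @ h)

# Toy sizes: 5 frames of 64-dim image features -> 5 movement vectors.
feat_dim, hidden, act_dim, T = 64, 32, 4, 5
frames = rng.standard_normal((T, feat_dim))   # stand-in for ResNet features

# Encoder: fold the frame sequence into one hidden state.
W_e = rng.standard_normal((hidden, feat_dim))
U_e = rng.standard_normal((hidden, hidden))
h = np.zeros(hidden)
for x in frames:
    h = rnn_step(W_e, U_e, x, h)

# Decoder: unroll the same length, emitting one movement vector per step.
W_d = rng.standard_normal((hidden, act_dim))
U_d = rng.standard_normal((hidden, hidden))
V = rng.standard_normal((act_dim, hidden))
prev = np.zeros(act_dim)
moves = []
for _ in range(T):
    h = rnn_step(W_d, U_d, prev, h)
    prev = V @ h
    moves.append(prev)
print(np.array(moves).shape)  # (5, 4)
```

Training would fit the weights so that the decoded sequence matches the joint movements recorded by the sensors.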
Interesting Research
Researchers at Uber's AI labs have developed a novel technique, which they call differentiable plasticity, to increase the ability of a neural net to adapt its connections in response to ongoing experience. Instead of learning only fixed weights, in this architecture each connection into a neuron is separated into a combination of a fixed weight and a “plastic” component that responds to earlier layer inputs and outputs. The authors close by suggesting that, in future work, such techniques might be leveraged to enhance the performance of traditional non-plastic architectures or neural units like LSTMs.
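The decomposition is easy to state concretely: each connection's effective weight is a fixed part plus a learned plasticity coefficient times a Hebbian trace that tracks recent pre- and post-synaptic activity. A numpy sketch of one forward pass with the trace update (the coefficients here are random stand-ins for trained values; this follows the paper's general recipe, not Uber's code):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, eta = 8, 4, 0.1   # eta: update rate of the Hebbian trace

W = 0.1 * rng.standard_normal((n_out, n_in))      # fixed weights
alpha = 0.1 * rng.standard_normal((n_out, n_in))  # plasticity coefficients
hebb = np.zeros((n_out, n_in))                    # Hebbian trace, starts empty

def plastic_forward(x, hebb):
    """Effective weight = W + alpha * hebb; the trace then decays
    toward the outer product of post- and pre-synaptic activity."""
    y = np.tanh((W + alpha * hebb) @ x)
    hebb = (1 - eta) * hebb + eta * np.outer(y, x)
    return y, hebb

# Feed the same input twice: the trace makes the second response differ.
x = rng.standard_normal(n_in)
y1, hebb = plastic_forward(x, hebb)
y2, hebb = plastic_forward(x, hebb)
print(np.allclose(y1, y2))  # False: the connections adapted between calls
```

Because the trace update is differentiable, `W` and `alpha` can both be trained by ordinary backpropagation, which is the paper's central point.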
A Generative Adversarial Network (GAN)-based algorithm for lossy compression with less visible corruption than the state of the art. The algorithm leverages priors and the semantic label map learned by the GANs to reduce storage of "unimportant regions." Results show that the new method saves up to 67% more space than BPG, the next-best method.


#GlobalAIandDataScience
#GlobalDataScience
