Machine Learning Blueprint Newsletter, Edition 25, 6/17/18
Wed June 19, 2019 03:56 PM
Michael Tamir
This newsletter is written and curated by Mike Tamir and Mike Mansour.
June 17, 2018
Hi all,
Hope you enjoy this week's ML Blueprint. This week is brought to you by fastdata.io.
Spotlight Articles
ML Algorithm Predicts World Cup Winner
Traditional approaches to predicting the outcomes of sporting events have statistically combined several bookie-generated odds, but German researchers have taken a machine learning approach, training random forests on a large feature set. Bookies currently favor Germany to win the cup, while this algorithm picks Spain - though that could change in the event of an upset, or if one of those teams doesn't reach the final. The calculation is difficult because of the massive number of possible outcome configurations. Kaggle released an analysis of the data here, and the research paper describes the proposed method.
Machine Learning Blueprint's Take
Just as a good trader does not reveal a successful trading strategy (or algorithm, for that matter), a good gambler likely does not reveal an accurate prediction algorithm if they want to retain the upper hand. There probably exist far more sophisticated models for predicting sporting-event outcomes that have not been shared. On another note, publicly releasing a prediction machine might itself affect the payouts: what if the researchers chose a model whose predicted outcome would nudge the global betting markets and payouts to their advantage? Just putting on the old tin foil hat...
[Link]
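The paper's actual feature set (rankings, team strength indicators, and so on) is not reproduced here, but the core random-forest idea can be sketched on invented data. Everything below - features, labels, and the scikit-learn setup - is illustrative, not the researchers' model:

```python
# Minimal sketch of the random-forest approach on synthetic match data.
# Features and data are invented for illustration; the paper uses a much
# richer feature set than the two toy features here.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500
# Invented per-match features: ranking difference and recent goal difference.
rank_diff = rng.normal(0, 10, n)
goal_diff = rng.normal(0, 2, n)
X = np.column_stack([rank_diff, goal_diff])
# Synthetic label: the better-ranked, higher-scoring side tends to win.
y = (0.1 * rank_diff + 0.5 * goal_diff + rng.normal(0, 1, n) > 0).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
# Predicted win probability for a strong favourite.
p_win = model.predict_proba([[15.0, 2.0]])[0, 1]
print(round(p_win, 2))
```

A tournament forecast like the one described would then come from simulating the bracket many times, drawing each match result from per-match probabilities like `p_win`.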
Machine Learning: The High-Interest Credit Card of Technical Debt
Technical debt is a software engineering term for the costs, brittleness, and slowed innovation that accumulate in a system over time from the trade-off between speed of execution and quality of engineering. The cost of debt also compounds over time, resulting in expensive cleanups. Google takes a ‘systems view’ of productionized machine learning and shows that it is not immune to various forms of technical debt. Frequently, debt comes from hastily adding data features or model complexity for a small accuracy boost. Other times there may be hidden feedback loops, or unintended consumers of a signal that then require more maintenance of an internal product. They share best practices for avoiding technical debt at the early stages of architectural planning and at systems crossroads.
Machine Learning Blueprint's Take
Given the lack of industry solutions four years on, this topic is worth a revisit. This classic paper points out a surprisingly large number of channels through which the various incarnations of technical debt can sneak into an ML system. As the paper's title suggests, it is far too easy to accumulate, and the effects of debt are subtle and difficult to detect. This may stem from the fact that data science is an inherently multidisciplinary field - engineers must be computer scientists, statisticians, and mathematicians, which makes it easy to gloss over systems engineering. One way of spreading systems-level thinking is to invest in more diverse teams, with engineers and data scientists working closely together.
[Link]
Learning Machine Learning
Why & How to Improve Your Training Dataset
Well-labeled, plentiful training data is hard to come by, frequently forcing engineers to employ more advanced techniques to model well. Bad labels frequently break these models, and the data-collection assumptions of academic datasets typically don't match a commercial application. For example, ImageNet is unsuited to drone-related tasks: its shots are taken by humans at ground level, whereas a drone perceives from a bird's-eye view. The author shares more examples and lessons learned for improving a dataset, starting with a simple manual investigation of a large sample and ending with investing in better processes for collecting and labeling data.
Machine Learning Blueprint's Take
A sticking point here is that data labeling is usually a human process, and that comes with hidden biases that affect the label distributions, or even the data collection itself. Even the wording of the labeling task description will affect the output. The suggestions for improving training data and effectively collecting more are, well, manual and require more humans in the loop, but that cost is arguably lower than developing more complex models that demand deeper understanding and more compute. Plus, in a secondary market a dataset can be more of an asset than a model, since it can serve several potential model-training needs.
[Link]
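The "simple manual investigation of a large sample" the article recommends can start as a small script: draw a random sample, surface the label distribution, and flag anything suspicious for human review. A stdlib-only sketch on invented data:

```python
# Stdlib-only sketch of the first step the article recommends: pull a
# random sample of labeled examples and inspect the label distribution
# before reaching for a more complex model. The dataset is invented.
import random
from collections import Counter

random.seed(0)
# Invented dataset: (example_id, label) pairs with a deliberately
# suspicious "unknown" label mixed in.
labels = ["cat"] * 700 + ["dog"] * 280 + ["unknown"] * 20
dataset = list(enumerate(labels))
random.shuffle(dataset)

sample = random.sample(dataset, 200)          # the "large sample" to review
dist = Counter(label for _, label in sample)  # label counts for eyeballing
suspicious = [ex for ex in sample if ex[1] == "unknown"]

print(dist.most_common())
print(len(suspicious))
```

The point is not the code but the habit: eyeballing a few hundred examples like this routinely surfaces label errors and collection-assumption mismatches before they silently break a model.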
Add Constrained Optimization to Your Toolbelt - StitchFix Tutorial
Constrained optimization is the problem of minimizing or maximizing an objective subject to constraints. Frequently, these constraints encode business requirements like keeping cost or labor to a minimum. StitchFix lays out a toy example and works through the math while recommending some Python libraries, with the required solvers, for implementing your model.
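StitchFix's own worked example isn't reproduced here, but the flavor of the technique - optimize a business objective subject to capacity constraints - can be sketched with `scipy.optimize.linprog` on invented numbers:

```python
# Toy constrained optimization in the spirit of the tutorial: maximize
# profit from two products subject to capacity constraints. All numbers
# are invented. linprog minimizes, so profit coefficients are negated.
from scipy.optimize import linprog

profit = [-3, -5]                  # maximize 3*x1 + 5*x2
A_ub = [[1, 0],                    # x1 <= 4           (machine A hours)
        [0, 2],                    # 2*x2 <= 12        (machine B hours)
        [3, 2]]                    # 3*x1 + 2*x2 <= 18 (labor hours)
b_ub = [4, 12, 18]

res = linprog(profit, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
print(res.x, -res.fun)             # optimal plan (x1=2, x2=6) and profit 36
```

For linear objectives and constraints a solver like this finds the exact optimum at a vertex of the feasible region; nonlinear business objectives need the heavier solvers the tutorial mentions.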
Lessons from Two Years of AI Research - Tips for the Budding Researcher
Most of the tips here apply less to a corporate setting than to an academic one. The author suggests finding someone you can ask "dumb" questions of, shares methods for finding research inspiration, and offers tricks for managing time and tracking progress.
[Link]
A Visual Introduction to Bias-Variance Trade-off Learning
Building a Custom Facial Recognition Dataset
50 Free Datasets for Machine Learning
Machine Learning News
Bringing Machine Learning to Health Systems and Hospital Operations
Not another article about how AI is curing the next disease, but rather a highlight of all the ways St. Joseph's is integrating machine learning to optimize processes across its 51 hospitals. They're able to predict and reduce appointment no-shows, recommend health-related content to patients based on their medical history, use propensity models to better target patients for services they're likely to use, and optimize delivery systems to minimize care disruption.
Machine Learning Blueprint's Take
AI for medical research is risky, and the data is rather limited. Here, by contrast, there is abundant data, and the outcomes can move the needle on providing better care. Additionally, false positives and false negatives are far less costly in these operational applications, suggesting a higher chance of acceptance by healthcare leaders. Read deeper and you'll find several opportunity areas for data scientists to tackle with statistics and software.
[Link]
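The hospital system's actual models and features are not public, but the no-show propensity idea is standard logistic regression. A stdlib-only sketch with invented features (appointment lead time, prior no-show count) and synthetic data:

```python
# Stdlib-only sketch of a no-show propensity model: logistic regression
# fit by batch gradient descent on two invented features. The hospital's
# real models and features are not public; everything here is illustrative.
import math
import random

random.seed(1)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic appointments: (lead time in weeks, prior no-show count, missed?).
# Longer lead times and more prior no-shows raise the chance of a miss.
data = []
for _ in range(1000):
    lead = random.uniform(0, 8)
    prior = random.randint(0, 3)
    p_miss = sigmoid(0.4 * lead + 0.8 * prior - 3.0)
    data.append((lead, prior, 1 if random.random() < p_miss else 0))

w_lead = w_prior = bias = 0.0
lr, n = 0.05, len(data)
for _ in range(500):                 # plain batch gradient descent
    g_lead = g_prior = g_bias = 0.0
    for lead, prior, y in data:
        err = sigmoid(w_lead * lead + w_prior * prior + bias) - y
        g_lead += err * lead
        g_prior += err * prior
        g_bias += err
    w_lead -= lr * g_lead / n
    w_prior -= lr * g_prior / n
    bias -= lr * g_bias / n

# Score two hypothetical patients: high-risk vs. low-risk.
risk_hi = sigmoid(w_lead * 6 + w_prior * 3 + bias)
risk_lo = sigmoid(w_lead * 1 + w_prior * 0 + bias)
print(round(risk_hi, 2), round(risk_lo, 2))
```

In practice a score like `risk_hi` would drive interventions (reminder calls, overbooking) rather than be consumed directly, which is also why the false-positive cost stays low.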
IBM Watson’s Successor Can Now Explain Its Answers & Debate You
Dubbed “Project Debater”, this new software makes an argument on a topic, offers rebuttals, and even forms closing statements, in what is a clear extension of the Jeopardy-playing machine with extra bells and whistles. Some are lambasting the system for not actually understanding anything, but merely recombining previous argument elements and points pulled from Wikipedia.
Machine Learning Blueprint's Take
Aside from the glitzy IBM-style marketing demo of the system against human debaters, they cite some potential use cases for this technology like helping people make more rational decisions. On the flipside, it could also be trained to troll humans on internet forums - putting many out of work at the Russian Internet Research Agency.
[Link]
TuSimple selects WekaIO to Fuel Artificial Intelligence (AI) for Autonomous Fleet Vehicle Machine Learning.
Sponsored
TuSimple compared WekaIO Matrix™ against standard NAS solutions and legacy file systems, and found that Matrix delivered better scalability and performance. WekaIO leapfrogs legacy storage infrastructures and future-proofs datacenters by delivering the world’s fastest parallel file system with the most flexible deployment options—on-premises, cloud, or cloud bursting. Matrix software is ideally suited for latency-sensitive business applications at scale such as AI and machine learning.
[Link]
Synapse-Like Computing Chips for Deep Learning
New chips with microelectronic “synapses” mimic the brain to achieve a neuron-like structure. The approach uses two kinds of synapses: shorter-term ones for computation and longer-term ones for memory. The proposed benefit of mimicking the brain is mainly energy consumption, since the human brain consumes remarkably little power for what it computes.
Research paper here.
[Link]
Clarifai Exposed Working on Pentagon Projects Because They Got Hacked by Russians
Machine Learning Algorithm Learns to Solve A Rubik's Cube
Interesting Research
FontCode: Embedding Information in Text Documents [Steganography] using Glyph Perturbation [GANs]
Steganography is the art of concealing information; it can be used to exfiltrate data out of monitored environments or to create covert communications channels. Here, researchers use GANs to alter fonts ever so slightly to encode secret messages. A font manifold is learned across a few select fonts, and various points along it are each assigned to an ASCII character. Characters are then transformed by a GAN along this manifold, and the resulting covert text is printed out or converted to an image. To extract the hidden message, characters are detected and cropped, then fed to a trained CNN that outputs the secret code.
Machine Learning Blueprint's Take
Apparently this method breaks down if the paper carrying the hidden message is crumpled, suggesting that this channel has low capacity and could be defeated with strategically placed noise (the authors do train the CNN with some Gaussian noise on the generated characters). Several modifications could make the approach more robust to errors, like leveraging statistical properties of language to encode information, or using multiple characters to transmit a single character.
[Link]
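FontCode's glyph-manifold and CNN machinery is far beyond a snippet, but the encode/decode idea - hide bits by choosing between visually similar variants of each cover character, then read the choices back out - can be sketched with letter case standing in for glyph perturbation. This is a toy analogue, not the paper's method:

```python
# Toy analogue of FontCode's idea: hide a bit stream by choosing between
# two "variants" of each cover character. Letter case stands in for the
# paper's subtle glyph perturbations (this is NOT the paper's method).

def to_bits(msg: str) -> str:
    return "".join(f"{ord(c):08b}" for c in msg)

def embed(cover: str, secret: str) -> str:
    bits = to_bits(secret)
    if sum(c.isalpha() for c in cover) < len(bits):
        raise ValueError("cover text too short for this secret")
    out, i = [], 0
    for c in cover:
        if c.isalpha() and i < len(bits):
            # variant choice encodes one bit: upper = 1, lower = 0
            out.append(c.upper() if bits[i] == "1" else c.lower())
            i += 1
        else:
            out.append(c)
    return "".join(out)

def extract(stego: str, n_chars: int) -> str:
    bits = "".join("1" if c.isupper() else "0"
                   for c in stego if c.isalpha())[: 8 * n_chars]
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))

cover = "the quick brown fox jumps over the lazy dog " * 2
stego = embed(cover, "hi")
print(extract(stego, 2))  # -> "hi"
```

The fragility the Take describes maps directly onto this sketch: flip the "variant" of a single character (one crumple-induced misread) and a whole byte of the secret is corrupted, which is why error-correcting or language-statistics tricks would help.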
Augmented Space Planning: Using Procedural Generation to Automate Desk Layouts
Opening Closed-Eyes in Photographs with GANs
#GlobalAIandDataScience
#GlobalDataScience