Global AI and Data Science

Data Science Community News | Volume 1, Issue 1

By Christina Howell posted Mon August 05, 2019 05:36 PM

  
IBM Data Science Community Newsletter

August 2019 | Volume 1, Issue 1

Welcome to the first edition of the IBM Data Science Community Newsletter!

Here, you will find curated articles and content produced by and for our community members. Thank you for contributing to the conversation.

–Editorial Team

Spotlight

The Implications of The Batch Normalization Patent

Summary

Google has been trying to patent its batch normalization technique for training deep neural networks (DNNs) since 2015. Earlier this year the US Patent Office rejected the application after an extensive review, citing 14 prior-art references in an unusually long 58-page response. You can read a legal dissection of that rejection here. The rejection did not dissuade Google from pursuing the patent: it resubmitted the application, which is currently being reconsidered.

Commentary

Patenting algorithms is like patenting salt in the kitchen. If this trend continues, we may find ourselves needing licenses to implement algorithms, or end up with a headache akin to what Oracle did to everyone with Java. Google has apparently not sued over IP before, except in the Levandowski case, and some may think it is acting as a benevolent overlord by obtaining the patent before a patent troll files for it. Furthermore, it might be difficult to prove that a model was trained with batch normalization, which would reduce the success rate of infringement claims. Either way, the risk of patents in this space could slow the development of new algorithms if it becomes hard to build on previous ones, or it could frighten business stakeholders away from implementing ML if they perceive an infringement risk.

READ MORE »

AI Skills

10 Machine Learning Methods that Every Data Scientist Should Know

The speed and complexity of new machine learning methodologies make keeping up with new techniques difficult, for experts and beginners alike. To demystify machine learning, let's look at 10 different machine learning methods, with simple descriptions, visualizations, and examples of each. Read more

Image Registration Techniques

Image registration is the process of aligning two or more images of the same scene by designating one image as the reference and applying geometric transformations or local displacements to the other images so that they align with it. This tutorial covers classical feature-extraction methods available in OpenCV as well as newer deep-learning-based approaches. View tutorial
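To make the idea concrete, here is a minimal sketch of the simplest possible case: the only geometric transformation allowed is an integer translation, and the best one is found by brute-force search. It is not taken from the tutorial and does not use OpenCV; the function names (`shift_image`, `register_translation`) and the toy images are assumptions made for illustration.

```python
def shift_image(image, dy, dx, fill=0):
    """Translate a 2D list-of-lists image by (dy, dx); uncovered pixels get `fill`."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                shifted[ny][nx] = image[y][x]
    return shifted


def mean_squared_error(a, b):
    """Average squared pixel difference between two equally sized images."""
    height, width = len(a), len(a[0])
    total = sum((a[y][x] - b[y][x]) ** 2
                for y in range(height) for x in range(width))
    return total / (height * width)


def register_translation(reference, moving, max_shift=3):
    """Return the integer (dy, dx) that, applied to `moving`, best matches `reference`."""
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            error = mean_squared_error(reference, shift_image(moving, dy, dx))
            if best is None or error < best[0]:
                best = (error, dy, dx)
    return best[1], best[2]


# Toy data (hypothetical): a bright 3x3 square on an 8x8 black background.
reference = [[0] * 8 for _ in range(8)]
for y in range(2, 5):
    for x in range(3, 6):
        reference[y][x] = 255

# The "moving" image is the same scene displaced one row down, two columns left.
moving = shift_image(reference, 1, -2)

dy, dx = register_translation(reference, moving)
aligned = shift_image(moving, dy, dx)
print("estimated shift:", (dy, dx))
```

Real registration pipelines usually estimate richer transformations (affine, homography, or dense displacement fields) from matched features or learned models, but the structure is the same: pick a reference, search for a transformation, and score the alignment.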

Tools & Libraries

AI Fairness 360 Open Source Toolkit

AI Fairness 360 is an open-source Python toolkit that helps you measure and remove unwanted bias from data and machine learning models. Built on advanced bias removal techniques, it contains over 75 fairness metrics and 10 bias mitigation algorithms. Read more
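As a rough illustration of what a fairness metric measures, the sketch below computes two common group-fairness quantities, statistical parity difference and disparate impact, by hand. It deliberately does not call the AI Fairness 360 API; the helper names and the toy decision lists are assumptions made for this example.

```python
def favorable_rate(outcomes):
    """Fraction of favorable decisions (1 = favorable, 0 = unfavorable)."""
    return sum(outcomes) / len(outcomes)


def statistical_parity_difference(unprivileged, privileged):
    """Favorable rate of the unprivileged group minus that of the privileged group.

    0 means parity; negative values mean the unprivileged group is favored less often.
    """
    return favorable_rate(unprivileged) - favorable_rate(privileged)


def disparate_impact(unprivileged, privileged):
    """Ratio of favorable rates; 1 means parity."""
    return favorable_rate(unprivileged) / favorable_rate(privileged)


# Hypothetical model decisions for two groups of eight applicants each.
privileged_outcomes = [1, 1, 1, 0, 1, 1, 0, 1]    # 6 of 8 favorable
unprivileged_outcomes = [1, 0, 0, 1, 0, 0, 1, 0]  # 3 of 8 favorable

spd = statistical_parity_difference(unprivileged_outcomes, privileged_outcomes)
di = disparate_impact(unprivileged_outcomes, privileged_outcomes)
print(f"statistical parity difference: {spd:.3f}")
print(f"disparate impact: {di:.3f}")
```

A bias mitigation algorithm would then adjust the data, the model, or its predictions to move such metrics closer to parity.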

Introducing the Plato Research Dialogue System: A Flexible Conversational AI Platform

Summary

Uber has open-sourced yet another internal project, this time a framework for designing, training, and deploying conversational AI, with the aim of advancing research in the field. Compared with other frameworks, Plato claims to have the most flexible architecture for plugging in different deep learning libraries, building different flows, and supporting multi-agent dialogs in which agents learn how to exchange information with one another. The framework can be downloaded in full from Uber's GitHub page.

Commentary

This matters because, as we deploy more machine learning applications to the public, they can offer richer user interaction by understanding humans in the way we communicate best: through natural conversation. It also increases the amount of information an AI agent can convey to a user. It makes sense that Uber would work on a project like this. In a self-driving car, for example, the system may need to get more information from a rider about the destination or about changes to the plan (e.g., "I'm feeling car sick" -> the car should stop or open the windows), or a rider might ask for information about an area. Read more

Solutions & Products

How IBM and a Supply Chain Company Predict Employee Retention

IBM worked with a supply chain company to predict its employee retention. The team identified the key factors that drove sales staff away and used them to plan remedies. Those details then serve as context for explaining the basics of essential ML algorithms. This is an essential use case for data science practitioners. Read more

Using Reinforcement Learning for Test Case Scheduling at Netflix

Summary

Netflix showcases how it implemented the paper "Reinforcement Learning for Automatic Test Case Prioritization and Selection in Continuous Integration" to test its SDK across thousands of consumer devices that offer Netflix functionality. The tests are time consuming, so the approach prioritizes and schedules test executions to detect failures sooner. The RL agents can be given various rewards and can be encouraged to pursue either a more exploratory or a more exploitative strategy when running tests. The framework is impressive in its ability to scale and to provide a self-service model for feature developers.
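The explore-versus-exploit trade-off mentioned above can be sketched with a much simpler technique than the one in the paper: an epsilon-greedy bandit that usually runs the test with the highest observed failure rate and occasionally tries a random one. This is not Netflix's implementation or the paper's agent; the `Prioritizer` class, the test names, and the failure probabilities are all hypothetical.

```python
import random


class Prioritizer:
    """Epsilon-greedy chooser: reward is the observed failure rate of each test."""

    def __init__(self, test_names, epsilon=0.1, seed=0):
        self.epsilon = epsilon            # probability of exploring
        self.rng = random.Random(seed)    # seeded for repeatable runs
        self.runs = {name: 0 for name in test_names}
        self.failures = {name: 0 for name in test_names}

    def failure_rate(self, name):
        """Observed failure rate; 0.0 for tests that have never run."""
        if self.runs[name] == 0:
            return 0.0
        return self.failures[name] / self.runs[name]

    def choose(self):
        """Pick the next test to run."""
        names = list(self.runs)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(names)          # explore: any test
        return max(names, key=self.failure_rate)   # exploit: likeliest to fail

    def record(self, name, failed):
        """Update statistics after a test execution."""
        self.runs[name] += 1
        self.failures[name] += int(failed)


# Simulated environment (assumed numbers, for illustration only).
true_failure_probability = {
    "test_login": 0.02,
    "test_playback": 0.30,
    "test_search": 0.05,
}

prioritizer = Prioritizer(true_failure_probability, epsilon=0.2, seed=42)
environment = random.Random(7)

for _ in range(1000):
    test = prioritizer.choose()
    failed = environment.random() < true_failure_probability[test]
    prioritizer.record(test, failed)

for name, count in prioritizer.runs.items():
    print(f"{name}: ran {count} times, "
          f"observed failure rate {prioritizer.failure_rate(name):.2f}")
```

Raising `epsilon` makes the scheduler more exploratory (it samples rarely failing tests more often); lowering it makes the scheduler more exploitative (it concentrates on tests that have failed before).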

Commentary

This could not only make a QA team more effective at finding bugs and regressions, improving the user experience, but could also greatly reduce the cost of QA and the carbon footprint of the compute time as a product becomes more feature-rich. It is worth noting that the software engineering side of this project appears to matter more to its success than the actual RL implementation. While the framework, dubbed "Lerner", is not yet open-sourced, Netflix tends to release these types of tools to the public. Read more

Research

XLNet: Generalized Autoregressive Pretraining for Language Understanding

What we're reading in the IBM Data Science community: XLNet takes on BERT for the NLP state of the art. From the abstract: "BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy...XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation." Read more
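For readers who want the quoted idea in symbols, the permutation language modeling objective described in the abstract can be written as follows. The notation follows the XLNet paper as best we recall it: $\mathcal{Z}_T$ is the set of all permutations of the positions $1, \dots, T$, $z_t$ is the $t$-th element of a permutation $\mathbf{z}$, and $\mathbf{z}_{<t}$ denotes its first $t-1$ elements.

```latex
\max_{\theta} \;
\mathbb{E}_{\mathbf{z} \sim \mathcal{Z}_T}
\left[
  \sum_{t=1}^{T} \log p_{\theta}\!\left( x_{z_t} \mid \mathbf{x}_{\mathbf{z}_{<t}} \right)
\right]
```

Because each token is predicted from whichever tokens precede it in the sampled order, every position eventually sees context from both sides, while the model remains autoregressive within any single permutation.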

JOIN THE CONVERSATION »

Commentary provided by Michael Mansour and Mike Tamir.
This content is the opinion of an IBM Data Science Community member and not of IBM.
1 New Orchard Road, Armonk, NY 10504


#In-the-know-feature
#GlobalAIandDataScience
#GlobalDataScience
#In-the-know

Comments

Mon August 26, 2019 05:33 AM

Hey members,
I am Sheila from Kenya. I am a finance officer who is looking forward to becoming a professional financial data scientist. I have learnt a lot from the above article. It's my dream to do data research at IBM one day by becoming an employee of this great company.
Thanks, Christina