Global AI and Data Science


Andreessen Horowitz’s Tips for ML Success in Production in Startup Contexts

By Michael Mansour posted Wed August 26, 2020 06:21 PM

  



As more startups build AI-centric products, a common thread is emerging in the problems they face: the underlying “economies of [AI] data”. The analogy draws a parallel to economies of scale, but the real relationship is inverse. A large proportion of the problems startups are trying to solve live in a long tail of available data. As more data is collected and edge cases multiply, the marginal returns decrease at an exponential rate; tweaking and tuning here eats up most of a data scientist’s time. That long-tail data is also hard to collect and maintain, yet it might represent critical failure modes of a product.


There’s no easy way to directly solve a dataset issue in a complex problem space; however, the Andreessen Horowitz article offers some tricks for reformulating the problem to minimize the length of the long tail of data:


  • Componentize the problem: Instead of trying to solve a global problem, break it into components such that a model tackles a slice of the data.  Deep domain expertise helps guide where those delineations should be made.

  • Build around the long tail: Consider reducing the number of acceptable user inputs with something as simple as auto-complete.  This truncates the length of the tail.

  • Build an edge-case engine: Focus on gathering samples of data from the long tail in a repeatable fashion.  It’s expensive, but it pays off and may enhance an active-learning solution.
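
Two of the tricks above can be sketched in a few lines of Python. This is a minimal illustration rather than anything prescribed by the article: the function names, the `min_count` cutoff, and the `threshold` value are all hypothetical, and it assumes the model emits a confidence score between 0 and 1 for each input. The first two functions truncate the tail by only suggesting inputs that have been seen often enough; the third collects low-confidence inputs in a repeatable way so they can be labeled later.

```python
from collections import Counter


def build_vocabulary(logged_inputs, min_count=2):
    """Keep only inputs seen at least `min_count` times (the head of the distribution).

    `min_count` is an illustrative cutoff, not a recommended value.
    """
    counts = Counter(text.strip().lower() for text in logged_inputs)
    return sorted(term for term, n in counts.items() if n >= min_count)


def autocomplete(prefix, vocabulary, limit=5):
    """Suggest supported inputs that start with the typed prefix."""
    prefix = prefix.strip().lower()
    return [term for term in vocabulary if term.startswith(prefix)][:limit]


def select_edge_cases(predictions, threshold=0.6):
    """Return inputs whose confidence is below `threshold`, least confident first.

    `predictions` is a list of (input_text, confidence) pairs; the threshold
    is a hypothetical value chosen for illustration.
    """
    uncertain = [(text, conf) for text, conf in predictions if conf < threshold]
    return [text for text, _ in sorted(uncertain, key=lambda pair: pair[1])]


if __name__ == "__main__":
    logged = [
        "reset password", "reset password",
        "refund order", "refund order", "refund order",
        "rest pasword pls",  # a one-off, long-tail input
    ]
    vocabulary = build_vocabulary(logged)
    print(autocomplete("res", vocabulary))

    scored = [("refund order", 0.95), ("rest pasword pls", 0.30), ("cancel", 0.55)]
    print(select_edge_cases(scored))
```

Inputs that fall out of the vocabulary are not lost: in a setup like this they would typically be the same ones that `select_edge_cases` surfaces for labeling, which is where an active-learning loop could pick them up.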



