Andreessen Horowitz’s Tips for ML Success in Production in Startup Contexts[Link]
As more startups build AI-centric products, a common problem is emerging: the underlying “economies of [AI] data”. The phrase draws a parallel to economies of scale, but the real relationship is inverse. A large proportion of the problem space that startups are trying to solve lies in a long tail of available data. As more data is collected and the number of edge cases grows, the marginal returns decrease at an exponential rate; tweaking and tuning here eats up most of a data scientist’s time. That long-tail data is also hard to collect and maintain, yet it might represent critical failure modes of a product.
There’s no easy way to directly solve a dataset issue in a complex problem space; however, the authors offer some tricks for reformulating the problem to shorten the long tail of data.