IBM AI
The Community for AI architects and builders to learn, share ideas and connect with others
In their 2023 paper, The Curse of Recursion: Training on Generated Data Makes Models Forget, Shumailov et al. explore the concept of model collapse. The short version: when we fit a model to data, we tend to lose the extremes, the tails of the distribution. If we then repeatedly train on data generated by our own models, that variation erodes generation by generation until the outputs converge toward a single point. Does this mean that one day, if we are not careful, our foundation models will answer every question with a single word?
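The mechanism can be sketched with a toy experiment (my own illustration, not from the paper): fit a Gaussian to a finite sample, draw the next generation's "training data" only from the fitted model, and repeat. The `collapse_demo` function and its parameters below are hypothetical, but the effect they show is the one described above, with estimated spread drifting toward zero as generations of synthetic data compound.

```python
import numpy as np

def collapse_demo(generations=1000, n=30, seed=0):
    """Toy model-collapse sketch: each generation refits a Gaussian
    using only samples drawn from the previous generation's fit.
    Finite-sample error compounds, so the estimated spread (the
    'variance' the post refers to) tends to shrink over time."""
    rng = np.random.default_rng(seed)
    mu, sigma = 0.0, 1.0            # the original, "real-world" model
    history = [sigma]
    for _ in range(generations):
        data = rng.normal(mu, sigma, size=n)  # purely synthetic data
        mu, sigma = data.mean(), data.std()   # refit on generated data
        history.append(sigma)
    return history

hist = collapse_demo()
print(f"initial std: {hist[0]:.3f}, final std: {hist[-1]:.3f}")
```

With a small sample size per generation, the estimated standard deviation performs a downward-biased random walk, so the tails vanish first and the distribution narrows, which is the intuition behind the paper's warning.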
This interesting paper underscores the importance of retaining human interaction and real-world data in the system: our variance and lack of perfect predictability are essential to maintaining parity between the real world and our computer-generated reflections of it. It also highlights the importance of preserving original training data and taking a proactive approach to training data management as part of an ethical model governance practice.
See full paper here.