With the IBM TechXchange Conference 2025 only nine days away, the countdown continues and so does our Speaker Spotlight Series! Among the featured speakers is Shalisha Witherspoon, an award-winning Software Engineer at the IBM T.J. Watson Research Center, whose work showcases the backbone of technology: data.
For Shalisha, the upcoming IBM TechXchange Conference 2025 is a chance for the most brilliant minds in the industry to share their knowledge, collaborate with peers, and get a glimpse of where the future is headed.
“It’s a one-stop place to experience the present and the future of quantum, open source, and agentic technologies.”
Data as the Unsung Hero of Open Source
AI models may be the centre of conversations, but Shalisha reminds us that the entire foundation of AI lies in data preparation. Her work on the Data Prep Kit, created when LLMs and automation began to rise, shows how messy, unorganized data can undermine even the most advanced models. The Data Prep Kit simplifies the entire process, from deduplication and tokenization to filtering.
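To make those three steps concrete, here is a minimal sketch in plain Python. It does not use the Data Prep Kit's own API; the function names (`deduplicate`, `keep_document`, `tokenize`, `prepare`) and the `min_words` threshold are assumptions made up for illustration, and the tokenizer is a naive word splitter rather than the subword tokenizers used for real LLMs.

```python
import hashlib
import re


def deduplicate(docs):
    """Drop exact duplicates, comparing documents after case and whitespace normalization."""
    seen = set()
    unique = []
    for doc in docs:
        normalized = " ".join(doc.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def keep_document(doc, min_words=3):
    """Hypothetical quality filter: keep documents with at least `min_words` words."""
    return len(doc.split()) >= min_words


def tokenize(doc):
    """Naive word-level tokenizer, for illustration only."""
    return re.findall(r"\w+", doc.lower())


def prepare(docs):
    """Run the three steps in order: deduplicate, filter, tokenize."""
    unique = deduplicate(docs)
    filtered = [d for d in unique if keep_document(d)]
    return [tokenize(d) for d in filtered]


raw = [
    "Good data is essential.",
    "good   data is ESSENTIAL.",  # duplicate once case and spacing are normalized
    "Too short",                  # dropped by the length filter
    "Garbage in, garbage out.",
]
prepared = prepare(raw)
```

Even in this toy form, the order matters: removing duplicates and low-quality documents first means the tokenizer never spends effort on text that would be discarded anyway.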
What makes the project especially powerful is its open-source foundation. Developers are encouraged to help shape the toolkit as new challenges emerge. This collaborative approach ensures the Data Prep Kit evolves alongside the fast-moving world of LLMs, empowering a community to build more resilient and inclusive systems. Shalisha is clear on the principle: garbage in, garbage out. Good data is not optional but essential, and open-source collaboration is how we will keep it that way.
🔗 Check out IBM's Data Prep Kit
Bias, Guardrails, and the Future of Data
When asked what gets overlooked in LLM development, Shalisha says it is bias and a lack of careful planning. Teams often rush to deploy their products without considering what their data may contain or what it could be missing. Unfortunately, when systems are trained on incomplete or unbalanced datasets, the result not only puts the general populace at risk but can also put already vulnerable communities, especially women and people of colour, at even higher risk.
This is why she stresses that guardrails are a must-have and must be implemented at every stage:
- Before Training
- Through Multiple Filters
- After Deployment
These protections help prevent unsafe interactions in real time. They are not perfect, since adversaries often try to work around them, but they are essential if we ever want AI to be truly responsible.
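As a rough illustration of guardrails applied at each of those stages, the sketch below reuses the same checks before training and after deployment. Everything here is hypothetical: the pattern, the blocked terms, and the function names are assumptions chosen for the example, and production guardrails rely on far richer checks than keyword and pattern matching.

```python
import re

# Hypothetical checks for illustration only.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
BLOCKED_TERMS = {"password", "ssn"}


def contains_email(text):
    """Flag text that appears to contain an email address."""
    return EMAIL_PATTERN.search(text) is not None


def contains_blocked_term(text):
    """Flag text that mentions any term in the blocklist."""
    words = set(re.findall(r"\w+", text.lower()))
    return not words.isdisjoint(BLOCKED_TERMS)


# Multiple filters: each returns True when the text should be rejected.
FILTERS = [contains_email, contains_blocked_term]


def is_safe(text):
    """Text is safe only if no filter flags it."""
    return not any(check(text) for check in FILTERS)


def filter_training_data(records):
    """Before training: drop records that fail any filter."""
    return [r for r in records if is_safe(r)]


def guard_response(response, fallback="Sorry, I can't share that."):
    """After deployment: replace an unsafe model response with a fallback."""
    return response if is_safe(response) else fallback
```

For example, `filter_training_data` would drop a record containing an email address, and `guard_response` would swap a reply mentioning a blocked term for the fallback message. Simple checks like these are easy to work around, which is why layering several filters across stages matters.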
Shalisha believes it is up to developers to be more mindful about where their data comes from and how it is used. Everyday users of AI also need to stay cautious about how their information is shared, especially as LLMs evolve at lightning speed and policy struggles to keep up. For her, the challenge isn’t just technical; it is social. AI can amplify existing inequalities if the data behind it is flawed, or it can open doors to more inclusive systems if approached responsibly.
Watch Shalisha Witherspoon at the IBM TechXchange Conference 2025, October 6 to 9 in Orlando, Florida 📅
Monday, Oct 6 | 5:00 PM - 6:00 PM EDT | Lake Louise, Lobby Level, Hilton
Tuesday, Oct 7 | 1:00 PM - 1:20 PM EDT | Stage 08 | 490, Sandbox, OCCC