Global AI and Data Science

JSON vs TOON (Token-Oriented Object Notation): Choosing the Right Data Format for LLMs

By Ranjeet Kumar posted yesterday

  

TOON (Token-Oriented Object Notation) — The Smarter, Lighter Data Format for LLMs.

For roughly two decades, JSON (JavaScript Object Notation) has been the go-to format for data serialization. It’s simple, lightweight, and universally supported, but when it comes to Large Language Models (LLMs), JSON starts to show its limits.

That’s where TOON (Token-Oriented Object Notation) steps in: a compact data format designed specifically for passing structured data to LLMs.

The Problem with JSON in AI Workflows

JSON wasn’t built with LLMs in mind. While perfect for web APIs and application data exchange, it introduces inefficiencies when communicating with models like GPT or Gemini:

  • Token Overhead: Braces, brackets, quotation marks, and keys repeated in every object all add tokens, which means higher cost and slower inference.
  • Verbose Nesting: Deeply nested JSON structures add unnecessary complexity for models.
  • Human vs. Machine Optimization: JSON is human-readable, but not token-efficient for LLM processing.
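To get a feel for the token-overhead point, the sketch below measures how much of a compact JSON array of objects is spent on keys alone. It is an illustration under stated assumptions: the three-record sample and the key_overhead helper are hypothetical, and it counts characters rather than tokens, because real token counts depend on the model's tokenizer.

```python
import json

# Hypothetical sample shaped like the article's "users" array (three records only).
users = [
    {"first_name": "Ranjeet", "last_name": "Kumar", "role": "Developer"},
    {"first_name": "Neha", "last_name": "Singh", "role": "Manager"},
    {"first_name": "Vikas", "last_name": "Gupta", "role": "Developer"},
]


def key_overhead(records):
    """Return (key_chars, total_chars) for the records serialized as compact JSON.

    key_chars counts every quoted key plus the colon that follows it. This is a
    character-level proxy for structural overhead, not a token count.
    """
    payload = json.dumps(records, separators=(",", ":"))
    key_chars = sum(len(json.dumps(key)) + 1 for record in records for key in record)
    return key_chars, len(payload)


key_chars, total = key_overhead(users)
print(f"{key_chars} of {total} characters are keys ({key_chars / total:.0%})")
```

For this sample, keys and their colons account for 96 of 186 characters (52%). Because every new object repeats every key, that share stays roughly the same as the array grows.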

TOON (Token-Oriented Object Notation)

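TOON reimagines how data is passed to AI systems by focusing on token efficiency and semantic clarity. For a uniform array of objects, it declares the field names once in a header and then lists each object as a row of values, much like CSV. In short, it is designed to communicate with LLMs in their native unit: tokens.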

Key Advantages:

  • Compact Representation: Reduces redundant structure and minimizes token usage.
  • LLM-Optimized Parsing: Built for fast encoding/decoding and context preservation.
  • Context Awareness: Enables structured yet flexible data exchange tailored for reasoning tasks.
  • Cost-Efficient: Fewer tokens consumed translates directly into lower API costs.

Sample Data in JSON Format

{
  "users": [
    { "first_name": "Ranjeet", "last_name": "Kumar", "email": "ranjeet.kumar@example.com", "address": "Bangalore, India", "role": "Developer" },
    { "first_name": "Manjunath", "last_name": "Subra", "email": "amit.sharma@example.com", "address": "Pune, India", "role": "Tester" },
    { "first_name": "Neha", "last_name": "Singh", "email": "neha.singh@example.com", "address": "Delhi, India", "role": "Manager" },
    { "first_name": "Vikas", "last_name": "Gupta", "email": "vikas.gupta@example.com", "address": "Hyderabad, India", "role": "Developer" },
    { "first_name": "Priya", "last_name": "Mehta", "email": "priya.mehta@example.com", "address": "Mumbai, India", "role": "Tester" },
    { "first_name": "Ankit", "last_name": "Verma", "email": "ankit.verma@example.com", "address": "Chennai, India", "role": "Manager" },
    { "first_name": "Sneha", "last_name": "Patil", "email": "sneha.patil@example.com", "address": "Nagpur, India", "role": "Tester" },
    { "first_name": "Rohit", "last_name": "Yadav", "email": "rohit.yadav@example.com", "address": "Lucknow, India", "role": "Developer" },
    { "first_name": "Kiran", "last_name": "Nair", "email": "kiran.nair@example.com", "address": "Kochi, India", "role": "Developer" },
    { "first_name": "Meena", "last_name": "Joshi", "email": "meena.joshi@example.com", "address": "Jaipur, India", "role": "Developer" }
  ]
}

Sample Data in TOON Format

users[10]{first_name,last_name,email,address,role}:
  Ranjeet,Kumar,ranjeet.kumar@example.com,"Bangalore, India",Developer
  Manjunath,Subra,amit.sharma@example.com,"Pune, India",Tester
  Neha,Singh,neha.singh@example.com,"Delhi, India",Manager
  Vikas,Gupta,vikas.gupta@example.com,"Hyderabad, India",Developer
  Priya,Mehta,priya.mehta@example.com,"Mumbai, India",Tester
  Ankit,Verma,ankit.verma@example.com,"Chennai, India",Manager
  Sneha,Patil,sneha.patil@example.com,"Nagpur, India",Tester
  Rohit,Yadav,rohit.yadav@example.com,"Lucknow, India",Developer
  Kiran,Nair,kiran.nair@example.com,"Kochi, India",Developer
  Meena,Joshi,meena.joshi@example.com,"Jaipur, India",Developer
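Reading a tabular block like the one above back into records is just as direct, and the declared row count in the header gives the parser something to check. The following is a simplified decoder, a sketch rather than the official TOON parser: from_toon_table, split_row, and parse_cell are hypothetical names. It assumes JSON-style backslash escapes inside quoted cells, handles only the single tabular form shown here, and returns every value as a string without type conversion.

```python
import json
import re

# Tabular header, e.g. users[10]{first_name,last_name,email,address,role}:
HEADER = re.compile(r"^(\w+)\[(\d+)\]\{([^}]*)\}:$")


def split_row(row):
    """Split one row on commas that are outside double-quoted cells."""
    cells, current, in_quotes, i = [], [], False, 0
    while i < len(row):
        ch = row[i]
        if in_quotes and ch == "\\":
            current.append(row[i:i + 2])  # keep the escape pair for json.loads
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
        if ch == "," and not in_quotes:
            cells.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    cells.append("".join(current))
    return cells


def parse_cell(cell):
    """Unquote a quoted cell; leave bare cells as plain strings."""
    return json.loads(cell) if cell.startswith('"') else cell


def from_toon_table(text):
    """Parse a TOON-style tabular block into (name, list of dicts)."""
    lines = [line for line in text.splitlines() if line.strip()]
    match = HEADER.match(lines[0].strip())
    if match is None:
        raise ValueError("first line is not a tabular header")
    name, count = match.group(1), int(match.group(2))
    fields = match.group(3).split(",")
    rows = lines[1:]
    if len(rows) != count:
        raise ValueError(f"header declares {count} rows but {len(rows)} were found")
    records = []
    for row in rows:
        cells = split_row(row.strip())
        if len(cells) != len(fields):
            raise ValueError(f"expected {len(fields)} cells, got {len(cells)}: {row!r}")
        records.append(dict(zip(fields, (parse_cell(c) for c in cells))))
    return name, records


# Two rows taken from the sample above, with the header count adjusted to match.
SAMPLE = """\
users[2]{first_name,last_name,email,address,role}:
  Ranjeet,Kumar,ranjeet.kumar@example.com,"Bangalore, India",Developer
  Neha,Singh,neha.singh@example.com,"Delhi, India",Manager
"""

name, users = from_toon_table(SAMPLE)
print(name, users[0]["address"])
```

If a row is missing or a value with a comma is left unquoted, the count and cell-length checks raise a ValueError instead of silently misaligning fields.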

I created a small proof of concept (POC) that passes the same data, in both JSON and TOON format, to the GPT-4o mini model and shows the tokens consumed and saved with each format.

Result

349 tokens consumed with the JSON format

172 tokens consumed with the TOON format

177 tokens saved with the TOON format
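The relative saving follows from simple arithmetic on the two counts reported in the results. A small helper makes it explicit; the name token_savings is hypothetical and the only inputs are the POC's reported numbers.

```python
def token_savings(json_tokens, toon_tokens):
    """Return (tokens saved, percent reduction, JSON-to-TOON ratio)."""
    saved = json_tokens - toon_tokens
    percent = 100 * saved / json_tokens
    ratio = json_tokens / toon_tokens
    return saved, percent, ratio


saved, percent, ratio = token_savings(349, 172)  # counts reported in the POC
print(f"{saved} tokens saved, {percent:.1f}% fewer, JSON used {ratio:.2f}x the tokens")
```

For 349 versus 172 tokens this gives 177 tokens saved, a 50.7% reduction, so the JSON prompt used about 2.03 times as many tokens. These figures apply to this ten-record sample and model; other data shapes will give different ratios.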

Result of Token Difference

Current Limitations

While TOON (Token-Oriented Object Notation) shows immense promise, it’s important to note that it’s still evolving. At present, TOON works well only with flat objects; for nested or hierarchical structures, its advantage disappears and it can consume as many tokens as JSON, or more.
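This limitation suggests a simple routing rule when choosing a format: send TOON's tabular form only for flat, uniform arrays and fall back to JSON for anything nested or irregular. A minimal sketch of that heuristic follows; choose_format and is_flat_uniform are hypothetical helpers, and passing the check does not guarantee fewer tokens. Counting tokens for both versions with the target model, as the POC does, remains the reliable test.

```python
def is_flat_uniform(records):
    """True for a non-empty list of dicts with identical keys and no nested values."""
    if not isinstance(records, list) or not records:
        return False
    if not all(isinstance(r, dict) for r in records):
        return False
    keys = set(records[0])
    return all(
        set(r) == keys and not any(isinstance(v, (dict, list)) for v in r.values())
        for r in records
    )


def choose_format(records):
    """Heuristic: TOON's tabular form for flat uniform arrays, JSON otherwise."""
    return "toon" if is_flat_uniform(records) else "json"


flat = [{"name": "Neha", "role": "Manager"}, {"name": "Vikas", "role": "Developer"}]
nested = [{"name": "Neha", "address": {"city": "Delhi", "country": "India"}}]
print(choose_format(flat), choose_format(nested))
```

Here the flat list is routed to TOON and the record with a nested address object is routed to JSON.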

Link to the Repository

GitHub: json-toon-llm-tokens
