TONL (Token-Optimized Notation Language)

TONL is a production-ready data platform that combines compact serialization with query, modification, indexing, and streaming capabilities. It is designed for LLM token efficiency and also provides a rich API for data access and manipulation.

🎉 Latest Release: v2.0.6 - Nested Array Length Fix

✨ Key Features in v2.0.6:

  • 🐛 Fixed nested array length preservation - Perfect round-trip for [[]], [[[]]], etc.
  • 🔄 Revolutionary dual-mode system (quoting + preprocessing)
  • ✅ Perfect round-trip safety - 100% data preservation in default mode
  • 🛠️ Advanced quoting for special characters (#, @, "", etc.)
  • 🌐 Browser playground now handles all JSON inputs flawlessly
  • 📋 Optional --preprocess flag for clean, readable output
  • 🔄 Zero data loss guaranteed for both modes


🏠 Homepage: tonl.dev
📦 GitHub: github.com/tonl-dev/tonl
📖 Documentation: Complete Guides


Why TONL?

  • 🗜️ Up to 60% Smaller - Reduce JSON size and LLM token costs
  • 👁️ Human-Readable - Clear text format, not binary
  • 🚀 Blazingly Fast - 10-1600x faster than targets
  • 🔒 Production Secure - 100% security hardened (v2.0.3)
  • 🛠️ TypeScript-First - Full type safety & IntelliSense
  • 📦 Zero Dependencies - Pure TypeScript, no bloat
  • 🌐 Browser Ready - 10.5 KB gzipped bundle (IIFE/UMD)
  • ✅ 100% Tested - 496/496 tests passing (core functionality)


🚀 Quick Start

Installation

npm install tonl

Basic Usage

import { TONLDocument, encodeTONL, decodeTONL } from 'tonl';

// Create from JSON
const doc = TONLDocument.fromJSON({
  users: [
    { id: 1, name: "Alice", role: "admin", age: 30 },
    { id: 2, name: "Bob", role: "user", age: 25 }
  ]
});

// Query with JSONPath-like syntax
doc.get('users[0].name');                          // 'Alice'
doc.query('users[*].name');                        // ['Alice', 'Bob']
doc.query('users[?(@.role == "admin")]');          // [{ id: 1, ... }]
doc.query('$..age');                               // All ages recursively

// Modify data
doc.set('users[0].age', 31);
doc.push('users', { id: 3, name: "Carol", role: "editor", age: 28 });

// Navigate and iterate
for (const [key, value] of doc.entries()) {
  console.log(key, value);
}

doc.walk((path, value, depth) => {
  console.log(`${path}: ${value}`);
});

// Export
const tonl = doc.toTONL();
const json = doc.toJSON();
await doc.save('output.tonl');

// Classic API (encode/decode)
const data = { users: [{ id: 1, name: "Alice" }] };
const tonlText = encodeTONL(data);
const restored = decodeTONL(tonlText);

// Advanced Optimization (v2.0.1+)
import { AdaptiveOptimizer, BitPacker, DeltaEncoder } from 'tonl/optimization';

// Automatic optimization
const optimizer = new AdaptiveOptimizer();
const result = optimizer.optimize(data);  // Auto-selects best strategies

// Specific optimizers
const packer = new BitPacker();
const packed = packer.packBooleans([true, false, true]);

const delta = new DeltaEncoder();
const timestamps = [1704067200000, 1704067201000, 1704067202000];
const compressed = delta.encode(timestamps, 'timestamp');
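
The `DeltaEncoder` example above passes in evenly spaced timestamps. As a rough illustration of the general idea only (this is not the library's implementation, and `deltaEncode` and `deltaDecode` are hypothetical names), delta encoding keeps the first value and then stores each value as the difference from its predecessor:

```typescript
// Hypothetical helpers illustrating delta encoding; NOT part of the tonl API.
function deltaEncode(values: number[]): number[] {
  const out: number[] = [];
  let previous = 0;
  for (const value of values) {
    out.push(value - previous); // the first entry is the value itself (value - 0)
    previous = value;
  }
  return out;
}

function deltaDecode(deltas: number[]): number[] {
  const out: number[] = [];
  let running = 0;
  for (const delta of deltas) {
    running += delta; // a running sum restores the original values
    out.push(running);
  }
  return out;
}

const sampleTimestamps = [1704067200000, 1704067201000, 1704067202000];
const sampleDeltas = deltaEncode(sampleTimestamps); // [1704067200000, 1000, 1000]
const restoredTimestamps = deltaDecode(sampleDeltas);
```

The small repeated differences are what make sequential data cheaper to write out than the full 13-digit values.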

CLI Usage

# Get started (shows help)
tonl

# Version info
tonl --version

# Encode JSON to TONL (perfect round-trip, quotes special keys)
tonl encode data.json --out data.tonl --smart --stats

# Encode with preprocessing (clean, readable keys)
tonl encode data.json --preprocess --out data.tonl

# Decode TONL to JSON
tonl decode data.tonl --out data.json

# Query data
tonl query users.tonl "users[?(@.role == 'admin')]"
tonl get data.json "user.profile.email"

# Validate against schema
tonl validate users.tonl --schema users.schema.tonl

# Format and prettify
tonl format data.tonl --pretty --out formatted.tonl

# Compare token costs
tonl stats data.json --tokenizer gpt-5

📊 Format Overview

Arrays of Objects (Tabular Format)

JSON (245 bytes, 89 tokens):

{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob, Jr.", "role": "user" },
    { "id": 3, "name": "Carol", "role": "editor" }
  ]
}

TONL (158 bytes, 49 tokens - 45% reduction):

#version 1.0
users[3]{id:u32,name:str,role:str}:
  1, Alice, admin
  2, "Bob, Jr.", user
  3, Carol, editor
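
To make the layout above concrete, here is a minimal sketch of how a uniform array of flat objects could be written as a header line plus one row per object. It is not the tonl encoder: `encodeTable`, `typeHint`, and `quoteCell` are hypothetical names, the `f64` hint and the quote-doubling rule are assumptions, and the `#version` line is omitted.

```typescript
// Hypothetical sketch of the tabular layout; NOT the tonl encoder.
type Cell = string | number;
type TableRow = Record<string, Cell>;

// Coarse type hint: non-negative integers map to u32, other numbers to f64
// (f64 is an assumed name), everything else to str.
function typeHint(value: Cell): string {
  if (typeof value === "number") {
    return Number.isInteger(value) && value >= 0 ? "u32" : "f64";
  }
  return "str";
}

// Quote a cell only when it contains the delimiter, a double quote, or
// leading/trailing whitespace. Doubling embedded quotes is an assumption.
function quoteCell(value: Cell, delimiter: string): string {
  const text = String(value);
  const needsQuotes =
    text.includes(delimiter) || text.includes('"') || text !== text.trim();
  return needsQuotes ? `"${text.replace(/"/g, '""')}"` : text;
}

function encodeTable(key: string, rows: TableRow[], delimiter = ","): string {
  const [first] = rows;
  if (first === undefined) return `${key}[0]:`;
  // Column order and type hints are taken from the first row only.
  const columns = Object.keys(first);
  const header = columns
    .map((column) => `${column}:${typeHint(first[column] ?? "")}`)
    .join(",");
  const lines = rows.map((row) => {
    const cells = columns.map((column) => quoteCell(row[column] ?? "", delimiter));
    return `  ${cells.join(`${delimiter} `)}`;
  });
  return [`${key}[${rows.length}]{${header}}:`, ...lines].join("\n");
}

const sampleTable = encodeTable("users", [
  { id: 1, name: "Alice", role: "admin" },
  { id: 2, name: "Bob, Jr.", role: "user" },
  { id: 3, name: "Carol", role: "editor" },
]);
```

Because the keys appear once in the header instead of once per object, the saving grows with the number of rows.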

Nested Objects

JSON:

{
  "user": {
    "id": 1,
    "name": "Alice",
    "contact": {
      "email": "alice@example.com",
      "phone": "+123456789"
    },
    "roles": ["admin", "editor"]
  }
}

TONL:

#version 1.0
user{id:u32,name:str,contact:obj,roles:list}:
  id: 1
  name: Alice
  contact{email:str,phone:str}:
    email: alice@example.com
    phone: +123456789
  roles[2]: admin, editor

✨ Complete Feature Set

🔄 Core Serialization

  • Compact Format - 32-45% smaller than JSON (bytes + tokens)
  • Human-Readable - Clear text format with minimal syntax
  • Round-Trip Safe - Perfect bidirectional JSON conversion
  • Smart Encoding - Auto-selects optimal delimiters and formatting
  • Type Hints - Optional schema information for validation

🔍 Query & Navigation API

  • JSONPath Queries - users[?(@.age > 25)], $..email
  • Filter Expressions - ==, !=, >, <, &&, ||, contains, matches
  • Wildcard Support - users[*].name, **.email
  • Tree Traversal - entries(), keys(), values(), walk()
  • LRU Cache - >90% cache hit rate on repeated queries
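
As an illustration of how path and wildcard queries such as `users[0].name` and `users[*].name` can be resolved, the sketch below walks a plain JavaScript value one path segment at a time. `queryPath` is a hypothetical helper, not the library's query engine, and it deliberately ignores filter expressions and recursive descent.

```typescript
// Hypothetical path resolver; NOT the tonl query engine.
// Supports dotted keys, [index] segments, and the [*] wildcard only.
function queryPath(root: unknown, path: string): unknown[] {
  // "users[0].name" -> ["users", "0", "name"]
  const tokens = path.match(/[^.[\]]+/g) ?? [];
  let frontier: unknown[] = [root];
  for (const token of tokens) {
    const next: unknown[] = [];
    for (const node of frontier) {
      if (node === null || typeof node !== "object") continue;
      if (token === "*") {
        // Wildcard: fan out to every child of this object or array.
        next.push(...Object.values(node as object));
      } else if (token in (node as object)) {
        next.push((node as Record<string, unknown>)[token]);
      }
    }
    frontier = next;
  }
  return frontier;
}

const sampleDoc = {
  users: [
    { id: 1, name: "Alice" },
    { id: 2, name: "Bob" },
  ],
};
const firstUserName = queryPath(sampleDoc, "users[0].name"); // ["Alice"]
const allUserNames = queryPath(sampleDoc, "users[*].name"); // ["Alice", "Bob"]
```

Keeping a list of matched nodes between steps is what lets a single wildcard segment expand into several results.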

✏️ Modification API

  • CRUD Operations - set(), get(), delete(), push(), pop()
  • Bulk Operations - merge(), update(), removeAll()
  • Change Tracking - diff() with detailed change reports
  • Snapshots - Document versioning and comparison
  • Atomic File Edits - Safe saves with automatic backups
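
For a sense of what a change report can contain, the following sketch compares two plain values recursively and lists added, removed, and changed paths. It is an assumption-laden illustration: `diffValues` and the `Change` shape are hypothetical and do not describe the output of the library's `diff()`. Arrays are compared index by index, so paths look like `users.0.age`.

```typescript
// Hypothetical recursive diff; NOT the tonl diff() implementation.
type Change = {
  path: string;
  kind: "added" | "removed" | "changed";
  before?: unknown;
  after?: unknown;
};

// Objects and arrays are both treated as containers with string keys.
function isContainer(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function diffValues(before: unknown, after: unknown, path = ""): Change[] {
  if (!isContainer(before) || !isContainer(after)) {
    return Object.is(before, after) ? [] : [{ path, kind: "changed", before, after }];
  }
  const left = before;
  const right = after;
  // Keys of the old value first, then keys that exist only in the new value.
  const keys = Object.keys(left).concat(
    Object.keys(right).filter((key) => !(key in left)),
  );
  const changes: Change[] = [];
  for (const key of keys) {
    const childPath = path === "" ? key : `${path}.${key}`;
    if (!(key in left)) {
      changes.push({ path: childPath, kind: "added", after: right[key] });
    } else if (!(key in right)) {
      changes.push({ path: childPath, kind: "removed", before: left[key] });
    } else {
      changes.push(...diffValues(left[key], right[key], childPath));
    }
  }
  return changes;
}

const sampleChanges = diffValues(
  { a: 1, b: { c: 2 }, e: 5 },
  { a: 1, b: { c: 3 }, d: 4 },
);
```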

⚑ Performance & Indexing

  • Hash Index - O(1) exact match lookups
  • BTree Index - O(log n) range queries
  • Compound Index - Multi-field indexing
  • Stream Processing - Handle multi-GB files with <100MB memory
  • Pipeline Operations - Chainable filter/map/reduce transformations
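
The O(1) exact-match behaviour of a hash index comes from grouping records by key once and answering later lookups from that map. A minimal sketch, assuming in-memory records and a hypothetical `buildHashIndex` helper (the library's `createIndex` may work differently):

```typescript
// Hypothetical hash index; NOT the tonl createIndex() implementation.
function buildHashIndex<T>(
  items: T[],
  keyOf: (item: T) => string | number,
): Map<string | number, T[]> {
  const index = new Map<string | number, T[]>();
  for (const item of items) {
    const key = keyOf(item);
    const bucket = index.get(key);
    if (bucket) {
      bucket.push(item); // several records may share one key
    } else {
      index.set(key, [item]);
    }
  }
  return index;
}

const staff = [
  { id: 1, person: "Alice", role: "admin" },
  { id: 2, person: "Bob", role: "user" },
  { id: 3, person: "Carol", role: "admin" },
];

// Building the index is O(n); each lookup afterwards is a single Map access.
const staffByRole = buildHashIndex(staff, (member) => member.role);
const adminStaff = staffByRole.get("admin") ?? [];
```

Range queries need ordered keys, which is why a tree-based index is listed separately for them.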

🗜️ Advanced Optimization

  • Dictionary Encoding - Value compression via lookup tables (30-50% savings)
  • Delta Encoding - Sequential data compression (40-60% savings)
  • Run-Length Encoding - Repetitive value compression (50-80% savings)
  • Bit Packing - Boolean and small integer bit-level compression (87.5% savings)
  • Numeric Quantization - Precision reduction for floating-point numbers (20-40% savings)
  • Schema Inheritance - Reusable column schemas across data blocks (20-40% savings)
  • Hierarchical Grouping - Common field extraction for nested structures (15-30% savings)
  • Tokenizer-Aware - LLM tokenizer optimization for minimal token usage (5-15% savings)
  • Column Reordering - Entropy-based ordering for better compression
  • Adaptive Optimizer - Automatic strategy selection based on data patterns
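
One way to read the 87.5% figure for bit packing, assuming a baseline of one byte per boolean: eight flags fit into a single byte, so seven of every eight bytes are saved. The sketch below shows that arithmetic with hypothetical `packFlags` and `unpackFlags` helpers; it is not the library's `BitPacker`.

```typescript
// Hypothetical bit-packing helpers; NOT the tonl BitPacker implementation.
function packFlags(flags: boolean[]): Uint8Array {
  const bytes = new Uint8Array(Math.ceil(flags.length / 8));
  flags.forEach((flag, i) => {
    // Flag i lives in byte i >> 3, at bit position i & 7 (least significant first).
    if (flag) bytes[i >> 3] = (bytes[i >> 3] ?? 0) | (1 << (i & 7));
  });
  return bytes;
}

function unpackFlags(bytes: Uint8Array, count: number): boolean[] {
  const flags: boolean[] = [];
  for (let i = 0; i < count; i++) {
    flags.push((((bytes[i >> 3] ?? 0) >> (i & 7)) & 1) === 1);
  }
  return flags;
}

const sampleFlags = [
  true, false, true, true, false, false, false, true,
  false, true, false, false, false, false, false, false,
];
const packedFlags = packFlags(sampleFlags); // 16 flags stored in 2 bytes
// Savings relative to one byte per flag: 1 - 2/16 = 0.875
const flagSavings = 1 - packedFlags.length / sampleFlags.length;
const unpackedFlags = unpackFlags(packedFlags, sampleFlags.length);
```

The flag count has to be stored alongside the bytes, because the final byte may be only partly used.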

✅ Schema & Validation

  • Schema Definition - .schema.tonl files with TSL (TONL Schema Language)
  • 13 Constraints - required, min, max, pattern, unique, email, etc.
  • TypeScript Generation - Auto-generate types from schemas
  • Runtime Validation - Validate data programmatically or via CLI
  • Strict Mode - Enforce schema compliance

🛠️ Developer Tools

  • Interactive REPL - Explore data interactively in terminal
  • CLI Suite - encode, decode, query, validate, format, stats
  • Browser Support - ESM, UMD, IIFE builds (8.84 KB gzipped)
  • VS Code Extension - Syntax highlighting for .tonl files
  • TypeScript-First - Full IntelliSense and type safety

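📊 Performance Comparison

| Metric         | JSON | TONL | TONL Smart | Improvement   |
|----------------|------|------|------------|---------------|
| Size (bytes)   | 245  | 167  | 158        | 36% smaller   |
| Tokens (GPT-5) | 89   | 54   | 49         | 45% fewer     |
| Encoding Speed | 1.0x | 15x  | 12x        | 12-15x faster |
| Decoding Speed | 1.0x | 10x  | 10x        | 10x faster    |
| Query Speed    | -    | -    | 1600x      | Target: <1ms  |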

Benchmarks based on typical e-commerce product catalog data
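
The size and token percentages in the comparison follow from the JSON and TONL Smart figures. A short worked check, using a hypothetical `percentReduction` helper:

```typescript
// Hypothetical helper: relative reduction from `before` to `after`, in percent.
function percentReduction(before: number, after: number): number {
  return ((before - after) / before) * 100;
}

// Size: (245 - 158) / 245 is about 35.5%, reported as 36% smaller.
const sizeReductionPct = percentReduction(245, 158);

// Tokens: (89 - 49) / 89 is about 44.9%, reported as 45% fewer.
const tokenReductionPct = percentReduction(89, 49);
```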


🔒 Security & Quality

✅ Tests:          496/496 passing (100% coverage)
✅ Security:       All vulnerabilities fixed (100%)
✅ Security Tests: 96 security tests passing
✅ Code Quality:   TypeScript strict mode
✅ Dependencies:   0 runtime dependencies
✅ Bundle Size:    10.5 KB gzipped (browser)
✅ Performance:    10-1600x faster than targets
✅ Production:     Ready & Fully Secure

Security:

  • ✅ ReDoS, Path Traversal, Buffer Overflow protection
  • ✅ Prototype Pollution, Command Injection prevention
  • ✅ Integer Overflow, Type Coercion fixes
  • ✅ Comprehensive input validation and resource limits

See SECURITY.md and CHANGELOG.md for details.


🎯 Use Cases

LLM Prompts

Reduce token costs by 32-45% when including structured data in prompts:

const prompt = `Analyze this user data:\n${doc.toTONL()}`;
// 45% fewer tokens = lower API costs

Configuration Files

Human-readable configs that are compact yet clear:

config{env:str,database:obj,features:list}:
  env: production
  database{host:str,port:u32,ssl:bool}:
    host: db.example.com
    port: 5432
    ssl: true
  features[3]: auth, analytics, caching

API Responses

Efficient data transmission with schema validation:

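// Assumes an Express app; adjust for your own HTTP framework
import express from 'express';
import { TONLDocument, encodeTONL } from 'tonl';

const app = express();
app.get('/api/users', async (req, res) => {
  const doc = await TONLDocument.load('users.tonl');
  const filtered = doc.query('users[?(@.active == true)]');
  res.type('text/tonl').send(encodeTONL(filtered));
});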

Data Pipelines

Stream processing for large datasets:

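import { createReadStream, createWriteStream } from 'fs';
import { createEncodeStream, createDecodeStream } from 'tonl/stream';

// transformStream is your own Transform stream (not shown here)
createReadStream('huge.tonl')
  .pipe(createDecodeStream())                 // TONL -> parsed data
  .pipe(transformStream)
  .pipe(createEncodeStream({ smart: true }))  // parsed data -> TONL
  .pipe(createWriteStream('output.tonl'));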

Log Aggregation

Compact structured logs:

logs[1000]{timestamp:i64,level:str,message:str,metadata:obj}:
  1699564800, INFO, "User login", {user_id:123,ip:"192.168.1.1"}
  1699564801, ERROR, "DB timeout", {query:"SELECT...",duration:5000}
  ...

🌐 Browser Usage

ESM (Modern Browsers)

<script type="module">
  import { encodeTONL, decodeTONL } from 'https://cdn.jsdelivr.net/npm/tonl@2.0.6/+esm';

  const data = { users: [{ id: 1, name: "Alice" }] };
  const tonl = encodeTONL(data);
  console.log(tonl);
</script>

UMD (Universal)

<script src="https://unpkg.com/tonl@2.0.6/dist/browser/tonl.umd.js"></script>
<script>
  const tonl = TONL.encodeTONL({ hello: "world" });
  console.log(tonl);
</script>

Bundle Sizes:

  • ESM: 15.5 KB gzipped
  • UMD: 10.7 KB gzipped
  • IIFE: 10.6 KB gzipped

📚 Complete API Reference

TONLDocument Class

// Creation
TONLDocument.fromJSON(data)
TONLDocument.fromTONL(text)
TONLDocument.load(filepath)

// Query
doc.get(path: string)                              // Single value
doc.query(query: string)                           // Multiple values
doc.has(path: string)                              // Check existence

// Modification
doc.set(path: string, value: any)                  // Set value
doc.delete(path: string)                           // Delete value
doc.push(path: string, value: any)                 // Append to array
doc.pop(path: string)                              // Remove last from array
doc.merge(path: string, value: object)             // Deep merge objects

// Navigation
doc.entries()                                      // Iterator<[key, value]>
doc.keys()                                         // Iterator<string>
doc.values()                                       // Iterator<any>
doc.walk(callback: WalkCallback)                   // Tree traversal
doc.find(predicate: Predicate)                     // Find single value
doc.findAll(predicate: Predicate)                  // Find all matching
doc.some(predicate: Predicate)                     // Any match
doc.every(predicate: Predicate)                    // All match

// Indexing
doc.createIndex(field: string, type?: IndexType)   // Create index
doc.removeIndex(field: string)                     // Remove index
doc.getIndex(field: string)                        // Get index

// Export
doc.toTONL(options?: EncodeOptions)                // Export as TONL
doc.toJSON()                                       // Export as JSON
doc.save(filepath: string, options?)               // Save to file
doc.getSize()                                      // Size in bytes
doc.getStats()                                     // Statistics object

Encode/Decode API

// Encoding
encodeTONL(data: any, options?: {
  delimiter?: "," | "|" | "\t" | ";";
  includeTypes?: boolean;
  version?: string;
  indent?: number;
  singleLinePrimitiveLists?: boolean;
}): string

// Smart encoding (auto-optimized)
encodeSmart(data: any, options?: EncodeOptions): string

// Decoding
decodeTONL(text: string, options?: {
  delimiter?: "," | "|" | "\t" | ";";
  strict?: boolean;
}): any

Schema API

import { parseSchema, validateTONL } from 'tonl/schema';

// Parse schema (schemaText is a string containing the TSL source)
const schema = parseSchema(schemaText);

// Validate data against the parsed schema
const result = validateTONL(data, schema);

if (!result.valid) {
  result.errors.forEach(err => {
    console.error(`${err.field}: ${err.message}`);
  });
}

Streaming API

import { createReadStream, createWriteStream } from 'fs';
import { createEncodeStream, createDecodeStream, encodeIterator, decodeIterator } from 'tonl/stream';

// Node.js streams
createReadStream('input.json')
  .pipe(createEncodeStream({ smart: true }))
  .pipe(createWriteStream('output.tonl'));

// Async iterators (dataStream is an async iterable that you supply)
for await (const line of encodeIterator(dataStream)) {
  console.log(line);
}

✅ Schema Validation

Define schemas with the TONL Schema Language (TSL):

@schema v1
@strict true
@description "User management schema"

# Define custom types
User: obj
  id: u32 required
  username: str required min:3 max:20 pattern:^[a-zA-Z0-9_]+$
  email: str required pattern:email lowercase:true
  age: u32? min:13 max:150
  roles: list<str> required min:1 unique:true

# Root schema
users: list<User> required min:1
totalCount: u32 required

13 Built-in Constraints:

  • required - Field must exist
  • min / max - Numeric range or string/array length
  • length - Exact length
  • pattern - Regex validation (or shortcuts: email, url, uuid)
  • unique - Array elements must be unique
  • nonempty - String/array cannot be empty
  • positive / negative - Number sign
  • integer - Must be integer
  • multipleOf - Divisibility check
  • lowercase / uppercase - String case enforcement

See docs/SCHEMA_SPECIFICATION.md for complete reference.
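
To show how a few of these constraints behave, the sketch below checks a single value against `required`, `min`, `max`, and `pattern`, mirroring the `username` rule from the schema above. `FieldRule` and `checkField` are hypothetical names and the error strings are invented for illustration; the library's validator and its error format may differ.

```typescript
// Hypothetical constraint checker; NOT the tonl validator.
interface FieldRule {
  required?: boolean;
  min?: number;
  max?: number;
  pattern?: RegExp;
}

function checkField(field: string, value: unknown, rule: FieldRule): string[] {
  const errors: string[] = [];
  if (value === undefined || value === null) {
    if (rule.required) errors.push(`${field}: required`);
    return errors; // nothing else to check on a missing value
  }
  // min/max apply to the number itself, or to the length of a string or array.
  let size: number | undefined;
  if (typeof value === "number") size = value;
  else if (typeof value === "string" || Array.isArray(value)) size = value.length;
  if (size !== undefined) {
    if (rule.min !== undefined && size < rule.min) errors.push(`${field}: below min ${rule.min}`);
    if (rule.max !== undefined && size > rule.max) errors.push(`${field}: above max ${rule.max}`);
  }
  if (rule.pattern !== undefined && typeof value === "string" && !rule.pattern.test(value)) {
    errors.push(`${field}: pattern mismatch`);
  }
  return errors;
}

// Mirrors: username: str required min:3 max:20 pattern:^[a-zA-Z0-9_]+$
const usernameRule: FieldRule = { required: true, min: 3, max: 20, pattern: /^[a-zA-Z0-9_]+$/ };
const validNameErrors = checkField("username", "alice_01", usernameRule);
const shortNameErrors = checkField("username", "al", usernameRule);
const missingNameErrors = checkField("username", undefined, usernameRule);
```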


🛠️ Development

Build & Test

# Install dependencies
npm install

# Build TypeScript
npm run build

# Run all tests (496 tests)
npm test

# Watch mode
npm run dev

# Clean build artifacts
npm run clean

Benchmarking

# Byte size comparison
npm run bench

# Token estimation (GPT-5, Claude 3.5, Gemini 2.0, Llama 4)
npm run bench-tokens

# Comprehensive performance analysis
npm run bench-comprehensive

CLI Development

# Install CLI locally
npm run link

# Test commands
tonl encode test.json
tonl query data.tonl "users[*].name"
tonl format data.tonl --pretty

🗺️ Roadmap

✅ v2.0+ - Complete

  • ✅ Advanced optimization module (60% additional compression)
  • ✅ Complete query, modification, indexing, streaming APIs
  • ✅ Schema validation & TypeScript generation
  • ✅ Browser support (10.5 KB bundles)
  • ✅ 100% test coverage & security hardening

🚀 Future

  • Enhanced VS Code extension (IntelliSense, debugging)
  • Web playground with live conversion
  • Python, Go, Rust implementations
  • Binary TONL format for extreme compression

See ROADMAP.md for our comprehensive development vision.


📖 Documentation

For Users

For Implementers (Other Languages)

Implementing TONL in Python, Go, Rust, or another language? Check out the Implementation Reference for complete algorithms, pseudo-code, and test requirements!


🀝 Contributing

Contributions are welcome! Please read CONTRIBUTING.md for:

  • Development setup
  • Code style guidelines
  • Testing requirements
  • Pull request process
  • Architecture overview

📄 License

MIT License - see LICENSE file for details.



TONL: Making structured data LLM-friendly without sacrificing readability. 🚀

Built with ❤️ by Ersin Koc
