Skip to content

Latest commit

 

History

History
1171 lines (871 loc) · 24.1 KB

File metadata and controls

1171 lines (871 loc) · 24.1 KB

TOON PHP - Complete Documentation

Version: 2.x (PHP 8.1+) | View Legacy Docs (PHP 7.0-8.0)

Complete guide to using TOON PHP for token-efficient data serialization in AI/LLM applications.

Table of Contents


Introduction

TOON (Token-Oriented Object Notation) is a compact data format designed to reduce token consumption when sending structured data to Large Language Models (LLMs). It achieves 30-60% token savings compared to JSON while maintaining readability and full round-trip serialization.

Why TOON?

Traditional JSON uses many structural characters ({, }, [, ], ") that consume tokens but add no semantic value. TOON eliminates this overhead:

// JSON: 82 tokens
{"users":[{"id":1,"name":"Alice","role":"admin"},{"id":2,"name":"Bob","role":"user"}]}

// TOON: 36 tokens (56% savings!)
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

Installation

composer require iamgerwin/toon-php

Requirements:

  • PHP 8.1 or higher
  • No other dependencies

Automatic Loading:

// Composer autoload includes all classes and helper functions
require 'vendor/autoload.php';

use iamgerwin\Toon\Toon;
// Helper functions are now available: toon(), toon_decode(), etc.

Quick Start

Basic Encoding & Decoding

use iamgerwin\Toon\Toon;

// Encode data to TOON
$data = [
    'name' => 'Alice',
    'age' => 30,
    'active' => true,
    'balance' => 1250.50
];

$toon = Toon::encode($data);
echo $toon;
/* Output:
  name: Alice
  age: 30
  active: true
  balance: 1250.5
*/

// Decode back to PHP
$decoded = Toon::decode($toon);
// $decoded is identical to $data

Using Helper Functions

// Shorter syntax using global helper
$toon = toon($data);              // Same as Toon::encode()
$decoded = toon_decode($toon);    // Same as Toon::decode()

Core Concepts

1. Key-Value Pairs

Objects and associative arrays use colon syntax:

$data = ['name' => 'Alice', 'age' => 30];
// TOON:
//   name: Alice
//   age: 30

2. Arrays

Simple arrays with length markers:

$data = [1, 2, 3, 4, 5];
// TOON: [5]: 1,2,3,4,5

3. Tabular Format

Uniform arrays use tabular format for maximum efficiency:

$users = [
    ['id' => 1, 'name' => 'Alice'],
    ['id' => 2, 'name' => 'Bob']
];
// TOON:
// [2]{id,name}:
//   1,Alice
//   2,Bob

4. Nested Structures

Full support for nested objects and arrays:

$data = [
    'user' => [
        'profile' => ['name' => 'Alice'],
        'settings' => ['theme' => 'dark']
    ]
];
// TOON maintains hierarchy with indentation

API Reference

Toon Class

The main facade for all TOON operations.

Toon::encode(mixed $value, ?EncodeOptions $options = null): string

Encodes a PHP value to TOON format.

Parameters:

  • $value - Any PHP value (primitives, arrays, objects, DateTime, enums)
  • $options - Optional encoding configuration

Returns: TOON-encoded string

Throws: EncodingException if encoding fails

Examples:

// Basic encoding
$toon = Toon::encode(['name' => 'Alice']);

// With options
use iamgerwin\Toon\EncodeOptions;

$options = new EncodeOptions(
    indent: 4,
    preferTabular: true
);
$toon = Toon::encode($data, $options);

Toon::decode(string $toon, ?DecodeOptions $options = null): mixed

Decodes a TOON string back to PHP.

Parameters:

  • $toon - TOON-formatted string
  • $options - Optional decoding configuration

Returns: Decoded PHP value

Throws: DecodingException if decoding fails

Examples:

// Basic decoding
$data = Toon::decode($toonString);

// With options
use iamgerwin\Toon\DecodeOptions;

$options = DecodeOptions::lenient();
$data = Toon::decode($toonString, $options);

Toon::compact(mixed $value): string

Encodes to compact format (minimal whitespace).

Parameters:

  • $value - Value to encode

Returns: Compact TOON string

Example:

$compact = Toon::compact(['foo' => 'bar', 'items' => [1, 2, 3]]);
// Output: foo: bar\nitems[3]: 1,2,3

Toon::readable(mixed $value): string

Encodes to readable format (4-space indentation).

Parameters:

  • $value - Value to encode

Returns: Readable TOON string

Example:

$readable = Toon::readable(['foo' => 'bar']);
// Output:
//     foo: bar

Toon::tabular(mixed $value): string

Encodes arrays to tabular format.

Parameters:

  • $value - Array with uniform structure

Returns: Tabular TOON string

Example:

$users = [
    ['id' => 1, 'name' => 'Alice'],
    ['id' => 2, 'name' => 'Bob']
];

$tabular = Toon::tabular($users);
// Output:
// [2]{id,name}:
//   1,Alice
//   2,Bob

Toon::compare(mixed $value, ?EncodeOptions $options = null): array

Compares TOON vs JSON token usage.

Parameters:

  • $value - Value to compare
  • $options - Optional encoding options

Returns: Array with comparison data

[
    'toon' => string,           // TOON-encoded version
    'json' => string,           // JSON-encoded version
    'toon_tokens' => int,       // Estimated TOON tokens
    'json_tokens' => int,       // Estimated JSON tokens
    'savings_percent' => float  // Percentage saved
]

Example:

$comparison = Toon::compare($data);
echo "Token savings: {$comparison['savings_percent']}%";
// Output: Token savings: 42.5%

Toon::estimateTokens(string $toon): int

Estimates token count for a TOON string.

Parameters:

  • $toon - TOON string

Returns: Estimated token count (uses ~4 characters per token)

Example:

$tokens = Toon::estimateTokens($toonString);
$cost = ($tokens / 1000) * 0.03; // GPT-4 pricing
echo "Estimated cost: $$cost";

Helper Functions

Global helper functions for convenience.

toon(mixed $value, ?EncodeOptions $options = null): string

Alias for Toon::encode().

$toon = toon(['name' => 'Alice']);

toon_decode(string $toon, ?DecodeOptions $options = null): mixed

Alias for Toon::decode().

$data = toon_decode($toonString);

toon_compact(mixed $value): string

Alias for Toon::compact().

$compact = toon_compact($data);

toon_readable(mixed $value): string

Alias for Toon::readable().

$readable = toon_readable($data);

toon_tabular(mixed $value): string

Alias for Toon::tabular().

$tabular = toon_tabular($users);

toon_compare(mixed $value, ?EncodeOptions $options = null): array

Alias for Toon::compare().

$comparison = toon_compare($data);

toon_estimate_tokens(string $toon): int

Alias for Toon::estimateTokens().

$tokens = toon_estimate_tokens($toonString);

Encoding Options

Configure how TOON encoding behaves.

Constructor

new EncodeOptions(
    int $indent = 2,
    ToonDelimiter $delimiter = ToonDelimiter::COMMA,
    bool $useLengthMarker = true,
    bool $preferTabular = true,
    bool $quoteStrings = false,
    bool $sortKeys = false
)

Properties

Property Type Default Description
indent int 2 Number of spaces for indentation (0 = compact)
delimiter ToonDelimiter COMMA Array value delimiter (,, \t, or |)
useLengthMarker bool true Include array length markers like [5]
preferTabular bool true Use tabular format for uniform arrays
quoteStrings bool false Always quote string values
sortKeys bool false Sort object keys alphabetically

Factory Methods

// Compact mode (minimal whitespace)
$options = EncodeOptions::compact();

// Readable mode (4-space indentation)
$options = EncodeOptions::readable();

// Tabular mode (optimized for arrays)
$options = EncodeOptions::tabular();

Examples

use iamgerwin\Toon\{Toon, EncodeOptions, Enums\ToonDelimiter};

// Custom indentation
$options = new EncodeOptions(indent: 4);
$toon = Toon::encode($data, $options);

// Tab delimiters
$options = new EncodeOptions(delimiter: ToonDelimiter::TAB);
$toon = Toon::encode($data, $options);
// [3]: value1	value2	value3

// Pipe delimiters
$options = new EncodeOptions(delimiter: ToonDelimiter::PIPE);
$toon = Toon::encode($data, $options);
// [3]: value1|value2|value3

// Sorted keys
$options = new EncodeOptions(sortKeys: true);
$toon = Toon::encode(['z' => 1, 'a' => 2], $options);
// Output:
//   a: 2
//   z: 1

// No length markers
$options = new EncodeOptions(useLengthMarker: false);
$toon = Toon::encode([1, 2, 3], $options);
// Output: 1,2,3  (no [3]:)

Decoding Options

Configure how TOON decoding behaves.

Constructor

new DecodeOptions(
    bool $strict = true,
    bool $associative = true,
    int $depth = 512
)

Properties

Property Type Default Description
strict bool true Enable strict validation during decoding
associative bool true Return objects as associative arrays
depth int 512 Maximum nesting depth

Factory Methods

// Strict mode (validation enabled)
$options = DecodeOptions::strict();

// Lenient mode (relaxed validation)
$options = DecodeOptions::lenient();

Examples

use iamgerwin\Toon\{Toon, DecodeOptions};

// Strict decoding (default)
$data = Toon::decode($toon, DecodeOptions::strict());

// Lenient decoding (forgiving)
$data = Toon::decode($toon, DecodeOptions::lenient());

// Return stdClass objects instead of arrays
$options = new DecodeOptions(associative: false);
$data = Toon::decode($toon, $options);
// $data contains stdClass objects

// Custom depth limit
$options = new DecodeOptions(depth: 100);
$data = Toon::decode($toon, $options);

Data Type Support

TOON PHP supports all PHP data types with full round-trip serialization.

Primitives

// Null
Toon::encode(null);           // "null"

// Booleans
Toon::encode(true);            // "true"
Toon::encode(false);           // "false"

// Integers
Toon::encode(42);              // "42"
Toon::encode(-100);            // "-100"

// Floats
Toon::encode(3.14);            // "3.14"
Toon::encode(1.5e10);          // "15000000000"

// Special floats
Toon::encode(INF);             // "null"
Toon::encode(NAN);             // "null"

// Strings
Toon::encode('hello');         // "hello"
Toon::encode('hello world');   // "hello world"

// Strings with special characters (auto-quoted)
Toon::encode('contains:colon'); // "\"contains:colon\""
Toon::encode("line1\nline2");   // "\"line1\\nline2\""

Arrays

// Simple arrays
Toon::encode([1, 2, 3]);
// [3]: 1,2,3

// Associative arrays
Toon::encode(['name' => 'Alice', 'age' => 30]);
// name: Alice
// age: 30

// Nested arrays
Toon::encode([
    'users' => [
        ['id' => 1, 'name' => 'Alice'],
        ['id' => 2, 'name' => 'Bob']
    ]
]);
// users[2]{id,name}:
//   1,Alice
//   2,Bob

// Mixed arrays
Toon::encode([
    'numbers' => [1, 2, 3],
    'strings' => ['a', 'b', 'c']
]);

Objects

// stdClass objects
$obj = new stdClass();
$obj->name = 'Alice';
$obj->age = 30;

Toon::encode($obj);
// name: Alice
// age: 30

// Arrays are treated as objects
Toon::encode((object)['name' => 'Alice']);
// name: Alice

DateTime Objects

use DateTime;
use DateTimeImmutable;

// DateTime
$date = new DateTime('2024-01-15 10:30:00');
Toon::encode(['created' => $date]);
// created: 2024-01-15T10:30:00+00:00

// DateTimeImmutable
$date = new DateTimeImmutable('2024-01-15');
Toon::encode(['date' => $date]);
// date: 2024-01-15T00:00:00+00:00

// Round-trip preserves ISO 8601 format
$toon = Toon::encode(['date' => new DateTime('2024-01-15')]);
$decoded = Toon::decode($toon);
// $decoded['date'] is a string: "2024-01-15T00:00:00+00:00"

Enums (PHP 8.1+)

// Backed enums
enum Status: string {
    case Active = 'active';
    case Pending = 'pending';
    case Inactive = 'inactive';
}

Toon::encode(['status' => Status::Active]);
// status: active

// Unit enums (no backing value)
enum Color {
    case Red;
    case Green;
    case Blue;
}

Toon::encode(['color' => Color::Red]);
// color: Red

// Round-trip encoding/decoding
$data = ['status' => Status::Active];
$toon = Toon::encode($data);
$decoded = Toon::decode($toon);
// $decoded['status'] is the string "active"

Empty Values

// Empty array
Toon::encode([]);              // "[]"

// Empty object
Toon::encode(new stdClass());  // "{}"

// Empty string
Toon::encode('');              // (empty string)

Format Modes

TOON PHP supports three main format modes.

1. Default Mode (Balanced)

Balances readability and compactness.

$data = [
    'user' => ['name' => 'Alice', 'age' => 30],
    'items' => [1, 2, 3]
];

$toon = Toon::encode($data);
/* Output:
user:
  name: Alice
  age: 30
items[3]: 1,2,3
*/

2. Compact Mode (Minimal)

Removes unnecessary whitespace for maximum efficiency.

$compact = Toon::compact($data);
/* Output:
user:
name: Alice
age: 30
items[3]: 1,2,3
*/

3. Readable Mode (Spacious)

Adds extra indentation for better human readability.

$readable = Toon::readable($data);
/* Output:
    user:
        name: Alice
        age: 30
    items[3]: 1,2,3
*/

4. Tabular Mode (Arrays)

Optimized for uniform arrays (best token efficiency).

$users = [
    ['id' => 1, 'name' => 'Alice', 'role' => 'admin'],
    ['id' => 2, 'name' => 'Bob', 'role' => 'user'],
    ['id' => 3, 'name' => 'Charlie', 'role' => 'user']
];

$tabular = Toon::tabular($users);
/* Output:
[3]{id,name,role}:
  1,Alice,admin
  2,Bob,user
  3,Charlie,user
*/

Advanced Usage

Custom Delimiters

Change array delimiters for specific use cases.

use iamgerwin\Toon\{Toon, EncodeOptions, Enums\ToonDelimiter};

$data = ['items' => [1, 2, 3, 4, 5]];

// Tab delimiter
$options = new EncodeOptions(delimiter: ToonDelimiter::TAB);
echo Toon::encode($data, $options);
// items[5]: 1	2	3	4	5

// Pipe delimiter
$options = new EncodeOptions(delimiter: ToonDelimiter::PIPE);
echo Toon::encode($data, $options);
// items[5]: 1|2|3|4|5

Sorting Keys

Sort object keys alphabetically for consistent output.

$data = [
    'zebra' => 1,
    'apple' => 2,
    'mango' => 3
];

$options = new EncodeOptions(sortKeys: true);
echo Toon::encode($data, $options);
/* Output:
apple: 2
mango: 3
zebra: 1
*/

Forced String Quoting

Always quote strings even when not necessary.

$data = ['name' => 'Alice', 'city' => 'NYC'];

$options = new EncodeOptions(quoteStrings: true);
echo Toon::encode($data, $options);
/* Output:
name: "Alice"
city: "NYC"
*/

Lenient Decoding

Handle malformed TOON with lenient parsing.

// Strict mode (default) - throws exception on errors
try {
    $data = Toon::decode($malformedToon);
} catch (DecodingException $e) {
    // Handle error
}

// Lenient mode - attempts to parse anyway
$data = Toon::decode($malformedToon, DecodeOptions::lenient());

Deep Nesting Control

Limit maximum nesting depth to prevent stack overflow.

$options = new DecodeOptions(depth: 50);
$data = Toon::decode($deeplyNestedToon, $options);
// Will throw exception if depth exceeds 50 levels

Real-World Examples

Example 1: ChatGPT Conversation History

use iamgerwin\Toon\Toon;

$conversation = [
    ['role' => 'system', 'content' => 'You are a helpful assistant.'],
    ['role' => 'user', 'content' => 'What is TOON?'],
    ['role' => 'assistant', 'content' => 'TOON is a token-efficient format...'],
    ['role' => 'user', 'content' => 'How much can I save?']
];

// JSON: ~280 tokens = $0.0084 per API call
$json = json_encode($conversation);

// TOON: ~165 tokens = $0.00495 per API call (41% savings!)
$toon = Toon::tabular($conversation);

// For 100K API calls:
// JSON cost: $840
// TOON cost: $495
// Savings: $345/year

Example 2: E-commerce Product Catalog

$products = [
    ['id' => 1, 'name' => 'Laptop', 'price' => 999.99, 'stock' => 15],
    ['id' => 2, 'name' => 'Mouse', 'price' => 29.99, 'stock' => 50],
    ['id' => 3, 'name' => 'Keyboard', 'price' => 79.99, 'stock' => 30]
];

// JSON: 156 characters
$json = json_encode($products);

// TOON: 89 characters (43% savings!)
$toon = Toon::tabular($products);
/* Output:
[3]{id,name,price,stock}:
  1,Laptop,999.99,15
  2,Mouse,29.99,50
  3,Keyboard,79.99,30
*/

// Decode back when needed
$decoded = Toon::decode($toon);

Example 3: User Profile with Nested Data

$profile = [
    'id' => 12345,
    'username' => 'alice_dev',
    'email' => 'alice@example.com',
    'profile' => [
        'firstName' => 'Alice',
        'lastName' => 'Johnson',
        'bio' => 'Full-stack developer'
    ],
    'settings' => [
        'theme' => 'dark',
        'notifications' => true,
        'language' => 'en'
    ],
    'metadata' => [
        'created' => new DateTime('2024-01-15'),
        'lastLogin' => new DateTime('2024-11-09')
    ]
];

$toon = Toon::encode($profile);
$comparison = Toon::compare($profile);
echo "Token savings: {$comparison['savings_percent']}%";
// Typical savings: 35-40%

Example 4: API Response Compression

// Typical API response
$response = [
    'success' => true,
    'data' => [
        'users' => [
            ['id' => 1, 'name' => 'Alice', 'active' => true],
            ['id' => 2, 'name' => 'Bob', 'active' => false],
            ['id' => 3, 'name' => 'Charlie', 'active' => true]
        ],
        'total' => 3,
        'page' => 1
    ],
    'meta' => [
        'timestamp' => time(),
        'version' => '1.0'
    ]
];

// Before sending to LLM
$compressed = Toon::encode($response);

// LLM processes the compact format
// Then decode on the other end
$original = Toon::decode($compressed);

Example 5: Training Data for ML Models

$trainingData = [];

for ($i = 0; $i < 1000; $i++) {
    $trainingData[] = [
        'features' => [rand(1, 100), rand(1, 100), rand(1, 100)],
        'label' => rand(0, 1)
    ];
}

// JSON: ~45,000 tokens
// TOON: ~22,000 tokens (51% savings!)
$compactData = Toon::tabular($trainingData);

// Massive cost savings when processing large datasets

Performance & Benchmarks

Token Efficiency

Based on official TOON benchmarks:

Use Case JSON Tokens TOON Tokens Savings
E-commerce Orders 3,245 2,170 33.1%
User Lists 150 82 45.3%
Product Catalogs 320 180 43.8%
Event Logs 1,890 1,606 15.0%
Config Files 2,456 1,687 31.3%
Conversation History 280 165 41.1%

LLM Retrieval Accuracy

Tested across 209 questions on 4 LLM models:

Format Accuracy Tokens Used
TOON 73.9% 2,744
JSON (compact) 70.7% 3,081
JSON (formatted) 69.7% 4,545

TOON achieves +3.2% better accuracy while using 39.6% fewer tokens.

Cost Savings

At OpenAI's GPT-4 pricing ($0.03/1K input tokens, $0.06/1K output tokens):

// Calculate potential savings
$data = [...]; // your data

$comparison = Toon::compare($data);
$tokensaved = $comparison['json_tokens'] - $comparison['toon_tokens'];

// For 1M API calls with 100-token payloads
$jsonCost = (100 * 1_000_000 / 1000) * 0.03;      // $3,000
$toonCost = (60 * 1_000_000 / 1000) * 0.03;       // $1,800 (40% savings)

echo "Annual savings: $" . ($jsonCost - $toonCost);
// Output: Annual savings: $1,200

Performance Metrics

  • Encoding Speed: ~5-10μs for small objects, ~50-100μs for large datasets
  • Decoding Speed: ~10-20μs for small objects, ~100-200μs for large datasets
  • Memory Usage: Minimal overhead, similar to json_encode/decode
  • Round-trip Accuracy: 100% for all supported data types

Error Handling

TOON PHP uses exceptions for error handling.

Exception Hierarchy

ToonException (base)
├── EncodingException (encoding errors)
└── DecodingException (decoding errors)

Handling Encoding Errors

use iamgerwin\Toon\{Toon, Exceptions\EncodingException};

try {
    $toon = Toon::encode($data);
} catch (EncodingException $e) {
    echo "Encoding failed: " . $e->getMessage();
    // Log error, use fallback, etc.
}

Handling Decoding Errors

use iamgerwin\Toon\{Toon, Exceptions\DecodingException};

try {
    $data = Toon::decode($toonString);
} catch (DecodingException $e) {
    echo "Decoding failed: " . $e->getMessage();
    // Handle malformed input
}

Common Error Scenarios

// Unsupported type
try {
    $resource = fopen('file.txt', 'r');
    Toon::encode($resource); // Resources not supported
} catch (EncodingException $e) {
    echo $e->getMessage(); // "Unsupported type: resource"
}

// Malformed TOON
try {
    Toon::decode("invalid:::toon::: format");
} catch (DecodingException $e) {
    echo $e->getMessage(); // Error details
}

// Depth exceeded
try {
    $options = new DecodeOptions(depth: 5);
    Toon::decode($deeplyNested, $options);
} catch (DecodingException $e) {
    echo "Nesting too deep";
}

Best Practices

1. Use Tabular Format for Uniform Arrays

// ✅ Good - uses tabular format
$users = [
    ['id' => 1, 'name' => 'Alice'],
    ['id' => 2, 'name' => 'Bob']
];
$toon = Toon::tabular($users);
// Maximum token efficiency

// ❌ Less efficient - default format
$toon = Toon::encode($users);

2. Choose the Right Format Mode

// For LLM consumption - use compact
$forLLM = Toon::compact($data);

// For human debugging - use readable
$forDebug = Toon::readable($data);

// For uniform datasets - use tabular
$forArrays = Toon::tabular($arrayData);

3. Benchmark Your Data

// Always measure actual savings for your use case
$comparison = Toon::compare($yourData);

if ($comparison['savings_percent'] < 20) {
    // TOON might not be worth it for this data
    // Consider using JSON instead
}

4. Handle Errors Gracefully

// Always wrap in try-catch for production
try {
    $toon = Toon::encode($userInput);
} catch (EncodingException $e) {
    // Fallback to JSON or handle error
    $toon = json_encode($userInput);
    Log::warning('TOON encoding failed, using JSON fallback');
}

5. Use Lenient Mode for Untrusted Input

// For data from external sources
$options = DecodeOptions::lenient();
$data = Toon::decode($externalInput, $options);

6. Cache Encoded Results

// For frequently sent data, cache the TOON encoding
$cacheKey = 'toon_' . md5(serialize($data));

$toon = Cache::remember($cacheKey, 3600, function() use ($data) {
    return Toon::encode($data);
});

7. Monitor Token Usage

// Track token usage in production
$toon = Toon::encode($data);
$tokens = Toon::estimateTokens($toon);

Metrics::gauge('llm.tokens.input', $tokens);
Metrics::gauge('llm.cost.estimated', $tokens / 1000 * 0.03);

8. Validate Round-Trip Accuracy

// In tests, ensure data survives round-trip
$original = ['name' => 'Alice', 'age' => 30];
$toon = Toon::encode($original);
$decoded = Toon::decode($toon);

assert($original === $decoded);

Additional Resources

Version Information


Made with ❤️ for the PHP and AI community