npunati27/cashed

cashed: intelligent SQL caching using Abstract Syntax Trees

cashed is a research-oriented benchmarking framework for analyzing the performance of different database caching strategies under realistic read/write workloads.
It provides a controlled environment for comparing:

  • NoCache — baseline database-only execution
  • TTLCache — traditional time-based expiration
  • IntelligentCache — dependency-aware invalidation using table-level read/write tracking

The goal of this system is to study how caching algorithms behave under Twitter-like microservice workloads and provide detailed measurements—latency, hit rate, invalidations, and system throughput—over time.


Running The Code

To install the necessary dependencies, run the following:

pip install sqlglot

sudo apt update
sudo apt install redis-server
sudo systemctl enable redis
sudo systemctl start redis

sudo apt install sqlite3

To run the application with default settings:

python3 -m app.api

Run with the --help flag to see how to configure the cache type, cache size, and cache TTL, or to enable periodic cache statistics. In a separate terminal, run any script from the /benchmarks directory, or issue requests directly.

AST-Based Dependency Tracking (sqlglot)

The IntelligentCache relies on a custom Abstract Syntax Tree (AST) dependency analyzer built using sqlglot.
This module interprets each SQL query, identifies which tables it reads or writes, and extracts simple predicate information used for fine-grained cache invalidation.

Overview

Each incoming SQL query is parsed into an AST using sqlglot.parse_one.
From this syntax tree, the dependency tracker identifies:

  • Modified tables — tables that the query mutates
  • Read tables — tables whose data contributes to the query result
  • Predicates — simple column constraints in the WHERE clause (e.g., a = 3, b BETWEEN 2 AND 5)

This allows IntelligentCache to perform targeted invalidation of cache entries, rather than relying on coarse TTL expiration.


Table Classification

The classify_tables function determines whether a query affects (writes) or depends on (reads) any table.
Major SQL classes handled:

  • INSERT → marks the target table as modified; reads from source SELECTs if present
  • UPDATE → marks the target table as modified; all other referenced tables are read
  • DELETE → marks the target table as modified; WHERE-referenced tables treated as reads
  • SELECT → purely read-only; all referenced tables appear in the read set
  • DDL (CREATE, DROP, ALTER, TRUNCATE) → classified as writes to the affected table

The classification is conservative: any table touched by a query is considered relevant for dependency tracking.

Caching Architectures

1. NoCache — Baseline Execution

A simple pass-through cache layer that performs no caching and introduces a fixed micro-sleep to simulate minimal framework overhead.
Used for baseline comparisons of:

  • Raw database latency
  • Concurrency overhead
  • Maximum potential improvement from caching

This model ensures all reads and writes go directly to SQLite.
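In sketch form, the pass-through layer amounts to a fixed delay plus a direct database call (class and method names here are illustrative):

```python
import time


class NoCache:
    """Pass-through layer: no storage, just a fixed micro-sleep per call."""

    def __init__(self, overhead_s: float = 1e-5):
        self.overhead_s = overhead_s   # simulated framework overhead

    def execute(self, query: str, run_query):
        time.sleep(self.overhead_s)    # micro-sleep, then hit the database
        return run_query(query)        # every read and write reaches SQLite
```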


2. TTLCache — Time-Based Expiration

A lightweight caching strategy where each query is hashed and stored with a fixed TTL.

Key properties:

  • No semantic invalidation
  • Cache entries expire after ttl_seconds
  • Good performance for mostly-static data
  • May serve stale reads under high write rates

This provides a baseline traditional cache for comparison with dependency-based schemes.
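A minimal sketch of this scheme, assuming the query text itself is the hash input (the class shape is illustrative, though the ttl_seconds name follows the description above):

```python
import hashlib
import time


class TTLCache:
    """Hash each query and keep its result for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float):
        self.ttl_seconds = ttl_seconds
        self.store = {}  # key -> (expires_at, result)

    def _key(self, query: str) -> str:
        return hashlib.sha256(query.encode()).hexdigest()

    def get(self, query: str, run_query):
        key = self._key(query)
        entry = self.store.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]                     # fresh entry: cache hit
        result = run_query(query)               # miss or expired: recompute
        self.store[key] = (time.monotonic() + self.ttl_seconds, result)
        return result
```

Note there is no invalidation path at all: a write to the underlying table leaves the cached entry in place until its TTL lapses, which is exactly the stale-read risk noted above.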


3. IntelligentCache — Dependency-Aware Invalidation

A more advanced caching strategy that tracks each query's read set (the tables it touches) and invalidates cached results when writes modify overlapping tables.

Mechanism:

  • SQL is parsed to extract read and write tables
  • Cached entries store metadata about which tables they depend on
  • Writes enqueue invalidation tasks
  • A dedicated background thread processes invalidations asynchronously

This model aims to balance freshness, hit rate, and efficiency without relying solely on TTL expiration.
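The mechanism above can be sketched with a table-overlap check and a daemon worker thread; this is a simplified illustration of the described design, not the repository's code:

```python
import queue
import threading


class IntelligentCache:
    """Cache entries remember their read tables; writes enqueue invalidations."""

    def __init__(self):
        self.entries = {}                  # key -> (result, read_tables)
        self.lock = threading.Lock()
        self.invalidations = queue.Queue() # sets of modified tables
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        # Background thread: drop every entry whose read set overlaps a write.
        while True:
            written = self.invalidations.get()
            with self.lock:
                stale = [k for k, (_, reads) in self.entries.items()
                         if reads & written]
                for k in stale:
                    del self.entries[k]
            self.invalidations.task_done()

    def read(self, key, read_tables, run_query):
        with self.lock:
            if key in self.entries:
                return self.entries[key][0]    # hit
        result = run_query()
        with self.lock:
            self.entries[key] = (result, set(read_tables))
        return result

    def write(self, written_tables, run_query):
        result = run_query()
        self.invalidations.put(set(written_tables))  # asynchronous invalidation
        return result
```

Because invalidation happens off the request path, writes return immediately; a brief window of staleness is the trade-off for not blocking on cache cleanup.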


API & Workload Model

The system includes a Twitter-inspired microservice API used to generate realistic workloads.
Endpoints include:

  • GET /readTweet
  • GET /likeTweet
  • GET /reTweet
  • GET /numberLikes
  • GET /numberReTweet
  • GET /getFriends
  • GET /getReplies
  • GET /getHashtagTweets
  • GET /getIfLiked
  • GET /getReTweets
  • GET /befriend
  • POST /updateTweet
  • POST /reply

These endpoints collectively simulate a real-world mix of reads, counters, and user interactions that stress caching systems.

Workload scripts (scripts/*.sh) enable:

  • Warmup phases
  • Read-intensive workloads
  • Mixed workloads
  • Write-heavy tests

Benchmark Data & Logging

The system automatically records:

  • Per-query latency (ms)
  • Query type (read/write)
  • Cache hit or miss
  • Invalidation events
  • Cache size usage
  • Timeline-based statistics

Outputs are stored in JSON and JSONL, enabling easy visualization and analysis.

Example summary metrics:

{
  "total_requests": 23820,
  "final_stats": {
    "hits": 12423,
    "misses": 11417,
    "hit_rate": 0.5210989932885906,
    "total_queries": 23840,
    "latency": {
      "avg_read_ms": 0.2891422510147095,
      "median_read_ms": 0.26702880859375,
      "p95_read_ms": 0.3979206085205078,
      "p99_read_ms": 0.47969818115234375,
      "min_read_ms": 0.1838207244873047,
      "max_read_ms": 3.0846595764160156,
      "avg_write_ms": 6.181085109710693,
      "median_write_ms": 6.107807159423828,
      "avg_cache_hit_ms": 0.261239218711853,
      "avg_cache_miss_ms": 0.3652518272399902
    },
    "invalidations": 6033
  },
  "completed_at": "2025-11-20T16:43:09.097480"
}
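A summary can be loaded with the standard json module and sanity-checked against the raw counts; the inline JSON below just mirrors the example above (in practice you would json.load a results file whose path depends on your run):

```python
import json

summary = json.loads("""
{"final_stats": {"hits": 12423, "misses": 11417, "hit_rate": 0.5210989932885906}}
""")
stats = summary["final_stats"]
derived = stats["hits"] / (stats["hits"] + stats["misses"])
# 12423 / (12423 + 11417) reproduces the reported hit_rate of ~0.5211
```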
