npunati27/cashed

cashed: intelligent SQL caching using Abstract Syntax Trees

cashed is a research-oriented benchmarking framework for analyzing the performance of different database caching strategies under realistic read/write workloads.
It provides a controlled environment for comparing:

  • NoCache — baseline database-only execution
  • TTLCache — traditional time-based expiration
  • IntelligentCache — dependency-aware invalidation using table-level read/write tracking

The goal of this system is to study how caching algorithms behave under Twitter-like microservice workloads and provide detailed measurements—latency, hit rate, invalidations, and system throughput—over time.


Running The Code

To install the necessary dependencies, run the following:

pip install sqlglot

sudo apt update
sudo apt install redis-server
sudo systemctl enable redis
sudo systemctl start redis

sudo apt install sqlite3

To run the application with default settings:

python3 -m app.api

Run with the --help flag to see how to configure the cache type, cache size, and cache TTL, or to enable periodic cache statistics. In a separate terminal, run any script from the /benchmarks directory, or issue requests directly.

AST-Based Dependency Tracking (sqlglot)

The IntelligentCache relies on a custom Abstract Syntax Tree (AST) dependency analyzer built using sqlglot.
This module interprets each SQL query, identifies which tables it reads or writes, and extracts simple predicate information used for fine-grained cache invalidation.

Overview

Each incoming SQL query is parsed into an AST using sqlglot.parse_one.
From this syntax tree, the dependency tracker identifies:

  • Modified tables — tables that the query mutates
  • Read tables — tables whose data contributes to the query result
  • Predicates — simple column constraints in the WHERE clause (e.g., a = 3, b BETWEEN 2 AND 5)

This allows IntelligentCache to perform targeted invalidation of cache entries, rather than relying on coarse TTL expiration.


Table Classification

The classify_tables function determines whether a query affects (writes) or depends on (reads) any table.
Major SQL classes handled:

  • INSERT → marks the target table as modified; reads from source SELECTs if present
  • UPDATE → marks the target table as modified; all other referenced tables are read
  • DELETE → marks the target table as modified; WHERE-referenced tables treated as reads
  • SELECT → purely read-only; all referenced tables appear in the read set
  • DDL (CREATE, DROP, ALTER, TRUNCATE) → classified as writes to the affected table

The classification is conservative: any table touched by a query is considered relevant for dependency tracking.

Caching Architectures

1. NoCache — Baseline Execution

A simple pass-through cache layer that performs no caching and introduces a fixed micro-sleep to simulate minimal framework overhead.
Used for baseline comparisons of:

  • Raw database latency
  • Concurrency overhead
  • Maximum potential improvement from caching

This model ensures all reads and writes go directly to SQLite.
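In sketch form, the pass-through layer amounts to a fixed delay plus a direct database call (class and method names here are illustrative):

```python
import time


class NoCache:
    """Pass-through layer: no storage, just a fixed micro-sleep per call."""

    def __init__(self, overhead_s: float = 1e-5):
        self.overhead_s = overhead_s   # simulated framework overhead

    def execute(self, query: str, run_query):
        time.sleep(self.overhead_s)    # micro-sleep, then hit the database
        return run_query(query)        # every read and write reaches SQLite
```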


2. TTLCache — Time-Based Expiration

A lightweight caching strategy where each query is hashed and stored with a fixed TTL.

Key properties:

  • No semantic invalidation
  • Cache entries expire after ttl_seconds
  • Good performance for mostly-static data
  • May serve stale reads under high write rates

This provides a baseline traditional cache for comparison with dependency-based schemes.
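A minimal sketch of this scheme, assuming the query text itself is the hash input (the class shape is illustrative, though the ttl_seconds name follows the description above):

```python
import hashlib
import time


class TTLCache:
    """Hash each query and keep its result for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float):
        self.ttl_seconds = ttl_seconds
        self.store = {}  # key -> (expires_at, result)

    def _key(self, query: str) -> str:
        return hashlib.sha256(query.encode()).hexdigest()

    def get(self, query: str, run_query):
        key = self._key(query)
        entry = self.store.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]                     # fresh entry: cache hit
        result = run_query(query)               # miss or expired: recompute
        self.store[key] = (time.monotonic() + self.ttl_seconds, result)
        return result
```

Note there is no invalidation path at all: a write to the underlying table leaves the cached entry in place until its TTL lapses, which is exactly the stale-read risk noted above.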


3. IntelligentCache — Dependency-Aware Invalidation

A more advanced caching strategy that tracks each query's read set (the tables it touches) and invalidates cached results when writes modify overlapping tables.

Mechanism:

  • SQL is parsed to extract read and write tables
  • Cached entries store metadata about which tables they depend on
  • Writes enqueue invalidation tasks
  • A dedicated background thread processes invalidations asynchronously

This model aims to balance freshness, hit rate, and efficiency without relying solely on TTL expiration.
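The mechanism above can be sketched with a table-overlap check and a daemon worker thread; this is a simplified illustration of the described design, not the repository's code:

```python
import queue
import threading


class IntelligentCache:
    """Cache entries remember their read tables; writes enqueue invalidations."""

    def __init__(self):
        self.entries = {}                  # key -> (result, read_tables)
        self.lock = threading.Lock()
        self.invalidations = queue.Queue() # sets of modified tables
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        # Background thread: drop every entry whose read set overlaps a write.
        while True:
            written = self.invalidations.get()
            with self.lock:
                stale = [k for k, (_, reads) in self.entries.items()
                         if reads & written]
                for k in stale:
                    del self.entries[k]
            self.invalidations.task_done()

    def read(self, key, read_tables, run_query):
        with self.lock:
            if key in self.entries:
                return self.entries[key][0]    # hit
        result = run_query()
        with self.lock:
            self.entries[key] = (result, set(read_tables))
        return result

    def write(self, written_tables, run_query):
        result = run_query()
        self.invalidations.put(set(written_tables))  # asynchronous invalidation
        return result
```

Because invalidation happens off the request path, writes return immediately; a brief window of staleness is the trade-off for not blocking on cache cleanup.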


API & Workload Model

The system includes a Twitter-inspired microservice API used to generate realistic workloads.
Endpoints include:

  • GET /readTweet
  • GET /likeTweet
  • GET /reTweet
  • GET /numberLikes
  • GET /numberReTweet
  • GET /getFriends
  • GET /getReplies
  • GET /getHashtagTweets
  • GET /getIfLiked
  • GET /getReTweets
  • GET /befriend
  • POST /updateTweet
  • POST /reply

These endpoints collectively simulate a real-world mix of reads, counters, and user interactions that stress caching systems.

Workload scripts (scripts/*.sh) enable:

  • Warmup phases
  • Read-intensive workloads
  • Mixed workloads
  • Write-heavy tests

Benchmark Data & Logging

The system automatically records:

  • Per-query latency (ms)
  • Query type (read/write)
  • Cache hit or miss
  • Invalidation events
  • Cache size usage
  • Timeline-based statistics

Outputs are stored in JSON and JSONL, enabling easy visualization and analysis.

Example summary metrics:

{
  "total_requests": 23820,
  "final_stats": {
    "hits": 12423,
    "misses": 11417,
    "hit_rate": 0.5210989932885906,
    "total_queries": 23840,
    "latency": {
      "avg_read_ms": 0.2891422510147095,
      "median_read_ms": 0.26702880859375,
      "p95_read_ms": 0.3979206085205078,
      "p99_read_ms": 0.47969818115234375,
      "min_read_ms": 0.1838207244873047,
      "max_read_ms": 3.0846595764160156,
      "avg_write_ms": 6.181085109710693,
      "median_write_ms": 6.107807159423828,
      "avg_cache_hit_ms": 0.261239218711853,
      "avg_cache_miss_ms": 0.3652518272399902
    },
    "invalidations": 6033
  },
  "completed_at": "2025-11-20T16:43:09.097480"
}
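A summary can be loaded with the standard json module and sanity-checked against the raw counts; the inline JSON below just mirrors the example above (in practice you would json.load a results file whose path depends on your run):

```python
import json

summary = json.loads("""
{"final_stats": {"hits": 12423, "misses": 11417, "hit_rate": 0.5210989932885906}}
""")
stats = summary["final_stats"]
derived = stats["hits"] / (stats["hits"] + stats["misses"])
# 12423 / (12423 + 11417) reproduces the reported hit_rate of ~0.5211
```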
