
MemCP — Persistent Main Memory Database with MySQL Compatibility

MemCP is a MySQL-compatible database that keeps your data fully in main memory for maximum speed. Your data is safe — it's persisted to disk automatically and survives restarts — but because there's no disk I/O on the query path, reads and aggregations run 10–100× faster than MySQL.

Use your existing MySQL connector, ORMs, and SQL queries. No migration, no rewrite.

Status: Beta. Core SQL (SELECT, INSERT, UPDATE, DELETE, JOIN, GROUP BY, subqueries, triggers) works well in production. Some advanced SQL edge cases are still being improved — check open issues if a specific query doesn't behave as expected.

Why Switch from MySQL? 💡

10-100x Faster on Aggregations and Reports

  • No disk I/O on the query path — data lives in main memory, always ready
  • Fast for writes and reads — handles inserts and complex statistics in the same database
  • Built-in REST API — query your database directly over HTTP, no middleware needed
  • Sub-millisecond response times even on large tables with GROUP BY and aggregations

Why is it so much faster for statistics? MySQL reads entire rows to compute a SUM() or COUNT(). MemCP stores each column separately, so SELECT region, SUM(revenue) FROM orders GROUP BY region only reads the two columns it actually needs — not every field of every row.
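The access-pattern difference is easy to sketch in Python (illustrative only; MemCP's actual storage engine is written in Go). A row store must touch every field of every row to aggregate one column, while a column store scans a single contiguous array:

```python
# Illustrative sketch of row-store vs column-store aggregation.
# Not MemCP code, just the access-pattern difference it exploits.

# Row store: one record per row; SUM(revenue) walks entire rows.
rows = [
    {"id": 1, "region": "EU", "revenue": 100, "email": "a@example.com"},
    {"id": 2, "region": "US", "revenue": 250, "email": "b@example.com"},
    {"id": 3, "region": "EU", "revenue": 50,  "email": "c@example.com"},
]
row_sum = sum(r["revenue"] for r in rows)  # reads every field of every row

# Column store: each column is its own array; SUM(revenue) reads one array.
columns = {
    "id":      [1, 2, 3],
    "region":  ["EU", "US", "EU"],
    "revenue": [100, 250, 50],
    "email":   ["a@example.com", "b@example.com", "c@example.com"],
}
col_sum = sum(columns["revenue"])  # id, region, email are never touched

assert row_sum == col_sum == 400
```

The result is identical; only the amount of data scanned differs, and that gap grows with row width and row count.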

🔌 Drop-in MySQL Compatibility

-- Your existing MySQL queries work immediately
CREATE TABLE users (id INT, name VARCHAR(100), email VARCHAR(255));
INSERT INTO users VALUES (1, 'Alice', 'alice@example.com');
SELECT * FROM users WHERE id = 1;

🌐 Built-in REST API Server

# Start MemCP with REST API
./memcp --api-port=4321 lib/main.scm

# Query via HTTP instantly
curl -X POST http://localhost:4321/sql/mydb \
  -d "SELECT * FROM users" \
  -H "Authorization: Basic cm9vdDphZG1pbg=="
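The Authorization header above is standard HTTP Basic auth: the base64 encoding of user:password. You can verify (or regenerate) the token for the default root/admin credentials in two lines of Python:

```python
import base64

# HTTP Basic auth token = base64("user:password")
token = base64.b64encode(b"root:admin").decode()
print(token)  # cm9vdDphZG1pbg== (matches the curl header above)
```

With curl you can also pass `-u root:admin` and let curl build the header for you, as the later examples in this README do.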

API Endpoints:

  • /sql/<database> — MySQL-dialect SQL
  • /psql/<database> — PostgreSQL-dialect SQL
  • /rdf/<database> — SPARQL queries
  • /rdf/<database>/load_ttl — load RDF/Turtle data
  • /dashboard — admin dashboard with live CPU/memory/connection gauges, database browser, shard and compression statistics, and user management

📊 Perfect for Modern Workloads

  • Microservices - Embedded database per service
  • APIs and Web Apps - Ultra-low latency responses
  • Real-time Analytics - Process data as fast as it arrives
  • Development & Testing - Instant setup, no configuration

Architecture & Languages 🏗️

MemCP combines the best of multiple worlds with a carefully chosen tech stack:

Go (Storage Engine & Core)

  • High-performance storage engine built in Go
  • Concurrent request handling with goroutines
  • Memory-efficient data structures
  • Cross-platform compatibility

Scheme (SQL Parser & Compiler)

  • Advanced SQL parser written in Scheme
  • Query optimization and compilation
  • Extensible language for complex transformations
  • Functional programming advantages for parsing

Flexible Scripting Support

  • Command-line argument support for automation
  • Dynamic query generation and processing
  • Easy integration with existing workflows

Key Advantages 🎯

🔥 Ultra-Fast REST APIs

Traditional setup: Client → HTTP Server → Database Connection → Disk I/O
MemCP:             Client → HTTP Server → Main Memory

// Response times you'll see
MySQL (with network + disk):  10-50ms
MemCP (main memory):          0.1-1ms  // 50x faster!

⚡ Docker

docker pull carli2/memcp
docker run -it -p 4321:4321 -p 3307:3307 carli2/memcp

🧠 Persistent and Safe

  • Data is written to disk — restarts and crashes don't lose your data
  • S3, MinIO, and Ceph backends for cloud and distributed deployments
  • Automatic compression reduces storage footprint significantly vs. MySQL
  • Configurable data directory — point it at any local or remote path

Memory Management 🧩

MemCP is designed to run alongside other services on the same machine without blowing up your RAM.

Automatic compression — MemCP stores each column in the most compact format that fits the data: small integers get bit-packed, repeated strings become dictionary-encoded, sequential IDs are stored as ranges. A table that takes 10 GB in MySQL often fits in 1–3 GB in MemCP.
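Dictionary encoding, one of the techniques mentioned above, is simple to sketch (illustrative Python, not MemCP's Go implementation): each distinct string is stored once, and every cell becomes a small integer index into that dictionary.

```python
# Dictionary-encode a column of repeated strings (illustrative sketch).
def dict_encode(column):
    dictionary = []   # unique values, in first-seen order
    index = {}        # value -> position in dictionary
    codes = []        # one small int per row instead of one string
    for value in column:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes

region = ["EU", "US", "EU", "EU", "APAC", "US", "EU"]
dictionary, codes = dict_encode(region)

assert dictionary == ["EU", "US", "APAC"]
assert codes == [0, 1, 0, 0, 2, 1, 0]
# 7 strings collapse to 3 strings + 7 small integers; with only 3
# distinct values, each code fits in 2 bits once bit-packed.
assert [dictionary[c] for c in codes] == region  # lossless round-trip
```

The encoding is lossless, and low-cardinality columns (regions, statuses, categories) shrink dramatically, which is where the 10 GB → 1–3 GB figure comes from.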

Configurable memory budget — by default MemCP uses at most 50% of your server's RAM. You can set an exact limit via the dashboard or the settings API:

# Limit to 4 GB total
curl -u root:admin -X POST http://localhost:4321/dashboard/api/settings \
  -d '{"key":"MaxRamBytes","value":4294967296}'

# Or as a percentage of total RAM (default: 50)
curl -u root:admin -X POST http://localhost:4321/dashboard/api/settings \
  -d '{"key":"MaxRamPercent","value":40}'
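For reference, MaxRamBytes is specified in bytes; the value 4294967296 in the example above is exactly 4 GiB:

```python
# MaxRamBytes is in bytes: 4 GiB = 4 * 1024^3 = 4294967296
GIB = 1024 ** 3
assert 4 * GIB == 4294967296
```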

Automatic eviction — when MemCP approaches its memory limit, it automatically unloads the least recently used data from RAM. That data stays safe on disk and is transparently reloaded the next time a query needs it. Frequently accessed hot data stays in memory; cold data steps aside.
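The eviction policy described above (least recently used) can be sketched with Python's OrderedDict. This is a toy stand-in for MemCP's internal mechanism, which operates on column shards rather than key/value pairs:

```python
from collections import OrderedDict

class LRUCache:
    """Toy LRU cache: evicts the least recently used entry when full.
    In MemCP, evicted data still lives on disk and is reloaded on demand."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None             # cache miss: would reload from disk
        self.data.move_to_end(key)  # mark as recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # drop the coldest entry

cache = LRUCache(capacity=2)
cache.put("orders.revenue", "shard-a")
cache.put("users.email", "shard-b")
cache.get("orders.revenue")             # touched: now the hottest entry
cache.put("products.price", "shard-c")  # over budget: evicts users.email

assert cache.get("users.email") is None          # cold data was unloaded
assert cache.get("orders.revenue") == "shard-a"  # hot data stayed in RAM
```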

System-wide pressure awareness — if the whole server runs low on free RAM (below 10%), MemCP detects this and proactively releases its own cache — even if its own budget isn't exhausted yet. This keeps your application, web server, and OS responsive regardless of load spikes.

Separate budget for persistent data — a second budget (default: 30% of RAM) controls how much space the on-disk data loaded into RAM may occupy, independently of temporary query working memory. Both limits are tunable at runtime without restart.

🔧 Developer-Friendly

  • Comprehensive test suite with 2470+ SQL tests across 100+ test suites
  • YAML-based testing framework
  • Extensive error handling and validation
  • Built-in performance monitoring

Quick Start 🚀

# 1. Build MemCP
go mod download
make

# 2. Start with REST API
./memcp --api-port=4321 --mysql-port=3307 lib/main.scm

# Run as a background daemon (use --no-repl to avoid exiting when stdin closes)
./memcp --no-repl --api-port=4321 --mysql-port=3307 lib/main.scm &

# 3. Create your first database
curl -X POST http://localhost:4321/sql/system \
  -d "CREATE DATABASE myapp" \
  -u root:admin

# 4. Start building lightning-fast apps!
curl -X POST http://localhost:4321/sql/myapp \
  -d "CREATE TABLE products (id INT, name VARCHAR(100), price DECIMAL(10,2))" \
  -u root:admin

CLI Flags

Flag                      Default          Description
--api-port=PORT           4321             HTTP API listen port
--mysql-port=PORT         3307             MySQL protocol listen port
--mysql-socket=PATH       /tmp/memcp.sock  MySQL Unix socket path
--root-password=PASSWORD  admin            Initial root password (first run only)
--disable-api             off              Disable HTTP API server
--disable-mysql           off              Disable MySQL protocol server
--no-repl                 off              Disable interactive REPL (required for daemon/background use)
-data DIR                 ./data           Data directory

Authentication

Security note: Never expose MemCP directly to the internet with default credentials. Always set a strong --root-password before any network-accessible deployment.

  • Default credentials: root / admin.
  • Set the initial root password via CLI: --root-password=supersecret at the first run (on a fresh -data folder), or via Docker env ROOT_PASSWORD.
  • Docker Compose example:
services:
  memcp:
    image: carli2/memcp:latest
    environment:
      - ROOT_PASSWORD=supersecret
      - PARAMS=--api-port=4321
    ports:
      - "4321:4321"  # HTTP API
      - "3307:3307"  # MySQL protocol
    volumes:
      - memcp_data:/data
volumes:
  memcp_data: {}
  • Change the credentials with:
curl -X POST http://localhost:4321/sql/system \
  -d "ALTER USER root IDENTIFIED BY 'supersecret'" \
  -u root:admin

Performance Comparison 📈

Query                                                         MySQL (SSD)  MemCP
SELECT * FROM users WHERE id = 1                              1–5ms        0.1ms
SELECT region, SUM(revenue) FROM orders GROUP BY region       200–800ms    2–10ms
SELECT COUNT(*), AVG(price) FROM products WHERE category = ?  50–200ms     0.5ms
INSERT INTO events VALUES (...)                               10–30ms      0.2ms
REST API call (HTTP → query → JSON response)                  20–100ms     1–10ms

Measured on standard dev hardware. Aggregation queries show the largest speedup because MemCP only reads the columns a query actually uses.

Use Cases 💼

  • 📊 Dashboards and Reports - GROUP BY queries that take seconds in MySQL run in milliseconds
  • 📡 Realtime Monitoring - Aggregate metrics and counters over millions of rows without slowing down
  • 🛒 E-commerce - Product catalog queries, price calculations, and inventory stats at any scale
  • 🎮 Gaming Backends - Leaderboards, player statistics, and session data with sub-millisecond latency
  • 💰 Financial Applications - Aggregations, risk calculations, and transaction summaries in real time
  • 🧪 Development & Testing - Instant database setup, no configuration, throw it away when done

Contributing 🤝

We'd love your help making MemCP even better!

🌟 Why Contribute?

  • Work with cutting-edge database technology
  • Learn Go, Scheme, and database internals
  • Help shape an early-stage project where your contributions have real impact
  • Build ultra-high-performance systems

🛠️ Easy Ways to Contribute

  • 📝 Add test cases - Expand our comprehensive test suite
  • 🐛 Fix bugs - Help us squash issues and improve stability
  • ⚡ Performance optimization - Make fast even faster
  • 📚 Documentation - Help other developers get started
  • 🔧 New features - SQL functions, operators, and capabilities

🚀 Getting Started

# 1. Fork the repository
# 2. Clone your fork (the URL below is the upstream repo; substitute your own)
git clone https://github.com/launix-de/memcp.git

# 3. Set up development environment
cd memcp
go build -o memcp

# 4. Run the test suite (starts its own server automatically)
python3 run_sql_tests.py tests/01_basic_sql.yaml

# 5. Make your changes and add tests
# 6. Submit a pull request!

🎯 Current Contribution Opportunities

  • Vector database features - Advanced similarity search
  • Additional SQL functions - String, math, and date functions
  • Performance benchmarking - Automated performance testing
  • Driver development - Language-specific database drivers
  • Documentation examples - Real-world usage scenarios

Testing 🧪

MemCP includes a comprehensive test framework:

# Run all tests
make test

# Or if you want to contribute, deploy this as a Pre-commit hook:
cp git-pre-commit .git/hooks/pre-commit

# Run specific test suites (starts its own server automatically)
python3 run_sql_tests.py tests/01_basic_sql.yaml      # Basic operations
python3 run_sql_tests.py tests/02_functions.yaml      # SQL functions
python3 run_sql_tests.py tests/07_error_cases.yaml    # Error handling

# Connect to an already-running instance (skip startup)
python3 run_sql_tests.py tests/01_basic_sql.yaml 4321 --connect-only

Performance Testing 📊

MemCP includes an auto-calibrating performance test framework that adapts to your machine.

Running Performance Tests

# Run perf tests (uses calibrated baselines)
PERF_TEST=1 make test

# Calibrate for your machine (run ~10 times to reach target time range)
PERF_TEST=1 PERF_CALIBRATE=1 make test

# Freeze row counts for bisecting performance regressions
PERF_TEST=1 PERF_NORECALIBRATE=1 make test

# Show query plans for each test
PERF_TEST=1 PERF_EXPLAIN=1 make test

How Calibration Works

  1. Initial run starts with 10,000 rows per test
  2. Each calibration run scales row counts by 30% up/down
  3. Target is 10-20 seconds query time per test
  4. Baselines are stored in .perf_baseline.json
  5. After ~10 runs, row counts stabilize in the target range
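Under the simplifying assumption that query time scales linearly with row count, the calibration loop above can be simulated. The per-row cost here is a hypothetical value; on real hardware it varies per test, which is why several runs are needed before row counts settle:

```python
# Simulate the calibration loop (assumes time grows linearly with rows).
US_PER_ROW = 500                    # hypothetical cost: 0.5 ms per row
TARGET = (10_000_000, 20_000_000)   # target window: 10-20 s, in microseconds

rows = 10_000                       # initial row count per test
runs = 0
while True:
    t = rows * US_PER_ROW           # estimated query time for this row count
    if TARGET[0] <= t <= TARGET[1]:
        break                       # row count has stabilized in the window
    # scale by 30% up or down, as in the calibration procedure above
    rows = int(rows * (1.3 if t < TARGET[0] else 1 / 1.3))
    runs += 1

print(runs, rows)
```

Because each step only moves the row count by 30%, convergence takes several runs; a test whose initial estimate is far outside the window needs correspondingly more calibration passes.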

Output Format

✅ Perf: COUNT (7.9ms / 8700ms, 20,000 rows, 0.39µs/row, 11.4MB heap)
         │       │        │           │        │           └─ Heap memory after insert
         │       │        │           │        └─ Time per row
         │       │        │           └─ Calibrated row count
         │       │        └─ Threshold (from baseline × 1.1)
         │       └─ Actual query time
         └─ Test name

Performance Debugging Cookbook

Detecting a performance regression:

# 1. Freeze baselines to use consistent row counts
PERF_TEST=1 PERF_NORECALIBRATE=1 make test

# 2. If a test fails threshold, you have a regression

Bisecting a performance bug:

# 1. Checkout the known-good commit, run calibration
git checkout good-commit
PERF_TEST=1 PERF_CALIBRATE=1 make test  # run 10x to calibrate

# 2. Save the baseline
cp .perf_baseline.json .perf_baseline_good.json

# 3. Bisect with frozen row counts
git bisect start
git bisect bad HEAD
git bisect good good-commit
git bisect run bash -c 'PERF_TEST=1 PERF_NORECALIBRATE=1 make test'

Analyzing slow queries:

# Show query plans to understand execution
PERF_TEST=1 PERF_EXPLAIN=1 make test

Environment Variables

Variable            Values  Description
PERF_TEST           0/1     Enable performance tests
PERF_CALIBRATE      0/1     Update baselines with new times
PERF_NORECALIBRATE  0/1     Freeze row counts (for bisecting)
PERF_EXPLAIN        0/1     Show query plans

Remote Storage Backends 🗄️

MemCP supports storing databases on remote storage backends instead of the local filesystem. To configure a remote backend, create a JSON configuration file in the data folder instead of a directory.

S3 / MinIO Storage

Store your database on Amazon S3 or any S3-compatible storage (MinIO, Ceph RGW, etc.).

Configuration file (data/mydb.json):

{
  "backend": "s3",
  "access_key_id": "your-access-key",
  "secret_access_key": "your-secret-key",
  "region": "us-east-1",
  "bucket": "memcp-data",
  "prefix": "databases"
}

For MinIO or self-hosted S3-compatible storage:

{
  "backend": "s3",
  "access_key_id": "minioadmin",
  "secret_access_key": "minioadmin",
  "endpoint": "http://localhost:9000",
  "bucket": "memcp",
  "prefix": "data",
  "force_path_style": true
}

Quick MinIO setup for testing:

# Start MinIO with Docker
docker run -d --name minio \
  -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER=minioadmin \
  -e MINIO_ROOT_PASSWORD=minioadmin \
  minio/minio server /data --console-address ":9001"

# Create a bucket (via MinIO Console at http://localhost:9001)
# Or via mc CLI:
mc alias set local http://localhost:9000 minioadmin minioadmin
mc mb local/memcp

Ceph/RADOS Storage

Store your database directly on Ceph RADOS for high-performance distributed storage.

Why is Ceph optional? The Ceph backend uses CGO to link against librados (the Ceph client library). This requires the C headers and library to be installed at compile time and the shared library at runtime. To keep the default build simple and portable, Ceph support is behind a build tag.

# Install Ceph development libraries (Ubuntu/Debian)
sudo apt-get install librados-dev

# Build MemCP with Ceph support
make ceph
# or: go build -tags=ceph

Configuration file (data/mydb.json):

{
  "backend": "ceph",
  "username": "client.memcp",
  "cluster": "ceph",
  "pool": "memcp",
  "prefix": "databases"
}

Optional fields:

  • conf_file: Path to ceph.conf (defaults to /etc/ceph/ceph.conf)

Setting up a Ceph development cluster with vstart.sh:

# Clone Ceph source
git clone https://github.com/ceph/ceph.git
cd ceph

# Install dependencies and build (only vstart target needed)
./install-deps.sh
pip install cython setuptools
./do_cmake.sh
cd build && ninja vstart

# Start a development cluster
cd ..
MON=1 OSD=3 MDS=0 MGR=1 ./build/bin/vstart.sh -d -n -x

# Create a pool for MemCP
./build/bin/ceph osd pool create memcp 32

# Create a user for MemCP (optional, can also use client.admin)
./build/bin/ceph auth get-or-create client.memcp \
  mon 'allow r' \
  osd 'allow rwx pool=memcp' \
  -o ceph.client.memcp.keyring

Environment variables for vstart cluster:

export CEPH_CONF=/path/to/ceph/build/ceph.conf
export CEPH_KEYRING=/path/to/ceph/build/keyring

Backend Configuration Reference

Field              Backend  Description
backend            all      Backend type: "s3" or "ceph"
prefix             all      Object key prefix for database objects
access_key_id      S3       AWS or S3-compatible access key
secret_access_key  S3       AWS or S3-compatible secret key
region             S3       AWS region (e.g., "us-east-1")
endpoint           S3       Custom endpoint URL (for MinIO, etc.)
bucket             S3       S3 bucket name
force_path_style   S3       Use path-style URLs (required for MinIO)
username           Ceph     Ceph user (e.g., "client.admin")
cluster            Ceph     Cluster name (usually "ceph")
conf_file          Ceph     Path to ceph.conf (optional)
pool               Ceph     RADOS pool name
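A quick sanity check for a backend config file can be built from the reference table above. This is a hypothetical helper, not part of MemCP, and the set of strictly required fields is inferred from the configuration examples in this README:

```python
import json

# Assumed required fields per backend, inferred from the examples above
# (region/endpoint are optional for S3-compatible stores like MinIO).
REQUIRED = {
    "s3":   {"backend", "access_key_id", "secret_access_key", "bucket"},
    "ceph": {"backend", "pool"},
}

def check_backend_config(text):
    """Return a list of problems found in a data/<db>.json config."""
    cfg = json.loads(text)
    backend = cfg.get("backend")
    if backend not in REQUIRED:
        return [f"unknown backend: {backend!r}"]
    missing = REQUIRED[backend] - cfg.keys()
    return [f"missing field: {f}" for f in sorted(missing)]

good = '{"backend": "s3", "access_key_id": "k", "secret_access_key": "s", "bucket": "memcp"}'
bad  = '{"backend": "s3", "bucket": "memcp"}'
assert check_backend_config(good) == []
assert check_backend_config(bad) == ["missing field: access_key_id",
                                     "missing field: secret_access_key"]
```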

License 📄

MemCP is open source software. See the LICENSE file for details.


Ready to experience database performance like never before? Get Started · Contribute · Join our Community

MemCP: Because your applications deserve better than "good enough" performance.

About

A columnar In-Memory Database as Drop-In Replacement for MySQL supporting 10x performance in OLAP workloads and similar performance in OLTP
