MongoDB re-imagined for the lakehouse era.
npm install mongolake

MongoLake is a MongoDB-compatible database that stores data as Parquet files. It combines the developer experience of MongoDB with the analytical power of the lakehouse.
Core Features:
- MongoDB API - Familiar CRUD operations (insertOne, find, update, delete)
- Local-first development - File-based storage with a .mongolake/ directory
- Lakehouse-native - Data stored as queryable Parquet files
- Cloudflare Workers - Deploy globally on Workers + R2
- Variant encoding - Automatic handling of flexible document schemas
- Multiple storage backends - Local filesystem, Cloudflare R2, S3-compatible storage
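Conceptually, variant encoding promotes frequently occurring fields to typed Parquet columns and keeps everything else as an encoded catch-all payload. A minimal sketch of that idea (the shred helper and JSON payload are illustrative, not MongoLake's actual API or on-disk format):

```typescript
// Illustrative only: split a document into promoted columns and a
// catch-all "variant" payload for everything else.
type Shredded = {
  columns: Record<string, unknown>; // fields stored as native Parquet columns
  variant: string;                  // remaining fields, JSON-encoded here for simplicity
};

function shred(doc: Record<string, unknown>, promoted: string[]): Shredded {
  const columns: Record<string, unknown> = {};
  const rest: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(doc)) {
    (promoted.includes(key) ? columns : rest)[key] = value;
  }
  return { columns, variant: JSON.stringify(rest) };
}

const { columns, variant } = shred(
  { _id: 'u1', email: 'a@example.com', nickname: 'al' },
  ['_id', 'email'],
);
// columns → { _id: 'u1', email: 'a@example.com' }
// variant → '{"nickname":"al"}'
```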
New in this release:
- Wire Protocol server (mongosh, Compass compatibility)
- CLI dev command for local development
- B-tree indexing with automatic _id index
- Enhanced aggregation pipeline ($unwind, $lookup)
- Change streams with real-time watch() support
- Iceberg metadata generation
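The new $unwind and $lookup stages follow MongoDB semantics. As a rough illustration of what $unwind does (a toy sketch, not MongoLake's internals), it emits one output document per element of the named array field:

```typescript
// Toy $unwind: one output document per element of the named array field.
// Documents where the field is missing or empty are dropped, matching
// MongoDB's default behavior.
function unwind(docs: Record<string, unknown>[], field: string): Record<string, unknown>[] {
  const out: Record<string, unknown>[] = [];
  for (const doc of docs) {
    const value = doc[field];
    if (!Array.isArray(value)) continue;
    for (const element of value) out.push({ ...doc, [field]: element });
  }
  return out;
}

const docs = [{ _id: 1, tags: ['a', 'b'] }, { _id: 2, tags: [] }];
unwind(docs, 'tags');
// → [{ _id: 1, tags: 'a' }, { _id: 1, tags: 'b' }]
```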
Start a local development server with the CLI:
npx mongolake dev

This starts a server with:
- MongoDB wire protocol on port 27017 (use with mongosh)
- REST API on port 3000
Or use the programmatic API:
import { db } from 'mongolake';
// Uses local .mongolake/ folder by default
const users = db('myapp').collection('users');
await users.insertOne({
  name: 'Alice',
  email: 'alice@example.com'
});
const user = await users.findOne({ email: 'alice@example.com' });

Use the MongoDB API you know:
// All the familiar operations
await users.insertMany([...]);
await users.find({ status: 'active' }).sort({ createdAt: -1 }).limit(10);
await users.updateOne({ _id }, { $set: { status: 'inactive' } });
await users.aggregate([
  { $match: { status: 'active' } },
  { $group: { _id: '$department', count: { $sum: 1 } } }
]);

Your data is stored as Parquet files:
.mongolake/
  myapp/
    users.parquet       # Collection data
    users_001.parquet   # Delta (new writes)
    orders.parquet
    _iceberg/           # Optional Iceberg metadata
Query with DuckDB:
SELECT * FROM '.mongolake/myapp/users*.parquet'
WHERE status = 'active'

Configure how documents are stored:
const lake = new MongoLake({
  database: 'myapp',
  schema: {
    users: {
      // Promote fields to native Parquet columns
      columns: {
        _id: 'string',
        email: 'string',
        createdAt: 'timestamp',
        profile: { name: 'string', avatar: 'string' },
        tags: ['string'],
      },
      // Auto-promote fields present in >90% of documents
      autoPromote: { threshold: 0.9 },
    },
  },
});

Branching and time travel features are planned but not yet implemented. They will enable:
// Create a branch (not yet implemented)
await db('myapp').branch('feature-x');
// Query at a point in time (not yet implemented)
const historical = db('myapp', { asOf: '2024-01-15T00:00:00Z' });
const oldData = await historical.collection('users').find({});

Connect with mongosh or MongoDB Compass:
# Start the server
npx mongolake dev
# In another terminal
mongosh mongodb://localhost:27017/myapp

// In mongosh
db.users.insertOne({ name: 'Alice', email: 'alice@example.com' })
db.users.find({ name: 'Alice' })
db.users.aggregate([{ $match: { status: 'active' } }])

Architecture:

┌─────────────────────────────────────────────────────────────────┐
│ Clients │
├──────────┬──────────┬──────────┬──────────┬─────────────────────┤
│ mongosh │ Compass │ Drivers │ SDK │ DuckDB / Spark │
└────┬─────┴────┬─────┴────┬─────┴────┬─────┴──────────┬──────────┘
│ │ │ │ │
└────────┬─┴──────────┘ │ │
│ │ │
Wire Protocol (TCP) RPC (WS) Direct Parquet
│ │ │
▼ ▼ │
┌─────────────────┐ ┌──────────────────┐ │
│ mongolake CLI │ │ Cloudflare Worker│ │
│ (Bun binary) │ │ │ │
└────────┬────────┘ └────────┬─────────┘ │
│ │ │
└──────────┬───────────┘ │
│ │
┌────────▼─────────┐ │
│ Durable Object │ │
│ (WAL + Buffer) │ │
└────────┬─────────┘ │
│ │
┌────────▼─────────┐ │
│ R2 │◄────────────────────┘
│ Parquet + Iceberg │
└──────────────────┘
# Cloudflare (for managed service)
R2_DATA_CATALOG_TOKEN=xxx # Enables Iceberg mode
# Local development
MONGOLAKE_PATH=.mongolake # Local storage path

const lake = new MongoLake({
  // Storage backend
  local: '.mongolake', // Local filesystem
  // OR
  bucket: env.R2_BUCKET, // Cloudflare R2
  // OR
  endpoint: 'https://s3.amazonaws.com', // S3-compatible
  accessKeyId: '...',
  secretAccessKey: '...',

  // Database
  database: 'myapp',

  // Iceberg integration (auto-enabled if R2_DATA_CATALOG_TOKEN is set)
  iceberg: {
    token: env.R2_DATA_CATALOG_TOKEN,
    catalog: 'my-catalog',
  },

  // Schema configuration
  schema: { /* ... */ },
});

Implemented:
- Client API - Full CRUD operations (insertOne, insertMany, findOne, find, updateOne, updateMany, deleteOne, deleteMany)
- Parquet I/O - Read/write with hyparquet
- Variant encoding - Schema-less fields as Parquet variant
- Storage backends - Local, R2, S3 (optional), Memory
- Worker & Durable Object - Cloudflare deployment
- Wire Protocol Server - TCP server compatible with mongosh
- CLI dev command - Local development server with REST API
- B-tree Indexing - Automatic _id index, range queries
- Aggregation Pipeline - $match, $group, $project, $sort, $limit, $skip, $unwind, $lookup
- Change Streams - Real-time watch() support
- Iceberg Metadata - Manifest generation in Avro format
- Filter operators - Most comparison/logical operators ($eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $exists, $regex, $not)
- Update operators - $set, $unset, $inc (basic)
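As a quick illustration of how those filter operators evaluate, here is a toy matcher covering a few of them (illustrative only; this matches function is not a MongoLake API, and the real implementation supports the full operator set):

```typescript
// Toy evaluator for a handful of MongoDB-style comparison operators.
type Ops = { $eq?: unknown; $gt?: number; $in?: unknown[]; $exists?: boolean };

function matches(value: unknown, ops: Ops): boolean {
  if ('$eq' in ops && value !== ops.$eq) return false;
  if (ops.$gt !== undefined && !(typeof value === 'number' && value > ops.$gt)) return false;
  if (ops.$in && !ops.$in.includes(value)) return false;
  if (ops.$exists !== undefined && (value !== undefined) !== ops.$exists) return false;
  return true;
}

matches(5, { $gt: 3 });            // → true
matches('a', { $in: ['b', 'c'] }); // → false
```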
Planned:
- Branching & merging - Git-like database versioning
- Time travel - Point-in-time queries
- Full Iceberg catalog integration
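When time travel lands, an asOf query will presumably resolve to the newest snapshot committed at or before the requested timestamp, as in Iceberg-style tables. This is a guess at the mechanics, not the shipped design; the Snapshot shape and snapshotAsOf helper are hypothetical:

```typescript
// Conceptual only: pick the newest snapshot whose commit time is <= asOf.
type Snapshot = { committedAt: string; files: string[] };

function snapshotAsOf(snapshots: Snapshot[], asOf: string): Snapshot | undefined {
  return snapshots
    .filter((s) => s.committedAt <= asOf) // ISO-8601 strings compare lexicographically
    .sort((a, b) => a.committedAt.localeCompare(b.committedAt))
    .pop();                               // newest qualifying snapshot, if any
}

const snaps = [
  { committedAt: '2024-01-10T00:00:00Z', files: ['users.parquet'] },
  { committedAt: '2024-01-20T00:00:00Z', files: ['users.parquet', 'users_001.parquet'] },
];
snapshotAsOf(snaps, '2024-01-15T00:00:00Z')?.files; // → ['users.parquet']
```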
Deploy MongoLake as a Cloudflare Worker with Durable Object persistence:
// worker.ts
import { MongoLakeWorker } from 'mongolake/worker';
import { MongoLakeDO } from 'mongolake/do';
export default MongoLakeWorker;
export { MongoLakeDO };

# wrangler.toml
name = "my-mongolake"
[[r2_buckets]]
binding = "BUCKET"
bucket_name = "my-data"
[[durable_objects.bindings]]
name = "SHARD"
class_name = "MongoLakeDO"

The Worker provides an RPC interface for MongoLake operations, while Durable Objects handle Write-Ahead Logging (WAL) and buffering.
License: MIT