Skip to content

IgorAmi52/NoSQL-Engine

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

133 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

NoSQL Engine

Table of Contents


Introduction

NoSQL Engine is a high-performance, production-ready NoSQL database engine written in Go. Designed for speed, reliability, and scalability, it implements modern database technologies including LSM trees, advanced caching, and comprehensive data integrity features.

🎯 Perfect For

  • High-throughput applications requiring fast writes and efficient reads
  • Time-series data and append-heavy workloads
  • Caching layers and session storage systems
  • IoT data collection and real-time analytics
  • Microservices requiring embedded database capabilities

πŸ—οΈ Architecture Highlights

Our NoSQL engine draws inspiration from industry leaders like Apache Cassandra, RocksDB, and DynamoDB, implementing:

  • Log-Structured Merge (LSM) Trees for optimal write performance
  • Multi-level caching with LRU eviction policies
  • Advanced indexing with Bloom filters and Merkle trees
  • ACID compliance through Write-Ahead Logging (WAL)
  • Horizontal scalability with user-based data partitioning

Features

πŸš€ Core Features

  • πŸ—ƒοΈ SSTable-Based Storage: Efficient immutable storage with LSM tree architecture
  • πŸ“ Write-Ahead Logging (WAL): Protects data integrity by logging changes before committing them to disk
  • 🧠 Advanced Memtable System: Configurable in-memory storage with multiple concurrent instances
  • ⚑ Data Caching & Block Management: LRU-based block cache for optimal performance
  • πŸ” Bloom Filters: Probabilistic data structures for fast key existence checks
  • 🌳 Merkle Trees: Data integrity verification and consistency checks
  • πŸ—œοΈ SSTable Compaction: Automated background compaction with configurable thresholds

🎯 Query & Data Access Features

  • πŸ”§ Multi-User Support: User-based data isolation and access control
  • πŸ”„ Prefix Iteration: Efficient prefix-based key scanning and iteration
  • πŸ“„ Range Queries: Support for key range scanning operations
  • πŸ—‘οΈ Tombstone Deletion: Proper deletion handling with tombstone markers
  • βš–οΈ Rate Limiting: Token bucket algorithm for request throttling

πŸ› οΈ Advanced Features

  • πŸ“Š Real-time Statistics: Engine performance metrics and monitoring
  • πŸŽ›οΈ Configurable Architecture: Extensive configuration options for all components
  • πŸ”§ CLI Interface: Beautiful command-line interface with interactive operations
  • πŸ§ͺ Comprehensive Testing: Full integration and unit test suite
  • πŸ“ˆ LSM Tree Levels: Multi-level storage optimization for read/write performance

Installation

To get started with this NoSQL engine, follow the steps below to install and set it up on your system.

Prerequisites

  • Go 1.19+ (Golang) Make sure Go is installed on your system. You can check your Go version by running:

    go version

    If Go is not installed, download it from the official Go website.

  • Git for cloning the repository

  • Minimum 4GB RAM for optimal performance

  • SSD storage recommended for best I/O performance

πŸ“¦ Steps to Install

  1. Clone the repository

    git clone https://github.com/IgorAmi52/NoSQL-Engine.git
  2. Navigate to the project directory

    cd NoSQL-Engine
  3. Install dependencies

    go mod tidy
  4. Build the engine

    go build -o nosql-engine ./cmd
  5. Run the CLI

    ./nosql-engine

🐳 Docker Support (Optional)

# Build Docker image
docker build -t nosql-engine .

# Run in container
docker run -it --rm nosql-engine

Usage

πŸš€ Getting Started

The NoSQL Engine provides a beautiful command-line interface and programmatic API for database operations.

πŸ“‹ Complete Usage Guide

For detailed usage instructions, examples, and advanced operations, please see our comprehensive usage guide:

πŸ‘‰ CLI_USAGE.md

This guide covers:

  • Interactive CLI commands and examples
  • Programmatic API usage
  • Integration testing procedures
  • Performance optimization tips
  • Troubleshooting and best practices

πŸƒ Quick Start

# Build and run the CLI
go build -o nosql-engine ./cmd
./nosql-engine

# Or run directly
go run ./cmd

Architecture Overview

The NoSQL Engine implements a Log-Structured Merge (LSM) Tree architecture designed for high-throughput write operations and efficient reads. The system separates write and read paths to optimize performance for different access patterns.

Write Path ✍️:

Optimized for fast data ingestion and durability guarantees.

write path

1. Write-Ahead Log (WAL)

  • When a user sends a PUT or DELETE request, it is first logged in the Write-Ahead Log (WAL)
  • WAL ensures durability by persisting operations before applying them to in-memory structures
  • Implements segmented logging with fixed-size segments containing a configurable number of records
  • Each WAL record includes CRC for data integrity verification
  • WAL segments cannot be deleted until data is permanently persisted in SSTables

2. Memtable

  • After WAL confirms the write, data is added to the Memtable - a strictly in-memory structure
  • Implemented as a hash map with configurable maximum size (specified by number of elements)
  • When the predefined Memtable size is reached, values are sorted by key and a new SSTable is created on disk
  • During system startup, Memtable is populated with records from WAL for crash recovery

3. SSTable Creation & Compaction

  • Sorted data from Memtable is written to disk as immutable SSTables
  • After SSTable creation, the system checks if compaction conditions are met
  • Size-tiered compaction algorithm is triggered when thresholds are exceeded
  • Compactions on one level can trigger compactions on subsequent levels in the LSM tree

4. Block Manager

  • Manages all disk I/O operations using fixed-size blocks (4KB, 8KB, or 16KB)
  • All file access must go through the Block Manager layer
  • Supports block-level reading and writing with configurable block sizes
  • Integrates with Block Cache for optimized performance

Read Path πŸ“–:

Multi-level search strategy optimized for fast data retrieval.

read path

Read Operation Flow:

1. Memtable Check

  • When a user sends a GET request, first check if the record exists in the Memtable
  • If found, return the result immediately (fastest path)

2. Cache Layer Check

  • If not in Memtable, check the Cache structure (LRU-based block cache)
  • Block cache consists of a doubly linked list storing block data and a hash map for constant-time access
  • If found in cache, return the result

3. SSTable Traversal

  • Check SSTables one by one, starting from the most recent
  • For each SSTable, load its Bloom Filter into memory and query for key presence
  • If Bloom Filter indicates the key is definitely not present, skip to the next SSTable
  • If the key might be present, check additional structures in the current SSTable

4. LSM Tree Level Traversal

  • SSTable candidates are determined based on the selected compaction algorithm
  • After unsuccessfully searching all SSTable candidates on one LSM tree level, move to the next level
  • Process repeats until the key is found or the last level is reached

5. SSTable Internal Search

  • Summary Structure: Check if the key falls within Summary ranges (loaded in memory)
  • If within range, find the position in the Index structure to access
  • Index Structure: Find the position in the Data structure from which to read the record
  • Data Structure: Read the actual value and return the response to the user

Data Storage Components

The NoSQL Engine implements a sophisticated storage layer with multiple components working together to ensure data durability, integrity, and performance.

πŸ—‚οΈ Write-Ahead Log (WAL)

WAL provides ACID compliance and crash recovery capabilities:

wal structure

Key Features:

  • Segmented logging: Each segment contains a fixed number of records (user-configurable)
  • Data integrity: Every WAL record includes CRC fields for corruption detection
  • Sequential access: WAL records are read from disk one by one, not loaded entirely into memory
  • Durability guarantee: WAL segments cannot be deleted until data is persisted in SSTables
  • Crash recovery: On system startup, Memtable is reconstructed from WAL records

WAL Record Structure:

  • Timestamp, Key Size, Value Size, Key, Value, CRC, and operation type fields
  • Fixed-length segments with configurable block-based sizing
  • Supports record fragmentation when necessary, with padding applied where needed

🧠 Memtable

In-memory structure optimized for fast writes and reads:

  • Implementation: Hash map-based structure for O(1) access time
  • Configurable size: Maximum number of elements specified by user
  • Write operations: All PUT/DELETE operations first go to Memtable after WAL
  • Crash recovery: Automatically populated from WAL segments during startup
  • Flush trigger: When maximum size reached, data is sorted and written to SSTable

πŸ“Š SSTable (Sorted String Table)

Immutable disk-based storage with multiple specialized components:

index

SSTable Components:

1. Data Structure

  • Stores actual key-value pairs in sorted order
  • Structure can be identical to WAL records or optimized format
  • Accessed block by block (cannot load entire structure into memory)
  • Supports tombstone markers for deleted keys

2. Filter (Bloom Filter)

  • Loaded into memory during read operations
  • Probabilistic data structure for all keys in the Data structure
  • Eliminates unnecessary disk seeks for non-existent keys
  • Configurable false positive rate

3. Index Structure

  • Maps every key to its corresponding offset in Data structure
  • Contains key and offset pairs for efficient lookups
  • Accessed block by block to manage memory usage
  • Critical for translating key searches to exact data locations

4. Summary Structure

  • Sparse index for the Index structure (loaded into memory)
  • Contains boundaries: minimum and maximum key values
  • Configurable sparsity level (e.g., every 5th Index entry)
  • Enables quick range determination before Index access

5. Metadata (Merkle Tree)

  • Data integrity verification for all values in Data structure
  • User can initiate validation operations to detect corruption
  • System identifies if and where modifications occurred in data structure
  • Essential for distributed system consistency checks

πŸ”§ LSM Tree Organization & Compaction

Multi-level storage optimization for balanced read/write performance:

  • LSM Tree Levels: User-configurable maximum number of levels
  • Size-tiered Compaction: When compaction conditions are met, algorithm merges SSTables
  • Level Triggering: Compactions on one level can cascade to subsequent levels
  • Background Process: Compaction runs automatically based on configurable thresholds
  • Performance Optimization: Reduces read amplification by merging overlapping key ranges

![index](/assets/lsm tree.png)

πŸ“ Storage Configuration Options

Flexible storage formats to suit different use cases:

  • Single-file SSTable: All components stored in one file
  • Multi-file SSTable: Each component (Data, Index, Summary, Filter, Metadata) in separate files
  • Backward Compatibility: Configuration changes don't affect reading existing SSTables
  • Block-based Access: All large structures accessed via configurable block sizes (4KB, 8KB, 16KB)

πŸ” SSTable Read Operation Details

Memory vs Disk Components During Reads:

🧠 In-Memory Components (loaded during read operations):

  • Summary: Sparse key-to-offset mapping for quick Index positioning
  • Metadata: SSTable configuration and management information
  • Bloom Filter: Probabilistic key existence checking to avoid unnecessary disk access
  • Merkle Tree: Data integrity verification and consistency validation

πŸ’½ On-Disk Components (accessed on-demand):

  • Index: Complete key-to-data-offset mapping (accessed via Summary positioning)
  • Data: Actual key-value pairs (accessed via exact Index offsets)

⚑ Optimized Access Process:

  1. Filter Check: Bloom filter quickly determines if key might exist
  2. Summary Consultation: If filter indicates possible match, Summary provides Index position
  3. Index Lookup: Exact data offset retrieved from Index structure
  4. Data Retrieval: Direct access to required data without full file scanning

This multi-layered approach minimizes disk I/O operations and significantly enhances read performance through strategic caching and probabilistic filtering.


Configuration

πŸ”§ Engine Configuration

The NoSQL engine is highly configurable through the src/config/config.json file. Key configuration options include:

Performance Settings

  • Block Size: Configurable block size for optimal I/O performance
  • Memtable Size & Count: Control memory usage and flush frequency
  • WAL Buffer Size: Write-ahead log buffer configuration
  • Compaction Threshold: Automatic SSTable compaction triggers

LSM Tree Configuration

  • LSM Levels: Number of storage levels for optimal read/write balance
  • Compaction Strategy: Background compaction settings

Filter & Index Settings

  • Bloom Filter: False positive rate and expected element count
  • Skip List Levels: In-memory index structure optimization
  • Prefix Scan: Min/max prefix length for efficient scanning

Rate Limiting

  • Token Bucket: Request throttling with configurable refill rates
  • Max Tokens: Burst capacity for handling traffic spikes

Storage Configuration

  • Tombstone Marker: Configurable deletion marker
  • WAL Segment Size: Write-ahead log segment management

Example configuration structure:

{
  "BLOCK_SIZE": 4096,
  "MEMTABLE_SIZE": 1000,
  "LSM_LEVELS": 3,
  "COMPACTION_THRESHOLD": 4,
  "BLOOM_FILTER_FALSE_POSITIVE_RATE": 0.01,
  "TOKEN_REFILL_RATE": 0.1,
  "MAX_TOKEN": 1000
}

For complete configuration options, see the src/config/config.json file.


License

This project is licensed under the MIT License. You are free to use, modify, and distribute this software under the terms of the MIT License.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages