281 changes: 281 additions & 0 deletions DATABASE_IMPLEMENTATION.md
@@ -0,0 +1,281 @@
# Database Implementation Summary

## Overview

This document summarizes the implementation of database functionality for TorBot to save search results to a persistent SQLite database.

## Files Created

### 1. `src/torbot/modules/database.py`

**Purpose**: Core database module for handling SQLite operations

**Key Features**:

- `SearchResultsDatabase` class that manages SQLite database connections
- Automatic database initialization with two tables:
- `searches`: Stores metadata about each crawl operation
- `links`: Stores detailed information about discovered links
- Methods:
- `save_search_results()`: Persist search results to the database
- `get_search_history()`: Retrieve search records with optional filtering
- `get_search_by_id()`: Get detailed information about a specific search
- `close()`: Properly close database connections
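
A minimal usage sketch of the class (the import path is inferred from the file layout; exact method signatures, parameter names, and return values are assumptions):

```python
# Hypothetical usage sketch of SearchResultsDatabase. Method names come from
# the list above; signatures and return shapes are assumptions.
from torbot.modules.database import SearchResultsDatabase

db = SearchResultsDatabase()  # opens (and creates, if needed) the SQLite file
try:
    # Persist the results of one crawl.
    search_id = db.save_search_results(
        root_url="http://example.onion",
        depth=2,
        links=[{
            "url": "http://example.onion/page",
            "title": "Example Page",
            "status_code": 200,
            "classification": "forum",
            "accuracy": 0.91,
            "emails": ["admin@example.onion"],
            "phone_numbers": [],
        }],
    )

    # Retrieve previous searches, optionally filtered by URL.
    history = db.get_search_history(url_filter="example.onion")
    details = db.get_search_by_id(search_id)
finally:
    db.close()  # always release the connection
```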

**Stored Metadata**:

- Root URL
- Search timestamp (ISO format)
- Crawl depth
- Total links found
- For each link:
- URL and page title
- HTTP status code
- Content classification and accuracy
- Email addresses found
- Phone numbers found

### 2. `src/torbot/modules/db_query.py`

**Purpose**: Utility functions for querying and displaying database results

**Key Functions**:

- `display_search_history()`: Show search history with optional filtering
- `display_search_details()`: Display comprehensive information about a specific search
- `export_search_to_json()`: Export search results to JSON file
- `get_statistics()`: Show database statistics
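
From Python, the utilities might be called like this (function names come from the list above; parameter names are illustrative assumptions):

```python
# Hypothetical calls into db_query; parameter names are assumptions.
from torbot.modules import db_query

db_query.display_search_history()                  # print all recorded searches
db_query.display_search_details(search_id=1)       # full detail for one search
db_query.export_search_to_json(1, "results.json")  # dump a search to JSON
db_query.get_statistics()                          # aggregate database stats
```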

### 3. `scripts/query_database.py`

**Purpose**: Command-line interface for database operations

**Supported Commands**:

- `history`: View search history
- `details`: View specific search details
- `export`: Export search to JSON
- `stats`: Show database statistics

## Files Modified

### 1. `main.py`

**Changes**:

- Updated `--save` argument choices to include "database"
- Added handler in `run()` function to call `tree.saveDatabase()` when `--save database` is specified

### 2. `src/torbot/modules/linktree.py`

**Changes**:

- Added import for `SearchResultsDatabase`
- Implemented `saveDatabase()` method that:
- Extracts all nodes from the tree
- Formats link data with complete metadata
- Saves to database
- Displays save confirmation and statistics
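
A sketch of what the method body could look like (tree and node attribute names such as `node.children` and `node.status` are assumptions for illustration, not the actual implementation):

```python
# Hypothetical sketch of LinkTree.saveDatabase(); tree/node attribute names
# are assumed for illustration.
def saveDatabase(self):
    from torbot.modules.database import SearchResultsDatabase

    # Walk the tree iteratively and flatten every node into a plain dict.
    links, stack = [], [self.root]
    while stack:
        node = stack.pop()
        links.append({
            "url": node.url,
            "title": node.title,
            "status_code": node.status,
            "classification": node.classification,
            "accuracy": node.accuracy,
            "emails": node.emails,
            "phone_numbers": node.phone_numbers,
        })
        stack.extend(node.children)

    db = SearchResultsDatabase()
    try:
        search_id = db.save_search_results(self.root.url, self.depth, links)
        print(f"Saved {len(links)} links to the database (search id {search_id})")
    finally:
        db.close()
```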

### 3. `README.md`

**Changes**:

- Updated Features section to indicate database support is complete
- Updated Options section to include "database" as a `--save` option
- Updated Curated Features checklist to mark database feature as complete

## Files Created (Documentation)

### 1. `docs/DATABASE.md`

Comprehensive documentation including:

- Overview of database features
- Usage instructions
- Database schema documentation
- Python API examples
- Use cases
- Troubleshooting guide
- Complete workflow examples

### 2. `DATABASE_IMPLEMENTATION.md`

This file, which summarizes the implementation.

## Database Features

### Core Functionality

1. **Persistent Storage**: SQLite database stores all search results
2. **Timestamp Tracking**: Records exact time of each search
3. **Rich Metadata**: Captures HTTP status, classifications, contact info
4. **Search History**: Query and filter previous searches
5. **Export Capability**: Convert database records to JSON

### Data Captured

Each search record contains:

- Root URL
- Search timestamp (ISO 8601 format)
- Crawl depth
- Total links count
- For each discovered link:
- URL and page title
- HTTP status code
- Content classification (from NLP module)
- Classification confidence score
- Email addresses found
- Phone numbers found

### Query Capabilities

- View search history with optional URL filtering
- Retrieve detailed information about specific searches
- Export searches to JSON format
- View aggregate statistics
- Filter results by date range (via history)

## Usage Examples

### Save search results to database

```bash
python main.py -u http://example.onion --depth 2 --save database
```

### View search history

```bash
python scripts/query_database.py history
python scripts/query_database.py history -u http://example.onion -v
```

### View detailed search results

```bash
python scripts/query_database.py details 1 --all
```

### Export results

```bash
python scripts/query_database.py export 1 -o search_results.json
```

### View statistics

```bash
python scripts/query_database.py stats
```

## Technical Details

### Database Location

- Default: `<project_root>/torbot_search_results.db`
- SQLite format (no external database server required)
- Single file makes backup/migration easy

### Database Schema

#### Searches Table

- `id`: Auto-incrementing primary key
- `root_url`: The URL that was crawled
- `search_timestamp`: ISO 8601 timestamp
- `depth`: Crawl depth setting
- `total_links`: Count of discovered links
- `links_data`: JSON-serialized link details
- `created_at`: Record creation timestamp

#### Links Table

- `id`: Auto-incrementing primary key
- `search_id`: Foreign key to `searches`
- `url`: Full URL of discovered link
- `title`: Page title or hostname
- `status_code`: HTTP status code
- `classification`: Content category
- `accuracy`: Classification confidence (0.0-1.0)
- `emails`: JSON array of emails
- `phone_numbers`: JSON array of phone numbers
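
Expressed as DDL, the schema above could be created like this (column types and constraints are assumptions inferred from the descriptions; the real module may differ):

```python
import sqlite3

# Hypothetical DDL matching the column lists above; exact types and
# constraints in the real module may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS searches (
    id               INTEGER PRIMARY KEY AUTOINCREMENT,
    root_url         TEXT NOT NULL,
    search_timestamp TEXT NOT NULL,           -- ISO 8601
    depth            INTEGER NOT NULL,
    total_links      INTEGER NOT NULL,
    links_data       TEXT,                    -- JSON-serialized link details
    created_at       TEXT DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE IF NOT EXISTS links (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,
    search_id      INTEGER NOT NULL REFERENCES searches(id),
    url            TEXT NOT NULL,
    title          TEXT,
    status_code    INTEGER,
    classification TEXT,
    accuracy       REAL,                      -- confidence, 0.0-1.0
    emails         TEXT,                      -- JSON array
    phone_numbers  TEXT                       -- JSON array
);
"""

conn = sqlite3.connect("torbot_search_results.db")
conn.executescript(SCHEMA)
conn.close()
```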

### Dependencies

- Python 3.7+
- sqlite3 (built into Python)
- tabulate (already in project dependencies)

## Benefits

1. **Persistence**: Search results survive between program runs
2. **Historical Analysis**: Track how sites and links change over time
3. **Searchability**: Query results by URL, date, or other criteria
4. **Export**: Convert data to JSON for other tools
5. **Accountability**: Maintain audit trail of all searches
6. **No External Services**: SQLite requires no server setup

## Future Enhancements

Potential improvements for future versions:

- PostgreSQL support for multi-user environments
- Encryption for sensitive data
- Automated database cleanup/archival
- Web-based query interface
- Advanced filtering and search capabilities
- Full-text search on page titles and URLs
- Duplicate detection across searches
- Relationship visualization

## Testing Notes

The implementation:

- Uses built-in sqlite3 module (no external dependencies needed)
- Automatically creates database and tables on first run
- Handles errors gracefully with logging
- Properly closes database connections
- Validates all input data
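
In practice, "handles errors gracefully" and "properly closes database connections" usually come down to a pattern like this (a simplified sketch under assumed table columns, not the module's actual code):

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)

def record_link(db_path: str, search_id: int, url: str, title: str) -> None:
    """Illustrative pattern: log failures, always release the connection."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "INSERT INTO links (search_id, url, title) VALUES (?, ?, ?)",
            (search_id, url, title),
        )
        conn.commit()
    except sqlite3.Error:
        # Log and continue rather than crash the crawl.
        logger.exception("Failed to save link %s", url)
        conn.rollback()
    finally:
        conn.close()
```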

## Integration Points

The database functionality integrates with:

1. **LinkTree class**: Extracts link data and calls `saveDatabase()`
2. **Main execution flow**: Triggered via `--save database` flag
3. **Database utility**: Separate tools for querying and analysis

## File Tree

```
TorBot/
├── main.py (MODIFIED)
├── docs/
│   └── DATABASE.md (NEW)
├── src/torbot/modules/
│   ├── linktree.py (MODIFIED)
│   ├── database.py (NEW)
│   └── db_query.py (NEW)
├── scripts/
│   └── query_database.py (NEW)
└── DATABASE_IMPLEMENTATION.md (NEW)
```

## Deployment Notes

1. No additional Python packages needed (sqlite3 is built-in)
2. Database file is created automatically on first use
3. No configuration required - works out of the box
4. Backward compatible with existing code
5. Safe to enable alongside existing save formats

## Support and Documentation

Users can refer to:

- `docs/DATABASE.md` for comprehensive feature documentation
- `scripts/query_database.py --help` for command-line options
- Inline code documentation in modules for Python API usage
26 changes: 16 additions & 10 deletions README.md
@@ -19,8 +19,8 @@
<img src="https://user-images.githubusercontent.com/4481429/94646312-9dafcd80-030b-11eb-9824-73aa2414cf58.png" width="160" title="OpenSource">
</p>


### Status/Social links

[![Pytest](https://github.com/DedSecInside/TorBot/actions/workflows/pytest.yml/badge.svg)](https://github.com/DedSecInside/TorBot/actions/workflows/pytest.yml)
[![Flake8](https://github.com/DedSecInside/TorBot/actions/workflows/flake8.yml/badge.svg)](https://github.com/DedSecInside/TorBot/actions/workflows/flake8.yml)
[![Slack Invite](https://img.shields.io/badge/Join-Slack-informational?style=flat-square&logo=slack)](https://join.slack.com/t/dedsec-inside/shared_invite/zt-i4jcgj8w-1ecHiSdRubwuppAzz8~yfg)
@@ -29,9 +29,10 @@
[![](https://img.shields.io/badge/Made%20with-Python-red.svg?style=flat-square)]()

### Features

1. Onion Crawler (.onion)
2. Returns the page title (or the host name if no title is available) and the address, with a short description of the site.
3. Save links to a database (Not done)
3. Save links to a database with full metadata and search history
4. Output the HTML from a site or save it to an HTML file. (Not done)
5. Save the link tree as a JSON file.
6. Crawl custom domains
@@ -42,6 +43,7 @@
...(will be updated)

### Dependencies

- Tor (Optional)
- Python ^3.9
- Poetry (Optional)
@@ -55,7 +57,9 @@
### TorBot

#### Using `venv`
* If using Python ^3.4,

- If using Python ^3.4:

```sh
python -m venv torbot_venv
source torbot_venv/bin/activate
@@ -65,6 +69,7 @@ pip install -e .
```

#### Using `docker`

```sh
docker build -t {image_name} .

@@ -76,6 +81,7 @@ docker run --network="host" {image_name} poetry run python torbot -u https://exa
```

### Options

<pre>
usage: Gather and analyze data from Tor sites.

@@ -89,28 +95,28 @@ optional arguments:
--version Show the current version of TorBot.
--update Update TorBot to the latest stable version
-q, --quiet Prevents display of header and IP address
--save FORMAT Save results in a file. (tree, JSON)
--save FORMAT Save results in a file or database. (tree, JSON, database)
--visualize FORMAT Visualizes tree of data gathered. (tree, JSON, table)
-i, --info Info displays basic info of the scanned site
--disable-socks5 Executes HTTP requests without using SOCKS5 proxy</pre>

* NOTE: -u is a mandatory for crawling
- NOTE: -u is mandatory for crawling

Read more about torrc here: [Torrc](https://github.com/DedSecInside/TorBoT/blob/master/Tor.md)

## Curated Features

- [x] Visualization Module Revamp
- [x] Implement BFS Search for webcrawler
- [x] Improve stability (Handle errors gracefully, expand test coverage, etc.)
- [x] Increase test coverage
- [ ] Save the most recent search results to a database
- [x] Save the most recent search results to a database
- [ ] Randomize Tor Connection (Random Header and Identity)
- [ ] Keyword/Phrase Search
- [ ] Social Media Integration
- [ ] Increase anonymity
- [ ] Screenshot capture


## Contribution Guidelines

### Found an issue?
@@ -151,11 +157,10 @@ We welcome contributions to this project! Here are a few guidelines to follow:
20. Ani R, Jose J, Wilson M, Deepa OS. “Modified Rotation Forest Ensemble Classifier for Medical Diagnosis in Decision Support Systems”, In Progress in Advanced Computing and Intelligent Engineering 2018 (pp. 137-146). Springer, Singapore.
21. Ani R, Augustine A, Akhil N.C. and Deepa O.S., 2016. “Random Forest Ensemble Classifier to Predict the Coronary Heart Disease Using Risk Factors”, In Proceedings of the International Conference on Soft Computing Systems (pp. 701-710). Springer, New Delhi.


## Maintainers

- [X] [PS Narayanan](https://github.com/PSNAppz) - Co-owner
- [X] [KingAkeem](https://github.com/KingAkeem) - Co-owner
- [x] [PS Narayanan](https://github.com/PSNAppz) - Co-owner
- [x] [KingAkeem](https://github.com/KingAkeem) - Co-owner

## All Thanks to Our Contributors

@@ -164,4 +169,5 @@ We welcome contributions to this project! Here are a few guidelines to follow:
</a>

## License

[GNU Public License](https://github.com/DedSecInside/TorBot/blob/dev/LICENSE.md)