A powerful, fast, and accurate system for extracting logging templates from Java codebases. The tool statically analyzes Java source files and turns each logging call into a template in which variable content is replaced by <*> placeholders, enabling efficient log analysis and debugging workflows.
- Advanced Pattern Recognition: Supports SLF4J, String.format, concatenation, StringBuilder, and method call patterns
- Inter-Procedural Analysis: Traces method calls to extract meaningful patterns from complex logging scenarios
- Branch-Aware Extraction: Handles conditional log message construction with configurable variant limits
- Parallel Processing: Multi-worker file processing optimized for large codebases
- Smart Output Management: Automatic timestamped output files organized in dedicated folders
- Comprehensive Matching: Trie-based matcher for efficient runtime log-to-template matching (Under Development)
- Robust Testing: Comprehensive test suite with integration tests
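The following minimal Python sketch shows how a logging call's message could become a template for three of the listed styles: SLF4J `{}` anchors, `String.format` specifiers, and string concatenation. The helper names (`slf4j_to_template`, `format_to_template`, `concat_to_template`) and the `("lit", ...)` / `("expr", ...)` part encoding are hypothetical and exist only for illustration. The extractor itself works on a tree-sitter AST rather than on raw strings, and this sketch does not handle escaped SLF4J anchors.

```python
import re
from typing import List, Tuple

PLACEHOLDER = "<*>"

# Simplified java.util.Formatter specifier: optional argument index, flags,
# width, precision, conversion character (date/time forms like %tY not covered).
FORMAT_SPEC = re.compile(r"%(?:\d+\$)?[-#+ 0,(<]*\d*(?:\.\d+)?[a-zA-Z%]")


def slf4j_to_template(fmt: str) -> str:
    """Replace each SLF4J-style '{}' anchor with a placeholder (escapes not handled)."""
    return fmt.replace("{}", PLACEHOLDER)


def format_to_template(fmt: str) -> str:
    """Replace String.format specifiers with placeholders; %% and %n stay literal."""
    def repl(m):
        spec = m.group(0)
        if spec == "%%":
            return "%"
        if spec == "%n":
            return "\n"
        return PLACEHOLDER
    return FORMAT_SPEC.sub(repl, fmt)


def concat_to_template(parts: List[Tuple[str, str]]) -> str:
    """Keep string literals verbatim; every non-literal operand becomes a placeholder."""
    text = "".join(t if kind == "lit" else PLACEHOLDER for kind, t in parts)
    # Adjacent non-literal operands (a + b) collapse into a single placeholder.
    return re.sub(r"(?:<\*>){2,}", PLACEHOLDER, text)


if __name__ == "__main__":
    print(slf4j_to_template("User {} logged in from {}"))       # User <*> logged in from <*>
    print(format_to_template("Processing file %s (%d bytes)"))  # Processing file <*> (<*> bytes)
    print(concat_to_template([("lit", "Started processing "), ("expr", "filename"),
                              ("lit", " at "), ("expr", "timestamp")]))  # Started processing <*> at <*>
```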
```bash
# Clone the repository
git clone <repository-url>
cd COCA_RCA

# Install dependencies
pip install -r requirements.txt
```

```bash
# Extracts to output_templates/myproject_20250916_143022.jsonl
python extract_templates.py --src /path/to/java/project

# Extract from current directory
python extract_templates.py --src .
```

```bash
# Custom output path with exclusions
python extract_templates.py \
  --src /path/to/kafka \
  --out kafka_templates.jsonl \
  --exclude '*/test/*' '*/examples/*' \
  --workers 8
```

```java
// Simple placeholders
log.info("User {} logged in from {}", username, ipAddress);
logger.error("Failed to process {} records", count);

// Marker-based logging
logger.warn(marker, "Connection timeout for {}", host);
logger.log(Level.INFO, "Processing {} items", itemCount);
```

```java
// Direct format calls
log.debug(String.format("Processing file %s (%d bytes)", filename, size));

// Variable assignment
String message = String.format("Error in %s: %s", component, error);
logger.error(message);

// Concatenated format strings
String msg = String.format("Part 1: %s " + "Part 2: %d", value1, value2);
```

```java
// Simple concatenation
log.info("Started processing " + filename + " at " + timestamp);

// Complex expressions
logger.error("Failed to connect to " + host + ":" + port +
             " after " + attempts + " attempts");
```

```java
// Method calls as log arguments
log.error("Error occurred: {}", exception.getMessage());
logger.debug("Event details: {}", formatEventDetails(event));

// Custom method results
String details = buildErrorMessage(error, context);
log.error(details);
// The system traces into these methods to extract meaningful patterns!
```

```java
// Class constants
private static final String ERROR_MSG = "System failure occurred";
log.error(ERROR_MSG + ": {}", details);

// Method parameters
public void logError(String message, Exception ex) {
    log.error(message, ex); // Extracts as <param:message>
}

// Lambda expressions
events.forEach(event -> log.debug("Processing: {}", event.getId()));
```

Templates are saved as JSONL (JSON Lines) with detailed metadata:
```json
{
  "template_id": "a1b2c3d4e5f6g7h8",
  "pattern": "User <*> logged in from <*>",
  "static_token_count": 4,
  "location": {
    "file_path": "/src/main/java/com/example/UserService.java",
    "class_name": "UserService",
    "method_name": "handleLogin",
    "line_number": 45
  },
  "level": "info",
  "branch_variant": 0
}
```

Extract log templates from Java source code.
```text
Usage: extract_templates.py [OPTIONS]

Options:
  -s, --src DIRECTORY      Source repository root directory [required]
  -o, --out PATH           Output JSONL file (default: auto-generated)
  -i, --include TEXT       Include patterns (default: *.java)
  -e, --exclude TEXT       Exclude patterns (can specify multiple)
  -w, --workers INTEGER    Number of parallel workers (default: auto)
  --max-variants INTEGER   Maximum template variants per logging site
  -v, --verbose            Enable verbose output
  --help                   Show help message
```

Examples:
```bash
# Basic extraction (creates output_templates/myproject_YYYYMMDD_HHMMSS.jsonl)
python extract_templates.py --src /path/to/project

# With exclusions and custom workers
python extract_templates.py --src . --exclude '*/test/*' --workers 4

# Verbose output with custom filename
python extract_templates.py --src . --out my_templates.jsonl -v
```

Match runtime log lines against extracted templates.
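The matcher is still under development, so what follows is only a sketch of how a token-based trie could match runtime lines against extracted patterns. It assumes patterns are split on whitespace and that a standalone `<*>` token matches one or more runtime tokens. Placeholders glued to punctuation (such as `<*>:<*>`) are treated as literals here. `TokenTrie` and its methods are hypothetical and are not the project's `TemplateTrie` API.

```python
from typing import Dict, List, Optional

WILDCARD = "<*>"


class _Node:
    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}
        self.template_id: Optional[str] = None  # set on the node that ends a pattern


class TokenTrie:
    """Hypothetical token trie: one edge per whitespace-separated pattern token."""

    def __init__(self) -> None:
        self.root = _Node()

    def insert(self, template_id: str, pattern: str) -> None:
        node = self.root
        for token in pattern.split():
            node = node.children.setdefault(token, _Node())
        node.template_id = template_id

    def match(self, line: str) -> Optional[str]:
        """Return the id of a template matching the whole line, or None."""
        return self._match(self.root, line.split(), 0)

    def _match(self, node: _Node, tokens: List[str], i: int) -> Optional[str]:
        if i == len(tokens):
            return node.template_id
        # Prefer an exact (static) token over the wildcard.
        child = node.children.get(tokens[i])
        if child is not None:
            found = self._match(child, tokens, i + 1)
            if found is not None:
                return found
        # A wildcard edge may absorb one or more runtime tokens (backtracking).
        wild = node.children.get(WILDCARD)
        if wild is not None:
            for j in range(i + 1, len(tokens) + 1):
                found = self._match(wild, tokens, j)
                if found is not None:
                    return found
        return None


if __name__ == "__main__":
    trie = TokenTrie()
    trie.insert("t1", "User <*> logged in from <*>")
    trie.insert("t2", "Failed to process <*> records")
    print(trie.match("User alice logged in from 10.0.0.1"))  # t1
    print(trie.match("Failed to process 42 records"))        # t2
    print(trie.match("Connection reset by peer"))            # None
```

Trying the exact token before the wildcard keeps static text authoritative. The backtracking over wildcard spans can grow quickly on adversarial input, so a production matcher would likely need to bound or memoize it.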
Run the comprehensive test suite:
```bash
# Run all tests
python run_tests.py

# Run specific test module
python run_tests.py test_templating

# Run with verbose output
python run_tests.py --verbose
```

Test Coverage:
- Template extraction from various Java patterns
- Inter-procedural analysis scenarios
- Trie-based matching algorithms
- Integration tests with real Java code samples
- JavaLogExtractor: Main extraction engine with tree-sitter Java parsing
- LogTemplateBuilder: Template rule engine supporting multiple logging frameworks
- IntraproceduralSlicer: Backward slicing for variable definition tracking
- TemplateTrie: Efficient trie-based matching structure (Under Development)
- Template Rules: Pluggable rules for different logging patterns (SLF4J, String.format, etc.)
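The `IntraproceduralSlicer` entry can be illustrated with a small sketch of backward slicing over straight-line code. Starting from the variable passed to the logging call, the sketch finds that variable's most recent definition above the call. It then recursively resolves the variables used in that definition, keeping string literals and replacing anything unresolved with `<*>`. The `Assign` record and `resolve` function are hypothetical stand-ins for AST nodes. Branches with several reaching definitions are not modelled. Unlike the `<param:message>` output described for method parameters above, undefined names here simply become `<*>`.

```python
from dataclasses import dataclass
from typing import List, Tuple

PLACEHOLDER = "<*>"


@dataclass
class Assign:
    """Hypothetical model of `target = part1 + part2 + ...` at a source line."""
    line: int
    target: str
    parts: List[Tuple[str, str]]  # ("lit", text) for literals, ("var", name) for variables


def resolve(var: str, before_line: int, assigns: List[Assign],
            depth: int = 0, max_depth: int = 10) -> str:
    """Backward-slice `var` from `before_line` and rebuild its string template."""
    defs = [a for a in assigns if a.target == var and a.line < before_line]
    if not defs or depth >= max_depth:
        return PLACEHOLDER  # parameter, field, or otherwise unknown value
    latest = max(defs, key=lambda a: a.line)  # last definition above the use
    pieces = []
    for kind, text in latest.parts:
        if kind == "lit":
            pieces.append(text)
        else:
            # Variables used in the definition are resolved from that definition's line.
            pieces.append(resolve(text, latest.line, assigns, depth + 1, max_depth))
    return "".join(pieces)


if __name__ == "__main__":
    method = [
        Assign(10, "prefix", [("lit", "Error in ")]),
        Assign(11, "msg", [("var", "prefix"), ("var", "component"),
                           ("lit", ": "), ("var", "error")]),
        Assign(13, "msg", [("lit", "Retrying "), ("var", "task")]),
    ]
    print(resolve("msg", 12, method))  # Error in <*>: <*>   (log call on line 12)
    print(resolve("msg", 14, method))  # Retrying <*>        (log call on line 14)
```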
- Template Extraction: Uses tree-sitter to parse Java AST and identify logging calls
- Backward Slicing: Traces variable definitions within method scope to reconstruct log messages
- Inter-Procedural Analysis: Follows method calls to extract patterns from called methods
- Branch-Aware Processing: Handles conditional logging with variant limits
- Trie Matching: Efficient runtime matching using token-based trie structure
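As a rough illustration of the inter-procedural step, the sketch below substitutes a called method's returned string template into the caller's message. It falls back to `<*>` in three cases: methods it has no summary for (for example, library calls such as `getMessage()`), recursion, and call chains deeper than a fixed limit. The per-method summaries and the `("call", name)` encoding are assumptions for this example. The method names come from the Java samples earlier in this README, but the bodies given for them are invented.

```python
from typing import Dict, FrozenSet, List, Tuple

PLACEHOLDER = "<*>"
Part = Tuple[str, str]  # ("lit", text), ("expr", source) or ("call", method_name)


def inline_calls(parts: List[Part], summaries: Dict[str, List[Part]],
                 depth: int = 0, max_depth: int = 3,
                 active: FrozenSet[str] = frozenset()) -> str:
    """Expand calls to summarized methods into their return templates."""
    pieces = []
    for kind, text in parts:
        if kind == "lit":
            pieces.append(text)
        elif (kind == "call" and text in summaries
              and text not in active and depth < max_depth):
            # Trace into the callee; `active` stops direct or mutual recursion.
            pieces.append(inline_calls(summaries[text], summaries, depth + 1,
                                       max_depth, active | {text}))
        else:
            pieces.append(PLACEHOLDER)  # runtime value or unknown callee
    return "".join(pieces)


if __name__ == "__main__":
    # Hypothetical return-value summaries for the README's example methods.
    summaries = {
        "formatEventDetails": [("lit", "id="), ("expr", "event.getId()"),
                               ("lit", " type="), ("expr", "event.getType()")],
        "buildErrorMessage": [("lit", "Error in "), ("expr", "context.getName()"),
                              ("lit", ": "), ("expr", "error")],
    }
    print(inline_calls([("lit", "Event details: "), ("call", "formatEventDetails")],
                       summaries))  # Event details: id=<*> type=<*>
    print(inline_calls([("call", "buildErrorMessage")], summaries))  # Error in <*>: <*>
    print(inline_calls([("lit", "Error occurred: "), ("call", "getMessage")],
                       summaries))  # Error occurred: <*>
```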
- Large Codebases: Tested on Apache Kafka and Apache ZooKeeper
- Parallel Processing: Scales with available CPU cores
- Memory Efficient: Streaming processing for large log files
- Fast Matching: Sub-millisecond template matching for typical log lines
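This minimal sketch shows the kind of file selection and fan-out that the `--include`, `--exclude` and `--workers` options suggest. Glob patterns are matched against full paths with `fnmatch`, and the selected files are mapped over a worker pool whose size defaults to the CPU count. The function names are hypothetical, and the patterns assume `/` path separators. `ThreadPoolExecutor` is used only to keep the example simple. For CPU-bound parsing in CPython, a `ProcessPoolExecutor` would typically be needed for real parallelism, and this README does not say which the tool uses.

```python
import fnmatch
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def select_files(paths: Iterable[str], include: Sequence[str] = ("*.java",),
                 exclude: Sequence[str] = ()) -> List[str]:
    """Keep paths matching any include pattern and no exclude pattern."""
    # fnmatchcase is case-sensitive on every OS; '*' also matches '/' characters.
    return [p for p in paths
            if any(fnmatch.fnmatchcase(p, pat) for pat in include)
            and not any(fnmatch.fnmatchcase(p, pat) for pat in exclude)]


def resolve_workers(workers: Optional[int] = None) -> int:
    """Treat None or a non-positive value as 'auto': use the CPU count."""
    return workers if workers and workers > 0 else (os.cpu_count() or 1)


def process_all(paths: Sequence[str], worker_fn: Callable[[str], T],
                workers: Optional[int] = None) -> List[T]:
    """Apply worker_fn to every path concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=resolve_workers(workers)) as pool:
        return list(pool.map(worker_fn, paths))


if __name__ == "__main__":
    files = select_files(
        ["/repo/src/main/A.java", "/repo/src/test/ATest.java",
         "/repo/examples/Demo.java", "/repo/README.md"],
        exclude=["*/test/*", "*/examples/*"],
    )
    print(files)  # ['/repo/src/main/A.java']
```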
- max_branch_variants: Limit on template variants per logging site (default: 16)
- parallel_workers: Number of parallel workers (default: auto-detect)
- Include/exclude patterns for file filtering
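The sketch below shows what a per-site limit such as `max_branch_variants` bounds. It models a message built from segments, where some segments have alternatives (one per `if`/`else` or ternary branch). It enumerates the combinations with `itertools.product`, stops after `max_variants` of them, and numbers each one like the `branch_variant` field in the JSONL output. The segment representation and function name are assumptions for illustration. The default of 16 mirrors the documented default, and the sketch simply keeps the first combinations in enumeration order.

```python
from itertools import islice, product
from typing import List, Sequence, Tuple


def expand_variants(segments: Sequence[Sequence[str]],
                    max_variants: int = 16) -> List[Tuple[int, str]]:
    """Return (branch_variant, template) pairs, capped at max_variants."""
    # product() is lazy, so islice() stops before the full cross product is built.
    combos = islice(product(*segments), max_variants)
    return [(i, "".join(c)) for i, c in enumerate(combos)]


if __name__ == "__main__":
    # Java: String state = ok ? "opened" : "closed";
    #       log.info("Connection " + state + " for {}", host);
    segments = [["Connection "], ["opened", "closed"], [" for <*>"]]
    print(expand_variants(segments))
    # [(0, 'Connection opened for <*>'), (1, 'Connection closed for <*>')]
```

Five independent two-way branches already yield 32 combinations, so the cap keeps output size predictable.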
The system automatically organizes outputs:
```text
project/
├── output_templates/
│   ├── kafka_20250916_143022.jsonl      # Auto-generated timestamp
│   ├── zookeeper_20250916_150430.jsonl  # Multiple extractions
│   └── myproject_20250916_163015.jsonl
├── matches/
│   ├── server_matches.csv
│   └── application_matches.jsonl
└── logs/
    ├── server.log
    └── application.log
```
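The timestamped file names above (`kafka_20250916_143022.jsonl` and so on) suggest a `<project>_<YYYYMMDD>_<HHMMSS>.jsonl` scheme. The sketch below assumes the project name is the base name of the `--src` directory. It builds a JSONL record with the fields shown in the output example, counting static tokens as whitespace-separated tokens that are not exactly `<*>`. The function names are hypothetical, and `template_id` is omitted because this README does not document how it is derived.

```python
import json
import os
from datetime import datetime
from typing import Iterable, Optional

PLACEHOLDER = "<*>"


def default_output_path(src: str, now: Optional[datetime] = None,
                        out_dir: str = "output_templates") -> str:
    """Assumed naming scheme: <src basename>_<YYYYMMDD_HHMMSS>.jsonl."""
    name = os.path.basename(os.path.abspath(src))  # "." resolves to the cwd name
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return os.path.join(out_dir, f"{name}_{stamp}.jsonl")


def make_record(pattern: str, file_path: str, class_name: str, method_name: str,
                line_number: int, level: str, branch_variant: int = 0) -> dict:
    """Build one output record (template_id omitted: derivation undocumented)."""
    return {
        "pattern": pattern,
        "static_token_count": sum(1 for t in pattern.split() if t != PLACEHOLDER),
        "location": {"file_path": file_path, "class_name": class_name,
                     "method_name": method_name, "line_number": line_number},
        "level": level,
        "branch_variant": branch_variant,
    }


def write_jsonl(records: Iterable[dict], path: str) -> None:
    """Write one compact JSON object per line, creating the folder if needed."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    # On POSIX: output_templates/kafka_20250916_143022.jsonl
    print(default_output_path("/path/to/kafka", datetime(2025, 9, 16, 14, 30, 22)))
    record = make_record("User <*> logged in from <*>",
                         "/src/main/java/com/example/UserService.java",
                         "UserService", "handleLogin", 45, "info")
    print(json.dumps(record))
```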
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with tree-sitter for robust Java parsing
- Inspired by log analysis research in software engineering
- Designed for practical use in large-scale software systems