Add Echo Rule watermark analysis for LLM learning article #9
Conversation
johnzfitch commented Nov 22, 2025
- Add manual_analysis.py script for watermark detection when spaCy model unavailable
- Include sample article text (LLM learning research) for analysis
- Generate detailed JSON analysis output with 46 clause pairs analyzed
- Result: LIKELY_HUMAN verdict with 0.209 final score (below 0.45 threshold)
Pull request overview
This PR adds a manual watermark analysis script for detecting Echo Rule watermarks in text when spaCy models are unavailable. The script analyzes an LLM learning research article and generates a detailed JSON report with 46 clause pairs analyzed, concluding with a "LIKELY_HUMAN" verdict (score: 0.209).
Key Changes:
- Implements phonetic, structural, and semantic echo analysis for watermark detection
- Provides fallback analysis without requiring full spaCy NLP models
- Generates comprehensive JSON output with detailed scoring breakdown
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| scripts/manual_analysis.py | New script implementing Echo Rule watermark detection with phonetic, structural, and semantic analysis using cmudict, Levenshtein, and rule-based POS tagging |
| data/analysis_output.json | Generated JSON output containing analysis results for 46 clause pairs with detailed phonetic, structural, and semantic scores |
| data/analysis_input.txt | Sample input text about LLM learning research used for watermark analysis testing |
```python
sem_score, sem_details = analyze_semantic_echo(pair.zone_a, pair.zone_b)

# Combined score (using Tier 1 weights: 40% phonetic, 30% structural, 30% semantic)
combined = 0.4 * phon_score + 0.3 * struct_score + 0.3 * sem_score
```
Copilot AI commented Nov 22, 2025
The same weights (0.4, 0.3, 0.3) are duplicated here. This creates a maintenance issue where changes to the weighting algorithm must be made in multiple places. Consider extracting these to module-level constants.
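A minimal sketch of the extraction this comment suggests (the constant and function names here are illustrative, not taken from the PR):

```python
# Tier 1 combination weights, defined once at module level
# (names are illustrative; values are the ones used in the PR)
PHONETIC_WEIGHT = 0.4
STRUCTURAL_WEIGHT = 0.3
SEMANTIC_WEIGHT = 0.3

def combine_scores(phon_score: float, struct_score: float, sem_score: float) -> float:
    """Combine the three echo scores using the Tier 1 weights."""
    return (PHONETIC_WEIGHT * phon_score
            + STRUCTURAL_WEIGHT * struct_score
            + SEMANTIC_WEIGHT * sem_score)
```

Both call sites would then reference the same constants, so a change to the weighting scheme happens in exactly one place.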
```python
if report.final_score >= 0.45:
    report.verdict = "HIGH_PROBABILITY_AI"
    report.confidence = min(0.95, 0.5 + report.final_score)
    report.reasoning.append(f"High echo score ({report.final_score:.3f}) suggests Echo Rule watermark presence")
elif report.final_score >= 0.35:
    report.verdict = "MODERATE_PROBABILITY_AI"
    report.confidence = 0.3 + report.final_score
    report.reasoning.append(f"Moderate echo score ({report.final_score:.3f}) - possible watermark presence")
elif report.final_score >= 0.25:
    report.verdict = "LOW_PROBABILITY_AI"
    report.confidence = 0.2 + report.final_score * 0.5
    report.reasoning.append(f"Low echo score ({report.final_score:.3f}) - unlikely watermark presence")
else:
```
Copilot AI commented Nov 22, 2025
The threshold values (0.45, 0.35, 0.25) for verdict determination are hardcoded magic numbers without clear documentation. These critical thresholds should be extracted as named constants (e.g., HIGH_PROBABILITY_THRESHOLD, MODERATE_PROBABILITY_THRESHOLD, LOW_PROBABILITY_THRESHOLD) with documentation explaining their basis in research or testing.
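One possible shape for that refactor (the constant names are illustrative; the threshold values and verdict strings are the ones used in the PR):

```python
# Verdict thresholds for the final echo score
# (names are illustrative; the rationale for the values should be documented here)
HIGH_PROBABILITY_THRESHOLD = 0.45
MODERATE_PROBABILITY_THRESHOLD = 0.35
LOW_PROBABILITY_THRESHOLD = 0.25

def verdict_for(final_score: float) -> str:
    """Map a final echo score to a verdict string."""
    if final_score >= HIGH_PROBABILITY_THRESHOLD:
        return "HIGH_PROBABILITY_AI"
    if final_score >= MODERATE_PROBABILITY_THRESHOLD:
        return "MODERATE_PROBABILITY_AI"
    if final_score >= LOW_PROBABILITY_THRESHOLD:
        return "LOW_PROBABILITY_AI"
    return "LIKELY_HUMAN"
```

With this shape, the 0.209 score reported in this PR falls below every threshold and yields the "LIKELY_HUMAN" verdict described above.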
```python
if report.average_phonetic > 0.4:
    report.reasoning.append(f"Elevated phonetic echoes ({report.average_phonetic:.3f}) - sound patterns at clause boundaries")
if report.average_structural > 0.5:
    report.reasoning.append(f"Strong structural parallelism ({report.average_structural:.3f}) - similar grammatical patterns")
if report.average_semantic > 0.4:
    report.reasoning.append(f"Semantic coherence ({report.average_semantic:.3f}) - related concepts across boundaries")
```
Copilot AI commented Nov 22, 2025
The threshold values (0.4, 0.5, 0.4) for generating specific observations about phonetic, structural, and semantic echoes are magic numbers. Extract these as named constants for clarity and consistency.
```python
def main():
    """Main entry point."""
    # Read input file
    input_file = Path('/home/user/specHO/data/analysis_input.txt')
```
Copilot AI commented Nov 22, 2025
The file paths are hardcoded with absolute paths to /home/user/specHO/. This reduces portability and will fail if the script is run in a different environment or by a different user. Consider using relative paths or making the paths configurable via command-line arguments or environment variables.
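A minimal sketch of the command-line approach the comment suggests, using defaults relative to the working directory (the flag names and default paths here are illustrative, though the `data/` filenames match the PR's file list):

```python
import argparse
from pathlib import Path

def parse_args(argv=None):
    """Resolve input/output paths from CLI flags instead of hardcoded absolute paths."""
    parser = argparse.ArgumentParser(description="Echo Rule watermark analysis")
    parser.add_argument("--input", type=Path,
                        default=Path("data/analysis_input.txt"),
                        help="text file to analyze (relative to the working directory)")
    parser.add_argument("--output", type=Path,
                        default=Path("data/analysis_output.json"),
                        help="where to write the JSON report")
    return parser.parse_args(argv)
```

`main()` would then read `args.input` and write `args.output`, so the script runs unchanged for any user or checkout location.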
```python
    """Check if a word is a content word (not a function word)."""
    return word.lower() not in FUNCTION_WORDS and len(word) > 2
```
Copilot AI commented Nov 22, 2025
The function is_content_word filters words with len(word) > 2, which will exclude valid two-letter content words like "AI" (line 399), "us", "be", etc. While some of these are function words, this threshold may incorrectly filter legitimate short content words. Consider adjusting the logic to handle acronyms and meaningful short words appropriately.
Suggested change:

```python
    """Check if a word is a content word (not a function word).

    - Excludes words in FUNCTION_WORDS.
    - Allows acronyms (all uppercase, length >= 2), e.g. "AI", "US".
    - Requires other words to be longer than 2 characters.
    """
    if word.lower() in FUNCTION_WORDS:
        return False
    if len(word) >= 2 and word.isupper():
        return True  # Acronym (e.g., "AI", "US")
    return len(word) > 2
```
```python
    print_report(report, verbose=True)

    # Save JSON report
    output_file = Path('/home/user/specHO/data/analysis_output.json')
```
Copilot AI commented Nov 22, 2025
The file paths are hardcoded with absolute paths to /home/user/specHO/. This reduces portability and will fail if the script is run in a different environment or by a different user. Consider using relative paths or making the paths configurable via command-line arguments or environment variables.
```python
if len(parts[i].strip()) > 10 and len(parts[i+1].strip()) > 10:
    pairs.append(create_clause_pair(parts[i], parts[i+1], sep))
```
Copilot AI commented Nov 22, 2025
The clause length check (len(parts[i].strip()) > 10) uses a magic number 10. This threshold seems arbitrary and could filter out valid short clauses. Consider making this a named constant with a clear rationale, or adjusting the threshold based on word count rather than character count for more semantic meaning.
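A sketch of the word-count variant the comment proposes (the constant name and its value of 3 are illustrative assumptions, not values from the PR):

```python
# Minimum words a clause must contain before its pair is scored
# (value is an illustrative assumption; tune against real clause data)
MIN_CLAUSE_WORDS = 3

def is_scorable_clause(clause: str) -> bool:
    """Filter clauses by word count rather than raw character length."""
    return len(clause.split()) >= MIN_CLAUSE_WORDS
```

Both clause-splitting paths could then call `is_scorable_clause` instead of repeating the character-length comparison.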
```python
if len(before.strip()) > 10 and len(after.strip()) > 10:
    pairs.append(create_clause_pair(before, after, match.group(1)))
```
Copilot AI commented Nov 22, 2025
The same magic number check (len(before.strip()) > 10 and len(after.strip()) > 10) is repeated. This should be extracted to a named constant for consistency and maintainability.
```python
final_echo = check_final_sounds(phonetics_a, phonetics_b)

# Combine scores (weighted)
combined = 0.4 * avg_sim + 0.3 * initial_echo + 0.3 * final_echo
```
Copilot AI commented Nov 22, 2025
The weights (0.4, 0.3, 0.3) used here to combine the average similarity, initial echo, and final echo scores are hardcoded magic numbers. They should be extracted as named constants for better maintainability and to document the scoring algorithm, especially since the same values double as the Tier 1 combination weights elsewhere in the script.
```python
HAS_LEVENSHTEIN = False

try:
    import numpy as np
```
Copilot AI commented Nov 22, 2025
Import of 'np' is not used; the `import numpy as np` line (and the surrounding try/except, if nothing else depends on it) can be removed.