Add Echo Rule watermark analysis for LLM learning article #8
Conversation
johnzfitch commented on Nov 22, 2025
- Add manual_analysis.py script for watermark detection when spaCy model unavailable
- Include sample article text (LLM learning research) for analysis
- Generate detailed JSON analysis output with 46 clause pairs analyzed
- Result: LIKELY_HUMAN verdict with 0.209 final score (below 0.45 threshold)
Pull request overview
This PR adds a manual Echo Rule watermark analysis capability for detecting AI-generated text when the full spaCy NLP model is unavailable. The implementation analyzes phonetic, structural, and semantic "echoes" at clause boundaries to determine if text exhibits watermark patterns characteristic of LLM-generated content.
- Implements a standalone watermark detection script with fallback dependencies (cmudict, Levenshtein)
- Includes sample analysis of an LLM learning research article with 46 clause pairs analyzed
- Provides detailed JSON output showing LIKELY_HUMAN verdict (0.209 score below 0.45 threshold)
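The fallback dependencies hint at how per-pair echo scores might be computed. The sketch below is illustrative only, not the PR's implementation: `structural_echo` and `phonetic_echo` are hypothetical names, and stdlib `difflib` stands in for the Levenshtein dependency (the cmudict phoneme lookup is omitted entirely).

```python
import difflib

def structural_echo(a: str, b: str) -> float:
    """Hypothetical structural echo: ratio of the two clauses' token counts."""
    la, lb = len(a.split()), len(b.split())
    return min(la, lb) / max(la, lb)

def phonetic_echo(a: str, b: str) -> float:
    """Stand-in for a Levenshtein-style phonetic check: compare the clauses'
    final words using stdlib difflib instead of the Levenshtein package."""
    wa, wb = a.split()[-1].lower(), b.split()[-1].lower()
    return difflib.SequenceMatcher(None, wa, wb).ratio()

pair = ("models memorize surface patterns", "they generalize deeper structures")
print(round(structural_echo(*pair), 3))  # 1.0 (both clauses are four tokens)
```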
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| scripts/manual_analysis.py | Core analysis script implementing phonetic, structural, and semantic echo detection with pattern matching algorithms |
| data/analysis_output.json | Generated analysis results with detailed scoring for 46 clause pairs and final classification verdict |
| data/analysis_input.txt | Sample input text (research article on LLM learning) used for watermark analysis demonstration |
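The final classification step described above reduces to a threshold comparison. A minimal sketch: the 0.45 threshold and the LIKELY_HUMAN label come from the PR description, but the label used for above-threshold scores is an assumption, since only the human verdict appears in this output.

```python
def classify(final_score: float, threshold: float = 0.45) -> str:
    """Map a final echo score to a verdict. The above-threshold label is
    assumed; only LIKELY_HUMAN is attested in the PR's sample output."""
    return "LIKELY_HUMAN" if final_score < threshold else "LIKELY_WATERMARKED"

print(classify(0.209))  # LIKELY_HUMAN, matching the reported verdict
```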
Comments suppressed due to low confidence (1)
scripts/manual_analysis.py:39: Import of 'np' is not used.

```python
import numpy as np
```
```python
input_file = Path('/home/user/specHO/data/analysis_input.txt')

if not input_file.exists():
    print("Error: Input file not found")
    sys.exit(1)

text = input_file.read_text()
print(f"Analyzing text ({len(text):,} characters)...")

# Run analysis
report = analyze_text(text)

# Print report
print_report(report, verbose=True)

# Save JSON report
output_file = Path('/home/user/specHO/data/analysis_output.json')
```
Copilot AI commented on Nov 22, 2025
Hardcoded absolute paths make the script non-portable. Consider using relative paths or command-line arguments to allow users to specify input/output file locations.
Example fix:

```python
import argparse

def main():
    parser = argparse.ArgumentParser(description='Manual Echo Rule Watermark Analysis')
    parser.add_argument('--input', default='data/analysis_input.txt', help='Input text file')
    parser.add_argument('--output', default='data/analysis_output.json', help='Output JSON file')
    args = parser.parse_args()

    input_file = Path(args.input)
    output_file = Path(args.output)
    # ...
```

```python
try:
    import numpy as np
    HAS_NUMPY = True
except ImportError:
    HAS_NUMPY = False
```
Copilot AI commented on Nov 22, 2025
The numpy import is unused throughout the script. The HAS_NUMPY flag is set but never checked or used in any conditional logic. Consider removing this unused import to reduce dependencies.
Suggested change: delete the entire `try`/`except` block shown above.
```python
from pathlib import Path
from typing import List, Dict, Tuple, Optional
from dataclasses import dataclass, field
from collections import Counter
```
Copilot AI commented on Nov 22, 2025
The Counter import from collections is unused in the script. Consider removing this import.
Suggested change: remove the `from collections import Counter` line.
```python
for sentence in sentences:
    # Rule A: Punctuation-linked clauses (semicolon, em-dash, colon)
    for sep in [';', ' – ', ': ', ' — ']:
        if sep in sentence:
            parts = sentence.split(sep)
            for i in range(len(parts) - 1):
                if len(parts[i].strip()) > 10 and len(parts[i+1].strip()) > 10:
                    pairs.append(create_clause_pair(parts[i], parts[i+1], sep))

    # Rule B: Conjunction-linked clauses
    conj_pattern = r',?\s*(but|and|or|yet)\s+'
    matches = list(re.finditer(conj_pattern, sentence, re.IGNORECASE))
    for match in matches:
        before = sentence[:match.start()]
        after = sentence[match.end():]
        if len(before.strip()) > 10 and len(after.strip()) > 10:
            pairs.append(create_clause_pair(before, after, match.group(1)))
```
Copilot AI commented on Nov 22, 2025
The clause splitting logic can create duplicate pairs when a sentence contains multiple separators. For example, if a sentence has both a semicolon and an "and" conjunction, it may generate overlapping clause pairs. Consider deduplicating pairs or tracking already-processed text spans to avoid analyzing the same text multiple times.
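One way to implement the deduplication the comment asks for is to key each pair on its normalized clause text. A minimal sketch, assuming a hypothetical `add_pair` helper in place of the script's direct `pairs.append(...)` calls:

```python
def add_pair(pairs, seen, before, after, link):
    """Record a clause pair only if its normalized text hasn't been seen,
    so overlapping separator/conjunction splits aren't analyzed twice."""
    key = (before.strip().lower(), after.strip().lower())
    if key in seen:
        return
    seen.add(key)
    pairs.append((before.strip(), after.strip(), link))

pairs, seen = [], set()
add_pair(pairs, seen, "models memorize patterns", "they generalize", ";")
add_pair(pairs, seen, "Models memorize patterns ", "they generalize", "and")  # same text, skipped
print(len(pairs))  # 1
```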
```python
with open(output_file, 'w') as f:
    json.dump(report_dict, f, indent=2, default=str)

print(f"\nJSON report saved to: {output_file}")
```
Copilot AI commented on Nov 22, 2025
Missing error handling for file I/O operations. If the output directory doesn't exist or if there are permission issues, the script will crash with an unhandled exception. Consider adding proper error handling:

```python
try:
    output_file.parent.mkdir(parents=True, exist_ok=True)
    with open(output_file, 'w') as f:
        json.dump(report_dict, f, indent=2, default=str)
    print(f"\nJSON report saved to: {output_file}")
except (IOError, PermissionError) as e:
    print(f"Error saving output file: {e}", file=sys.stderr)
    sys.exit(1)
```