Skip to content

Conversation

@github-actions
Copy link
Contributor

Analyzing changes...

Commits:

  • feat(linguist): implement Phase 2 auto-generation infrastructure
  • feat: add Option 2 - Extended Linguist Integration with File Classification
  • feat: align registry with GitHub Linguist as authoritative source

Changed Files:

.github/workflows/docs.yml | 106 ----------------
Cargo.lock | 2 +-
Cargo.toml | 2 +-
LINGUIST_INTEGRATION.md | 261 ++++++++++++++++++++++++++++++++++++++
build.rs | 60 ++++++++-
examples/usage.rs | 7 +-
flake.lock | 17 +++
justfile | 13 ++
renovate.json5 | 44 +++++++
scripts/sync_linguist_patterns.py | 219 ++++++++++++++++++++++++++++++++
src/file_classifier.rs | 242 +++++++++++++++++++++++++++++++++++
src/lib.rs | 4 +
src/metadata.rs | 26 ++--
src/registry.rs | 215 +++++++++++++++++++++++--------
src/utils.rs | 8 +-
tools/linguist_sync.rs | 148 +++++++++++++++++++++
16 files changed, 1192 insertions(+), 182 deletions(-)

Detailed Changes:

mikkihugo and others added 3 commits November 12, 2025 18:41
- Add `supported_in_singularity` flag (defaults to false, explicitly true for our 24 languages)
- Add `language_type` field aligned with Linguist's classification
- Update all 24 language registrations with new fields
- Source of truth: <https://github.com/github-linguist/linguist/blob/main/lib/linguist/languages.yml>

## Governance Model
Language definitions now follow GitHub Linguist's standard:
- Prevents ad-hoc language additions
- Ensures consistency across ecosystem
- Automatic tracking via Renovate (weekly)

## Build Script Enhancement
Updated build.rs with future capability for:
- Automatic Linguist languages.yml synchronization
- Code generation from Linguist definitions
- Auto-update when Linguist adds new languages

## Renovate Configuration
- New rule to track Linguist releases (weekly)
- Labels: linguist, language-registry
- Manual review for language definition changes

This prepares Singularity for scalable language support while
maintaining explicit governance over what's actually supported.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
…cation

## What's New

FileClassifier Module: Detect vendored, generated, and binary files
- Uses patterns from GitHub Linguist (vendor.yml, generated.rb)
- Supports: vendored detection, generated file detection, binary detection
- Methods: is_vendored(), is_generated(), is_binary(), classify(), should_analyze()

Phase 1: Language Definitions - DONE
- Languages synced from Linguist languages.yml
- supported_in_singularity flag for explicit support
- Weekly Renovate alerts

Phase 2: File Classification - READY
- FileClassifier implementation complete
- Ready to auto-generate from Linguist patterns
- Supports: vendor paths, generated extensions, binary formats, documentation markers

Phase 3: Detection Heuristics - PLANNED
- Future: Auto-generate from Linguist heuristics.yml
- Fallback language detection for ambiguous extensions

New Files:
- src/file_classifier.rs: File classification engine
- LINGUIST_INTEGRATION.md: Complete documentation
- Updated build.rs: 3-phase roadmap
- Updated renovate.json5: Enhanced PR instructions

Benefits:
✅ Skip vendored code (node_modules/, vendor/)
✅ Skip generated files (.pb.rs, .generated.ts, etc.)
✅ Skip binary files (images, archives, executables)
✅ Auto-updated with Linguist releases
✅ Reduces false positives in code analysis

Testing: All tests pass, Clippy and fmt clean

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Phase 2 Implementation: Auto-generate File Classification Patterns

New Files Added:

scripts/sync_linguist_patterns.py (200+ lines)
- Downloads vendor.yml from Linguist
- Downloads generated.rb from Linguist
- Parses YAML and Ruby code
- Extracts vendored, generated, and binary file patterns
- Generates Rust code arrays for FileClassifier

tools/linguist_sync.rs (130+ lines)
- Rust implementation roadmap
- Pattern parsing architecture
- Code generation infrastructure

Updated Files:

build.rs: Enhanced documentation
- Added manual synchronization workflow
- Documented automated (future) workflow
- Phase 2 in-progress status
- Maintenance instructions

justfile: New command
- just sync-linguist: Run Python script to sync patterns
- Provides step-by-step next actions
- Integrates into development workflow

LINGUIST_INTEGRATION.md: Detailed Phase 2 documentation
- Status: FileClassifier, Script, Integration, CI
- Manual + Automated sync workflows
- Implementation details
- Usage examples

Workflow:

For Maintainers (When Linguist Updates):
  just sync-linguist
  cargo test
  git add .
  git commit

For Automation (Future):
  cargo xtask sync-linguist

What Gets Synced:
- Vendored paths: node_modules/, vendor/, .yarn/
- Generated files: .pb.rs, .generated.ts, .designer.cs
- Binary formats: images, archives, executables

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants