Skip to content

feat(extractor): Add Hugging Face Transformers config.json extractor#1989

Open
SteppL10n-prog wants to merge 24 commits intogoogle:mainfrom
SteppL10n-prog:feat/huggingface-transformers-extractor
Open

feat(extractor): Add Hugging Face Transformers config.json extractor#1989
SteppL10n-prog wants to merge 24 commits intogoogle:mainfrom
SteppL10n-prog:feat/huggingface-transformers-extractor

Conversation

@SteppL10n-prog
Copy link
Copy Markdown

Summary

This PR adds a new filesystem extractor for Hugging Face Transformers AI models that:

  • Detects config.json and adapter_config.json files in model directories
  • Extracts transformers_version field from JSON config
  • Generates standard PURL: pkg:pypi/transformers@<version>
  • Enables vulnerability matching via OSV.dev, GHSA, and NVD

Why This Matters

  • 500K+ public AI models on Hugging Face Hub
  • Many models depend on transformers library with known CVEs
  • No existing extractor scans AI model configs for vulnerable dependencies
  • This PR fills that gap with minimal, performant code

Category Compliance

This is a software inventory extractor, NOT a secret/vulnerability detector.

Testing

  • 24 test cases covering success, error, and edge scenarios
  • 100% statement coverage
  • Docker-verifiable via Dockerfile.test

Files Added

  • extractor/filesystem/ai/huggingface-transformers/extractor.go
  • extractor/filesystem/ai/huggingface-transformers/extractor_test.go

Related Issues

Signed-off-by: Mahmut Boz mahmutbozl10n@gmail.com

@google-cla
Copy link
Copy Markdown

google-cla Bot commented Apr 19, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

- Detects transformers_version from model config files
- Generates PURL: pkg:pypi/transformers@<version>
- 100% test coverage with comprehensive edge case handling
- Docker-verifiable with test fixtures

Signed-off-by: Mahmut Boz <mahmutbozl10n@gmail.com>
@SteppL10n-prog SteppL10n-prog force-pushed the feat/huggingface-transformers-extractor branch from 95ff23c to 2276d38 Compare April 19, 2026 22:53
- Remove trailing whitespace from string literals in tests
- Convert comments to English per Go style guide
- Ensure JSON test data has no extra spaces
- Run go fmt for consistent formatting
- Remove all Turkish comments
- Remove all trailing whitespace from string literals
- Ensure JSON test data has no extra spaces
- Run go fmt for consistent formatting
- Remove all Turkish comments
- Remove all trailing whitespace from string literals
- Ensure JSON test data has no extra spaces
- Run go fmt for consistent formatting
Return an error when JSON decoding fails in the extractor.
Updated comments to clarify extractor functionality and file requirements.
Fix missing newline at end of file in extractor_test.go
Refactor error handling in JSON decoding to avoid unnecessary variable declaration. Ensure proper return values when decoding fails.
Fix formatting by adding a newline at the end of the file.
Updated comment for clarity on error handling without defining an error variable.
Refactor error handling for JSON decoding in Extractor.
Removed unused import for the plugin package.
Refactor extractor methods for clarity and error handling.
Return nil error on JSON decode failure to prevent crashing on non-HuggingFace config.json files.
Refactor tests for Extractor to handle errors and invalid JSON gracefully.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant