New vinscan #494

Merged
simsong merged 6 commits into simsong:main from kamwoods:new-vinscan
Nov 7, 2025

@simsong simsong commented Nov 6, 2025

No description provided.

New files associated with this scanner are:

- scan_vehicle.flex: A pattern scanner to recognize vehicle-related
                     identifiers. Currently limited to VINs, but
                     could be expanded to include HINs, etc.
                     Patterned after scan_accts.flex

- scan_vin.cpp:      A validator for Vehicle Identification Numbers
                     (VINs). Validates rules for World Manufacturer
                     Identifier, Vehicle Descriptor Section (including
                     check digit) and Vehicle Identifier Section to
                     ensure capture of valid VINs. Structure adapted
                     from scan_ccns2.cpp

- scan_vin.h:        Header for scan_vin.cpp
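
As a rough illustration of the check digit validation described above, here is a minimal sketch of the widely documented weighted mod-11 scheme (position 9 of a 17-character VIN). The function names `vin_char_value` and `vin_check_digit_ok` are hypothetical and are not taken from scan_vin.cpp; the actual validator may be structured differently and applies additional WMI/VDS/VIS rules not shown here.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper (not from scan_vin.cpp): transliterate one VIN
// character to its numeric value, or -1 if the character is not allowed
// in a VIN (I, O, Q, lowercase letters, punctuation).
int vin_char_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    switch (c) {
        case 'A': case 'J': return 1;
        case 'B': case 'K': case 'S': return 2;
        case 'C': case 'L': case 'T': return 3;
        case 'D': case 'M': case 'U': return 4;
        case 'E': case 'N': case 'V': return 5;
        case 'F': case 'W': return 6;
        case 'G': case 'P': case 'X': return 7;
        case 'H': case 'Y': return 8;
        case 'R': case 'Z': return 9;
        default: return -1;
    }
}

// Hypothetical helper: true when the character at position 9 matches the
// weighted mod-11 check digit computed over all 17 positions.
bool vin_check_digit_ok(const std::string& vin) {
    static constexpr std::array<int, 17> kWeights = {
        8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2};
    if (vin.size() != 17) return false;
    int sum = 0;
    for (std::size_t i = 0; i < 17; ++i) {
        const int value = vin_char_value(vin[i]);
        if (value < 0) return false;  // disallowed character
        sum += value * kWeights[i];
    }
    const int remainder = sum % 11;
    // A remainder of 10 is written as 'X'; otherwise it is the digit itself.
    const char expected =
        (remainder == 10) ? 'X' : static_cast<char>('0' + remainder);
    return vin[8] == expected;
}
```

Position 9 carries weight 0, so the check digit does not contribute to its own sum. Note that an all-digit string such as `11111111111111111` passes this arithmetic check on its own, which is one reason a scanner needs further structural rules beyond the check digit.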

Some initial test files are also included, using synthesized valid
and invalid VINs:

- src/tests/test_synthetic_vin.json
- src/tests/test_vin.txt
- src/tests/test_vin_doc.odt

Build and scanner registration files modified:

- src/Makefile.am
- src/bulk_extractor_scanners.h

Individual scanner functionality tests added to test_be1 for the scan_vin
scanner to validate operation, consistent with other test structures.

Included tests:
- scan_vin_validation: general tests for valid and invalid VINs
- scan_vin1:           test with valid VINs labeled in context
- scan_vin2:           test with VINs in open text, no context
- scan_vin3:           invalid VIN tests (disallowed characters, bad check
                       digits, lowercase characters, all-digit VINs)
- scan_vin_json:       scan valid/invalid VINs appearing in JSON file
- scan_vin_year_codes: test year codes
- scan_vin_context:    test that VINs in hex dump context are disregarded

Verified via: make test_be && test_be "[scanners]"
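
To make the kinds of inputs these tests exercise more concrete, the sketch below shows a character-set check (which rejects lowercase letters and the disallowed letters I, O and Q) and a decoder for the position-10 model-year code under the common 30-year coding cycle. Both function names are hypothetical and not taken from the PR; the scanner's own rules (for example, rejecting all-digit VINs) are a policy of the implementation and are not reproduced here.

```cpp
#include <string>

// Hypothetical helper: true when the candidate has exactly 17 characters
// drawn from 0-9 and uppercase A-Z, excluding I, O and Q.
bool vin_charset_ok(const std::string& s) {
    if (s.size() != 17) return false;
    for (char c : s) {
        const bool digit = (c >= '0' && c <= '9');
        const bool upper = (c >= 'A' && c <= 'Z');
        if (!digit && !upper) return false;           // lowercase, punctuation
        if (c == 'I' || c == 'O' || c == 'Q') return false;
    }
    return true;
}

// Hypothetical helper: map a position-10 year code to the earliest model
// year it denotes in the cycle starting at 1980, or -1 if the character is
// not a year code. Codes repeat every 30 years, so 'A' means 1980 and also
// 2010; disambiguation needs information beyond this one character.
int vin_year_base(char code) {
    // I, O, Q, U, Z and 0 are not used as year codes.
    static const std::string kCodes = "ABCDEFGHJKLMNPRSTVWXY123456789";
    const std::string::size_type idx = kCodes.find(code);
    if (idx == std::string::npos) return -1;
    return 1980 + static_cast<int>(idx);
}
```

For instance, `vin_year_base('K')` yields 1989 (also valid for 2019), while `vin_year_base('U')` yields -1.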

Integration tests added to test_be2 for scan_vehicle to validate operation,
consistent with existing test structures.

Included tests:

- test_vin: test with test_vin.txt from the tests/ directory
- test_vin_json: test with test_synthetic_vin.json from the tests/ directory
- test_vin_odt: test with the ODT doc in the tests/ directory

Verified via: make test_be && test_be "[phase1]"
@simsong simsong requested a review from Copilot November 6, 2025 13:06

Copilot AI left a comment

Pull Request Overview

This PR adds a Vehicle Identification Number (VIN) scanner to bulk_extractor for forensic analysis. The implementation validates VINs according to ISO 3779 standards, including check digit verification and character validation.

Key changes:

  • New VIN validation logic with check digit calculation and WMI/VDS/VIS validation
  • Flex scanner for detecting VINs with context-aware filtering to reduce false positives
  • Comprehensive test suite covering valid/invalid VINs, different formats, and edge cases
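
For readers unfamiliar with the WMI/VDS/VIS terminology, the following minimal sketch splits a 17-character VIN into the three sections defined by ISO 3779: positions 1-3 (World Manufacturer Identifier), 4-9 (Vehicle Descriptor Section) and 10-17 (Vehicle Identifier Section). The `VinSections` struct and `split_vin` function are hypothetical names for illustration only and do not reflect the PR's actual code.

```cpp
#include <string>

// Hypothetical container for the three VIN sections.
struct VinSections {
    std::string wmi;  // positions 1-3: World Manufacturer Identifier
    std::string vds;  // positions 4-9: Vehicle Descriptor Section
    std::string vis;  // positions 10-17: Vehicle Identifier Section
};

// Hypothetical helper: split a 17-character VIN into its sections.
// Returns false (leaving `out` untouched) when the length is not 17.
bool split_vin(const std::string& vin, VinSections& out) {
    if (vin.size() != 17) return false;
    out.wmi = vin.substr(0, 3);
    out.vds = vin.substr(3, 6);
    out.vis = vin.substr(9, 8);
    return true;
}
```

Splitting `1M8GDM9AXKP042788` this way gives `1M8`, `GDM9AX` and `KP042788`; in VINs that use a check digit, it is the last character of the VDS.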

Reviewed Changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/scan_vin.cpp | Core VIN validation implementation with check digit and structural validation |
| src/scan_vin.h | Header file defining VIN validation functions |
| src/scan_vehicle.flex | Flex scanner for detecting VINs in data streams with context filtering |
| src/test_be1.cpp | Unit tests for VIN validation and scanner functionality |
| src/test_be2.cpp | Integration tests for VIN scanner with various file formats |
| src/tests/test_vin.txt | Test data with valid and invalid VINs |
| src/tests/test_synthetic_vin.json | Synthetic VIN test data |
| src/tests/test_vin_doc.odt | Binary ODT test file containing VINs |
| src/bulk_extractor_scanners.h | Registered vehicle scanner |
| src/Makefile.am | Build system updates for new scanner |

@simsong simsong self-assigned this Nov 6, 2025

simsong commented Nov 6, 2025

@kamwoods, @copilot's suggestions are all pretty good. Can you look them over and implement them? (Normally it offers to, but not now, apparently.)

simsong commented Nov 6, 2025

@kamwoods — Any idea why codecov didn't run here?

- Error suppression logic in scan_vehicle.flex
- Character check bug in scan_vin.cpp
- Misc comment typos
- Removed an unused pattern from scan_vehicle.flex

@simsong simsong left a comment

looks great.

@simsong simsong marked this pull request as ready for review November 6, 2025 14:39

- Removed dead code blocks from a previous approach in scan_vin.cpp
  and scan_vin.h

kamwoods commented Nov 6, 2025

Cleaned up some dead code blocks after a codecov run. Coverage on scan_accts.flex and scan_vin.cpp improved.

Minor modification of scan_vehicle.flex to stop generating
extraneous vin_manufacturer.txt and vin_year.txt histograms.

Brings the vehicle scan histogram generation behavior in line
with other bulk_extractor scanners, generating a single
vin_histogram.txt on each run.

@simsong simsong merged commit a6be88e into simsong:main Nov 7, 2025
4 checks passed