[b/505303589] Updated parser validation script #710

Open
prasoonbirla-google wants to merge 1 commit into main from updated-parser-validation-script

Conversation

Contributor

@prasoonbirla-google prasoonbirla-google commented Apr 22, 2026

Fix: Improve parser validation output structure and timestamp handling


Description

What problem does this PR solve?
This PR addresses several issues in the parser validation script (run_parser_validations.py) to make test event comparisons more accurate and to improve debugging capabilities:

  1. The script was previously discarding many UDM fields returned by the parser API, selectively picking only metadata and additional, which caused structure mismatches with the expected test_events.json files.
  2. The log_type passed to the validation API was hardcoded to a dummy value, which could affect parser routing or results.
  3. Timestamps were causing false-positive test failures due to differences in microsecond zero-padding (e.g., .198Z vs .198000Z) and due to the diffing tool not ignoring the camelCase eventTimestamp field when the year falls back to the current execution year.
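The zero-padding mismatch in item 3 can be reproduced in a few lines. This is only an illustration with made-up timestamp values and an assumed timestamp layout (`FMT`); it is not taken from the script itself.

```python
from datetime import datetime

# Assumed layout of the timestamps being compared (not confirmed by the PR).
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"

short = "2024-01-15T10:30:00.198Z"      # millisecond precision
padded = "2024-01-15T10:30:00.198000Z"  # same instant, zero-padded to microseconds

# A plain string comparison reports a difference...
strings_match = short == padded

# ...although both strings parse to the same instant (%f accepts 1 to 6 digits).
instants_match = datetime.strptime(short, FMT) == datetime.strptime(padded, FMT)

print(strings_match, instants_match)
```

A diff that compares the raw strings therefore flags a failure even though the parser output is semantically correct.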

How does this PR solve the problem?

  • Preserve Full UDM Structure: Modified the event transformation logic to nest the complete raw event payload returned by the parser under idm.readOnlyUdm (using camelCase as expected by the tests), ensuring fields like principal, target, and observer are correctly included in the validation phase.
  • Dynamic Log Type Resolution: The script now checks for metadata.json in the cbn directory and extracts the actual logType to pass into chronicle_client.run_parser(), gracefully falling back to a default if unavailable.
  • Timestamp Formatting and Filtering Fixes:
    • Updated normalize_timestamp() to cleanly strip trailing zeros from microseconds, matching the canonical expected log format.
    • Updated filter_timestamps() to properly ignore the camelCase eventTimestamp field during the symmetric diff, preventing false failures caused by fallback execution years.
  • JSON Output for Debugging: Replaced the print(validation_results) statement with file I/O that cleanly dumps the API payload into a validation_results.json file in the same directory as the generated markdown report.
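As a rough sketch of the timestamp fixes described above: the function names normalize_timestamp() and filter_timestamps() come from this PR, but the bodies below are hypothetical and not the code in run_parser_validations.py. The regular expression, the handling of an all-zero fraction, and the set of ignored keys are all assumptions.

```python
import re

# Assumed input shape: a UTC timestamp with an optional fractional part.
_TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z$")

# Assumption: only the camelCase key named in the PR is ignored.
IGNORED_KEYS = {"eventTimestamp"}


def normalize_timestamp(value):
    """Strip trailing zeros from fractional seconds; leave other strings as-is."""
    match = _TS.match(value)
    if not match:
        return value
    base, fraction = match.groups()
    fraction = (fraction or "").rstrip("0")
    # Assumption: an all-zero fraction is dropped entirely.
    return f"{base}.{fraction}Z" if fraction else f"{base}Z"


def filter_timestamps(node):
    """Return a copy of node with ignored keys removed at every nesting level."""
    if isinstance(node, dict):
        return {
            key: filter_timestamps(val)
            for key, val in node.items()
            if key not in IGNORED_KEYS
        }
    if isinstance(node, list):
        return [filter_timestamps(item) for item in node]
    return node
```

With this sketch, both `.198Z` and `.198000Z` normalize to the same string, and eventTimestamp is dropped from both sides before the diff is computed.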

Any other relevant information (e.g., design choices, tradeoffs, known issues):
The timestamp normalizer handles cases with and without existing milliseconds. By keeping the complete unmodified event mapped directly into readOnlyUdm, the validation suite now acts as a much stricter and more accurate gate against parser regressions across all UDM fields.
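A minimal sketch of the nesting described here, assuming that "idm.readOnlyUdm" means a readOnlyUdm key inside an idm object. The helper name to_expected_shape and the sample event are hypothetical and chosen only for illustration.

```python
import copy


def to_expected_shape(event):
    """Nest the full parser event under idm.readOnlyUdm without dropping fields."""
    # deepcopy keeps later mutations of the wrapper from touching the raw event.
    return {"idm": {"readOnlyUdm": copy.deepcopy(event)}}


# Made-up sample event; field values are placeholders.
raw = {
    "metadata": {"eventType": "NETWORK_CONNECTION"},
    "principal": {"hostname": "host-a"},
    "target": {"ip": ["192.0.2.10"]},
}

wrapped = to_expected_shape(raw)
print(sorted(wrapped["idm"]["readOnlyUdm"]))
```

Because nothing is selected or discarded, fields such as principal and target reach the comparison step unchanged.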

Checklist:

Please ensure you have completed the following items before submitting your PR.
This helps us review your contribution faster and more efficiently.

General Checks:

  • I have read and followed the project's contributing.md guide.
  • My code follows the project's coding style guidelines.
  • I have performed a self-review of my own code.
  • My changes do not introduce any new warnings.
  • My changes pass all existing tests.
  • I have added new tests where appropriate to cover my changes. (If applicable)
  • I have updated the documentation where necessary (e.g., README, API docs). (If applicable)

Open-Source Specific Checks:

  • My changes do not introduce any Personally Identifiable Information (PII) or sensitive customer data.
  • My changes do not expose any internal-only code examples, configurations, or URLs.
  • All code examples, comments, and messages are generic and suitable for a public repository.
  • I understand that any internal context or sensitive details related to this work are handled separately in internal systems (Buganizer for Google team members).

For Google Team Members and Reviewers Only:

  • I have included the Buganizer ID in the PR title or description (e.g., "Internal Buganizer ID: 123456789" or "Related Buganizer: go/buganizer/123456789").
  • I have ensured that all internal discussions and PII related to this work remain in Buganizer.
  • I have tagged the PR with one or more labels that reflect the pull request purpose.

Screenshots (If Applicable)

If your changes involve UI or visual elements, please include screenshots or GIFs here.
Ensure any sensitive data is redacted or generalized.


Further Comments / Questions

Any additional comments, questions, or areas where you'd like specific feedback.

@prasoonbirla-google prasoonbirla-google requested a review from a team as a code owner April 22, 2026 11:41
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the parser validation script to improve test accuracy and debugging capabilities. By preserving the full UDM structure, dynamically resolving log types, and refining timestamp normalization, the changes reduce false-positive test failures and provide a more robust validation process for parser regressions.

Highlights

  • UDM Structure Preservation: Modified event transformation to nest the complete raw event payload under readOnlyUdm, ensuring all UDM fields are included in validation.
  • Dynamic Log Type Resolution: Updated the script to extract the logType from metadata.json if available, replacing the previous hardcoded dummy value.
  • Timestamp Normalization: Improved timestamp handling by stripping trailing zeros from microseconds and ignoring the eventTimestamp field during diffing to prevent false-positive test failures.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors timestamp normalization to handle microseconds, updates timestamp filtering, and introduces dynamic log type detection from a metadata file. It also simplifies the UDM event construction by nesting the original event under a new 'readOnlyUdm' key. Feedback includes a security concern regarding the use of 'json.loads' instead of 'yaml.safe_load' as per the style guide, a suggestion for more robust I/O error handling, and a note about missing logic for dumping validation results mentioned in the PR description.
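One way the I/O error handling and the results dump mentioned in this feedback could look is sketched below. It is not the code under review: the helper names, the placeholder default, and the assumption that logType is a top-level key in metadata.json are all hypothetical. The sketch reads the file with the json module because the PR describes a metadata.json file; it does not settle the json.loads versus yaml.safe_load question raised above.

```python
import json
from pathlib import Path

# Placeholder only; the script's real default log type is not shown in the PR.
DEFAULT_LOG_TYPE = "UNKNOWN_LOG_TYPE"


def resolve_log_type(cbn_dir, default=DEFAULT_LOG_TYPE):
    """Read logType from <cbn_dir>/metadata.json, falling back to a default."""
    metadata_path = Path(cbn_dir) / "metadata.json"
    try:
        with metadata_path.open(encoding="utf-8") as handle:
            metadata = json.load(handle)
    except (OSError, ValueError):
        # Missing or unreadable file, bad encoding, or malformed JSON.
        return default
    # Assumption: logType is a top-level string key.
    log_type = metadata.get("logType") if isinstance(metadata, dict) else None
    return log_type if isinstance(log_type, str) and log_type else default


def dump_validation_results(results, report_path):
    """Write results as JSON next to the markdown report and return the path."""
    output_path = Path(report_path).with_name("validation_results.json")
    with output_path.open("w", encoding="utf-8") as handle:
        json.dump(results, handle, indent=2, sort_keys=True)
    return output_path
```

Catching ValueError covers json.JSONDecodeError as well as decoding errors, so a corrupt metadata file degrades to the default instead of aborting the validation run.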

Comment thread tools/parsers/validations/run_parser_validations.py
Comment thread tools/parsers/validations/run_parser_validations.py