@moseshll
Contributor

@moseshll moseshll commented Jun 24, 2025

From the ticket description:

Metadata Workflow Verifier gives false positives when records are removed from the downstream file due to all HTIDs on the record having supp/* rights, e.g.
PostZephirProcessing::Verifier::PostZephir: catalog archive line count (/htapps/archive/catalog/zephir_full_20250531_vufind.json.gz = 9413024) != bib export line count (/htprep/zephir/zephir/ht_bib_export_full_2025-05-31.json.gz = 9413852)

There is a monthly log file written somewhere under /htprep/zephir/zephir that may have a count of the suppressed records.

- Add an "expected delta" to the comparison between monthly Zephir file and catalog archive derivative
- Delta is based on matching lines in `[ZEPHIR_DATA]/full/zephir_full_monthly_rpt.txt`
  - This is the "more yucky" solution since we `grep -c` to find the number of suppressed records
  - The "less yucky" solution would be to have `postZephir.pm` write the total number, but we'd still have to grep to extract that
  - Both solutions are brittle in depending on a logfile no one really cares about
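
The expected-delta comparison described in this list could be sketched in Ruby roughly as follows. This is a minimal sketch under stated assumptions, not the merged implementation: `SUPPRESSED_PATTERN`, `count_matching_lines`, and `line_counts_consistent?` are hypothetical names, the pattern is a placeholder because the report's actual wording is not shown in this PR, and a pure-Ruby line count stands in for `grep -c` to keep the example self-contained. With the counts from the ticket, the delta would have to be 9413852 - 9413024 = 828 for the check to pass.

```ruby
# Placeholder pattern (assumption): the real text that marks a suppressed
# record in zephir_full_monthly_rpt.txt is not shown in this PR.
SUPPRESSED_PATTERN = /suppressed/

# Count the lines in the report that match the pattern.
# Pure-Ruby stand-in for `grep -c PATTERN FILE`.
def count_matching_lines(path, pattern)
  File.foreach(path).count { |line| line.match?(pattern) }
end

# True when the catalog archive is shorter than the bib export by exactly
# the number of records expected to have been dropped.
def line_counts_consistent?(bib_export_count:, catalog_count:, expected_delta:)
  bib_export_count - catalog_count == expected_delta
end

# Using the line counts quoted in the ticket description:
line_counts_consistent?(
  bib_export_count: 9_413_852,
  catalog_count: 9_413_024,
  expected_delta: 828
) # => true
```

In a real run, `expected_delta` would come from `count_matching_lines` applied to the monthly report rather than from a literal.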

Reviewer: does a high-level scrutiny cause only low to moderate discomfort? I'm willing to tolerate some yuckiness to get this in place by July 1.

@moseshll moseshll requested a review from mwarin June 24, 2025 18:58
Contributor

@mwarin mwarin left a comment

I don't see anything particularly yucky here. Noted one possible additional logging opportunity, but no need to stand in the way of an APPROVE.

- Log error if `count_suppressed_records` grep command fails
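
One way the suggested logging might look, as a hedged sketch only: `grep_count` is a hypothetical helper, not the PR's actual `count_suppressed_records`, and returning `nil` on failure is an assumption about how a caller would want to be told. It relies on the standard `grep` exit codes: 0 when lines match, 1 when none match (the count is still printed as `0`), and greater than 1 on an error such as a missing file.

```ruby
require "open3"
require "logger"

# Hypothetical helper (assumption): runs `grep -c` and logs an error when
# grep itself fails, as opposed to merely finding no matching lines.
def grep_count(pattern, path, logger: Logger.new($stderr))
  # capture2e merges stdout and stderr so any error text can be logged.
  output, status = Open3.capture2e("grep", "-c", "--", pattern, path)

  # Exit status 0 = matches found, 1 = no matches, >1 = grep error.
  # A nil exit status means the process was killed by a signal.
  if status.exitstatus.nil? || status.exitstatus > 1
    logger.error("grep -c failed for #{path}: #{output.strip}")
    return nil
  end

  output.to_i
end
```

Treating exit status 1 as a valid zero count matters here, because a month with no suppressed records would otherwise be reported as a failure.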
@moseshll moseshll marked this pull request as ready for review June 25, 2025 15:23
@moseshll moseshll merged commit 3887e4a into main Jun 25, 2025
1 check passed
@moseshll moseshll deleted the ETT-339_verifier_supp_records branch June 25, 2025 15:26