@moseshll
Collaborator

@moseshll moseshll commented Oct 9, 2025

  • Remove deprecated ht_proxies
  • Remove ht_logs, ht_contacts, and ht_contact_types which now have an otis_ prefix
  • Remove ht_institutions.template
    • Regenerate mysqldump of ht_institutions without template column.
    • This is the full dump; I believe the previous iteration was hand-crafted (with some columns excluded).
    • Prefer to use data that can be regenerated with a minimum of fuss, hoping to help mitigate schema drift.
    • ... as long as this does not expose sensitive data.

@moseshll moseshll requested a review from aelkiss October 9, 2025 16:26
@aelkiss
Member

aelkiss commented Oct 9, 2025

@moseshll Some of the data from ht_institutions makes me a little bit itchy -- maybe especially the list of IdPs for institutions that aren't enabled (these may be former members, members in testing, etc.). The IdPs for enabled institutions are certainly available via the wayf and the public institutions dump anyway, so that stuff is fine. Likewise, I'm not sure about the specific scopes we allow per IdP; again, I think this is probably OK at least for "enabled" ht_institutions. I might be inclined to omit the institutions where enabled=0 -- that might be a reasonable balance of ease of regeneration and not committing information that potentially shouldn't be public. (The one exception there is probably the hathitrust inst_id.) I'm also not really sure at this point for what purposes we even need this information in dbimage anyway (rather than just having the schema and adding fake data); better understanding that might change my position here.
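
As a rough illustration of the row filter suggested above, here is a minimal Python sketch that assembles a `mysqldump` invocation using its `--where` option. The database name, the helper name `build_dump_command`, and the literal `inst_id` value `'hathitrust'` are assumptions made for illustration and are not taken from this PR. The sketch only filters rows; excluding individual columns would need a separate step.

```python
import shlex


def build_dump_command(database, table="ht_institutions", keep_inst_id="hathitrust"):
    """Build an argv list for a row-filtered mysqldump (hypothetical helper).

    Keeps enabled institutions plus one exempted inst_id. The column names
    `enabled` and `inst_id` come from the discussion; the exempted value
    and the database name are assumptions.
    """
    where = f"enabled = 1 OR inst_id = '{keep_inst_id}'"
    return [
        "mysqldump",
        f"--where={where}",  # restrict dumped rows to those matching the condition
        database,
        table,
    ]


if __name__ == "__main__":
    # Print a shell-quoted command for inspection; nothing is executed here.
    print(shlex.join(build_dump_command("ht")))
```

Keeping a small script like this alongside the dump would make the data easy to regenerate while leaving disabled institutions out of the committed file.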

@moseshll
Collaborator Author

moseshll commented Oct 9, 2025

@aelkiss it's actually fine to have some custom "dump schemas" for lack of a better term. I'll restore the file with something more like the existing dump, and keep the code that creates it around for posterity next time we want to recreate the data.

Would it be too involved to close/delete this PR and erase the branch from history?

@aelkiss
Member

aelkiss commented Oct 9, 2025

I think if we close the PR and delete the branch, GitHub will garbage collect it. I don't think it's a major issue.

@aelkiss
Member

aelkiss commented Oct 21, 2025

@moseshll Do we still want to close this PR and delete the branch?

@moseshll
Collaborator Author

@aelkiss Closing and deleting. I have notes on what should be done as a follow-up.

@moseshll moseshll closed this Oct 21, 2025
@moseshll moseshll deleted the updates_aug_2025 branch October 21, 2025 15:25