-
Notifications
You must be signed in to change notification settings - Fork 7
feat: canonicalize MSCI to MSE course codes #170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
AD1938
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR! The team noticed this before but did not get a chance to address it.
A few things we might want to consider:
- the
merge_coursefunction is only one-way (old->new), it might be better for MSCI and MSE courses to appear interchangeably. We might want a way to associate multiple course codes together, so that querying any of them gives all the associated ones. - i assume this solution intends to handle future course merges by applying
merge_coursefunctions through Hasura migrations, this would save the mapping information in thehasura/migrations/default. But we might want the DB itself to know the merge information between course codes as well.
Curious to hear your thoughts on this!
| @@ -0,0 +1,89 @@ | |||
| -- Merge MSCI course codes into MSE and make the helper reusable for future renames. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this migration on prod data. Applying it raises some errors due to review_check_course_taken trigger, likely due to some weird data on prod, i need to look into this further and add more details later.
| FROM review | ||
| WHERE course_id = old_id | ||
| ON CONFLICT (course_id, user_id) DO NOTHING; | ||
| DELETE FROM review WHERE course_id = old_id; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid removing old courses. If a new user who has taken a MSCI course signs up after we applied this migration. The transcript still records MSCI, the parsing would find no matches for this MSCI course and the user would not be able to leave reviews.
Also, if someone (say an alumni) who has taken MSCI before but does not know the MSCI -> MSE subject code change wants to search some MSCI data, they won't find anything.
UW Flow has been keeping lots of migrated and eliminated course codes. For example, CM is a subject code that is no longer in use but we still keep their data (reviews + ratings)
|
|
||
| func canonicalCourseCode(subj, num string) string { | ||
| return strings.ToLower(canonicalSubject(subj) + num) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imo this is not needed. If MSCI has been changed to MSE, UW API would give us MSE only.
If UW API does give MSCI in the future, it means there are some real MSCI courses being offered, then we should not map them to MSE.
Overall, I don't think importer needs to know any remapping information
Consolidate old MSCI subjects into MSE so users see a single course with unified reviews, clearer ratings, and cleaner search results.
Proposed Changes:
Importer canonicalization:
canonical.goand normalize course codes during fetch and convertDatabase migration:
merge_course(old_code, new_code)functionhasura/migrations/default/1770000000000_merge_msci_to_mse/Testing:
go test ./importer/uw/parts/courseFuture usage:
subjectRenamesmap as neededmerge_coursefunction for other subject renames