Skip to content

Auto-cleanup courses for ended semesters #455#502

Open
MithishR wants to merge 8 commits intomainfrom
455-mithish-autocleanup
Open

Auto-cleanup courses for ended semesters #455#502
MithishR wants to merge 8 commits intomainfrom
455-mithish-autocleanup

Conversation

@MithishR
Copy link
Copy Markdown
Collaborator

@MithishR MithishR commented Mar 28, 2026

Description

This PR implements an automated course cleanup service to manage server space usage and removing course documents from ended semesters. The cronjob runs monthly (1st of the month for warnings and 15th for archival), and it sends warning emails to professors 2 weeks before archival. The archival is an automated process which cleans up associated data and archives the courses. For the sake of robustness this service will only process semesters ending after January 1, 2023.
image
image

Closes #455

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • This requires a run of yarn install
  • This change requires an addition/change to the production .env variables. These changes are below:
  • This change requires developers to add new .env variables. The file and variables needed are below:
  • This change requires a database query to update old data on production. This query is below:

How Has This Been Tested?

Testing the warning process by running the cron job every minute, and the archival process every two minutes. Verified the sent emails in the email log, and created test courses across various test semesters to see if the archival actually happened.

Checklist:

  • I have performed a code review of my own code (under the "Files Changed" tab on github) to ensure nothing is committed that shouldn't be (e.g. leftover console.logs, leftover unused logic, or anything else that was accidentally committed)
  • I have commented my code where needed
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked that new and existing tests pass locally with my changes
  • Any work that this PR is dependent on has been merged into the main branch
  • Any UI changes have been checked to work on desktop, tablet, and mobile

@MithishR MithishR requested review from AdamFipke, bhunt02 and patrickma6199 and removed request for AdamFipke and patrickma6199 March 31, 2026 01:04
@MithishR MithishR self-assigned this Mar 31, 2026
@MithishR MithishR marked this pull request as ready for review March 31, 2026 01:07
Copy link
Copy Markdown
Collaborator

@AdamFipke AdamFipke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gave it an initial pass, see comments

courseId: course.id,
});

await CourseModel.softRemove(course);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I would love to use this built-in typeorm function, I think various parts of the backend/frontend use the enabled attribute. You could go around and try to update it so that is uses deletedAt and figure all that stuff out, but idk if that'd be worth it atm

Comment on lines +307 to +309
await LMSCourseIntegrationModel.delete({
courseId: course.id,
});
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idk if this is all you have to do to cleanup a connection. @bhunt02 you will want to review this step

Comment on lines +303 to +305
await ChatbotDocPdfModel.delete({
courseId: course.id,
});
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

erm, this only deletes the full documents (used for citations). You should also delete the corresponding document (and its chunks) in the chatbot DB, basically copying the logic for deleteDocument() endpoint inside chatbot.controller.ts.

 @Delete('document/:courseId/:docId')
 @UseGuards(CourseRolesGuard)
 @Roles(Role.PROFESSOR, Role.TA)
 async deleteDocument(
   @Param('courseId', ParseIntPipe) courseId: number,
   @Param('docId') docId: string,
   @User({ chat_token: true }) user: UserModel,
 ) {
   handleChatbotTokenCheck(user);
   const chatbotDeleteResponse = await this.chatbotApiService.deleteDocument(
     docId,
     courseId,
     user.chat_token.token,
   );
   // if that succeeded (an error would have been thrown if it didn't), then delete the document from database
   await ChatbotDocPdfModel.delete({
     docIdChatbotDB: docId,
   });
   return chatbotDeleteResponse;
 }

I'm thinking you can just loop over all ChatbotDocPdfModel entities for the course and use that docIdChatbotDB to delete the corresponding documents using chatbotApiService.deleteDocument, and then if that succeeds, delete the ChatbotDocPdfModel

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, you also need to figure out how to delete any LMS chunks too. Which means we might need a new endpoint in the chatbot repo for deleting all chunks and documents (including LMS ones)? Unless I'm missing something which is very likely.

@bhunt02 easiest way to delete all LMS chunks for a course?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually wait, deleting all chunks is also a bad idea since some of the chunks are ones you would really want to keep (like custom made ones, or ones made from question, or ones that have been edited (i forget if we have an attribute for that but i don't think so)).

So maybe just deleting all chunks from uploaded documents and LMS chunk is best

return false;
}
}
private async archiveCourse(course: CourseModel): Promise<void> {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we tie this into the existing archive feature?

There's a button for admins and professors to archive their courses (that calls the rather confusingly named /update_course_access endpoint in organization.controller which should probably be renamed and moved) and all that does is toggle the enabled attribute of a course (which then hides it from most UI components).

I imagine the frontend should also be changed so that there would be a check box beside the archive button asking (default checked true):

  • delete chatbot documents (saves server space)?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so in that case should i update the endpoint first to accept a request body with cleanup options and then use the archival logic currently implemented?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep pretty much

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep pretty much

Comment thread packages/server/src/mail/course-cleanup.service.ts Outdated
Comment thread packages/server/src/mail/course-cleanup.service.ts Outdated
}

@Cron('0 0 0 15 * *') // 15th of every month do the archival process and send the confirmation emails
async archiveEndedCourses(): Promise<void> {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should also note that the system should keep track to make sure it only actually archives courses for those that have received the email. I would hate it if the archival went through but the user was never notified since the email never sent.

I think we already have the sent-email.entity that could possibly be used for this as one idea. I think .sendEmail() has a param you can pass that will save the email which you can use for queries later (just don't forget to delete the email after the archival is done since we don't need to keep it).


for (const [professorId, { professor, courses }] of professorCourseMap) {
try {
if (await this.isUnsubscribed(professorId, MailServiceType.COURSE_CLEANUP_NOTIFICATION)) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

um, i don't think we need to have a check if the user is unsubscribed from these emails. These should probably just always be sent.

Comment thread packages/server/migration/1774728300000-add-course-cleanup-mail-service.ts Outdated
Copy link
Copy Markdown
Collaborator

@AdamFipke AdamFipke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I just realised that you haven't actually requested my review just yet (idk why i thought that) and it wasn't until course-cleanup.service.ts that I realised, sorry in advance, but I did manage to find some important things (and some less important) that might help you.

import { CourseModel } from '../course/course.entity';
import * as cheerio from 'cheerio';

function validateHtml(html: string): void {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems to be almost the exact same function from weekly-summary.builder.ts? Maybe they could both use the same function

private static buildFinalWarningHeader(): string {
return `
<div style="background: linear-gradient(135deg, #d32f2f, #b71c1c); padding: 24px 32px; border-radius: 8px 8px 0 0;">
<h1 style="color: #fff; margin: 0; font-size: 22px;">FINAL NOTICE: Courses Will Be Archived in 4 Days</h1>
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<h1 style="color: #fff; margin: 0; font-size: 22px;">FINAL NOTICE: Courses Will Be Archived in 4 Days</h1>
<h1 style="color: #fff; margin: 0; font-size: 22px;">FINAL NOTICE: Courses Will Be Archived in 3 Days</h1>

Despite me saying the warnings are on the 11th and the 15th, I think it should say 3 days since in reality it can be a little less than 4 days based on when the prof reads the email:

  • 11th at 12am email is sent
  • 11th at say 9am the professor reads the email
  • 15th at 12am cleanup happens (some ~85h or less from when the warning email was sent)

That way we give them a little extra time rather than a little bit less time to deal with it (better that they have 3.5 days to deal with it rather than 2.5 days). You could also mention this in a comment i guess

</p>
${courses.map((c) => this.buildCourseActionsSummary(c, '#856404', '#fff8e1', '#ffc107')).join('')}
<p style="font-size: 15px; line-height: 1.6;">
<strong>To prevent this</strong>, simply update the course's semester to one that has not yet ended before ${archiveDateStr}.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<strong>To prevent this</strong>, simply update the course's semester to one that has not yet ended before ${archiveDateStr}.
<strong>If you want to prevent this</strong>, simply update the course's semester to an active semester (one that has not yet ended before ${archiveDateStr}).

<p style="font-size: 15px; line-height: 1.6;">
<strong>To prevent this</strong>, simply update the course's semester to one that has not yet ended before ${archiveDateStr}.
</p>
<p style="font-size: 13px; color: #888; margin-top: 24px; border-top: 1px solid #eee; padding-top: 16px;">
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<p style="font-size: 13px; color: #888; margin-top: 24px; border-top: 1px solid #eee; padding-top: 16px;">
<p style="font-size: 13px; color: #666; margin-top: 24px; border-top: 1px solid #eee; padding-top: 16px;">

personal preference, i just want the text here to be a little more emphasized

<strong>To prevent this</strong>, simply update the course's semester to one that has not yet ended before ${archiveDateStr}.
</p>
<p style="font-size: 13px; color: #888; margin-top: 24px; border-top: 1px solid #eee; padding-top: 16px;">
This is an automated cleanup process. If this course has ended and is no longer in use, you can safely disregard this email.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This is an automated cleanup process. If this course has ended and is no longer in use, you can safely disregard this email.
This is an automated cleanup process. <strong>If this course has ended and is no longer in use, you can safely disregard this email</strong>.

}


private static buildCourseList(courses: CourseModel[], linkColor: string): string {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems to be unused, but I think I would include it just below this paragraph in the buildNotificationSection

<p style="font-size: 15px; line-height: 1.6;">
          The following course${courses.length > 1 ? 's are' : ' is'} assigned to a semester that has ended and will be
          <strong>automatically cleaned up and archived on ${archiveDateStr}</strong> unless the semester is updated.
        </p>

and just below this paragraph for the final warning

<p style="font-size: 16px; line-height: 1.6; color: #d32f2f; font-weight: 600;">
          This is your last chance to update the semester for the following course${courses.length > 1 ? 's' : ''} to one that hasn't yet ended to avoid automatic cleanup and archival.
        </p>

Comment on lines +151 to +154
<li style="margin-bottom: 6px;">
<a href="${process.env.DOMAIN}/course/${c.id}/settings" style="color: ${linkColor}; text-decoration: underline; font-weight: 600;">${c.name}</a>
${c.semester ? `<span style="color: #888;"> (${c.semester.name})</span>` : ''}
</li>`,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of showing the semester beside each course. I would also include when the semester ended. Perhaps in the format "ended X days/months/weeks/etc. ago" (I think the moment library has some nice functions for formatting dates like that, we already have it installed I think). Though as another idea, I think doing "ended X days ago" would be really easy to consume.

I'm thinking the "ended X days ago" text could be gray or black or washed out red (probably the latter).

.innerJoinAndSelect('course.semester', 'semester')
.leftJoinAndSelect('course.lmsIntegration', 'lmsIntegration')
.leftJoinAndSelect('lmsIntegration.orgIntegration', 'orgIntegration')
.leftJoinAndSelect('course.chatbot_doc_pdfs', 'chatbot_doc_pdfs')
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woah um I don't think you want to go querying all chatbot doc pdfs directly. Each row can be upwards of 80MB. You'd probably want to take another approach (maybe just get the count? or adjust the query to not select docData)

Comment on lines +38 to +39
const archiveDate = new Date();
archiveDate.setDate(15);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is partially just a note for me (and @bhunt02 ) since I think our backend on production uses a different timezone so this might be an issue idk.

Maybe a fix for this is to grab the timezone from the course (course.timezone)

</p>
${courses.map((c) => this.buildCourseActionsSummary(c, '#c62828', '#ffebee', '#d32f2f')).join('')}
<p style="font-size: 15px; line-height: 1.6; background: #e8f5e9; padding: 12px 16px; border-radius: 4px; border-left: 4px solid #4caf50;">
<strong style="color: #2e7d32;">To prevent this:</strong> Update the course's semester to one that has not yet ended before ${archiveDateStr}.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I just had a big brain moment: We have a course clone feature. We should say basically "you can update this course's semester OR consider cloning the course to a fresh semester".

I might also make a semester on prod called "Sometime in the Future" or something to that effect for professors that want to clone their course for a future semester but don't yet know what said semester may be (I'll handle this step).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Auto-cleanup courses for ended semesters

2 participants