Updated CourseInfo uses multithreading instead of multiprocessing#147
Updated CourseInfo uses multithreading instead of multiprocessing#147AssassinMagic wants to merge 2 commits intomainfrom
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
| from .abstract import EnhanceBase | ||
| from db.Models import DepartmentDistribution, ClassDistribution, Libed, Session, and_ | ||
| import requests | ||
| import httpx |
There was a problem hiding this comment.
Also we use aiohttp for RMP already. I would encourage using that so we keep our surface area low with libraries. Check the RMP folder for how that looks like.
| finally: | ||
| session.close() | ||
|
|
||
| async def enhance_helper(self, dept_dist: DepartmentDistribution, semaphore: asyncio.Semaphore) -> None: |
There was a problem hiding this comment.
does not fit abstract class structure.
| with Pool() as pool: | ||
| pool.map(self.enhance_helper, dept_dists) No newline at end of file | ||
|
|
||
| semaphore = asyncio.Semaphore(9) # Limit concurrent tasks to something under 5 due to rate limiting |
There was a problem hiding this comment.
Comment conflicts with implementation
| campus_str = str(campus) | ||
| def _process_course_data(self, courses: list[dict], dept: str, campus: str) -> None: | ||
| session = Session() | ||
| try: |
There was a problem hiding this comment.
If even one course fails the entire batch will be considered faulty.
| response.raise_for_status() | ||
| req = response.json() | ||
| courses = req.get("courses", []) | ||
| except httpx.HTTPStatusError as e: |
There was a problem hiding this comment.
I'd like to know if this is a prohibited error 4xx or a server error 5xx in our logs.
| campus = dept_dist.campus | ||
| campus_str = str(campus) | ||
| def _process_course_data(self, courses: list[dict], dept: str, campus: str) -> None: | ||
| session = Session() |
There was a problem hiding this comment.
Sessions should be made for every granular change. You're batching a massive amount of data in one commit.
| # Only process UMNTC and UMNRO campuses | ||
| if campus_str not in ["UMNTC", "UMNRO"]: | ||
| return | ||
| courses = [] |
There was a problem hiding this comment.
Do you think this is necessary? Courses will either be assigned a value if the try succeeds. If it fails it should return with no-ops./
| courses = req.get("courses", []) | ||
| except ValueError: | ||
| print("Json malformed, icky!") | ||
| await asyncio.to_thread(self._process_course_data, courses, dept, campus) |
There was a problem hiding this comment.
This spawns a seperate thread to run a synchronous function. Can you make _process_course_data async and just await it instead? Would reduce thread overhead.
|
|
||
| def enhance(self, dept_dists: list[DepartmentDistribution]) -> None: | ||
| async def enhance(self, dept_dists: list[DepartmentDistribution]) -> None: | ||
| """Enhance the data for a list of department distributions in a multiprocessing pool.""" |
|
In addition, you want to change the signature and logic of CourseDogEnhance. Even though we no longer use it, I would like to have backwards compatibility. |
|
Once you merge in changes from #148 can you also add the following to the pipenv so we force Python 3.14t usage? |
Using asyncio with python 3.14 lets python run multi threaded instead of single threaded.