Updated to continue working in 2025, refactored codebase #8

am-becker · 2025-02-26T01:52:55Z

New features:

Supports Pages as well as Modules/Files
Supports caching so pages linked to from multiple pages are not re-requested
Performs HEAD request before actual file GET request to compare file size (much faster)
Fixes a bunch of bugs with the Canvas API returning weird results for locked/hidden files
Supports pagination (kinda) by increasing max page count
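
The HEAD-before-GET check from the list above could look roughly like the sketch below. This is not the PR's code: `remote_size` and `needs_download` are hypothetical names, and the sketch uses the standard library's `urllib` instead of `requests` so it runs without extra dependencies. It also assumes the server reports a `Content-Length` header on HEAD responses, which is not guaranteed.

```python
import os
import urllib.request
from typing import Optional


def remote_size(url: str, token: str) -> Optional[int]:
    """Ask for the file size with a HEAD request (hypothetical helper).

    Returns None when the server sends no usable Content-Length header.
    """
    request = urllib.request.Request(
        url,
        method="HEAD",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length and length.isdigit() else None


def needs_download(local_path: str, size: Optional[int]) -> bool:
    """Download when the file is missing, the size is unknown, or sizes differ."""
    if not os.path.exists(local_path):
        return True
    if size is None:
        return True
    return os.path.getsize(local_path) != size
```

Comparing sizes only is a cheap heuristic: a file edited without changing its length would be skipped, so a stricter check would need a hash or a modification timestamp.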

benjavicente · 2025-02-26T22:47:46Z

Hi @aaroexxt! Thanks for the PR! I will review soon.

I don't plan to maintain this library in the long term, so I will see if a student org takes the repo. If you are interested in helping to maintain it, I would be happy to know :)

benjavicente

The only required changes would be to restore the Google Drive download functionality where it was removed and to fix my name :)

There are a lot of improvements that I will comment on in issues, but they all come from the fact that I wasn't really experienced at writing good code back then 😅

benjavicente · 2025-02-26T22:54:49Z

canvas.py

    def __get(self, query: str, **kwarg):
+        """Performs a GET request to the Canvas API with 250 items per page."""
+        # Extract existing parameters if provided
+        params = kwarg.pop("params", {})
+
+        # Ensure 'per_page' is set to 500
+        params["per_page"] = 500  


I think it's better for the per_page param to be defined per query; some endpoints might restrict the page size, if I remember correctly.

Also, if you want to add a default to the params, it might be better to do something like this:

    def __get(self, query: str, *, params={"per_page": 500}, **kwarg):

Right now that implementation seems to override what was passed to __get
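
One way to keep a default without overriding the caller is to merge the two, with caller values winning. The sketch below is only an illustration: `DEFAULT_PARAMS` and `merge_params` are hypothetical names, and the value 500 is copied from the diff above. It also avoids using a dict literal as the default argument, since in Python a mutable default is created once and shared across calls, and a default in the signature is dropped entirely as soon as the caller passes its own `params`.

```python
from typing import Any, Mapping, Optional

# Hypothetical module-level defaults; 500 is the value used in the diff above.
DEFAULT_PARAMS = {"per_page": 500}


def merge_params(params: Optional[Mapping[str, Any]] = None) -> dict:
    """Return request params where caller-supplied values win over defaults."""
    merged = dict(DEFAULT_PARAMS)  # copy, so the defaults are never mutated
    if params:
        merged.update(params)
    return merged
```

With this, a call that passes `{"per_page": 50}` keeps 50, and a call that passes unrelated params still gets the default page size.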

benjavicente · 2025-02-26T22:58:30Z

canvas.py

+        pages_api_url = f"https://{self.domain}/api/v1/courses/{course_id}/pages"
+        response = requests.get(
+            pages_api_url, 
+            headers={"Authorization": f"Bearer {self.token}"},
+            params={"per_page": 250}  # Ensuring 250 pages per request
+        )


This ideally should use the __get method.
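
Routing every request through one helper also gives a single place to handle pagination. The Canvas API reports further pages through the `Link` response header, so a shared helper could follow the `rel="next"` URL instead of relying on a large `per_page`. The parser below is a minimal sketch with a hypothetical name (`next_page_url`), not code from this PR. It handles the common `<url>; rel="name"` form only.

```python
import re
from typing import Optional

# Matches one `<url>; rel="name"` entry of a Link header.
_LINK_PART = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the URL tagged rel="next" in a Link header, or None."""
    if not link_header:
        return None
    for url, rel in _LINK_PART.findall(link_header):
        if rel == "next":
            return url
    return None
```

A caller would request a page, read `response.headers.get("Link")`, and repeat with the returned URL until the function yields `None`.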

benjavicente · 2025-02-26T23:06:33Z

canvas.py

+                    if self._is_canvas_url(item["external_url"]):
+                        self._download_canvas_page(course_id, item["external_url"], course_name)


There used to be a method here to download files from Google Drive, since at UC Chile it was common for courses to link to files there. I would keep the get_external_download_url check, or at least add a comment saying that this is where one could add a custom downloader for external services.
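
As an illustration of such an extension point, the sketch below keeps a list of resolver functions and tries each one in turn. It reuses the name `get_external_download_url` from the comment above, but the body is not the original implementation. The `Resolver` type, the `RESOLVERS` list, and the Google Drive URL patterns are all assumptions made for the example.

```python
import re
from typing import Callable, List, Optional

# Hypothetical resolver type: takes a link, returns a direct download URL or None.
Resolver = Callable[[str], Optional[str]]

# Assumed shape of a Google Drive share link: .../file/d/<file id>/...
_DRIVE_FILE = re.compile(r"https://drive\.google\.com/file/d/([\w-]+)")


def google_drive_resolver(url: str) -> Optional[str]:
    """Turn a Drive share link into a direct download link (assumed URL format)."""
    match = _DRIVE_FILE.match(url)
    if match is None:
        return None
    return f"https://drive.google.com/uc?export=download&id={match.group(1)}"


# Custom downloaders for other external services would be appended here.
RESOLVERS: List[Resolver] = [google_drive_resolver]


def get_external_download_url(url: str) -> Optional[str]:
    """Return the first direct download URL any resolver produces, else None."""
    for resolver in RESOLVERS:
        direct = resolver(url)
        if direct is not None:
            return direct
    return None
```

Links that no resolver recognizes return `None`, so the caller can fall through to the Canvas page check shown in the diff.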

benjavicente · 2025-02-26T23:21:40Z

readme.md

+Complete file downloader for Canvas (Instructure)!

-Features:
+Originally by [Ben Javicente](https://github.com/benjavicente/canvas-file-downloader), essentially rewritten from scratch by [Aaron Becker](https://github.com/aaroexxt).


Suggested change

-Originally by [Ben Javicente](https://github.com/benjavicente/canvas-file-downloader), essentially rewritten from scratch by [Aaron Becker](https://github.com/aaroexxt).
+Originally by [Benjamin Vicente](https://github.com/benjavicente/canvas-file-downloader), essentially rewritten from scratch by [Aaron Becker](https://github.com/aaroexxt).

Benja is the Spanish equivalent of Ben in English ;)

am-becker added 5 commits February 25, 2025 20:35

Final script live with all of my changes :)

9d58ea3

Final readme update

63aa12c

access token image commit

8b5977e

Merge branch 'master' of https://github.com/aaroexxt/CanvasScraper

7028bd7

Last last change :)

c069076

benjavicente requested changes Feb 26, 2025
