A place for discussion & code review#1
Conversation
|
Tracking new feature requests unrelated to this review under -> #2 |
| if file_list_resp.status_code == 200: | ||
| objs = file_list_resp.json()['value'] | ||
|
|
||
| for obj in objs: |
There was a problem hiding this comment.
I'd like to point these lines out in particular. There are a few ways to determine if an object is a file or a folder. You can check that it has the file property, and then use the download URL since you know it will exist. The second way is to just check that it has a download URL.
I think the way it is done here is fine since we don't care about folders, and only downloadable content in the drive. Just something worth pointing out since we can also use the properties to check for images and/or videos as well. This could be useful elsewhere.
There was a problem hiding this comment.
Included change in pull -> #3
The existence of obj['file'] is almost the same as what we're doing above but this 'file' property will exist only on filetype objects. I don't know if there are any non-filetype objects that would have a downloadURL (folders do not) but this change should prevent us from grabbing those if there are.
Add filenames function, create local path param
mingchaoliao
left a comment
There was a problem hiding this comment.
- add a
requirements.txtfile containing your project's dependencies. - I found out that the file names are hard to follow as they are all lowercase and have no separator. I think most people prefer all lowercase and use underscores to improve readability. You can probably follow the PEP8 style guide https://peps.python.org/pep-0008/#package-and-module-names
- I encourage you to consider adding types to all possible places. Similar to JavaScript, Python is a weak-type language. But writing strong-typed code (like why we replace JS with TypeScript) can greatly reduce type related bugs and improve code readability (as other developers can easily inspect the code in an IDE, e.g. know exactly the parameter structure, go to the class definition, etc.). see more about Python type hint system at https://docs.python.org/3/library/typing.html
- It seems like Microsoft has a Graph API SDK for python, https://github.com/microsoftgraph/msgraph-sdk-python-core, are you able to utilize it? (instead of doing it manually using
requests). - The way you download/upload files won't work well with large files. For large files (several Gig), people usually download/upload using streams, or do it in multiple chunks. Although I don't think we will run into that issue very often.
| @@ -0,0 +1,2 @@ | |||
| .env | |||
| venv No newline at end of file | |||
There was a problem hiding this comment.
missing a new line at the end of the file.
| @@ -0,0 +1,14 @@ | |||
| from setuptools import find_packages, setup | |||
|
|
|||
| setup( | |||
There was a problem hiding this comment.
I assume you eventually want to render the README file to your package's profile page on the PyPI. PyPI won't automatically pick up the README file. You have to read the file and assign it to long_description. see more details at https://packaging.python.org/en/latest/guides/making-a-pypi-friendly-readme/
| Module to unify and simplify configuration of a SharePoint document library for use in | ||
| a Python ETL | ||
| """ | ||
| class DocumentLibrary: |
There was a problem hiding this comment.
Based on the content of this class and how it is used in the SimpleETL class. I feel like calling it DocumentLibraryConfig, DocumentLibraryOptions, or something similar might be more approperiate.
| self.library = document_library | ||
| self.__thumbprint = thumbprint | ||
| self.__private_key = private_key | ||
| self.__token = self.__acquire_token() |
There was a problem hiding this comment.
It's better to move the token exchange process out of the constructor and probably run it right before the first time an action (read/write/etc) requiring the authentication is executed, giving the following reasons:
- It doesn't make sense to spend time before it is actually needed. If someone set up the package but ends up not using it, then the resource is wasted, although developers shouldn't do that.
- It is harder to test. General speaking, in OOP, (with a few exceptions), the only thing a constructor should do is dependency injection, meaning assigns dependencies from the caller to its instance members. We usually don't have to test constructors, but doing business logic in it will then requires you to test the constructor itself.
|
|
||
|
|
||
| @staticmethod | ||
| def __get_item_id(file_items, target_name): |
There was a problem hiding this comment.
It doesn't have to stay inside the class as a static method. It seems like a utility function and can put it into a dedicate utility py file, or still stays in the same file with the class but not inside the class.
| raise Exception(result.get('error')) | ||
|
|
||
|
|
||
| def filenames(self, remote_path): |
There was a problem hiding this comment.
My personal preference, listFiles is more self-explainable.
| objs = file_list_resp.json()['value'] | ||
| for obj in objs: | ||
| if obj['file']: | ||
| filenames.append(obj['name']) |
There was a problem hiding this comment.
I think it can be simplified to:
files = filter(lambda obj: obj['file'], objs)
filenames = [ file['name'] for file in files ]| file_data = requests.get(obj['@microsoft.graph.downloadUrl']) | ||
| if file_data.status_code == 200: | ||
| try: | ||
| clean_path = re.sub(r'^(\\|\/)+|(\\|\/)+$', '', local_path) |
There was a problem hiding this comment.
seems like you can use os.path.normpath(...) here https://docs.python.org/3/library/os.path.html#os.path.normpath
| list_url = f'{self.library.base_url}/root:/{remote_path}:/children' | ||
| file_list_response = requests.get(list_url, | ||
| headers={'Authorization': 'Bearer ' + self.__token}) |
There was a problem hiding this comment.
hmm, I saw similar code in the filenames method. To improve reusability, maybe you can create another method to handle the list files/dirs work and let filenames and delete methods use it rather than doing their own file listing.
| headers={'Authorization': 'Bearer ' + self.__token}) | ||
| if delete_response.status_code != 204: | ||
| raise Exception(f'Failed to delete {file_name}. \ | ||
| {delete_response.raise_for_status()}') |
There was a problem hiding this comment.
raise_for_status will actually raise an exception. It makes your raise Exception(...) pointless. There are several cases like that in this file.
| filenames (string[]): List of file names in the remote_path directory | ||
| """ | ||
| filenames = [] | ||
| file_list_resp = requests.get(f'{self.library.base_url}/root:/{remote_path}:/children', |
There was a problem hiding this comment.
You may be able to use filter and select query parameter to:
- return files only https://docs.microsoft.com/en-us/graph/query-parameters#filter-parameter
- return only the attributes you need https://docs.microsoft.com/en-us/graph/query-parameters#select-parameter
This comment also applies to other places that make api calls.
|
No description provided.