Optimize excel reading workflow #22
Merged
FelixCAAuer merged 8 commits into feature/markovTransition from Oct 30, 2025
Pull Request Overview
This PR improves Excel file reading performance by switching from openpyxl to the faster calamine engine and introducing optional parallel file reading capabilities. The changes refactor the version checking mechanism to work with open ExcelFile objects and reorganize the initialization flow to enable concurrent data loading.
Key changes:
- Switched pandas ExcelFile reading from `openpyxl` to `calamine` engine for better performance
- Added optional parallel reading of Excel files using ThreadPoolExecutor with configurable worker count
- Refactored `check_LEGOExcel_version()` to operate on open ExcelFile objects instead of file paths
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| environment.yml | Added python-calamine=0.5.3 dependency to support the new Excel reading engine |
| ExcelReader.py | Refactored version checking to work with open ExcelFile objects; switched to calamine engine for reading |
| CaseStudy.py | Reorganized initialization to support parallel file reading with ThreadPoolExecutor; added parallel_read and n_jobs parameters |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
The pandas `ExcelFile` call now uses the Rust-based `calamine` engine, which is faster than the default `openpyxl` engine.
The version check now runs on the already opened file; previously every file was read twice.
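To illustrate the "open once, check, then parse" idea, here is a minimal sketch. It is not the code from this PR: `open_with_calamine`, `read_workbook_once` and the `check_version` callable are hypothetical names, and the real `check_LEGOExcel_version()` may take different arguments. It assumes pandas 2.2 or newer with `python-calamine` installed.

```python
from typing import Any, Callable, Dict


def open_with_calamine(path: str) -> Any:
    """Open a workbook with pandas' calamine engine (assumes pandas >= 2.2)."""
    # Imported lazily so the rest of the sketch can be exercised without pandas.
    import pandas as pd

    return pd.ExcelFile(path, engine="calamine")


def read_workbook_once(
    path: str,
    check_version: Callable[[Any], None],
    opener: Callable[[str], Any] = open_with_calamine,
) -> Dict[str, Any]:
    """Open the file a single time, validate it, then parse every sheet."""
    with opener(path) as xls:
        # Hypothetical stand-in for the project's version check: it receives
        # the open workbook instead of a path, so the file is not reopened.
        check_version(xls)
        return {name: xls.parse(name) for name in xls.sheet_names}
```

Passing the open handle to the check is what removes the second read: both the validation and the parsing share the same `ExcelFile` object.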
The Excel files are now read in parallel when building the CaseStudy object, except for Global_Parameters, Global_Scenarios and Power_Parameters, which are read sequentially at the beginning of the workflow.
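The parallel step can be sketched roughly as follows. This is an assumption-laden illustration, not the actual `CaseStudy` code: `read_all` and `read_one` are hypothetical names, while `parallel_read` and `n_jobs` mirror the parameters mentioned in the review summary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional, TypeVar

T = TypeVar("T")


def read_all(
    paths: Iterable[str],
    read_one: Callable[[str], T],
    parallel_read: bool = True,
    n_jobs: Optional[int] = None,
) -> Dict[str, T]:
    """Read every path with read_one, optionally using a thread pool."""
    paths = list(paths)
    if not parallel_read or len(paths) < 2:
        # Sequential fallback, e.g. for files that must be read first.
        return {p: read_one(p) for p in paths}
    # n_jobs=None lets ThreadPoolExecutor choose its default worker count.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # pool.map yields results in input order and re-raises a worker's
        # exception when its result is consumed.
        return dict(zip(paths, pool.map(read_one, paths)))
```

Files that others depend on can be read first with `parallel_read=False`, and the remaining ones handed to the pool. How much threads help depends on how much of the read time is spent outside the Python interpreter lock, so the gain is worth measuring on real input files.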