
Optimize excel reading workflow #22

Merged
FelixCAAuer merged 8 commits into feature/markovTransition from feature/optimizeExcelReader on Oct 30, 2025

@MarcoAnarmo
Member

  • pandas.ExcelFile now uses the Rust-based calamine engine, which is faster than the default openpyxl engine.

  • The version check now runs on the already-opened file; previously, every file was read twice.

  • Excel files are now read in parallel when building the CaseStudy object. The exceptions are Global_Parameters, Global_Scenarios and Power_Parameters, which are read sequentially at the beginning of the workflow.
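The following is a minimal sketch of the open-once pattern described in the first two bullets. It assumes pandas 2.2 or later with the python-calamine package installed. The helper names open_excel, read_version and load_sheet are hypothetical. The version location is also an assumption: the sketch reads it from the first cell of the first sheet, but the real check_LEGOExcel_version() may find it elsewhere.

```python
from typing import Any, Callable


def open_excel(path: str) -> Any:
    """Open a workbook once, using the Rust-based calamine engine.

    Assumes pandas >= 2.2 and the python-calamine package are installed.
    """
    import pandas as pd  # lazy import so the flow below can be tested without pandas

    return pd.ExcelFile(path, engine="calamine")


def read_version(xls: Any) -> str:
    """Read the format version from an already-open workbook.

    Hypothetical layout: the version string is the first cell of the first sheet.
    """
    first_row = xls.parse(sheet_name=0, header=None, nrows=1)
    return str(first_row.iat[0, 0])


def load_sheet(
    path: str,
    sheet: str,
    expected_version: str,
    opener: Callable[[str], Any] = open_excel,
    version_of: Callable[[Any], str] = read_version,
) -> Any:
    """Check the version and parse a sheet using a single open file handle."""
    with opener(path) as xls:  # the file is opened exactly once
        found = version_of(xls)  # the version check reuses the open handle
        if found != expected_version:
            raise ValueError(
                f"{path}: expected version {expected_version}, found {found}"
            )
        return xls.parse(sheet_name=sheet)
```

The opener and version_of parameters exist only so the flow can be exercised without a real workbook. In normal use, the defaults apply.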

@MarcoAnarmo MarcoAnarmo self-assigned this Oct 30, 2025
@MarcoAnarmo MarcoAnarmo added the enhancement New feature or request label Oct 30, 2025
@MarcoAnarmo MarcoAnarmo changed the title Optimize excel reader Optimize excel reading workflow Oct 30, 2025
@FelixCAAuer FelixCAAuer changed the base branch from main to feature/markovTransition October 30, 2025 13:46
@FelixCAAuer FelixCAAuer requested a review from Copilot October 30, 2025 13:56
Contributor

Copilot AI left a comment


Pull Request Overview

This PR improves Excel file reading performance by switching from openpyxl to the faster calamine engine and introducing optional parallel file reading capabilities. The changes refactor the version checking mechanism to work with open ExcelFile objects and reorganize the initialization flow to enable concurrent data loading.

Key changes:

  • Switched pandas ExcelFile reading from openpyxl to calamine engine for better performance
  • Added optional parallel reading of Excel files using ThreadPoolExecutor with configurable worker count
  • Refactored check_LEGOExcel_version() to operate on open ExcelFile objects instead of file paths
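The parallel step can be sketched with concurrent.futures.ThreadPoolExecutor. This is an illustration under assumptions, not the actual CaseStudy code. The names read_all and read_one are hypothetical, and read_one stands in for whatever loads a single file. Only the parameter names parallel_read and n_jobs come from the PR. As the PR describes, the three global files are loaded first and in order, and every other file goes through the pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable, Optional

# Files the PR says are read sequentially before everything else.
SEQUENTIAL_FIRST = ("Global_Parameters", "Global_Scenarios", "Power_Parameters")


def read_all(
    names: Iterable[str],
    read_one: Callable[[str], Any],
    parallel_read: bool = True,
    n_jobs: Optional[int] = None,
) -> Dict[str, Any]:
    """Read the global files in order, then the rest (optionally) in parallel."""
    names = list(names)
    results: Dict[str, Any] = {}

    # Step 1: the shared global files, one after another, in a fixed order.
    for name in SEQUENTIAL_FIRST:
        if name in names:
            results[name] = read_one(name)

    remaining = [n for n in names if n not in results]

    if parallel_read and remaining:
        # Step 2a: n_jobs=None lets ThreadPoolExecutor choose its default worker count.
        with ThreadPoolExecutor(max_workers=n_jobs) as pool:
            # pool.map yields results in input order, so zip pairs them correctly.
            for name, data in zip(remaining, pool.map(read_one, remaining)):
                results[name] = data
    else:
        # Step 2b: sequential fallback when parallel reading is disabled.
        for name in remaining:
            results[name] = read_one(name)

    return results
```

pool.map re-raises a worker's exception when that result is consumed. A file that fails to load therefore still surfaces as an error instead of being silently skipped.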

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File | Description
environment.yml | Added python-calamine=0.5.3 dependency to support the new Excel reading engine
ExcelReader.py | Refactored version checking to work with open ExcelFile objects; switched to calamine engine for reading
CaseStudy.py | Reorganized initialization to support parallel file reading with ThreadPoolExecutor; added parallel_read and n_jobs parameters


FelixCAAuer and others added 2 commits October 30, 2025 15:01
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@FelixCAAuer FelixCAAuer merged commit 9967832 into feature/markovTransition Oct 30, 2025
1 check passed
@FelixCAAuer FelixCAAuer deleted the feature/optimizeExcelReader branch October 30, 2025 14:03

Labels

enhancement New feature or request


3 participants