Merged
68 commits
79e54d3
Minimal viable product: partial bulk download
pt-kkraemer Apr 10, 2025
9fbfcc8
Merge pull request #634 from OpenEnergyPlatform/production
nesnoj Apr 19, 2025
2173418
Merge remote-tracking branch 'upstream/develop' into feature-616-part…
pt-kkraemer Apr 25, 2025
6a86ac5
Remove "partial-bulk" from helpers functions
pt-kkraemer Apr 25, 2025
3580bfd
Remove "partial bulk" from Mastr.download function
pt-kkraemer Apr 25, 2025
6c5a052
Add download completeness check, add sequential download functionality
pt-kkraemer Apr 25, 2025
17e8347
Remove default branch for test pypi publication
nesnoj Apr 29, 2025
c00d605
Update changelog
nesnoj Apr 29, 2025
aa4bafc
Remove trailing white space in changelog to trigger tests
nesnoj Apr 29, 2025
5576e53
Merge pull request #636 from OpenEnergyPlatform/fix-612-pypi-test-pub…
nesnoj Apr 29, 2025
d18fafe
Add unzip_http as own function instead of install and import
pt-kkraemer May 5, 2025
c579ab3
Add katalogwerte_bool and fit code to new utils.unzip_http
pt-kkraemer May 5, 2025
4fe0a0a
Remove unnecessary imports from unzip_http
pt-kkraemer May 12, 2025
e3d3b3d
Prepare metadata file creation
pt-kkraemer Jul 16, 2025
169d4ea
Merge remote-tracking branch 'upstream/develop' into feature-616-part…
pt-kkraemer Jul 16, 2025
84b8647
Moving two check functions outside download_xml function #616
pt-kkraemer Jul 18, 2025
2d681cf
Deprecation of date='existing', merging of partial and full download …
pt-kkraemer Jul 18, 2025
118f750
Change print statements #616
FlorianK13 Jul 21, 2025
c8177fb
Create test function for delete_xl_files #616
FlorianK13 Jul 21, 2025
e219fa1
Add test for partial download #616
FlorianK13 Jul 21, 2025
566c75f
Remove "cleansing" from print statements #644
FlorianK13 Jul 21, 2025
7251b33
Update Changelog #644
FlorianK13 Jul 21, 2025
99af202
Add test for delete_zip_file_if_corrupted #616
pt-kkraemer Jul 23, 2025
7974865
Add Kevin Krämer to CITATION.cff
pt-kkraemer Jul 31, 2025
e392714
Add PR to CHANGELOG #616
pt-kkraemer Jul 31, 2025
ed3f249
Add "Einheittyp" to system_catalog #651
pt-kkraemer Jul 31, 2025
09fc84e
Add PR to changelog #651
pt-kkraemer Jul 31, 2025
0bfa49f
Merge pull request #650 from OpenEnergyPlatform/644-change-cleansing-…
FlorianK13 Aug 4, 2025
0f739e3
Merge pull request #653 from pt-kkraemer/bug-651-additional-system-ca…
FlorianK13 Aug 4, 2025
49e6c42
Update docstring gescription of partial download when using "data" #616
pt-kkraemer Aug 18, 2025
258724a
Delete unused print message #616
FlorianK13 Aug 19, 2025
a2e3dc7
Extend docs #616
FlorianK13 Aug 19, 2025
c8c16ce
Merge pull request #652 from pt-kkraemer/feature-616-partial-bulk-dow…
FlorianK13 Aug 19, 2025
d603388
Replace print() statements by logging #657
nesnoj Oct 20, 2025
a9081ad
Update changelog
nesnoj Oct 20, 2025
ed96ecb
Replace print() statements by logging in unzip_http.py and apply blac…
nesnoj Oct 20, 2025
e7ca886
Logging: add formatter for debug messages #664
nesnoj Oct 20, 2025
2f7ba6c
Logging: set package log level instead of global log level #664
nesnoj Oct 20, 2025
57a5896
Logging: do not propagate messages to global logger #664
nesnoj Oct 20, 2025
5d6d4d5
Logging: set default console log level to info #664
nesnoj Oct 20, 2025
bf7e463
Add splash screen
nesnoj Oct 20, 2025
0ff6c81
Logging: extend instructions in docs #664
nesnoj Oct 20, 2025
2899552
Logging: extend instructions in docs #664
nesnoj Oct 20, 2025
40f8d2d
Merge branch 'develop' into feature-564-keep-old-bulk-files
nesnoj Oct 21, 2025
ca2229c
Add option to keep old zip files on download #564
nesnoj Oct 21, 2025
8a6911f
Complete docstring #564
nesnoj Oct 21, 2025
7200b29
Add technology checks to full bulk download: do not download if data …
nesnoj Oct 21, 2025
33903e0
Complete docstring and adjust messages #564
nesnoj Oct 21, 2025
95f2dab
Update changelog
nesnoj Oct 21, 2025
d1206f0
Adjust docs on partial downloads
nesnoj Oct 21, 2025
b019354
Extend docs #564
nesnoj Oct 21, 2025
c51f379
Update changelog
nesnoj Oct 21, 2025
b5a4aac
Add fixture zipped_xml_file_path to test_mastr.py
nesnoj Oct 21, 2025
9a007ca
Add test: check if keeping old downloads works #564
nesnoj Oct 21, 2025
d6ef330
Set number of parallel CI jobs to 1
nesnoj Oct 21, 2025
0f53502
Update changelog
nesnoj Oct 21, 2025
4eab04d
Merge pull request #669 from OpenEnergyPlatform/fix-660-fix-tests-lim…
nesnoj Oct 25, 2025
1621734
Merge branch 'develop' into feature-657-664-improve-logging
nesnoj Oct 25, 2025
0ec5ed7
Merge branch 'develop' into feature-564-keep-old-bulk-files
nesnoj Oct 25, 2025
846d500
Move check of keep_old_downloads outside of function #564
nesnoj Oct 25, 2025
d2bcd43
Merge pull request #666 from OpenEnergyPlatform/feature-657-664-impro…
nesnoj Oct 25, 2025
e68ae1d
Merge branch 'develop' into feature-564-keep-old-bulk-files
nesnoj Oct 25, 2025
2d1b715
Repair or delete broken links #679
FlorianK13 Nov 11, 2025
cffe9eb
Fix formatting in README #679
FlorianK13 Nov 11, 2025
e8fb628
Merge pull request #667 from OpenEnergyPlatform/feature-564-keep-old-…
nesnoj Nov 11, 2025
de05479
Merge pull request #680 from OpenEnergyPlatform/679-fix-readme-badges
FlorianK13 Nov 13, 2025
0bce4ea
Version update v0.16.0
nesnoj Nov 25, 2025
dde7516
Change release title
nesnoj Nov 25, 2025
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 0.15.0
current_version = 0.16.0
parse = (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)((?P<release>(a|na))+(?P<build>\d+))?
serialize =
{major}.{minor}.{patch}{release}{build}
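As a quick sanity check on the `parse` pattern in this config, the following sketch applies it to the new version string with Python's `re` module. The helper name `parse_version` and the use of `fullmatch` are assumptions made for illustration; they are not taken from bumpversion itself.

```python
import re

# Pattern copied from the `parse` line of .bumpversion.cfg.
VERSION_PATTERN = re.compile(
    r"(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
    r"((?P<release>(a|na))+(?P<build>\d+))?"
)


def parse_version(version: str) -> dict:
    """Return the named parts of a version string (hypothetical helper)."""
    match = VERSION_PATTERN.fullmatch(version)
    if match is None:
        raise ValueError(f"Not a valid version string: {version!r}")
    # groupdict() only contains the named groups; unmatched ones are None.
    return match.groupdict()


release_parts = parse_version("0.16.0")
prerelease_parts = parse_version("0.16.0a1")
print(release_parts)
print(prerelease_parts)
```

The optional trailing group means a plain release such as `0.16.0` leaves `release` and `build` unset, while a suffix such as `a1` fills both.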
1 change: 1 addition & 0 deletions .github/workflows/ci-develop.yml
@@ -13,6 +13,7 @@ jobs:
runs-on: ${{ matrix.os }}
if: ${{ !github.event.pull_request.draft }}
strategy:
max-parallel: 1
matrix:
os: [macos-latest, ubuntu-latest, windows-latest]
python-version: ['3.10', '3.11', '3.12']
3 changes: 2 additions & 1 deletion .github/workflows/ci-production.yml
@@ -13,6 +13,7 @@ jobs:
runs-on: ${{ matrix.os }}
if: ${{ !github.event.pull_request.draft }}
strategy:
max-parallel: 1
matrix:
os: [macos-latest, ubuntu-latest, windows-latest]
python-version: ['3.10', '3.11', '3.12']
@@ -32,7 +33,7 @@ jobs:
- name: create package
run: python -m build --sdist
- name: import open-mastr
run: python -m pip install ./dist/open_mastr-0.15.0.tar.gz
run: python -m pip install ./dist/open_mastr-0.16.0.tar.gz
- name: Create credentials file
env:
MASTR_TOKEN: ${{ secrets.MASTR_TOKEN }}
2 changes: 0 additions & 2 deletions .github/workflows/test-pypi-publish.yml
@@ -13,8 +13,6 @@ jobs:
environment: pypi-publish
steps:
- uses: actions/checkout@v4
with:
ref: release
- name: Set up Python 3.10
uses: actions/setup-python@v3
with:
20 changes: 18 additions & 2 deletions CHANGELOG.md
@@ -6,9 +6,25 @@ For each version important additions, changes and removals are listed here.
The format is inspired from [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and the versioning aims to respect [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## [v0.XX.X] unreleased - 202X-XX-XX
## [v0.16.0] Partial downloads with open-MaStR PartialPumpkinPull - 2025-11-26
### Added
- Add partial bulk download
[#652](https://github.com/OpenEnergyPlatform/open-MaStR/pull/652)
### Changed
- Updates the system_catalog dict with missing Einheittyp values
[#653](https://github.com/OpenEnergyPlatform/open-MaStR/pull/653)
- Fix package publication workflow
[#636](https://github.com/OpenEnergyPlatform/open-MaStR/pull/636)
- Change print statement about data cleansing
[#650](https://github.com/OpenEnergyPlatform/open-MaStR/pull/650)
- Improve logging
[#666](https://github.com/OpenEnergyPlatform/open-MaStR/pull/666)
- Several improvements in bulk download: Support retaining old zip bulk files;
Prevent zip file deletion on full download; Add technology checks to full
bulk download
[#667](https://github.com/OpenEnergyPlatform/open-MaStR/pull/667)
- Limit number of parallel CI jobs
[#669](https://github.com/OpenEnergyPlatform/open-MaStR/pull/669)
### Removed


@@ -35,7 +51,7 @@ and the versioning aims to respect [Semantic Versioning](http://semver.org/spec/
[#621](https://github.com/OpenEnergyPlatform/open-MaStR/pull/621)
### Removed
- Moved old code artefacts from `scripts` folder to paper specific
[repository](https://github.com/FlorianK13/verify-marktstammdaten)
[repository](https://github.com/FlorianK13/verify-marktstammdaten)
[#561](https://github.com/OpenEnergyPlatform/open-MaStR/pull/561)
- Remove old dependencies and broken README links
[#619](https://github.com/OpenEnergyPlatform/open-MaStR/pull/619)
8 changes: 6 additions & 2 deletions CITATION.cff
@@ -34,10 +34,14 @@ authors:
given-names: "Alexandra-Andreea"
alias: "@AlexandraImbrisca"
affiliation: "Technical University of Munich"
- family-names: 'Krämer'
given-names: "Kevin"
alias: "pt-kkraemer"
affiliation: "ProjectTogether gGmbH"
title: "open-MaStR"
type: software
license: AGPL-3.0
version: 0.15.0
version: 0.16.0
doi:
date-released: 2025-04-19
date-released: 2025-11-26
url: "https://github.com/OpenEnergyPlatform/open-MaStR/"
5 changes: 2 additions & 3 deletions README.rst
@@ -108,7 +108,6 @@ These projects already use open-mastr:
- `Wasserstoffatlas <https://wasserstoffatlas.de/>`_
- `EE-Status App <https://ee-status.de/>`_
- `Digiplan Anhalt <https://digiplan.rl-institut.de/>`_
- `Data Quality Assessment of the MaStR <https://marktstammdaten.kotthoff.dev/>`_
- `EmPowerPlan <https://epp.rl-institut.de/>`_
- `Goal100 Monitor <https://goal100.org/monitor>`_

@@ -119,7 +118,7 @@ changes in a `Pull Request <https://github.com/OpenEnergyPlatform/open-MaStR/pul
External Resources
===================
Besides open-mastr, some other resources exist that ease the process of working with the Marktstammdatenregister:
- If you are interested in browsing the MaStR online, check out the github organisation `Marktstammdatenregister.dev <https://github.com/marktstammdatenregister-dev>`_.

- The `bundesAPI/Marktstammdaten-API <https://github.com/bundesAPI/marktstammdaten-api>`_ is another implementation to access data via an official API.

Collaboration
@@ -146,7 +145,7 @@ Data


.. |badge_license| image:: https://img.shields.io/github/license/OpenEnergyPlatform/open-MaStR
:target: LICENSE.txt
:target: LICENSE.md
:alt: License

.. |badge_rtd| image:: https://readthedocs.org/projects/open-mastr/badge/?style=flat
20 changes: 18 additions & 2 deletions docs/advanced.md
@@ -63,7 +63,7 @@ The project home directory is structured as follows (files and folders below `da
File names are defined here.
* `logging.yml` <br>
Logging configuration. For changing the log level to increase or decrease details of log
messages, edit the level of the handlers.
messages, edit the level of the handlers. See below for details on logging.
* **data**
* `dataversion-<date>` <br>
Contains exported data as csv files from method [`to_csv`][open_mastr.Mastr.to_csv]
@@ -83,6 +83,19 @@ The project home directory is structured as follows (files and folders below `da
For the download via the API, logs are stored in a single file in `/$HOME/<user>/.open-MaStR/logs/open_mastr.log`.
New logging messages are appended. It is recommended to delete the log file from time to time because of its required disk space.

By default, the log level is set to `INFO`. You can increase or decrease the verbosity by either changing `logging.yml` (see above)
or adjusting it manually in your code. E.g. to enable `DEBUG` messages in `open_mastr.log` you can use the following snippet:

```python

import logging
from open_mastr import Mastr

# Increase to DEBUG to show more details in open_mastr.log
# Must be called after importing open_mastr to have the open-MaStR logger imported
logging.getLogger("open-MaStR").setLevel(logging.DEBUG)
```


### Data

@@ -148,8 +161,11 @@ If needed, the tables in the database can be obtained as csv files. Those files

=== "Disadvantages"
* No single tables or entries can be downloaded
* Download takes long time
* Download takes long time (you can use the partial download though, see [Getting Started](getting_started.md#bulk-download))

**Note**: By default, existing zip files in `$HOME/.open-MaStR/data/xml_download` are deleted when a new file is
downloaded. You can change this behavior by setting `keep_old_downloads=True` in
[`Mastr.download()`][open_mastr.Mastr.download].
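The cleanup behaviour described in this note can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the package's implementation: `remove_outdated_zips` and `demo` are hypothetical names, and only the `Gesamtdatenexport_<date>.zip` naming and the `keep_old_downloads` flag come from the diff.

```python
import tempfile
from pathlib import Path


def remove_outdated_zips(current_zip: Path, folder: Path) -> list:
    """Delete every *.zip in `folder` except `current_zip` (hypothetical helper)."""
    deleted = []
    for candidate in sorted(folder.glob("*.zip")):
        if candidate.name != current_zip.name:
            candidate.unlink()
            deleted.append(candidate.name)
    return deleted


def demo(keep_old_downloads: bool) -> list:
    """Create two fake bulk files and return the names left after cleanup."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        old_zip = folder / "Gesamtdatenexport_20250101.zip"
        new_zip = folder / "Gesamtdatenexport_20251126.zip"
        old_zip.touch()
        new_zip.touch()
        # The flag is checked by the caller, so the helper stays unconditional.
        if not keep_old_downloads:
            remove_outdated_zips(new_zip, folder)
        return sorted(path.name for path in folder.glob("*.zip"))


print(demo(keep_old_downloads=False))
print(demo(keep_old_downloads=True))
```

Checking the flag in the caller rather than inside the helper follows the pattern the diff uses around `delete_xml_files_not_from_given_date`.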

## SOAP API download

11 changes: 10 additions & 1 deletion docs/getting_started.md
@@ -35,7 +35,16 @@ db = Mastr()
db.download()
```

When a `Mastr` object is initialized, a sqlite database is created in `$HOME/.open-MaStR/data/sqlite`. With the function `Mastr.download()`, the **whole MaStR is downloaded** in the zipped xml file format. It is then read into the sqlite database and simple data cleansing functions are started.
When a `Mastr` object is initialized, a sqlite database is created in `$HOME/.open-MaStR/data/sqlite`. With the function [`Mastr.download()`][open_mastr.Mastr.download], the **whole MaStR is downloaded** in the zipped xml file format. It is then read into the sqlite database and simple data cleansing functions are started.

If you are interested in a specific part of the dataset, you can specify this by using the `data` parameter:

```python
from open_mastr import Mastr

db = Mastr()
db.download(data=["wind","hydro"])
```

More detailed information can be found in the section [bulk download](advanced.md#bulk-download).
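The `data` parameter accepts `None`, a single string, or a list. The sketch below shows one way such an argument could be normalized before use. It is an assumption for illustration only: `normalize_data_argument` is a hypothetical helper, and `EXAMPLE_DATA_OPTIONS` is a deliberately small subset taken from the example above, not the full list of valid entries.

```python
# Illustrative subset only; the complete list of valid entries is given
# in the table of the Mastr.download docstring.
EXAMPLE_DATA_OPTIONS = ("wind", "hydro")


def normalize_data_argument(data=None) -> list:
    """Turn None / str / list into a validated list (hypothetical helper)."""
    if data is None:
        # None means "everything".
        return list(EXAMPLE_DATA_OPTIONS)
    if isinstance(data, str):
        # A single category may be passed as a plain string.
        data = [data]
    unknown = [entry for entry in data if entry not in EXAMPLE_DATA_OPTIONS]
    if unknown:
        raise ValueError(f"Unknown data entries: {unknown}")
    return list(data)


print(normalize_data_argument("wind"))
print(normalize_data_argument(["wind", "hydro"]))
print(normalize_data_argument())
```

Normalizing to a list at the start lets later code iterate over the categories without checking the type each time.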

2 changes: 1 addition & 1 deletion environment.yml
@@ -3,4 +3,4 @@ channels:
- conda-forge
- defaults
dependencies:
- python=3.10
- python=3.11
62 changes: 39 additions & 23 deletions open_mastr/mastr.py
@@ -2,7 +2,10 @@
from sqlalchemy import inspect, create_engine

# import xml dependencies
from open_mastr.xml_download.utils_download_bulk import download_xml_Mastr
from open_mastr.xml_download.utils_download_bulk import (
download_xml_Mastr,
delete_xml_files_not_from_given_date,
)
from open_mastr.xml_download.utils_write_to_database import (
write_mastr_xml_to_database,
)
@@ -23,6 +26,10 @@
create_db_query,
db_query_to_csv,
reverse_fill_basic_units,
delete_zip_file_if_corrupted,
create_database_engine,
rename_table,
create_translated_database_engine,
)
from open_mastr.utils.config import (
create_data_dir,
@@ -33,13 +40,6 @@
)
import open_mastr.utils.orm as orm

# import initialize_database dependencies
from open_mastr.utils.helpers import (
create_database_engine,
rename_table,
create_translated_database_engine,
)

# constants
from open_mastr.utils.constants import TECHNOLOGIES, ADDITIONAL_TABLES

@@ -92,7 +92,10 @@ def __init__(self, engine="sqlite", connect_to_translated_db=False) -> None:
else:
self.engine = create_database_engine(engine, self._sqlite_folder_path)

print(
log.info(
"\n==================================================\n"
"---------> open-MaStR started <---------\n"
"==================================================\n"
f"Data will be written to the following database: {self.engine.url}\n"
"If you run into problems, try to "
"delete the database and update the package by running "
@@ -107,6 +110,7 @@ def download(
data=None,
date=None,
bulk_cleansing=True,
keep_old_downloads: bool = False,
api_processes=None,
api_limit=50,
api_chunksize=1000,
@@ -126,8 +130,8 @@ def download(
from marktstammdatenregister.de,
(see :ref:`Configuration <Configuration>`). Default to 'bulk'.
data : str or list or None, optional
Determines which types of data are written to the database. If None, all data is
used. If it is a list, possible entries are listed below with respect to the download method. Missing categories are
Determines which data is partially downloaded from the bulk download and written to the database. If None, all data is downloaded and written to the database.
If it is a list, possible entries are listed below with respect to the download method. Missing categories are
being developed. If only one data is of interest, this can be given as a string. Default to None, where all data is included.

| Data | Bulk | API |
@@ -157,7 +161,7 @@ def download(
|-----------------------|------|------|
| "today" | latest files are downloaded from marktstammdatenregister.de | - |
| "20230101" | If file from this date exists locally, it is used. Otherwise it throws an error (You can only receive today's data from the server) | - |
| "existing" | Use latest downloaded zipped xml files, throws an error if the bulk download folder is empty | - |
| "existing" | Deprecated since 0.16, see [#616](https://github.com/OpenEnergyPlatform/open-MaStR/issues/616#issuecomment-3089377062) | - |
| "latest" | - | Retrieve data that is newer than the newest data already in the table |
| datetime.datetime(2020, 11, 27) | - | Retrieve data that is newer than this time stamp |
| None | set date="today" | set date="latest" |
@@ -168,6 +172,8 @@ def download(
In its original format, many entries in the MaStR are encoded with IDs. Columns like
`state` or `fueltype` do not contain entries such as "Hessen" or "Braunkohle", but instead
only contain IDs. Cleansing replaces these IDs with their corresponding original entries.
keep_old_downloads: bool
If set to True, prior downloaded MaStR zip files will be kept.
api_processes : int or None or "max", optional
Number of parallel processes used to download additional data.
Defaults to `None`. If set to "max", the maximum number of possible processes
@@ -233,12 +239,20 @@ def download(
xml_folder_path,
f"Gesamtdatenexport_{bulk_download_date}.zip",
)
download_xml_Mastr(zipped_xml_file_path, date, xml_folder_path)

print(
f"\nWould you like to speed up the bulk download?\n"
f"Try our new parallelized processing by setting os.environ['USE_RECOMMENDED_NUMBER_OF_PROCESSES'] = True "
f"or configure your own number of processes via os.environ['NUMBER_OF_PROCESSES'] = your_number\n"
delete_zip_file_if_corrupted(zipped_xml_file_path)
if not keep_old_downloads:
delete_xml_files_not_from_given_date(
zipped_xml_file_path,
xml_folder_path,
)

download_xml_Mastr(zipped_xml_file_path, date, data, xml_folder_path)

log.info(
"\nWould you like to speed up the creation of your MaStR database?\n"
"Try our new parallelized processing by setting os.environ['USE_RECOMMENDED_NUMBER_OF_PROCESSES'] = True "
"or configure your own number of processes via os.environ['NUMBER_OF_PROCESSES'] = your_number\n"
)

write_mastr_xml_to_database(
@@ -255,8 +269,8 @@ def download(
# Set api_processes to None in order to avoid the malfunctioning usage
if api_processes:
api_processes = None
print(
"Warning: The implementation of parallel processes "
log.warning(
"The implementation of parallel processes "
"is currently under construction. Please let "
"the argument api_processes at the default value None."
)
@@ -425,9 +439,11 @@ def translate(self) -> None:
try:
os.remove(new_path)
except Exception as e:
print(f"An error occurred: {e}")
log.error(
f"An error occurred while removing old translated database: {e}"
)

print("Replacing previous version of the translated database...")
log.info("Replacing previous version of the translated database...")

for table in inspector.get_table_names():
rename_table(table, inspector.get_columns(table), self.engine)
@@ -436,9 +452,9 @@ def translate(self) -> None:

try:
os.rename(old_path, new_path)
print(f"Database '{old_path}' changed to '{new_path}'")
log.info(f"Database '{old_path}' changed to '{new_path}'")
except Exception as e:
print(f"An error occurred: {e}")
log.error(f"An error occurred while renaming database: {e}")

self.engine = create_engine(f"sqlite:///{new_path}")
self.is_translated = True
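The diff above calls `delete_zip_file_if_corrupted` before each download, but its body is not part of this view. One way such a check could work is sketched below with the standard `zipfile` module. The function name `delete_zip_if_corrupted`, its return value, and the use of `testzip()` are assumptions, not the package's actual code.

```python
import os
import tempfile
import zipfile


def delete_zip_if_corrupted(zip_path: str) -> bool:
    """Delete `zip_path` if it is not a readable zip archive.

    Hypothetical sketch: returns True if the file was deleted.
    """
    if not os.path.exists(zip_path):
        return False
    try:
        with zipfile.ZipFile(zip_path) as archive:
            # testzip() reads every member and returns the first bad name,
            # or None if all CRC checks pass.
            first_bad = archive.testzip()
    except zipfile.BadZipFile:
        # Raised e.g. for truncated downloads without a central directory.
        first_bad = zip_path
    if first_bad is None:
        return False
    os.remove(zip_path)
    return True


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.zip")
    bad = os.path.join(tmp, "bad.zip")
    with zipfile.ZipFile(good, "w") as archive:
        archive.writestr("a.xml", "<a/>")
    with open(bad, "wb") as handle:
        handle.write(b"truncated download")
    good_deleted = delete_zip_if_corrupted(good)
    bad_deleted = delete_zip_if_corrupted(bad)
    bad_still_exists = os.path.exists(bad)

print(good_deleted, bad_deleted, bad_still_exists)
```

Note that `testzip()` reads the whole archive, which may be slow for multi-gigabyte files; merely opening the archive is a cheaper but weaker check.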
11 changes: 8 additions & 3 deletions open_mastr/soap_api/metadata/description.py
@@ -1,10 +1,13 @@
from io import BytesIO
import logging
import re
from urllib.request import urlopen
from zipfile import ZipFile
import xmltodict
from collections import OrderedDict

log = logging.getLogger(__name__)


class DataDescription(object):
"""
@@ -150,9 +153,11 @@ def functions_data_documentation(self):
fcn["sequence"]["element"]["@type"].split(":")[1]
]["sequence"]["element"]
else:
print(type(fcn["sequence"]))
print(fcn["sequence"])
raise ValueError
log.error(f"Unexpected sequence type: {type(fcn['sequence'])}")
log.error(f"Sequence content: {fcn['sequence']}")
raise ValueError(
f"Unexpected sequence structure in function metadata"
)

# Add data for inherited columns from base types
if "@base" in fcn:
11 changes: 6 additions & 5 deletions open_mastr/utils/config/logging.yml
@@ -4,6 +4,8 @@ disable_existing_loggers: False
formatters:
standard:
format: "%(asctime)s [%(levelname)s] %(message)s"
debug:
format: "%(asctime)s [%(levelname)s] %(name)s:%(funcName)s:%(lineno)d - %(message)s"

handlers:
console:
@@ -12,14 +14,13 @@ handlers:
class: "logging.StreamHandler"
stream: "ext://sys.stdout"
file:
class: "logging.FileHandler"
level: "DEBUG"
formatter: "standard"
formatter: "debug"
class: "logging.FileHandler"
mode: "a"

root:
level: "DEBUG"

loggers:
open-MaStR:
level: "INFO"
handlers: ["console", "file"]
propagate: no
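To see what the package-logger part of this configuration does, the sketch below expresses a roughly equivalent setup as a Python dictionary for `logging.config.dictConfig`. Several values are assumptions because they are not visible in the hunk: the console handler's `level` and `formatter`, and the `filename` of the file handler, which here points to a temporary directory.

```python
import logging
import logging.config
import os
import tempfile

# Assumption: a throwaway log file location for this demonstration.
log_file = os.path.join(tempfile.mkdtemp(), "open_mastr.log")

LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "standard": {"format": "%(asctime)s [%(levelname)s] %(message)s"},
        "debug": {
            "format": "%(asctime)s [%(levelname)s] "
            "%(name)s:%(funcName)s:%(lineno)d - %(message)s"
        },
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": "INFO",  # assumption, not shown in the hunk
            "formatter": "standard",  # assumption, not shown in the hunk
            "stream": "ext://sys.stdout",
        },
        "file": {
            "class": "logging.FileHandler",
            "level": "DEBUG",
            "formatter": "debug",
            "filename": log_file,  # assumption, see above
            "mode": "a",
        },
    },
    "loggers": {
        "open-MaStR": {
            "level": "INFO",
            "handlers": ["console", "file"],
            "propagate": False,  # YAML `no` is parsed as False
        }
    },
}

logging.config.dictConfig(LOGGING_CONFIG)
log = logging.getLogger("open-MaStR")
log.info("visible on the console and in the file")
log.debug("dropped, because the logger level is INFO")
```

With `propagate` disabled and the level set on the named logger, the verbosity of open-MaStR can be changed without affecting the root logger or other libraries.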