36 commits
4a0effd
Add draft for dynamic table creation
Simon-Will Dec 7, 2025
7cafcf4
Continue with implementing XSD reading
Simon-Will Dec 14, 2025
10f957c
Continue with implementation
Simon-Will Dec 21, 2025
b954457
Continue
Simon-Will Dec 27, 2025
4704ca6
Rename mastr_2.py to mastr.py
Simon-Will Dec 27, 2025
783f2b2
Get to working state
Simon-Will Jan 4, 2026
0aa8335
Make a couple small adjustments
Simon-Will Jan 4, 2026
fc5bfec
Implement date-based docs download
Simon-Will Jan 24, 2026
49b07e0
Make wrong XML schema a bit easier to see
Simon-Will Feb 2, 2026
1f6a670
Implement adding a custom primary key column
Simon-Will Feb 2, 2026
774b7ad
Add basic CSV export
Simon-Will Feb 2, 2026
f31622c
Implement translation feature
Simon-Will Feb 3, 2026
f743960
Generate SQLAlchemy core tables, not ORM models
Simon-Will Feb 4, 2026
4e248bf
Add functions for formatting SQLAlchemy tables
Simon-Will Feb 4, 2026
26899b8
Fix bug where docs download failed for non-given URL (due to typo)
Simon-Will Feb 4, 2026
661d980
Start fixing tests
Simon-Will Feb 4, 2026
9e1aecc
Add docstrings
Simon-Will Feb 5, 2026
5c7e5be
Remove unused code
Simon-Will Feb 5, 2026
2158b8a
Make code improvements (unused imports, etc.)
Simon-Will Feb 5, 2026
bd76ef1
Make existing tests work
Simon-Will Feb 16, 2026
2dac26b
Fix bug with primary keys
Simon-Will Feb 16, 2026
09cc1e0
Add tests for english & data model generation
Simon-Will Feb 16, 2026
cb932df
Update documentation
Simon-Will Feb 26, 2026
4fc054b
Add Simon Will to authors
Simon-Will Feb 26, 2026
bfb5aa4
Add missing conftest.py file
Simon-Will Feb 26, 2026
ba69e75
Improve clarity around artificial primary keys
Simon-Will Feb 26, 2026
e214d98
Add changelog entry
Simon-Will Feb 26, 2026
9b92e57
Add views for old table names + a few other things
Simon-Will Mar 3, 2026
6bb869e
Merge remote-tracking branch 'upstream/develop' into 516-dynamic-tabl…
Simon-Will Mar 3, 2026
c40fb9a
Remove some unused code
Simon-Will Mar 3, 2026
21e7ce7
Fix test
Simon-Will Mar 4, 2026
f3e2333
Address review comments
Simon-Will Mar 9, 2026
8c6a1db
516: Add test for XSD fallback
Simon-Will Mar 9, 2026
3d2f1da
Let open-MaStR logger propagate and handle with root logger
Simon-Will Mar 28, 2026
91e8ee2
Merge remote-tracking branch 'upstream/develop' into 516-dynamic-tabl…
Simon-Will Apr 8, 2026
60b9e77
516: Run black
Simon-Will Apr 8, 2026
7 changes: 6 additions & 1 deletion CHANGELOG.md
Expand Up @@ -8,8 +8,14 @@ and the versioning aims to respect [Semantic Versioning](http://semver.org/spec/

## [v0.xx.x] Unreleased - 202x-xx-xx
### Added
- Add the option to pass a custom database schema
[#718](https://github.com/OpenEnergyPlatform/open-MaStR/pull/718)

### Changed
- Switch to dynamic table generation based on parsing of XSD files;
change table names and column names to align more closely with original names;
simplify CSV export by removing table joins
[#718](https://github.com/OpenEnergyPlatform/open-MaStR/pull/718)

### Removed

Expand Down Expand Up @@ -46,7 +52,6 @@ and the versioning aims to respect [Semantic Versioning](http://semver.org/spec/
[#685](https://github.com/OpenEnergyPlatform/open-MaStR/pull/685)



## [v0.16.0] Partial downloads with open-MaStR PartialPumpkinPull - 2025-11-26
### Added
- Add partial bulk download
Expand Down
13 changes: 8 additions & 5 deletions CITATION.cff
Expand Up @@ -14,30 +14,33 @@ authors:
given-names: "Christoph"
alias: "@chrwm"
affiliation: "Reiner Lemoine Institut"
orcid: " https://orcid.org/0000-0001-8144-5260"
orcid: "https://orcid.org/0000-0001-8144-5260"
- family-names: "Kotthoff"
given-names: "Florian"
alias: "@FlorianK13"
affiliation: "fortiss"
orcid: " https://orcid.org/0000-0003-3666-6122"
orcid: "https://orcid.org/0000-0003-3666-6122"
- family-names: "Tepe"
given-names: "Deniz"
alias: "@deniztepe"
affiliation: "fortiss"
orcid: " https://orcid.org/0000-0002-7605-0173"
orcid: "https://orcid.org/0000-0002-7605-0173"
- family-names: "Amme"
given-names: "Jonathan"
alias: "@nesnoj"
affiliation: "Reiner Lemoine Institut"
orcid: " https://orcid.org/0000-0002-8563-5261"
orcid: "https://orcid.org/0000-0002-8563-5261"
- family-names: "Imbrisca"
given-names: "Alexandra-Andreea"
alias: "@AlexandraImbrisca"
affiliation: "Technical University of Munich"
- family-names: 'Krämer'
given-names: "Kevin"
alias: "pt-kkraemer"
alias: "@pt-kkraemer"
affiliation: "ProjectTogether gGmbH"
- family-names: "Will"
given-names: "Simon"
alias: "@Simon-Will"
title: "open-MaStR"
type: software
license: AGPL-3.0
Expand Down
111 changes: 82 additions & 29 deletions docs/advanced.md
Expand Up @@ -4,6 +4,7 @@ or the [SOAP API download](#soap-api-download).

## Configuration
### Database settings
#### Using a custom database


Configure your database with the `engine` parameter of [`Mastr`][open_mastr.Mastr].
Expand All @@ -12,22 +13,70 @@ It defines the engine of the database where the MaStR is mirrored to. Default is
The possible databases are:

* **sqlite**: By default the database will be stored in `$HOME/.open-MaStR/data/sqlite/open-mastr.db`.
* **own database**: The Mastr class accepts a sqlalchemy.engine.Engine object as engine which enables the user to
* **own database**: The Mastr class accepts a `sqlalchemy.engine.Engine` object as engine which enables the user to
use any other desired database such as PostgreSQL. The tables are created in the default DB schema, in PostgreSQL
this is `public`.
If you use your own database, pass its connection parameters in the URL given to `create_engine`. The
example below uses the user `open-mastr`, the password `open-mastr-pw`, and the database
`open-mastr-db`. Make sure the database exists and the user has sufficient permissions.

!!! warning "MySQL needs special table definitions"
    You can pass an engine for a MySQL database, but MySQL requires a maximum length for its `VARCHAR` fields.
    Since open-mastr generates its string columns without a maximum length, using MySQL will fail by default.
    You can make it work by defining your own tables beforehand and [passing your own database schema](#using-a-custom-database-schema).

```python
from sqlalchemy import create_engine
from open_mastr import Mastr

# SQLite DB
engine_sqlite = create_engine("sqlite:///path/to/sqlite/database.db")
# PostgreSQL DB
engine_postgres = create_engine("postgresql+psycopg2://open-mastr:open-mastr-pw@localhost:55443/open-mastr-db")
mastr = Mastr(engine=engine_sqlite)  # or engine=engine_postgres
mastr.download()
```

#### Using a custom database schema

By default, `Mastr.download` will download the MaStR documentation, generate a database schema from the contained XSD
files and create all database tables necessary for storing MaStR data.

If you want to prepare the database yourself, you can pass your own mapping from the original MaStR table name to your
database table to `Mastr.download` with the `mastr_table_to_db_table` parameter.
To get started with the default database schema, we recommend generating it from the MaStR docs using
`Mastr.generate_data_model` and then adjusting it:

```python
from sqlalchemy import create_engine
from open_mastr import Mastr, format_mastr_table_to_db_table

engine_postgres = create_engine("postgresql+psycopg2://open-mastr:open-mastr-pw@localhost:55443/open-mastr-db")
mastr = Mastr(engine=engine_postgres)

# Generate SQLAlchemy table definitions without creating the tables
mastr_table_to_db_table = mastr.generate_data_model()
# Print the tables so that you can see what was generated.
print(format_mastr_table_to_db_table(mastr_table_to_db_table))

# Now you need to go and create the tables in your database and adjust them to your needs.
# It's best to use the table definitions we generated and adjust them.
# Finally, you need your custom version of mastr_table_to_db_table.
mastr_table_to_your_custom_db_table = ...

# Download MaStR data into your custom tables.
mastr.download(mastr_table_to_db_table=mastr_table_to_your_custom_db_table)
```

from sqlalchemy import create_engine
When open-mastr encounters XML files in the MaStR download that have additional columns when compared to the database
tables, it will issue `ALTER` statements to add the columns to the database on the fly. To avoid this and to instead
skip the additional columns during import, you can pass the parameter `alter_database_tables=False`:

# SQLite DB
engine_sqlite = create_engine("sqlite:///path/to/sqlite/database.db")
# postgreSQL DB
engine_postgres = create_engine("postgresql+psycopg2://open-mastr:open-mastr-pw@localhost:55443/open-mastr-db")
db = Mastr(engine=engine_sqlite)
```python
# Download MaStR data into your custom tables.
mastr.download(
    mastr_table_to_db_table=mastr_table_to_your_custom_db_table,
    alter_database_tables=False,
)
```
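
The difference between the two modes can be sketched with the standard-library `sqlite3` module. This is only an illustration of the idea under stated assumptions, not open-mastr's implementation: the helpers `sync_columns` and `insert`, the table `units`, and all column names are hypothetical.

```python
import sqlite3


def sync_columns(conn, table, record, alter_database_tables=True):
    """Return `record` restricted to columns that exist in `table`.

    With alter_database_tables=True, unknown columns are first added via
    ALTER TABLE; otherwise they are dropped from the record.
    """
    existing = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    for column in record:
        if column not in existing and alter_database_tables:
            # Assumption: new columns are added as TEXT in this sketch.
            conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{column}" TEXT')
            existing.add(column)
    return {key: value for key, value in record.items() if key in existing}


def insert(conn, table, record):
    """Insert one record (a dict of column name to value) into `table`."""
    columns = ", ".join(f'"{column}"' for column in record)
    placeholders = ", ".join("?" for _ in record)
    conn.execute(
        f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})',
        list(record.values()),
    )


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE units ("Id" TEXT PRIMARY KEY, "Name" TEXT)')
record = {"Id": "U1", "Name": "Example unit", "NewField": "x"}

# Skip mode: "NewField" is dropped and the table stays unchanged.
insert(conn, "units", sync_columns(conn, "units", record, alter_database_tables=False))
columns_after_skip = [row[1] for row in conn.execute('PRAGMA table_info("units")')]

# Alter mode: "NewField" is added to the table before the insert.
record2 = dict(record, Id="U2")
insert(conn, "units", sync_columns(conn, "units", record2))
columns_after_alter = [row[1] for row in conn.execute('PRAGMA table_info("units")')]
```

In skip mode the custom table definition stays exactly as you created it, at the cost of losing the values of any column the table does not know about.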

### Project directory
Expand All @@ -37,24 +86,24 @@ You can change this default path, see [environment variables](#environment-varia
Default config files are copied to this directory which can be modified - but with caution.
The project home directory is structured as follows (the files and folders below `data/` are just an example).

```
.open-MaStR/
├── config
│   ├── credentials.cfg
│   ├── filenames.yml
│   └── logging.yml
├── data
│   ├── dataversion-<date>
│   ├── docs_download
│   │   └── Dokumentation MaStR Gesamtdatenexport_<date>.zip
│   ├── sqlite
│   │   └── open-mastr.db
│   └── xml_download
│       └── Gesamtdatenexport_<date>.zip
└── logs
    └── open_mastr.log
```


* **config**
* `credentials.cfg` <br>
Credentials used to access
Expand All @@ -69,15 +118,15 @@ The project home directory is structured as follows (files and folders below `da
Contains exported data as csv files from method [`to_csv`][open_mastr.Mastr.to_csv]
* `sqlite` <br>
Contains the sqlite database in `open-mastr.db`
* `docs_download` <br>
Contains the documentation of the MaStR download.
* `xml_download` <br>
Contains the bulk download in `Gesamtdatenexport_<date>.zip` <br>
New bulk download versions overwrite older versions.
* **logs**
* `open_mastr.log` <br>
* `open_mastr.log` <br>
The file stores the logging information from executing open-mastr.



### Logs

For the download via the API, logs are stored in a single file in `$HOME/.open-MaStR/logs/open_mastr.log`.
Expand All @@ -87,7 +136,6 @@ By default, the log level is set to `INFO`. You can increase or decrease the ver
or adjusting it manually in your code. E.g. to enable `DEBUG` messages in `open_mastr.log` you can use the following snippet:

```python

import logging
from open_mastr import Mastr

Expand All @@ -96,7 +144,6 @@ or adjusting it manually in your code. E.g. to enable `DEBUG` messages in `open_
logging.getLogger("open-MaStR").setLevel(logging.DEBUG)
```


### Data

If the zipped dump of the MaStR is downloaded, it is saved in the folder `$HOME/.open-MaStR/data/xml_download`.
Expand All @@ -122,8 +169,8 @@ There are some environment variables to customize open-MaStR:
## Bulk download

On the homepage [MaStR/Datendownload](https://www.marktstammdatenregister.de/MaStR/Datendownload) a zipped folder containing the whole
MaStR is offered. The data is delivered as xml-files. The official documentation can be found
on the same page (in german). This data is updated on a daily base.
MaStR is offered. The data is delivered as XML files. The official documentation can be found
on the same page (in German). This data is updated on a daily basis.

``` mermaid
flowchart LR
Expand All @@ -132,9 +179,8 @@ flowchart LR
id2 --> id3[("📗 open-mastr database")]
id3 --> id4("🔧 Decode and cleanse data")
id4 --> id3
id3 --> id5("Merge corresponding tables
and save as csv")
id5 --> id6>"📜 open-mastr csv files"]
id3 --> id5("Export to CSV")
id5 --> id6>"📜 open-mastr CSV files"]
click id1 "https://www.marktstammdatenregister.de/MaStR/Datendownload" _blank
click id2 "https://github.com/OpenEnergyPlatform/open-MaStR/blob/7b155a9ebdd5204de8ae6ba7a96036775a1f4aec/open_mastr/xml_download/utils_write_to_database.py#L17C6-L17C6" _blank
click id4 "https://github.com/OpenEnergyPlatform/open-MaStR/blob/7b155a9ebdd5204de8ae6ba7a96036775a1f4aec/open_mastr/xml_download/utils_cleansing_bulk.py#L10" _blank
Expand All @@ -143,17 +189,24 @@ flowchart LR
```


In the following, the process is described that is started when calling the [`Mastr.download`][open_mastr.Mastr.download] function with the parameter `method`="bulk".
First, the zipped files are downloaded and saved in `$HOME/.open-MaStR/data/xml_download`. The zipped folder contains many xml files,
which represent the different tables from the MaStR. Those tables are then parsed to a sqlite database. If only some specific
tables are of interest, they can be specified with the parameter `data`. Every table that is selected in `data` will be deleted from the local database, if existent, and then filled with data from the xml files.
The following describes the process that starts when [`Mastr.download`][open_mastr.Mastr.download] is called without parameters.
First, the zipped documentation is downloaded and saved in `$HOME/.open-MaStR/data/docs_download`. It contains
XSD files that describe the structure of the MaStR XML data files. open-mastr reads the XSD files and generates a database schema for
importing the data: for each MaStR table, it defines a database table and then creates it in a SQLite database.
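
To make the XSD step more concrete, here is a minimal sketch of deriving a `CREATE TABLE` statement from an XSD file using only the standard library. It is not the parser used by open-mastr: the XSD snippet is a heavily simplified, hypothetical example, and the function `xsd_to_create_table` and the type mapping `SQL_TYPES` are assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical, heavily simplified XSD; real MaStR XSD files are larger.
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="EinheitenWind">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="EinheitWind" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="EinheitMastrNummer" type="xs:string"/>
              <xs:element name="Bruttoleistung" type="xs:decimal" minOccurs="0"/>
              <xs:element name="Inbetriebnahmedatum" type="xs:date" minOccurs="0"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

# Assumed mapping from XSD types to SQL column types.
SQL_TYPES = {"xs:string": "TEXT", "xs:decimal": "NUMERIC", "xs:date": "DATE"}


def xsd_to_create_table(xsd_text):
    """Build a CREATE TABLE statement from the first top-level XSD element."""
    root = ET.fromstring(xsd_text)
    table = root.find(f"{XS}element")  # the top-level element names the table
    # Leaf elements carry a `type` attribute and become columns.
    columns = [
        (element.get("name"), SQL_TYPES.get(element.get("type"), "TEXT"))
        for element in table.iter(f"{XS}element")
        if element.get("type") is not None
    ]
    column_sql = ", ".join(f'"{name}" {sql_type}' for name, sql_type in columns)
    return f'CREATE TABLE "{table.get("name")}" ({column_sql})'


statement = xsd_to_create_table(XSD)
conn = sqlite3.connect(":memory:")
conn.execute(statement)  # the generated statement is valid SQLite DDL
```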

Then, the zipped files are downloaded and saved in `$HOME/.open-MaStR/data/xml_download`. The zipped folder contains
many XML files, which represent the different tables from the MaStR. Those XML files are then read and imported into the
previously created SQLite database tables.
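
The import step can be pictured roughly as follows. This is a simplified sketch with the standard library, not the actual import code: the XML content, the table definition, and the column list are hypothetical. Fields missing from a record are stored as `NULL`.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, shortened XML file with two records.
XML = """<EinheitenWind>
  <EinheitWind>
    <EinheitMastrNummer>SEE900000000001</EinheitMastrNummer>
    <Bruttoleistung>3000</Bruttoleistung>
  </EinheitWind>
  <EinheitWind>
    <EinheitMastrNummer>SEE900000000002</EinheitMastrNummer>
  </EinheitWind>
</EinheitenWind>
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "EinheitenWind" ("EinheitMastrNummer" TEXT, "Bruttoleistung" NUMERIC)'
)

columns = ["EinheitMastrNummer", "Bruttoleistung"]
# Each child of the root element is one record; findtext returns None
# for fields that are absent, which sqlite3 stores as NULL.
rows = [
    tuple(record.findtext(column) for column in columns)
    for record in ET.fromstring(XML)
]
conn.executemany('INSERT INTO "EinheitenWind" VALUES (?, ?)', rows)

imported = conn.execute(
    'SELECT * FROM "EinheitenWind" ORDER BY "EinheitMastrNummer"'
).fetchall()
```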

If only some specific tables are of interest, they can be specified with the parameter `data`. Every table
selected in `data` is deleted from the local database, if it exists, and then filled with data from the XML files.

In the next step, basic data cleansing is performed. Many entries in the MaStR bulk download are encoded as numbers.
As an example, instead of writing the german states where the unit is registered (Saxony, Brandenburg, Bavaria, ...) the MaStR states
As an example, instead of writing the German states where the unit is registered (Saxony, Brandenburg, Bavaria, ...) the MaStR states
corresponding digits (7, 2, 9, ...). One major step of cleansing is therefore to replace those digits with their original meaning.
Moreover, the datatypes of different entries are set in the data cleansing process and corrupted files are repaired.
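
The decoding idea can be sketched in a few lines of plain Python. The catalog below is hypothetical: the pairing of codes and state names is chosen for illustration only and does not reflect the real MaStR catalog values. The helper `decode` is likewise an assumption, not part of open-mastr.

```python
# Hypothetical catalog; the real MaStR codes differ.
STATE_CATALOG = {7: "Saxony", 2: "Brandenburg", 9: "Bavaria"}


def decode(rows, column, catalog):
    """Replace catalog codes in `column` with their labels.

    Codes that are not in the catalog are kept unchanged.
    """
    return [
        {**row, column: catalog.get(row[column], row[column])}
        for row in rows
    ]


rows = [
    {"unit": "A", "state": 7},
    {"unit": "B", "state": 2},
    {"unit": "C", "state": 99},  # unknown code, left as-is
]
decoded = decode(rows, "state", STATE_CATALOG)
```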

If needed, the tables in the database can be obtained as csv files. Those files are created by first merging corresponding tables (e.g all tables that contain information about solar) and then dumping those tables to `.csv` files with the [`to_csv`][open_mastr.Mastr.to_csv] method.
The tables in the database can be exported to CSV files using the [`to_csv`][open_mastr.Mastr.to_csv] method.
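
Conceptually, exporting without joins means writing each table to its own file as-is. The sketch below shows this with the standard library only. It is not the implementation of `to_csv`, and the helper `table_to_csv` and the table `units` are hypothetical.

```python
import csv
import io
import sqlite3


def table_to_csv(conn, table, out):
    """Write all rows of `table` to the file-like object `out` as CSV."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    writer = csv.writer(out)
    # The first element of each cursor.description entry is the column name.
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "units" ("Id" TEXT, "Name" TEXT)')
conn.execute('INSERT INTO "units" VALUES (?, ?)', ("U1", "Example unit"))

buffer = io.StringIO(newline="")
table_to_csv(conn, "units", buffer)
csv_text = buffer.getvalue()
```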

**Note**: By default, existing zip files in `$HOME/.open-MaStR/data/xml_download` are deleted when a new file is
downloaded. You can change this behavior by setting `keep_old_downloads=True` in
Expand Down