Architecture
Matthias Günter edited this page Mar 5, 2025 · 2 revisions
The solution is implemented in Python.
The XSD schema for NeTEx is included as a submodule in the folder xsd.
All required libraries are listed in requirements.txt.
We had to use some very advanced libraries (__future):
- xxx
- xsdata maps the XML data to Python classes.
- lmdb is used for intermediate storage.
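To illustrate what "mapping the XML data into Python classes" means, here is a minimal sketch that uses only the standard library. It is not the xsdata API: in the project the classes are generated from the XSD, whereas the `StopPlace` dataclass and the `parse_stop_place` helper below are hand-written, hypothetical stand-ins.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# NeTEx namespace, needed to address child elements.
NS = {"netex": "http://www.netex.org.uk/netex"}


@dataclass
class StopPlace:
    """Hypothetical, hand-written stand-in for a class generated from the XSD."""
    id: str
    version: str
    name: str


XML = """<StopPlace xmlns="http://www.netex.org.uk/netex" id="CH:StopPlace:1" version="1">
  <Name>Bern</Name>
</StopPlace>"""


def parse_stop_place(xml_text: str) -> StopPlace:
    """Map one XML element onto the dataclass (attributes and one child element)."""
    element = ET.fromstring(xml_text)
    return StopPlace(
        id=element.attrib["id"],
        version=element.attrib["version"],
        name=element.findtext("netex:Name", default="", namespaces=NS),
    )


stop = parse_stop_place(XML)
print(stop)
```

With generated classes the mapping code does not have to be written by hand; the sketch only shows the kind of object that comes out.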
- GTFS
- shapes
- NeTEx
- xxx
- Every conversion starts with a format-to-db step.
- Then follows a db-to-db step. The goal there is to xxx
- Finally, the db is written to the target format.
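The three steps can be sketched as three functions chained together. This is only an illustration under simplifying assumptions: a plain dict stands in for the database, the input is a two-line GTFS `stops.txt`, the output is a made-up XML layout, and the transformation in `db_to_db` (trimming names) is a placeholder, not the project's actual goal for that step. All function names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Tiny GTFS stops.txt sample; note the stray spaces around "Bern".
GTFS_STOPS = "stop_id,stop_name\n1, Bern \n2,Thun\n"

Database = dict[str, dict[str, str]]


def format_to_db(stops_txt: str) -> Database:
    """Step 1: read the source format into the 'database' (a dict keyed by id)."""
    return {row["stop_id"]: row for row in csv.DictReader(io.StringIO(stops_txt))}


def db_to_db(source: Database) -> Database:
    """Step 2: build a new database from the first one (placeholder: trim names)."""
    return {
        key: {**row, "stop_name": row["stop_name"].strip()}
        for key, row in source.items()
    }


def db_to_format(db: Database) -> str:
    """Step 3: write the database to the target format (made-up XML layout)."""
    root = ET.Element("stops")
    for key, row in db.items():
        stop = ET.SubElement(root, "stop", id=key)
        stop.text = row["stop_name"]
    return ET.tostring(root, encoding="unicode")


result = db_to_format(db_to_db(format_to_db(GTFS_STOPS)))
print(result)
```

Because every step reads from and writes to a database, source and target formats can be combined freely, and several db-to-db steps could be placed between them.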
There are auxiliary steps that:
- validate the data
- load data from a URL
- write result files to an FTP server
- Pipelines can be defined as simple scripts (in the form of a block with a sequence of scripts).
- Some variables are supported.
- A block stops when it runs into an error.
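The block behaviour described above (a sequence of steps, variable substitution, stop on the first error) could look roughly like this. The sketch does not reproduce the project's script format: the command names, the `${workdir}` variable syntax (taken from `string.Template`) and the `run_block` function are all assumptions made for illustration.

```python
from string import Template

log: list[str] = []


def load(path: str) -> None:
    log.append(f"load {path}")


def validate(path: str) -> None:
    # Fails on purpose to show that the block stops here.
    raise ValueError(f"{path} is not valid")


def upload(path: str) -> None:
    log.append(f"upload {path}")


# Hypothetical registry of steps that a block may call.
COMMANDS = {"load": load, "validate": validate, "upload": upload}


def run_block(block: list[tuple[str, str]], variables: dict[str, str]) -> bool:
    """Run the steps in order; return False as soon as one of them raises."""
    for command, argument in block:
        resolved = Template(argument).substitute(variables)
        try:
            COMMANDS[command](resolved)
        except Exception as error:
            log.append(f"stopped at {command}: {error}")
            return False
    return True


block = [
    ("load", "${workdir}/in.xml"),
    ("validate", "${workdir}/in.xml"),
    ("upload", "${workdir}/out.xml"),
]
ok = run_block(block, {"workdir": "/tmp/job"})
print(ok, log)
```

The `upload` step never runs here, because `validate` raises first and the block is abandoned.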
sequenceDiagram
title PyNeTExConv2
participant NeTEx XSD
participant NeTEx XML
participant Python Dataclasses
participant Parsing
participant DuckDB
NeTEx XSD->>Python Dataclasses: xsdata generate -c netex.conf
NeTEx XML->>Parsing:lxml.etree.iterparse<br />event driven sax based<br />XML-parsing
loop #ff00ff for each first class object
Parsing->>Parsing:Inheritance stack from<br />FrameDefaults:<br /> 1. DataSourceRef<br /> 2. ResponsibilitySet<br /> 3. SrsName
Parsing->>Python Dataclasses:unmarshal etree into<br />python object
note over Parsing: Prior to marshalling, execute any code<br />which does not have interdependencies<br />but does generate changes to the object.
Python Dataclasses->>DuckDB: marshal object into:<br /> 1. pickle<br /> 2. XML (legacy)<br /><br />INSERT OR REPLACE
Python Dataclasses->>DuckDB: Recursively resolve:<br /> 1. embedded objects with ids<br /> 2. objects referencing other objects
end
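The parsing loop in the diagram above can be approximated with the standard library alone. This is a sketch under stated assumptions, not the project's code: `xml.etree.ElementTree.iterparse` stands in for `lxml.etree.iterparse`, `sqlite3` stands in for DuckDB, a plain dict is pickled instead of a generated dataclass, and the `objects` table layout, the `FIRST_CLASS` set and the `load` function are hypothetical. The FrameDefaults inheritance and the reference resolution are left out.

```python
import io
import pickle
import sqlite3
import xml.etree.ElementTree as ET

NETEX = "http://www.netex.org.uk/netex"
XML = f"""<PublicationDelivery xmlns="{NETEX}">
  <StopPlace id="sp:1" version="1"><Name>Bern</Name></StopPlace>
  <Line id="line:1" version="1"><Name>S1</Name></Line>
</PublicationDelivery>"""

# Hypothetical selection of "first class" element tags, in Clark notation.
FIRST_CLASS = {f"{{{NETEX}}}StopPlace", f"{{{NETEX}}}Line"}


def load(xml_text: str, connection: sqlite3.Connection) -> None:
    """Stream through the XML and store every first class object once."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag not in FIRST_CLASS:
            continue
        record = {
            "id": element.attrib["id"],
            "version": element.attrib["version"],
            "name": element.findtext(f"{{{NETEX}}}Name"),
        }
        connection.execute(
            "INSERT OR REPLACE INTO objects (id, version, type, payload) "
            "VALUES (?, ?, ?, ?)",
            (record["id"], record["version"],
             element.tag.split("}")[1], pickle.dumps(record)),
        )
        element.clear()  # drop the children of the processed element


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE objects (id TEXT, version TEXT, type TEXT, payload BLOB, "
    "PRIMARY KEY (id, version))"
)
load(XML, connection)
load(XML, connection)  # second run replaces the rows instead of duplicating them

count = connection.execute("SELECT COUNT(*) FROM objects").fetchone()[0]
print(count)
```

Loading the same file twice leaves two rows in the table, which is the point of `INSERT OR REPLACE`: reruns overwrite objects with the same key.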
sequenceDiagram
title PyNeTExConv: Transformation (generalisation)
participant NeTEx XML Source
participant DuckDB Source
participant DuckDB Target
participant NeTEx XML Target
NeTEx XML Source->>DuckDB Source: Parse XML into database<br /><br />execute: netex_to_db.py
DuckDB Source->>DuckDB Target: Apply transformations and<br />introduce new objects:<br /> 1. changes to the timing model<br /> 2. calendars vs availability conditions<br /> 3. geographic projections<br /><br />execute: *_db_to_db.py
DuckDB Target->>NeTEx XML Target: Query both databases<br />for the objects which the target<br />database has created, or<br />which exist as-is in the source.<br /><br />execute: *_db_to_xml.py
DuckDB Source->>NeTEx XML Target: .
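One possible reading of the export step in the diagram above is "take an object from the target database if it exists there, otherwise take it unchanged from the source". The sketch below shows that lookup order with `collections.ChainMap`. Two dicts stand in for the two DuckDB databases, and the keys, values and the `export` function are made up for illustration.

```python
from collections import ChainMap

# Stand-ins for the two databases, keyed by object id.
source_db = {
    "Line:1": "Line 1 (source)",
    "StopPlace:1": "StopPlace 1 (source)",
}
target_db = {
    "Line:1": "Line 1 (transformed)",            # changed by a db-to-db step
    "ServiceCalendar:1": "ServiceCalendar 1 (new)",  # newly introduced object
}


def export(source: dict[str, str], target: dict[str, str]) -> list[str]:
    """Return every object once, preferring the target over the source."""
    merged = ChainMap(target, source)  # lookups try target first, then source
    return [merged[key] for key in sorted(merged)]


exported = export(source_db, target_db)
print(exported)
```

Under this reading the target database only has to hold what a transformation changed or created, while untouched objects are still written from the source.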