Architecture

Matthias Günter edited this page Mar 5, 2025 · 2 revisions

Modules / Programming

The solution is implemented in Python.

Submodules

The XSD schema for NeTEx is included as a submodule in the folder xsd.

Requirements

All required libraries are listed in requirements.txt.

We had to use some very advanced libraries (__future__):

  • xxx

Important processing

  • xsdata is used for mapping the XML data into Python classes.
  • lmdb is used for intermediate storage.
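
To make the two roles above concrete, here is a minimal sketch using only the standard library. The `StopPlace` dataclass is a hand-written, hypothetical stand-in for a class that xsdata would generate from the XSD, and a plain `dict` with byte keys and byte values stands in for an lmdb database. Storing the fields as a pickled dict is an assumption made for the illustration, not a description of the project's storage layout.

```python
import pickle
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass
class StopPlace:
    # Hypothetical stand-in for an xsdata-generated class.
    id: str
    version: str
    name: Optional[str] = None


def parse_stop_place(xml_text: str) -> StopPlace:
    """Map one XML element onto the dataclass (xsdata does this generically)."""
    element = ET.fromstring(xml_text)
    return StopPlace(
        id=element.attrib["id"],
        version=element.attrib["version"],
        name=element.findtext("Name"),
    )


def store(db: Dict[bytes, bytes], obj: StopPlace) -> None:
    """Write the object's fields under its id; keys and values are bytes."""
    db[obj.id.encode("utf-8")] = pickle.dumps(asdict(obj))


def load(db: Dict[bytes, bytes], object_id: str) -> StopPlace:
    """Read the fields back and rebuild the dataclass."""
    return StopPlace(**pickle.loads(db[object_id.encode("utf-8")]))


# Illustrative input; the id is made up for this example.
SAMPLE = '<StopPlace id="example:StopPlace:1" version="1"><Name>Bern</Name></StopPlace>'

intermediate: Dict[bytes, bytes] = {}
store(intermediate, parse_stop_place(SAMPLE))
restored = load(intermediate, "example:StopPlace:1")
```

The round trip returns an object equal to the parsed one, which is the property an intermediate store needs.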

Handling of Embeddings

Handling of operating days

Handling of interchanges

Omitted parts

  • GTFS
    • shapes
  • NeTEx
    • xxx

Pipeline

  • Every pipeline starts with a format-to-db step.
  • A db-to-db step follows. The goal there is to xxx
  • Finally, the db is written to the target format.
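
The three steps above can be sketched as three functions. Everything here is hypothetical: the function names, the `Database` alias (a plain dict standing in for the real database) and the semicolon-separated output are chosen only to show the shape of the pipeline.

```python
from typing import Callable, Dict, List

Database = Dict[str, dict]  # hypothetical stand-in for the intermediate database


def format_to_db(rows: List[dict]) -> Database:
    """Format-to-db step: index the input records by id."""
    return {row["id"]: dict(row) for row in rows}


def db_to_db(source: Database, transform: Callable[[dict], dict]) -> Database:
    """Db-to-db step: build a new database and leave the source untouched."""
    return {key: transform(dict(value)) for key, value in source.items()}


def db_to_format(db: Database) -> List[str]:
    """Db-to-format step: serialise every object for the target format."""
    return [f"{key};{value['name']}" for key, value in sorted(db.items())]


def uppercase_name(record: dict) -> dict:
    """Example transformation; a real one would change the data model."""
    record["name"] = record["name"].upper()
    return record


source_db = format_to_db([{"id": "2", "name": "Thun"}, {"id": "1", "name": "Bern"}])
target_db = db_to_db(source_db, uppercase_name)
output_lines = db_to_format(target_db)
```

Keeping the source database unchanged in the middle step means the final step can still consult it.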

There are auxiliary steps that:

  • validate data
  • load data from a URL
  • write result files to an FTP server

Scripts

  • Pipelines can be defined as simple scripts: a block containing a sequence of scripts.
  • Some variables are supported.
  • A block stops if it runs into an error.
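
A minimal sketch of such a block runner follows. It is not the project's script format: `run_block`, the `(function, argument)` step tuples and the `${name}` variable syntax (borrowed from `string.Template`) are assumptions made for the illustration.

```python
from string import Template
from typing import Callable, Dict, List, Tuple

Step = Tuple[Callable[[str], None], str]  # (script, argument with ${variables})


def run_block(steps: List[Step], variables: Dict[str, str]) -> List[str]:
    """Run the steps in order and return the arguments of the completed ones."""
    completed: List[str] = []
    for script, raw_argument in steps:
        try:
            # An unknown variable raises KeyError and stops the block as well.
            argument = Template(raw_argument).substitute(variables)
            script(argument)
        except Exception as error:
            print(f"Block stopped at {script.__name__}({raw_argument!r}): {error}")
            break
        completed.append(argument)
    return completed


def load(path: str) -> None:
    print(f"loading {path}")


def validate(path: str) -> None:
    if "broken" in path:
        raise ValueError("validation failed")


def write(path: str) -> None:
    print(f"writing {path}")


block = [
    (load, "${base}/broken.xml"),
    (validate, "${base}/broken.xml"),
    (write, "${base}/out.zip"),
]
done = run_block(block, {"base": "/tmp/data"})
```

In this run `validate` raises, so `write` is never called and only the `load` step is reported as completed.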

Logging

Importing NeTEx sequence diagram

sequenceDiagram
title PyNeTExConv2

participant NeTEx XSD
participant NeTEx XML
participant Python Dataclasses
participant Parsing
participant DuckDB

NeTEx XSD->>Python Dataclasses: xsdata generate -c netex.conf
NeTEx XML->>Parsing:lxml.etree.iterparse<br />event-driven, SAX-style<br />XML parsing

loop #ff00ff for each first class object
Parsing->>Parsing:Inheritance stack from<br />FrameDefaults:<br /> 1. DataSourceRef<br /> 2. ResponsibilitySet<br /> 3. SrsName
Parsing->>Python Dataclasses:unmarshal etree into<br />Python object

note over Parsing: Prior to marshalling, execute any code<br />which does not have interdependencies<br />but does generate changes to the object.

Python Dataclasses->>DuckDB: marshal object into:<br /> 1. pickle<br /> 2. XML (legacy)<br /><br />INSERT OR REPLACE
Python Dataclasses->>DuckDB: Recursively resolve:<br /> 1. embedded objects with ids<br /> 2. objects referencing other objects
end
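
The import loop in the diagram can be approximated with the standard library alone. In this sketch `xml.etree.ElementTree.iterparse` stands in for `lxml.etree.iterparse`, `sqlite3` stands in for DuckDB, and a plain dict stands in for the generated dataclass. The sample XML, the `FIRST_CLASS` set, the table layout and the restriction to a single inherited default (`DefaultDataSourceRef`) are simplifying assumptions, and namespaces are omitted.

```python
import io
import pickle
import sqlite3
import xml.etree.ElementTree as ET

SAMPLE = b"""<CompositeFrame>
  <FrameDefaults><DefaultDataSourceRef ref="src:outer"/></FrameDefaults>
  <frames>
    <ServiceFrame>
      <FrameDefaults><DefaultDataSourceRef ref="src:inner"/></FrameDefaults>
      <Line id="L1" version="1"/>
    </ServiceFrame>
    <ResourceFrame>
      <Operator id="O1" version="1"/>
    </ResourceFrame>
  </frames>
</CompositeFrame>"""

FIRST_CLASS = {"Line", "Operator"}  # hypothetical selection of first class objects


def import_netex(stream, connection: sqlite3.Connection) -> None:
    connection.execute(
        "CREATE TABLE IF NOT EXISTS objects ("
        "tag TEXT, id TEXT, version TEXT, payload BLOB, "
        "PRIMARY KEY (tag, id, version))"
    )
    defaults_stack = [{}]  # one entry per open frame, innermost last
    for event, element in ET.iterparse(stream, events=("start", "end")):
        tag = element.tag
        if event == "start":
            if tag.endswith("Frame"):
                # A nested frame starts with a copy of its parent's defaults.
                defaults_stack.append(dict(defaults_stack[-1]))
            continue
        if tag == "FrameDefaults":
            ref = element.find("DefaultDataSourceRef")
            if ref is not None:
                defaults_stack[-1]["data_source_ref"] = ref.get("ref")
        elif tag.endswith("Frame"):
            defaults_stack.pop()
        elif tag in FIRST_CLASS:
            obj = {
                "id": element.get("id"),
                "version": element.get("version"),
                # The object's own attribute wins over the inherited default.
                "data_source_ref": element.get("dataSourceRef")
                or defaults_stack[-1].get("data_source_ref"),
            }
            connection.execute(
                "INSERT OR REPLACE INTO objects VALUES (?, ?, ?, ?)",
                (tag, obj["id"], obj["version"], pickle.dumps(obj)),
            )
            element.clear()  # release the parsed subtree


connection = sqlite3.connect(":memory:")
import_netex(io.BytesIO(SAMPLE), connection)
import_netex(io.BytesIO(SAMPLE), connection)  # re-import replaces, never duplicates
```

Because of `INSERT OR REPLACE`, importing the same file twice leaves one row per object.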

Transformation NeTEx sequence diagram

sequenceDiagram
title PyNeTExConv: Transformation (generalisation)

participant NeTEx XML Source
participant DuckDB Source
participant DuckDB Target

participant NeTEx XML Target

NeTEx XML Source->>DuckDB Source: Parse XML into database<br /><br />execute: netex_to_db.py

DuckDB Source->>DuckDB Target: Apply transformations and<br />introduce new objects:<br /> 1. changes to the timing model<br /> 2. calendars vs AvailabilityConditions<br /> 3. geographic projections<br /><br />execute: *_db_to_db.py

DuckDB Target->>NeTEx XML Target: Query both databases<br />for the objects which the target<br />database has created, or which<br />exist as-is in the source.<br /><br />execute: *_db_to_xml.py
DuckDB Source->>NeTEx XML Target: .
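
The last step of the diagram reads from both databases. One way to picture that lookup is a merge in which objects from the target database take precedence and the source only fills in what the target does not contain. The sketch below uses plain dicts keyed by a hypothetical `(class name, id)` pair. It illustrates the idea and is not the query used by the `*_db_to_xml.py` scripts.

```python
from typing import Dict, Iterator, Tuple

Key = Tuple[str, str]  # hypothetical key: (class name, object id)


def objects_for_export(
    source: Dict[Key, dict], target: Dict[Key, dict]
) -> Iterator[Tuple[Key, dict]]:
    """Yield target objects first, then source objects the target lacks."""
    for key, obj in target.items():
        yield key, obj
    for key, obj in source.items():
        if key not in target:
            yield key, obj


source_db = {
    ("Line", "L1"): {"name": "Line 1"},
    ("Operator", "O1"): {"name": "Operator 1"},
}
target_db = {
    ("Line", "L1"): {"name": "Line 1 (transformed)"},
    ("ServiceCalendar", "C1"): {"name": "Calendar created by the transformation"},
}
exported = dict(objects_for_export(source_db, target_db))
```

Here the transformed line replaces the source line, the new calendar is added, and the operator is taken unchanged from the source.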
