EDIParse is a lightweight and extensible REST API service that parses EDIFACT message text inputs or files (e.g. APERAK, MSCONS, etc.) and returns structured JSON output. Built with Python and Pydantic, the service offers type-safe, high-performance parsing for energy and logistics data exchange formats.
- 📦 Currently supports only the EDIFACT message types MSCONS and APERAK
- 🔄 Converts raw EDIFACT into clean, nested JSON structures
- 🧩 Extensible design based on clean (hexagonal) architecture principles
- 🔥 FastAPI-powered REST endpoints
- 📐 Pydantic v2 for modern data validation and serialization
- ⛓️ Easy integration into existing microservices and ETL pipelines
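
To illustrate what "raw EDIFACT into nested JSON" means, here is a minimal, standalone sketch. It is not the project's parser and does not reflect the shape of its JSON output. It only assumes the default EDIFACT service characters (`'` ends a segment, `+` separates data elements, `:` separates components, `?` is the release character); the sample message and the function names are made up for the example.

```python
import json


def split_escaped(text: str, sep: str, release: str = "?") -> list:
    """Split on `sep`, ignoring separators escaped by the release character.

    Escape pairs are kept intact so that later splits still see them.
    """
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == release and i + 1 < len(text):
            current.append(text[i:i + 2])  # keep the escape pair untouched
            i += 2
        elif ch == sep:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


def unescape(text: str, release: str = "?") -> str:
    """Drop release characters and keep the character each one protects."""
    out, i = [], 0
    while i < len(text):
        if text[i] == release and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def tokenize(message: str) -> list:
    """Turn a raw message into a list of {"tag", "elements"} dictionaries."""
    segments = []
    for raw in split_escaped(message, "'"):
        raw = raw.strip()
        if not raw:
            continue
        tag, *elements = split_escaped(raw, "+")
        segments.append({
            "tag": tag,
            "elements": [
                [unescape(component) for component in split_escaped(element, ":")]
                for element in elements
            ],
        })
    return segments


# Made-up sample, for illustration only
sample = "UNH+1+MSCONS:D:04B:UN:2.4c'BGM+7+MSI5422+9'UNT+3+1'"
print(json.dumps(tokenize(sample), indent=2))
```

The real service goes further than this: it maps segments onto typed Pydantic models and resolves segment groups per message type.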
This document provides guidelines and instructions for developing and maintaining the EDIParse project.
This Python project uses OpenAPI Generator to generate the API endpoints (DTOs and router endpoints) while the core business logic is manually created following a clean (hexagonal) architecture pattern:
- Generator version: 7.14.0-SNAPSHOT (project's latest from current commit)
- Build package: org.openapitools.codegen.languages.PythonFastAPIServerCodegen
NOTE: Before starting the project, generate the missing APIs first (see generate-openapi-endpoints.md); otherwise, the application will not build properly!
- Python Version: This project requires Python 3.9 or higher.
- Virtual Environment Setup:

      # Create a virtual environment named .venv
      python -m venv .venv

      # Activate the virtual environment
      # On Windows:
      # .venv\Scripts\activate
      # On macOS/Linux:
      source .venv/bin/activate

- Dependencies Installation:

  NOTE: Please execute the following from the root directory with the virtual environment activated!

      # Install uv if not already installed
      pip install uv

      # Install dependencies using uv
      uv pip install -e .

      # For development dependencies
      uv pip install -e ".[dev]"

  NOTE: This project uses pyproject.toml for dependency management with the uv package manager.

- Running the Application:

  You can run the application in several ways:

  With uv:

      # Install dependencies and start the application
      uv pip install -e .
      PYTHONPATH=src uv run -m uvicorn ediparse.main:app --host 0.0.0.0 --port 8000

  Without uv (should work in most cases):

      # Install dependencies and start the application
      pip install -e .
      PYTHONPATH=src python -m uvicorn ediparse.main:app --host 0.0.0.0 --port 8000

  Direct execution with uvicorn (should work in most cases):

      # After installing dependencies
      PYTHONPATH=src uvicorn ediparse.main:app --host 0.0.0.0 --port 8000

  Or simply (if you have added the module `ediparse` to your working environment):

      uvicorn ediparse.main:app --host 0.0.0.0 --port 8000

  After starting the application, open your browser at http://0.0.0.0:8000/docs/ (or http://localhost:8000/docs/ if your system does not route 0.0.0.0) to see the API documentation.

To run the server in a Docker container, execute the following from the root directory:
- Building the Docker Image:

      docker build -t ediparse .

  The Dockerfile uses a multi-stage build process:

  - `openapi_generator` stage: Generates API endpoints from the OpenAPI specification
  - `builder` stage: Installs the application and dependencies
  - `test_runner` stage: Runs the tests
  - `service` stage: Creates the final image for deployment

  For detailed information about the Docker build process, including important notes about file copying and known issues, see the Docker Build Process Documentation.

- Running with Docker Compose:

      docker compose up

      # or including the build
      docker compose up --build

  This will start the service on port 8000, which maps to the internal port 8000.

  Performance Optimization:

  The project is configured to use Docker Compose's bake feature for better build performance. This is enabled by the `COMPOSE_BAKE=true` environment variable in the `.env` file. No additional action is required.

- Running all tests with pytest:

      python -m pytest

- Running all tests with unittest (standard library):

      python -m unittest discover -s tests -v

- Running a specific test module:

      python -m pytest tests/ediparse/infrastructure/libs/edifactparser/test_edifact_parser.py

- Notes:
  - The repository includes both pytest- and unittest-style tests. If you encounter environment-specific pytest plugin/dependency issues, use the unittest command above as a fallback.
- Test Structure:
  - Place test files in the `tests` directory
  - Use the naming convention `test_*.py` for test files
  - Use unittest or pytest style tests (the project uses both, unittest is preferred)
- Sample Tests:

      import os
      import unittest
      from pathlib import Path

      from ediparse.infrastructure.libs.edifactparser.parser import EdifactParser


      class TestSimpleEdifactParsers(unittest.TestCase):
          """A simple test case to demonstrate testing in this project."""

          def setUp(self):
              """Set up test fixtures before each test method."""
              self.parser = EdifactParser()
              self.samples_dir = Path(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) / "tests" / "samples"
              self.aperak_sample_file_path_request = self.samples_dir / "aperak-message-example-request.txt"
              self.mscons_sample_file_path_request = self.samples_dir / "mscons-message-example-request.txt"

          def test_parse_aperak_sample_file(self):
              """Test that the parser can parse the sample file."""
              # Read the sample file
              file_path = self.aperak_sample_file_path_request
              with open(file_path, encoding='utf-8') as f:
                  edifact_data = f.read()

              # Parse the data
              parsed_object = self.parser.parse(edifact_data)

              # Verify some basic properties
              self.assertIsNotNone(parsed_object)
              self.assertEqual(parsed_object.unz_nutzdaten_endsegment.datenaustauschzaehler, 1)
              self.assertEqual(len(parsed_object.unh_unt_nachrichten), 1)

          def test_parse_mscons_sample_file(self):
              """Test that the parser can parse the sample file."""
              # Read the sample file
              file_path = self.mscons_sample_file_path_request
              with open(file_path, encoding='utf-8') as f:
                  edifact_data = f.read()

              # Parse the data
              parsed_object = self.parser.parse(edifact_data)

              # Verify some basic properties
              self.assertIsNotNone(parsed_object)
              self.assertEqual(parsed_object.unz_nutzdaten_endsegment.datenaustauschzaehler, 2)
              self.assertEqual(len(parsed_object.unh_unt_nachrichten), 2)

- Test Fixtures:
  - The project uses pytest fixtures defined in `tests/conftest.py`
  - These fixtures provide a FastAPI test client for API testing
- Sample Data:
  - Sample EDIFACT messages are provided in the `tests/samples` directory:
    - MSCONS messages: mscons-message-example-request.txt
    - APERAK messages: aperak-message-example-request.txt
  - Use these for testing or create new ones as needed
- Use Arrange, Act, Verify (AAA) style for tests to improve readability and maintainability.
- The project contains both unittest and pytest tests; unittest is preferred for unit tests.
- When testing segment handlers, initialize the converter via the name-mangled private attribute on the handler instance:

      handler._SegmentHandler__converter = SomeSegmentConverter(syntax_helper)

  This mirrors production behavior where converters are auto-detected but keeps tests deterministic.
- New unit tests were added for GroupStateResolver and GroupStateResolverFactory; follow similar minimal, focused testing patterns.
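
The name-mangling note above can be tried out in isolation. The sketch below uses stand-in classes (`SegmentHandler`, `BgmSegmentHandler`, and `UpperCaseConverter` are hypothetical and far simpler than the project's real classes) to show why the attribute is reached as `_SegmentHandler__converter`, even on a subclass instance: Python mangles `__converter` with the name of the class in which it is written.

```python
class SegmentHandler:
    """Hypothetical stand-in for a handler base class (not the project's code)."""

    def __init__(self):
        # Written as __converter, stored as _SegmentHandler__converter
        self.__converter = None

    def handle(self, line: str) -> str:
        if self.__converter is None:
            raise RuntimeError("converter has not been initialized")
        return self.__converter.convert(line)


class BgmSegmentHandler(SegmentHandler):
    """Hypothetical subclass; the mangled name still uses the base class name."""


class UpperCaseConverter:
    """Hypothetical converter used only to make the example observable."""

    def convert(self, line: str) -> str:
        return line.upper()


handler = BgmSegmentHandler()

# Arrange: inject the converter through the mangled attribute name
handler._SegmentHandler__converter = UpperCaseConverter()

# Act
result = handler.handle("bgm+7")

# Verify
assert result == "BGM+7"
print(result)
```

Setting `handler.__converter` from test code would create a new, unrelated attribute instead, because mangling only applies to names written inside a class body.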
- Code Style:
  - The project uses flake8 for linting
  - Maximum line length is 120 characters
  - Run flake8 to check for style issues:

        flake8 src tests

- Project Structure:

  The project follows a hexagonal (clean) architecture pattern:

      EDIParse                                  # the root folder
      ├── docs                                  # contains documentation related to this project, e.g.: openapi specs, how-to guides, etc.
      ├── scripts                               # helper scripts that support the project
      ├── src                                   # the source code folder
      │   └── ediparse                          # main package folder
      │       ├── adapters                      # adapter folder
      │       │   └── inbound                   # Contains inbound adapters
      │       │       └── rest                  # Inbound REST API implementation
      │       │           ├── apis              # **generated** from openapi-spec (not committed to git)
      │       │           ├── impl              # custom implementation of the adapters, contains controllers, routers, filters, etc.
      │       │           └── models            # **generated** from openapi-spec (not committed to git)
      │       ├── application                   # Contains application services and use cases
      │       │   ├── services                  # Contains service classes that orchestrate the business logic
      │       │   └── usecases                  # Contains use case implementations of inbound ports
      │       ├── domain                        # Contains the domain models and interfaces
      │       │   ├── models                    # Contains domain models of the business logic
      │       │   └── ports                     # Contains domain interfaces (inbound/outbound ports)
      │       │       └── inbound               # Contains inbound ports (interfaces implemented by use cases)
      │       └── infrastructure                # Contains infrastructure components
      │           └── libs                      # Contains library code that can be extracted as separate packages
      │               └── edifactparser         # Parser library for EDIFACT specific messages
      │                   ├── converters        # Contains segment converters
      │                   ├── exceptions        # Contains parser-specific exceptions
      │                   ├── handlers          # Contains segment handlers
      │                   ├── mods              # Contains message-specific implementations
      │                   │   ├── aperak        # Contains APERAK-specific implementations (see specific documentation)
      │                   │   └── mscons        # Contains MSCONS-specific implementations (see specific documentation)
      │                   ├── resolvers         # Contains segment group resolvers
      │                   ├── utils             # Contains utility and helper functions
      │                   └── wrappers          # Contains library model wrappers
      │                       └── segments      # Contains common and global wrappers shared between message types
      └── tests                                 # Contains all tests of the project

  NOTE: The root of the source code is therefore `src/ediparse`.

- Parser Implementation:
  - The parser follows the hexagonal architecture with a clear separation between the application and domain layers
  - The application layer contains the `ParserService`, which uses the `ParseMessageUseCase` to parse EDIFACT messages
  - The `ParseMessageUseCase` implements the `MessageParserPort` interface from the domain layer
  - The actual parsing logic is implemented in the `libs/edifactparser` package
  - The parser uses a factory pattern to handle different segment types and message types
  - Each segment type has its own converter in `libs/edifactparser/converters` and its own handler in `libs/edifactparser/handlers`
  - Each message type has its own resolver in `libs/edifactparser/mods/{message_type}/group_state_resolver.py`
  - The main parser class is:
    - `EdifactParser` in `libs/edifactparser/parser.py` for all EDIFACT message types
  - The parser uses two factory classes:
    - `SegmentHandlerFactory` for creating handlers for different segment types
    - `GroupStateResolverFactory` for creating resolvers for different message types
  - Domain models are located in:
    - `libs/edifactparser/wrappers/segments` for common and global segments
    - `libs/edifactparser/wrappers/aperak/segments` for APERAK-specific segments
    - `libs/edifactparser/wrappers/mscons/segments` for MSCONS-specific segments
  - Library-related exceptions are defined in `libs/edifactparser/exceptions`; use these exceptions for library-specific error handling
  - For a detailed explanation of the parsing process, see MSCONS Parsing Process Documentation and APERAK Parsing Process Documentation
  - For information about how the parser fits into the overall system architecture, see Architecture Documentation
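
As a rough illustration of the factory pattern mentioned above, the following sketch registers one handler per segment tag and creates handlers on demand. Only the name `SegmentHandlerFactory` comes from this document; its methods (`register`, `create`), the handler classes, and the returned dictionaries are assumptions made for the example and will differ from the real implementation in `libs/edifactparser`.

```python
from typing import Callable, Dict, List


class SegmentHandler:
    """Hypothetical base class: turns the data elements of one segment into a dict."""

    def handle(self, elements: List[str]) -> dict:
        raise NotImplementedError


class BgmHandler(SegmentHandler):
    def handle(self, elements: List[str]) -> dict:
        return {"segment": "BGM", "elements": elements}


class UnhHandler(SegmentHandler):
    def handle(self, elements: List[str]) -> dict:
        return {"segment": "UNH", "elements": elements}


class SegmentHandlerFactory:
    """Maps a segment tag to a callable that creates the matching handler."""

    def __init__(self) -> None:
        self._creators: Dict[str, Callable[[], SegmentHandler]] = {}

    def register(self, tag: str, creator: Callable[[], SegmentHandler]) -> None:
        self._creators[tag] = creator

    def create(self, tag: str) -> SegmentHandler:
        try:
            return self._creators[tag]()
        except KeyError:
            # Unknown tags fail loudly instead of being silently skipped
            raise ValueError(f"no handler registered for segment type {tag!r}") from None


factory = SegmentHandlerFactory()
factory.register("BGM", BgmHandler)
factory.register("UNH", UnhHandler)

parsed = factory.create("BGM").handle(["7", "MSI5422", "9"])
print(parsed)
```

With a registry like this, supporting a new segment type means adding one handler class and one `register` call; the parsing loop itself stays unchanged.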
- API Implementation:
  - The API is implemented using FastAPI
  - The OpenAPI specification is in `docs/ediparse.openapi.yaml`
  - The generated API endpoints are defined in `adapters/inbound/rest/apis`
  - The custom implementation of the controllers/routers is in `adapters/inbound/rest/impl`; the business logic lives in the `domain` directory
  - For details on API generation, see section openapi-generated-fastapi-server-components
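
The layering described in the Parser Implementation and API Implementation notes can be sketched as follows. The class names `MessageParserPort`, `ParseMessageUseCase`, and `ParserService` are taken from this document, but every method name, signature, and return type here is an assumption, and `FakeParser` is a hypothetical stand-in for the real `EdifactParser`.

```python
from abc import ABC, abstractmethod


class MessageParserPort(ABC):
    """Inbound port (domain layer): what the outside world may ask for."""

    @abstractmethod
    def parse(self, message: str) -> dict:
        """Parse a raw EDIFACT message into a structured result."""


class ParseMessageUseCase(MessageParserPort):
    """Use case (application layer): implements the port by delegating to a parser."""

    def __init__(self, parser) -> None:
        self._parser = parser

    def parse(self, message: str) -> dict:
        return self._parser.parse(message)


class ParserService:
    """Application service: depends only on the port, not on a concrete parser."""

    def __init__(self, port: MessageParserPort) -> None:
        self._port = port

    def parse_message(self, message: str) -> dict:
        return self._port.parse(message)


class FakeParser:
    """Hypothetical stand-in for the real parser; it only counts segments."""

    def parse(self, message: str) -> dict:
        return {"segment_count": message.count("'")}


service = ParserService(ParseMessageUseCase(FakeParser()))
outcome = service.parse_message("UNH+1+APERAK'BGM+313'UNT+3+1'")
print(outcome)
```

Because the REST adapter would only talk to the service and the port, the parser behind the use case can be replaced or faked in tests without touching the API layer.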
- Docker Deployment:
  - The application is containerized and can be deployed using Docker
  - The Docker image exposes port `8000`
  - Use the provided docker-compose.yaml for simple testing and deployments
- Environment Variables:
  - The application uses environment variables for configuration
  - These can be set in the docker-compose.yaml file or passed to the container
This project follows Semantic Versioning 2.0.0 principles with a specific adaptation for the EDIFACT Parser Restify specification version.
The version format is MAJOR.MINOR.PATCH where:

    1 . 2 . 0
    |   |   |
    1---|---|--> **MAJOR**: Incremented when making incompatible API changes
        2---|--> **MINOR**: Contains the specification version
            0--> **PATCH**: Incremented when making backward compatible bug fixes

The current version can be found in the pyproject.toml file.
There are several important places specifying the version numbers in this project:
- Project Version: Defined in `pyproject.toml`
- API Version: Defined in the project's OpenAPI specification
- Version on OpenCode: Defined in the publiccode.yml
Please ensure those places contain the same version number when a new version is released.
- Increment the MAJOR version when you make incompatible API changes
- Update the MINOR version when the supported EDIFACT message type specification version changes or when new EDIFACT message types are introduced
- Increment the PATCH version when you make backward compatible bug fixes or enhancements
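
Keeping the version identical in all three places is easy to get wrong by hand. The sketch below shows one possible consistency check. The helper names are hypothetical, and the version strings are typed in directly; reading them out of `pyproject.toml`, the OpenAPI specification, and `publiccode.yml` is left out on purpose.

```python
def parse_version(text: str) -> tuple:
    """Split a MAJOR.MINOR.PATCH string into a tuple of three integers."""
    parts = text.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {text!r}")
    major, minor, patch = (int(part) for part in parts)
    return major, minor, patch


def out_of_sync(declared: dict) -> list:
    """Return the names of all places whose version differs from the first entry."""
    parsed = {name: parse_version(value) for name, value in declared.items()}
    if not parsed:
        return []
    reference = next(iter(parsed.values()))
    return [name for name, version in parsed.items() if version != reference]


# Example values only; in practice these would be read from the three files
declared_versions = {
    "pyproject.toml": "1.2.0",
    "OpenAPI specification": "1.2.0",
    "publiccode.yml": "1.1.0",
}

mismatches = out_of_sync(declared_versions)
print(mismatches)
```

A check like this could run in CI before a release so that a forgotten file is caught early.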
The project includes comprehensive documentation:
- Architecture Documentation: Provides an overview of the system architecture, including the hexagonal architecture pattern, building blocks, and architectural decisions.
- MSCONS Parsing Process Documentation: Describes the business logic for parsing EDIFACT MSCONS messages, including the parser architecture, segment types, handlers, converters, and how to extend the parser.
- APERAK Parsing Process Documentation: Describes the business logic for parsing EDIFACT APERAK messages, including the parser architecture, segment types, handlers, converters, and how to extend the parser.
The project also includes additional utilities:
- API Generation Documentation: Explains how to generate the API endpoints from the OpenAPI specification.
This project is licensed under the Apache-2.0 license.
We welcome contributions to improve EDIParse! There are several ways you can contribute to the project:
- Report bugs: If you find a bug, please open an issue with a detailed description.
- Suggest features: Do you have ideas for new features? Share them by creating an issue.
- Contribute code: Pull requests for bug fixes or new features are always welcome.
For detailed information on how to contribute, please refer to our CONTRIBUTING.md file:
- The CONTRIBUTING.md file contains comprehensive guidelines and instructions for anyone who wants to contribute to the project and participate in the development process.
Please refer to project documentation and scripts in the docs and scripts directories for more details.

