Trunk Data Platform (TDP) is a free, open source Hadoop distribution built from the source code of Apache projects.
This distribution is built by EDF (a French electricity provider) and DGFIP (the tax office of the French Ministry of Finance), through an association called TOSIT (The Open Source I Trust).
In order to build the TDP components, two distinct images have been made:
- tdp-builder containing Maven for Java compilation of the Apache Hadoop environment components.
- tdp-builder-python containing a manylinux2014 image with different Python versions for the packaging of JupyterHub, Jupyterlab, Sparkmagic and Hue.
Run the following script to build the image, open the container and be ready to compile the components with Maven:
./bin/start-build-env.sh
The components' versions and repositories are listed in tdp-core. The components must be compiled in the following order:
- Zookeeper
- Hadoop
- Tez
- Spark3
- Hive
- HBase
- Ranger
- Phoenix
- Phoenix-queryserver
- Knox
- HBase Operator tools
- Iceberg
The Maven compilation commands of the different components can be found in the tdp/README.md file of each project.
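As a rough illustration of the ordering constraint, the sketch below keeps the build order in a Bash array and prints a numbered build plan. It deliberately does not run Maven: the real compilation command for each component is the one in that project's tdp/README.md. The names `TDP_BUILD_ORDER` and `print_build_plan` are hypothetical and exist only for this example.

```shell
#!/usr/bin/env bash
# Sketch only: records the TDP build order and prints it as a numbered plan.
# TDP_BUILD_ORDER and print_build_plan are hypothetical names for illustration.
set -euo pipefail

# Order copied from the list above; each entry depends on the ones before it.
TDP_BUILD_ORDER=(
  "Zookeeper"
  "Hadoop"
  "Tez"
  "Spark3"
  "Hive"
  "HBase"
  "Ranger"
  "Phoenix"
  "Phoenix-queryserver"
  "Knox"
  "HBase Operator tools"
  "Iceberg"
)

# Print one numbered line per component, in build order.
# The Maven command itself is intentionally not reproduced here;
# look it up in the component's tdp/README.md.
print_build_plan() {
  local i=1
  local component
  for component in "${TDP_BUILD_ORDER[@]}"; do
    printf '%2d. %s\n' "$i" "$component"
    i=$((i + 1))
  done
}

print_build_plan
```

Keeping the order in a single array makes it easy to check, before starting a long build, that no component is compiled ahead of one it depends on.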
Although the Python components use the same tdp-builder-python image, they must be packaged separately in different containers, since each component needs its own environment.
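The one-container-per-component idea can be sketched as a dry run that only prints a `docker run` command for each Python component and executes nothing. This is an assumption-laden illustration: the local image tag `tdp-builder-python`, the `tdp-build-<name>` container naming scheme, and the helper `container_name_for` are hypothetical, and the actual packaging steps are those documented for each component.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints one `docker run` command per Python component.
# Assumptions: the image is tagged "tdp-builder-python" locally and the
# container names follow a made-up "tdp-build-<name>" scheme.

# Python components named earlier in this README.
PYTHON_COMPONENTS=("JupyterHub" "Jupyterlab" "Sparkmagic" "Hue")

# Derive a hypothetical container name by lowercasing the component name.
container_name_for() {
  printf 'tdp-build-%s' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

# Print (do not execute) a separate container invocation for each component,
# so that every component gets its own isolated environment.
for component in "${PYTHON_COMPONENTS[@]}"; do
  echo "docker run --rm -it --name $(container_name_for "$component") tdp-builder-python"
done
```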
Apache Incubator Livy has its own compilation environment and its own instructions, which can be found in the Incubator Livy project.
Contributions are always welcome!
See CONTRIBUTING.md for ways to get started.
