TOSIT-IO/TDP

Trunk Data Platform

Trunk Data Platform (TDP) is a free, open-source Hadoop distribution built from the source code of Apache projects.

Authors

This distribution is built by EDF (the French electricity provider) and DGFIP (the French tax administration, part of the Ministry of Finance) through an association called TOSIT (The Open source I Trust).

Local build environment for components

Two distinct images are provided to build the TDP components:

  • tdp-builder: contains Maven, used to compile the Java components of the Apache Hadoop ecosystem.
  • tdp-builder-python: based on a manylinux2014 image with several Python versions, used to package JupyterHub, JupyterLab, Sparkmagic and Hue.

tdp-builder

Run the following script to build the image and start the container, ready to compile the components with Maven:

./bin/start-build-env.sh

The component versions and their repositories are listed in tdp-core. The components must be compiled in the following order:

  • ZooKeeper
  • Hadoop
  • Tez
  • Spark3
  • Hive
  • HBase
  • Ranger
  • Phoenix
  • Phoenix-queryserver
  • Knox
  • HBase Operator Tools
  • Iceberg
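Because the order is fixed, the build can be scripted inside the tdp-builder container. The fixed order suggests that later components resolve artifacts produced by earlier ones. The sketch below is a hypothetical helper, not part of the repository. It assumes each project is checked out in a directory named after the component under one workspace, and it takes the build command as a parameter, because the real Maven commands are documented per project. The generic "mvn clean install -DskipTests" is only a placeholder: Maven's install phase copies each build's artifacts into the local repository, where later builds can resolve them.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Build order as documented in this README.
BUILD_ORDER = [
    "zookeeper", "hadoop", "tez", "spark3", "hive", "hbase", "ranger",
    "phoenix", "phoenix-queryserver", "knox", "hbase-operator-tools", "iceberg",
]


def build_all(workspace: Path, build_cmd: list[str], dry_run: bool = True) -> list[str]:
    """Run build_cmd in each component checkout, following BUILD_ORDER.

    Stops at the first failing build (check=True raises CalledProcessError),
    so a later component is never built after an upstream failure.
    Returns the names of the components that were processed.
    """
    processed = []
    for name in BUILD_ORDER:
        src = workspace / name  # hypothetical layout: <workspace>/<component>
        if dry_run:
            print(f"[dry-run] (cd {src} && {' '.join(build_cmd)})")
        else:
            subprocess.run(build_cmd, cwd=src, check=True)
        processed.append(name)
    return processed


if __name__ == "__main__":
    # Placeholder command; use the commands from each project's tdp/README.md.
    build_all(Path("~/tdp-workspace").expanduser(), ["mvn", "clean", "install", "-DskipTests"])
```

With dry_run left at True the helper only prints the commands, which is a cheap way to check the checkout layout before starting a long build.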

The Maven commands used to compile each component are documented in the tdp/README.md file of the corresponding project.

tdp-builder-python

Although the Python components all use the same tdp-builder-python image, each one must be packaged separately in its own container, since each component needs its own environment.
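A minimal sketch of that isolation, assuming Docker and a locally built image tagged tdp-builder-python: each "docker run --rm" starts a fresh container from the same image, so whatever one component's packaging installs is discarded before the next one starts. The component directory names, the workspace layout and the packaging command run inside the container are hypothetical placeholders, not the project's actual packaging steps.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path

IMAGE = "tdp-builder-python"  # assumed tag of the locally built image
# Hypothetical checkout directory names for the Python-based components.
COMPONENTS = ["jupyterhub", "jupyterlab", "sparkmagic", "hue"]


def docker_command(component: str, workspace: Path, package_cmd: str) -> list[str]:
    """Build a docker run invocation that packages one component in a throwaway container."""
    src = (workspace / component).resolve()  # docker -v needs an absolute host path
    return [
        "docker", "run", "--rm",      # --rm: the container is deleted when it exits
        "-v", f"{src}:/work",         # mount only this component's sources
        "-w", "/work",                # run the packaging command from the mount
        IMAGE,
        "bash", "-c", package_cmd,    # assumes bash is available in the image
    ]


def package_all(workspace: Path, package_cmd: str, dry_run: bool = True) -> list[list[str]]:
    """Package every component, one fresh container each; stop on the first failure."""
    commands = []
    for component in COMPONENTS:
        cmd = docker_command(component, workspace, package_cmd)
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # "./package.sh" is a hypothetical per-component packaging script.
    package_all(Path("~/tdp-workspace").expanduser(), "./package.sh")
```

Reusing one image while discarding each container keeps the toolchain identical across components without letting their Python dependencies interfere with each other.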

Special case for the Apache Livy compilation

Apache Incubator Livy has its own build environment and instructions, which can be found in the Incubator Livy project.

Contributing

Contributions are always welcome!

See CONTRIBUTING.md for ways to get started.

About

Main TDP repository
