diff --git a/docs/how/add-custom-ingestion-source.md b/docs/how/add-custom-ingestion-source.md
index 3678a54b2c349c..20b008451d51f0 100644
--- a/docs/how/add-custom-ingestion-source.md
+++ b/docs/how/add-custom-ingestion-source.md
@@ -22,7 +22,13 @@ the [metadata-ingestion source guide](../../metadata-ingestion/adding-source.md)
 To be able to use this source you just need to do a few things.
 
 1. Build a python package out of your project including the custom source class.
-2. Install this package in your working environment where you are using the Datahub CLI to ingest metadata.
+2. Publish the package to your internal PyPI server using a tool like `twine`.
+3. Configure `pip` to use your internal PyPI server by modifying `pip.conf` (on Linux) or `pip.ini` (on Windows) to include your internal PyPI server URL:
+   ```ini
+   [global]
+   index-url = https://your-internal-pypi-server/simple
+   ```
+4. Install this package in your working environment where you are using the Datahub CLI to ingest metadata.
 
 Now you are able to just reference your ingestion source class as a type in the YAML recipe by using the fully qualified package name.
 For example if your project structure looks like this `/src/my-source/custom_ingestion_source.py`
@@ -35,6 +41,19 @@ source:
     # place for your custom config defined in the configModel
 ```
 
+To install the custom source package from your internal PyPI server, use the following `pip` command:
+
+```bash
+pip install my-source
+```
+
+```yaml
+source:
+  type: my-source.custom_ingestion_source.MySourceClass
+  config:
+    # place for your custom config defined in the configModel
+```
+
 If you now execute the ingestion the datahub client will pick up your code and call the `get_workunits` method and do the rest for you. That's it.
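The recipe snippets in the diff above reference the custom class through a fully qualified `package.module.ClassName` string in the `type` field. As a rough illustration of the mechanism (this is not DataHub's actual registry code, and `resolve_source_class` is a hypothetical helper), such a string can be resolved to a class with `importlib`:

```python
import importlib

def resolve_source_class(type_string: str):
    """Resolve a fully qualified 'package.module.ClassName' string to a
    class object, mirroring in spirit how a custom source `type` from a
    recipe can be looked up in the installed environment at runtime."""
    module_path, _, class_name = type_string.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)

# Demonstrated with a stdlib class; a real recipe would use something like
# "custom_ingestion_source.MySourceClass" from your installed package.
cls = resolve_source_class("collections.OrderedDict")
print(cls.__name__)  # → OrderedDict
```

This also shows why the package must be installed in the same environment as the DataHub CLI: the import only succeeds if the module is on the CLI's Python path.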
diff --git a/docs/modeling/extending-the-metadata-model.md b/docs/modeling/extending-the-metadata-model.md
index 8b308fb65d243c..d750896e5526c0 100644
--- a/docs/modeling/extending-the-metadata-model.md
+++ b/docs/modeling/extending-the-metadata-model.md
@@ -300,6 +300,42 @@ You'll also be able to import those models, with IDE support, by changing your i
 
+### Publishing Custom Sources to Internal PyPI
+
+To build, publish, and use custom ingestion sources with DataHub, follow these steps:
+
+1. **Create a Custom Source**: Develop your custom source as a Python package. Ensure you have a `setup.py` file to define the package.
+
+2. **Publish to Internal PyPI**: Use `twine` to publish your package to an internal PyPI server. Ensure your server is configured to host Python packages.
+
+3. **Configure pip to Use Internal PyPI**: Modify `pip.conf` (Linux) or `pip.ini` (Windows) to point to your internal PyPI server:
+
+   ```ini
+   [global]
+   index-url = https://your-internal-pypi-server/simple
+   ```
+
+4. **Install the Custom Source**: Use `pip` to install your package from the internal PyPI server:
+
+   ```bash
+   pip install your-custom-source-package
+   ```
+
+5. **Reference in DataHub Ingestion Recipe**: Use the fully qualified class name of your custom source in the ingestion recipe:
+
+   ```yaml
+   source:
+     type: your_custom_source_package.your_module.YourCustomSourceClass
+     config:
+       # Your custom source configuration
+   sink:
+     type: datahub-rest
+     config:
+       server: "http://localhost:8080"
+   ```
+
+These steps will help you extend DataHub's capabilities with custom sources and integrate them within your infrastructure.
+
 ### (Optional) Step 8: Extend the DataHub frontend to view your entity in GraphQL & React
 
 If you are extending an entity with additional aspects, and you can use the auto-render specifications to automatically render these aspects to your satisfaction, you do not need to write any custom code.
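Step 1 of the section added in the diff above assumes a `setup.py` for the custom source package. A minimal sketch might look like the following; the package name, version, and `src/` layout are placeholders, and `acryl-datahub` is the PyPI distribution that provides the DataHub ingestion framework:

```python
# setup.py -- minimal packaging sketch; all names here are placeholders.
from setuptools import find_packages, setup

setup(
    name="your-custom-source-package",
    version="0.1.0",
    packages=find_packages(where="src"),
    package_dir={"": "src"},
    # Pulls in the Source base class whose get_workunits method the
    # ingestion client calls when the recipe references your class.
    install_requires=["acryl-datahub"],
)
```

From there, steps 2-4 amount to building a distribution (e.g. with `python -m build`) and uploading it with `twine upload --repository-url https://your-internal-pypi-server/ dist/*` before running `pip install` on the ingestion host.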