21 changes: 20 additions & 1 deletion docs/how/add-custom-ingestion-source.md
@@ -22,7 +22,13 @@ the [metadata-ingestion source guide](../../metadata-ingestion/adding-source.md)
To use this source, you need to do a few things.

1. Build a Python package out of your project, including the custom source class.
2. Publish the package to your internal PyPI server using a tool like `twine`.
3. Configure `pip` to use your internal PyPI server by adding its URL to `pip.conf` (Linux and macOS) or `pip.ini` (Windows). Note that `index-url` replaces the default index, so use `extra-index-url` instead if the environment still needs packages from the public PyPI:
   ```ini
   [global]
   index-url = https://your-internal-pypi-server/simple
   ```
4. Install the package in the working environment where you run the DataHub CLI to ingest metadata.

You can now reference your ingestion source class as the `type` in the YAML recipe by using its fully qualified
class name. For example, if your project structure looks like this `<project>/src/my-source/custom_ingestion_source.py`
@@ -35,6 +41,19 @@ source:
# place for your custom config defined in the configModel
```

To install the custom source package from your internal PyPI server, use the following `pip` command:

```bash
pip install my-source
```
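
If changing the global `pip` configuration is not desirable, the index can also be passed for a single install. The following is a minimal sketch, assuming the placeholder package name `my-source` and the placeholder server URL used above. `--index-url` is a standard `pip` option; the helper function names are hypothetical.

```python
import subprocess
import sys
from typing import List


def pip_install_command(package: str, index_url: str) -> List[str]:
    """Build a pip command that installs `package` using `index_url` as the index."""
    # Using `sys.executable -m pip` targets the interpreter that runs this script.
    return [sys.executable, "-m", "pip", "install", "--index-url", index_url, package]


def install_from_internal_index(package: str, index_url: str) -> None:
    """Run the install; raises subprocess.CalledProcessError if pip fails."""
    subprocess.run(pip_install_command(package, index_url), check=True)


# Placeholder values: replace with your real package name and server URL.
command = pip_install_command("my-source", "https://your-internal-pypi-server/simple")
print(" ".join(command))
```

Calling `install_from_internal_index(...)` with the same arguments would perform the actual installation in the current environment.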

```yaml
source:
  type: my-source.custom_ingestion_source.MySourceClass
  config:
    # place for your custom config defined in the configModel
```

If you now run the ingestion, the DataHub client will pick up your code, call the `get_workunits` method, and do
the rest for you. That's it.
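
To illustrate what referencing a class by its fully qualified name means, here is a minimal sketch of resolving a dotted path to a class with `importlib`. This is an assumption about the general mechanism, not DataHub's actual loader, and `resolve_source_class` is a hypothetical helper. It is demonstrated on a standard-library class so that it runs without DataHub installed.

```python
import importlib


def resolve_source_class(type_path: str) -> type:
    """Import and return the class named by a dotted path like 'pkg.module.ClassName'."""
    # Split off the final component: everything before it is the module path.
    module_path, _, class_name = type_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected 'package.module.ClassName', got {type_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Stand-in for a custom source class; any importable class works the same way.
resolved = resolve_source_class("collections.OrderedDict")
print(resolved.__name__)
```

A quick check like this in the target environment can confirm that the installed package is importable before running an ingestion.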

36 changes: 36 additions & 0 deletions docs/modeling/extending-the-metadata-model.md
@@ -300,6 +300,42 @@ You'll also be able to import those models, with IDE support, by changing your i
</TabItem>
</Tabs>

### Publishing Custom Sources to Internal PyPI

To build, publish, and use custom ingestion sources with DataHub, follow these steps:

1. **Create a Custom Source**: Develop your custom source as a Python package. Ensure you have a `setup.py` or `pyproject.toml` file that defines the package.

2. **Publish to Internal PyPI**: Use `twine` to publish your package to an internal PyPI server. Ensure your server is configured to host Python packages.

3. **Configure pip to Use Internal PyPI**: Modify `pip.conf` (Linux and macOS) or `pip.ini` (Windows) to point to your internal PyPI server:

   ```ini
   [global]
   index-url = https://your-internal-pypi-server/simple
   ```

4. **Install the Custom Source**: Use `pip` to install your package from the internal PyPI server:

   ```bash
   pip install your-custom-source-package
   ```

5. **Reference in DataHub Ingestion Recipe**: Use the fully qualified class name of your custom source in the ingestion recipe:

   ```yaml
   source:
     type: your_custom_source_package.your_module.YourCustomSourceClass
     config:
       # Your custom source configuration
   sink:
     type: datahub-rest
     config:
       server: "http://localhost:8080"
   ```

After these steps, the DataHub CLI can load your custom source in any environment that installs packages from your internal PyPI server.
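
The recipe from the last step can also be built and run from Python. The following is a minimal sketch, assuming the `acryl-datahub` package is installed wherever `run_recipe` is called. `build_recipe` and `run_recipe` are hypothetical helper names, and the source type and server URL are the placeholders from the recipe above.

```python
from typing import Any, Dict


def build_recipe(source_type: str, source_config: Dict[str, Any], server: str) -> Dict[str, Any]:
    """Return a recipe dictionary mirroring the YAML recipe structure."""
    return {
        "source": {"type": source_type, "config": source_config},
        "sink": {"type": "datahub-rest", "config": {"server": server}},
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    """Run the recipe with DataHub's Pipeline API (requires acryl-datahub)."""
    # Imported lazily so that building a recipe does not require DataHub.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()


# Placeholder values: replace with your real class path, config, and server.
recipe = build_recipe(
    "your_custom_source_package.your_module.YourCustomSourceClass",
    {},
    "http://localhost:8080",
)
print(recipe["source"]["type"])
```

Calling `run_recipe(recipe)` would start the ingestion against the configured server, so it is left out of the sketch above.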

### <a name="step_8"></a>(Optional) Step 8: Extend the DataHub frontend to view your entity in GraphQL & React

If you are extending an entity with additional aspects, and you can use the auto-render specifications to automatically render these aspects to your satisfaction, you do not need to write any custom code.