77 changes: 77 additions & 0 deletions docs/api/datahub-apis.md
@@ -29,11 +29,88 @@ Learn more about the SDKs:

The `graphql` API serves as the primary public API for the platform. It can be used to fetch and update metadata programmatically in the language of your choice. It is intended as a higher-level API that simplifies the most common operations.

### Introduction to GraphQL in DataHub
GraphQL in DataHub is used to interact with metadata in a flexible and efficient manner. It allows users to specify exactly what data they need, reducing the amount of data transferred over the network and improving performance.

We recommend using the GraphQL API if you're getting started with DataHub, since it's more user-friendly and straightforward. Here are some examples of what you can do with the GraphQL API:

- Search for datasets with conditions
- Update a certain field of a dataset

### Example Queries
To retrieve a list of URNs for entities, you can use the following GraphQL query:

```graphql
query listEntities {
  search(input: {type: DATASET, query: "*", start: 0, count: 100}) {
    start
    count
    total
    searchResults {
      entity {
        urn
      }
    }
  }
}
```

This query will return the URNs for the first 100 entities of type `DATASET`. You can adjust the `count` parameter to retrieve more or fewer entities as needed.

To execute this query using a `curl` command:

```bash
curl --location --request POST 'https://demo.datahubproject.io/api/graphql' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": "query listEntities { search(input: {type: DATASET, query: \"*\", start: 0, count: 100}) { start count total searchResults { entity { urn } } } }",
"variables": {}
}'
```

### Parameters Explanation
- **type**: Specifies the type of entity to search for, e.g., `DATASET`.
- **query**: The search query string; `*` matches all entities.
- **start**: The starting index for the search results.
- **count**: The number of results to return.
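
The `start` and `count` parameters can be combined to walk through a result set page by page. The following is a minimal sketch of that loop, not an official client: `iter_urns` and `fake_fetch_page` are hypothetical names, the URNs are placeholders, and the HTTP call is replaced by an in-memory stand-in that returns the `search` object of a response.

```python
from typing import Callable, Iterator

def iter_urns(fetch_page: Callable[[int, int], dict], page_size: int = 100) -> Iterator[str]:
    """Yield URNs page by page until `total` results have been read."""
    start = 0
    while True:
        # `fetch_page` returns the `search` object of one response.
        page = fetch_page(start, page_size)
        results = page.get("searchResults", [])
        for result in results:
            yield result["entity"]["urn"]
        start += len(results)
        # Stop when the server has nothing more, or when `total` is reached.
        if not results or start >= page.get("total", 0):
            break

# Hypothetical in-memory stand-in for the HTTP call (placeholder URNs).
FAKE_URNS = [f"urn:li:dataset:example-{i}" for i in range(5)]

def fake_fetch_page(start: int, count: int) -> dict:
    chunk = FAKE_URNS[start:start + count]
    return {
        "start": start,
        "count": len(chunk),
        "total": len(FAKE_URNS),
        "searchResults": [{"entity": {"urn": urn}} for urn in chunk],
    }

urns = list(iter_urns(fake_fetch_page, page_size=2))
print(len(urns))
```

In real use, `fetch_page` would send the query with the given `start` and `count` and return `response.json()["data"]["search"]`.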

### Execution Methods
In addition to using `curl`, you can execute GraphQL queries in Python using the `requests` library. Here is an example:

```python
import json
import os

import requests

# Read the access token from the environment rather than hard-coding it.
datahub_token = os.getenv("DATAHUB_TOKEN")
graphql_url = "https://demo.datahubproject.io/api/graphql"

# Search for the first 100 datasets and return only their URNs.
query = """
query listEntities {
  search(input: {type: DATASET, query: "*", start: 0, count: 100}) {
    start
    count
    total
    searchResults {
      entity {
        urn
      }
    }
  }
}
"""

headers = {
    "Authorization": f"Bearer {datahub_token}",
    "Content-Type": "application/json",
}

# POST the query as a JSON body; `json=` serializes the payload for us.
response = requests.post(graphql_url, headers=headers, json={"query": query}, timeout=30)
response.raise_for_status()  # raise on HTTP 4xx/5xx responses

data = response.json()
print(json.dumps(data, indent=2))
```
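
Once the response is decoded, the URNs sit under `data.search.searchResults[].entity.urn`, mirroring the shape of the query. A small sketch of pulling them out follows. `extract_urns` is a hypothetical helper and the sample payload is made up for illustration. It also assumes the common GraphQL convention that query-level failures are reported in a top-level `errors` list, which an HTTP status check alone may not catch.

```python
from typing import List

def extract_urns(payload: dict) -> List[str]:
    """Return the URNs from a decoded `search` response, raising on GraphQL errors."""
    if payload.get("errors"):
        messages = "; ".join(e.get("message", "unknown error") for e in payload["errors"])
        raise RuntimeError(f"GraphQL query failed: {messages}")
    search = (payload.get("data") or {}).get("search") or {}
    return [result["entity"]["urn"] for result in search.get("searchResults", [])]

# Made-up sample payload in the shape the query above asks for.
sample = {
    "data": {
        "search": {
            "start": 0,
            "count": 2,
            "total": 2,
            "searchResults": [
                {"entity": {"urn": "urn:li:dataset:example-a"}},
                {"entity": {"urn": "urn:li:dataset:example-b"}},
            ],
        }
    }
}

print(extract_urns(sample))
```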

Learn more about the GraphQL API:

- **[GraphQL API →](docs/api/graphql/getting-started.md)**
47 changes: 47 additions & 0 deletions docs/api/graphql/getting-started.md
@@ -1,5 +1,9 @@
# Getting Started With GraphQL

## Introduction to GraphQL in DataHub

GraphQL in DataHub lets users query and modify the metadata graph. Because each request names exactly the fields it needs, it is well suited to managing and exploring metadata.

## Reading an Entity: Queries

DataHub provides the following `graphql` queries for retrieving entities in your Metadata Graph.
@@ -65,6 +69,49 @@ The search term can be a simple string, or it can be a more complex query using
- `*[string]*` : Search for all entities that **match** aspects named \[string\].
- `[string]` : Search for all entities that **contain** the specified \[string\].

#### Example Queries for URN Retrieval

To retrieve a list of URNs, you can use the following GraphQL query:

```graphql
query listEntities {
  search(input: {type: DATASET, query: "*", start: 0, count: 100}) {
    start
    count
    total
    searchResults {
      entity {
        urn
      }
    }
  }
}
```

This query will return the URNs for the first 100 entities of type `DATASET`. You can adjust the `count` parameter to retrieve more or fewer entities as needed.

To execute this query using a `curl` command:

```bash
curl --location --request POST 'https://demo.datahubproject.io/api/graphql' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": "query listEntities { search(input: {type: DATASET, query: \"*\", start: 0, count: 100}) { start count total searchResults { entity { urn } } } }",
"variables": {}
}'
```

#### Parameters Explanation

- `type`: Specifies the type of entity to search for, e.g., `DATASET`.
- `query`: The search term, where `*` can be used to match all entities.
- `start`: The starting index for the search results.
- `count`: The number of results to return.
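
Rather than editing these values inside the query text, they can be passed as GraphQL variables so the query string stays constant. The sketch below builds such a request body. It assumes the `search` field's input type is named `SearchInput` (check the schema reference for your version), and `build_search_payload` is a hypothetical helper.

```python
import json

# Assumption: the input type of `search` is called `SearchInput`.
SEARCH_QUERY = """
query listEntities($input: SearchInput!) {
  search(input: $input) {
    start
    count
    total
    searchResults {
      entity {
        urn
      }
    }
  }
}
"""

def build_search_payload(entity_type: str = "DATASET", query: str = "*",
                         start: int = 0, count: int = 100) -> dict:
    """Return a request body with the four search parameters as variables."""
    return {
        "query": SEARCH_QUERY,
        "variables": {
            "input": {"type": entity_type, "query": query, "start": start, "count": count}
        },
    }

# The serialized body can be sent as the POST payload in place of an inline query.
body = json.dumps(build_search_payload(count=10))
print(body)
```

Enum values such as `DATASET` are written as plain strings when sent through variables.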

#### Execution Methods

In addition to using `curl`, you can execute GraphQL queries in DataHub using various programming languages and tools that support HTTP requests, such as Python with the `requests` library.
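
As one illustration of using any tool that can send HTTP requests, the sketch below prepares the same request with only Python's standard library (`urllib.request`). The helper names `build_request` and `send` are hypothetical, and whether a token is required depends on your deployment. Only the request construction runs here; `send` is defined but not called.

```python
import json
import urllib.request
from typing import Optional

GRAPHQL_URL = "https://demo.datahubproject.io/api/graphql"
QUERY = (
    'query listEntities { search(input: {type: DATASET, query: "*", start: 0, count: 100}) '
    "{ start count total searchResults { entity { urn } } } }"
)

def build_request(url: str, query: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a GraphQL query."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"query": query, "variables": {}}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (performs network I/O)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)

request = build_request(GRAPHQL_URL, QUERY)
print(request.get_method(), request.full_url)
```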

:::note
Note that by default Elasticsearch only allows pagination through 10,000 entities via the search API.
If you need to paginate through more, you can change the default value for the `index.max_result_window` setting in Elasticsearch, or use the scroll API to read from the index directly.
48 changes: 48 additions & 0 deletions docs/api/graphql/how-to-set-up-graphql.md
@@ -8,6 +8,54 @@ For more information, please refer to [Datahub Quickstart Guide](/docs/quickstar
## Querying the GraphQL API

DataHub's GraphQL endpoint is served at the path `/api/graphql`, e.g. `https://my-company.datahub.com/api/graphql`.
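
In a script, the endpoint can be derived from the instance's base URL. This is a small sketch under that assumption; `graphql_endpoint` is a hypothetical helper, and the host is the example one from the sentence above.

```python
def graphql_endpoint(base_url: str) -> str:
    """Append the GraphQL path to a base URL, tolerating a trailing slash."""
    return base_url.rstrip("/") + "/api/graphql"

print(graphql_endpoint("https://my-company.datahub.com/"))
```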

### Introduction to GraphQL in DataHub

GraphQL in DataHub allows for flexible and efficient data retrieval by enabling clients to specify exactly what data they need. This reduces the amount of data transferred over the network and allows for more efficient data fetching.

### Example Queries

To retrieve a list of URNs for entities in DataHub, you can use the following GraphQL query:

```graphql
query listEntities {
  search(input: {type: DATASET, query: "*", start: 0, count: 100}) {
    start
    count
    total
    searchResults {
      entity {
        urn
      }
    }
  }
}
```

This query will return the URNs for the first 100 entities of type `DATASET`. You can adjust the `count` parameter to retrieve more or fewer entities as needed.

To execute this query using a `curl` command, you can use:

```bash
curl --location --request POST 'https://demo.datahubproject.io/api/graphql' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": "query listEntities { search(input: {type: DATASET, query: \"*\", start: 0, count: 100}) { start count total searchResults { entity { urn } } } }",
"variables": {}
}'
```

### Parameters Explanation

- **type**: Specifies the type of entity to search for, e.g., `DATASET`.
- **query**: The search query string, where `"*"` is used to match all entities.
- **start**: The starting index for the search results.
- **count**: The number of results to return.

### Execution Methods

In addition to using `curl`, you can execute GraphQL queries using various methods such as Postman, GraphQL Explorer (GraphiQL), or programmatically using a GraphQL client in your preferred programming language.

There are a few options when it comes to querying the GraphQL endpoint.

For **Testing**:
47 changes: 47 additions & 0 deletions docs/api/graphql/overview.md
@@ -28,6 +28,10 @@ For detailed guidance on using `graphql` for specific use cases, please refer to
For these reasons among others DataHub provides a GraphQL API on top of the Metadata Graph,
permitting easy exploration of the Entities & Relationships composing it.

### Introduction to GraphQL in DataHub

GraphQL in DataHub is used to interact with the Metadata Graph, allowing users to query and manipulate metadata entities and relationships efficiently. It provides a flexible and efficient way to retrieve only the data you need, reducing the number of API calls required.

For more information about the GraphQL specification, check out [Introduction to GraphQL](https://graphql.org/learn/).

## GraphQL Schema Reference
@@ -36,5 +40,48 @@ The Reference docs in the sidebar are generated from the DataHub GraphQL schema.
validated against this schema. You can use these docs to understand data that is available for retrieval and operations
that may be performed using the API.

### Example Queries for URN Retrieval

To retrieve a list of URNs in DataHub, you can use the following GraphQL query:

```graphql
query listEntities {
  search(input: {type: DATASET, query: "*", start: 0, count: 100}) {
    start
    count
    total
    searchResults {
      entity {
        urn
      }
    }
  }
}
```

This query retrieves the URNs for the first 100 entities of type `DATASET`. You can adjust the `count` parameter to retrieve more or fewer entities as needed.

To execute this query using `curl`, use the following command:

```bash
curl --location --request POST 'https://demo.datahubproject.io/api/graphql' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": "query listEntities { search(input: {type: DATASET, query: \"*\", start: 0, count: 100}) { start count total searchResults { entity { urn } } } }",
"variables": {}
}'
```

#### Parameters Explanation

- **type**: Specifies the type of entity to search for, e.g., `DATASET`.
- **query**: The search query string; `*` matches all entities.
- **start**: The starting index for the search results.
- **count**: The number of results to return.

#### Execution Methods

In addition to using `curl`, you can execute GraphQL queries in DataHub using various programming languages and tools that support HTTP requests, such as Python with the `requests` library.

- Available Operations: [Queries](/graphql/queries.md) (Reads) & [Mutations](/graphql/mutations.md) (Writes)
- Schema Types: [Objects](/graphql/objects.md), [Input Objects](/graphql/inputObjects.md), [Interfaces](/graphql/interfaces.md), [Unions](/graphql/unions.md), [Enums](/graphql/enums.md), [Scalars](/graphql/scalars.md)
92 changes: 92 additions & 0 deletions docs/managed-datahub/datahub-api/graphql-api/getting-started.md
@@ -44,3 +44,95 @@ The entire GraphQL API can be explored & [introspected](https://graphql.org/lear
### Querying the API

Currently, we do not offer language-specific SDKs for accessing the DataHub GraphQL API. For querying the API, you can make use of a variety of per-language client libraries. For a full list, see [GraphQL Code Libraries, Tools, & Services](https://graphql.org/code/).

# GraphQL Query Examples for URN Retrieval

## Introduction to GraphQL in DataHub
GraphQL in DataHub allows for flexible and efficient data retrieval, enabling users to specify exactly what data they need. This is particularly useful for retrieving the Uniform Resource Names (URNs) of entities within DataHub.

## Example Queries
To list the URNs of dataset entities in DataHub, you can use the following GraphQL query:

```graphql
query listEntities {
  search(input: {type: DATASET, query: "*", start: 0, count: 100}) {
    start
    count
    total
    searchResults {
      entity {
        urn
      }
    }
  }
}
```

This query retrieves the URNs for the first 100 entities of type `DATASET`. You can adjust the `count` parameter to retrieve more or fewer entities as needed.

### Executing the Query with `curl`
You can execute this query using a `curl` command:

```bash
curl --location --request POST 'https://demo.datahubproject.io/api/graphql' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": "query listEntities { search(input: {type: DATASET, query: \"*\", start: 0, count: 100}) { start count total searchResults { entity { urn } } } }",
"variables": {}
}'
```

## Parameters Explanation
- **type**: Specifies the type of entity to search for, e.g., `DATASET`.
- **query**: The search term, where `"*"` is used to match all entities.
- **start**: The starting index for the search results.
- **count**: The number of results to return.

## Execution Methods
In addition to using `curl`, you can execute GraphQL queries in Python using the `requests` library. Below is an example script:

```python
import requests
import os
import json

# Set your DataHub token and GraphQL endpoint URL
datahub_token = os.getenv("DATAHUB_TOKEN") # Ensure your token is set as an environment variable
graphql_url = "https://demo.datahubproject.io/api/graphql"

# Define the GraphQL query
query = """
query listEntities {
  search(input: {type: DATASET, query: "*", start: 0, count: 100}) {
    start
    count
    total
    searchResults {
      entity {
        urn
      }
    }
  }
}
"""

# Set the headers including the authorization token
headers = {
    'Authorization': f'Bearer {datahub_token}',
    'Content-Type': 'application/json'
}

# Make the POST request to the GraphQL endpoint
response = requests.post(graphql_url, headers=headers, data=json.dumps({'query': query}))

# Check for errors
response.raise_for_status()

# Parse the response
data = response.json()

# Print the results
print(json.dumps(data, indent=2))
```

This script demonstrates how to authenticate using a token and execute a GraphQL query to retrieve URNs.
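
One detail worth guarding against: `os.getenv` returns `None` when the variable is unset, and the f-string would then send the literal header `Bearer None`. A minimal sketch of failing early instead is shown here; `build_headers` is a hypothetical helper and `"example-token"` is a placeholder, not a real credential.

```python
from typing import Optional

def build_headers(token: Optional[str]) -> dict:
    """Return request headers, refusing to continue without a token."""
    if not token:
        raise ValueError("DATAHUB_TOKEN is not set; export it before running the script")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

# In the script this would be: build_headers(os.getenv("DATAHUB_TOKEN"))
headers = build_headers("example-token")
print(sorted(headers))
```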