diff --git a/docs/api/datahub-apis.md b/docs/api/datahub-apis.md
index c46aacde3a0cb5..662d98a643d71a 100644
--- a/docs/api/datahub-apis.md
+++ b/docs/api/datahub-apis.md
@@ -29,11 +29,91 @@ Learn more about the SDKs:
 
-The `graphql` API serves as the primary public API for the platform. It can be used to fetch and update metadata programatically in the language of your choice. Intended as a higher-level API that simplifies the most common operations.
+The `graphql` API serves as the primary public API for the platform. It can be used to fetch and update metadata programmatically in the language of your choice. It is intended as a higher-level API that simplifies the most common operations.
 
+### Introduction to GraphQL in DataHub
+GraphQL lets clients request exactly the fields they need from DataHub's metadata graph, so responses contain only the requested data rather than a fixed, larger payload.
+
-We recommend using the GraphQL API if you're getting started with DataHub since it's more user-friendly and straighfowrad. Here are some examples of how to use the GraphQL API:
+We recommend using the GraphQL API if you're getting started with DataHub since it's more user-friendly and straightforward. Here are some examples of how to use the GraphQL API:
 
 - Search for datasets with conditions
 - Update a certain field of a dataset
 
+### Example Queries
+To retrieve the URNs of datasets, you can use the following GraphQL query:
+
+```graphql
+query listEntities {
+  search(input: {type: DATASET, query: "*", start: 0, count: 100}) {
+    start
+    count
+    total
+    searchResults {
+      entity {
+        urn
+      }
+    }
+  }
+}
+```
+
+This query returns the URNs of the first 100 entities of type `DATASET`, together with the `total` number of matches. Adjust `count` to change the page size, and increase `start` to page through the remaining results.
+
+To execute this query using `curl` (replace `<your-access-token>` with a DataHub personal access token; the `Authorization` header can be dropped if your instance does not require authentication):
+
+```bash
+curl --location --request POST 'https://demo.datahubproject.io/api/graphql' \
+--header 'Content-Type: application/json' \
+--header 'Authorization: Bearer <your-access-token>' \
+--data-raw '{
+  "query": "query listEntities { search(input: {type: DATASET, query: \"*\", start: 0, count: 100}) { start count total searchResults { entity { urn } } } }",
+  "variables": {}
+}'
+```
+
+### Parameters Explanation
+- **type**: Specifies the type of entity to search for, e.g., `DATASET`.
+- **query**: The search query string; `*` matches all entities.
+- **start**: The zero-based index of the first result to return.
+- **count**: The number of results to return.
+
+### Execution Methods
+In addition to using `curl`, you can execute GraphQL queries in Python using the `requests` library. Here is an example:
+
+```python
+import os
+
+import requests
+
+datahub_token = os.getenv("DATAHUB_TOKEN")
+graphql_url = "https://demo.datahubproject.io/api/graphql"
+
+query = """
+query listEntities {
+  search(input: {type: DATASET, query: "*", start: 0, count: 100}) {
+    start
+    count
+    total
+    searchResults {
+      entity {
+        urn
+      }
+    }
+  }
+}
+"""
+
+headers = {"Content-Type": "application/json"}
+if datahub_token:  # only send a bearer token when one is configured
+    headers["Authorization"] = f"Bearer {datahub_token}"
+
+response = requests.post(graphql_url, headers=headers, json={"query": query}, timeout=30)
+response.raise_for_status()  # HTTP-level failures
+data = response.json()
+if data.get("errors"):  # GraphQL-level failures can arrive with HTTP 200
+    raise RuntimeError(data["errors"])
+for result in data["data"]["search"]["searchResults"]:
+    print(result["entity"]["urn"])
+```
+
 Learn more about the GraphQL API:
 
 - **[GraphQL API →](docs/api/graphql/getting-started.md)**
diff --git a/docs/api/graphql/getting-started.md b/docs/api/graphql/getting-started.md
index dfa556051bd4d1..56846bea6fcbe7 100644
--- a/docs/api/graphql/getting-started.md
+++ b/docs/api/graphql/getting-started.md
@@ -1,5 +1,9 @@
 # Getting Started With GraphQL
 
+## Introduction to GraphQL in DataHub
+
+GraphQL in DataHub lets you read and update the Metadata Graph through queries and mutations. Each request names exactly the fields it needs, which makes it well suited to exploring and managing metadata.
+
 ## Reading an Entity: Queries
 
 DataHub provides the following `graphql` queries for retrieving entities in your Metadata Graph.
@@ -65,6 +69,50 @@ The search term can be a simple string, or it can be a more complex query using
 - `*[string]*` : Search for all entities that **match** aspects named \[string\].
 - `[string]` : Search for all entities that **contain** the specified \[string\].
 
+#### Example Queries for URN Retrieval
+
+To retrieve a list of dataset URNs, you can use the following GraphQL query:
+
+```graphql
+query listEntities {
+  search(input: {type: DATASET, query: "*", start: 0, count: 100}) {
+    start
+    count
+    total
+    searchResults {
+      entity {
+        urn
+      }
+    }
+  }
+}
+```
+
+This query returns the URNs of the first 100 entities of type `DATASET`, together with the `total` number of matches. Adjust `count` to change the page size, and increase `start` to page through the remaining results.
+
+To execute this query using `curl` (replace `<your-access-token>` with a DataHub personal access token; the `Authorization` header can be dropped if your instance does not require authentication):
+
+```bash
+curl --location --request POST 'https://demo.datahubproject.io/api/graphql' \
+--header 'Content-Type: application/json' \
+--header 'Authorization: Bearer <your-access-token>' \
+--data-raw '{
+  "query": "query listEntities { search(input: {type: DATASET, query: \"*\", start: 0, count: 100}) { start count total searchResults { entity { urn } } } }",
+  "variables": {}
+}'
+```
+
+#### Parameters Explanation
+
+- `type`: Specifies the type of entity to search for, e.g., `DATASET`.
+- `query`: The search term; `*` matches all entities.
+- `start`: The zero-based index of the first result to return.
+- `count`: The number of results to return.
+
+#### Execution Methods
+
+In addition to using `curl`, you can send the same request from any language or tool that can make HTTP POST requests, for example Python with the `requests` library.
+
 :::note
 
 Note that by default Elasticsearch only allows pagination through 10,000 entities via the search API. If you need to paginate through more, you can change the default value for the `index.max_result_window` setting in Elasticsearch, or using the scroll API to read from the index directly.
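Paging past the first 100 results means advancing `start` until `total` is reached, while staying under the 10,000-result default described in the Elasticsearch note. The sketch below shows one way to do that under stated assumptions: it uses only the Python standard library (`urllib.request`) instead of `requests`, passes `start` and `count` as GraphQL `Int!` variables, and the helper names `make_sender` and `iter_dataset_urns` are hypothetical, not part of any DataHub SDK. The HTTP call is injected as a function so the paging logic can be exercised without a live server.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterator, Optional

# Same demo endpoint as the examples above; point it at your own DataHub instance.
GRAPHQL_URL = "https://demo.datahubproject.io/api/graphql"

# The listEntities query with `start` and `count` passed as GraphQL variables.
LIST_DATASET_URNS = """
query listEntities($start: Int!, $count: Int!) {
  search(input: {type: DATASET, query: "*", start: $start, count: $count}) {
    start
    count
    total
    searchResults {
      entity {
        urn
      }
    }
  }
}
"""

# A sender takes a GraphQL payload dict and returns the decoded JSON response body.
Send = Callable[[Dict], Dict]


def make_sender(url: str = GRAPHQL_URL, token: Optional[str] = None) -> Send:
    """Return a function that POSTs a GraphQL payload and decodes the JSON reply."""

    def send(payload: Dict) -> Dict:
        headers = {"Content-Type": "application/json"}
        if token:  # only attach a bearer token when one is provided
            headers["Authorization"] = f"Bearer {token}"
        request = urllib.request.Request(
            url, data=json.dumps(payload).encode("utf-8"), headers=headers, method="POST"
        )
        # urlopen raises HTTPError for 4xx/5xx responses.
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)

    return send


def iter_dataset_urns(send: Send, page_size: int = 100, max_results: int = 10_000) -> Iterator[str]:
    """Yield dataset URNs page by page until `total` or `max_results` is reached."""
    start = 0
    while start < max_results:
        count = min(page_size, max_results - start)
        body = send({"query": LIST_DATASET_URNS, "variables": {"start": start, "count": count}})
        if body.get("errors"):  # GraphQL errors are reported in the body
            raise RuntimeError(f"GraphQL errors: {body['errors']}")
        page = body["data"]["search"]
        results = page["searchResults"]
        for result in results:
            yield result["entity"]["urn"]
        start += len(results)
        # Stop on an empty page or once every match has been returned.
        if not results or start >= page["total"]:
            break
```

For example, `for urn in iter_dataset_urns(make_sender(token=os.environ.get("DATAHUB_TOKEN"))): print(urn)` streams dataset URNs from the configured endpoint. Raising `max_results` above 10,000 only helps if `index.max_result_window` has been increased as the note describes.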
diff --git a/docs/api/graphql/how-to-set-up-graphql.md b/docs/api/graphql/how-to-set-up-graphql.md
index 2be2f935b12b10..46e89587753705 100644
--- a/docs/api/graphql/how-to-set-up-graphql.md
+++ b/docs/api/graphql/how-to-set-up-graphql.md
@@ -8,6 +8,55 @@ For more information, please refer to [Datahub Quickstart Guide](/docs/quickstar
 ## Querying the GraphQL API
 
 DataHub's GraphQL endpoint is served at the path `/api/graphql`, e.g. `https://my-company.datahub.com/api/graphql`.
+
+### Introduction to GraphQL in DataHub
+
+GraphQL in DataHub lets clients specify exactly which fields they need, so each response carries only that data instead of a fixed, larger payload.
+
+### Example Queries
+
+To retrieve a list of dataset URNs in DataHub, you can use the following GraphQL query:
+
+```graphql
+query listEntities {
+  search(input: {type: DATASET, query: "*", start: 0, count: 100}) {
+    start
+    count
+    total
+    searchResults {
+      entity {
+        urn
+      }
+    }
+  }
+}
+```
+
+This query returns the URNs of the first 100 entities of type `DATASET`, together with the `total` number of matches. Adjust `count` to change the page size, and increase `start` to page through the remaining results.
+
+To execute this query using `curl` (replace `<your-access-token>` with a DataHub personal access token; the `Authorization` header can be dropped if your instance does not require authentication):
+
+```bash
+curl --location --request POST 'https://demo.datahubproject.io/api/graphql' \
+--header 'Content-Type: application/json' \
+--header 'Authorization: Bearer <your-access-token>' \
+--data-raw '{
+  "query": "query listEntities { search(input: {type: DATASET, query: \"*\", start: 0, count: 100}) { start count total searchResults { entity { urn } } } }",
+  "variables": {}
+}'
+```
+
+### Parameters Explanation
+
+- **type**: Specifies the type of entity to search for, e.g., `DATASET`.
+- **query**: The search query string; `*` matches all entities.
+- **start**: The zero-based index of the first result to return.
+- **count**: The number of results to return.
+
+### Execution Methods
+
+Besides `curl`, you can run the same queries from Postman, from the GraphQL Explorer (GraphiQL), or programmatically with a GraphQL client in your preferred programming language.
+
 There are a few options when it comes to querying the GraphQL endpoint.
 
 For **Testing**:
diff --git a/docs/api/graphql/overview.md b/docs/api/graphql/overview.md
index 3077d83416dff2..cdec72b2f8af70 100644
--- a/docs/api/graphql/overview.md
+++ b/docs/api/graphql/overview.md
@@ -28,6 +28,10 @@ For detailed guidance on using `graphql` for specific use cases, please refer to
 For these reasons among others DataHub provides a GraphQL API on top of the Metadata Graph,
 permitting easy exploration of the Entities & Relationships composing it.
 
+### Introduction to GraphQL in DataHub
+
+GraphQL in DataHub is used to query and modify the entities and relationships in the Metadata Graph. Because one query can follow relationships between entities and return only the requested fields, it can often replace several separate API calls.
+
 For more information about the GraphQL specification, check out [Introduction to GraphQL](https://graphql.org/learn/).
 
 ## GraphQL Schema Reference
@@ -36,5 +40,49 @@ The Reference docs in the sidebar are generated from the DataHub GraphQL schema.
 validated against this schema. You can use these docs to understand data that is available for retrieval and operations
 that may be performed using the API.
 
+### Example Queries for URN Retrieval
+
+To retrieve a list of dataset URNs in DataHub, you can use the following GraphQL query:
+
+```graphql
+query listEntities {
+  search(input: {type: DATASET, query: "*", start: 0, count: 100}) {
+    start
+    count
+    total
+    searchResults {
+      entity {
+        urn
+      }
+    }
+  }
+}
+```
+
+This query retrieves the URNs of the first 100 entities of type `DATASET`, together with the `total` number of matches. Adjust `count` to change the page size, and increase `start` to page through the remaining results.
+
+To execute this query using `curl`, use the following command (replace `<your-access-token>` with a DataHub personal access token; the `Authorization` header can be dropped if your instance does not require authentication):
+
+```bash
+curl --location --request POST 'https://demo.datahubproject.io/api/graphql' \
+--header 'Content-Type: application/json' \
+--header 'Authorization: Bearer <your-access-token>' \
+--data-raw '{
+  "query": "query listEntities { search(input: {type: DATASET, query: \"*\", start: 0, count: 100}) { start count total searchResults { entity { urn } } } }",
+  "variables": {}
+}'
+```
+
+#### Parameters Explanation
+
+- **type**: Specifies the type of entity to search for, e.g., `DATASET`.
+- **query**: The search query string; `*` matches all entities.
+- **start**: The zero-based index of the first result to return.
+- **count**: The number of results to return.
+
+#### Execution Methods
+
+In addition to using `curl`, you can send the same request from any language or tool that can make HTTP POST requests, for example Python with the `requests` library.
+
 - Available Operations: [Queries](/graphql/queries.md) (Reads) & [Mutations](/graphql/mutations.md) (Writes)
 - Schema Types: [Objects](/graphql/objects.md), [Input Objects](/graphql/inputObjects.md), [Interfaces](/graphql/interfaces.md), [Unions](/graphql/unions.md), [Enums](/graphql/enums.md), [Scalars](/graphql/scalars.md)
diff --git a/docs/managed-datahub/datahub-api/graphql-api/getting-started.md b/docs/managed-datahub/datahub-api/graphql-api/getting-started.md
index 5993e2dfd773dd..5c13e82d299308 100644
--- a/docs/managed-datahub/datahub-api/graphql-api/getting-started.md
+++ b/docs/managed-datahub/datahub-api/graphql-api/getting-started.md
@@ -44,3 +44,96 @@ The entire GraphQL API can be explored & [introspected](https://graphql.org/lear
 ### Querying the API
 
 Currently, we do not offer language-specific SDKs for accessing the DataHub GraphQL API. For querying the API, you can make use of a variety of per-language client libraries. For a full list, see [GraphQL Code Libraries, Tools, & Services](https://graphql.org/code/).
+
+## GraphQL Query Examples for URN Retrieval
+
+### Introduction to GraphQL in DataHub
+GraphQL in DataHub lets users specify exactly which fields they need, which is useful for retrieving the Uniform Resource Names (URNs) that identify entities within DataHub.
+
+### Example Queries
+To list dataset URNs in DataHub, you can use the following GraphQL query:
+
+```graphql
+query listEntities {
+  search(input: {type: DATASET, query: "*", start: 0, count: 100}) {
+    start
+    count
+    total
+    searchResults {
+      entity {
+        urn
+      }
+    }
+  }
+}
+```
+
+This query retrieves the URNs of the first 100 entities of type `DATASET`, together with the `total` number of matches. Adjust `count` to change the page size, and increase `start` to page through the remaining results.
+
+#### Executing the Query with `curl`
+You can execute this query using `curl`; replace `<your-access-token>` with a DataHub personal access token:
+
+```bash
+curl --location --request POST 'https://demo.datahubproject.io/api/graphql' \
+--header 'Content-Type: application/json' \
+--header 'Authorization: Bearer <your-access-token>' \
+--data-raw '{
+  "query": "query listEntities { search(input: {type: DATASET, query: \"*\", start: 0, count: 100}) { start count total searchResults { entity { urn } } } }",
+  "variables": {}
+}'
+```
+
+### Parameters Explanation
+- **type**: Specifies the type of entity to search for, e.g., `DATASET`.
+- **query**: The search term; `*` matches all entities.
+- **start**: The zero-based index of the first result to return.
+- **count**: The number of results to return.
+
+### Execution Methods
+In addition to using `curl`, you can execute GraphQL queries in Python using the `requests` library. Below is an example script:
+
+```python
+import os
+
+import requests
+
+# Read the DataHub access token from the environment and set the GraphQL endpoint URL
+datahub_token = os.getenv("DATAHUB_TOKEN")
+graphql_url = "https://demo.datahubproject.io/api/graphql"
+
+# Define the GraphQL query
+query = """
+query listEntities {
+  search(input: {type: DATASET, query: "*", start: 0, count: 100}) {
+    start
+    count
+    total
+    searchResults {
+      entity {
+        urn
+      }
+    }
+  }
+}
+"""
+
+# Set the headers; only send a bearer token when one is configured
+headers = {"Content-Type": "application/json"}
+if datahub_token:
+    headers["Authorization"] = f"Bearer {datahub_token}"
+
+# Make the POST request to the GraphQL endpoint (json= serializes the body)
+response = requests.post(graphql_url, headers=headers, json={"query": query}, timeout=30)
+
+# Check for HTTP errors, then for GraphQL errors reported in the response body
+response.raise_for_status()
+data = response.json()
+if data.get("errors"):
+    raise RuntimeError(f"GraphQL errors: {data['errors']}")
+
+# Print the URN of each result
+for result in data["data"]["search"]["searchResults"]:
+    print(result["entity"]["urn"])
+```
+
+This script authenticates with a token when `DATAHUB_TOKEN` is set, runs the query, and prints the URN of each dataset in the first page of results.
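The script above hard-codes `type: DATASET`, `start`, and `count` inside the query string. A minimal sketch of passing them as GraphQL variables instead, so the same query can list URNs of other entity types such as `DASHBOARD`. It assumes the variable type for `type` is the schema's `EntityType` enum (check the Enums reference for your DataHub version), and `build_payload` and `extract_urns` are hypothetical helper names rather than part of any DataHub SDK.

```python
from typing import Any, Dict, List

# Assumption: `EntityType` is the enum accepted by `search(input: {type: ...})`;
# confirm against the Enums reference for your DataHub version.
LIST_URNS_QUERY = """
query listEntities($type: EntityType!, $query: String!, $start: Int!, $count: Int!) {
  search(input: {type: $type, query: $query, start: $start, count: $count}) {
    start
    count
    total
    searchResults {
      entity {
        urn
      }
    }
  }
}
"""


def build_payload(entity_type: str = "DATASET", query: str = "*", start: int = 0, count: int = 100) -> Dict[str, Any]:
    """Return a JSON-serializable GraphQL request body with the search parameters as variables."""
    return {
        "query": LIST_URNS_QUERY,
        "variables": {"type": entity_type, "query": query, "start": start, "count": count},
    }


def extract_urns(body: Dict[str, Any]) -> List[str]:
    """Pull URNs out of a decoded response, failing loudly on GraphQL-level errors."""
    # GraphQL servers can report errors in the body even when the HTTP status is 200,
    # so raise_for_status() alone does not catch them.
    if body.get("errors"):
        messages = "; ".join(e.get("message", "unknown error") for e in body["errors"])
        raise RuntimeError(f"GraphQL request failed: {messages}")
    return [r["entity"]["urn"] for r in body["data"]["search"]["searchResults"]]
```

With the script above, the request could become `requests.post(graphql_url, headers=headers, json=build_payload("DASHBOARD"), timeout=30)`, followed by `urns = extract_urns(response.json())`.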