Troubleshooting retrieval of OwnershipType entities in DataHub Graph API

Original Slack Thread

<@U06TM7M40RK> Why can’t I get all of the ownershipType entities with this

existing_urns = set(
    graph.get_urns_by_filter(entity_types=["ownershipType"], batch_size=10000)
)

datahub.configuration.common.GraphError: Error executing graphql query: [{'message': "Variable 'types' has an invalid value: Invalid input for enum 'EntityType'. No value found for name 'OWNERSHIP_TYPE'"

The error comes from how get_urns_by_filter handles the entity_types parameter. Each value is sent to the GraphQL API as an EntityType enum value, and the error message shows that “ownershipType” was sent as OWNERSHIP_TYPE, a name the EntityType enum on your server does not contain.
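The mismatch between the string passed in (ownershipType) and the name in the error (OWNERSHIP_TYPE) suggests that the client converts camelCase entity names to upper snake case before sending them as EntityType values. Here is a minimal sketch of that kind of conversion. The helper to_graphql_enum_name is hypothetical and is not the SDK's own code; the real mapping may include special cases.

```python
import re


def to_graphql_enum_name(entity_type: str) -> str:
    """Convert a camelCase entity name to UPPER_SNAKE_CASE (illustrative only)."""
    # Insert an underscore at each lowercase/digit -> uppercase boundary, then upper-case.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", entity_type).upper()


print(to_graphql_enum_name("ownershipType"))
```

If the resulting name is missing from the server's EntityType enum, the GraphQL layer rejects the variable before any search runs, which matches the error above.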

According to the <https://datahubproject.io/docs/0.14.0/generated/metamodel/entities/ownershiptype|DataHub documentation>, the OwnershipType is a user-created category for ownership, but it might not be directly queryable using the get_urns_by_filter method with “ownershipType” as an entity type.

To retrieve ownership types, you might need to use a different approach, such as using a GraphQL query to list ownership types. Here’s an example of how you can list ownership types using GraphQL:

query listOwnershipTypes($input: ListOwnershipTypesInput!) {
    listOwnershipTypes(input: $input) {
        start
        count
        total
        ownershipTypes {
            urn
            type
            info {
                name
                description
            }
        }
    }
}

This query will return the list of ownership types along with their URNs, names, and descriptions.
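Because the query declares an $input variable, the request must supply a matching variables object. Here is a rough sketch of the JSON body a client would send. The helper build_request_body is hypothetical, and the start and count fields are assumed from the pagination fields the query returns.

```python
import json

LIST_OWNERSHIP_TYPES_QUERY = """
query listOwnershipTypes($input: ListOwnershipTypesInput!) {
    listOwnershipTypes(input: $input) {
        start
        count
        total
        ownershipTypes {
            urn
            type
            info {
                name
                description
            }
        }
    }
}
"""


def build_request_body(start: int = 0, count: int = 100) -> dict:
    """Build the GraphQL request body; 'input' must match the $input variable name."""
    return {
        "query": LIST_OWNERSHIP_TYPES_QUERY,
        "variables": {"input": {"start": start, "count": count}},
    }


# Show only the variables part of the body that would be POSTed.
print(json.dumps(build_request_body()["variables"]))
```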

Would you like more detailed information on how to implement this using a specific method or tool?

<@U06TM7M40RK> but isn’t it an entity? How can an object have a URN if it isn’t an entity: urn:li:ownershipType:Business Owner

In DataHub, the concept of an “entity” is quite broad and includes various types of metadata objects, each identified by a unique URN (Uniform Resource Name). The OwnershipType is indeed considered an entity within DataHub’s metadata model, and it has its own URN format, such as urn:li:ownershipType:Business Owner.

However, being an entity in the metadata model does not guarantee support in get_urns_by_filter. That method filters through the GraphQL EntityType enum, and the error shows that the enum on your server has no OWNERSHIP_TYPE value, so OwnershipType cannot be used as an entity type filter there.

To work with OwnershipType entities, you should use the appropriate GraphQL queries or API endpoints that are designed to handle these types of entities. As mentioned earlier, you can use a GraphQL query to list ownership types and retrieve their details.

If you need further assistance on how to implement this or if you have specific requirements, please let me know!

<@U06TM7M40RK> can you give me a python code snippet using a DataHubGraph object to get all of the ownershipType urns

To retrieve OwnershipType URNs in Python, you can execute the listOwnershipTypes GraphQL query. Note that the snippet below calls the GraphQL endpoint directly with the requests library rather than going through a DataHubGraph object:

import requests
import json

# Set your DataHub token and GraphQL endpoint URL
datahub_token = "your_datahub_token"  # Replace with your actual DataHub token
graphql_url = "http://localhost:8080/api/graphql"  # Replace with your DataHub GraphQL endpoint

# Define the GraphQL query to list ownership types
query = """
query listOwnershipTypes {
  listOwnershipTypes(input: {start: 0, count: 100}) {
    start
    count
    total
    ownershipTypes {
      urn
      type
      info {
        name
        description
      }
    }
  }
}
"""

# Set the headers including the authorization token
headers = {
    'Authorization': f'Bearer {datahub_token}',
    'Content-Type': 'application/json'
}

# Make the POST request to the GraphQL endpoint
response = requests.post(graphql_url, headers=headers, data=json.dumps({'query': query}), timeout=30)

# Raise on HTTP-level failures (4xx/5xx)
response.raise_for_status()

# Parse the response
data = response.json()

# GraphQL can report errors in the body even when the HTTP status is 200
if data.get('errors'):
    raise RuntimeError(f"GraphQL query failed: {data['errors']}")

# Print the URN, name, and description of each ownership type
for ownership_type in data['data']['listOwnershipTypes']['ownershipTypes']:
    info = ownership_type.get('info') or {}
    print(f"URN: {ownership_type['urn']}, Name: {info.get('name')}, Description: {info.get('description')}")

Explanation:

  1. GraphQL Query: The query listOwnershipTypes retrieves up to the first 100 ownership types (start: 0, count: 100), including their URNs, names, and descriptions.
  2. Headers: The Authorization header includes the token for authentication.
  3. POST Request: The requests.post method sends the GraphQL query to the DataHub API endpoint.
  4. Response Handling: The response is checked for errors and then parsed as JSON. The URNs of the ownership types are printed.

Make sure to replace "your_datahub_token" and "http://localhost:8080/api/graphql" with your actual DataHub token and GraphQL endpoint URL.
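Since the question asked for a DataHubGraph-based version that returns all URNs, here is a minimal sketch that pages through listOwnershipTypes using the graph object's execute_graphql method. It assumes execute_graphql accepts a variables argument and returns the contents of the response's data field, which should be checked against your installed SDK version. The function name list_ownership_type_urns and the page_size parameter are hypothetical.

```python
from typing import Any, List

QUERY = """
query listOwnershipTypes($input: ListOwnershipTypesInput!) {
  listOwnershipTypes(input: $input) {
    start
    count
    total
    ownershipTypes {
      urn
    }
  }
}
"""


def list_ownership_type_urns(graph: Any, page_size: int = 100) -> List[str]:
    """Collect every ownership type URN by paging until 'total' is reached."""
    urns: List[str] = []
    start = 0
    while True:
        # Assumption: execute_graphql returns the "data" part of the response.
        data = graph.execute_graphql(
            QUERY, variables={"input": {"start": start, "count": page_size}}
        )
        page = data["listOwnershipTypes"]
        batch = page["ownershipTypes"]
        urns.extend(item["urn"] for item in batch)
        start += len(batch)
        # Stop on an empty page or once all reported results are collected.
        if not batch or start >= page["total"]:
            break
    return urns


# Usage against a real server (requires the acryl-datahub package; not run here):
# from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
# graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080", token="your_datahub_token"))
# existing_urns = set(list_ownership_type_urns(graph))
```

The function only relies on the graph object having an execute_graphql method, so it can be exercised with a stub before pointing it at a live instance.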

Would you like more detailed steps or assistance with a specific part of the setup?
