Retrieving a List of All Structured Properties Using the Python SDK

Original Slack Thread

<@U06TM7M40RK> How can I get a list of all my structured properties using the Python SDK?

To retrieve structured properties using the Python SDK, you can use the DataHubGraph.get_aspect method. This method fetches a single aspect of an entity, such as the structured properties assigned to a dataset. Here’s a brief overview of how you can achieve this:

  1. Initialize the DataHubGraph: You need to create an instance of DataHubGraph to interact with the DataHub metadata.

  2. Fetch Structured Properties: Use the get_aspect method to retrieve the StructuredPropertiesClass aspect for a given dataset URN.

Here’s a sample code snippet to illustrate the process:

from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
from datahub.metadata.schema_classes import StructuredPropertiesClass

# Initialize the DataHubGraph (the constructor needs a client config; adjust the server URL)
graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

# Specify the dataset URN
dataset_urn = "urn:li:dataset:(urn:li:dataPlatform:your_platform,your_dataset,PROD)"

# Fetch the structured properties aspect (None if the dataset has none)
structured_properties = graph.get_aspect(dataset_urn, StructuredPropertiesClass)

# Print each assigned property and its values
if structured_properties:
    for prop in structured_properties.properties:
        print(f"Property URN: {prop.propertyUrn}")
        print(f"Values: {prop.values}")
else:
    print("No structured properties found.")

This code will output the URN and assigned values of each structured property attached to the specified dataset. The display name and description are stored on the property's own definition, not on the dataset's aspect.

Would you like more detailed guidance on any of these steps?

<@U06TM7M40RK> This gives me a list of properties for a given dataset. I want to get all of my properties, even if they haven’t been attached to a dataset yet.

It seems likely that to retrieve all structured properties, even those not attached to any dataset, you would need to use the OpenAPI or GraphQL interfaces provided by DataHub. Here’s how you can do it:

  1. Using OpenAPI: You can make a request to the OpenAPI endpoint to get the definition of a structured property by its URN. The definition exists independently of any dataset, so this works whether or not the property has been attached to one.

    Example OpenAPI request:

    curl -X 'GET' -v \
      'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/propertyDefinition' \
      -H 'accept: application/json' | jq
    

    This request will return the property definition, including allowed values, display name, description, and other metadata.

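  2. Using GraphQL: You can also use a GraphQL query to fetch structured properties and their definitions. Check the query below against your instance's GraphQL schema first; depending on the version, the structuredProperties field may need to be nested under an entity query rather than used at the root.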

    Example GraphQL query:

    {
      structuredProperties {
        properties {
          structuredProperty {
            urn
            definition {
              qualifiedName
              displayName
              description
              cardinality
              allowedValues {
                description
                value {
                  __typename
                }
              }
            }
          }
        }
      }
    }
    

These methods should help you retrieve all structured properties, not just those attached to datasets.
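
For anyone who wants to issue the OpenAPI request above from Python instead of curl, here is a minimal sketch using only the standard library. The helper names (property_definition_url, fetch_property_definition) and the optional bearer-token parameter are illustrative assumptions, not part of the DataHub SDK; the endpoint path is the one from the curl example. It fetches one definition by URN, so the URNs themselves still have to come from elsewhere.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def property_definition_url(server: str, property_urn: str) -> str:
    """Build the v2 OpenAPI URL for one structured property's definition."""
    # safe="" percent-encodes the colons in the URN, matching the curl example.
    encoded = urllib.parse.quote(property_urn, safe="")
    base = server.rstrip("/")
    return f"{base}/openapi/v2/entity/structuredProperty/{encoded}/propertyDefinition"


def fetch_property_definition(
    server: str, property_urn: str, token: Optional[str] = None
) -> dict:
    """GET the definition and return the parsed JSON body."""
    request = urllib.request.Request(
        property_definition_url(server, property_urn),
        headers={"accept": "application/json"},
    )
    if token:
        # Assumption: the instance uses bearer-token authentication.
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Calling fetch_property_definition("http://localhost:8080", "urn:li:structuredProperty:io.acryl.privacy.retentionTime") would send the same request as the curl command.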

Would you like more detailed guidance on setting up these requests?

<@U06TM7M40RK> Is there no way to do it through the Python SDK?

I’m not able to find any data sources to answer your question. You might want to specify if you are looking for a method to list all structured properties globally or within a specific context. If this doesn’t help, please reach out to the DataHub support team for further assistance.

<@U06TM7M40RK> I am looking for a method to list all structured properties globally.

The Python SDK may not offer a dedicated call for listing all structured properties globally. However, you could achieve this through the OpenAPI or GraphQL interfaces, which give broader access to metadata.

If you are specifically looking for a way to do this through the Python SDK, you might need to implement a custom solution that interacts with these APIs. This would involve making HTTP requests to the DataHub API endpoints from within your Python code.
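
One way such a custom solution might look while staying inside the SDK: DataHubGraph has a general search helper, get_urns_by_filter, which can be asked for every URN of entity type structuredProperty, and each URN's StructuredPropertyDefinitionClass aspect can then be read with get_aspect. This is a sketch under assumptions: that your SDK and server versions index structured properties for search, and that the server is at http://localhost:8080. The names iter_structured_properties and print_all are hypothetical.

```python
from typing import Any, Dict, Iterator


def iter_structured_properties(graph: Any, definition_cls: type) -> Iterator[Dict[str, Any]]:
    """Yield one summary dict per structured property that has a definition."""
    # Search by entity type instead of by dataset, so properties that are
    # not attached to anything are returned as well.
    for urn in graph.get_urns_by_filter(entity_types=["structuredProperty"]):
        definition = graph.get_aspect(urn, definition_cls)
        if definition is None:
            continue  # entity found, but no definition aspect stored
        yield {
            "urn": urn,
            "qualified_name": definition.qualifiedName,
            "display_name": definition.displayName,
            "description": definition.description,
        }


def print_all(server: str = "http://localhost:8080") -> None:
    """Connect to DataHub and print every structured property definition."""
    # Imported here so the helper above can be tried without DataHub installed.
    from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
    from datahub.metadata.schema_classes import StructuredPropertyDefinitionClass

    # Assumed server address; pass token=... to the config if auth is enabled.
    graph = DataHubGraph(DatahubClientConfig(server=server))
    for prop in iter_structured_properties(graph, StructuredPropertyDefinitionClass):
        print(prop)
```

Calling print_all() runs the listing against a live instance. The helper takes the graph and aspect class as arguments so it can be exercised with a stub before pointing it at a real server.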

Would you like more detailed guidance on how to set up such a solution using the OpenAPI or GraphQL interfaces?
