Creating and Linking Assets in DataHub Manually

Original Slack Thread

<@U06TM7M40RK>, how do I manually create assets in DataHub? I want to create a dashboard and link it to a dataset.

To manually create assets in DataHub, such as a dashboard linked to a dataset, you can use the DataHub API or the Python SDK. Here’s a general approach:

Using the DataHub API

  1. Create a Dataset: You can create a dataset using the DataHub API by sending a POST request to the /entities?action=ingest endpoint. You need to define the dataset’s URN and its aspects, such as schema metadata.

  2. Create a Dashboard: Similarly, you can create a dashboard entity by sending a POST request to the same endpoint. Define the dashboard’s URN and its aspects, such as ownership and properties.

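  3. Link the Dashboard to the Dataset: Reference the dataset from the dashboard’s metadata, either directly in the dashboard’s DashboardInfo aspect or through a chart whose ChartInfo aspect lists the dataset in its inputs field. The DataJobInputOutput aspect describes data jobs and does not apply to dashboards.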
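
The URNs used in these steps follow fixed string patterns, matching the URNs that appear later in this thread. As a minimal sketch, the helpers below (hypothetical names, not part of the DataHub SDK) format the three kinds of URN involved; the SDK's own builders such as make_dataset_urn are the usual choice when the SDK is available.

```python
def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Format a dataset URN: platform, dataset name, and environment."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def dashboard_urn(tool: str, dashboard_id: str) -> str:
    """Format a dashboard URN: BI tool and dashboard id (no environment part)."""
    return f"urn:li:dashboard:({tool},{dashboard_id})"


def chart_urn(tool: str, chart_id: str) -> str:
    """Format a chart URN: BI tool and chart id."""
    return f"urn:li:chart:({tool},{chart_id})"


if __name__ == "__main__":
    print(dataset_urn("hive", "example_dataset"))
    print(dashboard_urn("looker", "foo_dashboard"))
    print(chart_urn("looker", "foo_chart"))
```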

Using the Python SDK

  1. Install the DataHub Python SDK: Ensure you have the acryl-datahub package installed.

    pip install 'acryl-datahub[datahub-rest]'
    
  2. Initialize the REST Emitter: Set up the REST emitter to communicate with your DataHub instance.

    from datahub.emitter.rest_emitter import DatahubRestEmitter
    
    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
    
  3. Create and Emit Dataset: Define the dataset’s schema and emit it to DataHub.

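    from datahub.emitter.mce_builder import make_data_platform_urn, make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.metadata.schema_classes import (
        AuditStampClass,
        OtherSchemaClass,
        SchemaFieldClass,
        SchemaFieldDataTypeClass,
        SchemaMetadataClass,
        StringTypeClass,
    )

    dataset_urn = make_dataset_urn(platform="hive", name="example_dataset", env="PROD")
    audit_stamp = AuditStampClass(time=1640692800000, actor="urn:li:corpuser:ingestion")
    schema_fields = [
        SchemaFieldClass(
            fieldPath="field1",
            type=SchemaFieldDataTypeClass(type=StringTypeClass()),
            nativeDataType="VARCHAR(50)",
            description="Description of field1",
            lastModified=audit_stamp,
        )
    ]
    schema_metadata = SchemaMetadataClass(
        schemaName="example_schema",
        platform=make_data_platform_urn("hive"),  # a data platform URN, not a dataset URN
        version=0,
        hash="",  # required field; left empty in this example
        platformSchema=OtherSchemaClass(rawSchema=""),  # required field; raw schema omitted here
        fields=schema_fields,
        lastModified=audit_stamp,
    )
    # An aspect is emitted inside a Metadata Change Proposal that names the entity it belongs to.
    emitter.emit(MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=schema_metadata))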
    
  4. Create and Emit Dashboard: Define the dashboard and emit it to DataHub.

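    from datahub.emitter.mce_builder import make_dashboard_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.metadata.schema_classes import (
        AuditStampClass,
        ChangeAuditStampsClass,
        DashboardInfoClass,
    )

    # Dashboard URNs have no environment part: urn:li:dashboard:(looker,example_dashboard)
    dashboard_urn = make_dashboard_urn(platform="looker", name="example_dashboard")
    audit_stamp = AuditStampClass(time=1640692800000, actor="urn:li:corpuser:ingestion")
    dashboard_info = DashboardInfoClass(
        title="Example Dashboard",
        description="This is an example dashboard",
        # DashboardInfo expects a pair of audit stamps, not a single AuditStamp.
        lastModified=ChangeAuditStampsClass(created=audit_stamp, lastModified=audit_stamp),
    )
    emitter.emit(MetadataChangeProposalWrapper(entityUrn=dashboard_urn, aspect=dashboard_info))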
    
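  5. Link Dashboard to Dataset: Re-emit the DashboardInfo aspect with the dataset URN in its datasets field. The DataJobInputOutput aspect describes data jobs and does not apply to dashboards.

    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.metadata.schema_classes import AuditStampClass, ChangeAuditStampsClass, DashboardInfoClass

    # Assumption: the DataHub version in use has the `datasets` field on DashboardInfo.
    # Uses dataset_urn and dashboard_urn from the previous steps.
    stamp = AuditStampClass(time=1640692800000, actor="urn:li:corpuser:ingestion")
    dashboard_info = DashboardInfoClass(
        title="Example Dashboard",
        description="This is an example dashboard",
        lastModified=ChangeAuditStampsClass(created=stamp, lastModified=stamp),
        datasets=[dataset_urn],
    )
    emitter.emit(MetadataChangeProposalWrapper(entityUrn=dashboard_urn, aspect=dashboard_info))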

This approach allows you to manually create and link assets in DataHub using the API or Python SDK. Would you like more detailed instructions or have any specific questions about this process?

<@U06TM7M40RK>, create a request to the API for me that creates a dashboard named foo dashboard, which reads from table urn:li:dataset:(urn:li:dataPlatform:bigquery,mota-engil-datawarehouse.staging.stg_cority__waste_emissions_annual,PROD)

To create a dashboard named foo dashboard that reads from a specific dataset using the DataHub API, you can use a POST request to the /entities?action=ingest endpoint. Below is an example of how you can structure the request using curl:

curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
  "entity": {
    "value": {
      "com.linkedin.metadata.snapshot.DashboardSnapshot": {
        "urn": "urn:li:dashboard:(looker,foo_dashboard)",
        "aspects": [
          {
            "com.linkedin.dashboard.DashboardInfo": {
              "title": "foo dashboard",
              "description": "This is the foo dashboard",
              "charts": [],
              "inputs": [
                {
                  "string": "urn:li:dataset:(urn:li:dataPlatform:bigquery,mota-engil-datawarehouse.staging.stg_cority__waste_emissions_annual,PROD)"
                }
              ],
              "lastModified": {
                "created": {
                  "time": 0,
                  "actor": "urn:li:corpuser:jdoe"
                },
                "lastModified": {
                  "time": 0,
                  "actor": "urn:li:corpuser:datahub"
                }
              }
            }
          }
        ]
      }
    }
  }
}'

Explanation:

  • URN: The URN for the dashboard is specified as urn:li:dashboard:(looker,foo_dashboard).
  • Title and Description: The dashboard is given a title “foo dashboard” and a description “This is the foo dashboard”.
  • Inputs: The inputs field specifies the dataset that the dashboard reads from, using the provided dataset URN.
  • Last Modified: The lastModified field includes metadata about when the dashboard was created and last modified, along with the actors responsible.

This request will create a dashboard in DataHub with the specified name and link it to the given dataset.

Would you like more detailed guidance on any of these steps?

<@U06TM7M40RK>, how to include a link to the dashboard in this request?

To include a link to the dashboard in your request, you can use the dashboardUrl field within the DashboardInfo aspect. This field allows you to specify a URL that can be used as an external link to access or view the dashboard.

Here’s how you can modify the previous request to include a link to the dashboard:

curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
  "entity": {
    "value": {
      "com.linkedin.metadata.snapshot.DashboardSnapshot": {
        "urn": "urn:li:dashboard:(looker,foo_dashboard)",
        "aspects": [
          {
            "com.linkedin.dashboard.DashboardInfo": {
              "title": "foo dashboard",
              "description": "This is the foo dashboard",
              "charts": [],
              "inputs": [
                {
                  "string": "urn:li:dataset:(urn:li:dataPlatform:bigquery,mota-engil-datawarehouse.staging.stg_cority__waste_emissions_annual,PROD)"
                }
              ],
              "dashboardUrl": "http://example.com/foo_dashboard",
              "lastModified": {
                "created": {
                  "time": 0,
                  "actor": "urn:li:corpuser:jdoe"
                },
                "lastModified": {
                  "time": 0,
                  "actor": "urn:li:corpuser:datahub"
                }
              }
            }
          }
        ]
      }
    }
  }
}'

Explanation:

  • dashboardUrl: This field is added to the DashboardInfo aspect to specify the URL of the dashboard. Replace "http://example.com/foo_dashboard" with the actual URL you want to use.

This modification will ensure that the dashboard entity in DataHub includes a link that users can click to access the dashboard directly.

Would you like more detailed guidance on any of these steps?

<@U06TM7M40RK>, the api doesn’t accept “inputs” in a dashboard, should I create a chart to link to the dataset?

The DashboardInfo aspect has no inputs field, which is the likely reason the request was rejected. One option is to create a chart entity that references the dataset and then associate the chart with the dashboard. This matches the usual structure, in which dashboards are collections of charts and charts are the entities that reference datasets directly. Depending on your DataHub version, DashboardInfo may also accept a datasets field that lists dataset URNs directly.

Here’s how you can proceed:

  1. Create a Chart: First, create a chart entity that references the dataset. This chart will act as the intermediary between the dataset and the dashboard.

  2. Link the Chart to the Dataset: Use the inputs field in the ChartInfo aspect to specify the dataset URN.

  3. Associate the Chart with the Dashboard: Add the chart to the dashboard by including it in the charts field of the DashboardInfo aspect.
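
To make the indirect link concrete, here is a small illustrative sketch of how a dashboard's datasets can be resolved through its charts. It models the relationships with plain Python structures only; the function name and the dictionary layout are assumptions for illustration and do not call DataHub.

```python
from typing import Dict, List


def datasets_behind_dashboard(
    dashboard_charts: List[str],
    chart_inputs: Dict[str, List[str]],
) -> List[str]:
    """Return the dataset URNs a dashboard reaches through its charts, without duplicates."""
    seen: List[str] = []
    for chart in dashboard_charts:
        # A chart with no recorded inputs contributes nothing.
        for dataset in chart_inputs.get(chart, []):
            if dataset not in seen:
                seen.append(dataset)
    return seen


if __name__ == "__main__":
    charts = ["urn:li:chart:(looker,foo_chart)"]
    inputs = {
        "urn:li:chart:(looker,foo_chart)": [
            "urn:li:dataset:(urn:li:dataPlatform:hive,example_dataset,PROD)"
        ]
    }
    print(datasets_behind_dashboard(charts, inputs))
```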

Example Request to Create a Chart

curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
  "entity": {
    "value": {
      "com.linkedin.metadata.snapshot.ChartSnapshot": {
        "urn": "urn:li:chart:(looker,foo_chart)",
        "aspects": [
          {
            "com.linkedin.chart.ChartInfo": {
              "title": "foo chart",
              "description": "This is the foo chart",
              "inputs": [
                {
                  "string": "urn:li:dataset:(urn:li:dataPlatform:bigquery,mota-engil-datawarehouse.staging.stg_cority__waste_emissions_annual,PROD)"
                }
              ],
              "lastModified": {
                "created": {
                  "time": 0,
                  "actor": "urn:li:corpuser:jdoe"
                },
                "lastModified": {
                  "time": 0,
                  "actor": "urn:li:corpuser:datahub"
                }
              }
            }
          }
        ]
      }
    }
  }
}'

Example Request to Create a Dashboard with the Chart

curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
  "entity": {
    "value": {
      "com.linkedin.metadata.snapshot.DashboardSnapshot": {
        "urn": "urn:li:dashboard:(looker,foo_dashboard)",
        "aspects": [
          {
            "com.linkedin.dashboard.DashboardInfo": {
              "title": "foo dashboard",
              "description": "This is the foo dashboard",
              "charts": [
                "urn:li:chart:(looker,foo_chart)"
              ],
              "dashboardUrl": "http://example.com/foo_dashboard",
              "lastModified": {
                "created": {
                  "time": 0,
                  "actor": "urn:li:corpuser:jdoe"
                },
                "lastModified": {
                  "time": 0,
                  "actor": "urn:li:corpuser:datahub"
                }
              }
            }
          }
        ]
      }
    }
  }
}'

This approach should help you create a dashboard that indirectly links to a dataset through a chart. Would you like more detailed guidance on any of these steps?

<@U06TM7M40RK>, how do I make these requests using the aspect ingestion endpoint?

To create a dashboard and a chart using the ingestProposal endpoint for ingesting aspects, you need to send separate requests for each aspect you want to ingest. Here’s how you can do it:

Step 1: Ingest the Chart Aspect

First, create a chart and link it to the dataset using the ingestProposal endpoint.

curl --location --request POST 'http://localhost:8080/aspects?action=ingestProposal' \
--header 'X-RestLi-Protocol-Version: 2.0.0' \
--header 'Content-Type: application/json' \
--data-raw '{
  "proposal": {
    "entityType": "chart",
    "entityUrn": "urn:li:chart:(looker,foo_chart)",
    "changeType": "UPSERT",
    "aspectName": "chartInfo",
    "aspect": {
      "value": "{\"title\": \"foo chart\", \"description\": \"This is the foo chart\", \"inputs\": [{\"string\": \"urn:li:dataset:(urn:li:dataPlatform:bigquery,mota-engil-datawarehouse.staging.stg_cority__waste_emissions_annual,PROD)\"}], \"lastModified\": {\"created\": {\"time\": 0, \"actor\": \"urn:li:corpuser:jdoe\"}, \"lastModified\": {\"time\": 0, \"actor\": \"urn:li:corpuser:jdoe\"}}}",
      "contentType": "application/json"
    }
  }
}'

Step 2: Ingest the Dashboard Aspect

Next, create a dashboard and associate it with the chart using the ingestProposal endpoint.

curl --location --request POST 'http://localhost:8080/aspects?action=ingestProposal' \
--header 'X-RestLi-Protocol-Version: 2.0.0' \
--header 'Content-Type: application/json' \
--data-raw '{
  "proposal": {
    "entityType": "dashboard",
    "entityUrn": "urn:li:dashboard:(looker,foo_dashboard)",
    "changeType": "UPSERT",
    "aspectName": "dashboardInfo",
    "aspect": {
      "value": "{\"title\": \"foo dashboard\", \"description\": \"This is the foo dashboard\", \"charts\": [\"urn:li:chart:(looker,foo_chart)\"], \"dashboardUrl\": \"http://example.com/foo_dashboard\", \"lastModified\": {\"created\": {\"time\": 0, \"actor\": \"urn:li:corpuser:jdoe\"}, \"lastModified\": {\"time\": 0, \"actor\": \"urn:li:corpuser:jdoe\"}}}",
      "contentType": "application/json"
    }
  }
}'

Explanation:

  • entityType: Specifies the type of entity being ingested, such as chart or dashboard.
  • entityUrn: The unique identifier for the entity.
  • changeType: Indicates the type of change, typically UPSERT for creating or updating.
  • aspectName: The name of the aspect being ingested, such as chartInfo or dashboardInfo.
  • aspect: Contains the aspect payload. The value field holds the aspect serialized as a JSON string (not a nested object), and contentType states its encoding (application/json).

This approach allows you to ingest individual aspects for both the chart and the dashboard using the ingestProposal endpoint. Would you like more detailed guidance on any of these steps?
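
The request body for this endpoint can also be assembled programmatically. The sketch below uses only the Python standard library; build_ingest_proposal is a hypothetical helper, and it assumes that aspect.value is sent as a JSON-encoded string and that chartInfo's lastModified holds a created/lastModified pair of audit stamps.

```python
import json
from typing import Any, Dict


def build_ingest_proposal(
    entity_type: str, entity_urn: str, aspect_name: str, aspect: Dict[str, Any]
) -> Dict[str, Any]:
    """Wrap an aspect dictionary in the body expected by /aspects?action=ingestProposal."""
    return {
        "proposal": {
            "entityType": entity_type,
            "entityUrn": entity_urn,
            "changeType": "UPSERT",
            "aspectName": aspect_name,
            "aspect": {
                # Assumption: the aspect travels as a JSON string, not as a nested object.
                "value": json.dumps(aspect),
                "contentType": "application/json",
            },
        }
    }


if __name__ == "__main__":
    stamp = {"time": 0, "actor": "urn:li:corpuser:jdoe"}
    chart_info = {
        "title": "foo chart",
        "description": "This is the foo chart",
        "inputs": [{"string": "urn:li:dataset:(urn:li:dataPlatform:hive,example_dataset,PROD)"}],
        "lastModified": {"created": stamp, "lastModified": stamp},
    }
    body = build_ingest_proposal("chart", "urn:li:chart:(looker,foo_chart)", "chartInfo", chart_info)
    print(json.dumps(body, indent=2))
```

Letting json.dumps produce the inner string avoids hand-escaping the quotes inside value.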

<@U06TM7M40RK>, I received this error: “Unknown dereferenced type BYTES for DataMap’s schema "bytes"”,“status”:500

I’m not able to find any data sources to answer your question. Could you provide more details about the specific context or operation you were performing when you encountered this error? If this doesn’t help, please reach out to the DataHub support team for further assistance.

<@U06TM7M40RK>, I received this error: “Unknown dereferenced type BYTES for DataMap’s schema "bytes"”,“status”:500, when making a request to ingestProposal endpoint to create a chart

The error “Unknown dereferenced type BYTES for DataMap’s schema "bytes"” most likely points to the shape of the aspect field in the request payload, not to a field of the chart itself. In a Metadata Change Proposal, aspect.value is typed as bytes, so the endpoint expects the aspect serialized as a JSON string. If value is sent as a nested JSON object, the server finds a map where it expects bytes and can fail with this error.

Here are a few steps you can take to troubleshoot and resolve this issue:

  1. Check aspect.value: Make sure value is a single JSON-encoded string with the inner quotes escaped, for example "value": "{\"title\": \"foo chart\"}", rather than a nested object.

  2. Review Aspect Schema: Verify that the serialized aspect matches the schema of the aspect you are ingesting. For chartInfo, lastModified holds two audit stamps, created and lastModified, each with a time and an actor. The DataHub documentation on ingesting aspects (https://datahubproject.io/docs/0.14.0/metadata-service#ingesting-aspects) shows a complete example request.

  3. Check for Known Issues: Review the DataHub GitHub issues (https://github.com/datahub-project/datahub/issues) and the DataHub forum (https://forum.datahubproject.io/) for known issues or discussions about this error.

  4. Provide More Details: If the issue persists, share the exact request payload and your DataHub version. This helps in diagnosing the problem more accurately.

If you would like more detailed guidance on any of these steps, please let me know!
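
A payload can be checked locally before it is sent. The following is a minimal sketch, assuming the error stems from aspect.value being a nested object instead of a JSON string; check_proposal is a hypothetical helper and does not reproduce the server's full validation.

```python
import json
from typing import Any, Dict, List


def check_proposal(body: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an ingestProposal request body."""
    problems: List[str] = []
    proposal = body.get("proposal", {})
    for key in ("entityType", "entityUrn", "changeType", "aspectName", "aspect"):
        if key not in proposal:
            problems.append(f"missing proposal.{key}")
    aspect = proposal.get("aspect")
    value = aspect.get("value") if isinstance(aspect, dict) else None
    if not isinstance(value, str):
        # A dict here is the suspected trigger for the BYTES error.
        problems.append("aspect.value must be a JSON-encoded string, got " + type(value).__name__)
    else:
        try:
            json.loads(value)
        except json.JSONDecodeError:
            problems.append("aspect.value is not valid JSON")
    return problems


if __name__ == "__main__":
    bad = {
        "proposal": {
            "entityType": "chart",
            "entityUrn": "urn:li:chart:(looker,foo_chart)",
            "changeType": "UPSERT",
            "aspectName": "chartInfo",
            "aspect": {"value": {"title": "foo chart"}, "contentType": "application/json"},
        }
    }
    print(check_proposal(bad))
```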

<@U06TM7M40RK>, how to upsert a chart with the Rest.li API?

To upsert a chart using the Rest.li API in DataHub, you can use the ingestProposal endpoint to send a Metadata Change Proposal (MCP) for the chart entity. Here’s a step-by-step guide on how to do this:

Step 1: Prepare the Metadata Change Proposal

You need to create a JSON payload that represents the Metadata Change Proposal for the chart. This includes specifying the entity type, entity URN, change type, aspect name, and the aspect data.

Example JSON Payload

{
  "proposal": {
    "entityType": "chart",
    "entityUrn": "urn:li:chart:(looker,foo_chart)",
    "changeType": "UPSERT",
    "aspectName": "chartInfo",
    "aspect": {
      "value": "{\"title\": \"foo chart\", \"description\": \"This is the foo chart\", \"inputs\": [{\"string\": \"urn:li:dataset:(urn:li:dataPlatform:bigquery,mota-engil-datawarehouse.staging.stg_cority__waste_emissions_annual,PROD)\"}], \"lastModified\": {\"created\": {\"time\": 0, \"actor\": \"urn:li:corpuser:jdoe\"}, \"lastModified\": {\"time\": 0, \"actor\": \"urn:li:corpuser:jdoe\"}}}",
      "contentType": "application/json"
    }
  }
}

Step 2: Send the Request

Use curl or any HTTP client to send a POST request to the ingestProposal endpoint with the JSON payload.

curl --location --request POST 'http://localhost:8080/aspects?action=ingestProposal' \
--header 'X-RestLi-Protocol-Version: 2.0.0' \
--header 'Content-Type: application/json' \
--data-raw '{
  "proposal": {
    "entityType": "chart",
    "entityUrn": "urn:li:chart:(looker,foo_chart)",
    "changeType": "UPSERT",
    "aspectName": "chartInfo",
    "aspect": {
      "value": "{\"title\": \"foo chart\", \"description\": \"This is the foo chart\", \"inputs\": [{\"string\": \"urn:li:dataset:(urn:li:dataPlatform:bigquery,mota-engil-datawarehouse.staging.stg_cority__waste_emissions_annual,PROD)\"}], \"lastModified\": {\"created\": {\"time\": 0, \"actor\": \"urn:li:corpuser:jdoe\"}, \"lastModified\": {\"time\": 0, \"actor\": \"urn:li:corpuser:jdoe\"}}}",
      "contentType": "application/json"
    }
  }
}'

Explanation:

  • entityType: Specifies the type of entity, which is chart in this case.
  • entityUrn: The unique identifier for the chart.
  • changeType: Indicates the type of change; UPSERT inserts the aspect or updates it if it already exists.
  • aspectName: The name of the aspect being ingested, such as chartInfo.
  • aspect: Contains the aspect payload. The value field holds the aspect serialized as a JSON string (not a nested object), and contentType states its encoding (application/json).

This request will upsert the chart entity in DataHub with the specified details.
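
The same POST can be prepared from Python with the standard library instead of curl. This is a sketch under assumptions: proposal_request is a hypothetical helper, gms_url stands for your DataHub GMS address, and the function only builds the request object; sending it with urllib.request.urlopen is left to the caller.

```python
import json
import urllib.request
from typing import Any, Dict


def proposal_request(gms_url: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST request for the ingestProposal endpoint."""
    return urllib.request.Request(
        url=f"{gms_url.rstrip('/')}/aspects?action=ingestProposal",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-RestLi-Protocol-Version": "2.0.0",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    body = {
        "proposal": {
            "entityType": "chart",
            "entityUrn": "urn:li:chart:(looker,foo_chart)",
            "changeType": "UPSERT",
            "aspectName": "chartInfo",
            "aspect": {
                "value": json.dumps({"title": "foo chart", "description": "This is the foo chart"}),
                "contentType": "application/json",
            },
        }
    }
    request = proposal_request("http://localhost:8080", body)
    print(request.get_method(), request.full_url)
    # To send it: urllib.request.urlopen(request)
```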

Would you like more detailed guidance on any of these steps?

<@U06TM7M40RK>, how to upsert a chart with entities endpoint?

To upsert a chart using the /entities endpoint in DataHub, you can send a POST request to the /entities?action=ingest endpoint with the appropriate JSON payload. This payload should include the chart’s URN and its aspects, such as ChartInfo.

Here’s a step-by-step guide on how to do this:

Step 1: Prepare the JSON Payload

You need to create a JSON payload that represents the chart entity, including its URN and aspects.

Example JSON Payload

{
  "entity": {
    "value": {
      "com.linkedin.metadata.snapshot.ChartSnapshot": {
        "urn": "urn:li:chart:(looker,foo_chart)",
        "aspects": [
          {
            "com.linkedin.chart.ChartInfo": {
              "title": "foo chart",
              "description": "This is the foo chart",
              "inputs": [
                {
                  "string": "urn:li:dataset:(urn:li:dataPlatform:bigquery,mota-engil-datawarehouse.staging.stg_cority__waste_emissions_annual,PROD)"
                }
              ],
              "lastModified": {
                "created": {
                  "time": 0,
                  "actor": "urn:li:corpuser:jdoe"
                },
                "lastModified": {
                  "time": 0,
                  "actor": "urn:li:corpuser:datahub"
                }
              }
            }
          }
        ]
      }
    }
  }
}

Step 2: Send the Request

Use curl or any HTTP client to send a POST request to the /entities?action=ingest endpoint with the JSON payload.

curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
  "entity": {
    "value": {
      "com.linkedin.metadata.snapshot.ChartSnapshot": {
        "urn": "urn:li:chart:(looker,foo_chart)",
        "aspects": [
          {
            "com.linkedin.chart.ChartInfo": {
              "title": "foo chart",
              "description": "This is the foo chart",
              "inputs": [
                {
                  "string": "urn:li:dataset:(urn:li:dataPlatform:bigquery,mota-engil-datawarehouse.staging.stg_cority__waste_emissions_annual,PROD)"
                }
              ],
              "lastModified": {
                "created": {
                  "time": 0,
                  "actor": "urn:li:corpuser:jdoe"
                },
                "lastModified": {
                  "time": 0,
                  "actor": "urn:li:corpuser:datahub"
                }
              }
            }
          }
        ]
      }
    }
  }
}'
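
For several charts, the snapshot body can be generated instead of written by hand. The sketch below mirrors the payload shown above using plain dictionaries; chart_snapshot_body is a hypothetical helper, and the structure it emits is copied from this example request, not taken from a DataHub client library.

```python
import json
from typing import Any, Dict, List


def chart_snapshot_body(
    chart_urn: str, title: str, description: str, dataset_urns: List[str], actor: str
) -> Dict[str, Any]:
    """Build the /entities?action=ingest body for a chart with a ChartInfo aspect."""
    chart_info = {
        "title": title,
        "description": description,
        # Each input is wrapped in a {"string": ...} object, as in the request above.
        "inputs": [{"string": urn} for urn in dataset_urns],
        "lastModified": {
            "created": {"time": 0, "actor": actor},
            "lastModified": {"time": 0, "actor": actor},
        },
    }
    snapshot = {"urn": chart_urn, "aspects": [{"com.linkedin.chart.ChartInfo": chart_info}]}
    return {"entity": {"value": {"com.linkedin.metadata.snapshot.ChartSnapshot": snapshot}}}


if __name__ == "__main__":
    body = chart_snapshot_body(
        "urn:li:chart:(looker,foo_chart)",
        "foo chart",
        "This is the foo chart",
        ["urn:li:dataset:(urn:li:dataPlatform:hive,example_dataset,PROD)"],
        "urn:li:corpuser:jdoe",
    )
    print(json.dumps(body, indent=2))
```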