<@U06TM7M40RK>, how to manually create assets in Datahub? I want to create a dashboard and link to a dataset.
To manually create assets in DataHub, such as a dashboard linked to a dataset, you can use the DataHub API or the Python SDK. Here’s a general approach:
Using the DataHub API
- Create a Dataset: Send a POST request to the /entities?action=ingest endpoint. Define the dataset’s URN and its aspects, such as schema metadata.
- Create a Dashboard: Send a POST request to the same endpoint. Define the dashboard’s URN and its aspects, such as ownership and properties.
- Link the Dashboard to the Dataset: List the dataset URN in the datasets field of the dashboard’s DashboardInfo aspect, or create a chart whose ChartInfo inputs field references the dataset and add that chart to the dashboard’s charts field.
Using the Python SDK
- Install the DataHub Python SDK: Ensure you have the acryl-datahub package installed.
pip install 'acryl-datahub[datahub-rest]'
- Initialize the REST Emitter: Set up the REST emitter to communicate with your DataHub instance.
from datahub.emitter.rest_emitter import DatahubRestEmitter

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
- Create and Emit Dataset: Define the dataset’s schema and emit it to DataHub.
from datahub.emitter.mce_builder import make_data_platform_urn, make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.metadata.schema_classes import (
    AuditStampClass,
    OtherSchemaClass,
    SchemaFieldClass,
    SchemaFieldDataTypeClass,
    SchemaMetadataClass,
    StringTypeClass,
)

dataset_urn = make_dataset_urn(platform="hive", name="example_dataset", env="PROD")
audit_stamp = AuditStampClass(time=1640692800000, actor="urn:li:corpuser:ingestion")

schema_fields = [
    SchemaFieldClass(
        fieldPath="field1",
        type=SchemaFieldDataTypeClass(type=StringTypeClass()),
        nativeDataType="VARCHAR(50)",
        description="Description of field1",
        lastModified=audit_stamp,
    )
]
schema_metadata = SchemaMetadataClass(
    schemaName="example_schema",
    platform=make_data_platform_urn("hive"),  # a platform URN, not a dataset URN
    version=0,
    hash="",
    platformSchema=OtherSchemaClass(rawSchema=""),
    fields=schema_fields,
    lastModified=audit_stamp,
)

# Aspects are emitted wrapped in a proposal that ties them to the entity URN.
emitter.emit(MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=schema_metadata))
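- Create and Emit Dashboard: Define the dashboard and emit it to DataHub.
from datahub.emitter.mce_builder import make_dashboard_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.metadata.schema_classes import (
    AuditStampClass,
    ChangeAuditStampsClass,
    DashboardInfoClass,
)

# Dashboard URNs have no environment component, so no env argument is passed.
dashboard_urn = make_dashboard_urn(platform="looker", name="example_dashboard")
audit_stamp = AuditStampClass(time=1640692800000, actor="urn:li:corpuser:ingestion")

dashboard_info = DashboardInfoClass(
    title="Example Dashboard",
    description="This is an example dashboard",
    charts=[],
    # lastModified is a ChangeAuditStamps record, not a single AuditStamp.
    lastModified=ChangeAuditStampsClass(created=audit_stamp, lastModified=audit_stamp),
)
emitter.emit(MetadataChangeProposalWrapper(entityUrn=dashboard_urn, aspect=dashboard_info))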
- Link Dashboard to Dataset: List the dataset URN in the datasets field of the DashboardInfo aspect and emit the aspect again.
from datahub.emitter.mcp import MetadataChangeProposalWrapper

# DashboardInfo lists the datasets a dashboard reads from in its datasets field.
dashboard_info.datasets = [dataset_urn]
emitter.emit(MetadataChangeProposalWrapper(entityUrn=dashboard_urn, aspect=dashboard_info))
This approach allows you to manually create and link assets in DataHub using the API or Python SDK. Would you like more detailed instructions or have any specific questions about this process?
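As a rough illustration of the URNs mentioned above, the following standard-library sketch builds dataset and dashboard URN strings in the formats that appear in this thread. The helper names build_dataset_urn and build_dashboard_urn are hypothetical; with the SDK installed, make_dataset_urn and make_dashboard_urn are the usual way to produce them.

```python
def build_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Dataset URNs embed the data platform URN, the dataset name, and the environment."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def build_dashboard_urn(platform: str, dashboard_id: str) -> str:
    """Dashboard URNs carry only the tool name and a dashboard id (no environment)."""
    return f"urn:li:dashboard:({platform},{dashboard_id})"


if __name__ == "__main__":
    print(build_dataset_urn("hive", "example_dataset"))
    print(build_dashboard_urn("looker", "example_dashboard"))
```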
Sources:
- <https://github.com/datahub-project/datahub/blob/master/docs/api/restli/restli-overview.md|Rest.li API Overview>
Hint: Mention @runllm in the thread for followups.
<@U06TM7M40RK>, create to me a request to the API to create a dashboard with name foo dashboard
that reads from table urn:li:dataset:(urn:li:dataPlatform:bigquery,mota-engil-datawarehouse.staging.stg_cority__waste_emissions_annual,PROD)
To create a dashboard named foo dashboard that reads from a specific dataset using the DataHub API, you can send a POST request to the /entities?action=ingest endpoint. Below is an example of how you can structure the request using curl:
curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
"entity": {
"value": {
"com.linkedin.metadata.snapshot.DashboardSnapshot": {
"urn": "urn:li:dashboard:(looker,foo_dashboard)",
"aspects": [
{
"com.linkedin.dashboard.DashboardInfo": {
"title": "foo dashboard",
"description": "This is the foo dashboard",
"charts": [],
"inputs": [
{
"string": "urn:li:dataset:(urn:li:dataPlatform:bigquery,mota-engil-datawarehouse.staging.stg_cority__waste_emissions_annual,PROD)"
}
],
"lastModified": {
"created": {
"time": 0,
"actor": "urn:li:corpuser:jdoe"
},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:datahub"
}
}
}
}
]
}
}
}
}'
Explanation:
- URN: The URN for the dashboard is specified as urn:li:dashboard:(looker,foo_dashboard).
- Title and Description: The dashboard is given the title “foo dashboard” and the description “This is the foo dashboard”.
- Inputs: The inputs field specifies the dataset that the dashboard reads from, using the provided dataset URN.
- Last Modified: The lastModified field records when the dashboard was created and last modified, along with the actors responsible.
This request will create a dashboard in DataHub with the specified name and link it to the given dataset.
Would you like more detailed guidance on any of these steps?
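The time values of 0 in the request are placeholders. Below is a minimal sketch of building the lastModified block with a real timestamp, assuming the time field holds epoch milliseconds. The helper name change_audit_stamps is hypothetical and not part of DataHub.

```python
import json
import time
from typing import Optional


def change_audit_stamps(actor: str, now_ms: Optional[int] = None) -> dict:
    """Return a created/lastModified pair stamped with the same time and actor."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # current time in epoch milliseconds
    stamp = {"time": now_ms, "actor": actor}
    # Copy the stamp so the two entries can later be changed independently.
    return {"created": dict(stamp), "lastModified": dict(stamp)}


if __name__ == "__main__":
    print(json.dumps(change_audit_stamps("urn:li:corpuser:jdoe"), indent=2))
```

The result can be placed under the "lastModified" key of the DashboardInfo or ChartInfo aspect in the request body.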
Sources:
- https://datahubproject.io/docs/0.13.1/metadata-service#create-a-chart
- https://datahubproject.io/docs/0.14.0/metadata-service#create-a-chart
<@U06TM7M40RK>, how to include a link to the dashboard in this request?
To include a link to the dashboard in your request, you can use the dashboardUrl field within the DashboardInfo aspect. This field specifies a URL that serves as an external link to the dashboard.
Here’s how you can modify the previous request to include a link to the dashboard:
curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
"entity": {
"value": {
"com.linkedin.metadata.snapshot.DashboardSnapshot": {
"urn": "urn:li:dashboard:(looker,foo_dashboard)",
"aspects": [
{
"com.linkedin.dashboard.DashboardInfo": {
"title": "foo dashboard",
"description": "This is the foo dashboard",
"charts": [],
"inputs": [
{
"string": "urn:li:dataset:(urn:li:dataPlatform:bigquery,mota-engil-datawarehouse.staging.stg_cority__waste_emissions_annual,PROD)"
}
],
"dashboardUrl": "http://example.com/foo_dashboard",
"lastModified": {
"created": {
"time": 0,
"actor": "urn:li:corpuser:jdoe"
},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:datahub"
}
}
}
}
]
}
}
}
}'
Explanation:
- dashboardUrl: This field is added to the DashboardInfo aspect to specify the URL of the dashboard. Replace "http://example.com/foo_dashboard" with the actual URL you want to use.
This modification will ensure that the dashboard entity in DataHub includes a link that users can click to access the dashboard directly.
Would you like more detailed guidance on any of these steps?
Sources:
- https://datahubproject.io/docs/0.13.1/python-sdk/models#datahub.metadata.schema_classes.DashboardInfoClass|DashboardInfoClass
<@U06TM7M40RK>, the api doesn’t accept “inputs” in a dashboard, should I create a chart to link to the dataset?
The DashboardInfo aspect has no inputs field, which is why the request is rejected. One option is to create a chart entity that references the dataset and then associate the chart with the dashboard. This matches the typical structure in which dashboards are collections of charts, and charts are the entities that directly reference datasets. DashboardInfo also has a datasets field that takes dataset URNs directly, if you do not need a chart.
Here’s how you can proceed:
- Create a Chart: First, create a chart entity that references the dataset. This chart acts as the intermediary between the dataset and the dashboard.
- Link the Chart to the Dataset: Use the inputs field in the ChartInfo aspect to specify the dataset URN.
- Associate the Chart with the Dashboard: Add the chart to the dashboard by including its URN in the charts field of the DashboardInfo aspect.
Example Request to Create a Chart
curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
"entity": {
"value": {
"com.linkedin.metadata.snapshot.ChartSnapshot": {
"urn": "urn:li:chart:(looker,foo_chart)",
"aspects": [
{
"com.linkedin.chart.ChartInfo": {
"title": "foo chart",
"description": "This is the foo chart",
"inputs": [
{
"string": "urn:li:dataset:(urn:li:dataPlatform:bigquery,mota-engil-datawarehouse.staging.stg_cority__waste_emissions_annual,PROD)"
}
],
"lastModified": {
"created": {
"time": 0,
"actor": "urn:li:corpuser:jdoe"
},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:datahub"
}
}
}
}
]
}
}
}
}'
Example Request to Create a Dashboard with the Chart
curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
"entity": {
"value": {
"com.linkedin.metadata.snapshot.DashboardSnapshot": {
"urn": "urn:li:dashboard:(looker,foo_dashboard)",
"aspects": [
{
"com.linkedin.dashboard.DashboardInfo": {
"title": "foo dashboard",
"description": "This is the foo dashboard",
"charts": [
"urn:li:chart:(looker,foo_chart)"
],
"dashboardUrl": "http://example.com/foo_dashboard",
"lastModified": {
"created": {
"time": 0,
"actor": "urn:li:corpuser:jdoe"
},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:datahub"
}
}
}
}
]
}
}
}
}'
This approach should help you create a dashboard that indirectly links to a dataset through a chart. Would you like more detailed guidance on any of these steps?
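The two request bodies can also be generated programmatically. The sketch below only builds the JSON and does not send it. The helper names are hypothetical. It assumes that charts takes plain URN strings while inputs wraps each URN in a "string" key, which should be verified against your DataHub version.

```python
import json


def _stamps(actor: str) -> dict:
    """Created/lastModified pair using the placeholder time 0 from the thread."""
    stamp = {"time": 0, "actor": actor}
    return {"created": dict(stamp), "lastModified": dict(stamp)}


def _snapshot(snapshot_type: str, urn: str, aspect_type: str, aspect: dict) -> dict:
    """Wrap one aspect in the envelope expected by /entities?action=ingest."""
    return {"entity": {"value": {snapshot_type: {"urn": urn, "aspects": [{aspect_type: aspect}]}}}}


def chart_snapshot(chart_urn: str, title: str, dataset_urns: list, actor: str) -> dict:
    info = {
        "title": title,
        "description": f"This is the {title}",
        # Assumption: each input is a union member keyed by "string".
        "inputs": [{"string": urn} for urn in dataset_urns],
        "lastModified": _stamps(actor),
    }
    return _snapshot("com.linkedin.metadata.snapshot.ChartSnapshot", chart_urn,
                     "com.linkedin.chart.ChartInfo", info)


def dashboard_snapshot(dashboard_urn: str, title: str, chart_urns: list,
                       dashboard_url: str, actor: str) -> dict:
    info = {
        "title": title,
        "description": f"This is the {title}",
        # Assumption: charts is a plain array of chart URN strings.
        "charts": list(chart_urns),
        "dashboardUrl": dashboard_url,
        "lastModified": _stamps(actor),
    }
    return _snapshot("com.linkedin.metadata.snapshot.DashboardSnapshot", dashboard_urn,
                     "com.linkedin.dashboard.DashboardInfo", info)


if __name__ == "__main__":
    dataset = ("urn:li:dataset:(urn:li:dataPlatform:bigquery,"
               "mota-engil-datawarehouse.staging.stg_cority__waste_emissions_annual,PROD)")
    chart = chart_snapshot("urn:li:chart:(looker,foo_chart)", "foo chart",
                           [dataset], "urn:li:corpuser:jdoe")
    dashboard = dashboard_snapshot("urn:li:dashboard:(looker,foo_dashboard)", "foo dashboard",
                                   ["urn:li:chart:(looker,foo_chart)"],
                                   "http://example.com/foo_dashboard", "urn:li:corpuser:jdoe")
    print(json.dumps(chart, indent=2))
    print(json.dumps(dashboard, indent=2))
```

Send the chart body first so that the chart exists when the dashboard refers to it.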
Sources:
- <https://github.com/datahub-project/datahub/blob/master/docs/api/restli/restli-overview.md|Rest.li API Overview>
<@U06TM7M40RK>, how to make this requests using the ingesting aspects endpoint?
To create a dashboard and a chart using the ingestProposal endpoint for ingesting aspects, you need to send a separate request for each aspect you want to ingest. Here’s how you can do it:
Step 1: Ingest the Chart Aspect
First, create a chart and link it to the dataset using the ingestProposal endpoint.
curl --location --request POST 'http://localhost:8080/aspects?action=ingestProposal' \
--header 'X-RestLi-Protocol-Version: 2.0.0' \
--header 'Content-Type: application/json' \
--data-raw '{
  "proposal": {
    "entityType": "chart",
    "entityUrn": "urn:li:chart:(looker,foo_chart)",
    "changeType": "UPSERT",
    "aspectName": "chartInfo",
    "aspect": {
      "value": "{\"title\": \"foo chart\", \"description\": \"This is the foo chart\", \"inputs\": [{\"string\": \"urn:li:dataset:(urn:li:dataPlatform:bigquery,mota-engil-datawarehouse.staging.stg_cority__waste_emissions_annual,PROD)\"}], \"lastModified\": {\"created\": {\"time\": 0, \"actor\": \"urn:li:corpuser:jdoe\"}, \"lastModified\": {\"time\": 0, \"actor\": \"urn:li:corpuser:jdoe\"}}}",
      "contentType": "application/json"
    }
  }
}'
Step 2: Ingest the Dashboard Aspect
Next, create a dashboard and associate it with the chart using the ingestProposal endpoint.
curl --location --request POST 'http://localhost:8080/aspects?action=ingestProposal' \
--header 'X-RestLi-Protocol-Version: 2.0.0' \
--header 'Content-Type: application/json' \
--data-raw '{
  "proposal": {
    "entityType": "dashboard",
    "entityUrn": "urn:li:dashboard:(looker,foo_dashboard)",
    "changeType": "UPSERT",
    "aspectName": "dashboardInfo",
    "aspect": {
      "value": "{\"title\": \"foo dashboard\", \"description\": \"This is the foo dashboard\", \"charts\": [\"urn:li:chart:(looker,foo_chart)\"], \"dashboardUrl\": \"http://example.com/foo_dashboard\", \"lastModified\": {\"created\": {\"time\": 0, \"actor\": \"urn:li:corpuser:jdoe\"}, \"lastModified\": {\"time\": 0, \"actor\": \"urn:li:corpuser:jdoe\"}}}",
      "contentType": "application/json"
    }
  }
}'
Explanation:
- entityType: Specifies the type of entity being ingested, such as chart or dashboard.
- entityUrn: The unique identifier for the entity.
- changeType: Indicates the type of change, typically UPSERT for creating or updating.
- aspectName: The name of the aspect being ingested, such as chartInfo or dashboardInfo.
- aspect: Contains the details of the aspect, including the value (the aspect serialized as a JSON string) and contentType.
This approach allows you to ingest individual aspects for both the chart and the dashboard using the ingestProposal endpoint. Would you like more detailed guidance on any of these steps?
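A minimal sketch of assembling these ingestProposal bodies in Python follows. It assumes that aspect.value must be sent as a JSON-encoded string rather than a nested object. The helper name build_proposal is hypothetical, and nothing is sent over the network.

```python
import json


def build_proposal(entity_type: str, entity_urn: str, aspect_name: str, aspect: dict) -> dict:
    """Build the body for POST /aspects?action=ingestProposal."""
    return {
        "proposal": {
            "entityType": entity_type,
            "entityUrn": entity_urn,
            "changeType": "UPSERT",
            "aspectName": aspect_name,
            "aspect": {
                # Assumption: the endpoint expects the aspect as a serialized JSON string.
                "value": json.dumps(aspect),
                "contentType": "application/json",
            },
        }
    }


if __name__ == "__main__":
    stamp = {"time": 0, "actor": "urn:li:corpuser:jdoe"}
    chart_body = build_proposal(
        "chart",
        "urn:li:chart:(looker,foo_chart)",
        "chartInfo",
        {
            "title": "foo chart",
            "description": "This is the foo chart",
            "inputs": [{"string": "urn:li:dataset:(urn:li:dataPlatform:bigquery,"
                                  "mota-engil-datawarehouse.staging."
                                  "stg_cority__waste_emissions_annual,PROD)"}],
            "lastModified": {"created": stamp, "lastModified": stamp},
        },
    )
    print(json.dumps(chart_body, indent=2))
```

Calling json.dumps on the whole body afterwards escapes the inner quotes, which yields the same shape as a hand-written curl payload with an escaped value string.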
Sources:
- <https://datahubproject.io/docs/0.14.0/metadata-service#ingesting-aspects|Ingesting Aspects>
<@U06TM7M40RK>, I received this error: “Unknown dereferenced type BYTES for DataMap’s schema "bytes"”,“status”:500
I’m not able to find any data sources to answer your question. Could you provide more details about the specific context or operation you were performing when you encountered this error? If this doesn’t help, please reach out to the DataHub support team for further assistance.
<@U06TM7M40RK>, I received this error: “Unknown dereferenced type BYTES for DataMap’s schema "bytes"”,“status”:500, when making a request to ingestProposal endpoint to create a chart
The error “Unknown dereferenced type BYTES for DataMap’s schema "bytes"” most likely points to the aspect.value field of your request payload. In a Metadata Change Proposal, aspect.value is typed as bytes, so the ingestProposal endpoint expects it to be a string containing the serialized aspect JSON. Sending a nested JSON object in that field fails schema validation with this error.
Here are a few steps you can take to troubleshoot and resolve this issue:
- Check aspect.value: Make sure it is a JSON string with escaped quotes (for example "{\"title\": \"foo chart\", ...}"), not a nested object.
- Review Aspect Schema: Verify that the serialized aspect matches the schema of the aspect named in aspectName. For chartInfo, lastModified is a record with created and lastModified entries, each holding time and actor. You can refer to the <https://datahubproject.io/docs/0.14.0/metadata-service#ingesting-aspects|DataHub documentation> for an example payload.
- Check for Known Issues: Review the <https://github.com/datahub-project/datahub/issues|DataHub GitHub issues> and <https://forum.datahubproject.io/|DataHub forum> for discussions of similar errors.
- Provide More Details: If the issue persists, share the full request payload and the aspect you are ingesting so the problem can be diagnosed more accurately.
If you would like more detailed guidance on any of these steps, please let me know!
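One way to guard against this error is to serialize the value before sending, sketched here under the assumption that the error is caused by aspect.value arriving as a nested object. The helper name normalize_proposal is hypothetical.

```python
import copy
import json


def normalize_proposal(payload: dict) -> dict:
    """Return a copy of the payload whose aspect.value is a JSON string."""
    fixed = copy.deepcopy(payload)  # leave the caller's payload untouched
    aspect = fixed["proposal"]["aspect"]
    if not isinstance(aspect["value"], str):
        aspect["value"] = json.dumps(aspect["value"])
    aspect.setdefault("contentType", "application/json")
    return fixed


if __name__ == "__main__":
    broken = {
        "proposal": {
            "entityType": "chart",
            "entityUrn": "urn:li:chart:(looker,foo_chart)",
            "changeType": "UPSERT",
            "aspectName": "chartInfo",
            "aspect": {"value": {"title": "foo chart", "description": "This is the foo chart"}},
        }
    }
    print(json.dumps(normalize_proposal(broken), indent=2))
```

A payload whose value is already a string passes through unchanged, so the helper is safe to apply to every request.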
Sources:
- <https://datahubproject.io/docs/0.14.0/metadata-service#ingesting-aspects|DataHub documentation>
<@U06TM7M40RK>, how to upsert a chart with rest.li API?
To upsert a chart using the Rest.li API in DataHub, you can use the ingestProposal endpoint to send a Metadata Change Proposal (MCP) for the chart entity. Here’s a step-by-step guide on how to do this:
Step 1: Prepare the Metadata Change Proposal
You need to create a JSON payload that represents the Metadata Change Proposal for the chart. This includes specifying the entity type, entity URN, change type, aspect name, and the aspect data.
Example JSON Payload
{
  "proposal": {
    "entityType": "chart",
    "entityUrn": "urn:li:chart:(looker,foo_chart)",
    "changeType": "UPSERT",
    "aspectName": "chartInfo",
    "aspect": {
      "value": "{\"title\": \"foo chart\", \"description\": \"This is the foo chart\", \"inputs\": [{\"string\": \"urn:li:dataset:(urn:li:dataPlatform:bigquery,mota-engil-datawarehouse.staging.stg_cority__waste_emissions_annual,PROD)\"}], \"lastModified\": {\"created\": {\"time\": 0, \"actor\": \"urn:li:corpuser:jdoe\"}, \"lastModified\": {\"time\": 0, \"actor\": \"urn:li:corpuser:jdoe\"}}}",
      "contentType": "application/json"
    }
  }
}
Step 2: Send the Request
Use curl or any HTTP client to send a POST request to the ingestProposal endpoint with the JSON payload.
curl --location --request POST 'http://localhost:8080/aspects?action=ingestProposal' \
--header 'X-RestLi-Protocol-Version: 2.0.0' \
--header 'Content-Type: application/json' \
--data-raw '{
  "proposal": {
    "entityType": "chart",
    "entityUrn": "urn:li:chart:(looker,foo_chart)",
    "changeType": "UPSERT",
    "aspectName": "chartInfo",
    "aspect": {
      "value": "{\"title\": \"foo chart\", \"description\": \"This is the foo chart\", \"inputs\": [{\"string\": \"urn:li:dataset:(urn:li:dataPlatform:bigquery,mota-engil-datawarehouse.staging.stg_cority__waste_emissions_annual,PROD)\"}], \"lastModified\": {\"created\": {\"time\": 0, \"actor\": \"urn:li:corpuser:jdoe\"}, \"lastModified\": {\"time\": 0, \"actor\": \"urn:li:corpuser:jdoe\"}}}",
      "contentType": "application/json"
    }
  }
}'
Explanation:
- entityType: Specifies the type of entity, which is chart in this case.
- entityUrn: The unique identifier for the chart.
- changeType: Indicates the type of change; UPSERT inserts or updates the entity.
- aspectName: The name of the aspect being ingested, such as chartInfo.
- aspect: Contains the details of the aspect, including the value (the aspect serialized as a JSON string) and contentType.
This request will upsert the chart entity in DataHub with the specified details.
Would you like more detailed guidance on any of these steps?
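The same request can be prepared from Python with only the standard library. This sketch builds the request without sending it. The GMS address http://localhost:8080 is taken from the curl example, and prepare_ingest_request is a hypothetical name.

```python
import json
import urllib.request

GMS_URL = "http://localhost:8080"  # assumed local GMS address, as in the curl example


def prepare_ingest_request(proposal_body: dict, gms_url: str = GMS_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the ingestProposal endpoint."""
    return urllib.request.Request(
        url=f"{gms_url}/aspects?action=ingestProposal",
        data=json.dumps(proposal_body).encode("utf-8"),
        headers={
            "X-RestLi-Protocol-Version": "2.0.0",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    body = {
        "proposal": {
            "entityType": "chart",
            "entityUrn": "urn:li:chart:(looker,foo_chart)",
            "changeType": "UPSERT",
            "aspectName": "chartInfo",
            "aspect": {
                "value": json.dumps({"title": "foo chart", "description": "This is the foo chart"}),
                "contentType": "application/json",
            },
        }
    }
    request = prepare_ingest_request(body)
    print(request.get_method(), request.full_url)
```

To actually send it, pass the returned object to urllib.request.urlopen and check the response status.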
Sources:
- <https://github.com/datahub-project/datahub/blob/master/docs/api/restli/restli-overview.md|Ingesting Aspects>
<@U06TM7M40RK>, how to upsert a chart with entities endpoint?
To upsert a chart using the /entities endpoint in DataHub, you can send a POST request to the /entities?action=ingest endpoint with the appropriate JSON payload. This payload should include the chart’s URN and its aspects, such as ChartInfo.
Here’s a step-by-step guide on how to do this:
Step 1: Prepare the JSON Payload
You need to create a JSON payload that represents the chart entity, including its URN and aspects.
Example JSON Payload
{
"entity": {
"value": {
"com.linkedin.metadata.snapshot.ChartSnapshot": {
"urn": "urn:li:chart:(looker,foo_chart)",
"aspects": [
{
"com.linkedin.chart.ChartInfo": {
"title": "foo chart",
"description": "This is the foo chart",
"inputs": [
{
"string": "urn:li:dataset:(urn:li:dataPlatform:bigquery,mota-engil-datawarehouse.staging.stg_cority__waste_emissions_annual,PROD)"
}
],
"lastModified": {
"created": {
"time": 0,
"actor": "urn:li:corpuser:jdoe"
},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:datahub"
}
}
}
}
]
}
}
}
}
Step 2: Send the Request
Use curl or any HTTP client to send a POST request to the /entities?action=ingest endpoint with the JSON payload.
curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
"entity": {
"value": {
"com.linkedin.metadata.snapshot.ChartSnapshot": {
"urn": "urn:li:chart:(looker,foo_chart)",
"aspects": [
{
"com.linkedin.chart.ChartInfo": {
"title": "foo chart",
"description": "This is the foo chart",
"inputs": [
{
"string": "urn:li:dataset:(urn:li:dataPlatform:bigquery,mota-engil-datawarehouse.staging.stg_cority__waste_emissions_annual,PROD)"
}
],
"lastModified": {
"created": {
"time": 0,
"actor": "urn:li:corpuser:jdoe"
},
"lastModified": {
"time": 0,
"actor": "urn:li:corpuser:datahub"
}
}
}
}
]
}
}
}
}'