<@U06TM7M40RK> Is there any API to scrape ingestion-related details for monitoring?
To scrape ingestion-related details for monitoring in DataHub, you can use the DataHub GraphQL API. Specifically, the `ingestionSourceExecutionRequests` query can be used to fetch details about ingestion runs, including their status. Here’s a brief overview of how you can achieve this:
- GraphQL Query: Use the `ingestionSourceExecutionRequests` query to retrieve information about ingestion runs. This query will provide details such as the run ID, start time, end time, and status.
- Example GraphQL Query:
```graphql
{
  ingestionSourceExecutionRequests(input: {start: 0, count: 10}) {
    start
    count
    total
    executionRequests {
      id
      status
      startTime
      endTime
      ingestionSource {
        name
      }
    }
  }
}
```
- Using `curl`: You can execute the GraphQL query using `curl` as follows:
```bash
curl -X POST "http://localhost:8080/api/graphql" \
  -H "Content-Type: application/json" \
  -d '{ "query": "{ ingestionSourceExecutionRequests(input: {start: 0, count: 10}) { start count total executionRequests { id status startTime endTime ingestionSource { name } } } }" }'
```
- Using Python: You can also use Python to execute the GraphQL query:
```python
import requests

url = "http://localhost:8080/api/graphql"
query = {
    "query": """
    {
      ingestionSourceExecutionRequests(input: {start: 0, count: 10}) {
        start
        count
        total
        executionRequests {
          id
          status
          startTime
          endTime
          ingestionSource {
            name
          }
        }
      }
    }
    """
}

response = requests.post(url, json=query)
print(response.json())
```
This approach allows you to programmatically monitor the status of ingestion processes in DataHub.
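For monitoring, the raw response usually needs to be reduced to a few numbers. The following is a minimal sketch of that step, assuming the response has the shape implied by the query above. The helper name `summarize_runs`, the sample payload, and the `"FAILURE"` status string are assumptions for illustration; check the field names and status values against the GraphQL schema of your DataHub version.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_runs(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Count runs per status and collect the IDs of failed runs."""
    block = (payload.get("data") or {}).get("ingestionSourceExecutionRequests") or {}
    runs: List[Dict[str, Any]] = block.get("executionRequests") or []
    counts = Counter(run.get("status", "UNKNOWN") for run in runs)
    # "FAILURE" is an assumed status value; adjust it to what your instance returns.
    failed = [run.get("id") for run in runs if run.get("status") == "FAILURE"]
    return {
        "total": block.get("total", len(runs)),
        "by_status": dict(counts),
        "failed_ids": failed,
    }


# Hypothetical response, shaped like the query above.
sample = {
    "data": {
        "ingestionSourceExecutionRequests": {
            "start": 0,
            "count": 10,
            "total": 3,
            "executionRequests": [
                {"id": "run-1", "status": "SUCCESS", "ingestionSource": {"name": "hive"}},
                {"id": "run-2", "status": "FAILURE", "ingestionSource": {"name": "kafka"}},
                {"id": "run-3", "status": "SUCCESS", "ingestionSource": {"name": "hive"}},
            ],
        }
    }
}

summary = summarize_runs(sample)
print(summary)
```

In practice you would pass `response.json()` from the Python example to `summarize_runs` and alert when `failed_ids` is not empty.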
Would you like more detailed steps or additional examples?
Sources:
- Slack Discussion on Ingestion Status
Hint: Mention @runllm in the thread for followups.
<@U06TM7M40RK> Is this data exposed by default?
By default, DataHub’s GraphQL API is designed to expose metadata-related information, including ingestion details, to users who have the appropriate permissions. However, access to this data is controlled by DataHub’s security and access control mechanisms. Here are some key points to consider:
- Access Control: Access to the GraphQL API and the data it exposes is typically governed by user roles and permissions. Users need to have the necessary permissions to query ingestion-related data.
- Authentication: The GraphQL API is usually protected by authentication mechanisms. Users must authenticate themselves to access the API, ensuring that only authorized users can retrieve sensitive information.
- Configuration: The level of detail and the specific data exposed by the API can be configured by the DataHub administrators. This means that while the API is capable of exposing ingestion details, the actual data available to a user depends on the configuration and permissions set by the administrators.
- Security Best Practices: It is recommended to follow security best practices, such as using secure connections (HTTPS), managing API keys securely, and regularly reviewing user permissions to ensure that only authorized users have access to sensitive data.
If you have specific concerns about data exposure or need to configure access controls, it would be best to consult with your DataHub administrator or refer to the DataHub documentation for guidance on securing the API.
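When authentication is enabled, each API call has to carry a token. Here is a minimal sketch of an authenticated GraphQL request built with the Python standard library. The helper name `build_graphql_request` and the `datahub.example.com` URL are hypothetical, and the `Bearer` header scheme assumes token-based authentication is turned on for your instance.

```python
import json
import urllib.request


def build_graphql_request(url: str, query: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a bearer token."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_graphql_request(
    "https://datahub.example.com/api/graphql",  # hypothetical HTTPS endpoint
    "{ __typename }",  # trivial query, useful as a connectivity check
    "<your-personal-access-token>",
)
print(request.get_method(), request.full_url)

# To actually send it:
# with urllib.request.urlopen(request, timeout=10) as resp:
#     print(json.load(resp))
```

A request without a valid token should then be rejected by the server, which is a quick way to confirm that the data is not exposed anonymously.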
Would you like more information on configuring access controls or any other specific aspect of DataHub’s security?
Sources:
- <https://datahubspace.slack.com/archives/C029A3M079U/p1710166913.482859>
- <https://datahubspace.slack.com/archives/CUMUWQU66/p1693832029.210239>
<@U06TM7M40RK> How do I use DataHubGraph?
To use `DataHubGraph` in Python, you need to interact with DataHub’s metadata graph through its API. `DataHubGraph` is part of the DataHub Python SDK, which allows you to programmatically interact with DataHub’s metadata. Here’s a step-by-step guide on how to use it:
- Installation: Ensure you have the DataHub Python SDK installed. You can install it using pip:
```bash
pip install --upgrade acryl-datahub
```
- Configuration: Set up the `DataHubGraph` client with the necessary configuration, including the server endpoint and authentication token.
- Example Usage: Below is an example of how to use `DataHubGraph` to fetch metadata about a dataset:
```python
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
from datahub.metadata.schema_classes import SchemaMetadataClass

# Set the GMS endpoint and your personal access token
gms_endpoint = "http://localhost:8080"
access_token = "<your-personal-access-token>"

# Configure the DataHubGraph client with the access token
config = DatahubClientConfig(
    server=gms_endpoint,
    extra_headers={"Authorization": f"Bearer {access_token}"},
)
graph = DataHubGraph(config)

# Example: Fetch dataset metadata
dataset_urn = "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)"
dataset_metadata = graph.get_aspects_for_entity(
    entity_urn=dataset_urn,
    aspects=["schemaMetadata"],
    # Each aspect name needs its matching aspect class
    aspect_types=[SchemaMetadataClass],
)
print(dataset_metadata)
```
- Authentication: Make sure to replace `<your-personal-access-token>` with a valid token. You can generate a personal access token from your DataHub instance.
- Operations: You can perform various operations using `DataHubGraph`, such as fetching metadata, updating entities, and querying relationships.
Would you like more detailed steps or specific examples on using `DataHubGraph` for other operations?
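The dataset URN in the example follows the pattern `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)`. Below is a minimal sketch of assembling one by hand. `build_dataset_urn` is a hypothetical helper written for illustration; the SDK ships `make_dataset_urn` in `datahub.emitter.mce_builder` for the same purpose, and that is likely the better choice in real code.

```python
def build_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Assemble a dataset URN from its three parts (illustrative helper)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


urn = build_dataset_urn("hive", "fct_users_created")
print(urn)
```

The resulting string can be passed as `entity_urn` to the client calls shown above.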
Sources:
- Slack Discussion on Using DataHubGraph
<@U06TM7M40RK> {"operationName":"getHighlights","variables":{},"query":"query getHighlights {\n getHighlights {\n value\n title\n body\n __typename\n }\n}\n"}
How do I call this using DataHubGraph?
To call the `getHighlights` query using the `DataHubGraph` client in Python, you need to execute a GraphQL query through the client. Here’s how you can do it:
- Set Up the DataHubGraph Client: Ensure you have the DataHub Python SDK installed and set up the `DataHubGraph` client with the necessary configuration, including the server endpoint and authentication token.
- Execute the GraphQL Query: Use the `execute_graphql` method of the `DataHubGraph` client to execute the `getHighlights` query.
Here is a step-by-step example:
```python
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

# Set the GMS endpoint and your personal access token
gms_endpoint = "http://localhost:8080"
access_token = "<your-personal-access-token>"

# Configure the DataHubGraph client with the access token
config = DatahubClientConfig(
    server=gms_endpoint,
    extra_headers={"Authorization": f"Bearer {access_token}"},
)
graph = DataHubGraph(config)

# Define the GraphQL query for getHighlights
query = """
query getHighlights {
  getHighlights {
    value
    title
    body
    __typename
  }
}
"""

# Execute the GraphQL query
result = graph.execute_graphql(query=query)

# Print the result
print(result)
```
Explanation:
- Configuration: The `DatahubClientConfig` is configured with the GMS endpoint and the `Authorization` header containing the `Bearer` token.
- GraphQL Query: The `getHighlights` query is defined to fetch highlights, including `value`, `title`, `body`, and `__typename`.
- Execution: The `execute_graphql` method executes the query, and the result is then printed.
Make sure to replace `<your-personal-access-token>` with your actual access token.
Would you like more detailed steps or additional examples?
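For monitoring, it can help to flatten the highlights into a simple title-to-value mapping. This is a minimal sketch, assuming `execute_graphql` returns a dict keyed by `getHighlights` whose items carry the fields selected in the query. The helper name `highlights_to_metrics` and the sample data are hypothetical.

```python
from typing import Any, Dict


def highlights_to_metrics(result: Dict[str, Any]) -> Dict[str, int]:
    """Map each highlight title to its value."""
    return {h["title"]: h["value"] for h in result.get("getHighlights") or []}


# Hypothetical result; real titles and values depend on your instance.
sample_result = {
    "getHighlights": [
        {"value": 120, "title": "Datasets", "body": "sample body", "__typename": "Highlight"},
        {"value": 8, "title": "Dashboards", "body": "sample body", "__typename": "Highlight"},
    ]
}

metrics = highlights_to_metrics(sample_result)
print(metrics)
```

The resulting dict can be pushed to whatever metrics backend you already use.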
Sources:
- <https://github.com/datahub-project/datahub/blob/master/datahub-web-react/src/graphql/highlights.graphql|DataHub GraphQL Highlights Query>
<@U06TM7M40RK> {"operationName":"getAnalyticsCharts","variables":{},"query":"query getAnalyticsCharts {\n getAnalyticsCharts {\n groupId\n title\n charts {\n ...analyticsChart\n __typename\n }\n __typename\n }\n}\n\nfragment analyticsChart on AnalyticsChart {\n ... on TimeSeriesChart {\n title\n lines {\n name\n data {\n x\n y\n __typename\n }\n __typename\n }\n dateRange {\n start\n end\n __typename\n }\n interval\n __typename\n }\n ... on BarChart {\n title\n bars {\n name\n segments {\n label\n value\n __typename\n }\n __typename\n }\n __typename\n }\n ... on TableChart {\n title\n columns\n rows {\n values\n cells {\n value\n linkParams {\n searchParams {\n types\n query\n filters {\n field\n values\n __typename\n }\n __typename\n }\n entityProfileParams {\n urn\n type\n __typename\n }\n __typename\n }\n __typename\n }\n __typename\n }\n __typename\n }\n __typename\n}\n"}
To execute the `getAnalyticsCharts` GraphQL query using the `DataHubGraph` client in Python, you can follow these steps. This query retrieves analytics charts, including time series, bar charts, and table charts, from DataHub.
- Set Up the DataHubGraph Client: Ensure you have the DataHub Python SDK installed and configure the `DataHubGraph` client with the necessary server endpoint and authentication token.
- Execute the GraphQL Query: Use the `execute_graphql` method of the `DataHubGraph` client to execute the `getAnalyticsCharts` query.
Here is a step-by-step example:
```python
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

# Set the GMS endpoint and your personal access token
gms_endpoint = "http://localhost:8080"
access_token = "<your-personal-access-token>"

# Configure the DataHubGraph client with the access token
config = DatahubClientConfig(
    server=gms_endpoint,
    extra_headers={"Authorization": f"Bearer {access_token}"},
)
graph = DataHubGraph(config)

# Define the GraphQL query for getAnalyticsCharts
query = """
query getAnalyticsCharts {
  getAnalyticsCharts {
    groupId
    title
    charts {
      ...analyticsChart
      __typename
    }
    __typename
  }
}

fragment analyticsChart on AnalyticsChart {
  ... on TimeSeriesChart {
    title
    lines {
      name
      data {
        x
        y
        __typename
      }
      __typename
    }
    dateRange {
      start
      end
      __typename
    }
    interval
    __typename
  }
  ... on BarChart {
    title
    bars {
      name
      segments {
        label
        value
        __typename
      }
      __typename
    }
    __typename
  }
  ... on TableChart {
    title
    columns
    rows {
      values
      cells {
        value
        linkParams {
          searchParams {
            types
            query
            filters {
              field
              values
              __typename
            }
            __typename
          }
          entityProfileParams {
            urn
            type
            __typename
          }
          __typename
        }
        __typename
      }
      __typename
    }
    __typename
  }
  __typename
}
"""

# Execute the GraphQL query
result = graph.execute_graphql(query=query)

# Print the result
print(result)
```
Explanation:
- Configuration: The `DatahubClientConfig` is configured with the GMS endpoint and the `Authorization` header containing the `Bearer` token.
- GraphQL Query: The `getAnalyticsCharts` query is defined to fetch analytics charts, including time series, bar charts, and table charts.
- Execution: The `execute_graphql` method executes the query, and the result is then printed.
Make sure to replace `<your-personal-access-token>` with your actual access token.
Would you like more detailed steps or additional examples?
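Because each chart comes back tagged with `__typename`, the result can be dispatched or tallied by chart type. Here is a minimal sketch of that, assuming `execute_graphql` returns a dict keyed by `getAnalyticsCharts` with the shape selected in the query. The helper name `count_chart_types` and the sample groups are hypothetical.

```python
from collections import Counter
from typing import Any, Dict


def count_chart_types(result: Dict[str, Any]) -> Dict[str, int]:
    """Tally charts across all groups by their GraphQL type name."""
    counts: Counter = Counter()
    for group in result.get("getAnalyticsCharts") or []:
        for chart in group.get("charts") or []:
            counts[chart.get("__typename", "Unknown")] += 1
    return dict(counts)


# Hypothetical result; real groups and charts depend on your instance.
sample_charts = {
    "getAnalyticsCharts": [
        {
            "groupId": "group-a",
            "title": "Usage",
            "charts": [
                {"__typename": "TimeSeriesChart", "title": "Weekly Active Users", "lines": []},
                {"__typename": "BarChart", "title": "Top Searches", "bars": []},
            ],
        },
        {
            "groupId": "group-b",
            "title": "Entities",
            "charts": [
                {"__typename": "TableChart", "title": "Top Datasets", "columns": [], "rows": []},
                {"__typename": "BarChart", "title": "Entities by Platform", "bars": []},
            ],
        },
    ]
}

chart_counts = count_chart_types(sample_charts)
print(chart_counts)
```

The same loop is a natural place to branch on `__typename` and read `lines`, `bars`, or `rows` for each chart type.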
Sources:
- <https://github.com/datahub-project/datahub/blob/master/datahub-web-react/src/graphql/analytics.graphql|DataHub GraphQL Analytics Charts Query>