<@U06TM7M40RK> graphql to get all s3 assets with a downstream asset
To retrieve all S3 assets with a downstream asset using GraphQL in DataHub, you can use the searchAcrossLineage query. This query allows you to search for entities and their lineage relationships. Here’s a step-by-step guide on how to construct this query:
- Define the Query: Use the searchAcrossLineage query starting from an S3 asset. Specify the urn of that S3 asset and set the direction to DOWNSTREAM to find its downstream assets.
- Set the Filters: The starting S3 asset is identified by its urn, so orFilters apply to the downstream results. You can filter them by platform or any other indexed field.
- Specify the Degree: If you only want direct downstream assets, filter on degree 1. To include indirect downstream assets, add higher degrees to the filter values.
Here is an example GraphQL query:
```graphql
query searchS3AssetsWithDownstream {
  searchAcrossLineage(
    input: {
      urn: "urn:li:dataset:(urn:li:dataPlatform:s3,YourS3AssetName,PROD)"
      query: "*"
      start: 0
      count: 10
      direction: DOWNSTREAM
      orFilters: [
        {
          and: [
            {
              field: "degree"
              condition: EQUAL
              values: ["1"]
            }
          ]
        }
      ]
    }
  ) {
    searchResults {
      entity {
        urn
        type
        ... on Dataset {
          name
        }
      }
      paths {
        path {
          urn
          type
        }
      }
      degree
    }
  }
}
```
Explanation:
- urn: Replace "urn:li:dataset:(urn:li:dataPlatform:s3,YourS3AssetName,PROD)" with the actual URN of your S3 asset.
- direction: Set to DOWNSTREAM to find downstream assets.
- degree: Filtering on 1 returns only direct downstream assets.
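This query returns the entities downstream of the given S3 asset, along with the lineage path to each one and its degree. Because searchAcrossLineage starts from a single URN, it answers the question for one S3 asset at a time. It does not list every S3 asset that has a downstream asset.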
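To run the query outside the GraphiQL console, you can send it as an HTTP POST and pass the URN and degree as GraphQL variables. The sketch below is a minimal Python version that uses only the standard library. Several details are assumptions for illustration, so adjust them for your deployment: the endpoint http://localhost:8080/api/graphql (the usual GMS path), the Bearer-token header, and the helper names make_sender and fetch_downstream. The sketch also pages through results using the total field of the searchAcrossLineage response.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List, Optional, Tuple

# A "sender" takes a GraphQL payload and returns the response's "data" object.
Sender = Callable[[Dict[str, Any]], Dict[str, Any]]

# Same searchAcrossLineage query as above, with URN, degree and paging as variables.
DOWNSTREAM_QUERY = """
query downstreamOf($urn: String!, $degree: String!, $start: Int!, $count: Int!) {
  searchAcrossLineage(input: {
    urn: $urn
    query: "*"
    start: $start
    count: $count
    direction: DOWNSTREAM
    orFilters: [{ and: [{ field: "degree", condition: EQUAL, values: [$degree] }] }]
  }) {
    total
    searchResults { entity { urn type } degree }
  }
}
"""


def make_sender(endpoint: str = "http://localhost:8080/api/graphql",
                token: Optional[str] = None) -> Sender:
    """Build a sender that POSTs to a DataHub GraphQL endpoint (URL is an assumption)."""
    def send(payload: Dict[str, Any]) -> Dict[str, Any]:
        headers = {"Content-Type": "application/json"}
        if token:
            headers["Authorization"] = f"Bearer {token}"  # personal access token
        request = urllib.request.Request(
            endpoint, data=json.dumps(payload).encode("utf-8"),
            headers=headers, method="POST")
        with urllib.request.urlopen(request) as response:
            body = json.loads(response.read().decode("utf-8"))
        if body.get("errors"):
            raise RuntimeError(body["errors"])
        return body["data"]
    return send


def fetch_downstream(send: Sender, urn: str, degree: str = "1",
                     page_size: int = 100) -> List[Tuple[str, int]]:
    """Return (urn, degree) for every downstream entity of `urn` at the given degree."""
    found: List[Tuple[str, int]] = []
    start = 0
    while True:
        variables = {"urn": urn, "degree": degree, "start": start, "count": page_size}
        page = send({"query": DOWNSTREAM_QUERY, "variables": variables})["searchAcrossLineage"]
        hits = page["searchResults"]
        found.extend((hit["entity"]["urn"], hit["degree"]) for hit in hits)
        start += len(hits)
        # Stop when a page comes back empty or every result has been read.
        if not hits or start >= page["total"]:
            break
    return found
```

For example, fetch_downstream(make_sender(token="<your token>"), "urn:li:dataset:(urn:li:dataPlatform:s3,YourS3AssetName,PROD)") would return (urn, degree) pairs for the direct downstream entities. Passing the sender in as a function keeps the paging logic testable without a live server.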
Would you like more detailed information or assistance with a specific part of this query?
Sources:
- https://datahubspace.slack.com/archives/C029A3M079U/p1692614240.589429
- https://datahubspace.slack.com/archives/CV2KB471C/p1683608326.611839
Hint: Mention @runllm in the thread for followups.
<@U06TM7M40RK> I do not want to look at the downstream asset of a specific s3 asset. I want to check how many s3 assets have at least one downstream asset
It seems likely that you can achieve this by using the aggregateAcrossEntities GraphQL query to count the number of S3 assets that have at least one downstream asset. This query aggregates search results across entities that match the given filters.
Here’s a step-by-step guide on how to construct this query:
- Define the Query: Use the aggregateAcrossEntities query to aggregate S3 assets.
- Set the Filters: Use filters to restrict the results to S3 assets with downstream relationships.
- Specify the Conditions: Use the EXISTS condition to keep only entities that have downstream relationships.
Here is an example GraphQL query:
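```graphql
query aggregateS3AssetsWithDownstream {
  aggregateAcrossEntities(
    input: {
      types: [DATASET]
      query: "*"
      facets: ["_entityType"]
      orFilters: [
        {
          and: [
            {
              field: "platform"
              condition: EQUAL
              values: ["urn:li:dataPlatform:s3"]
            },
            {
              # Assumes a "downstream" field is indexed for datasets; verify first.
              field: "downstream"
              condition: EXISTS
              negated: false
            }
          ]
        }
      ]
    }
  ) {
    facets {
      field
      displayName
      aggregations {
        value
        count
      }
    }
  }
}
```
Explanation:
- types: Set to [DATASET] to focus on dataset entities. EntityType values are GraphQL enums, so they are written without quotes.
- query: "*" matches everything and leaves the filtering to orFilters.
- facets: Use ["_entityType"] to aggregate by entity type.
- orFilters: Restrict the results to S3 assets (platform equal to urn:li:dataPlatform:s3) that have downstream relationships (downstream with EXISTS). The platform field holds the platform URN, not the bare name "s3". The downstream field is an assumption of this answer. DataHub records dataset lineage in the upstream lineage aspect of the downstream entity, so check that your deployment actually indexes a downstream field on datasets before you rely on the result.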
If the downstream filter is honored, the count for the DATASET aggregation under the _entityType facet is the number of S3 assets that have at least one downstream asset.
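If the downstream filter is not indexed in your deployment, you can compute the count on the client instead. Page through the S3 datasets with searchAcrossEntities, then ask searchAcrossLineage whether each one has any downstream results. This is a different technique from the aggregation, and it issues one lineage query per S3 dataset, so it is slow on large catalogs. The Python sketch below makes these assumptions:
- The function and query names are hypothetical.
- send is any function that POSTs a GraphQL payload to your DataHub endpoint and returns the response's data object.
- Both responses expose a total field.

```python
from typing import Any, Callable, Dict

# A "sender" takes a GraphQL payload and returns the response's "data" object.
Sender = Callable[[Dict[str, Any]], Dict[str, Any]]

# Page through every S3 dataset; the platform filter uses the platform URN.
S3_DATASETS_QUERY = """
query s3Datasets($start: Int!, $count: Int!) {
  searchAcrossEntities(input: {
    types: [DATASET]
    query: "*"
    start: $start
    count: $count
    orFilters: [{ and: [{ field: "platform", condition: EQUAL, values: ["urn:li:dataPlatform:s3"] }] }]
  }) {
    total
    searchResults { entity { urn } }
  }
}
"""

# Any downstream result at all implies at least one direct downstream asset.
HAS_DOWNSTREAM_QUERY = """
query hasDownstream($urn: String!) {
  searchAcrossLineage(input: {
    urn: $urn, query: "*", start: 0, count: 1, direction: DOWNSTREAM
  }) {
    total
  }
}
"""


def count_s3_with_downstream(send: Sender, page_size: int = 100) -> int:
    """Count S3 datasets whose downstream lineage search returns at least one result."""
    start, with_downstream = 0, 0
    while True:
        variables = {"start": start, "count": page_size}
        page = send({"query": S3_DATASETS_QUERY, "variables": variables})["searchAcrossEntities"]
        hits = page["searchResults"]
        for hit in hits:
            lineage = send({"query": HAS_DOWNSTREAM_QUERY,
                            "variables": {"urn": hit["entity"]["urn"]}})["searchAcrossLineage"]
            if lineage["total"] > 0:
                with_downstream += 1
        start += len(hits)
        # Stop when a page comes back empty or every S3 dataset has been checked.
        if not hits or start >= page["total"]:
            break
    return with_downstream
```

On a catalog with many S3 datasets, deep start offsets may hit the search backend's result-window limit. In that case, DataHub's scrollAcrossEntities query, which is meant for iterating large result sets, could replace the paging loop.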
Would you like more detailed information or assistance with a specific part of this query?
Sources:
- https://datahubspace.slack.com/archives/C029A3M079U/p1708349471.811119