Hello, we’ve written some custom ingestion code, and I’ve noticed that when I query the GraphQL API, the `lastIngested` value comes back as null for metadata created this way. I’m wondering what could be missing that prevents this timestamp from being set. I looked in the `metadata_aspect_v2` table but didn’t spot anything obviously amiss. Weirdly, it is set for one dashboard but not for anything else.
Hey there!
<@U06TM7M40RK> Can you help with the questions in this thread?
Please continue to tag RunLLM (e.g <@U06TM7M40RK>) to have it continue generating responses!
To address the issue of the `lastIngested` value being null for metadata created via custom ingestion, there are a few potential areas to investigate:
- System Metadata: When using custom ingestion, it’s crucial to ensure that the system metadata is correctly set on each `MetadataChangeProposal` (MCP). According to the <https://forum.datahubproject.io/t/adding-last-synchronized-feature-to-custom-ingestion/270|DataHub forum>, you need to set the `systemMetadata` field, which includes `lastObserved` and `runId`, on each MCP. This metadata is used to populate fields like `lastIngested`.
- Audit Stamps: If you are using the REST emitter or similar methods, ensure that audit stamps are included. As per a <Slack discussion>, audit stamps such as `createdOn` and `updatedOn` are inferred from the `Upstream` object in the `UpstreamLineage` aspect. These timestamps help in determining the recency of the data.
- GraphQL API and System Metadata: The `lastIngested` field is derived from system metadata. According to <https://github.com/datahub-project/datahub/blob/master/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/types/common/mappers/util/SystemMetadataUtils.java|SystemMetadataUtils.java>, the `getLastIngestedTime` method retrieves this information from the `EnvelopedAspectMap`. If the system metadata is not set correctly, this value will be null.
- Ingestion Process: Double-check your custom ingestion process to ensure that all necessary fields and metadata are being set. If one dashboard has the `lastIngested` value while others do not, compare the ingestion process for the dashboard that works with those that do not to identify any discrepancies.
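As a rough illustration of the system metadata point above, here is a minimal, standard-library-only sketch of attaching `lastObserved` and `runId` to each proposal before it is emitted. The helper names (`build_system_metadata`, `build_proposal`, `has_usable_system_metadata`) are hypothetical and not part of DataHub, and the dict layout only mirrors the MCP field names; it is not a verified wire format. The check against the placeholder run id `no-run-id-provided` is an assumption about how the resolver filters aspects, so confirm it against `SystemMetadataUtils.java` for your version. If your custom code uses the Python SDK, the corresponding pieces should be `SystemMetadataClass` and the `systemMetadata` argument of `MetadataChangeProposalWrapper`.

```python
import json
import time
import uuid

# Assumption: placeholder run id applied when the emitter supplies none.
DEFAULT_RUN_ID = "no-run-id-provided"


def build_system_metadata(run_id, observed_ms=None):
    """Return a systemMetadata dict with lastObserved (epoch millis) and runId."""
    if observed_ms is None:
        observed_ms = int(time.time() * 1000)
    return {"lastObserved": observed_ms, "runId": run_id}


def build_proposal(entity_type, entity_urn, aspect_name, aspect, system_metadata):
    """Assemble an MCP-shaped dict (hypothetical helper mirroring MCP field names)."""
    return {
        "entityType": entity_type,
        "entityUrn": entity_urn,
        "changeType": "UPSERT",
        "aspectName": aspect_name,
        "aspect": {
            "contentType": "application/json",
            "value": json.dumps(aspect),
        },
        "systemMetadata": system_metadata,
    }


def has_usable_system_metadata(proposal):
    """True if the proposal carries a non-default runId and a positive lastObserved."""
    system_metadata = proposal.get("systemMetadata") or {}
    run_id = system_metadata.get("runId")
    last_observed = system_metadata.get("lastObserved")
    return bool(
        run_id
        and run_id != DEFAULT_RUN_ID
        and isinstance(last_observed, int)
        and last_observed > 0
    )


if __name__ == "__main__":
    # One run id shared by every proposal emitted in the same ingestion run.
    run_id = f"custom-ingestion-{uuid.uuid4()}"
    proposal = build_proposal(
        "dashboard",
        "urn:li:dashboard:(looker,example_dashboard)",  # illustrative URN
        "status",
        {"removed": False},
        build_system_metadata(run_id),
    )
    print(has_usable_system_metadata(proposal))
```

Running a check like `has_usable_system_metadata` over the proposals for the one dashboard that works and for an entity that does not is a quick way to see whether the difference is a missing or default `runId`.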
Would you like more detailed guidance on setting up the system metadata or any other specific aspect of the ingestion process?
Sources:
- <https://forum.datahubproject.io/t/adding-last-synchronized-feature-to-custom-ingestion/270|DataHub forum on custom ingestion>
- <Slack discussion on audit stamps>
- <https://github.com/datahub-project/datahub/blob/master/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/types/common/mappers/util/SystemMetadataUtils.java|SystemMetadataUtils.java>
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @Datahub Community Support if you require a person to take a look.
Hope this helps! ><