Investigating Null `lastIngested` Value for Metadata Created via Custom Ingestion

Original Slack Thread

Hello, we’ve written some custom ingestion code, and I’ve noticed that when I query the GraphQL API, the lastIngested value comes back as null for metadata created this way. I’m wondering what could be missing that prevents this timestamp from being set. I looked in the metadata_aspect_v2 table but didn’t spot anything obviously amiss. Weirdly, it is there for one dashboard, but not for anything else.

Hey there! :wave:

<@U06TM7M40RK> Can you help with the questions in this thread?

:information_source: Please continue to tag RunLLM (e.g., <@U06TM7M40RK>) to have it continue generating responses!

To address the issue of the lastIngested value being null for metadata created via custom ingestion, there are a few potential areas to investigate:

  1. System Metadata: When using custom ingestion, it’s crucial to ensure that the system metadata is correctly set on each MetadataChangeProposal (MCP). According to the <https://forum.datahubproject.io/t/adding-last-synchronized-feature-to-custom-ingestion/270|DataHub forum>, you need to set the systemMetadata field, which includes lastObserved and runId, on each MCP. This metadata is used to populate fields like lastIngested.

  2. Audit Stamps: If you are using the REST emitter or a similar method, ensure that audit stamps are included. According to a Slack discussion, audit stamps such as createdOn and updatedOn are inferred from the Upstream object in the UpstreamLineage aspect. These timestamps help determine how recent the data is.

  3. GraphQL API and System Metadata: The lastIngested field is derived from system metadata. According to <https://github.com/datahub-project/datahub/blob/master/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/types/common/mappers/util/SystemMetadataUtils.java|SystemMetadataUtils.java>, the getLastIngestedTime method retrieves this information from the EnvelopedAspectMap. If the system metadata is not set correctly, this value will be null.

  4. Ingestion Process: Double-check your custom ingestion process to ensure that all necessary fields and metadata are being set. If one dashboard has the lastIngested value while others do not, compare the ingestion process for the dashboard that works with those that do not to identify any discrepancies.
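
As a rough illustration of point 1, here is a minimal Python sketch of attaching systemMetadata (lastObserved and runId) to an MCP. It assumes the DataHub Python SDK (`acryl-datahub`) is installed and that `MetadataChangeProposalWrapper` and `SystemMetadataClass` are available as imported below. The helper names `build_system_metadata` and `make_mcp`, and the `custom-ingest-` run ID prefix, are hypothetical and only for illustration; this is not necessarily how your ingestion code should be structured.

```python
import time
import uuid
from typing import Optional


def build_system_metadata(run_id: Optional[str] = None,
                          now_ms: Optional[int] = None) -> dict:
    """Hypothetical helper: values for the systemMetadata of one ingestion run.

    lastObserved is expressed in epoch milliseconds.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if run_id is None:
        # Assumed naming scheme; any stable per-run identifier should do.
        run_id = f"custom-ingest-{uuid.uuid4()}"
    return {"lastObserved": now_ms, "runId": run_id}


def make_mcp(entity_urn: str, aspect, run_id: Optional[str] = None):
    """Hypothetical helper: wrap an aspect in an MCP that carries systemMetadata."""
    # Imported lazily so build_system_metadata works without the SDK installed.
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.metadata.schema_classes import SystemMetadataClass

    meta = build_system_metadata(run_id)
    return MetadataChangeProposalWrapper(
        entityUrn=entity_urn,
        aspect=aspect,
        systemMetadata=SystemMetadataClass(
            lastObserved=meta["lastObserved"],
            runId=meta["runId"],
        ),
    )
```

The MCP returned by `make_mcp` would then be emitted as usual (for example, through the REST emitter). Reusing one runId for every MCP produced in the same run keeps the run's metadata grouped together.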

Would you like more detailed guidance on setting up the system metadata or any other specific aspect of the ingestion process?

Hi :slight_smile:! Did this interaction resolve your issue, or do you still require assistance?

Please tag @RunLLM if you still need further assistance!

Or tag @Datahub Community Support if you require a person to take a look :slight_smile:

Hope this helps!