<@U06TM7M40RK> Why is a schema field description, set on a Snowflake table field using the dataset upsert CLI, removed when the next Snowflake ingestion runs?
Sure thing! I’m looking through the available data to answer your question. Hold on a second…
<@U06TM7M40RK> anything here
Your question is a bit vague—could you clarify what you mean by “anything here”? Are you asking if there is any information in the previous answer that addresses your issue, or are you looking for something specific about DataHub, Snowflake ingestion, or schema field descriptions?
If you are asking whether DataHub provides a built-in safeguard to prevent Snowflake ingestion from overwriting schema field descriptions set via the CLI, the answer is:
- No, there is currently no built-in safeguard in DataHub that prevents Snowflake ingestion from overwriting technical (SchemaMetadata) field descriptions with null or empty values from the source. This is confirmed in <Slack Slack discussion>.
However, user-edited descriptions (set via the UI, or via APIs that write to EditableSchemaMetadata) are intended to be preserved and shown in the UI, even after ingestion, as long as the ingestion process does not explicitly overwrite the EditableSchemaMetadata aspect. For more on how DataHub handles these aspects, see the <https://datahubproject.io/docs/0.14.0/generated/metamodel/entities/dataset#editableschemametadata|EditableSchemaMetadata documentation>.
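Put simply, the UI prefers the editable description and falls back to the ingested one. Here is a minimal sketch of that rule. It assumes each aspect is reduced to a plain dict of field path to description, which is a hypothetical simplification and not DataHub's actual data model or UI code:

```python
from typing import Dict, Optional

# Hypothetical simplification: each aspect reduced to {field path: description}.
FieldDescriptions = Dict[str, Optional[str]]


def displayed_description(
    field_path: str,
    schema_metadata: FieldDescriptions,
    editable_schema_metadata: FieldDescriptions,
) -> Optional[str]:
    """Prefer the user-edited description; otherwise fall back to the ingested one."""
    edited = editable_schema_metadata.get(field_path)
    if edited:  # modeling choice: treat a missing or empty edit as "no edit"
        return edited
    return schema_metadata.get(field_path)


# Snowflake ingestion wrote no description for "amount", but a user edit exists.
print(displayed_description("amount", {"amount": None}, {"amount": "Order total in USD"}))
# Order total in USD

# Without the edit, the (empty) ingested value is all that is left.
print(displayed_description("amount", {"amount": None}, {}))
# None
```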
If you have a more specific scenario or requirement, could you please provide more details? For example:
- Are you looking for a way to configure ingestion to avoid overwriting certain fields?
- Are you asking about how to merge or synchronize descriptions between Snowflake and DataHub?
- Or do you have a different question about DataHub’s metadata handling?
Let me know how I can help further!
Sources:
- <https://datahubproject.io/docs/0.14.0/generated/metamodel/entities/dataset#editableschemametadata|EditableSchemaMetadata documentation>
- <Slack Slack discussion>
Hint: Mention @runllm in the thread for followups.
<@U06TM7M40RK> Why is a schema field description, set on a Snowflake table field using the dataset upsert CLI, removed when the next Snowflake ingestion runs?
The reason your schema field (column) description—set using the DataHub CLI’s dataset upsert command—gets removed after the next Snowflake ingestion is due to how DataHub manages and prioritizes metadata from different sources.
Key Points:
- Two Aspects for Descriptions:
  - SchemaMetadata: Populated by ingestion jobs (e.g., the Snowflake source).
  - EditableSchemaMetadata: Populated by user edits made through the UI or through APIs that target this aspect.
- Ingestion Overwrites Technical Metadata:
  When you run a Snowflake ingestion, DataHub fetches the latest schema from Snowflake and updates the SchemaMetadata aspect. If a column in Snowflake has no description, the ingested SchemaMetadata has an empty or null description for that field, which overwrites any previous technical description.
- UI Display Logic:
  DataHub's UI shows the description from EditableSchemaMetadata if it exists; otherwise, it falls back to SchemaMetadata.
  - If you set a description via the CLI and it goes into EditableSchemaMetadata, it should be preserved and shown in the UI, even after ingestion.
  - However, if your CLI upsert updates SchemaMetadata (not EditableSchemaMetadata), the next ingestion overwrites it with whatever is in Snowflake (including null/empty values).
- No Built-in Safeguard for Technical Descriptions:
  There is currently no built-in safeguard to prevent ingestion from overwriting SchemaMetadata with null/empty values from Snowflake. This is confirmed in <Slack Slack discussion>.
Summary:
If you want your descriptions to persist across ingestions, make sure they are set in EditableSchemaMetadata (via the UI or an API that targets that aspect). Ingested technical descriptions (SchemaMetadata) are always replaced by the latest values from Snowflake, including empty values if the source has no description.
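To make the sequence concrete, here is a toy model. The class and method names are hypothetical, each aspect is reduced to a dict of field path to description, and it assumes (as described above) that ingestion replaces SchemaMetadata wholesale and leaves EditableSchemaMetadata alone. It is an illustration of the behavior, not DataHub's implementation:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

Descriptions = Dict[str, Optional[str]]


@dataclass
class DatasetAspects:
    """Hypothetical, simplified model of one dataset's two description aspects."""

    schema_metadata: Descriptions = field(default_factory=dict)
    editable_schema_metadata: Descriptions = field(default_factory=dict)

    def cli_yaml_upsert(self, field_path: str, description: str) -> None:
        # Modeled as writing into SchemaMetadata (the technical aspect).
        self.schema_metadata[field_path] = description

    def user_edit(self, field_path: str, description: str) -> None:
        # UI-style edits are modeled as writing into EditableSchemaMetadata.
        self.editable_schema_metadata[field_path] = description

    def snowflake_ingestion(self, source_columns: Descriptions) -> None:
        # Ingestion replaces the whole SchemaMetadata aspect with what the
        # source reports; EditableSchemaMetadata is not touched.
        self.schema_metadata = dict(source_columns)

    def shown(self, field_path: str) -> Optional[str]:
        # UI rule: editable description first, then the technical one.
        edited = self.editable_schema_metadata.get(field_path)
        return edited if edited else self.schema_metadata.get(field_path)


ds = DatasetAspects()
ds.cli_yaml_upsert("amount", "Order total in USD")
ds.snowflake_ingestion({"amount": None})  # the column has no comment in Snowflake
print(ds.shown("amount"))  # None: the upserted text is gone

ds.user_edit("amount", "Order total in USD")
ds.snowflake_ingestion({"amount": None})
print(ds.shown("amount"))  # Order total in USD: the edit survives
```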
Would you like more detail on how to ensure your CLI edits persist, or on the technical distinction between these aspects?
Sources:
- <Slack Safeguards for overwriting descriptions>
<@U06TM7M40RK> I am using the dataset upsert CLI with a dataset YAML file that has descriptions for fields. Does this description go into SchemaMetadata instead of EditableSchemaMetadata?
Great question! When you use the DataHub CLI's dataset upsert command with a dataset YAML file that includes field (column) descriptions, those descriptions are written to the technical schema, specifically the SchemaMetadata aspect, not to EditableSchemaMetadata.
What does this mean?
- The descriptions you provide in the YAML file are treated as technical metadata, just like what would be ingested from a source system (e.g., Snowflake, BigQuery).
- On the next ingestion from your source (e.g., Snowflake), the SchemaMetadata aspect will be overwritten with whatever is present in the source system. If the source has no description for a field, your upserted description will be lost.
- User edits (via the UI or via APIs that target EditableSchemaMetadata) are stored separately and are preserved across ingestions. The UI always prefers to show the editable description if it exists.
This behavior is confirmed in <Slack Slack thread>, where it is noted that technical metadata (including field descriptions) upserted via the CLI or YAML is overwritten by subsequent ingestions.
If you want your field descriptions to persist and not be overwritten by ingestion, set them in EditableSchemaMetadata instead: through the UI, or programmatically via the Python SDK or the updateDescription GraphQL mutation.
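If you go the GraphQL route, a request could look like the sketch below. It uses only the Python standard library and builds, but does not send, the request. The endpoint URL, token, dataset URN, and field path are placeholders (assumptions, not values from this thread): copy the real URN and field path from the dataset page, since they must match what Snowflake ingestion produced. The input fields follow DataHub's documented updateDescription example for column descriptions; double-check them against the GraphQL schema of your DataHub version.

```python
import json
import urllib.request

# Field-level form of the mutation: subResourceType DATASET_FIELD plus the
# column's field path as subResource.
UPDATE_DESCRIPTION = """
mutation updateDescription($input: DescriptionUpdateInput!) {
  updateDescription(input: $input)
}
"""


def build_field_description_request(
    graphql_url: str,
    token: str,
    dataset_urn: str,
    field_path: str,
    description: str,
) -> urllib.request.Request:
    """Build (but do not send) a POST request that sets an editable column description."""
    payload = {
        "query": UPDATE_DESCRIPTION,
        "variables": {
            "input": {
                "resourceUrn": dataset_urn,
                "subResourceType": "DATASET_FIELD",  # enums travel as strings in variables
                "subResource": field_path,
                "description": description,
            }
        },
    }
    return urllib.request.Request(
        graphql_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Placeholder values; replace them with your own instance, token, URN, and column.
req = build_field_description_request(
    "http://localhost:8080/api/graphql",
    "<personal-access-token>",
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,my_db.my_schema.orders,PROD)",
    "amount",
    "Order total in USD",
)
```

Sending it with urllib.request.urlopen(req) should return a JSON body whose data.updateDescription field is true on success.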
Would you like more detail on how to programmatically set field descriptions in EditableSchemaMetadata so they persist across ingestions?
Sources:
- <https://datahubproject.io/docs/0.14.0/generated/metamodel/entities/dataset#editableschemametadata|EditableSchemaMetadata documentation>
- <Slack Slack thread>