<@U06TM7M40RK> Why is this metadata work unit invalid?
MetadataChangeEventClass({‘auditHeader’: None, ‘proposedSnapshot’: DatasetSnapshotClass({‘urn’: ‘urn:li:dataset:(urn:li:dataPlatform:sqlite,main.orders,PROD)’, ‘aspects’: [StatusClass({‘removed’: False}), DatasetPropertiesClass({‘customProperties’: {}, ‘externalUrl’: None, ‘name’: ‘orders’, ‘qualifiedName’: None, ‘description’: None, ‘uri’: None, ‘created’: None, ‘lastModified’: None, ‘tags’: }), SchemaMetadataClass({‘schemaName’: ‘main.orders’, ‘platform’: ‘urn:li:dataPlatform:sqlite’, ‘version’: 0, ‘created’: AuditStampClass({‘time’: 0, ‘actor’: ‘urn:li:corpuser:unknown’, ‘impersonator’: None, ‘message’: None}), ‘lastModified’: AuditStampClass({‘time’: 0, ‘actor’: ‘urn:li:corpuser:unknown’, ‘impersonator’: None, ‘message’: None}), ‘deleted’: None, ‘dataset’: None, ‘cluster’: None, ‘hash’: ‘’, ‘platformSchema’: MySqlDDLClass({‘tableSchema’: ‘’}), ‘fields’: [SchemaFieldClass({‘fieldPath’: ‘user_id’, ‘jsonPath’: None, ‘nullable’: False, ‘description’: None, ‘label’: None, ‘created’: None, ‘lastModified’: None, ‘type’: SchemaFieldDataTypeClass({‘type’: NumberTypeClass({})}), ‘nativeDataType’: ‘INTEGER’, ‘recursive’: False, ‘globalTags’: None, ‘glossaryTerms’: None, ‘isPartOfKey’: False, ‘isPartitioningKey’: None, ‘jsonProps’: None}), SchemaFieldClass({‘fieldPath’: ‘total’, ‘jsonPath’: None, ‘nullable’: False, ‘description’: None, ‘label’: None, ‘created’: None, ‘lastModified’: None, ‘type’: SchemaFieldDataTypeClass({‘type’: NumberTypeClass({})}), ‘nativeDataType’: ‘DECIMAL(10, 2)’, ‘recursive’: False, ‘globalTags’: None, ‘glossaryTerms’: None, ‘isPartOfKey’: False, ‘isPartitioningKey’: None, ‘jsonProps’: None}), SchemaFieldClass({‘fieldPath’: ‘order_id’, ‘jsonPath’: None, ‘nullable’: False, ‘description’: None, ‘label’: None, ‘created’: None, ‘lastModified’: None, ‘type’: SchemaFieldDataTypeClass({‘type’: NumberTypeClass({})}), ‘nativeDataType’: ‘INTEGER’, ‘recursive’: False, ‘globalTags’: None, ‘glossaryTerms’: None, ‘isPartOfKey’: True, ‘isPartitioningKey’: None, 
‘jsonProps’: None})], ‘primaryKeys’: None, ‘foreignKeysSpecs’: None, ‘foreignKeys’: [ForeignKeyConstraintClass({‘name’: None, ‘foreignFields’: [‘urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:sqlite,main.users,PROD),id)’], ‘sourceFields’: [‘urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:sqlite,main.orders,PROD),user_id)’], ‘foreignDataset’: ‘urn:li:dataset:(urn:li:dataPlatform:sqlite,main.users,PROD)’})]})]}), ‘proposedDelta’: None, ‘systemMetadata’: None})
The metadata work unit you provided is invalid due to several potential issues. Let’s break down the possible reasons:
- AuditStamp Fields: The `created` and `lastModified` audit stamps (each an `AuditStampClass`) have a `time` value of 0. According to the <https://datahubproject.io/docs/0.13.1/generated/metamodel/entities/dataset#schemametadata|SchemaMetadata documentation>, a value of 0 for time indicates missing data, which might not be acceptable depending on the system’s requirements.
- Platform Schema: The `platformSchema` is set to `MySqlDDLClass` with an empty `tableSchema`. This might be incorrect since the dataset is on the `sqlite` platform, and the schema should reflect the actual schema used by the platform. The <https://datahubproject.io/docs/0.14.0/python-sdk/models#datahub.metadata.schema_classes.SchemaMetadataClass|SchemaMetadataClass> should have a valid platform schema.
- Foreign Key Constraints: The `ForeignKeyConstraintClass` has a `name` set to `None`. While this might be acceptable in some cases, it could be a problem if the system expects a name for each foreign key constraint.
- Primary Keys: The `primaryKeys` field is set to `None`. If the dataset has primary keys, they should be specified. The <https://datahubproject.io/docs/0.14.0/generated/metamodel/entities/dataset#schemametadata|SchemaMetadata documentation> indicates that primary keys should be provided if they exist.
- System Metadata: The `systemMetadata` is set to `None`. While this might not be a direct cause of invalidity, it could be an issue if the system expects certain metadata to be present.
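To narrow down which of the schema-level points above applies, a quick check along these lines may help. This is only a minimal sketch: it assumes the `SchemaMetadataClass` part of the work unit has been flattened into plain dictionaries, and both `find_suspect_fields` and the dictionary layout are hypothetical helpers for illustration, not part of the DataHub SDK.

```python
from typing import Any, Dict, List


def find_suspect_fields(schema: Dict[str, Any]) -> List[str]:
    """Return a note for each schema-level field flagged in the list above.

    `schema` is a plain-dict stand-in for SchemaMetadataClass (an assumption
    made for this sketch; it is not how the SDK represents the aspect).
    """
    notes: List[str] = []

    # Audit stamps: a time of 0 is documented as "missing data".
    for stamp_name in ("created", "lastModified"):
        stamp = schema.get(stamp_name) or {}
        if stamp.get("time") == 0:
            notes.append(f"{stamp_name}.time is 0 (missing data)")

    # Platform schema: an empty tableSchema carries no DDL at all.
    platform_schema = schema.get("platformSchema") or {}
    if not platform_schema.get("tableSchema"):
        notes.append("platformSchema.tableSchema is empty")

    # Foreign keys: report every constraint whose name is None.
    for index, fk in enumerate(schema.get("foreignKeys") or []):
        if fk.get("name") is None:
            notes.append(f"foreignKeys[{index}].name is None")

    # Primary keys: None means no key list was provided.
    if schema.get("primaryKeys") is None:
        notes.append("primaryKeys is None")

    return notes


# Values copied from the work unit in the question, reduced to the checked fields.
sample = {
    "created": {"time": 0, "actor": "urn:li:corpuser:unknown"},
    "lastModified": {"time": 0, "actor": "urn:li:corpuser:unknown"},
    "platformSchema": {"tableSchema": ""},
    "primaryKeys": None,
    "foreignKeys": [{"name": None}],
}

notes = find_suspect_fields(sample)
for note in notes:
    print(note)
```

Each printed note corresponds to one candidate cause, so you can fix them one at a time and re-run the ingestion to see which one actually triggers the "invalid" error.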
Would you like more detailed guidance on any specific part of this process?
Sources:
- <https://datahubproject.io/docs/0.13.1/generated/metamodel/entities/dataset#schemametadata|SchemaMetadata documentation>
- <https://datahubproject.io/docs/0.14.0/python-sdk/models#datahub.metadata.schema_classes.SchemaMetadataClass|SchemaMetadataClass>
- <https://datahubproject.io/docs/0.14.0/generated/metamodel/entities/dataset#schemametadata|SchemaMetadata documentation>
Hi <@U07Q7RZU63F>, how did my answer work out for you? Need any more assistance? You can also tag <@U07FV3UHE4D> to let your fellow community members know you’re still looking for guidance.
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @Datahub Community Support if you require a person to take a look.
Hope this helps! ><