Analyzing Invalid Metadata Work Unit Issues

Original Slack Thread

<@U06TM7M40RK> Why is this metadata work unit invalid?
MetadataChangeEventClass({'auditHeader': None, 'proposedSnapshot': DatasetSnapshotClass({'urn': 'urn:li:dataset:(urn:li:dataPlatform:sqlite,main.orders,PROD)', 'aspects': [StatusClass({'removed': False}), DatasetPropertiesClass({'customProperties': {}, 'externalUrl': None, 'name': 'orders', 'qualifiedName': None, 'description': None, 'uri': None, 'created': None, 'lastModified': None, 'tags': }), SchemaMetadataClass({'schemaName': 'main.orders', 'platform': 'urn:li:dataPlatform:sqlite', 'version': 0, 'created': AuditStampClass({'time': 0, 'actor': 'urn:li:corpuser:unknown', 'impersonator': None, 'message': None}), 'lastModified': AuditStampClass({'time': 0, 'actor': 'urn:li:corpuser:unknown', 'impersonator': None, 'message': None}), 'deleted': None, 'dataset': None, 'cluster': None, 'hash': '', 'platformSchema': MySqlDDLClass({'tableSchema': ''}), 'fields': [SchemaFieldClass({'fieldPath': 'user_id', 'jsonPath': None, 'nullable': False, 'description': None, 'label': None, 'created': None, 'lastModified': None, 'type': SchemaFieldDataTypeClass({'type': NumberTypeClass({})}), 'nativeDataType': 'INTEGER', 'recursive': False, 'globalTags': None, 'glossaryTerms': None, 'isPartOfKey': False, 'isPartitioningKey': None, 'jsonProps': None}), SchemaFieldClass({'fieldPath': 'total', 'jsonPath': None, 'nullable': False, 'description': None, 'label': None, 'created': None, 'lastModified': None, 'type': SchemaFieldDataTypeClass({'type': NumberTypeClass({})}), 'nativeDataType': 'DECIMAL(10, 2)', 'recursive': False, 'globalTags': None, 'glossaryTerms': None, 'isPartOfKey': False, 'isPartitioningKey': None, 'jsonProps': None}), SchemaFieldClass({'fieldPath': 'order_id', 'jsonPath': None, 'nullable': False, 'description': None, 'label': None, 'created': None, 'lastModified': None, 'type': SchemaFieldDataTypeClass({'type': NumberTypeClass({})}), 'nativeDataType': 'INTEGER', 'recursive': False, 'globalTags': None, 'glossaryTerms': None, 'isPartOfKey': True, 'isPartitioningKey': None, 'jsonProps': None})], 'primaryKeys': None, 'foreignKeysSpecs': None, 'foreignKeys': [ForeignKeyConstraintClass({'name': None, 'foreignFields': ['urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:sqlite,main.users,PROD),id)'], 'sourceFields': ['urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:sqlite,main.orders,PROD),user_id)'], 'foreignDataset': 'urn:li:dataset:(urn:li:dataPlatform:sqlite,main.users,PROD)'})]})]}), 'proposedDelta': None, 'systemMetadata': None})

The metadata work unit you provided may be invalid for several reasons. Let's break down the possible causes:

  1. AuditStamp Fields: The created and lastModified audit stamps on the SchemaMetadataClass both have a time value of 0. According to the <https://datahubproject.io/docs/0.13.1/generated/metamodel/entities/dataset#schemametadata|SchemaMetadata documentation>, a time of 0 indicates missing data, which might not be acceptable depending on the system's requirements.

  2. Platform Schema: The platformSchema is set to MySqlDDLClass with an empty tableSchema. This might be incorrect, since the dataset is on the sqlite platform and the platform schema should reflect the schema actually used by that platform. The <https://datahubproject.io/docs/0.14.0/python-sdk/models#datahub.metadata.schema_classes.SchemaMetadataClass|SchemaMetadataClass> should have a valid platform schema.

  3. Foreign Key Constraints: The ForeignKeyConstraintClass has a name set to None. While this might be acceptable in some cases, it could be a problem if the system expects a name for each foreign key constraint.

  4. Primary Keys: The primaryKeys field is set to None, even though the order_id field is marked with isPartOfKey set to True. If the dataset has primary keys, they should be specified. The <https://datahubproject.io/docs/0.14.0/generated/metamodel/entities/dataset#schemametadata|SchemaMetadata documentation> indicates that primary keys should be provided if they exist.

  5. System Metadata: The systemMetadata is set to None. While this might not be a direct cause of invalidity, it could be an issue if the system expects certain metadata to be present.
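
The five points above can be sketched as a quick pre-flight check. This is only an illustration under stated assumptions: it works on a plain-dict stand-in for the SchemaMetadata aspect rather than the real DataHub classes, the helper name find_suspect_fields is hypothetical, and the checks mirror the list above rather than DataHub's own validation logic, so a note here does not prove that the field is what makes the work unit invalid.

```python
from typing import Any, Dict, List, Optional


def find_suspect_fields(
    schema_metadata: Dict[str, Any],
    system_metadata: Optional[Dict[str, Any]] = None,
) -> List[str]:
    """Hypothetical helper: collect notes matching the five points above."""
    notes: List[str] = []

    # 1. Audit stamps whose time is 0 (documented as "missing data").
    for stamp_name in ("created", "lastModified"):
        stamp = schema_metadata.get(stamp_name) or {}
        if stamp.get("time", 0) == 0:
            notes.append(f"{stamp_name}.time is 0")

    # 2. Platform schema with an empty tableSchema.
    platform_schema = schema_metadata.get("platformSchema") or {}
    if not platform_schema.get("tableSchema"):
        notes.append("platformSchema.tableSchema is empty")

    # 3. Foreign key constraints without a name.
    for index, fk in enumerate(schema_metadata.get("foreignKeys") or []):
        if not fk.get("name"):
            notes.append(f"foreignKeys[{index}].name is missing")

    # 4. Key columns exist but primaryKeys is unset.
    fields = schema_metadata.get("fields") or []
    key_fields = [f["fieldPath"] for f in fields if f.get("isPartOfKey")]
    if key_fields and not schema_metadata.get("primaryKeys"):
        notes.append(f"primaryKeys is unset but key fields exist: {key_fields}")

    # 5. No system metadata attached.
    if system_metadata is None:
        notes.append("systemMetadata is None")

    return notes


# Trimmed, dict-shaped copy of the SchemaMetadata aspect from the thread
# (an assumption for illustration; the real object is a SchemaMetadataClass).
schema_metadata = {
    "platform": "urn:li:dataPlatform:sqlite",
    "created": {"time": 0, "actor": "urn:li:corpuser:unknown"},
    "lastModified": {"time": 0, "actor": "urn:li:corpuser:unknown"},
    "platformSchema": {"tableSchema": ""},
    "fields": [
        {"fieldPath": "user_id", "isPartOfKey": False},
        {"fieldPath": "total", "isPartOfKey": False},
        {"fieldPath": "order_id", "isPartOfKey": True},
    ],
    "primaryKeys": None,
    "foreignKeys": [
        {
            "name": None,
            "foreignDataset": "urn:li:dataset:(urn:li:dataPlatform:sqlite,main.users,PROD)",
        }
    ],
}

notes = find_suspect_fields(schema_metadata, system_metadata=None)
for note in notes:
    print(note)
```

Each note is a candidate to investigate, not a confirmed cause. Working through them one at a time (for example, filling in primaryKeys first) is one way to narrow down which field the validator actually rejects.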

Would you like more detailed guidance on any specific part of this process?
