<@U06TM7M40RK> I was attempting to ingest a SQLite database using SQLAlchemy and got the following error. Can you explain what went wrong and how to fix it?
Source produced bad metadata => main.ordersSource produced bad metadata => main.orders: source produced an invalid metadata work unit: MetadataChangeEventClass({‘auditHeader’: None, ‘proposedSnapshot’: DatasetSnapshotClass({‘urn’: ‘urn:li:dataset:(urn:li:dataPlatform:sqlite,main.orders,PROD)’, ‘aspects’: [StatusClass({‘removed’: False}), DatasetPropertiesClass({‘customProperties’: {}, ‘externalUrl’: None, ‘name’: ‘orders’, ‘qualifiedName’: None, ‘description’: None, ‘uri’: None, ‘created’: None, ‘lastModified’: None, ‘tags’: }), SchemaMetadataClass({‘schemaName’: ‘main.orders’, ‘platform’: ‘urn:li:dataPlatform:sqlite’, ‘version’: 0, ‘created’: AuditStampClass({‘time’: 0, ‘actor’: ‘urn:li:corpuser:unknown’, ‘impersonator’: None, ‘message’: None}), ‘lastModified’: AuditStampClass({‘time’: 0, ‘actor’: ‘urn:li:corpuser:unknown’, ‘impersonator’: None, ‘message’: None}), ‘deleted’: None, ‘dataset’: None, ‘cluster’: None, ‘hash’: ‘’, ‘platformSchema’: MySqlDDLClass({‘tableSchema’: ‘’}), ‘fields’: [SchemaFieldClass({‘fieldPath’: ‘user_id’, ‘jsonPath’: None, ‘nullable’: False, ‘description’: None, ‘label’: None, ‘created’: None, ‘lastModified’: None, ‘type’: SchemaFieldDataTypeClass({‘type’: NumberTypeClass({})}), ‘nativeDataType’: ‘INTEGER’, ‘recursive’: False, ‘globalTags’: None, ‘glossaryTerms’: None, ‘isPartOfKey’: False, ‘isPartitioningKey’: None, ‘jsonProps’: None}), SchemaFieldClass({‘fieldPath’: ‘total’, ‘jsonPath’: None, ‘nullable’: False, ‘description’: None, ‘label’: None, ‘created’: None, ‘lastModified’: None, ‘type’: SchemaFieldDataTypeClass({‘type’: NumberTypeClass({})}), ‘nativeDataType’: ‘DECIMAL(10, 2)’, ‘recursive’: False, ‘globalTags’: None, ‘glossaryTerms’: None, ‘isPartOfKey’: False, ‘isPartitioningKey’: None, ‘jsonProps’: None}), SchemaFieldClass({‘fieldPath’: ‘order_id’, ‘jsonPath’: None, ‘nullable’: False, ‘description’: None, ‘label’: None, ‘created’: None, ‘lastModified’: None, ‘type’: SchemaFieldDataTypeClass({‘type’: NumberTypeClass({})}), ‘nativeDataType’: ‘INTEGER’, ‘recursive’: False, ‘globalTags’: None, ‘glossaryTerms’: None, ‘isPartOfKey’: True, ‘isPartitioningKey’: None, ‘jsonProps’: None})], ‘primaryKeys’: None, ‘foreignKeysSpecs’: None, ‘foreignKeys’: [ForeignKeyConstraintClass({‘name’: None, ‘foreignFields’: [‘urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:sqlite,main.users,PROD),id)’], ‘sourceFields’: [‘urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:sqlite,main.orders,PROD),user_id)’], ‘foreignDataset’: ‘urn:li:dataset:(urn:li:dataPlatform:sqlite,main.users,PROD)’})]})]}), ‘proposedDelta’: None, ‘systemMetadata’: None})
The error message you encountered indicates that there was an issue with the metadata work unit produced during the ingestion of your SQLite database using SQLAlchemy. Specifically, the error message states that the source produced an invalid metadata work unit, which is a MetadataChangeEventClass
.
Based on the provided error message, here are some potential issues and solutions:
-
Schema Definition: The
platformSchema
is defined asMySqlDDLClass
with an emptytableSchema
. This might be incorrect for a SQLite database. Ensure that the schema class used is appropriate for SQLite and that thetableSchema
is correctly populated. -
AuditStampClass: The
created
andlastModified
fields inAuditStampClass
have atime
value of 0. This might indicate missing or incorrect timestamp data. Ensure that these fields are populated with valid timestamps. -
Foreign Key Constraints: The foreign key constraints are defined, but ensure that the
foreignFields
andsourceFields
are correctly specified and that the referenced datasets and fields exist. -
Field Definitions: Check that the
SchemaFieldClass
instances are correctly defined, including thefieldPath
,type
, andnativeDataType
. Ensure that the data types are compatible with SQLite. -
Primary Keys: The
primaryKeys
field isNone
. If there are primary keys in your table, ensure they are correctly specified.
To resolve these issues, you should:
- Verify the schema and data types used in your SQLite database and ensure they are correctly represented in the metadata.
- Ensure that all required fields, such as timestamps and primary keys, are correctly populated.
- Check the foreign key constraints to ensure they reference existing datasets and fields.
If you need more detailed guidance on any specific part of this process, please let me know.
Sources:
- https://datahubspace.slack.com/archives/C029A3M079U/p1723647393.963559|https://datahubspace.slack.com/archives/C029A3M079U/p1723647393.963559
- https://datahubspace.slack.com/archives/CUMUWQU66/p1704230567.357089|https://datahubspace.slack.com/archives/CUMUWQU66/p1704230567.357089
0 button
0 button
Hint: Mention @runllm in the thread for followups.
Hi <@U07Q7RZU63F>, how did my answer work out for you? Need any more assistance? You can also tag <@U07FV3UHE4D> to let your fellow community members know you’re still looking for guidance.
Hi ! Did this interaction resolve your issue or do you still require assistance ?
Please tag @ RunLLM if you still need further assistance !
Or tag @Datahub Community Support if you require a person to take a look
Hope this helps ! ><