Troubleshooting SQLite database ingestion error in SQLAlchemy

Original Slack Thread

<@U06TM7M40RK> I was attempting to ingest a SQLite database using SQLAlchemy and got the following error. Can you explain what went wrong and how to fix it?
Source produced bad metadata => main.ordersSource produced bad metadata => main.orders: source produced an invalid metadata work unit: MetadataChangeEventClass({‘auditHeader’: None, ‘proposedSnapshot’: DatasetSnapshotClass({‘urn’: ‘urn:li:dataset:(urn:li:dataPlatform:sqlite,main.orders,PROD)’, ‘aspects’: [StatusClass({‘removed’: False}), DatasetPropertiesClass({‘customProperties’: {}, ‘externalUrl’: None, ‘name’: ‘orders’, ‘qualifiedName’: None, ‘description’: None, ‘uri’: None, ‘created’: None, ‘lastModified’: None, ‘tags’: }), SchemaMetadataClass({‘schemaName’: ‘main.orders’, ‘platform’: ‘urn:li:dataPlatform:sqlite’, ‘version’: 0, ‘created’: AuditStampClass({‘time’: 0, ‘actor’: ‘urn:li:corpuser:unknown’, ‘impersonator’: None, ‘message’: None}), ‘lastModified’: AuditStampClass({‘time’: 0, ‘actor’: ‘urn:li:corpuser:unknown’, ‘impersonator’: None, ‘message’: None}), ‘deleted’: None, ‘dataset’: None, ‘cluster’: None, ‘hash’: ‘’, ‘platformSchema’: MySqlDDLClass({‘tableSchema’: ‘’}), ‘fields’: [SchemaFieldClass({‘fieldPath’: ‘user_id’, ‘jsonPath’: None, ‘nullable’: False, ‘description’: None, ‘label’: None, ‘created’: None, ‘lastModified’: None, ‘type’: SchemaFieldDataTypeClass({‘type’: NumberTypeClass({})}), ‘nativeDataType’: ‘INTEGER’, ‘recursive’: False, ‘globalTags’: None, ‘glossaryTerms’: None, ‘isPartOfKey’: False, ‘isPartitioningKey’: None, ‘jsonProps’: None}), SchemaFieldClass({‘fieldPath’: ‘total’, ‘jsonPath’: None, ‘nullable’: False, ‘description’: None, ‘label’: None, ‘created’: None, ‘lastModified’: None, ‘type’: SchemaFieldDataTypeClass({‘type’: NumberTypeClass({})}), ‘nativeDataType’: ‘DECIMAL(10, 2)’, ‘recursive’: False, ‘globalTags’: None, ‘glossaryTerms’: None, ‘isPartOfKey’: False, ‘isPartitioningKey’: None, ‘jsonProps’: None}), SchemaFieldClass({‘fieldPath’: ‘order_id’, ‘jsonPath’: None, ‘nullable’: False, ‘description’: None, ‘label’: None, ‘created’: None, ‘lastModified’: None, ‘type’: SchemaFieldDataTypeClass({‘type’: NumberTypeClass({})}), ‘nativeDataType’: ‘INTEGER’, ‘recursive’: False, ‘globalTags’: None, ‘glossaryTerms’: None, ‘isPartOfKey’: True, ‘isPartitioningKey’: None, ‘jsonProps’: None})], ‘primaryKeys’: None, ‘foreignKeysSpecs’: None, ‘foreignKeys’: [ForeignKeyConstraintClass({‘name’: None, ‘foreignFields’: [‘urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:sqlite,main.users,PROD),id)’], ‘sourceFields’: [‘urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:sqlite,main.orders,PROD),user_id)’], ‘foreignDataset’: ‘urn:li:dataset:(urn:li:dataPlatform:sqlite,main.users,PROD)’})]})]}), ‘proposedDelta’: None, ‘systemMetadata’: None})

The error message you encountered indicates that there was an issue with the metadata work unit produced during the ingestion of your SQLite database using SQLAlchemy. Specifically, the error message states that the source produced an invalid metadata work unit, which is a MetadataChangeEventClass.

Based on the provided error message, here are some potential issues and solutions:

  1. Schema Definition: The platformSchema is defined as MySqlDDLClass with an empty tableSchema. This might be incorrect for a SQLite database. Ensure that the schema class used is appropriate for SQLite and that the tableSchema is correctly populated.

  2. AuditStampClass: The created and lastModified fields in AuditStampClass have a time value of 0. This might indicate missing or incorrect timestamp data. Ensure that these fields are populated with valid timestamps.

  3. Foreign Key Constraints: The foreign key constraints are defined, but ensure that the foreignFields and sourceFields are correctly specified and that the referenced datasets and fields exist.

  4. Field Definitions: Check that the SchemaFieldClass instances are correctly defined, including the fieldPath, type, and nativeDataType. Ensure that the data types are compatible with SQLite.

  5. Primary Keys: The primaryKeys field is None. If there are primary keys in your table, ensure they are correctly specified.

To resolve these issues, you should:

  • Verify the schema and data types used in your SQLite database and ensure they are correctly represented in the metadata.
  • Ensure that all required fields, such as timestamps and primary keys, are correctly populated.
  • Check the foreign key constraints to ensure they reference existing datasets and fields.

If you need more detailed guidance on any specific part of this process, please let me know.

Sources:

Hi <@U07Q7RZU63F>, how did my answer work out for you? Need any more assistance? You can also tag <@U07FV3UHE4D> to let your fellow community members know you’re still looking for guidance.

Hi :slight_smile: ! Did this interaction resolve your issue or do you still require assistance ?

Please tag @ RunLLM if you still need further assistance !

Or tag @Datahub Community Support if you require a person to take a look :slight_smile:

Hope this helps ! ><