<@U06TM7M40RK> I am trying to ingest from an MSSQL source into DataHub using stateful ingestion. I first ingested 50 datasets, but the source now has only 49, so re-ingesting with stateful ingestion should leave only 49 datasets in DataHub. However, all 50 datasets are still there, and I get this error:
<class 'datahub.configuration.common.OperationalError'>: ('Unable to emit metadata to DataHub GMS: User urn:li:corpuser:wmdg_sql39_dev_metaai is unauthorized to modify entity: HttpStatus: 403 Urn: urn:li:dataJob:(urn:li:dataFlow:(datahub,CareSight_Dev2,prod),mssql_stale_entity_removal)', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'message': 'User urn:li:corpuser:wmdg_sql39_dev_metaai is unauthorized to modify entity: HttpStatus: 403 Urn: urn:li:dataJob:(urn:li:dataFlow:(datahub,CareSight_Dev2,prod),mssql_stale_entity_removal)', 'status': 403})
The issue you’re encountering looks like an authorization problem rather than a problem with stateful ingestion itself. The error message says that the user urn:li:corpuser:wmdg_sql39_dev_metaai is not authorized to modify the dataJob entity mssql_stale_entity_removal, which is likely why the stale dataset is not being removed.
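To see exactly which entity the 403 refers to, the URN in the error can be taken apart. The snippet below is only a minimal sketch using the Python standard library; the helper name `parse_denied_entity` and the regular expression are illustrative, not part of DataHub. As far as I understand, the second component of the dataFlow URN corresponds to the `pipeline_name` of the recipe, and the dataJob is where stateful ingestion keeps its state.

```python
import re

# Error text copied from the message above.
ERROR = (
    "User urn:li:corpuser:wmdg_sql39_dev_metaai is unauthorized to modify entity: "
    "HttpStatus: 403 Urn: urn:li:dataJob:(urn:li:dataFlow:"
    "(datahub,CareSight_Dev2,prod),mssql_stale_entity_removal)"
)

# Illustrative pattern (an assumption about the message layout, based only on
# the error shown in this thread).
PATTERN = re.compile(
    r"User (?P<actor>urn:li:corpuser:\S+) is unauthorized.*?"
    r"Urn: urn:li:dataJob:\(urn:li:dataFlow:\((?P<orchestrator>[^,]+),"
    r"(?P<flow_id>[^,]+),(?P<cluster>[^)]+)\),(?P<job_id>[^)]+)\)"
)


def parse_denied_entity(message: str) -> dict:
    """Return the actor and the dataJob URN components named in a 403 error."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("message does not look like a dataJob 403 error")
    return match.groupdict()


info = parse_denied_entity(ERROR)
for key, value in info.items():
    print(f"{key}: {value}")
```

The `actor` is the user that needs additional privileges, and `flow_id` / `job_id` identify the entity that the policy has to cover.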
Here are some steps and considerations based on the available data sources:
- Stateful Ingestion Configuration: Ensure that your stateful ingestion is properly configured. According to the <https://datahubproject.io/docs/0.13.1/metadata-ingestion/docs/dev_guides/stateful|Stateful Ingestion documentation>, you need to enable stateful ingestion and set remove_stale_metadata to true in your ingestion recipe. This configuration allows DataHub to automatically soft-delete datasets that are no longer present in the source.
- Authorization Issues: The error message indicates a 403 status, which means the user does not have the necessary permissions to perform the operation. The URN in the error is a dataJob entity (mssql_stale_entity_removal), not a dataset, so the user needs permission to modify that entity as well as the datasets. According to the <https://datahubproject.io/docs/0.14.0/authorization/policies#product-features|Product Features documentation>, you may need to grant the user the “Manage Metadata Ingestion” privilege or similar permissions to allow them to modify entities.
- Pipeline Name Consistency: Ensure that the pipeline_name is consistent across runs. If the pipeline_name is changed, the previous state will not be available for comparison, so stale datasets cannot be detected.
- Soft Deletion: If you have already ingested datasets without stateful ingestion and then deleted some tables, you may need to perform a one-time soft delete of all entities for the source. This can be done using the following command:
datahub delete --platform your_data_source --soft
After performing the soft delete, run the ingestion with stateful ingestion enabled. The entities that still exist in the source will reappear in the UI, and the deleted ones will remain soft-deleted.
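The recipe settings mentioned above can be sketched as a Python dictionary. This is only an illustration under assumptions: the connection values, server URL, and token are placeholders, the `pipeline_name` is taken from the URN in the error, and `stateful_problems` is a hypothetical helper for this thread, not a DataHub function. I believe a dictionary of this shape can be passed to `Pipeline.create` from `datahub.ingestion.run.pipeline`, but please check that against the documentation for your version.

```python
import json

# Sketch of an mssql recipe with stateful ingestion enabled.
# All connection values below are placeholders, not real settings.
recipe = {
    "pipeline_name": "CareSight_Dev2",  # must stay the same across runs
    "source": {
        "type": "mssql",
        "config": {
            "host_port": "localhost:1433",   # placeholder
            "database": "example_db",        # placeholder
            "username": "example_user",      # placeholder
            "password": "<password>",        # placeholder
            "stateful_ingestion": {
                "enabled": True,
                "remove_stale_metadata": True,
            },
        },
    },
    "sink": {
        "type": "datahub-rest",
        "config": {
            "server": "http://localhost:8080",  # placeholder
            "token": "<access-token>",          # placeholder
        },
    },
}


def stateful_problems(r: dict) -> list:
    """Hypothetical check: list settings that would prevent stale removal."""
    problems = []
    if not r.get("pipeline_name"):
        problems.append("pipeline_name is missing, so state cannot be matched across runs")
    stateful = r.get("source", {}).get("config", {}).get("stateful_ingestion", {})
    if not stateful.get("enabled"):
        problems.append("stateful_ingestion.enabled is not true")
    if stateful.get("remove_stale_metadata") is False:
        problems.append("remove_stale_metadata is explicitly disabled")
    return problems


print(json.dumps(recipe, indent=2))
print(stateful_problems(recipe))
```

Note that this only checks the recipe. The 403 itself still has to be resolved by giving the ingestion user permission on the dataJob entity from the error.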
Would you like more detailed steps or further assistance on this?
Sources:
- <https://datahubproject.io/docs/0.13.1/metadata-ingestion/docs/dev_guides/stateful|Stateful Ingestion documentation>
- <https://datahubproject.io/docs/0.14.0/authorization/policies#product-features|Product Features documentation>
Hint: Mention @runllm in the thread for followups.
Hi <@U08MVA5MLAW>, how did my answer work out for you? Need any more assistance? You can also tag <@U07FV3UHE4D> to let your fellow community members know you’re still looking for guidance.