I am running Superset v4.0.2 and DataHub v0.14.0. The ingestion seems to work with the following config:
```
type: superset
config:
  connect_uri: 'http://host:8088'
  username: username
  password: password
```
However, while dashboards and charts are imported, datasets are not (see screenshot).
Is that integration supported? The datasets are Trino-backed.
![attachment](https://files.slack.com/files-pri/TUMKD5EGJ-F07KT8KDM16/image.png)
To address your issue with Superset ingestion where dashboards and charts are imported but datasets are not, let’s break down the possible causes and solutions.
Supported Integration
According to the <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/superset|Superset ingestion documentation>, the Superset source ingests charts, dashboards, and their associated metadata. Ingesting Superset datasets as entities in their own right, however, has been the subject of known issues and ongoing improvements.
Troubleshooting Steps
Verify API Endpoints:
Make sure your Superset instance exposes the REST API endpoints the connector relies on. The connector retrieves dataset information from the `/api/v1/dataset/` endpoint, so query that endpoint manually with the same credentials as your recipe and confirm that it returns your Trino-backed datasets.
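A minimal way to run that manual check is sketched here in Python, using only the standard library. It assumes Superset's JWT login at `/api/v1/security/login` with the `db` provider, and that each entry in the dataset list's `result` array carries `id`, `table_name`, `schema` and a nested `database.database_name`. The helper names `superset_login`, `list_datasets` and `summarize` are hypothetical, not part of Superset or DataHub.

```python
import json
import urllib.request
from typing import Any, Dict, List


def superset_login(base_url: str, username: str, password: str) -> str:
    """Log in with database auth and return a JWT access token."""
    payload = {"username": username, "password": password,
               "provider": "db", "refresh": True}
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/security/login",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["access_token"]


def list_datasets(base_url: str, token: str) -> Dict[str, Any]:
    """Fetch the first page of /api/v1/dataset/ as the logged-in user."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/dataset/",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def summarize(listing: Dict[str, Any]) -> List[str]:
    """Render each dataset as 'id: database.schema.table' (field names assumed)."""
    rows = []
    for ds in listing.get("result", []):
        db_name = (ds.get("database") or {}).get("database_name", "?")
        rows.append(f"{ds.get('id')}: {db_name}.{ds.get('schema')}.{ds.get('table_name')}")
    return rows
```

Calling `superset_login("http://host:8088", "username", "password")`, passing the token to `list_datasets`, and printing `summarize(...)` should list your Trino-backed datasets. The endpoint is paginated, so the `count` field may be larger than the number of rows on the first page.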
Check Permissions:
Confirm that the account used for ingestion has permission to read the datasets. Missing permissions have previously prevented ingestion from retrieving dataset metadata.
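To tell a permissions problem apart from a credentials problem, the HTTP status and body of the `/api/v1/dataset/` request are a useful signal. The mapping below is a heuristic assumption, not documented Superset behavior: 401 usually means the login failed, 403 that the role lacks a dataset permission, and a 200 with `count` 0 that the user has no data access to any dataset (in Superset this is typically granted through database or datasource access permissions). `diagnose_dataset_access` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def diagnose_dataset_access(status: int, body: Optional[Dict[str, Any]] = None) -> str:
    """Map a GET /api/v1/dataset/ response to a likely cause (heuristic only)."""
    if status == 401:
        return "not authenticated: check username/password or the login provider"
    if status == 403:
        return "authenticated but forbidden: the role likely lacks dataset read permission"
    if status == 200:
        # A successful response can still be empty if no data access is granted.
        count = (body or {}).get("count", 0)
        if count == 0:
            return "request succeeded but no datasets are visible to this user"
        return f"user can see {count} dataset(s)"
    return f"unexpected HTTP status {status}"
```

With `urllib`, a 401 or 403 surfaces as `urllib.error.HTTPError`, so pass `err.code` to the helper in the `except` branch.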
Enable Debug Logging:
Run the ingestion with the `--debug` flag to get detailed logs. This can help identify issues during the ingestion process.
```
datahub --debug ingest run -c <recipe-path> &> /tmp/superset-ingest.log
```
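The debug log can be long. A small filter such as this one, a hypothetical helper rather than part of DataHub, keeps only warning and error lines plus anything mentioning datasets, which is a quick way to spot failed dataset lookups. It assumes the log was written to `/tmp/superset-ingest.log` by the command above.

```python
import re
from typing import Iterable, List

# Match WARNING/ERROR level markers or any mention of "dataset" (case-insensitive).
PATTERN = re.compile(r"\b(WARNING|ERROR)\b|dataset", re.IGNORECASE)


def interesting_lines(lines: Iterable[str]) -> List[str]:
    """Return matching lines with trailing newlines stripped."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


def scan_log(path: str) -> List[str]:
    # errors="replace" keeps the scan going if the log has odd bytes.
    with open(path, encoding="utf-8", errors="replace") as f:
        return interesting_lines(f)
```

For example, `print("\n".join(scan_log("/tmp/superset-ingest.log")))` prints the filtered lines.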
Inspect the Payload:
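Inspect the metadata the connector actually emits, for example by logging it or by temporarily writing it to a file instead of sending it to DataHub. This shows whether dataset entities are missing from the output altogether or whether the data retrieved from Superset is incomplete.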
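One way to do this is to switch the recipe's sink to DataHub's `file` sink (`type: file` with a `filename` in its `config`) and then count what was written. The sketch assumes the file is a single JSON array in which each record carries either a top-level `entityUrn` (proposal-style records) or a `proposedSnapshot` wrapping one snapshot object with a `urn` (event-style records). The helper names are hypothetical.

```python
import json
from collections import defaultdict
from typing import Any, Dict, Iterable, Optional, Set


def record_urn(record: Dict[str, Any]) -> Optional[str]:
    """Return the entity URN of one serialized record, or None if unrecognized."""
    # Proposal-style records carry the URN at the top level.
    if "entityUrn" in record:
        return record["entityUrn"]
    # Event-style records wrap it in a single snapshot object.
    snapshot = record.get("proposedSnapshot")
    if isinstance(snapshot, dict) and snapshot:
        inner = next(iter(snapshot.values()))
        if isinstance(inner, dict):
            return inner.get("urn")
    return None


def count_entities(records: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Count distinct URNs per entity type (chart, dashboard, dataset, ...)."""
    urns_by_type: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        urn = record_urn(record)
        if urn and urn.startswith("urn:li:"):
            # "urn:li:<type>:..." -> the third colon-separated field is the type.
            urns_by_type[urn.split(":")[2]].add(urn)
    return {etype: len(urns) for etype, urns in urns_by_type.items()}


def summarize_file(path: str) -> Dict[str, int]:
    # Assumes the file sink wrote one JSON array of records.
    with open(path, encoding="utf-8") as f:
        return count_entities(json.load(f))
```

If the result has `chart` and `dashboard` keys but no `dataset` key, the connector never emitted dataset entities, as opposed to DataHub dropping them. Dataset URNs that appear only as chart inputs are not counted here, because only each record's own URN is inspected.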
Check Stateful Ingestion Configuration:
If you use stateful ingestion, check the `stateful_ingestion` block in your recipe, and make sure the recipe sets a `pipeline_name`, which stateful ingestion requires.
Example Configuration
Here is an example of an ingestion recipe for Superset with stateful ingestion enabled:
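Expressed as a Python dict that DataHub's programmatic `Pipeline` API accepts, such a recipe might look like this sketch. The pipeline name, server URL and credentials are placeholders. `pipeline_name` is needed because stateful ingestion stores its state per pipeline, and `remove_stale_metadata` asks DataHub to soft-delete entities that disappear from Superset between runs. The helpers `build_superset_recipe` and `run_recipe` are hypothetical.

```python
from typing import Any, Dict


def build_superset_recipe(connect_uri: str, username: str, password: str,
                          datahub_server: str) -> Dict[str, Any]:
    """Build a Superset recipe with stateful ingestion enabled (placeholder values)."""
    return {
        # Stateful ingestion keys its stored state on the pipeline name.
        "pipeline_name": "superset_ingestion",
        "source": {
            "type": "superset",
            "config": {
                "connect_uri": connect_uri,
                "username": username,
                "password": password,
                "stateful_ingestion": {
                    "enabled": True,
                    "remove_stale_metadata": True,
                },
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": datahub_server},
        },
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    # Imported lazily so building the recipe does not require the DataHub package.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()
```

The same structure written as YAML works as a regular recipe file for `datahub ingest`.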
Known Issues and Updates
There have been updates and discussions regarding dataset ingestion from Superset. For instance, a <https://github.com/datahub-project/datahub/pull/10592|GitHub pull request> adds support for the Dataset entity to the Superset ingestion plugin. If that change is not included in the DataHub release you are running, Superset datasets would not be ingested as standalone entities, which would match what you are seeing.
Would you like more detailed steps or further assistance on any specific part of the process?