I’d like to ask: Can the tags set on Virtual Datasets and Physical Datasets in Dremio also be ingested into DataHub as part of the viewable metadata through this ingestion feature?
It seems likely that the integration of Dremio with DataHub can include the ingestion of tags for both Virtual and Physical Datasets. According to the <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/dremio/dremio_source.py|Dremio Source Code>, the Dremio integration extracts metadata for both physical and virtual datasets, which includes schema and column information, lineage, ownership, and glossary terms. This suggests that tags, which are a form of metadata, could also be ingested.
Additionally, the DataHub documentation on adding tags or glossary terms to a dataset indicates that tags can be attached to datasets and their fields, which supports the idea that tags from Dremio datasets could be ingested and viewed in DataHub.
Would you like more detailed information on how to set this up or any specific part of the process?
I have another question. The blog post at https://blog.datahubproject.io/introducing-datahub-integration-with-dremio-322198bac697 mentions that version v0.14.1.12 can seamlessly integrate metadata from Dremio into DataHub. However, after configuring the ingestion, I encountered the following error: ERROR {datahub.entrypoints:218} - Command failed: Failed to find a registered source for type Dremio: 'Did not find a registered class for Dremio'.
Could you please provide some guidance on how to resolve it?
Thank you.
It seems likely that the error you’re encountering is due to a misconfiguration or an issue with the DataHub version you are using. The error message indicates that the ingestion CLI is unable to find a registered source for the type “Dremio,” which suggests that the Dremio source plugin is either not installed or not recognized in the environment where you run the ingestion.
To address this, you might want to check the following:
DataHub Version: Ensure that you are using a version of DataHub that supports Dremio integration. The blog post you referred to mentions version v0.14.1.12, so make sure the DataHub CLI (the acryl-datahub Python package) that runs the ingestion is at least this version or newer.
Plugin Installation: Ensure that the Dremio plugin is installed and properly registered in your DataHub environment. This might involve updating your DataHub installation or ensuring that the necessary Python packages are installed.
Logs and Debugging: Check the logs for any additional error messages or stack traces that might provide more context on why the Dremio source is not being recognized.
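The first two checks above can be sketched in Python. This is a minimal sketch, not an official diagnostic: it assumes the CLI is installed as the acryl-datahub package and that source plugins are advertised under the entry-point group datahub.ingestion.source.plugins (an assumption based on how the package is built; please verify it for your version). The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, entry_points, version

# Entry-point group assumed to hold DataHub source plugins (verify for your version).
SOURCE_PLUGIN_GROUP = "datahub.ingestion.source.plugins"


def installed_version(package: str = "acryl-datahub"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def registered_sources(group: str = SOURCE_PLUGIN_GROUP) -> set:
    """Return the source type names advertised in the given entry-point group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Newer Python versions expose a select() filter.
        selected = eps.select(group=group)
    else:
        # Older Python versions return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name for ep in selected}


if __name__ == "__main__":
    print("acryl-datahub version:", installed_version())
    print("dremio source registered:", "dremio" in registered_sources())
```

If the second line prints False, the installed version probably predates the Dremio source and an upgrade is the likely fix. The CLI command datahub check plugins should give a similar listing. The error text echoes the type as Dremio, so it may also be worth checking that the recipe uses the lowercase name dremio.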
If you need more detailed guidance on any of these steps, please let me know!
To enable Dremio ingestion in DataHub, you need to install the appropriate Python package that includes the Dremio source plugin. According to the <https://github.com/datahub-project/datahub/blob/master/docs/cli.md|DataHub CLI documentation>, you can install the Dremio plugin using the following command:
pip install 'acryl-datahub[dremio]'
This command will install the necessary dependencies for the Dremio source, allowing you to configure and run metadata ingestion from Dremio into DataHub.
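Once the plugin is installed, a recipe can also be run programmatically. The following is a rough sketch under stated assumptions: the config field names (hostname, port, authentication_method, password) and the PAT value are assumptions to confirm against the Dremio source documentation, and build_dremio_recipe and run_recipe are hypothetical helper names. Note the lowercase source type.

```python
def build_dremio_recipe(hostname: str, token: str, server: str) -> dict:
    """Build a minimal ingestion recipe dict for the Dremio source."""
    return {
        "source": {
            "type": "dremio",  # lowercase source name
            "config": {
                # Field names below are assumptions; confirm them in the source docs.
                "hostname": hostname,
                "port": 9047,
                "authentication_method": "PAT",
                "password": token,
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": server}},
    }


def run_recipe(recipe: dict) -> None:
    """Run the recipe; needs acryl-datahub[dremio] installed in this environment."""
    # Imported lazily so the recipe can be built and inspected without DataHub.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()
```

Calling run_recipe(build_dremio_recipe("localhost", "<token>", "http://localhost:8080")) would then start the ingestion; the host, token, and server values here are placeholders.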
Would you like more detailed instructions on setting up the ingestion process or any other aspect of the integration?