Resolving DataHub Unity Catalog Plugin Initialization Error

Original Slack Thread

<@U06TM7M40RK> datahub.ingestion.run.pipeline.PipelineInitError: Failed to find a registered source for type unity-catalog: unity-catalog is disabled due to an error in initialization

The error message “Failed to find a registered source for type unity-catalog: unity-catalog is disabled due to an error in initialization” typically indicates that the unity-catalog plugin is registered but could not be imported, most often because a dependency is missing or incompatible. Here are some steps to troubleshoot and resolve the issue:

  1. Verify Plugin Installation: Ensure that acryl-datahub is installed with the unity-catalog extra. pip show takes only the base package name (not the bracketed extra), so verify the installation by running:

    pip show acryl-datahub
    
  2. Check for Dependency Conflicts: The error might be due to a conflict with the pydantic library or another dependency. Make sure all dependencies are up-to-date and compatible. If you encounter issues with pydantic, consider pinning it to a compatible version:

    pip install 'pydantic<2'
    
  3. Reinstall Dependencies: Try reinstalling the dependencies to ensure there are no corrupted installations:

    pip uninstall acryl-datahub
    pip install 'acryl-datahub[unity-catalog]'
    
  4. Check DataHub Plugins: Run the following command to list the installed plugins and their statuses. With --verbose, a disabled plugin should be listed together with the error that disabled it:

    datahub check plugins --verbose
    
  5. Review Configuration: Double-check your ingestion configuration to ensure there are no issues. Refer to the <Databricks | DataHub Unity Catalog documentation> for the correct setup.

  6. Databricks SDK Version: If the installed Databricks SDK is incompatible with the DataHub plugin, you might need to downgrade it. For example, databricks-sdk 0.30.0 removed ExecuteStatementResponse, which the plugin expects. You can resolve this by installing a compatible version:

    pip install 'databricks-sdk<0.30.0'
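
The checks in steps 1, 2, 4 and 6 can also be approximated from Python. The following is a minimal sketch, not DataHub's own tooling: it prints the installed versions of the packages mentioned above and then tries to load the unity-catalog entry point directly, so the underlying import error is shown instead of the generic "disabled" message. The entry-point group name `datahub.ingestion.source.plugins` is an assumption about how the sources are registered, and the helper names are hypothetical.

```python
import traceback
from importlib.metadata import PackageNotFoundError, entry_points, version

# Assumption: DataHub registers ingestion sources under this entry-point group.
SOURCE_GROUP = "datahub.ingestion.source.plugins"


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_entry_point(group, name):
    """Look up one entry point by group and name; None if it is not registered."""
    eps = entry_points()
    # Newer Python versions expose .select(); older ones return a plain dict.
    if hasattr(eps, "select"):
        candidates = eps.select(group=group)
    else:
        candidates = eps.get(group, [])
    for ep in candidates:
        if ep.name == name:
            return ep
    return None


def diagnose(source_type="unity-catalog", group=SOURCE_GROUP):
    """Try to import the source class and report the underlying error, if any."""
    ep = find_entry_point(group, source_type)
    if ep is None:
        return f"{source_type}: not registered (is the extra installed?)"
    try:
        ep.load()
    except Exception:
        # The real cause (often an ImportError) appears in this traceback.
        return f"{source_type}: failed to load\n{traceback.format_exc()}"
    return f"{source_type}: loaded OK"


if __name__ == "__main__":
    for pkg in ("acryl-datahub", "databricks-sdk", "pydantic"):
        print(f"{pkg}: {installed_version(pkg)}")
    print(diagnose())
```

Run it with the same interpreter (virtual environment) that runs `datahub ingest`; otherwise the reported versions may not match what the CLI actually uses.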
    

Would you like more detailed steps on any of these troubleshooting points or further assistance on a specific part of the process?

<@U06TM7M40RK> My Databricks SDK version is 0.1.8. Is this DataHub compatible with Databricks SDK 0.1.8?

I’m not able to find any data sources to answer your question. You might want to specify the DataHub version you are using to check compatibility with Databricks SDK 0.1.8. If this doesn’t help, please reach out to the DataHub support team for further assistance.

<@U06TM7M40RK> DataHub version you are using to check compatibility with Databricks SDK 0.1.8

I’m not able to find any data sources to answer your question. You might want to provide details about the specific features or integrations you are using with DataHub to better assess compatibility. If this doesn’t help, please reach out to the DataHub support team for further assistance.

<@U06TM7M40RK> Which DataHub version is compatible with Databricks SDK 0.1.8?

I’m not able to find any data sources to answer your question. You might want to specify any specific features or plugins you are using with DataHub that require compatibility with Databricks SDK 0.1.8. If this doesn’t help, please reach out to the DataHub support team for further assistance.

<@U06TM7M40RK> What are the Unity Catalog source dependencies that need to be installed?

To use the Unity Catalog source with DataHub, you need to ensure that the necessary dependencies are installed. According to the <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/setup.py|DataHub setup.py file>, the dependencies for the Unity Catalog source are as follows:

  • databricks
  • sql_common
  • sqllineage_lib

These are the names of dependency sets defined in setup.py, not PyPI packages you install individually. They are pulled in by the acryl-datahub[unity-catalog] extra, which you can install by running:

    pip install 'acryl-datahub[unity-catalog]'

This command will install the required dependencies for the Unity Catalog source.
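
To see which concrete packages the extra resolves to in your environment, you can read the installed package metadata. This is a minimal sketch using only the standard library; the helper names are hypothetical, and it assumes the extra's requirements are recorded in the metadata with an `extra == "..."` marker, which is how pip-installed distributions usually declare them.

```python
import re
from importlib.metadata import PackageNotFoundError, requires

_EXTRA_RE = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def _normalize(name):
    """Treat '-', '_' and '.' as equivalent, since tools normalize extras differently."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirements_for_extra(requirement_lines, extra):
    """Keep only the requirement strings whose marker names the given extra."""
    wanted = _normalize(extra)
    selected = []
    for line in requirement_lines:
        spec, _, marker = line.partition(";")
        extras = {_normalize(e) for e in _EXTRA_RE.findall(marker)}
        if wanted in extras:
            selected.append(spec.strip())
    return selected


def extra_requirements(package="acryl-datahub", extra="unity-catalog"):
    """List the requirements an installed package declares for one extra."""
    try:
        lines = requires(package) or []
    except PackageNotFoundError:
        return []  # the package is not installed in this environment
    return requirements_for_extra(lines, extra)


if __name__ == "__main__":
    for req in extra_requirements():
        print(req)
```

An empty result means either that acryl-datahub is not installed in the active environment or that its metadata does not list the extra in the assumed form.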

Would you like more detailed steps on how to set up the Unity Catalog source, or further assistance on a specific part of the process?

<@U06TM7M40RK> What are the setup steps to install the Unity Catalog source in DataHub?

To set up the Unity Catalog source in DataHub, you need to follow these steps:

  1. Install the Unity Catalog Plugin: You need to install the acryl-datahub[unity-catalog] package, which includes the necessary dependencies for the Unity Catalog source. You can do this by running the following command:

    pip install 'acryl-datahub[unity-catalog]'
    
  2. Configure the Ingestion Recipe: Create a configuration file (YAML format) for the Unity Catalog source. This file should include your Databricks workspace URL, personal access token, and other necessary configurations. Here is an example configuration:

    source:
      type: unity-catalog
      config:
        workspace_url: 'https://your-databricks-workspace-url'
        token: 'your-databricks-token'
        include_hive_metastore: false
        include_ownership: true
        profiling:
          method: "ge"
          enabled: true
          warehouse_id: 'your-warehouse-id'
    
  3. Set Up Permissions: Ensure that your Databricks service principal has the necessary permissions. According to the <Databricks | DataHub documentation>, the service principal must have:

    • USE CATALOG privilege on any catalogs you want to ingest.
    • USE SCHEMA privilege on any schemas you want to ingest.
    • SELECT privilege on any tables and views you want to ingest.
    • CAN_MANAGE permissions on any SQL Warehouses you want to ingest.

  4. Run the Ingestion: Use the DataHub CLI to run the ingestion process with your configuration file. You can do this by executing:

    datahub ingest -c your-config-file.yaml
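
The same recipe can also be run from Python through DataHub's programmatic `Pipeline` interface, which raises the same PipelineInitError as the CLI when the source fails to initialize. The sketch below makes several assumptions: the environment variable names are hypothetical, the `datahub-rest` sink at `http://localhost:8080` stands in for wherever your DataHub instance actually runs, and the profiling section of the YAML example is left out for brevity.

```python
import os


def build_recipe(workspace_url, token, sink_server="http://localhost:8080"):
    """Build a recipe dict mirroring the YAML example, plus an explicit sink."""
    return {
        "source": {
            "type": "unity-catalog",
            "config": {
                "workspace_url": workspace_url,
                "token": token,
                "include_hive_metastore": False,
                "include_ownership": True,
            },
        },
        # Assumption: a DataHub server is reachable at sink_server.
        "sink": {"type": "datahub-rest", "config": {"server": sink_server}},
    }


def run_ingestion(recipe):
    """Create and run the pipeline; raises if it cannot start or reports failures."""
    # Imported lazily so build_recipe can be used without DataHub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()


if __name__ == "__main__":
    # Hypothetical variable names; reading the token from the environment
    # avoids hard-coding it in a file.
    url = os.environ.get("DATABRICKS_WORKSPACE_URL")
    token = os.environ.get("DATABRICKS_TOKEN")
    if url and token:
        run_ingestion(build_recipe(url, token))
    else:
        print("Set DATABRICKS_WORKSPACE_URL and DATABRICKS_TOKEN to run the ingestion.")
```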
    

These steps will help you set up the Unity Catalog source in DataHub. If you encounter any issues, ensure that all dependencies are correctly installed and that your configuration file is properly set up.

Would you like more detailed steps on any of these points or further assistance with a specific part of the process?

Sources:

  • <Databricks | DataHub Unity Catalog Prerequisites>

<@U07FV3UHE4D> Can anyone help me with “datahub.ingestion.run.pipeline.PipelineInitError: Failed to find a registered source for type unity-catalog: unity-catalog is disabled due to an error in initialization” issue?

<@U07FV3UHE4D> In spite of following all the necessary steps provided and installing every package, we are still facing the same issue while ingesting Databricks metadata into DataHub.

Hi :slight_smile:! Did this interaction resolve your issue, or do you still require assistance?

Please tag @RunLLM if you still need further assistance!

Or tag @Datahub Community Support if you require a person to take a look :slight_smile:

Hope this helps!

<@U07FV3UHE4D> Please look into my issue as early as possible

Thank you for letting us know you need a person to take a look at this thread.

Either a community member or one of our team members will try to assist when they have a moment. Responses unfortunately might not be immediate. We hope this helps speed things along :slight_smile: