Troubleshooting PowerBI Admin API Permissions and Lineage Issues

Original Slack Thread

Hello <@U06TM7M40RK>, I see “Dataset lineage can not be ingestion because this user does not have access to the PowerBI Admin API.” errors. The group containing the DataHub service principal is already among the groups allowed to use the Fabric API, and the service principal is already a member of that group.

The error message “Dataset lineage can not be ingestion because this user does not have access to the PowerBI Admin API” typically indicates an issue with permissions related to the PowerBI Admin API. Here are some steps and considerations to address this issue:

  1. Verify Admin API Permissions: Ensure that the service principal has the necessary permissions to use the PowerBI Admin API. According to the <https://datahubproject.io/docs/0.14.0/quick-ingestion-guides/powerbi/setup|PowerBI Setup Guide>, you need to enable the following settings in the PowerBI Admin Portal:

    • Allow service principals to use read-only admin APIs.
    • Enhance admin APIs responses with detailed metadata.
    • Enhance admin APIs responses with DAX and mashup expressions.
  2. Check Security Group Configuration: Make sure that the security group containing your Azure AD app is correctly configured in the PowerBI portal. This includes adding the security group to the specific settings that allow access to the PowerBI APIs, as described in the <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/powerbi#admin-ingestion-vs-basic-ingestion|PowerBI Ingestion Guide>.

  3. Service Principal as Member in Workspace: Ensure that the service principal is added as a member in the workspace you are trying to ingest. This is crucial for the ingestion process to work correctly, as mentioned in the <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/docs/sources/powerbi/powerbi_pre.md|PowerBI Ingestion Documentation>.

  4. Review Configuration: Double-check your ingestion recipe to ensure it is set up correctly. The admin_apis_only setting should match how you intend to ingest: if you are relying on the Admin API, make sure it is enabled. A minimal recipe sketch follows this list.

  5. Troubleshooting Steps: If the issue persists, consider reviewing the detailed logs for any additional error messages or warnings that might provide more context. You can enable debug logging to capture more detailed information about the ingestion process.
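
For reference, a minimal recipe sketch for Admin-API-based ingestion might look like the following; the tenant, client, and secret values are placeholders to replace with your own:

    source:
      type: powerbi
      config:
        tenant_id: 'your-tenant-id'
        client_id: 'your-client-id'
        client_secret: 'your-client-secret'
        admin_apis_only: true    # route metadata collection through the Admin API
        extract_lineage: true    # needed for dataset lineage

For the debug logging mentioned in step 5, the DataHub CLI accepts a global --debug flag, e.g. datahub --debug ingest -c powerbi_recipe.yml.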

Would you like more detailed guidance on any of these steps or additional troubleshooting tips?


<@U06TM7M40RK> what does “DataAccess function is not present in M-Query expression” mean in the powerbi ingest logs?

The message “DataAccess function is not present in M-Query expression” in the PowerBI ingest logs indicates that the M-Query expression being processed does not contain any recognized data-access functions. This is important because DataHub relies on these functions to extract lineage information from the M-Query.

Here’s a breakdown of what this means and how you might address it:

  1. DataAccess Functions: These are specific functions within an M-Query that interact with data sources, such as PostgreSQL.Database, Snowflake.Databases, or similar functions for other databases. These functions are crucial for identifying the data sources and tables involved in the query, which is necessary for lineage extraction.

  2. Validation Process: According to <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/powerbi/m_query/validator.py|validator.py>, the validation step checks whether any of the supported data-access functions are present in the M-Query expression. If none are found, validation fails and this message is logged.

  3. Possible Causes:

    • The M-Query might be using a custom or unsupported function to access data.
    • The query might be purely computational or transformation-based without direct data source access.
    • There could be a syntax issue or formatting problem in the M-Query that prevents the function from being recognized.
  4. Next Steps:

    • Review the M-Query to ensure it includes supported data-access functions (see the example after this list).
    • Check for any syntax errors or unusual formatting that might affect parsing.
    • If using custom functions, consider modifying the query to use standard data-access functions or consult the documentation to see if custom functions can be supported.
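
To make this concrete, here is a minimal, illustrative Power Query M expression (all account, database, and table names are made up) whose Snowflake.Databases call is the kind of data-access function the parser looks for:

    let
        // Data-access function: this call is what lineage extraction keys on
        Source = Snowflake.Databases("myaccount.snowflakecomputing.com", "MY_WAREHOUSE"),
        // Navigation steps drilling down to a concrete table
        MyDb = Source{[Name = "MY_DB"]}[Data],
        MySchema = MyDb{[Name = "PUBLIC"]}[Data],
        MyTable = MySchema{[Name = "ORDERS"]}[Data]
    in
        MyTable

By contrast, an expression built purely from in-memory constructors such as Table.FromRows contains no data-access call, so the validator logs this message and no lineage is emitted for that table.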

Would you like more detailed steps on how to modify your M-Query or further assistance on this topic?

<@U06TM7M40RK> what is “_server_to_platform_instance.key._metastore” exactly in powerbi config?

The _server_to_platform_instance.key._metastore notation in the PowerBI configuration docs refers to the _metastore field of an entry in the server_to_platform_instance map, where key stands for the map key (the datasource’s server address). This option comes into play when mapping a PowerBI datasource’s server to a DataHub platform instance, particularly for Databricks Unity Catalog.

Here’s a breakdown of what this configuration option means:

  1. Purpose: The server_to_platform_instance mapping is used to associate a PowerBI datasource’s server (e.g., host and optional port) with a specific DataHub platform instance. This is crucial for ensuring that the lineage and metadata are correctly attributed to the right platform instance within DataHub.

  2. Key Components:

    • key: This represents the server address of the PowerBI datasource, typically in the format host[:port]. The port is optional and only needed if your datasource server is running on a non-standard port.
    • _metastore: This is an additional field used specifically for Databricks connectors. It specifies the Databricks Unity Catalog metastore name, which is necessary for generating the dataset URN (Uniform Resource Name) in DataHub.
  3. Usage: This configuration is particularly relevant when you are using Databricks as a data source in PowerBI and want to ensure that the metadata ingested into DataHub is correctly associated with the Databricks platform instance.

  4. Example Configuration:

    server_to_platform_instance:
      'databricks-server:443':
        platform_instance: 'databricks_instance'
        env: 'PROD'
        _metastore: 'my_metastore'
    

This configuration ensures that any data accessed from the specified Databricks server is correctly mapped to the databricks_instance platform instance in the PROD environment, using the specified metastore.
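
As a rough illustration (the exact URN layout can vary by DataHub version and connector settings), the metastore name then becomes part of the dataset name inside the generated URN, along these lines, with every name after the platform being a placeholder:

    urn:li:dataset:(urn:li:dataPlatform:databricks,databricks_instance.my_metastore.my_catalog.my_schema.my_table,PROD)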

Would you like more detailed information on how to configure this or any other aspect of the PowerBI ingestion process?

<@U06TM7M40RK> what is platform_instance in this config?

The platform_instance in the PowerBI configuration is a parameter used to specify a particular instance of a data platform within DataHub. This is important for distinguishing between multiple instances of the same platform, especially when they exist in different environments or have different configurations.

Here’s a detailed explanation of platform_instance:

  1. Purpose: The platform_instance parameter helps uniquely identify datasets that belong to a specific instance of a data platform. This is particularly useful in organizations that have multiple instances of the same platform (e.g., multiple Snowflake or Databricks instances) and need to manage metadata for each instance separately.

  2. Configuration: In the context of PowerBI, the platform_instance is used in conjunction with the server_to_platform_instance mapping. This mapping associates a PowerBI datasource’s server with a specific platform instance in DataHub.

  3. Example Usage:

    server_to_platform_instance:
      'databricks-server:443':
        platform_instance: 'databricks_instance'
        env: 'PROD'
    

    In this example, the platform_instance is set to 'databricks_instance', which means that any data accessed from the specified Databricks server will be associated with this particular instance in DataHub.

  4. Benefits: Using platform_instance allows for more granular control and organization of metadata. It ensures that lineage and metadata are correctly attributed to the right instance, which is crucial for accurate data governance and observability.
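
As a sketch of how this plays out with two Databricks workspaces (both hostnames below are placeholders), each server can be routed to its own platform instance:

    server_to_platform_instance:
      'adb-1111111111111111.1.azuredatabricks.net':
        platform_instance: 'databricks_prod'
        env: 'PROD'
      'adb-2222222222222222.2.azuredatabricks.net':
        platform_instance: 'databricks_dev'
        env: 'DEV'

Datasets reached through the first server are then attributed to databricks_prod, and those reached through the second to databricks_dev.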

Would you like more detailed guidance on how to configure platform_instance or any other specific aspect of the PowerBI ingestion process?

<@U06TM7M40RK> In the powerbi ingestion logs, I see “Databricks.Catalogs(Databricks_Server” but I don’t see any definition for Databricks_Server. How does DataHub map that information to the Databricks table and build the lineage? Currently my lineage does not work between PowerBI and Databricks.

It seems likely that the issue you’re encountering with the PowerBI ingestion logs and the “Databricks.Catalogs(Databricks_Server” message is related to how DataHub maps the server information to build lineage between PowerBI and Databricks. Here’s a breakdown of how this mapping typically works and some steps to troubleshoot the issue:

  1. Server Mapping: In DataHub, the server_to_platform_instance configuration is used to map a PowerBI datasource’s server to a specific DataHub platform instance. This mapping is crucial for establishing the correct lineage between PowerBI and Databricks.

  2. Configuration Example:

    server_to_platform_instance:
      'Databricks_Server':
        platform_instance: 'databricks_instance'
        env: 'PROD'
    

    In this configuration, 'Databricks_Server' should match the server name used in your PowerBI M-Query. The platform_instance specifies which Databricks instance this server corresponds to in DataHub.

  3. Troubleshooting Steps:

    • Verify Server Name: Ensure that the server name in your PowerBI M-Query (Databricks_Server) exactly matches the key used in your server_to_platform_instance configuration, including case. See the M snippet after this list.
    • Check Configuration: Double-check your PowerBI ingestion configuration to ensure that the server_to_platform_instance mapping is correctly set up. This mapping should include the correct server name and platform instance.
    • Enable Lineage Extraction: Make sure that lineage extraction is enabled in your PowerBI ingestion configuration by setting the extract_lineage parameter to true.
  4. Example Configuration:

    source:
      type: powerbi
      config:
        tenant_id: 'your-tenant-id'
        client_id: 'your-client-id'
        client_secret: 'your-client-secret'
        extract_lineage: true
        server_to_platform_instance:
          'Databricks_Server':
            platform_instance: 'databricks_instance'
            env: 'PROD'
    
  5. Review Logs: If the issue persists, review the ingestion logs for any additional error messages or warnings that might provide more context. Enabling debug logging can help capture more detailed information about the ingestion process.
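
For orientation, a typical Databricks data-access call in Power Query M looks roughly like the sketch below; the hostname, HTTP path, and catalog/schema/table names are placeholders. The “Databricks.Catalogs(Databricks_Server” fragment in your logs suggests the first argument in your report is a PowerBI parameter named Databricks_Server rather than a literal hostname, so check how that parameter is defined and make sure the value DataHub ends up seeing is what you use as the server_to_platform_instance key:

    let
        // Data-access function: the first argument identifies the workspace server
        Source = Databricks.Catalogs(
            "adb-1234567890123456.7.azuredatabricks.net",  // placeholder hostname
            "/sql/1.0/warehouses/abcdef1234567890",        // placeholder HTTP path
            [Catalog = null, Database = null]
        ),
        // Navigation steps drilling down to a concrete table
        MyCatalog = Source{[Name = "my_catalog"]}[Data],
        MySchema = MyCatalog{[Name = "my_schema"]}[Data],
        MyTable = MySchema{[Name = "my_table"]}[Data]
    in
        MyTable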

Would you like more detailed guidance on any of these steps or further assistance with your configuration?

<@U06TM7M40RK> how can I delete all tables in a databricks schema

To delete all tables in a Databricks schema from DataHub, you can use the DataHub CLI to perform a recursive delete on the container that represents the schema. Note that this removes the metadata from DataHub only; it does not drop the tables in Databricks itself. Here’s a step-by-step guide:

  1. Identify the URN of the Databricks Schema: You need to find the URN of the Databricks schema container. This can be done by navigating to the relevant schema in the DataHub UI and copying the URN.

  2. Use the datahub delete Command: Use the datahub delete command with the --urn option to specify the Databricks schema container URN and the --recursive flag to delete all contained tables.

  3. Perform a Hard Delete: Add the --hard flag to ensure that the deletion is a hard delete, which physically deletes all rows for all aspects of the entity.

Here is an example command:

datahub delete --urn "urn:li:container:<databricks_schema_urn>" --recursive --hard

Example Command Breakdown:

  • --urn "urn:li:container:<databricks_schema_urn>": Specifies the URN of the Databricks schema container.
  • --recursive: Ensures that all contained tables are included in the deletion.
  • --hard: Performs a hard delete, which physically deletes all rows for all aspects of the entity.

Important Considerations:

  • Dry Run: It is recommended to use the --dry-run flag initially to see what entities will be deleted before actually performing the delete operation.
  • Force: The --force flag can be used to skip confirmation prompts.

Example with Dry Run and Force:

datahub delete --urn "urn:li:container:<databricks_schema_urn>" --recursive --hard --dry-run --force

This command will show what entities will be deleted without actually performing the deletion and will skip any confirmation prompts.
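
A cautious workflow, reusing the placeholder URN from above, runs the dry run first and only issues the real delete once the preview looks right:

# 1. Preview what would be deleted (no changes are made)
datahub delete --urn "urn:li:container:<databricks_schema_urn>" --recursive --hard --dry-run

# 2. Perform the actual hard delete
datahub delete --urn "urn:li:container:<databricks_schema_urn>" --recursive --hard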

Would you like more detailed information on how to perform this operation safely, or do you have any specific concerns about the deletion process?

<@U06TM7M40RK> do I need to define the http path somehow in server_to_platform_instance for powerbi databricks lineage?

The DataHub documentation I have ingested does not mention how to define an HTTP path in server_to_platform_instance for PowerBI Databricks lineage. Typically, the server_to_platform_instance configuration maps a PowerBI datasource’s server to a DataHub platform instance using the server’s host and optional port, not an HTTP path. Make sure the server name and platform instance are correctly configured and match the Databricks instance used in your PowerBI M-Query. If this doesn’t help, please reach out to the DataHub support team for further assistance.
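
In other words, the map key stays in host[:port] form; a minimal sketch with a placeholder hostname:

    server_to_platform_instance:
      # key is host[:port] only -- no /sql/1.0/warehouses/... path segment
      'adb-1234567890123456.7.azuredatabricks.net':
        platform_instance: 'databricks_instance'
        env: 'PROD'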