Configuring Lineage from dbt to Azure SQL and PowerBIaucets

user-3 · November 4, 2024, 12:02am

<@U06TM7M40RK> We are using dbt core as our source, target for the models is azure sql, and we use power BI to build dashboards based on the data in Azure SQL.
I want to ingest PowerBI including lineage. So for example, I will see a lineage from dbt to azuresql and from azuresql to powerbi. Below is the example of the recipe file taken from datahub documentation.
How should I adjust it for our purposes?

  type: "powerbi"
  config:
    # Your Power BI tenant identifier
    tenant_id: a949d688-67c0-4bf1-a344-e939411c6c0a

    # Azure AD Application identifier
    client_id: foo
    # Azure AD App client secret
    client_secret: bar
    
    # Ingest elements of below PowerBi Workspace into Datahub
    workspace_id_pattern:
      allow:
        - 4bd10256-e999-45dd-8e56-571c77153a5f
      deny:

    # Enable / Disable ingestion of ownership information for dashboards
    extract_ownership: true
    
    # Enable/Disable extracting workspace information to DataHub containers
    extract_workspaces_to_containers: true
    
    # Enable / Disable ingestion of endorsements.
    # Please notice that this may overwrite any existing tags defined to ingested entities!
    extract_endorsements_to_tags: false
    
    # Optional -- This mapping is optional and only required to configure platform-instance for upstream tables
    # A mapping of PowerBI datasource's server i.e host[:port] to data platform instance.
    # :port is optional and only needed if your datasource server is running on non-standard port.
    # For Google BigQuery the datasource's server is google bigquery project name
    server_to_platform_instance:
        <http://ap-south-1.snowflakecomputing.com|ap-south-1.snowflakecomputing.com>:
          platform_instance: operational_instance
          env: DEV
        oracle-server:1920:
          platform_instance: high_performance_production_unit
          env: PROD
        big-query-sales-project:
          platform_instance: sn-2
          env: QA

    # Need admin_api, only ingest workspace that are modified since...
    modified_since: "2023-02-10T00:00:00.0000000Z"

    ownership:
        # create powerbi user as datahub corpuser, false will still extract ownership of workspace/ dashboards
        create_corp_user: false
        # use email to build user urn instead of powerbi user identifier
        use_powerbi_email: true
        # remove email suffix like @acryl.io
        remove_email_suffix: true
        # only ingest user with certain authority
        owner_criteria: ["ReadWriteReshareExplore","Owner","Admin"]
    # wrap powerbi tables (datahub dataset) under 1 powerbi dataset (datahub container)
    extract_datasets_to_containers: true
    # only ingest dataset that are endorsed, like "Certified"
    filter_dataset_endorsements: 
      allow:
        - Certified
      
    # extract powerbi dashboards and tiles
    extract_dashboards: false
    # extract powerbi dataset table schema
    extract_dataset_schema: true

    # Enable PowerBI dataset profiling
    profiling:
      enabled: false
    # Pattern to limit which resources to profile
    # Matched resource format is following:
    # workspace_name.dataset_name.table_name
    profile_pattern:
      deny:
        - .*


sink:
  # sink config```

user-1 · November 4, 2024, 12:02am

To adjust your PowerBI ingestion recipe for your specific use case, where you want to see lineage from dbt to Azure SQL and from Azure SQL to PowerBI, you need to ensure that the configuration captures the necessary lineage information. Here are some key points and adjustments you might consider:

Ensure Lineage Extraction: Make sure that the extract_lineage parameter is set to true to enable lineage extraction. This is crucial for capturing the lineage from Azure SQL to PowerBI.
Server to Platform Instance Mapping: You need to map your Azure SQL server to a platform instance in the server_to_platform_instance section. This mapping helps DataHub understand the relationship between your data sources and PowerBI.
Workspace and Dataset Configuration: Ensure that your workspace_id_pattern is correctly configured to include all relevant workspaces. This will ensure that all necessary PowerBI workspaces are ingested.
Ownership and Endorsements: If you want to capture ownership and endorsements, ensure that extract_ownership and extract_endorsements_to_tags are set according to your needs.
Profiling and Schema Extraction: If you want to profile datasets or extract schema information, ensure that profiling.enabled and extract_dataset_schema are set to true.

Here’s an example of how you might adjust your recipe:

source:
  type: "powerbi"
  config:
    tenant_id: a949d688-67c0-4bf1-a344-e939411c6c0a
    client_id: foo
    client_secret: bar
    workspace_id_pattern:
      allow:
        - 4bd10256-e999-45dd-8e56-571c77153a5f
    extract_ownership: true
    extract_workspaces_to_containers: true
    extract_endorsements_to_tags: false
    extract_lineage: true  # Ensure lineage extraction is enabled
    server_to_platform_instance:
        azure-sql-server:  # Replace with your Azure SQL server details
          platform_instance: azure_sql_instance
          env: PROD
    modified_since: "2023-02-10T00:00:00.0000000Z"
    ownership:
        create_corp_user: false
        use_powerbi_email: true
        remove_email_suffix: true
        owner_criteria: ["ReadWriteReshareExplore","Owner","Admin"]
    extract_datasets_to_containers: true
    filter_dataset_endorsements: 
      allow:
        - Certified
    extract_dashboards: true  # Enable dashboard extraction
    extract_dataset_schema: true
    profiling:
      enabled: false
    profile_pattern:
      deny:
        - .*

sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'

user-1 · November 4, 2024, 12:02am

Make sure to replace placeholders like azure-sql-server and azure_sql_instance with your actual Azure SQL server details. This configuration should help you capture the desired lineage from dbt to Azure SQL and from Azure SQL to PowerBI.

Would you like more detailed steps or further assistance on any of these points?

Sources:

<PowerBI | DataHub Ingestion Documentation>
<Slack Conversation on PowerBI Ingestion> 0 button 0 button Hint: Mention @runllm in the thread for followups.

user-3 · November 4, 2024, 12:02am

<@U06TM7M40RK> I still dont see columns and lineage four the sources on powerbi platform.
In my case, the azure-sql-server should be something like this? my-sql.http://database.windows.net|database.windows.net
Do I need the port?
What should I put instead of the azure_sql_instance?

user-1 · November 4, 2024, 12:02am

To ensure that you see columns and lineage for the sources on the PowerBI platform, it’s important to correctly configure the server_to_platform_instance mapping in your ingestion recipe. Let’s address your questions step-by-step:

Azure SQL Server Format:
- The format for your Azure SQL server should be my-sql.database.windows.net. You do not need to include http:// or https:// in the server name.
- You typically do not need to specify a port unless your Azure SQL server is running on a non-standard port. The default port for SQL Server is 1433, which is usually implied.
Platform Instance:
- The platform_instance is a logical name you assign to your Azure SQL instance within DataHub. It helps DataHub identify and map the data source to the PowerBI platform.
- You can choose a descriptive name for your platform_instance, such as azure_sql_prod or azure_sql_dev, depending on your environment.

Here’s how you can adjust your configuration:

source:
  type: "powerbi"
  config:
    tenant_id: a949d688-67c0-4bf1-a344-e939411c6c0a
    client_id: foo
    client_secret: bar
    workspace_id_pattern:
      allow:
        - 4bd10256-e999-45dd-8e56-571c77153a5f
    extract_ownership: true
    extract_workspaces_to_containers: true
    extract_endorsements_to_tags: false
    extract_lineage: true
    server_to_platform_instance:
        my-sql.database.windows.net:  # Use your actual Azure SQL server name
          platform_instance: azure_sql_prod  # Choose a descriptive name for your instance
          env: PROD
    modified_since: "2023-02-10T00:00:00.0000000Z"
    ownership:
        create_corp_user: false
        use_powerbi_email: true
        remove_email_suffix: true
        owner_criteria: ["ReadWriteReshareExplore","Owner","Admin"]
    extract_datasets_to_containers: true
    filter_dataset_endorsements: 
      allow:
        - Certified
    extract_dashboards: true
    extract_dataset_schema: true
    profiling:
      enabled: false
    profile_pattern:
      deny:
        - .*

sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'

Make sure to replace my-sql.database.windows.net with your actual Azure SQL server name and choose an appropriate name for azure_sql_prod that reflects your environment.

Would you like more detailed steps or further assistance on any specific part of this process?

Sources:

https://datahubspace.slack.com/archives/C029A3M079U/p1726506291.721859|https://datahubspace.slack.com/archives/C029A3M079U/p1726506291.721859
https://datahubspace.slack.com/archives/C029A3M079U/p1726004842.602859|https://datahubspace.slack.com/archives/C029A3M079U/p1726004842.602859 0 button 0 button Hint: Mention @runllm in the thread for followups.

user-3 · November 4, 2024, 12:02am

<@U06TM7M40RK> Do I need admin_apis_only arameter for the lineage?

user-1 · November 4, 2024, 12:02am

The admin_apis_only parameter in the PowerBI ingestion configuration is relevant when you want to use the PowerBI Admin API exclusively for metadata retrieval. This can be important for extracting comprehensive metadata, including lineage, but it comes with certain caveats.

When to Use admin_apis_only:

Comprehensive Metadata Extraction:
- Using the Admin API allows you to extract more detailed metadata, including lineage, as it provides access to all PowerBI resources irrespective of whether the service principal has explicit access to them.
Caveats:
- If admin_apis_only is set to true, certain metadata like report pages will not be ingested because the page API is not available in the PowerBI Admin API.
- PowerBI parameters might not be resolved to actual values while processing M-Query for table lineage.
- Dataset profiling is unavailable, as it requires access to the workspace API.

Configuration:

If you decide to use the Admin API for lineage extraction, you can enable the admin_apis_only parameter in your configuration:

source:
  type: "powerbi"
  config:
    tenant_id: a949d688-67c0-4bf1-a344-e939411c6c0a
    client_id: foo
    client_secret: bar
    admin_apis_only: true  # Enable Admin API usage
    workspace_id_pattern:
      allow:
        - 4bd10256-e999-45dd-8e56-571c77153a5f
    extract_ownership: true
    extract_workspaces_to_containers: true
    extract_endorsements_to_tags: false
    extract_lineage: true
    server_to_platform_instance:
        my-sql.database.windows.net:
          platform_instance: azure_sql_prod
          env: PROD
    modified_since: "2023-02-10T00:00:00.0000000Z"
    ownership:
        create_corp_user: false
        use_powerbi_email: true
        remove_email_suffix: true
        owner_criteria: ["ReadWriteReshareExplore","Owner","Admin"]
    extract_datasets_to_containers: true
    filter_dataset_endorsements: 
      allow:
        - Certified
    extract_dashboards: true
    extract_dataset_schema: true
    profiling:
      enabled: false
    profile_pattern:
      deny:
        - .*

sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'

Conclusion:

Using admin_apis_only can be beneficial for extracting detailed lineage information, but you should be aware of the limitations and ensure that it aligns with your metadata extraction needs.

Would you like more detailed steps or further assistance on any specific part of this process?

Sources:

<PowerBI | DataHub Ingestion Documentation>
<https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/docs/sources/powerbi/powerbi_pre.md|GitHub PowerBI Ingestion Discussion> 0 button 0 button Hint: Mention @runllm in the thread for followups.

user-3 · November 4, 2024, 12:02am

<@U06TM7M40RK> with or without the admin_apis_only - I still don’t see upstream lineage to azure sql.
Do I need to re-run azure sql ingestion after ingesting the power bi ?

user-1 · November 4, 2024, 12:02am

To ensure that you see upstream lineage from PowerBI to Azure SQL, it’s important to correctly sequence and configure your ingestion processes. Here are some steps and considerations:

Ingestion Order:
- It is generally recommended to run the ingestion for both your data warehouse (Azure SQL in this case) and PowerBI. The order can matter if there are dependencies, but typically, you should ensure that both are ingested to capture the full lineage.
Re-run Azure SQL Ingestion:
- If you have already ingested PowerBI and are not seeing the expected lineage, it might be beneficial to re-run the Azure SQL ingestion. This ensures that all metadata is up-to-date and any new configurations or changes are captured.
Verify Configuration:
- Double-check that your ingestion configurations for both Azure SQL and PowerBI are correct and that they include all necessary parameters for lineage extraction.
Check for Known Issues:
- There might be specific issues or limitations with the current version of DataHub that affect lineage visibility. Checking the <https://github.com/datahub-project/datahub/issues|DataHub GitHub issues> or <Slack discussions> can provide insights into any known problems.
Ensure Correct Mapping:
- Make sure that the server_to_platform_instance mapping in your PowerBI configuration correctly matches the Azure SQL server details. This mapping is crucial for DataHub to understand the relationship between the data sources.
Debugging and Logs:
- Enable detailed logging during the ingestion process to capture any warnings or errors that might indicate why lineage is not being captured as expected.

Here’s a general approach to follow:

Step 1: Ingest Azure SQL metadata into DataHub.
Step 2: Ingest PowerBI metadata into DataHub.
Step 3: Verify lineage in the DataHub UI.

If you have followed these steps and still do not see the expected lineage, it might be necessary to review the logs for any errors or warnings and ensure that all configurations are correct.

Would you like more detailed steps or further assistance on any specific part of this process?

Sources:

https://datahubspace.slack.com/archives/C029A3M079U/p1726506291.721859|https://datahubspace.slack.com/archives/C029A3M079U/p1726506291.721859
https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/powerbi|https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/powerbi 0 button 0 button Hint: Mention @runllm in the thread for followups.

user-3 · November 4, 2024, 12:02am

<@U06TM7M40RK> To be more accurate, I’m adding here the recipe for the azure sql ingestion and for the power bi recipe.
Go over the server_to_platform_instance and in general let me know if there is any possible conflict that might impact the missing lineage.
Azure SQL recipe:

  type: mssql-odbc
  config:
    # Coordinates
    host_port: "<http://myserver-dev1-001.database.windows.net:1433|myserver-dev1-001.database.windows.net:1433>"
    database: "custom-data"
 
    # Credentials
    username: $HYPERPLANE__KEYVAULT_SECRET__QQID
    password: $HYPERPLANE__KEYVAULT_SECRET__QQPASS
    include_tables: true
    include_jobs: false
    profiling:
      enabled: true
      profile_table_level_only: false  # Only profile at the table level if set to true
      include_field_distinct_count: true
      include_field_min_value: true
      include_field_max_value: true
      include_field_mean_value: false
      include_field_median_value: false
      include_field_stddev_value: false
      include_field_histogram: false
    include_views: true
    options:
      connect_args:
        authentication: ActiveDirectoryPassword
    use_odbc: "True"
    uri_args:
      driver: "ODBC Driver 18 for SQL Server"
      Encrypt: "No"
      TrustServerCertificate: "No"
      ssl: "False"
  # Exclude specific tables
    stateful_ingestion:
      enabled: true  # Enable stateful ingestion
      remove_stale_metadata: true  # Automatically remove stale datasets

pipeline_name: "azure_sql_ingestion_pipeline"  # Consistent pipeline name

Here is the powerbi recipe file:

    type: powerbi
    config:
        tenant_id: XXX
        extract_endorsements_to_tags: false
        client_secret: YYY
        client_id: GGG
        stateful_ingestion:
            enabled: true
        admin_apis_only: true
        extract_lineage: true
        extract_column_level_lineage: true
        extract_dataset_schema: true
        extract_ownership: true
        extract_datasets_to_containers: true
        server_to_platform_instance:
            '<http://myserver-dev1-001.database.windows.net:1433|myserver-dev1-001.database.windows.net:1433>':
                platform_instance: azure_sql_instance
                env: PROD
pipeline_name: 'urn:li:dataHubIngestionSource:4f6d6ed4-6ea1-4822-9b37-7d627dd2932c'```

user-1 · November 4, 2024, 12:02am

To ensure that the lineage from Azure SQL to PowerBI is correctly captured, let’s review the configurations in both your Azure SQL and PowerBI ingestion recipes, focusing on the server_to_platform_instance mapping and other potential conflicts.

Azure SQL Ingestion Recipe

Host Port:
- Ensure that the host_port is correctly specified without the http:// prefix. It should be myserver-dev1-001.database.windows.net:1433.
ODBC Configuration:
- The use_odbc parameter should be a boolean, not a string. It should be true instead of "True".
URI Arguments:
- Ensure that the uri_args are correctly configured for your environment. The Encrypt and TrustServerCertificate settings should align with your security requirements.

PowerBI Ingestion Recipe

Server to Platform Instance Mapping:
- The server_to_platform_instance mapping should match the host_port used in the Azure SQL ingestion. Ensure that it is specified without the http:// prefix, like this:
```
server_to_platform_instance:
  'myserver-dev1-001.database.windows.net:1433':
    platform_instance: azure_sql_instance
    env: PROD
```
Admin APIs:
- The admin_apis_only parameter is set to true, which is fine for detailed metadata extraction. Ensure that your Azure AD app has the necessary permissions to access the PowerBI Admin API.
Stateful Ingestion:
- Ensure that stateful ingestion is enabled in both recipes to manage metadata changes effectively.

General Considerations

Consistency: Ensure that the platform_instance name (azure_sql_instance) is consistent across both recipes. This helps DataHub map the lineage correctly.
Environment: Verify that the env parameter is set correctly in both recipes. It should reflect the actual environment (e.g., PROD, DEV).
Logging and Debugging: Enable detailed logging during ingestion to capture any warnings or errors that might indicate why lineage is not being captured as expected.

Adjusted Configuration

Here’s how you might adjust the configurations:

Azure SQL Recipe:

user-1 · November 4, 2024, 12:02am

source:
  type: mssql-odbc
  config:
    host_port: "<http://myserver-dev1-001.database.windows.net:1433|myserver-dev1-001.database.windows.net:1433>"
    database: "custom-data"
    username: $HYPERPLANE__KEYVAULT_SECRET__QQID
    password: $HYPERPLANE__KEYVAULT_SECRET__QQPASS
    include_tables: true
    include_jobs: false
    profiling:
      enabled: true
      profile_table_level_only: false
      include_field_distinct_count: true
      include_field_min_value: true
      include_field_max_value: true
    include_views: true
    options:
      connect_args:
        authentication: ActiveDirectoryPassword
    use_odbc: true  # Corrected to boolean
    uri_args:
      driver: "ODBC Driver 18 for SQL Server"
      Encrypt: "No"
      TrustServerCertificate: "No"
  stateful_ingestion:
    enabled: true
    remove_stale_metadata: true
pipeline_name: "azure_sql_ingestion_pipeline"

PowerBI Recipe:

source:
  type: powerbi
  config:
    tenant_id: XXX
    client_secret: YYY
    client_id: GGG
    stateful_ingestion:
      enabled: true
    admin_apis_only: true
    extract_lineage: true
    extract_column_level_lineage: true
    extract_dataset_schema: true
    extract_ownership: true
    extract_datasets_to_containers: true
    server_to_platform_instance:
      '<http://myserver-dev1-001.database.windows.net:1433|myserver-dev1-001.database.windows.net:1433>':
        platform_instance: azure_sql_instance
        env: PROD
pipeline_name: 'urn:li:dataHubIngestionSource:4f6d6ed4-6ea1-4822-9b37-7d627dd2932c'

By ensuring consistency and correctness in these configurations, you should be able to capture the desired lineage from Azure SQL to PowerBI. If issues persist, reviewing logs and checking for any specific errors or warnings during ingestion can provide further insights.

Would you like more detailed steps or further assistance on any specific part of this process?

Sources:

https://datahubspace.slack.com/archives/CUMUWQU66/p1720578726.113079|https://datahubspace.slack.com/archives/CUMUWQU66/p1720578726.113079
https://datahubspace.slack.com/archives/CUMUWQU66/p1685508242.723709|https://datahubspace.slack.com/archives/CUMUWQU66/p1685508242.723709 0 button 0 button Hint: Mention @runllm in the thread for followups.

user-3 · November 4, 2024, 12:02am

<@U06TM7M40RK> About this suggestion:
General Considerations

Consistency: Ensure that the platform_instance name (azure_sql_instance) is consistent across both recipes. This helps DataHub map the lineage correctly.

I don’t have platform_instance in my azure sql ingestion, so how can I make sure its the same?

user-1 · November 4, 2024, 12:02am

To ensure consistency between your Azure SQL and PowerBI ingestion recipes, particularly regarding the platform_instance, you can follow these steps:

Add platform_instance to Azure SQL Ingestion:
- While the platform_instance is not mandatory for all ingestion sources, adding it can help ensure that DataHub correctly maps the lineage between Azure SQL and PowerBI. You can add a platform_instance parameter to your Azure SQL ingestion configuration.
Ensure Consistent Naming:
- Use the same platform_instance name in both the Azure SQL and PowerBI configurations. This consistency helps DataHub understand that the data from Azure SQL is the same as the data being used in PowerBI.

Here’s how you can modify your Azure SQL ingestion recipe to include the platform_instance:

Azure SQL Recipe with platform_instance:

source:
  type: mssql-odbc
  config:
    host_port: "myserver-dev1-001.database.windows.net:1433"
    database: "custom-data"
    username: $HYPERPLANE__KEYVAULT_SECRET__QQID
    password: $HYPERPLANE__KEYVAULT_SECRET__QQPASS
    include_tables: true
    include_jobs: false
    profiling:
      enabled: true
      profile_table_level_only: false
      include_field_distinct_count: true
      include_field_min_value: true
      include_field_max_value: true
    include_views: true
    options:
      connect_args:
        authentication: ActiveDirectoryPassword
    use_odbc: true
    uri_args:
      driver: "ODBC Driver 18 for SQL Server"
      Encrypt: "No"
      TrustServerCertificate: "No"
    platform_instance: azure_sql_instance  # Add this line for consistency
  stateful_ingestion:
    enabled: true
    remove_stale_metadata: true
pipeline_name: "azure_sql_ingestion_pipeline"

PowerBI Recipe:

source:
  type: powerbi
  config:
    tenant_id: XXX
    client_secret: YYY
    client_id: GGG
    stateful_ingestion:
      enabled: true
    admin_apis_only: true
    extract_lineage: true
    extract_column_level_lineage: true
    extract_dataset_schema: true
    extract_ownership: true
    extract_datasets_to_containers: true
    server_to_platform_instance:
      'myserver-dev1-001.database.windows.net:1433':
        platform_instance: azure_sql_instance  # Ensure this matches the Azure SQL recipe
        env: PROD
pipeline_name: 'urn:li:dataHubIngestionSource:4f6d6ed4-6ea1-4822-9b37-7d627dd2932c'

By adding the platform_instance to your Azure SQL ingestion and ensuring it matches the one in your PowerBI configuration, you help DataHub correctly map the lineage between these two data sources.

Would you like more detailed steps or further assistance on any specific part of this process?

Sources:

<https://datahubproject.io/docs/0.14.0/platform-instances|Working With Platform Instances>
<Slack Discussion on Platform Instances> 0 button 0 button Hint: Mention @runllm in the thread for followups.

user-3 · November 4, 2024, 12:02am

<@U06TM7M40RK> I added the platform instance to azure sql ingestion - still no column lineage and no lineage from azure sql to powerbi.

From the log I see this in results (is it ok that only 1 dashboard scanned?)

 'last_state_non_deletable_entities': [],
 'dashboards_scanned': 1,
 'charts_scanned': 0,
 'filtered_dashboards': [],
 'filtered_charts': [],
 'number_of_workspaces': 1,
 'start_time': '2024-11-02 01:26:09.794352 (16 seconds ago)',
 'running_time': '16 seconds',
 'failures': [],
 'warnings': [],
 'infos': []}
Sink (datahub-rest) report:
{'total_records_written': 332,
 'records_written_per_second': 20,
 'warnings': [],
 'failures': [],
 'start_time': '2024-11-02 01:26:09.435398 (16.36 seconds ago)',
 'current_time': '2024-11-02 01:26:25.796493 (now)',
 'total_duration_in_seconds': 16.36,
 'max_threads': 15,
 'gms_version': 'v0.14.1',
 'pending_requests': 0,
 'main_thread_blocking_timer': '1.857 seconds'}```
And I also see this in the log:
```[2024-11-02 01:26:16,436] INFO     {datahub.ingestion.source.powerbi.rest_api_wrapper.powerbi_api:325} - Unable to fetch dataset parameters for 946d58a0-39c5-4866-a9d4-ecdb337b0528: 'currentValue'
[2024-11-02 01:26:20,464] INFO     {datahub.ingestion.source.powerbi.rest_api_wrapper.powerbi_api:325} - Unable to fetch dataset parameters for 07426e41-bdf1-45a1-b2f7-1df406390808: 'currentValue'
[2024-11-02 01:26:21,914] INFO     {datahub.ingestion.source.powerbi.rest_api_wrapper.powerbi_api:415} - Skipping endorsements tag as extract_endorsements_to_tags is set to false 
[2024-11-02 01:26:23,343] INFO     {datahub.ingestion.source.powerbi.rest_api_wrapper.powerbi_api:175} - Skipping endorsements tags retrieval for report as extract_endorsements_to_tags is set to false
[2024-11-02 01:26:23,344] INFO     {datahub.ingestion.source.powerbi.rest_api_wrapper.powerbi_api:466} - Skipping tag retrieval for dashboard as extract_endorsements_to_tags is set to false
[2024-11-02 01:26:23,344] INFO     {datahub.ingestion.source.powerbi.rest_api_wrapper.powerbi_api:426} - Skipping independent datasets retrieval as extract_independent_datasets is set to false```

user-1 · November 4, 2024, 12:02am

Based on the logs and the configuration you’ve shared, there are a few areas to consider for troubleshooting the missing column lineage and the lineage from Azure SQL to PowerBI:

Dashboard and Dataset Scanning:
- The log indicates that only one dashboard was scanned. This might be expected if you have only one dashboard in the workspace or if the configuration limits the scope of ingestion. Ensure that your workspace_id_pattern in the PowerBI configuration is set to include all relevant workspaces.
Dataset Parameters:
- The log messages about being unable to fetch dataset parameters ('currentValue') suggest there might be issues with parameter resolution in PowerBI datasets. This can affect lineage extraction if parameters are used in queries that define lineage.
Endorsements and Tags:
- The log indicates that endorsements and tags are being skipped because extract_endorsements_to_tags is set to false. While this setting is not directly related to lineage, it might affect the completeness of metadata if endorsements are used to signify important datasets.
Independent Datasets:
- The log mentions skipping independent datasets retrieval because extract_independent_datasets is set to false. If your PowerBI reports rely on datasets that are not directly linked to dashboards, this setting might need to be adjusted.
Column Lineage:
- Ensure that extract_column_level_lineage is set to true in your PowerBI configuration, which you have already done. However, verify that the Azure SQL ingestion is also configured to support column-level metadata extraction.
Re-run Ingestion:
- After making any configuration changes, re-run both the Azure SQL and PowerBI ingestion processes to ensure that all metadata is up-to-date.

Suggested Actions:

Review Workspace Configuration:
- Ensure that all relevant workspaces are included in the workspace_id_pattern and that there are no filters inadvertently excluding dashboards or datasets.
Check Dataset Parameters:
- Investigate the datasets mentioned in the log to understand why parameters cannot be fetched. This might require checking the PowerBI dataset configurations.
Enable Independent Datasets:
- Consider setting extract_independent_datasets to true if your reports use datasets not directly linked to dashboards.
Verify Column Lineage Support:
- Double-check that both Azure SQL and PowerBI configurations are set to support column-level lineage and that there are no known issues with the versions you are using.
Enable Detailed Logging:
- Increase the logging level to capture more detailed information during ingestion, which might provide further insights into any issues.

user-1 · November 4, 2024, 12:02am

If these steps do not resolve the issue, it might be helpful to consult the <https://github.com/datahub-project/datahub/issues|DataHub GitHub issues> or <Slack discussions> for any known issues or additional troubleshooting tips.

Would you like more detailed steps or further assistance on any specific part of this process?

Sources:

<https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/powerbi|PowerBI Ingestion Documentation> 0 button 0 button Hint: Mention @runllm in the thread for followups.

datahub_team · November 4, 2024, 12:02am

Hi ! Did this interaction resolve your issue or do you still require assistance ?

Please tag @ RunLLM if you still need further assistance !

Or tag @Datahub Community Support if you require a person to take a look

Hope this helps ! ><

Topic		Replies	Views
Ensuring Proper Configuration for Column-Level Lineage in DataHub ingestion	4	60	January 20, 2025
Ingesting Power BI Metadata: Lineage Between Page and Report Issue ingestion	3	30	July 29, 2024
Troubleshooting PowerBI Admin API Permissions and Lineage Issues ingestion	15	8	March 31, 2025
Configuring Datahub Ingestion of dbt Artifacts from Azure DevOps to Databricks Cluster ingestion	12	58	April 7, 2025
Troubleshooting Lineage Extraction for SQL Queries in DataHub ingestion	8	125	July 8, 2024

Configuring Lineage from dbt to Azure SQL and PowerBIaucets

Related topics