Troubleshooting Qlik Cloud Ingestion Configuration and Errors

Original Slack Thread

Hello!
Setting up the Qlik Cloud ingestion: it connects fine, but it's only extracting the spaces, not the dashboards etc. I have full permissions on the user the API token was generated for. Any info on whether this should work? I'm getting no errors. Looks like <@U01GZEETMEZ> and <@U056J673Y5N> made this in February?

Hey there! :wave:

<@U06TM7M40RK> Can you help with the questions in this thread?

I’m not able to find any data sources to answer your question.

Could you provide more details about your ingestion configuration or any specific logs you are seeing?

If this doesn’t help, please reach out to the DataHub support team for further assistance.

Also, adding
stateful_ingestion:
  enabled: true
causes this error:
[88c5b281-b09d-487e-8ee0-d3e12f7b797e logs] AttributeError: 'StatefulIngestionConfig' object has no attribute 'remove_stale_metadata'
[88c5b281-b09d-487e-8ee0-d3e12f7b797e logs] [2024-09-06 11:49:59,900] ERROR {datahub.ingestion.run.pipeline:738} - Pipeline Error: Ingestion pipeline raised an unexpected exception!Pipeline Error: Ingestion pipeline raised an unexpected exception!: 'StatefulIngestionConfig' object has no attribute 'remove_stale_metadata'

If I do:
stateful_ingestion:
  enabled: true
  remove_stale_metadata: true

I get the error:
[645532da-b68c-494b-be1c-a1c32dec19a9 logs] Failed to configure the source (qlik-sense): 1 validation error for QlikSourceConfig
[645532da-b68c-494b-be1c-a1c32dec19a9 logs] stateful_ingestion -> remove_stale_metadata
[645532da-b68c-494b-be1c-a1c32dec19a9 logs] extra fields not permitted (type=value_error.extra)

<@U06TM7M40RK> this is my ingestion config:
source:
  type: qlik-sense
  config:
    tenant_hostname: domain.eu.qlikcloud.com
    api_key: '${qlik_api_token}'
    ingest_owner: true
    space_pattern:
      deny:
        - '*Data'
    stateful_ingestion:
      enabled: false

To address the issues you’re facing with Qlik Cloud ingestion and stateful ingestion, let’s break down the configuration and errors:

  1. Extracting Only Spaces:

    • Ensure that your API key has the necessary permissions to access dashboards and other entities.
    • Verify that your space_pattern configuration is not inadvertently filtering out the dashboards. The deny patterns you have might be excluding some entities.
  2. Stateful Ingestion Configuration:

    • The error AttributeError: 'StatefulIngestionConfig' object has no attribute 'remove_stale_metadata' suggests that the remove_stale_metadata attribute is not recognized in your configuration.
    • The error extra fields not permitted (type=value_error.extra) indicates that remove_stale_metadata is not a valid field under stateful_ingestion for the Qlik Sense source.

According to the Qlik Sense ingestion documentation, the stateful_ingestion configuration for Qlik Sense does not include remove_stale_metadata. This attribute is not supported for this source.
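
On the space_pattern point above: DataHub allow/deny patterns are regular expressions rather than shell-style globs, so a value such as "*Data" is worth checking before it goes into a recipe. The following is a minimal sketch using only Python's re module. The helper name check_patterns is hypothetical, and the sketch only tests whether each pattern compiles; it does not reproduce how DataHub itself evaluates the patterns.

```python
import re


def check_patterns(patterns):
    """Map each candidate pattern to None (compiles) or to the regex error text."""
    results = {}
    for pattern in patterns:
        try:
            re.compile(pattern)
            results[pattern] = None
        except re.error as exc:
            results[pattern] = str(exc)
    return results


# Candidate deny patterns taken from the recipes in this thread, plus a regex variant.
report = check_patterns(["*Data", ".*Data", "Persontogtrafikk - Master"])
for pattern, error in report.items():
    status = "ok" if error is None else "invalid regex: " + error
    print(f"{pattern!r}: {status}")
```

If the intent is to skip every space whose name ends in "Data", ".*Data" is likely the regex form to try.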

Here is a corrected version of your ingestion configuration:

source:
  type: qlik-sense
  config:
    tenant_hostname: "domain.eu.qlikcloud.com"
    api_key: "${qlik_api_token}"
    ingest_owner: true
    space_pattern:
      deny:
        - "*Data"
        - "Persontogtrafikk - Master"
    stateful_ingestion:
      enabled: true  # Enable stateful ingestion
      # remove_stale_metadata is not supported for Qlik Sense
pipeline_name: "qlik_sense_ingestion_pipeline"  # Ensure you have a pipeline name for stateful ingestion
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"  # Replace with your DataHub instance URL
    token: "${datahub_token}"  # Replace with your DataHub API token

Key Points:

  • Stateful Ingestion: Ensure enabled is set to true and provide a pipeline_name.
  • Remove Stale Metadata: This attribute is not supported for Qlik Sense, so it should be omitted.
  • Permissions: Double-check that your API key has the necessary permissions to access all required entities.
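
The first two key points can be checked before a recipe is submitted. This is a rough sketch that assumes the recipe has already been loaded into a plain Python dict; the function name stateful_ingestion_problems is hypothetical and not part of DataHub.

```python
def stateful_ingestion_problems(recipe):
    """Return a list of human-readable problems found in a recipe dict."""
    problems = []
    config = recipe.get("source", {}).get("config", {})
    stateful = config.get("stateful_ingestion") or {}
    # Stateful ingestion needs a pipeline_name to key its stored state.
    if stateful.get("enabled") and not recipe.get("pipeline_name"):
        problems.append("stateful_ingestion is enabled but pipeline_name is missing")
    # Based only on the validation error reported earlier in this thread.
    if "remove_stale_metadata" in stateful:
        problems.append("remove_stale_metadata was rejected for qlik-sense in this thread")
    return problems


recipe = {
    "source": {
        "type": "qlik-sense",
        "config": {
            "tenant_hostname": "domain.eu.qlikcloud.com",
            "stateful_ingestion": {"enabled": True, "remove_stale_metadata": True},
        },
    },
}
problems = stateful_ingestion_problems(recipe)
print(problems)

fixed = dict(recipe, pipeline_name="qlik_sense_ingestion_pipeline")
fixed["source"] = {
    "type": "qlik-sense",
    "config": {"stateful_ingestion": {"enabled": True}},
}
print(stateful_ingestion_problems(fixed))
```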

Would you like more detailed steps or further assistance on any specific part of this process?

<@U06TM7M40RK> where do I get the datahub token I can use for the sink?

To generate a DataHub token for use in the sink configuration, you need to create a Personal Access Token (PAT) from the DataHub UI. Here are the steps to generate and use a PAT:

Steps to Generate a Personal Access Token:

  1. Navigate to Settings:

    • Log in to your DataHub instance.
    • Go to the Settings page.
  2. Access Tokens:

    • In the Settings page, find the Access Tokens section.
    • Click on Generate new token.
  3. Generate Token:

    • Fill out the required information, such as the token name and expiration date.
    • Click Create to generate the token.
    • Save the generated token securely, as you will need it for your sink configuration.

Example Configuration with the Token:

Once you have the token, you can use it in your ingestion configuration as follows:

source:
  type: qlik-sense
  config:
    tenant_hostname: "domain.eu.qlikcloud.com"
    api_key: "${qlik_api_token}"
    ingest_owner: true
    space_pattern:
      deny:
        - "*Data"
        - "Persontogtrafikk - Master"
    stateful_ingestion:
      enabled: true
pipeline_name: "qlik_sense_ingestion_pipeline"
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"  # Replace with your DataHub instance URL
    token: "<your-personal-access-token>"  # Replace with the generated token

Using the Token in HTTP Requests:

If you need to use the token in HTTP requests, include it in the Authorization header as follows:

curl 'http://localhost:8080/entities/urn:li:corpuser:datahub' -H 'Authorization: Bearer <your-personal-access-token>'
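
The same request can be built from Python. This is a minimal sketch using only the standard library; the helper name build_authorized_request is hypothetical, and the URL and token are the placeholders from the curl example above. The request is only constructed here, not sent.

```python
import urllib.request


def build_authorized_request(url, token):
    """Build a GET request that carries the token as a Bearer credential."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


request = build_authorized_request(
    "http://localhost:8080/entities/urn:li:corpuser:datahub",
    "<your-personal-access-token>",
)
print(request.get_method(), request.full_url)
# To actually send it (needs a reachable DataHub instance):
#   with urllib.request.urlopen(request) as response:
#       print(response.status)
```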

Would you like more detailed steps or assistance with a specific part of this process?

I used your suggestion to add a pipeline name and sink, but I still get the same error.

The recipe:

```
source:
    type: qlik-sense
    config:
        tenant_hostname: domain.eu.qlikcloud.com
        api_key: '${qlik_api_token}'
        ingest_owner: true
        stateful_ingestion:
            enabled: true
pipeline_name: qlik_sense_ingestion_pipeline
sink:
    type: datahub-rest
    config:
        server: 'http://datahub-gms:8080'
```
[ffc9c756-0941-432f-984a-73b8b6c654d1 logs]   File "/tmp/datahub/ingest/venv-qlik-sense-4bc28c54b897321b/lib/python3.10/site-packages/datahub/ingestion/source/state/stale_entity_removal_handler.py", line 120, in __init__
[ffc9c756-0941-432f-984a-73b8b6c654d1 logs]     and self.stateful_ingestion_config.remove_stale_metadata
[ffc9c756-0941-432f-984a-73b8b6c654d1 logs] AttributeError: 'StatefulIngestionConfig' object has no attribute 'remove_stale_metadata'
[ffc9c756-0941-432f-984a-73b8b6c654d1 logs] [2024-09-06 13:06:03,734] ERROR    {datahub.ingestion.run.pipeline:738} - Pipeline Error: Ingestion pipeline raised an unexpected exception!Pipeline Error: Ingestion pipeline raised an unexpected exception!: 'StatefulIngestionConfig' object has no attribute 'remove_stale_metadata'

<@U06TM7M40RK>

The error you’re encountering indicates that the remove_stale_metadata attribute is being referenced internally, even though it is not supported for the qlik-sense source. This suggests there might be an issue with the DataHub version or the specific implementation of stateful ingestion for Qlik Sense.
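
One plausible reading of the two errors together is that the config object handed to the stale-entity handler is a base class without the remove_stale_metadata field, while the handler expects a subclass that has it. The sketch below illustrates that pattern with stand-in classes. All class and function names are hypothetical and are not DataHub's actual code; only the shape of the failing expression is taken from the traceback.

```python
class BaseStatefulConfig:
    """Stand-in for a config model that only knows `enabled`."""

    def __init__(self, enabled=False):
        self.enabled = enabled


class StaleRemovalConfig(BaseStatefulConfig):
    """Stand-in for a config model that also carries remove_stale_metadata."""

    def __init__(self, enabled=False, remove_stale_metadata=True):
        super().__init__(enabled)
        self.remove_stale_metadata = remove_stale_metadata


def stale_removal_active(config):
    """Mimics the shape of the failing expression in the traceback."""
    return config.enabled and config.remove_stale_metadata


def try_config(config):
    """Return the result, or the AttributeError text if the field is missing."""
    try:
        return stale_removal_active(config)
    except AttributeError as exc:
        return f"AttributeError: {exc}"


print(try_config(BaseStatefulConfig(enabled=True)))   # field missing on the base class
print(try_config(StaleRemovalConfig(enabled=True)))   # field present on the subclass
print(try_config(BaseStatefulConfig(enabled=False)))  # short-circuits before the field
```

Under this reading, the error would only appear once enabled is true, and adding the field to the recipe would fail validation because the base model does not declare it.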

Steps to Troubleshoot:

  1. Check DataHub Version:

    • Ensure you are using a compatible version of DataHub that supports stateful ingestion for Qlik Sense. The latest versions often have bug fixes and improvements.
  2. Review Documentation:

    • Double-check the Qlik Sense ingestion documentation to ensure there are no additional configuration steps or known issues.
  3. Simplify Configuration:

    • Temporarily disable stateful ingestion to verify that the rest of the configuration works correctly.
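
For the version check in step 1, a small sketch that reads the installed package version with the standard library. It assumes the DataHub CLI was installed from the acryl-datahub distribution and that the script runs in the same virtual environment as the ingestion; the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("acryl-datahub")
print(found if found else "acryl-datahub is not installed in this environment")
```

Comparing this value against the release notes is one way to see whether a newer release is available before upgrading.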

Example Configuration Without Stateful Ingestion:

source:
  type: qlik-sense
  config:
    tenant_hostname: "domain.eu.qlikcloud.com"
    api_key: "${qlik_api_token}"
    ingest_owner: true
pipeline_name: "qlik_sense_ingestion_pipeline"
sink:
  type: "datahub-rest"
  config:
    server: "http://datahub-gms:8080"
    token: "<your-personal-access-token>"

If the Above Configuration Works:

If the simplified configuration works without errors, the issue is likely related to the stateful ingestion settings. In that case, consider the following:

  1. Update DataHub:

    • Update to the latest version of DataHub to ensure you have the latest fixes and features.
  2. Custom Stateful Ingestion Configuration:

    • If you still need stateful ingestion, you might need to customize the ingestion code or wait for an official fix if it’s a known issue.

Reporting the Issue:

If the problem persists, consider reporting it to the DataHub community or checking whether there are any open issues related to stateful ingestion for Qlik Sense.

Would you like more detailed steps on any of these troubleshooting methods?
