Resolving Lineage Parse Error Due to Control Character in JSON Data

Original Slack Thread

Hi, I’m facing a lineage parse error caused by a control character in the JSON data. The error comes from error class <class 'pydantic.error_wrappers.ValidationError'>: 1 validation error for UpstreamLineageEdge

```
  {
    "query_id": "xxxxxx",
    "query_text": "xxxxxxx",
    "start_time": "	2024-11-26 14:05:20.043 Z"
  }
]; pos=1022; lineno=5; colno=20)
```
This doesn't let me visualise the lineage for the metadata I'm ingesting. Can you help with how to resolve this error?
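For context, this failure mode can be reproduced with Python's standard `json` module: a raw (unescaped) tab inside a string value, as in the `start_time` field above, trips the strict decoder. This is a minimal repro, not the actual DataHub ingestion code:

```python
import json

# A raw tab character embedded in the "start_time" value,
# mirroring the payload from the error message above.
raw = '{"start_time": "\t2024-11-26 14:05:20.043 Z"}'

try:
    json.loads(raw)
except json.JSONDecodeError as e:
    # e.msg reads "Invalid control character ...", and e.lineno/e.colno
    # point at the offending character, matching the lineno/colno
    # reported in the lineage error.
    print(e.msg, e.lineno, e.colno)
```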

Hey there! :wave:

<@U06TM7M40RK> Can you help with the questions in this thread?

:information_source: Please continue to tag RunLLM (e.g <@U06TM7M40RK>) to have it continue generating responses!

The error you’re encountering, `Invalid control character`, typically arises when an unexpected character appears in your JSON data, causing the JSON decoder to fail. This is often due to improper formatting or encoding issues in the JSON string.

Here are some steps you can take to resolve this issue:

  1. Check for Control Characters: Review the JSON data you are trying to ingest, especially around the reported position (line 5, column 20). Look for any control characters (such as tabs or newlines) that are not properly escaped.
  2. Validate JSON Format: Use a JSON validator tool to check the format of your JSON data. This can help identify any syntax errors or invalid characters.
  3. Escape Special Characters: Ensure that any special characters in your JSON string are properly escaped. For example, represent tabs as \t and newlines as \n.
  4. Review Ingestion Configuration: Check your ingestion configuration to ensure that the data source is correctly set up to handle the JSON format. This includes verifying any transformations or preprocessing steps that might introduce control characters.
  5. Update DataHub Version: If you are using an older version of DataHub, consider updating to a newer version; there may be bug fixes or improvements related to JSON parsing in newer releases.
  6. Debugging: Enable debug logging to get more detailed information about where the error is occurring. This can help pinpoint the exact location and cause of the issue.
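Steps 1–3 above can be sketched as a small preprocessing helper, assuming you are able to sanitise the offending JSON before it reaches the parser (`sanitize_json` is a hypothetical name, not a DataHub API):

```python
import json
import re

# Matches unescaped ASCII control characters (0x00-0x1F).
_CONTROL_CHARS = re.compile(r"[\x00-\x1f]")

def sanitize_json(raw: str):
    """Parse JSON that may contain raw control characters.

    strict=False tells the stdlib decoder to tolerate control
    characters inside strings; if parsing still fails, strip
    them out entirely and retry.
    """
    try:
        return json.loads(raw, strict=False)
    except json.JSONDecodeError:
        return json.loads(_CONTROL_CHARS.sub("", raw))

raw = '{"start_time": "\t2024-11-26 14:05:20.043 Z"}'
print(sanitize_json(raw))
```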

Would you like more detailed steps on any of these points or assistance with a specific part of this process?

<@U06TM7M40RK> I need more steps on how the JSON can be sanitised

I’m not able to find any data sources to answer your question. You might want to provide details about the specific JSON structure or the context in which it’s being used. If this doesn’t help, please reach out to the DataHub support team for further assistance.

<@U06TM7M40RK> i need more assistance with the above mentioned process

I’m not able to find any data sources to answer your question. You might want to provide information about the tools or libraries you’re using to process the JSON data. If this doesn’t help, please reach out to the DataHub support team for further assistance.

<@U06TM7M40RK> how can i reach out to the support team

To reach out to the DataHub support team, you can join the DataHub Slack community, which is a great place to get real-time support and interact with other DataHub users and contributors. Here are the steps to join and engage with the community:

  1. Join the DataHub Slack Community:
  2. Post Your Question: Once you’re a member, you can post your question in relevant channels such as #troubleshoot or #office-hours. Make sure to include detailed information about your issue, such as error messages, logs, and any steps you’ve already taken to troubleshoot.
  3. Use Threads: When posting, use threads to keep conversations organized, especially if you’re including long messages or code snippets.
  4. Office Hours: DataHub has office hours where you can get live support from the Core DataHub Team. Check the #office-hours channel for more details on the schedule.
  5. Reach Out to Community Managers:
By following these steps, you should be able to get the support you need from the DataHub community.

Can i get some support on this issue

how can we ensure the json can be sanitised for presence of escape characters
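One way to guarantee proper escaping (a sketch, assuming the payload can at least be parsed leniently) is to round-trip it through `json.dumps`, which re-emits raw control characters as proper two-character escape sequences:

```python
import json

raw = '{"start_time": "\t2024-11-26 14:05:20.043 Z"}'

# Parse leniently, then re-serialise: the raw tab becomes the
# escaped sequence \t, which the strict default decoder accepts.
clean = json.dumps(json.loads(raw, strict=False))
json.loads(clean)  # no longer raises
print(clean)
```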

Hi :slight_smile:! Did this interaction resolve your issue, or do you still require assistance?

Please tag @RunLLM if you still need further assistance!

Or tag @Datahub Community Support if you require a person to take a look :slight_smile:

Hope this helps! ><

<@U07FV3UHE4D>

Thank you for letting us know you need a person to take a look at this thread.

Either a community member or one of our team members will try to assist when they have a moment; responses unfortunately might not be immediate. We hope this helps speed things along :slight_smile:

Are you getting this error during ingestion? What are you using to collect this upstream lineage? An out-of-the-box recipe or any sort of custom code?

Hey <@U04583XMT5Z>, thanks for responding. Yes, I am receiving this error during ingestion. I'm not using any custom code; the recipe is executed from the UI.

> what are you using to collect this upstream lineage?
Not sure if I understand this correctly, can you elaborate?

I was just confirming this was a supported DataHub recipe and not a custom process sending the lineage metadata in. Could you share the full recipe? I'm not sure what would be causing this without understanding more about the source system you're connecting to, etc.

The source system is Snowflake:

```yaml
    type: snowflake
    config:
        authentication_type: KEY_PAIR_AUTHENTICATOR
        account_id: <snowflake_account_id>
        include_table_lineage: true
        include_view_lineage: true
        include_tables: true
        include_views: true
        ignore_start_time_lineage: false  # Disable to limit data processed
        use_legacy_lineage_method: true   # Optional: Use legacy method for
        profiling:
            enabled: false
            profile_table_level_only: true
        stateful_ingestion:
            enabled: false
        username: <username>
        warehouse: <warehouse>
        role: <role>
        private_key: <private key>
        database_pattern:
            allow:
                - <STAGING_DB>
        schema_pattern:
            allow:
                - ADMIN
```