Troubleshooting Teradata Source Plugin for Lineage Sync Issue

Original Slack Thread

Hi Community,

Currently we are able to use the Teradata source plugin to ingest metadata, but for some reason the lineage info is not getting synced, even though we have made sure that query logging is enabled and the user has access to DBQLogTbl. We are on version 0.12.0 and we don't see any errors in the logs.

Also, below are the versions of the packages that are installed:

teradatasql 20.0.0.1
teradatasqlalchemy 17.20.0.0
SQLAlchemy 1.4.50
sqllineage 1.3.8
sqlparse 0.4.4

Any suggestion on how we can proceed further will be helpful.

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Which DataHub version are you using? (e.g. 0.12.0)
  2. Please post any relevant error logs on the thread!

Please, can you run the ingestion in debug mode and share the logs with me?

Also, can you share your ingestion recipe?

Also make sure lineage extraction is enabled in your recipe:

```yaml
include_table_lineage: true
```

Hi Tamas, apologies for the delayed response, I was OOO for a few days. I ran the recipe in debug mode and couldn't find any error message, and I can't attach the logs here because they contain table details (I can verify and attach them if possible). Also, yes, I have enabled `include_table_lineage: true` in the recipe.

```yaml
source:
  type: teradata
  config:
    include_views: true
    env: PROD
    database: ${TD_DATABASE}
    host_port: ********
    username: ${DATAHUB_TD_USER}
    password: ${DATAHUB_TD_PASSWORD}
    include_table_lineage: true
    include_usage_statistics: true
    # profiling:
    #   enabled: true

pipeline_name: my-teradata-ingestion-pipeline

sink:
  type: "datahub-rest"
  config:
    server: "http://datahub-datahub-gms:8080"
    disable_ssl_verification: true
    token: ${METADATA_AUTHENTICATION_TOKEN_PROD}

transformers:
  - type: "simple_add_dataset_tags"
    config:
      semantics: PATCH
      tag_urns:
        - "urn:li:tag:Teradata"
```

Hi <@UV14447EU>, any suggestion would be helpful here. Also, I observed this warning message popping up in the debug logs, but I'm not sure how relevant it is:

```
WARNING {py.warnings:109} - /usr/local/lib/python3.9/dist-packages/datahub/ingestion/source/sql/teradata.py:162: RemovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to "sqlalchemy<2.0". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)
  for entry in engine.execute
```

these warnings should not affect ingestion I think

In debug mode, can you see some tables, databases, anything?

Yes, I can see some table and DB info.

ahh, cool, so only lineage is missing, right?

correct

Let me give you the query we run, so you can check if it returns anything:

```sql
SELECT
    s.QueryID as "query_id",
    UserName as "user",
    StartTime AT TIME ZONE 'GMT' as "timestamp",
    DefaultDatabase as default_database,
    s.SqlTextInfo as "query_text",
    s.SqlRowNo as "row_no"
FROM "DBC".DBQLogTbl as l
JOIN "DBC".DBQLSqlTbl as s on s.QueryID = l.QueryID
WHERE
    l.ErrorCode = 0
    AND l.statementtype not in (
        'Unrecognized type',
        'Create Database/User',
        'Help',
        'Modify Database',
        'Drop Table',
        'Show',
        'Not Applicable',
        'Grant',
        'Abort',
        'Database',
        'Flush Query Logging',
        'Null',
        'Begin/End DBQL',
        'Revoke'
    )
    and default_database not in ('DEMONOW_MONITOR')
ORDER BY "query_id", "row_no"
```

We further filter the result set based on the start/end time of the lineage window (usually the last 24 hours), and we also add a filter based on your table filter.
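To check manually whether the window your ingestion uses actually contains any rows, the time filtering described above can be sketched as follows. This is a minimal illustration: `build_lineage_query` is a hypothetical helper, and the `StartTime` predicate shape is an assumption, not DataHub's exact generated SQL.

```python
from datetime import datetime, timedelta

# Trimmed version of the lineage query shared in the thread.
BASE_QUERY = '''SELECT
    s.QueryID as "query_id",
    UserName as "user",
    StartTime AT TIME ZONE 'GMT' as "timestamp",
    DefaultDatabase as default_database,
    s.SqlTextInfo as "query_text",
    s.SqlRowNo as "row_no"
FROM "DBC".DBQLogTbl as l
JOIN "DBC".DBQLSqlTbl as s on s.QueryID = l.QueryID
WHERE l.ErrorCode = 0'''


def build_lineage_query(start: datetime, end: datetime) -> str:
    """Append a start/end time window to the base DBQL query.

    Hypothetical helper mimicking the extra filtering described above;
    the exact predicates DataHub generates may differ.
    """
    window = (
        f"\n    AND l.StartTime >= TIMESTAMP '{start:%Y-%m-%d %H:%M:%S}'"
        f"\n    AND l.StartTime < TIMESTAMP '{end:%Y-%m-%d %H:%M:%S}'"
    )
    return BASE_QUERY + window + '\nORDER BY "query_id", "row_no"'


# Example: the usual last-24-hours window.
end = datetime(2024, 1, 2)
query = build_lineage_query(end - timedelta(hours=24), end)
print(query)
```

Running the resulting query directly against Teradata (e.g. via the `teradatasql` driver) shows whether DBQL has any rows in the window the ingestion would scan.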

I think the generated query should show up in the logs if you run it in debug mode

Sure, just trying to understand where I need to run this query. Our setup is based on a Kubernetes cluster, where we are triggering the ingestion recipe.

Ah, got it. Let me connect with the appropriate team, who do have access to "DBQLogTbl", and see the output.

I verified the query output against the debug logs, and it seems like we are not getting those SQL queries in it. Any suggestion on how we can proceed further?
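Since the DBQL query apparently returns nothing for these users' statements, one thing worth checking on the Teradata side is whether SQL-text logging is actually active (plain `BEGIN QUERY LOGGING` without `WITH SQL` does not populate DBQLSqlTbl). A sketch for a DBA session, assuming a recent Teradata release (view and option names may vary by version, and enabling logging requires DBA-level rights):

```sql
-- Inspect the currently active query-logging rules.
SELECT * FROM DBC.DBQLRulesV;

-- Enable query logging including full SQL text (written to DBC.DBQLSqlTbl).
BEGIN QUERY LOGGING WITH SQL ON ALL;
```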