Metadata Ingestion Failures from Metabase Source

Original Slack Thread

datahub version : v0.12.0

datahub cli : 0.12.1.1

metabase version : v0.48.2

While ingesting metadata from metabase source we encountered these failures.

        "metabase-query": [
          "Unable to retrieve lineage from query. Query: select  port_date, sum(port_out_numbers) as port_out_numbers from {{#35-base-model-base-model-for-hypothesis-testing}} group by port_date order by port_date desc\n Reason: An Identifier is expected, got Token[value: {] instead. ",
          "Unable to retrieve lineage from query. Query: WITH port_data_ranked AS (\r\n  SELECT\r\n    port_date,\r\n    port_out_numbers,\r\n    ROW_NUMBER() OVER (ORDER BY port_date DESC) AS rnk\r\n  FROM {{#32}}\r\n),\r\nlatest_data AS (\r\n  SELECT\r\n    port_date,\r\n    port_out_numbers,\r\n    ROW_NUMBER() OVER (ORDER BY port_date DESC) AS rnk\r\n  FROM\r\n    port_data_ranked\r\n  WHERE\r\n    rnk = 1\r\n),\r\nprevious_data AS (\r\n  SELECT\r\n    port_date,\r\n    port_out_numbers\r\n  FROM\r\n    port_data_ranked\r\n  WHERE\r\n    rnk > 1\r\n),\r\nchange_per_cte AS (\r\n  SELECT\r\n    l.port_date,\r\n    l.port_out_numbers,\r\n    (l.port_out_numbers - AVG(p.port_out_numbers) OVER ()) / AVG(p.port_out_numbers) OVER () * 100 AS change_per\r\n  FROM\r\n    latest_data l\r\n    JOIN previous_data p ON l.rnk = 1\r\n)\r\nSELECT\r\n  MAX(port_date) AS latest_port_date,\r\n  MAX(port_out_numbers) AS latest_port_out_numbers,\r\n  MAX(change_per) AS change_per,\r\n  CASE WHEN ABS(MAX(change_per)) > {{threshold}} THEN 'SIGNIFICANT' ELSE 'NOT SIGNIFICANT' END AS change\r\nFROM\r\n  change_per_cte Reason: An Identifier is expected, got Token[value: {] instead. ",
          "Unable to retrieve lineage from query. Query: WITH PortNumbersWithAvg AS (\r\n  SELECT\r\n    circle_id,\r\n    operator_name,\r\n    port_date,\r\n    SUM(port_out_numbers) AS port_out_numbers,\r\n    AVG(SUM(port_out_numbers)) OVER (PARTITION BY circle_id, operator_name ORDER BY port_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS avg_port_out_numbers\r\n  FROM\r\n    {{#51}}\r\n  GROUP BY\r\n    circle_id,\r\n    operator_name,\r\n    port_date\r\n),\r\nLatestPortNumbers AS (\r\n  SELECT\r\n    circle_id,\r\n    operator_name,\r\n    port_date,\r\n    port_out_numbers,\r\n    LAG(port_out_numbers) OVER (PARTITION BY circle_id, operator_name ORDER BY port_date) AS prev_port_out_numbers,\r\n    ROW_NUMBER() OVER (PARTITION BY circle_id, operator_name ORDER BY port_date DESC) AS row_num\r\n  FROM\r\n    PortNumbersWithAvg\r\n)\r\nSELECT\r\n  circle_id,\r\n  operator_name,\r\n  port_date,\r\n  ROUND(((port_out_numbers - prev_port_out_numbers) / prev_port_out_numbers) * 100, 2) AS change_per,\r\n  CASE\r\n    WHEN ((port_out_numbers - prev_port_out_numbers) / prev_port_out_numbers) * 100 > 5 THEN 'SIGNIFICANT'\r\n    ELSE 'NOT SIGNIFICANT'\r\n  END AS change_significance\r\nFROM\r\n  LatestPortNumbers\r\nWHERE\r\n  row_num = 1 Reason: An Identifier is expected, got Token[value: {] instead. ",
          "Unable to retrieve lineage from query. Query: select operator_name, sum(port_out_numbers) from{{#35}} where circle_id = {{circle_id}} group by operator_name Reason: An Identifier is expected, got Token[value: {] instead. ",
          "Unable to retrieve lineage from query. Query: select circle_id, operator_name, port_date, sum(port_out_numbers) as port_out_numbers \nfrom{{#35-base-model-base-model-for-hypothesis-testing}} \ngroup by circle_id, operator_name , port_date\norder by port_date desc Reason: An Identifier is expected, got Token[value: {] instead. ",
          "Unable to retrieve lineage from query. Query: select circle_id, port_date, sum(port_out_numbers) as port_out_numbers from{{#35-base-model-base-model-for-hypothesis-testing}} group by circle_id, port_date order by circle_id,port_date desc Reason: An Identifier is expected, got Token[value: {] instead. ",
          "Unable to retrieve lineage from query. Query: WITH PortStats AS (\r\n  SELECT\r\n    circle_id,\r\n    port_date,\r\n    port_out_numbers,\r\n    ROW_NUMBER() OVER (PARTITION BY circle_id ORDER BY port_date DESC) as row_num\r\n  FROM\r\n    {{#45-h2-base-final-done}}\r\n)\r\n\r\nSELECT\r\n  circle_id,\r\n  MAX(CASE WHEN row_num = 1 THEN port_date END) as latest_port_date,\r\n  ROUND((MAX(CASE WHEN row_num = 1 THEN port_out_numbers END) - AVG(CASE WHEN row_num > 1 THEN port_out_numbers END)) / AVG(CASE WHEN row_num > 1 THEN port_out_numbers END) * 100, 2) as change_per,\r\n  CASE WHEN ROUND((MAX(CASE WHEN row_num = 1 THEN port_out_numbers END) - AVG(CASE WHEN row_num > 1 THEN port_out_numbers END)) / AVG(CASE WHEN row_num > 1 THEN port_out_numbers END) * 100, 2) > 5 THEN 'SIGNIFICANT' ELSE 'NOT SIGNIFICANT' END as change_significance\r\nFROM\r\n  PortStats\r\nGROUP BY\r\n  circle_id\r\n Reason: An Identifier is expected, got Token[value: {] instead. "
        ],
        "metabase-dashboard": [
          "Unable to retrieve dashboards. Reason: 404 Client Error: Not Found for url: <https://ip>:port/api/dashboard"
        ]
      },
      "start_time": "2024-02-14 21:19:50.688836 (2.36 seconds ago)",
      "running_time": "2.36 seconds"
    }
  },
  "sink": {
    "type": "datahub-rest",
    "report": {
      "total_records_written": 112,
      "records_written_per_second": 43,
      "warnings": [],
      "failures": [],
      "start_time": "2024-02-14 21:19:50.499048 (2.55 seconds ago)",
      "current_time": "2024-02-14 21:19:53.053049 (now)",
      "total_duration_in_seconds": 2.55,
      "gms_version": "v0.12.0",
      "pending_requests": 0
    }
  }
}```

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

<@U01GZEETMEZ> FYI on this error.

Please try running ingestion with CLI version v0.12.1.5 – it has improvements to the metabase connector