Troubleshooting DataHub Ingestion from Metabase and BigQuery

Original Slack Thread

Hello community!

I recently started using DataHub (self-hosted) and created ingestions (via the UI) from Metabase and BigQuery. They mostly work, but I’d like to clarify a few points. I researched a lot but couldn’t find an answer, hence posting here. Please refer to the picture:

1. Metabase:
• It ingested my objects properly; however, everything sits under a folder called “Default”, which doesn’t exist in my Metabase. I expected DataHub to replicate the structure of my collections. Is that a fair expectation? If not, how can I manage this “Default” folder to get the structure I need? I couldn’t find anything in the UI.
2. BigQuery:
• Similarly, my data was ingested correctly; however, you can see that DataHub split BigQuery tables from the same project and the same dataset into 2 different folders. I don’t understand why, as I expected them all to be together.
Any advice?
Thank you.
[attachment]

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

<@U06TM7M40RK> Can you help with the questions in this thread?


:information_source: We’re experimenting with a new AI assistant for questions in the <#CUMUWQU66> channel to help you track down helpful resources more quickly. Keep in mind that responses may not always be spot-on; feel free to ask it follow-up questions and don’t forget to provide its responses a :thumbsup: or :thumbsdown: so other folks in the Community know if you’re still looking for help on the topic!

Hi there! Let’s address your questions one by one.

1. Metabase:
The issue with everything being placed under a “Default” folder in DataHub after ingestion from Metabase is a known behavior. This happens because the ingestion process defaults to a “Default” folder when it cannot determine a more appropriate folder structure.

Solution:

  • Update CLI Version: Ensure you are using a recent version of the DataHub CLI that supports the browsePathsV2 aspect. According to a Slack conversation, the browsePathsV2 aspect is supported from CLI version v0.10.5 onwards.
  • Ingestion Configuration: You might need to update your ingestion configuration so that it reflects the structure of your Metabase collections. If you have recently updated your CLI and still face issues, consider re-running the Metabase ingestion from a clean state.

For more details, you can refer to the DataHub documentation on Browse Paths V2: https://datahubproject.io/docs/browsev2/browse-paths-v2
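The version requirement mentioned above can be checked with a small script. This is a minimal sketch, not an official DataHub utility: the helper names `parse_version` and `supports_browse_paths_v2` are hypothetical, the v0.10.5 threshold is taken from this thread rather than verified independently, and it assumes the CLI was installed from the `acryl-datahub` Python package.

```python
import re
from importlib import metadata

# Threshold quoted in the thread (assumption: not independently verified).
MIN_BROWSE_V2_VERSION = (0, 10, 5)


def parse_version(text):
    """Extract the leading numeric components, e.g. 'v0.12.0rc1' -> (0, 12, 0)."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError(f"Unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def supports_browse_paths_v2(version_text):
    """Return True if the version is at or above the quoted threshold."""
    parts = parse_version(version_text)
    # Pad short versions such as '0.10' to three components before comparing.
    parts = parts + (0,) * (3 - len(parts))
    return parts >= MIN_BROWSE_V2_VERSION


if __name__ == "__main__":
    try:
        # Assumes the CLI comes from the 'acryl-datahub' package.
        installed = metadata.version("acryl-datahub")
    except metadata.PackageNotFoundError:
        print("acryl-datahub is not installed in this environment")
    else:
        print(installed, supports_browse_paths_v2(installed))
```

For a UI-based ingestion the relevant version is the one the ingestion executor runs, which may differ from a locally installed CLI, so treat a local check as a hint only.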

2. BigQuery:
The issue with tables from the same project and dataset being split into different folders might be due to the ingestion configuration or how the URNs are being generated.
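One way to investigate the URN hypothesis is to group the dataset URNs by the parts that would be expected to determine the folder. The following is only a diagnostic sketch: it assumes URNs of the form `urn:li:dataset:(urn:li:dataPlatform:bigquery,project.dataset.table,ENV)` with no platform instance prefix, the sample URNs are made up, and the helper names are hypothetical. It does not reproduce DataHub's actual browse path logic.

```python
import re
from collections import defaultdict

# Assumed URN layout: urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
URN_PATTERN = re.compile(
    r"^urn:li:dataset:\(urn:li:dataPlatform:([^,]+),(.+),([^,)]+)\)$"
)


def parse_dataset_urn(urn):
    """Split a dataset URN into (platform, name, env)."""
    match = URN_PATTERN.match(urn)
    if match is None:
        raise ValueError(f"Not a dataset URN: {urn!r}")
    return match.groups()


def group_by_folder_key(urns):
    """Group table names by (env, project, dataset), compared case-sensitively."""
    groups = defaultdict(list)
    for urn in urns:
        _platform, name, env = parse_dataset_urn(urn)
        parts = name.split(".", 2)
        if len(parts) != 3:
            raise ValueError(f"Expected project.dataset.table, got {name!r}")
        project, dataset, table = parts
        groups[(env, project, dataset)].append(table)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical URNs: same project and dataset, but different environments.
    sample = [
        "urn:li:dataset:(urn:li:dataPlatform:bigquery,my-project.sales.orders,PROD)",
        "urn:li:dataset:(urn:li:dataPlatform:bigquery,my-project.sales.customers,DEV)",
    ]
    for key, tables in group_by_folder_key(sample).items():
        print(key, tables)
```

If tables that belong together land under more than one key, the difference in that key (environment, project, or dataset spelling) is a candidate explanation for the split. If they all share one key, the cause is more likely elsewhere, for example in the browse paths themselves.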

Solution:

If you need more detailed steps or further assistance, please let me know!


I have the same issue as you. I don’t have an answer for it yet.