Understanding Ingestion Sources Interaction and DBT Source Integration

Original Slack Thread

Hi, I’d like to check my understanding of how multiple ingestion sources interact, and of the dbt source specifically.

As I understand it, the smallest atomic unit in DataHub is the aspect, so if two sources write the same aspect for the same URN, the most recent write wins.

If that is the case, I’m assuming it’s a bad idea to schedule two sources that ingest the same URN when they have aspects in common. The two sources would fight over the value of each shared aspect, rather than DataHub combining both versions in some way, e.g. a DatasetProperties that merges the DatasetProperties from both sources. Is this correct?
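To make the question concrete, here is a minimal sketch of the behaviour described above. It is a toy in-memory model, not DataHub code: the `AspectStore` class, the example URN, and the property values are all made up for illustration, and the only assumption it encodes is that a write replaces the whole aspect instead of merging fields.

```python
from dataclasses import dataclass, field


@dataclass
class AspectStore:
    """Toy model (hypothetical): one value per (urn, aspect name)."""

    aspects: dict = field(default_factory=dict)

    def upsert(self, urn: str, aspect_name: str, value: dict) -> None:
        # Assumption under discussion: the new value replaces the old one
        # wholesale; nothing from the previous version is merged in.
        self.aspects[(urn, aspect_name)] = dict(value)

    def get(self, urn: str, aspect_name: str):
        return self.aspects.get((urn, aspect_name))


# Hypothetical dataset URN shared by both ingestion runs.
urn = "urn:li:dataset:(urn:li:dataPlatform:athena,db.orders,PROD)"
store = AspectStore()

# First run (say, dbt) writes a description plus a custom property.
store.upsert(
    urn,
    "datasetProperties",
    {"description": "from dbt", "customProperties": {"team": "analytics"}},
)

# Second run (say, the platform source) writes the same aspect later.
store.upsert(urn, "datasetProperties", {"description": "from athena"})

# A different aspect on the same URN is untouched by the writes above.
store.upsert(urn, "schemaMetadata", {"fields": ["order_id", "amount"]})

result = store.get(urn, "datasetProperties")
```

In this model the custom property written by the first run is gone after the second run, while `schemaMetadata` is unaffected, which is the "fighting" scenario the question is about.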

In the case of dbt, I initially thought that once you set a target platform, I could run a second ingestion specific to that platform in parallel with the dbt one, so that one source would essentially enrich the data from the other. Now I’m thinking this is not how it’s supposed to be used, and that I should instead restrict the sources so they don’t operate on the same metadata.
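One way to reason about "restricting the sources" is to list which (URN, aspect) pairs each source would write and look for pairs with more than one writer. The sketch below is plain Python with made-up inputs; `find_contested`, the source names, and the URNs are hypothetical and do not come from the DataHub SDK.

```python
def find_contested(emitted: dict) -> dict:
    """Return the (urn, aspect) pairs written by more than one source.

    `emitted` maps a source name to the set of (urn, aspect_name)
    pairs that source writes. All names here are illustrative.
    """
    writers = {}
    for source, pairs in emitted.items():
        for pair in pairs:
            writers.setdefault(pair, set()).add(source)
    return {pair: srcs for pair, srcs in writers.items() if len(srcs) > 1}


# Hypothetical URN for a table that both sources know about.
orders = "urn:li:dataset:(urn:li:dataPlatform:athena,db.orders,PROD)"

emitted = {
    "dbt": {(orders, "datasetProperties"), (orders, "upstreamLineage")},
    "athena": {(orders, "datasetProperties"), (orders, "schemaMetadata")},
}

contested = find_contested(emitted)
```

Here only `datasetProperties` on the shared URN has two writers, so that is the one aspect where the run order would decide the final value.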

I found this thread on the topic, but it hasn’t been updated in a while: https://datahubspace.slack.com/archives/CUMUWQU66/p1625662683499300

<@U066YNLNRN3> We run dbt against Athena, and we do have a separate ingestion for each platform. The target platform setting alone wouldn’t be able to connect to a different platform (credentials, paths, etc.), would it?