Resolving Looker/LookML Ingestion with Shared View Files and Models

Original Slack Thread

Hey team. I have a question about Looker/LookML ingestion. I am using 0.14.1.
We have view files that are shared across model files, so the same view file can be used from two different models (let’s say model_1 and model_2). These models also use the same connection, and my view_naming_pattern is set to "{project}.{model}.view.{name}".
So I expect the view file to be ingested once for each model, giving the URN patterns {project}.{model_1}.view.{name} and {project}.{model_2}.view.{name}. However, I have noticed that the view files are ingested only once, so only {project}.{model_1}.view.{name} appears. I think this is because the models share the same connection: I do not see this when the models have different connections, in which case the views are ingested as expected. Do you have any tips or solutions for this?

Hey there! :wave:

<@U06TM7M40RK> Can you help with the questions in this thread?

It seems likely that the issue you’re experiencing with Looker/LookML ingestion, where view files are not being ingested multiple times for different models with the same connection, might be related to how DataHub handles view connections and naming patterns.

Based on the <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py|LookML source code>, there is logic that maps views to connections and models. If a view has already been discovered and emitted using a particular connection, it might not be re-emitted for another model with the same connection. This could explain why you’re seeing the view file ingested only once.
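To make the hypothesis above concrete, here is a toy Python sketch. It is not DataHub's code: the function `emitted_names`, the `dedupe_by_connection` flag, and the project, model, connection, and view names are all hypothetical, chosen only to show how de-duplicating on the connection instead of the model could produce the reported behaviour.

```python
# Toy model of the hypothesis above; NOT DataHub's actual implementation.
from typing import Iterable, List, Set, Tuple

# (model, connection, view) triples; every name here is hypothetical.
ViewRef = Tuple[str, str, str]


def emitted_names(refs: Iterable[ViewRef], dedupe_by_connection: bool) -> List[str]:
    """Return the view names that would be emitted under one dedup strategy."""
    seen: Set[Tuple[str, str]] = set()
    names: List[str] = []
    for model, connection, view in refs:
        # Key on the connection (suspected behaviour) or on the model (expected behaviour).
        key = (connection, view) if dedupe_by_connection else (model, view)
        if key in seen:
            continue  # already emitted under this key, so skip it
        seen.add(key)
        names.append(f"my_project.{model}.view.{view}")
    return names


# Two models that reuse the same view over the same connection.
shared_connection = [
    ("model_1", "warehouse", "orders"),
    ("model_2", "warehouse", "orders"),
]
# The same two models, each with its own connection.
separate_connections = [
    ("model_1", "warehouse_a", "orders"),
    ("model_2", "warehouse_b", "orders"),
]
```

In this toy model, `emitted_names(shared_connection, dedupe_by_connection=True)` yields only the model_1 name, while `separate_connections` yields both names, which matches what was reported in the thread. Whether the real source behaves this way would need to be confirmed against the linked code.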

Additionally, the <Looker | DataHub details> for LookML ingestion mention patterns for naming views, which you have set to "{project}.{model}.view.{name}". However, if the ingestion process is not distinguishing between models with the same connection, this pattern might not be applied as expected.

To address this, you might consider the following:

  1. Check Configuration: Ensure that your configuration explicitly allows for the differentiation of views by model, even when connections are the same. This might involve adjusting the view_naming_pattern or other related settings.

  2. Review Source Code: If you’re comfortable with Python, reviewing the LookML source code might provide insights into how views are processed and whether there are any conditions that prevent multiple ingestions for the same connection.

  3. Community Support: Consider reaching out to the DataHub community or checking if there are any recent updates or patches that address this specific issue.

Would you like more detailed guidance on any of these steps, or is there a specific aspect of the ingestion process you’d like to explore further?

<@U06TM7M40RK> Is there a pattern that you suggest I can use in view_naming_pattern to solve this?

It could be possible that adjusting the view_naming_pattern to include more distinguishing factors might help in differentiating the views even when they share the same connection. However, I’m not entirely certain if this will fully resolve the issue, as it might depend on how DataHub processes these patterns internally.

Based on the <Looker | DataHub configuration details>, the view_naming_pattern allows for the use of variables such as platform, env, project, model, name, file_path, and folder_path. Given your scenario, you might want to ensure that both the model and project are included in the pattern to differentiate views that are used across different models.

Here’s a suggestion for a pattern you could try:

view_naming_pattern: "{project}.{model}.view.{name}"

This pattern already includes the model, which should theoretically differentiate views across different models. However, if the issue persists, it might be worth exploring whether additional context, such as file_path or folder_path, could be included to further distinguish the views.
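As a rough illustration of how such a pattern expands, the sketch below fills the placeholders with Python's `str.format`. This is an assumption for illustration only: DataHub's own substitution logic may differ, and the `render` helper and all sample values (project, file path, and so on) are hypothetical.

```python
# Illustrative only: expands a naming pattern with str.format.
# DataHub's real substitution logic may differ; all values are hypothetical.

def render(pattern: str, **variables: str) -> str:
    """Fill the {placeholders} in a naming pattern; unused variables are ignored."""
    return pattern.format(**variables)


# Values that are identical for both models because the view file is shared.
common = {
    "platform": "looker",
    "env": "PROD",
    "project": "my_project",
    "name": "orders",
    "file_path": "views/orders.view.lkml",
    "folder_path": "views",
}

model_pattern = "{project}.{model}.view.{name}"
path_pattern = "{project}.view.{file_path}"

name_1 = render(model_pattern, model="model_1", **common)
name_2 = render(model_pattern, model="model_2", **common)

# A pattern built only from file_path cannot tell the two models apart,
# since both models include the same view file.
path_1 = render(path_pattern, model="model_1", **common)
path_2 = render(path_pattern, model="model_2", **common)

print(name_1, name_2, path_1 == path_2)
```

Under these assumptions, only {model} separates the two names. Adding file_path or folder_path would not help on its own, because the shared view file has the same path in both models.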

If this approach doesn’t resolve the issue, it might be beneficial to look into whether there are any specific configurations or updates in the DataHub ingestion process that address this scenario.

Would you like more detailed guidance on how to implement this pattern, or is there another aspect of the configuration you’d like to explore?

Sources:

  • <Looker | DataHub configuration details>

<@U07FV3UHE4D> Could you maybe help me with the above question? :slightly_smiling_face:

Hi :slight_smile:! Did this interaction resolve your issue, or do you still require assistance?

Please tag @RunLLM if you still need further assistance!

Or tag @Datahub Community Support if you need a person to take a look :slight_smile:

Hope this helps!