Previewing Projects and Datasets for BigQuery Recipe Ingestion

Original Slack Thread

Hi team, is there a way to preview the list of projects and datasets that will be ingested by a BigQuery recipe after tweaking the allow and deny fields?

To preview the list of projects and datasets that will be ingested by a BigQuery recipe after tweaking the allow and deny fields, you can use the --preview option of the datahub ingest command. This option performs all the ingestion steps but limits processing to a subset of the data, allowing you to quickly verify the configuration without ingesting all the data.

Here’s how you can do it using the CLI:

  1. Create or Update Your Recipe: Ensure your BigQuery recipe is configured with the appropriate project_id_pattern and dataset_pattern allow and deny fields.

  2. Run the Ingestion with Preview:
    Use the --preview option to perform a limited ingestion run. By default, this will process only the first 10 workunits.

    datahub ingest -c path/to/your/recipe.yaml --preview
    
  3. Specify the Number of Workunits (Optional):
    If you want to preview more than 10 workunits, you can use the --preview-workunits option to specify the number of workunits to process.

    datahub ingest -c path/to/your/recipe.yaml --preview --preview-workunits=20
    

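This lets you check which projects and datasets your allow and deny patterns pick up without ingesting all the data into DataHub. Because the preview stops after the configured number of workunits, it shows a sample rather than the complete list; raise --preview-workunits if the projects or datasets you expect do not appear.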
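Before running the preview, it can help to sanity-check the allow and deny regexes locally. The following is a minimal sketch, not DataHub's own code: the helper `is_allowed`, the project IDs, and the patterns are all hypothetical, and the sketch assumes that deny patterns take precedence, that regexes are matched from the start of the name, and that matching is case-insensitive by default. Verify those assumptions against the DataHub version you run.

```python
import re


def is_allowed(name, allow=(".*",), deny=(), ignore_case=True):
    """Return True if `name` matches some allow regex and no deny regex.

    Assumed semantics (check against your DataHub version): deny wins over
    allow, patterns are matched from the start of the string, and matching
    is case-insensitive unless ignore_case is False.
    """
    flags = re.IGNORECASE if ignore_case else 0
    if any(re.match(pattern, name, flags) for pattern in deny):
        return False
    return any(re.match(pattern, name, flags) for pattern in allow)


# Hypothetical project IDs and patterns, for illustration only.
projects = ["analytics-prod", "analytics-dev", "sandbox-alice", "billing-prod"]
allow = [r".*-prod$"]
deny = [r"^billing-.*"]

kept = [project for project in projects if is_allowed(project, allow, deny)]
print(kept)
```

The same helper can be pointed at dataset names to check a dataset_pattern. It only approximates the filtering, so the --preview run remains the real check.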