Troubleshooting an ingestion issue for a dataset with tables and sharded tables in DataHub v0.13

Original Slack Thread

Hello everyone (DataHub v0.13). I’m ingesting a series of datasets from BigQuery; their names have the form published_something. I’m having trouble because one of my published_something datasets cannot be ingested: the ingestion starts but then keeps running indefinitely. Below is the recipe I use to ingest each published dataset; it works fine for every other dataset:

source:
  type: bigquery
  config:
    env: TEST
    include_table_lineage: true
    include_usage_statistics: true
    include_tables: true
    include_views: true
    include_schema_metadata: true
    profiling:
      enabled: true
      profile_table_level_only: true
    stateful_ingestion:
      enabled: true
    credential:
      project_id: extractor_project
      private_key: '-----BEGIN PRIVATE KEY-----\n<covered>\n-----END PRIVATE KEY-----\n'
      private_key_id: <covered>
      client_email: <covered>
      client_id: <covered>
    project_id: test-project
    dataset_pattern:
      allow:
        - test-project.published_dataset
      deny:
      ignoreCase: true
    table_pattern:
      allow:
        - '.*'
      deny:
      ignoreCase: true

Note that the dataset I can’t ingest contains tables and sharded tables, whereas all the other datasets contain only tables and views. Any idea what the problem is, or any suggestion for ingesting the tables and sharded tables in this dataset? I’ve tried ingesting a single table at a time, but that does not work either.
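
One way to narrow this down might be to rerun the same recipe with the optional, heavier features switched off, and see whether the run still hangs. The fragment below is only a diagnostic sketch, not a confirmed fix: it reuses keys that already appear in the recipe above and flips profiling, lineage, and usage statistics to false. Which of these (if any) is responsible for the hang is an assumption to be tested.

```yaml
# Diagnostic sketch: same source and dataset as the recipe above,
# with the optional features disabled to isolate the step that hangs.
source:
  type: bigquery
  config:
    env: TEST
    project_id: test-project
    include_table_lineage: false      # was true in the original recipe
    include_usage_statistics: false   # was true in the original recipe
    include_tables: true
    include_views: true
    profiling:
      enabled: false                  # was true in the original recipe
    credential:
      project_id: extractor_project
      private_key: '-----BEGIN PRIVATE KEY-----\n<covered>\n-----END PRIVATE KEY-----\n'
      private_key_id: <covered>
      client_email: <covered>
      client_id: <covered>
    dataset_pattern:
      allow:
        - test-project.published_dataset
```

If this reduced run completes, re-enabling the three settings one at a time should show which one stalls on the dataset that contains sharded tables.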

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)