"Ingesting BigTable Data into Datahub and Exploring Potential Implementation"

Original Slack Thread

Hello friends, I’m wondering if anyone has been ingesting BigTable data into DataHub? Searching GitHub issues, feature requests, and this Slack doesn’t turn up much.

Hi Aaron!

At this time there are no formal plans to introduce BigTable support, but we’d absolutely be open to it. Do you know if BigTable infers & stores the structure of its collections in an easily accessible place? That would make developing the connector a bit easier. We do support MongoDB (another NoSQL document store), and its connector may be a good reference example for anyone seeking to build this out.


Thanks, John! The Python API makes it really easy to iterate down to the column families for each table. Below that level of granularity it’s harder, because each row can have different values in the same column family.

```python
from google.cloud import bigtable

# admin=True is required for table/column-family metadata access.
# The project and instance ids here are placeholders.
client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance")

for tbl in instance.list_tables():
    column_families = tbl.list_column_families()
    for column_family_name, gc_rule in sorted(column_families.items()):
        print("Table:", tbl.table_id, "  Column Family:", column_family_name)
```

In practice, I think ingesting column families would be very valuable.