Hi all, my team and I are currently exploring the possibility of integrating DataHub.
We use an S3 data lake as the source, with a large amount of JSON data partitioned by client and date. I want DataHub to show the folder structure rather than capture metadata by scanning all of the JSON data. How can we do that?
The problem is that DataHub scans all of the JSON data present rather than a sample, and scanning every JSON document takes a long time because the data is terabytes in size.
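To illustrate what I mean by "folder structure", here is a rough Python sketch that derives the folder hierarchy from object keys alone, without reading any file contents. The key names and the `client/date/file` layout are hypothetical stand-ins for our bucket, and `folder_structure` is just an illustrative helper, not a DataHub API.

```python
from typing import Iterable, List


def folder_structure(keys: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated folder prefixes implied by object keys."""
    folders = set()
    for key in keys:
        parts = key.split("/")[:-1]  # drop the file name, keep the folders
        for depth in range(1, len(parts) + 1):
            folders.add("/".join(parts[:depth]) + "/")
    return sorted(folders)


# Hypothetical keys, assuming a <client>/<date>/<file>.json layout
example_keys = [
    "client_a/2024-05-01/part-0001.json",
    "client_a/2024-05-02/part-0001.json",
    "client_b/2024-05-01/part-0001.json",
]

for folder in folder_structure(example_keys):
    print(folder)
```

The point is that only the key names are needed to get this view, so ideally the ingestion would not have to open the JSON documents at all.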
Ingestion method: CLI
DataHub version: 0.13.1.3
Data source(s): S3 data lake