Hi all,
I have a question regarding the S3 Data Lake connector and the path_spec configuration.
For example, given the following file structure:
s3://bucket-name/some/dir/001_adv.csv
s3://bucket-name/some/dir/002_adv.csv
s3://bucket-name/some/dir/003_adv.csv
s3://bucket-name/some/dir/001_rsld.csv
s3://bucket-name/some/dir/002_rsld.csv
I’d like to ingest all files ending in _adv as one table and all files ending in _rsld as another table. I don’t mind writing a separate path_spec entry for each table, and I’m happy to provide the table names shown in DataHub manually if that is required or possible. However, I’m struggling to write a path_spec that gives this result: s3://bucket-name/some/dir/*_{table}.csv does not work.
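To make the intended grouping concrete, here is a small Python sketch that uses only the standard-library fnmatch module. It checks which keys each per-table glob would select. It does not use DataHub and is not how the S3 connector matches path_spec values. The table names adv and rsld and the helper group_by_table are hypothetical, chosen by hand for illustration.

```python
from fnmatch import fnmatchcase

# Object keys from the example bucket layout above.
KEYS = [
    "s3://bucket-name/some/dir/001_adv.csv",
    "s3://bucket-name/some/dir/002_adv.csv",
    "s3://bucket-name/some/dir/003_adv.csv",
    "s3://bucket-name/some/dir/001_rsld.csv",
    "s3://bucket-name/some/dir/002_rsld.csv",
]

# Hypothetical mapping: hand-picked table name -> glob selecting that table's files.
TABLE_PATTERNS = {
    "adv": "s3://bucket-name/some/dir/*_adv.csv",
    "rsld": "s3://bucket-name/some/dir/*_rsld.csv",
}


def group_by_table(keys, patterns):
    """Return {table_name: [keys matching that table's glob]}.

    fnmatchcase is used so matching is case-sensitive on every platform,
    like S3 object keys.
    """
    return {
        table: [key for key in keys if fnmatchcase(key, pattern)]
        for table, pattern in patterns.items()
    }


if __name__ == "__main__":
    for table, files in group_by_table(KEYS, TABLE_PATTERNS).items():
        print(table, len(files))  # prints "adv 3", then "rsld 2"
```

The plain globs split the files cleanly. Whether the connector accepts such a file-level pattern in place of {table}, and how it would then name the resulting dataset, is the open question.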
Is there any way to manually provide the table name (and ideally the browse path shown in DataHub) when ingesting groups of files with the S3 Data Lake connector? Or is there another way to achieve the desired result?
Changing the directory/file structure is unfortunately not an option.
I’m using DataHub 0.13.0.