Hi everyone. My team recently upgraded from DataHub v0.10.4 to v0.12.1, and we have been liking the new features so far. However, we are having a bit of trouble with the browsePathsV2 aspects: they were not backfilled for most of our data sources. Only the datasets from one of our custom connectors had their browsePathsV2 aspects filled in. I’m not entirely sure why that’s the case, as all of the upgrade jobs finished successfully, but I figured they’d get filled in the next time our scheduled ingestion ran.
However, after some waiting and additional testing, it seems these aspects are still not getting filled in. The ingestion pipeline finishes successfully without any errors or warnings, but no browsePathsV2 aspects are emitted. I’ve made sure the CLI version matches as well (v0.12.1). Here is the summary output from an example mssql run:
```
{'cli_version': '0.12.1.0',
'cli_entry_location': '/usr/local/lib/python3.9/site-packages/datahub/__init__.py',
'py_version': '3.9.17 (main, Jun 13 2023, 16:05:09) \n[GCC 8.3.0]',
'py_exec_path': '/usr/local/bin/python',
'os_details': 'Linux-4.18.0-425.19.2.el8_7.x86_64-x86_64-with-glibc2.28',
'peak_memory_usage': '102.59 MB',
'mem_info': '102.59 MB',
'peak_disk_usage': '21.12 GB',
'disk_info': {'total': '321.97 GB', 'used': '21.12 GB', 'free': '300.85 GB'}}
Source (mssql) report:
{'events_produced': 276,
'events_produced_per_sec': 69,
'entities': {'container': ['<example container urns>',
'... sampled of 15 total elements'],
'dataset': ['<example dataset urns>',
'... sampled of 64 total elements']},
'aspects': {'container': {'containerProperties': 15, 'status': 15, 'dataPlatformInstance': 15, 'subTypes': 15, 'container': 14},
'dataset': {'container': 64, 'status': 64, 'datasetProperties': 64, 'schemaMetadata': 64, 'subTypes': 64, 'viewProperties': 10}},
'warnings': {},
'failures': {},
'soft_deleted_stale_entities': [],
'tables_scanned': 54,
'views_scanned': 10,
'entities_profiled': 0,
'filtered': [],
'start_time': '2024-02-05 16:38:07.937584 (4 seconds ago)',
'running_time': '4 seconds'}
Sink (datahub-kafka) report:
{'total_records_written': 276,
'records_written_per_second': 63,
'warnings': [],
'failures': [],
'start_time': '2024-02-05 16:38:07.560390 (4.38 seconds ago)',
'current_time': '2024-02-05 16:38:11.937851 (now)',
'total_duration_in_seconds': 4.38}```
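For anyone who wants to run the same check on their own report, here is a rough sketch of how I'm confirming that browsePathsV2 is missing. The counts are copied by hand from the `aspects` section of the source report above; the helper name `entity_types_missing_aspect` is just something I made up for illustration and is not part of the DataHub CLI or SDK.

```python
# Aspect counts copied by hand from the 'aspects' section of the mssql
# source report above (entity type -> aspect name -> number emitted).
aspects_by_entity = {
    "container": {
        "containerProperties": 15,
        "status": 15,
        "dataPlatformInstance": 15,
        "subTypes": 15,
        "container": 14,
    },
    "dataset": {
        "container": 64,
        "status": 64,
        "datasetProperties": 64,
        "schemaMetadata": 64,
        "subTypes": 64,
        "viewProperties": 10,
    },
}


def entity_types_missing_aspect(aspects, aspect_name):
    """Hypothetical helper: list entity types with zero emissions of aspect_name."""
    return sorted(
        entity_type
        for entity_type, counts in aspects.items()
        if counts.get(aspect_name, 0) == 0
    )


missing = entity_types_missing_aspect(aspects_by_entity, "browsePathsV2")
print(missing)  # ['container', 'dataset']
```

Both entity types come back as missing the aspect, which matches what I see in the UI: nothing in this run emitted browsePathsV2.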
I'm not entirely sure why this is the case and will keep updating this thread as I research and debug more, but I wanted to post in case anyone else has run into a similar issue. Is ingestion expected to create browsePathsV2 aspects for existing entities that don't have them yet? Is there anything I may be forgetting that would affect the behavior here? Appreciate any and all help. Thanks!