Hi, I’m using DataHub v12 and am having trouble getting my <https://datahubproject.io/docs/generated/ingestion/sources/file-based-lineage/|file-based lineage> ingestion jobs working. They fail with a 401 Unauthorized error when trying to reach GMS. Is there a way to pass credentials for the job to use? I haven’t been able to find any documentation showing how to do this when using a Kafka sink.
Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!
- Which DataHub version are you using? (e.g. 0.12.0)
- Please post any relevant error logs on the thread!
Recipe:
```
datahub_api:
  server: "http://datahub-datahub-gms:8080"
source:
  type: datahub-lineage-file
  config:
    file: 'http://file-host.svc.cluster.local/lineage.yaml'
    preserve_upstream: true
sink:
  type: datahub-kafka
  config:
    connection:
      bootstrap: ...
      schema_registry_url: ...
```
Logs:
```
[2024-02-09 15:01:10,513] ERROR {datahub.entrypoints:201} - Command failed: 401 Client Error: Unauthorized for url: http://datahub-datahub-gms:8080/entitiesV2/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Asnowflake%2Cdata-warehouse.fivetran.prod_realtime_scriptdash_aurora_public.deliveries%2CPROD%29?aspects=List(upstreamLineage)
Traceback (most recent call last):
  File "/datahub-ingestion/.local/lib/python3.10/site-packages/datahub/entrypoints.py", line 188, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/datahub-ingestion/.local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 454, in wrapper
    raise e
  File "/datahub-ingestion/.local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 403, in wrapper
    res = func(*args, **kwargs)
  File "/datahub-ingestion/.local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 201, in run
    ret = loop.run_until_complete(run_ingestion_and_check_upgrade())
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/datahub-ingestion/.local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 185, in run_ingestion_and_check_upgrade
    ret = await ingestion_future
  File "/datahub-ingestion/.local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 139, in run_pipeline_to_completion
    raise e
  File "/datahub-ingestion/.local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 131, in run_pipeline_to_completion
    pipeline.run()
  File "/datahub-ingestion/.local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 404, in run
    for wu in itertools.islice(
  File "/datahub-ingestion/.local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 150, in auto_workunit_reporter
    for wu in stream:
  File "/datahub-ingestion/.local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 70, in auto_status_aspect
    for wu in stream:
  File "/datahub-ingestion/.local/lib/python3.10/site-packages/datahub/ingestion/source/metadata/lineage.py", line 166, in get_workunits_internal
    mcp = _get_lineage_mcp(entity_node, self.config.preserve_upstream)
  File "/datahub-ingestion/.local/lib/python3.10/site-packages/datahub/ingestion/source/metadata/lineage.py", line 208, in _get_lineage_mcp
    old_upstream_lineage = get_aspects_for_entity(
  File "/datahub-ingestion/.local/lib/python3.10/site-packages/datahub/cli/cli_utils.py", line 528, in get_aspects_for_entity
    entity_response = get_entity(
  File "/datahub-ingestion/.local/lib/python3.10/site-packages/datahub/cli/cli_utils.py", line 449, in get_entity
    response.raise_for_status()
  File "/usr/local/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: http://datahub-datahub-gms:8080/entitiesV2/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Asnowflake%2Cdata-warehouse.fivetran.prod_realtime_scriptdash_aurora_public.deliveries%2CPROD%29?aspects=List(upstreamLineage)
```
I discovered that setting `preserve_upstream: false` in the recipe stops the requests to GMS, but I don’t want to hard-replace the upstream data for a given entity if I don’t have to.
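One thing that might be worth trying, as an untested sketch rather than a confirmed fix: the traceback shows the upstream lookup going through `datahub/cli/cli_utils.py`, so the request may be using the CLI's own connection settings rather than anything from the Kafka sink. The `token` key under `datahub_api` and the `DATAHUB_GMS_TOKEN` environment variable below are assumptions about what this version reads. The value would be a personal access token generated in DataHub.

```yaml
datahub_api:
  server: "http://datahub-datahub-gms:8080"
  # Assumption: a personal access token supplied through the job's environment,
  # expanded by the recipe loader. It is not confirmed that the lineage source
  # uses this value for the upstreamLineage lookup.
  token: "${DATAHUB_GMS_TOKEN}"
```

If the source ignores `datahub_api` for this lookup, exporting `DATAHUB_GMS_TOKEN` in the job's environment may still be picked up by the CLI helpers that make the request.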