Solution for 'maximum recursion depth exceeded' bug in Azure AD group ingestion

Original Slack Thread

Hi, I hit a 'maximum recursion depth exceeded' error during Azure AD group ingestion. Does anyone have a solution? The error log is below. (DataHub version: 0.13.0)

```
Traceback (most recent call last):
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/datahub/entrypoints.py", line 188, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 454, in wrapper
    raise e
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 403, in wrapper
    res = func(*args, **kwargs)
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 201, in run
    ret = loop.run_until_complete(run_ingestion_and_check_upgrade())
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 185, in run_ingestion_and_check_upgrade
    ret = await ingestion_future
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 139, in run_pipeline_to_completion
    raise e
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 131, in run_pipeline_to_completion
    pipeline.run()
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 404, in run
    for wu in itertools.islice(
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 126, in auto_stale_entity_removal
    for wu in stream:
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 150, in auto_workunit_reporter
    for wu in stream:
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 206, in re_emit_browse_path_v2
    for wu in stream:
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 247, in auto_browse_path_v2
    for urn, batch in _batch_workunits_by_urn(stream):
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 385, in _batch_workunits_by_urn
    for wu in stream:
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 163, in auto_materialize_referenced_tags
    for wu in stream:
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 70, in auto_status_aspect
    for wu in stream:
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/datahub/ingestion/source/identity/azure_ad.py", line 363, in get_workunits_internal
    self._add_group_members_to_group_membership(
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/datahub/ingestion/source/identity/azure_ad.py", line 415, in _add_group_members_to_group_membership
    self._add_group_members_to_group_membership(
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/datahub/ingestion/source/identity/azure_ad.py", line 415, in _add_group_members_to_group_membership
    self._add_group_members_to_group_membership(
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/datahub/ingestion/source/identity/azure_ad.py", line 415, in _add_group_members_to_group_membership
    self._add_group_members_to_group_membership(
  [Previous line repeated 944 more times]
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/datahub/ingestion/source/identity/azure_ad.py", line 401, in _add_group_members_to_group_membership
    for azure_ad_group_members in self._get_azure_ad_group_members(azure_ad_group):
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/datahub/ingestion/source/identity/azure_ad.py", line 488, in _get_azure_ad_group_members
    yield from self._get_azure_ad_data(kind=kind)
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/datahub/ingestion/source/identity/azure_ad.py", line 497, in _get_azure_ad_data
    response = requests.get(url, headers=headers)
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/urllib3/connectionpool.py", line 537, in _make_request
    response = conn.getresponse()
  File "/tmp/datahub/ingest/venv-azure-ad-79357ad588bf7a10/lib/python3.10/site-packages/urllib3/connection.py", line 466, in getresponse
    httplib_response = super().getresponse()
  File "/usr/local/lib/python3.10/http/client.py", line 1375, in getresponse
    response.begin()
  File "/usr/local/lib/python3.10/http/client.py", line 337, in begin
    self.headers = self.msg = parse_headers(self.fp)
  File "/usr/local/lib/python3.10/http/client.py", line 236, in parse_headers
    return email.parser.Parser(_class=_class).parsestr(hstring)
  File "/usr/local/lib/python3.10/email/parser.py", line 67, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/usr/local/lib/python3.10/email/parser.py", line 56, in parse
    feedparser.feed(data)
  File "/usr/local/lib/python3.10/email/feedparser.py", line 176, in feed
    self._call_parse()
  File "/usr/local/lib/python3.10/email/feedparser.py", line 180, in _call_parse
    self._parse()
  File "/usr/local/lib/python3.10/email/feedparser.py", line 295, in _parsegen
    if self._cur.get_content_maintype() == 'message':
  File "/usr/local/lib/python3.10/email/message.py", line 594, in get_content_maintype
    ctype = self.get_content_type()
  File "/usr/local/lib/python3.10/email/message.py", line 578, in get_content_type
    value = self.get('content-type', missing)
  File "/usr/local/lib/python3.10/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/usr/local/lib/python3.10/email/_policybase.py", line 316, in header_fetch_parse
    return self._sanitize_header(name, value)
  File "/usr/local/lib/python3.10/email/_policybase.py", line 287, in _sanitize_header
    if _has_surrogates(value):
  File "/usr/local/lib/python3.10/email/utils.py", line 57, in _has_surrogates
    s.encode()
RecursionError: maximum recursion depth exceeded while calling a Python object
```
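
Reading the log: the frame at `azure_ad.py` line 415 repeats more than 900 times, so `_add_group_members_to_group_membership` keeps calling itself until Python's default recursion limit of 1000 is reached. One plausible cause, which the log alone does not confirm, is a membership cycle between nested groups (group A contains group B, and B contains A). The sketch below is not DataHub's code; it only illustrates how a visited set and an explicit stack keep such a traversal finite. `GROUP_MEMBERS` and `collect_users` are hypothetical names, and the dict stands in for the responses the Azure AD API would return.

```python
from typing import Dict, List, Set

# Hypothetical in-memory stand-in for Azure AD: group id -> direct members.
# A member that is also a key in this dict is treated as a nested group.
GROUP_MEMBERS: Dict[str, List[str]] = {
    "group-a": ["alice", "group-b"],
    "group-b": ["bob", "group-a"],  # cycle: group-b points back to group-a
}


def collect_users(group_id: str, members: Dict[str, List[str]]) -> Set[str]:
    """Return all users reachable from group_id, expanding each group once."""
    users: Set[str] = set()
    visited: Set[str] = set()
    stack = [group_id]  # explicit stack, so depth is not tied to the recursion limit
    while stack:
        current = stack.pop()
        if current in visited:
            continue  # already expanded; skipping it is what breaks the cycle
        visited.add(current)
        for member in members.get(current, []):
            if member in members:
                stack.append(member)  # nested group: expand it later
            else:
                users.add(member)
    return users


print(sorted(collect_users("group-a", GROUP_MEMBERS)))  # ['alice', 'bob']
```

Without the `visited` check, the same walk written recursively would bounce between `group-a` and `group-b` until it raised the `RecursionError` shown above.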