Issue with Okta connector in DataHub deployment upgrade

Original Slack Thread

Hello, I just upgraded my deployment from 0.9.6.1 to 0.10.5 and my Okta ingestor (running as a cron job) started failing with the following error:

```
[2023-08-14 09:09:31,659] INFO     {datahub.ingestion.run.pipeline:230} - Source configured successfully.
[2023-08-14 09:09:31,660] INFO     {datahub.cli.ingest_cli:129} - Starting metadata ingestion
[2023-08-14 09:09:31,661] ERROR    {datahub.ingestion.run.pipeline:418} - Caught error
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 367, in run
    for wu in itertools.islice(
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 119, in auto_stale_entity_removal
    for wu in stream:
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 143, in auto_workunit_reporter
    for wu in stream:
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 208, in auto_browse_path_v2
    for urn, batch in _batch_workunits_by_urn(stream):
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 346, in _batch_workunits_by_urn
    for wu in stream:
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 156, in auto_materialize_referenced_tags
    for wu in stream:
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 70, in auto_status_aspect
    for wu in stream:
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/identity/okta.py", line 376, in get_workunits_internal
    for user_count, datahub_corp_user_snapshot in enumerate(
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/identity/okta.py", line 609, in _map_okta_users
    for okta_user in okta_users:
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/identity/okta.py", line 514, in _get_okta_users
    users, resp, err = event_loop.run_until_complete(
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 625, in run_until_complete
    self._check_running()
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 584, in _check_running
    raise RuntimeError('This event loop is already running')
RuntimeError: This event loop is already running
```
Does anybody else experience a similar problem with the Okta connector? Does it run fine for other people?
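
For reference, Python's `asyncio` raises this exact `RuntimeError` whenever `run_until_complete()` is called on an event loop that is already running, which can happen when the surrounding framework is itself driving a loop. The snippet below is only an illustrative sketch of that failure mode and of one common workaround (running the coroutine on a fresh loop in a worker thread); it is not the connector's actual code or the linked fix, and `list_okta_users` is a made-up stand-in for the Okta SDK call.

```python
import asyncio
import concurrent.futures


async def list_okta_users():
    """Hypothetical stand-in for an async Okta SDK call (e.g. listing users)."""
    await asyncio.sleep(0)  # pretend an HTTP request happens here
    return ["alice", "bob"]


def run_coro(coro):
    """Run a coroutine to completion, whether or not a loop is already running.

    Calling loop.run_until_complete() from inside a running loop raises
    "RuntimeError: This event loop is already running", which is the error in
    the traceback above. If a loop is already running in this thread, hand the
    coroutine to a fresh loop in a worker thread instead.
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so it is safe to start one.
        return asyncio.run(coro)

    # A loop is already running here; run the coroutine on its own loop
    # in a separate thread to avoid re-entering the current loop.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


if __name__ == "__main__":
    print(run_coro(list_okta_users()))
```

With a pattern like this, the call completes both inside and outside an already-running loop, which is the situation the traceback above describes.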

Hmm that’s strange - <@U01GZEETMEZ> could you help here?

It’s already been addressed: https://github.com/datahub-project/datahub/pull/8637

Yup, thanks Piotr! This is an issue specific to the Okta connector, and it will be fixed in the next CLI release.

We found another issue related to that and have a fix for it as well; we’ll share more info in a PR later.

<@U01GZEETMEZ> <@U027B7R23J4>
I have prepared a PR fixing the code after the previous PR… it turned out the Okta ingestor would not finish correctly (in our case it runs for a very long time, so I hadn’t spotted it before).
https://github.com/datahub-project/datahub/pull/8675
Let me know what you think about it.

Thanks! I commented on the PR