Debugging v.0.0.11 Actions Release Ingestion Failures

Original Slack Thread

Hi everyone! I’m having problems with v.0.0.11 release of actions. Ingestion immedietly fails without meaningful errors (all of them in thread), can somebody suggest the way to debug this? Using helm chart 0.3.1, infra components deployed separetly.

Log of ingestion task in ui

Execution finished with errors.
{'exec_id': '446a5a3a-66fd-4f72-ac48-aeec0596589e',
 'infos': ['2023-10-18 14:17:16.366391 INFO: Starting execution for task with name=RUN_INGEST',
           '2023-10-18 14:17:16.371356 INFO: Caught exception EXECUTING task_id=446a5a3a-66fd-4f72-ac48-aeec0596589e, name=RUN_INGEST, '
           'stacktrace=Traceback (most recent call last):\n'
           '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task\n'
           '    task_event_loop.run_until_complete(task_future)\n'
           '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
           '    return future.result()\n'
           '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 82, in execute\n'
           '    full_log_file = open(f"{self.config.log_dir}/ingestion-{exec_id}.txt", "w")\n'
           "FileNotFoundError: [Errno 2] No such file or directory: '/tmp/datahub/logs/ingestion-446a5a3a-66fd-4f72-ac48-aeec0596589e.txt'\n"],
 'errors': []}

~~~~ Ingestion Logs ~~~~```

In actions container
2023/10/18 13:48:49 Received 200 from <http://datahub-datahub-gms:8080/health>
touch: cannot touch ‘/tmp/datahub/logs/actions/actions.out’: No such file or directory
No user action configurations found. Not starting user actions.
[2023-10-18 13:48:50,136] INFO {datahub_actions.cli.actions:76} - DataHub Actions version: unavailable (installed editable via git)
[2023-10-18 13:48:50,301] WARNING {datahub_actions.cli.actions:103} - Skipping pipeline datahub_slack_action as it is not enabled
[2023-10-18 13:48:50,304] WARNING {datahub_actions.cli.actions:103} - Skipping pipeline datahub_teams_action as it is not enabled
[2023-10-18 13:48:50,305] INFO {datahub_actions.cli.actions:119} - Action Pipeline with name ‘ingestion_executor’ is now running.
[2023-10-18 14:17:16,324] DEBUG {acryl.executor.dispatcher.default_dispatcher:57} - Started thread <Thread(Thread-2 (dispatch_async), started 139864443963136)> for 446a5a3a-66fd-4f72-ac48-aeec0596589e
[2023-10-18 14:17:16,367] DEBUG {acryl.executor.execution.default_executor:120} - Task for 446a5a3a-66fd-4f72-ac48-aeec0596589e created
[2023-10-18 14:17:16,371] DEBUG {acryl.executor.execution.default_executor:136} - Cleaned up task for 446a5a3a-66fd-4f72-ac48-aeec0596589e

Can you try to kill the action pod and recreate it?

Killed pod with actions, sent ingestion request after new one boot up, same error message

I reinstalled this release multiple times, same error on each ingest (tried pg and ch)

To clarify it is deployment from scratch. Previously used up to 0.9.5 datahub in other company, no issues with ingestion

Oh, and gms logs

2023-10-18 14:43:13,877 [pool-11-thread-8] INFO c.l.m.filter.RestliLoggingFilter:55 - POST /aspects?action=ingestProposal - ingestProposal - 200 - 17ms
2023-10-18 14:43:13,886 [qtp2006034581-237] INFO c.l.m.r.entity.AspectResource:186 - INGEST PROPOSAL proposal: {aspectName=dataHubExecutionRequestResult, entityKeyAspect={contentType=application/json, value=ByteString(length=46,bytes=7b226964...3030227d)}, entityType=dataHubExecutionRequest, aspect={contentType=application/json, value=ByteString(length=1311,bytes=7b227374...2033307d)}, changeType=UPSERT}
2023-10-18 14:43:13,906 [pool-11-thread-9] INFO c.l.m.filter.RestliLoggingFilter:55 - POST /aspects?action=ingestProposal - ingestProposal - 200 - 20ms
2023-10-18 14:43:13,999 [I/O dispatcher 1] INFO c.l.m.s.e.update.BulkListener:47 - Successfully fed bulk request. Number of events: 7 Took time ms: -1```

Is this all the log which you posted above from the ingestion?

Well, technically it’s from third ingestion. Should I provide all three logs (ui, actions and gms) for same ingestion at once?

UI

Execution finished with errors.
{'exec_id': 'def02489-f341-445e-8ebb-84d1c90e0954',
 'infos': ['2023-10-18 16:14:18.096730 INFO: Starting execution for task with name=RUN_INGEST',
           '2023-10-18 16:14:18.100375 INFO: Caught exception EXECUTING task_id=def02489-f341-445e-8ebb-84d1c90e0954, name=RUN_INGEST, '
           'stacktrace=Traceback (most recent call last):\n'
           '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task\n'
           '    task_event_loop.run_until_complete(task_future)\n'
           '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
           '    return future.result()\n'
           '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 82, in execute\n'
           '    full_log_file = open(f"{self.config.log_dir}/ingestion-{exec_id}.txt", "w")\n'
           "FileNotFoundError: [Errno 2] No such file or directory: '/tmp/datahub/logs/ingestion-def02489-f341-445e-8ebb-84d1c90e0954.txt'\n"],
 'errors': []}

~~~~ Ingestion Logs ~~~~```

actions

touch: cannot touch '/tmp/datahub/logs/actions/actions.out': No such file or directory
No user action configurations found. Not starting user actions.
[2023-10-18 14:31:04,598] INFO {datahub_actions.cli.actions:76} - DataHub Actions version: unavailable (installed editable via git)
[2023-10-18 14:31:04,759] WARNING {datahub_actions.cli.actions:103} - Skipping pipeline datahub_slack_action as it is not enabled
[2023-10-18 14:31:04,763] WARNING {datahub_actions.cli.actions:103} - Skipping pipeline datahub_teams_action as it is not enabled
[2023-10-18 14:31:04,763] INFO {datahub_actions.cli.actions:119} - Action Pipeline with name 'ingestion_executor' is now running.
[2023-10-18 14:31:20,733] DEBUG {acryl.executor.dispatcher.default_dispatcher:57} - Started thread &lt;Thread(Thread-2 (dispatch_async), started 139744470099712)&gt; for e355134f-efc5-4464-be71-818a28ab61b7
[2023-10-18 14:31:20,753] DEBUG {acryl.executor.execution.default_executor:120} - Task for e355134f-efc5-4464-be71-818a28ab61b7 created
[2023-10-18 14:31:20,757] DEBUG {acryl.executor.execution.default_executor:136} - Cleaned up task for e355134f-efc5-4464-be71-818a28ab61b7
[2023-10-18 14:43:13,854] DEBUG {acryl.executor.dispatcher.default_dispatcher:57} - Started thread &lt;Thread(Thread-3 (dispatch_async), started 139744470099712)&gt; for 3c328afa-f694-46a2-8a61-e831af2be900
[2023-10-18 14:43:13,879] DEBUG {acryl.executor.execution.default_executor:120} - Task for 3c328afa-f694-46a2-8a61-e831af2be900 created
[2023-10-18 14:43:13,882] DEBUG {acryl.executor.execution.default_executor:136} - Cleaned up task for 3c328afa-f694-46a2-8a61-e831af2be900```

gms

2023-10-18 14:43:13,999 [I/O dispatcher 1] INFO c.l.m.s.e.update.BulkListener:47 - Successfully fed bulk request. Number of events: 7 Took time ms: -1
2023-10-18 16:14:11,329 [ForkJoinPool.commonPool-worker-3] ERROR c.l.m.r.c.MostPopularSource:91 - Failed to determine whether DataHub usage index exists
2023-10-18 16:14:18,079 [qtp2006034581-346] INFO c.l.m.r.entity.AspectResource:186 - INGEST PROPOSAL proposal: {aspectName=dataHubExecutionRequestResult, entityKeyAspect={contentType=application/json, value=ByteString(length=46,bytes=7b226964...3534227d)}, entityType=dataHubExecutionRequest, aspect={contentType=application/json, value=ByteString(length=51,bytes=7b227374...3037327d)}, changeType=UPSERT}
2023-10-18 16:14:18,094 [pool-11-thread-1] INFO c.l.m.filter.RestliLoggingFilter:55 - POST /aspects?action=ingestProposal - ingestProposal - 200 - 15ms
2023-10-18 16:14:18,103 [qtp2006034581-345] INFO c.l.m.r.entity.AspectResource:186 - INGEST PROPOSAL proposal: {aspectName=dataHubExecutionRequestResult, entityKeyAspect={contentType=application/json, value=ByteString(length=46,bytes=7b226964...3534227d)}, entityType=dataHubExecutionRequest, aspect={contentType=application/json, value=ByteString(length=1311,bytes=7b227374...2032397d)}, changeType=UPSERT}
2023-10-18 16:14:18,123 [pool-11-thread-2] INFO c.l.m.filter.RestliLoggingFilter:55 - POST /aspects?action=ingestProposal - ingestProposal - 200 - 20ms
2023-10-18 16:14:18,548 [I/O dispatcher 1] INFO c.l.m.s.e.update.BulkListener:47 - Successfully fed bulk request. Number of events: 7 Took time ms: -1```

Well, no answer for me sadly. Will try to use image of actions 0.0.12 and 0.0.13, hopefully it will fix the issue with execution or at least will provide meaningful logs

Normally the managed ingestion should work and have these directories, so I don’t know what is going on.

I’m not really familiar how it is work, so I need to ask internally if they have any idea.

I tried 0.0.12 image, exactly same problem. Maybe other component at fault here? Will post new thread in <#C029A3M079U|troubleshoot>