Error in Datahub Ingestion Recipe Due to None Values in Fields

Original Slack Thread

I am running this locally on my laptop at the moment … This does run, but it throws an error:

```
Obtaining venv creation lock...
Acquired venv creation lock
venv is already set up
venv setup time = 0 sec
This version of datahub supports report-to functionality
+ exec datahub ingest run -c /tmp/datahub/ingest/383a4ddd-78eb-4902-926d-ae026bdae430/recipe.yml --report-to /tmp/datahub/ingest/383a4ddd-78eb-4902-926d-ae026bdae430/ingestion_report.json
[2023-11-15 19:03:04,425] INFO     {datahub.cli.ingest_cli:147} - DataHub CLI version: 0.12.0
[2023-11-15 19:03:04,443] INFO     {} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://datahub-gms:8080
[2023-11-15 19:03:04,600] INFO     {} - Source configured successfully.
[2023-11-15 19:03:04,600] INFO     {datahub.cli.ingest_cli:128} - Starting metadata ingestion
[2023-11-15 19:05:10,134] INFO     {datahub.ingestion.reporting.file_reporter:52} - Wrote UNKNOWN report successfully to <_io.TextIOWrapper name='/tmp/datahub/ingest/383a4ddd-78eb-4902-926d-ae026bdae430/ingestion_report.json' mode='w' encoding='UTF-8'>
[2023-11-15 19:05:10,134] INFO     {datahub.cli.ingest_cli:133} - Source (starburst-trino-usage) report:
{'events_produced': 0,
 'events_produced_per_sec': 0,
 'entities': {},
 'aspects': {},
 'warnings': {},
 'failures': {},
 'start_time': '2023-11-15 19:03:04.600756 (2 minutes and 5.53 seconds ago)',
 'running_time': '2 minutes and 5.53 seconds'}
[2023-11-15 19:05:10,135] INFO     {datahub.cli.ingest_cli:136} - Sink (datahub-rest) report:
{'total_records_written': 0,
 'records_written_per_second': 0,
 'warnings': [],
 'failures': [],
 'start_time': '2023-11-15 19:03:04.441967 (2 minutes and 5.69 seconds ago)',
 'current_time': '2023-11-15 19:05:10.135108 (now)',
 'total_duration_in_seconds': 125.69,
 'gms_version': 'v0.12.0',
 'pending_requests': 0}
2 validation errors for TrinoJoinedAccessEvent
  none is not an allowed value (type=type_error.none.not_allowed)
  none is not an allowed value (type=type_error.none.not_allowed)
```

There are two keys, create_time and end_time, that cannot receive the value None, as enforced by the ingestion recipe validator.
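The message format in the log (`2 validation errors for …`, `none is not an allowed value`) is what pydantic v1 reports when a non-optional field receives `None`; a field that is absent altogether is reported differently, as `field required`. The standard-library sketch below imitates that distinction so it can run without DataHub or pydantic installed. `collect_errors`, `validate_event` and the model name `AccessEventSketch` are hypothetical; the required keys are assumed from the `create_time`/`end_time` keys in this thread, and this is not DataHub's actual `TrinoJoinedAccessEvent` code.

```python
from typing import Dict, List, Sequence

# Keys that must be present and non-None (assumed from the thread).
REQUIRED_KEYS: Sequence[str] = ("create_time", "end_time")


def collect_errors(event: Dict[str, object],
                   required: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    """Return one pydantic-v1-style message per offending key."""
    errors: List[str] = []
    for key in required:
        if key not in event:
            # Key absent: pydantic v1 reports this as a missing field.
            errors.append(f"{key}\n  field required (type=value_error.missing)")
        elif event[key] is None:
            # Key present but None: the kind of error shown in the log.
            errors.append(
                f"{key}\n  none is not an allowed value "
                "(type=type_error.none.not_allowed)"
            )
    return errors


def validate_event(event: Dict[str, object],
                   model: str = "AccessEventSketch") -> None:
    """Raise ValueError listing every error at once, as pydantic does."""
    errors = collect_errors(event)
    if errors:
        noun = "error" if len(errors) == 1 else "errors"
        header = f"{len(errors)} validation {noun} for {model}"
        raise ValueError("\n".join([header] + errors))


if __name__ == "__main__":
    try:
        validate_event({"query": "SELECT version()",
                        "create_time": None, "end_time": None})
    except ValueError as exc:
        print(exc)
```

Collecting all errors before raising mirrors why the log reports two errors in a single message instead of stopping at the first one.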

Hello <@U05AW4DVBAA> I do not know where these values are generated from … I see a lot of events being generated, and they do seem to have the fields and the values:

DEBUG {datahub.ingestion.source.usage.starburst_trino_usage:174} - event_dict: {'usr': '<|>', 'query': 'SELECT version()', 'catalog': 'bdc_redshift', 'schema': None, 'query_type': 'SELECT', 'accessed_metadata': '[]', 'create_time': datetime.datetime(2023, 11, 15, 0, 10, 47, 523000, tzinfo=zoneinfo.ZoneInfo(key='UTC')), 'end_time': datetime.datetime(2023, 11, 15, 0, 10, 47, 654000, tzinfo=zoneinfo.ZoneInfo(key='UTC'))}
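Most events in the DEBUG output may look complete, yet a single event with `None` in a required key would be enough to fail validation. One way to check is to scan the event dicts for such values. `find_incomplete_events` is a hypothetical debugging helper, not part of DataHub; the first sample event copies the values from the DEBUG line above (using `timezone.utc` in place of `zoneinfo.ZoneInfo('UTC')`), and the second is an invented event that would fail.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Sequence, Tuple


def find_incomplete_events(
    events: Iterable[Dict[str, object]],
    required: Sequence[str] = ("create_time", "end_time"),
) -> List[Tuple[int, List[str]]]:
    """Return (position, offending keys) for each event with a missing or None key."""
    report: List[Tuple[int, List[str]]] = []
    for position, event in enumerate(events):
        bad = [key for key in required if event.get(key) is None]
        if bad:
            report.append((position, bad))
    return report


events = [
    # Values copied from the DEBUG line in the thread ('usr' left as logged).
    {
        "usr": "<|>",
        "query": "SELECT version()",
        "catalog": "bdc_redshift",
        "schema": None,  # not checked, because "schema" is not in `required`
        "query_type": "SELECT",
        "accessed_metadata": "[]",
        "create_time": datetime(2023, 11, 15, 0, 10, 47, 523000, tzinfo=timezone.utc),
        "end_time": datetime(2023, 11, 15, 0, 10, 47, 654000, tzinfo=timezone.utc),
    },
    # Invented example of an event that would fail validation.
    {"query": "SELECT 1", "create_time": None, "end_time": None},
]

for position, keys in find_incomplete_events(events):
    print(f"event #{position}: None or missing in {keys}")
```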

These fields are part of the ingestion configuration. They are validated here: […]ion/src/datahub/ingestion/source/usage/

As they do not have a default value, they are considered mandatory. That’s why your recipe is failing.
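The rule described here, that a field without a default must be supplied, is the same one Python's own `dataclasses` follow, so it can be demonstrated without DataHub. The two classes below are hypothetical and only illustrate the principle; they do not reproduce DataHub's config classes, and whether the real `start_time`/`end_time` options have defaults should be checked against the DataHub source.

```python
from dataclasses import MISSING, dataclass, fields
from datetime import datetime
from typing import List, Optional


@dataclass
class MandatoryWindow:
    # No defaults: omitting either argument raises TypeError.
    start_time: datetime
    end_time: datetime


@dataclass
class DefaultedWindow:
    # Defaults supplied: both arguments may be omitted.
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None


def required_fields(cls: type) -> List[str]:
    """List dataclass fields that have neither a default nor a default_factory."""
    return [
        f.name
        for f in fields(cls)
        if f.default is MISSING and f.default_factory is MISSING
    ]


print(required_fields(MandatoryWindow))  # ['start_time', 'end_time']
print(required_fields(DefaultedWindow))  # []

DefaultedWindow()  # fine: falls back to the defaults
try:
    MandatoryWindow()  # type: ignore[call-arg]
except TypeError as exc:
    print(exc)  # names the missing required arguments
```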

OK … So I should be setting the start_time and end_time parameters in the config … I thought they had default values, so I did not specify them in the config.
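If the fix is to set `start_time` and `end_time` explicitly, the values still have to come from somewhere. The sketch below computes a whole-day UTC window and formats it as ISO 8601 strings. The helper `usage_window`, the choice of ISO 8601, and where the printed lines go in the recipe are assumptions; the accepted format should be confirmed in the documentation for the `starburst-trino-usage` source.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def usage_window(days: int = 1, now: Optional[datetime] = None) -> Dict[str, str]:
    """Return start_time/end_time covering the `days` full UTC days before `now`."""
    if days <= 0:
        raise ValueError("days must be positive")
    # Use the current time unless a timezone-aware reference time is given.
    reference = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    # End at midnight UTC so the window covers whole days only.
    end = reference.replace(hour=0, minute=0, second=0, microsecond=0)
    start = end - timedelta(days=days)
    return {"start_time": start.isoformat(), "end_time": end.isoformat()}


window = usage_window(days=1, now=datetime(2023, 11, 15, 19, 3, tzinfo=timezone.utc))
for key, value in window.items():
    # Printed as YAML-style lines; their placement in the recipe is an assumption.
    print(f"{key}: '{value}'")
```

With the run time from the log (2023-11-15 19:03 UTC) as the reference, this yields a window from 2023-11-14T00:00:00+00:00 to 2023-11-15T00:00:00+00:00. Aligning to midnight keeps consecutive daily runs from overlapping or leaving gaps.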

Hey <@U066AU9EE1F> - did that resolve the issue?

I couldn't get it working … Not sure where the issue is … but I can try again later …