Error in Datahub Ingestion Recipe Due to None Values in Fields

Original Slack Thread

I am running this locally on my laptop at the moment … This does run but it throws an error:

Obtaining venv creation lock...
Acquired venv creation lock
venv is already set up
venv setup time = 0 sec
This version of datahub supports report-to functionality
+ exec datahub ingest run -c /tmp/datahub/ingest/383a4ddd-78eb-4902-926d-ae026bdae430/recipe.yml --report-to /tmp/datahub/ingest/383a4ddd-78eb-4902-926d-ae026bdae430/ingestion_report.json
[2023-11-15 19:03:04,425] INFO     {datahub.cli.ingest_cli:147} - DataHub CLI version: 0.12.0
[2023-11-15 19:03:04,443] INFO     {} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://datahub-gms:8080
[2023-11-15 19:03:04,600] INFO     {} - Source configured successfully.
[2023-11-15 19:03:04,600] INFO     {datahub.cli.ingest_cli:128} - Starting metadata ingestion
[2023-11-15 19:05:10,134] INFO     {datahub.ingestion.reporting.file_reporter:52} - Wrote UNKNOWN report successfully to <_io.TextIOWrapper name='/tmp/datahub/ingest/383a4ddd-78eb-4902-926d-ae026bdae430/ingestion_report.json' mode='w' encoding='UTF-8'>
[2023-11-15 19:05:10,134] INFO     {datahub.cli.ingest_cli:133} - Source (starburst-trino-usage) report:
{'events_produced': 0,
 'events_produced_per_sec': 0,
 'entities': {},
 'aspects': {},
 'warnings': {},
 'failures': {},
 'start_time': '2023-11-15 19:03:04.600756 (2 minutes and 5.53 seconds ago)',
 'running_time': '2 minutes and 5.53 seconds'}
[2023-11-15 19:05:10,135] INFO     {datahub.cli.ingest_cli:136} - Sink (datahub-rest) report:
{'total_records_written': 0,
 'records_written_per_second': 0,
 'warnings': [],
 'failures': [],
 'start_time': '2023-11-15 19:03:04.441967 (2 minutes and 5.69 seconds ago)',
 'current_time': '2023-11-15 19:05:10.135108 (now)',
 'total_duration_in_seconds': 125.69,
 'gms_version': 'v0.12.0',
 'pending_requests': 0}
2 validation errors for TrinoJoinedAccessEvent
  none is not an allowed value (type=type_error.none.not_allowed)
  none is not an allowed value (type=type_error.none.not_allowed)

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Which DataHub version are you using? (e.g. 0.12.0)

  2. Please post any relevant error logs on the thread!

There are two keys, create_time and end_time, which cannot receive the value None, as specified in the ingestion recipe validator.

Hello <@U05AW4DVBAA> I do not know where these values are generated from … I see a lot of events being generated, and they do seem to have the fields and the values:

DEBUG {datahub.ingestion.source.usage.starburst_trino_usage:174} - event_dict: {'usr': '<|>', 'query': 'SELECT version()', 'catalog': 'bdc_redshift', 'schema': None, 'query_type': 'SELECT', 'accessed_metadata': '[]', 'create_time': datetime.datetime(2023, 11, 15, 0, 10, 47, 523000, tzinfo=zoneinfo.ZoneInfo(key='UTC')), 'end_time': datetime.datetime(2023, 11, 15, 0, 10, 47, 654000, tzinfo=zoneinfo.ZoneInfo(key='UTC'))}
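
Since most logged events look fine, one way to narrow things down is to scan the event dictionaries for the ones where create_time or end_time is actually None. The thread does not establish that the None values come from the events rather than the recipe, so treat this as a diagnostic sketch only. The helper names below are hypothetical and are not part of DataHub; the key names are taken from the event_dict shown above.

```python
from datetime import datetime, timezone

# Keys named in the thread as not accepting None.
REQUIRED_KEYS = ("create_time", "end_time")


def none_valued_keys(event_dict, required=REQUIRED_KEYS):
    """Return the required keys that are missing or set to None."""
    return [key for key in required if event_dict.get(key) is None]


def find_bad_events(events):
    """Return (index, offending_keys) for every event with a None in a required key."""
    bad = []
    for index, event in enumerate(events):
        offending = none_valued_keys(event)
        if offending:
            bad.append((index, offending))
    return bad


if __name__ == "__main__":
    ts = datetime(2023, 11, 15, 0, 10, 47, tzinfo=timezone.utc)
    sample = [
        {"query": "SELECT version()", "create_time": ts, "end_time": ts},
        {"query": "SELECT 1", "create_time": ts, "end_time": None},
    ]
    for index, keys in find_bad_events(sample):
        print(f"event {index} has None in: {keys}")
```

Feeding it the dictionaries from the DEBUG output (or the rows returned by the audit query) would show whether a single event, rather than the whole batch, is triggering the validation errors.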

These fields are part of the ingestion configuration.
They are validated here: […]ion/src/datahub/ingestion/source/usage/

As they do not have a default value, they are considered mandatory. That’s why your recipe is failing.
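
The "no default means mandatory" behaviour can be reproduced with a small pydantic model. This is a minimal sketch, assuming pydantic is installed (DataHub depends on it); AccessEvent and validation_error_count are hypothetical names for illustration and do not reproduce DataHub's actual TrinoJoinedAccessEvent definition.

```python
from datetime import datetime, timezone
from typing import Optional

from pydantic import BaseModel, ValidationError


class AccessEvent(BaseModel):
    # Hypothetical model for illustration, not DataHub's real class.
    create_time: datetime  # no default: required, and None is rejected
    end_time: datetime  # no default: required, and None is rejected
    schema_name: Optional[str] = None  # Optional with a default: None is accepted


def validation_error_count(payload: dict) -> int:
    """Return how many fields fail validation for the given payload."""
    try:
        AccessEvent(**payload)
    except ValidationError as exc:
        return len(exc.errors())
    return 0


if __name__ == "__main__":
    now = datetime(2023, 11, 15, tzinfo=timezone.utc)
    print(validation_error_count({"create_time": now, "end_time": now}))
    print(validation_error_count({"create_time": None, "end_time": None}))
```

Passing None for both datetime fields yields one error per field, which matches the shape of the "2 validation errors" message in the log above. The exact error text depends on the pydantic version.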

ok … I should be setting the start_time and end_time parameters in the config … I thought they had default values … So did not specify a value for the config

Hey <@U066AU9EE1F> - did that resolve the issue?

I couldn't get it working … Not sure where the issue is … But I can try later …