Error due to Extra Fields in YAML Configuration for DataHub Ingestion Pipeline

Original Slack Thread

What is the error?

+ exec datahub ingest run -c /tmp/datahub/ingest/ed6050df-f5ea-4331-bee0-c8b71d4b287c/recipe.yml --report-to /tmp/datahub/logs/ed6050df-f5ea-4331-bee0-c8b71d4b287c/artifacts/ingestion_report.json
[2025-01-18 04:43:47,905] DEBUG    {datahub.telemetry.telemetry:286} - Sending init telemetry
[2025-01-18 04:43:48,610] DEBUG    {datahub.telemetry.telemetry:318} - Sending telemetry for function-call datahub.cli.ingest_cli.run, status start
[2025-01-18 04:43:48,886] INFO     {datahub.cli.ingest_cli:149} - DataHub CLI version: 0.14.0.5
[2025-01-18 04:43:48,888] DEBUG    {datahub.telemetry.telemetry:318} - Sending telemetry for function-call datahub.cli.ingest_cli.run, status error
[2025-01-18 04:43:49,186] DEBUG    {datahub.entrypoints:215} - Error: 7 validation errors for PipelineConfig
source -> database
  extra fields not permitted (type=value_error.extra)
source -> host_port
  extra fields not permitted (type=value_error.extra)
source -> include_tables
  extra fields not permitted (type=value_error.extra)
source -> include_views
  extra fields not permitted (type=value_error.extra)
source -> password
  extra fields not permitted (type=value_error.extra)
source -> schema_pattern
  extra fields not permitted (type=value_error.extra)
source -> username
  extra fields not permitted (type=value_error.extra)
Traceback (most recent call last):
  File "/tmp/datahub/ingest/venv-hana-d77916d56bd699e1/lib/python3.10/site-packages/datahub/entrypoints.py", line 205, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
  File "/tmp/datahub/ingest/venv-hana-d77916d56bd699e1/lib/python3.10/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/tmp/datahub/ingest/venv-hana-d77916d56bd699e1/lib/python3.10/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
  File "/tmp/datahub/ingest/venv-hana-d77916d56bd699e1/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/tmp/datahub/ingest/venv-hana-d77916d56bd699e1/lib/python3.10/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/tmp/datahub/ingest/venv-hana-d77916d56bd699e1/lib/python3.10/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/tmp/datahub/ingest/venv-hana-d77916d56bd699e1/lib/python3.10/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/tmp/datahub/ingest/venv-hana-d77916d56bd699e1/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 462, in wrapper
    raise e
  File "/tmp/datahub/ingest/venv-hana-d77916d56bd699e1/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 411, in wrapper
    res = func(*args, **kwargs)
  File "/tmp/datahub/ingest/venv-hana-d77916d56bd699e1/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 206, in run
    ret = loop.run_until_complete(run_ingestion_and_check_upgrade())
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/tmp/datahub/ingest/venv-hana-d77916d56bd699e1/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 176, in run_ingestion_and_check_upgrade
    pipeline = Pipeline.create(
  File "/tmp/datahub/ingest/venv-hana-d77916d56bd699e1/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 418, in create
    config = PipelineConfig.from_dict(config_dict, raw_config)
  File "/tmp/datahub/ingest/venv-hana-d77916d56bd699e1/lib/python3.10/site-packages/datahub/ingestion/run/pipeline_config.py", line 108, in from_dict
    config = cls.parse_obj(resolved_dict)
  File "pydantic/main.py", line 532, in pydantic.main.BaseModel.parse_obj
  File "pydantic/main.py", line 347, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 7 validation errors for PipelineConfig
source -> database
  extra fields not permitted (type=value_error.extra)
source -> host_port
  extra fields not permitted (type=value_error.extra)
source -> include_tables
  extra fields not permitted (type=value_error.extra)
source -> include_views
  extra fields not permitted (type=value_error.extra)
source -> password
  extra fields not permitted (type=value_error.extra)
source -> schema_pattern
  extra fields not permitted (type=value_error.extra)
source -> username
  extra fields not permitted (type=value_error.extra)
[2025-01-18 04:43:49,188] DEBUG    {datahub.entrypoints:220} - DataHub CLI version: 0.14.0.5 at /tmp/datahub/ingest/venv-hana-d77916d56bd699e1/lib/python3.10/site-packages/datahub/__init__.py
[2025-01-18 04:43:49,188] DEBUG    {datahub.entrypoints:223} - Python version: 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0] at /tmp/datahub/ingest/venv-hana-d77916d56bd699e1/bin/python3 on Linux-5.10.220-209.869.amzn2.x86_64-x86_64-with-glibc2.35
In my YAML:

```yaml
source:
    type: hana
    config: null
    host_port: '34.87.167.111:30044'
    database: SAPABAP1
    username: VH_DATAWAREHOUSE
    password: Warehouse@2025
    include_tables: true
    include_views: true
    schema_pattern:
        allow:
            - BKPF
            - BSEG
```

Hey there! :wave:

<@U06TM7M40RK> Can you help with the questions in this thread?


The error means the recipe failed validation before ingestion started: PipelineConfig rejected seven keys as “extra fields not permitted”. The path in each message points to the cause. For example, source -> host_port means host_port was found directly under source, and the same applies to database, include_tables, include_views, password, schema_pattern, and username. At that level the recipe only expects keys such as type and config, so everything else is rejected.

These are standard options for SQL-based sources such as hana, so the field names themselves are not the problem; their placement is. Connector settings belong inside the config block under source. In this YAML, config is set to null and the settings sit next to it instead of beneath it.

To resolve this issue, you should:

  1. Verify the Source Configuration: Compare your keys against the SAP HANA source page in the DataHub documentation. The fields you are using are valid hana options; they only need to be nested correctly.

  2. Adjust the YAML Configuration: Replace config: null with a config: block and indent all seven fields beneath it, keeping type: hana directly under source.
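As a sketch of that fix, here is the recipe from the question with the same settings nested under config and config: null removed. It assumes the recipe runs through UI ingestion, as the /tmp/datahub/ingest paths in the log suggest, so no sink block is shown. HANA_PASSWORD is a hypothetical secret or environment-variable name used in place of the plain-text password. The schema_pattern values are kept as given, although BKPF and BSEG look like SAP table names rather than schema names; if tables are what you meant to filter, table_pattern may be the option you want (check the hana source docs before relying on this).

```yaml
source:
    type: hana
    # Connector options go inside config, not directly under source
    config:
        host_port: '34.87.167.111:30044'
        database: SAPABAP1
        username: VH_DATAWAREHOUSE
        # Hypothetical secret/env var name instead of a plain-text password
        password: "${HANA_PASSWORD}"
        include_tables: true
        include_views: true
        # Kept as in the question; allow entries are regex patterns matched against schema names
        schema_pattern:
            allow:
                - BKPF
                - BSEG
```

With this structure the PipelineConfig validation errors should no longer appear; any remaining problems (connectivity, permissions, filters matching nothing) would show up in the ingestion run itself.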

For the full list of options the hana source accepts, refer to the SAP HANA source page in the official DataHub documentation or to the connector’s source code, since available fields can change between versions (this run used DataHub CLI 0.14.0.5).

Would you like more detailed guidance on how to adjust your YAML configuration, or do you have access to the DataHub documentation for the hana source type?

