Troubleshooting Oracle Ingestion with DataHub Data Dictionary Mode

Original Slack Thread

Hey I m using datahub. I just discover that while I am doing ingestion from Oracle, the datahub app tries to SELECT tables in database instead of using catalog privileges and catalog tables. I encounter “the missing column info”. How can I fix it? I just need taking table and column name using all_tab_cols or other catalog privileges

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

<@U06TM7M40RK> Can you help with the questions in this thread?


:information_source: We’re experimenting with a new AI assistant for questions in the <#CUMUWQU66> channel to help you track down helpful resources more quickly. Keep in mind that responses may not always be spot-on; feel free to ask it follow-up questions and don’t forget to provide its responses a :thumbsup: or :thumbsdown: so other folks in the Community know if you’re still looking for help on the topic!

To help you more effectively, could you please provide the following information:

  1. Are you using the UI or CLI for ingestion?
  2. Which DataHub version are you using?
  3. What data source(s) are you integrating with DataHub? (e.g., Oracle)?

Based on your initial message, it seems like you are encountering issues with Oracle ingestion due to the requirement for DBA table access, despite setting the data_dictionary_mode to “ALL”. This setting should ideally use the ALL_* tables instead of the DBA_* tables.

According to the <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/docs/sources/oracle/oracle_pre.md|Oracle ingestion source documentation>, the data_dictionary_mode configuration option is indeed set to “ALL” by default. However, there might be specific configurations or permissions required for your Oracle user.

Please confirm the details above so I can provide more targeted assistance.

Sources:

<@U07674AQGT1>, please, can you elaborate a bit more on the issue you are facing?
The source queries system tables to get tables/columns etc…
For example getting column info it runs the following query:

                col.column_name,
                col.data_type,
                col.%(char_length_col)s,
                col.data_precision,
                col.data_scale,
                col.nullable,
                col.data_default,
                com.comments,
                col.virtual_column,
                %(identity_cols)s
            FROM dba_tab_cols%(dblink)s col
            LEFT JOIN dba_col_comments%(dblink)s com
            ON col.table_name = com.table_name
            AND col.column_name = com.column_name
            AND col.owner = com.owner
            WHERE col.table_name = CAST(:table_name AS VARCHAR2(128))
            AND col.hidden_column = 'NO'```

Is there any issue querying these tables?

Hi Tamas. Thanks for quick support. Our problem is that our DB-ingestion user does not have any SELECT permission for any table in the DB, so the ALL option and all_table and that query return null for us. We tested that if we give SELECT privilege to our user for all tables in the schema, ingestion works with colums successfully. But this is not the method we want to follow. We choose to use data_dictionary_mode: ‘DBA’ and without having any select permission for schema tables. We get in all dictionay priviledge in our dba team, however we cannot set data_dictionary_mode.

data_dictionary_mode
  extra fields not permitted (type=value_error.extra)```

source:
type: oracle
config:
host_port: ‘working_well’
service_name: working_well
schema_pattern:
allow:
- VD_COMMON_SERVICE
username: ‘${ORACLE_USERNAME}’
password: ‘${ORACLE_PASSWORD}’
include_tables: true
include_views: false
include_table_location_lineage: false
include_view_lineage: false
use_file_backed_cache: false
data_dictionary_mode: ‘DBA’
profiling:
enabled: false

Can’t you grant select permission on system tables?
Otherwise I don’t see how it should get the metadata

Yes we give select permission in DBA tables but in application before starting ingestion we cannot set the dictionary mode

Obtaining venv creation lock...
Acquired venv creation lock
venv is already set up
venv setup time = 0 sec
This version of datahub supports report-to functionality
+ exec datahub ingest run -c /tmp/datahub/ingest/985ce576-5331-4718-9a96-3836ae03108a/recipe.yml --report-to /tmp/datahub/ingest/985ce576-5331-4718-9a96-3836ae03108a/ingestion_report.json
[2024-07-24 10:38:21,591] INFO     {datahub.cli.ingest_cli:147} - DataHub CLI version: 0.12.1.5
[2024-07-24 10:38:21,644] INFO     {datahub.ingestion.run.pipeline:238} - Sink configured successfully. DataHubRestEmitter: configured to talk to <http://datahub-datahub-gms:8080>
Failed to configure the source (oracle): 1 validation error for OracleConfig
data_dictionary_mode
  extra fields not permitted (type=value_error.extra)```

Thats the error know I am taking

Is your datahub client’s version is over v0.13.0