<@U06TM7M40RK> Oracle ingestion via an Airflow pipeline doesn't work for us. It requires the cx_Oracle package and the Oracle Instant Client, but due to an error extracting the Instant Client and setting the LD_LIBRARY_PATH env variable, ingestion via Airflow fails. Oracle ingestion via a shell script and a DataHub recipe works, though.
It seems likely that the issue you're encountering with Oracle ingestion via an Airflow pipeline is related to the configuration of the Oracle client libraries, specifically the `cx_Oracle` package and the Oracle Instant Client. The error you're facing with setting the `LD_LIBRARY_PATH` environment variable might be causing the ingestion process to fail in Airflow, even though it works with a shell script and DataHub recipe.
Based on the <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/oracle|DataHub Oracle Ingestion Documentation>, the Oracle ingestion source requires the correct drivers, such as the Oracle Instant Client, to be installed. The `cx_Oracle` package is used to connect to Oracle databases, and it relies on the Oracle Instant Client being correctly installed and configured.
Here are some steps you might consider to resolve the issue:
- Verify the Oracle Instant Client installation: Ensure that the Oracle Instant Client is correctly installed on the machine where Airflow is running. You can follow the <https://cx-oracle.readthedocs.io/en/latest/user_guide/installation.html|cx_Oracle installation guide> for detailed instructions.
- Set `LD_LIBRARY_PATH` correctly: Make sure the `LD_LIBRARY_PATH` environment variable includes the directory where the Oracle Instant Client libraries are located. Note that the dynamic loader reads `LD_LIBRARY_PATH` when a process starts, so it must be set in the environment of the Airflow worker/scheduler before it launches; exporting it from inside a running DAG task usually has no effect. Alternatively, register the directory system-wide with `ldconfig`.
- Check the Airflow environment: Ensure that the Airflow environment has access to the necessary environment variables and paths. You might need to configure the Airflow worker or scheduler service to include these settings.
- Test outside Airflow: Since the ingestion works via a shell script, you can try running the same script within an Airflow BashOperator to see if it works. This can help isolate whether the issue is with the Airflow Python environment or the overall system configuration.
- Use a virtual environment: Consider using a virtual environment for Airflow that includes the `cx_Oracle` package and the necessary Oracle client libraries. This can help manage dependencies more effectively.
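If Airflow runs in Docker, the environment variable is usually set at the service level rather than in a DAG. A hypothetical docker-compose override (the service name and path are assumptions for your setup):

```yaml
services:
  airflow-worker:
    environment:
      LD_LIBRARY_PATH: /opt/oracle/instantclient_21_9
```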
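The first two steps above can be sketched as a shell snippet. The Instant Client directory below is a hypothetical example path; adjust it to wherever you unzipped the client. Crucially, run this in the shell that starts the Airflow worker (e.g. its service unit or container entrypoint), not inside a task:

```shell
# Hypothetical install location; adjust to your actual unzip directory.
IC_DIR=/opt/oracle/instantclient_21_9

# Verify the Instant Client libraries are present (libclntsh is what cx_Oracle loads).
if ls "$IC_DIR"/libclntsh.so* >/dev/null 2>&1; then
  echo "Instant Client found in $IC_DIR"
else
  echo "libclntsh not found in $IC_DIR - re-extract the Instant Client zip" >&2
fi

# Prepend the directory to LD_LIBRARY_PATH, preserving any existing value.
export LD_LIBRARY_PATH="$IC_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "$LD_LIBRARY_PATH"
```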
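For the BashOperator test, one approach is to wrap the shell invocation that already works in a small script and call that script from the operator. The paths and recipe filename here are hypothetical; `datahub ingest -c` is the recipe-based CLI command:

```shell
#!/usr/bin/env bash
# run_oracle_ingest.sh - wrapper intended to be called from an Airflow BashOperator.
set -eu

# Hypothetical Instant Client location; must match your installation.
export LD_LIBRARY_PATH=/opt/oracle/instantclient_21_9

# Reuse the recipe that already works standalone (hypothetical path).
if command -v datahub >/dev/null 2>&1; then
  datahub ingest -c /path/to/oracle_recipe.yml
else
  echo "datahub CLI not found on PATH for this user" >&2
fi
```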
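For the virtual-environment route, a minimal sketch (the venv location is arbitrary, and the pip line assumes network access plus DataHub's Oracle plugin extra):

```shell
# Create an isolated environment for the ingestion dependencies.
python3 -m venv /tmp/airflow-oracle-venv

# Install the Oracle driver and DataHub's Oracle source into it
# (requires network access; run on the Airflow host):
# /tmp/airflow-oracle-venv/bin/pip install cx_Oracle 'acryl-datahub[oracle]'

/tmp/airflow-oracle-venv/bin/python --version
```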
Would you like more detailed instructions or examples for any specific part of this process?
Sources:
- <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/oracle|DataHub Oracle Ingestion Documentation>
Hint: Mention @runllm in the thread for followups.
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @DataHub Community Support if you require a person to take a look.
Hope this helps! ><