Hey there! 
<@U06TM7M40RK> Can you help with the questions in this thread?
Please continue to tag RunLLM (e.g., <@U06TM7M40RK>) to have it continue generating responses!
To address the issue you’re facing with DataHub ingestion from an MS SQL server, let’s break down the steps and potential solutions:
- Understanding the Environment:
  - The DataHub CLI and the ingestion process can run in different environments. The CLI runs on your local machine, while the ingestion process might be executed within a Docker container or another environment, depending on your setup.
- Python Environment:
  - You mentioned using a virtual environment on your laptop. This is a good practice to isolate dependencies. However, the error you encountered suggests that the required packages for MS SQL ingestion might not be correctly installed in the environment where the ingestion is executed.
- Installing Required Packages:
  - Ensure that the `acryl-datahub[mssql]` package is installed in the correct environment. If you’re running the ingestion from the CLI on your local machine, make sure the virtual environment is activated when you install the package:
        source env/bin/activate
        pip install 'acryl-datahub[mssql]'
- Docker Environment:
  - If the ingestion is running inside a Docker container, you might need to customize the Docker image to include the necessary dependencies. According to a Slack discussion, you can create a custom Docker image with the required packages:
        FROM linkedin/datahub-ingestion:60c17a2
        RUN apt-get update && apt-get install -y unixodbc-dev
        RUN pip install pyodbc
        RUN pip install 'acryl-datahub[mssql]'
- SQLAlchemy Error:
  - The error related to SQLAlchemy might be due to a version mismatch or missing dependencies. Ensure that SQLAlchemy is installed and compatible with the other packages in your environment.
- Logs and Debugging:
  - Review the logs for specific error messages that can provide more context. You can access the logs by clicking on the Details button of the failed ingestion run in the UI.
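The environment checks in the steps above (which interpreter is running, whether a virtual environment is active, and whether the MS SQL dependencies are installed) can be sketched in Python as follows. This is a minimal sketch and not part of DataHub: the helper names `in_virtualenv` and `package_report` are hypothetical, and the list of distributions to check is an assumption based on the packages named above. Run it with the same interpreter that executes the ingestion.

```python
import sys
from importlib import metadata

# Assumption: these distribution names come from the packages mentioned above;
# adjust the list to match your own setup.
PACKAGES = ["acryl-datahub", "SQLAlchemy", "pyodbc"]


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a virtual environment."""
    return sys.prefix != sys.base_prefix


def package_report(names):
    """Map each distribution name to its installed version, or None if missing."""
    report = {}
    for name in names:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("virtualenv active:", in_virtualenv())
    for name, version in package_report(PACKAGES).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

A `None` entry suggests the package is missing from that environment, which would be consistent with the install step having run in a different environment from the one that executes the ingestion.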
Would you like more detailed steps on any of these points?
Sources:
- <Slack Discussion on MSSQL Ingestion>
Hint: Mention @runllm in the thread for followups.
I decided to try CLI ingestion instead. It seems to partially work, but I get a giant number of errors. Here is the tail of the console output:
```
'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
'message': 'com.datahub.util.exception.RetryLimitReached: Failed to add after 3 retries',
'status': 500,
'urn': 'urn:li:container:ca29412f291726fbe3654fd5a19a170d',
'workunit_id': 'urn:li:container:ca29412f291726fbe3654fd5a19a170d-dataPlatformInstance'}},
{'error': 'Unable to emit metadata to DataHub GMS: com.datahub.util.exception.RetryLimitReached: Failed to add after 3 retries',
'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
'message': 'com.datahub.util.exception.RetryLimitReached: Failed to add after 3 retries',
'status': 500,
'urn': 'urn:li:container:ca29412f291726fbe3654fd5a19a170d',
'workunit_id': 'urn:li:container:ca29412f291726fbe3654fd5a19a170d-dataPlatformInstance'}},
'... sampled of 100 total elements'],
'start_time': '2025-02-04 15:59:17.670023 (6 minutes and 34.68 seconds ago)',
'current_time': '2025-02-04 16:05:52.344306 (now)',
'total_duration_in_seconds': 394.67,
'mode': 'ASYNC_BATCH',
'max_threads': 15,
'gms_version': 'v1.0.0rc1',
'pending_requests': 0,
'async_batches_prepared': 44,
'async_batches_split': 0,
'main_thread_blocking_timer': '2.398 seconds'}
Pipeline finished with at least 100 failures; produced 4172 events in 6 minutes and 33.15 seconds.
```
Does anyone have ideas on what the issue might be? My CPU was pegged at 100%, so I’m not sure whether my machine is just too busy to handle all the requests. Is there a way to slow things down?
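One possible way to slow things down is to lower the concurrency of the `datahub-rest` sink. The report above shows the sink running with `'mode': 'ASYNC_BATCH'` and `'max_threads': 15`; the sketch below builds a recipe as a Python dict with a smaller `max_threads` and synchronous mode. The host, database, and server values are placeholders, and the helper name `build_throttled_recipe` is hypothetical. Whether this removes the `RetryLimitReached` errors is untested here, since a 500 from GMS may have causes other than client load.

```python
import json


def build_throttled_recipe(max_threads: int = 2, mode: str = "SYNC") -> dict:
    """Build an ingestion recipe whose REST sink emits with reduced concurrency."""
    return {
        "source": {
            "type": "mssql",
            "config": {
                # Placeholder connection details; replace with your own.
                "host_port": "localhost:1433",
                "database": "MyDatabase",
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {
                "server": "http://localhost:8080",  # placeholder GMS address
                "max_threads": max_threads,  # the run above used 15
                "mode": mode,  # the run above used ASYNC_BATCH
            },
        },
    }


if __name__ == "__main__":
    # Print the recipe for inspection; the same dict could be passed to
    # datahub.ingestion.run.pipeline.Pipeline.create(...) if DataHub is installed.
    print(json.dumps(build_throttled_recipe(), indent=2))
```

The same `max_threads` and `mode` keys can be set under `sink.config` in a YAML recipe used with the CLI.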
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @Datahub Community Support if you require a person to take a look.
Hope this helps!