Error during DataHub ingestion process and inability to add datasets on Private AKS

Original Slack Thread

Hi team, we recently installed DataHub and the GMS components on a private AKS cluster. During ingestion we get the error below, and we also cannot add datasets; only the platform gets added.

Any suggestions or recommendations would be appreciated.

Errors

Could not fetch URL https://pypi.org/simple/pip/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /simple/pip/ (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1007)'))) - skipping
[2023-09-08 14:33:22,977] ERROR {datahub.ingestion.run.pipeline:418} - Caught error
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/datahub_classify/infotype_helper.py", line 35, in <module>
    nlp_english = spacy.load(spacy_model_name)
  File "/usr/local/lib/python3.10/site-packages/spacy/__init__.py", line 54, in load
    return util.load_model(
  File "/usr/local/lib/python3.10/site-packages/spacy/util.py", line 439, in load_model
    raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.
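The first log line shows pip failing to complete a TLS handshake with pypi.org (`SSLEOFError`). On a private cluster this may point to blocked outbound traffic or a TLS-intercepting proxy; that is an assumption, since the thread does not describe the network setup. A minimal sketch for checking it from inside the ingestion pod, using only the Python standard library (the helper name `can_reach_tls` is hypothetical):

```python
import socket
import ssl


def can_reach_tls(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection and a verified TLS handshake with host:port succeed."""
    # Default context: verifies the server certificate against the system CA store
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # wrap_socket performs the handshake; an SSLEOFError like the one
            # in the pip log would be raised at this point
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError as exc:
        # OSError covers DNS failures, refused or timed-out connections,
        # and ssl.SSLError (including SSLEOFError and certificate errors)
        print(f"{host}:{port} not reachable over TLS: {exc!r}")
        return False
```

If `can_reach_tls("pypi.org")`, run from a Python shell in the pod (for example via `kubectl exec`), returns `False`, that would be consistent with the pip error; the printed exception shows whether DNS, the TCP connection, or the TLS handshake failed.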

Hi, the traceback shows spaCy, called from datahub_classify, failing to load the en_core_web_sm model, so this doesn't seem to be directly related to DataHub itself and we might not have the best answer. This post may help, though: https://stackoverflow.com/questions/54334304/spacy-cant-find-model-en-core-web-sm-on-windows-10-and-python-3-5-3-anacon
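The E050 error means `spacy.load` found neither an installed package nor a data directory named `en_core_web_sm`. A quick first check is whether the model is importable at all, assuming the check runs with the same Python interpreter (or virtualenv) that executes the ingestion recipe; the thread does not say which one that is. The helper `spacy_model_installed` is hypothetical and only covers the installed-package case, not a model loaded from a path:

```python
import importlib.util


def spacy_model_installed(model_name: str = "en_core_web_sm") -> bool:
    """Return True if model_name can be found as an importable top-level package."""
    # find_spec returns None when no module or package with this name is on sys.path.
    # For a top-level name it does not import anything, so it works even without spaCy.
    return importlib.util.find_spec(model_name) is not None


if __name__ == "__main__":
    print("en_core_web_sm installed:", spacy_model_installed())
```

If this prints `False`, the model has to be installed into that environment, for example with `python -m spacy download en_core_web_sm`. That command also downloads over the network, so it would run into the same problem if the pod cannot reach the internet.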