Scaling DataHub Deployment on OpenShift for Production

Original Slack Thread

Hey fellow DataHuber,

I am from a large national lab. We evaluated DataHub in a POC for managing large-scale ML/AI data, and we really like the functionality. We deployed our prototype system on OpenShift with the QuickStart configuration, and we are now considering going to production. My question to this group: do we need to adjust the QuickStart configuration? We expect to have 1 million datasets with a total volume of up to 200 TB and 75 active users. Any guidance is appreciated.

Thank you in advance!

Maybe you can try running the following command: datahub docker quickstart --quickstart-compose-file <your-modified-compose>.yml
Fetch the docker-compose files from <https://github.com/datahub-project/datahub/tree/master/docker/quickstart> first, then modify the config file and pass it to the command.
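
A minimal sketch of the suggestion above, assuming you have already saved an edited copy of one of the quickstart compose files locally. The helper name run_quickstart and the DRY_RUN switch are hypothetical and only for illustration; the datahub docker quickstart command and its --quickstart-compose-file flag are the ones mentioned in the thread.

```shell
# Hypothetical helper: start DataHub quickstart from a locally edited compose file.
# Usage: run_quickstart <path-to-modified-compose.yml>
run_quickstart() {
  compose_file=$1

  # Fail early if the edited compose file is missing.
  if [ ! -f "$compose_file" ]; then
    echo "compose file not found: $compose_file" >&2
    return 1
  fi

  # Assumption: DRY_RUN=1 only prints the command, so it can be reviewed first.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "datahub docker quickstart --quickstart-compose-file $compose_file"
    return 0
  fi

  # Command from the thread: run quickstart with the modified compose file.
  datahub docker quickstart --quickstart-compose-file "$compose_file"
}
```

For example, DRY_RUN=1 run_quickstart my-modified-compose.yml prints the command without starting any containers, which can be handy for checking the path before a real run.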