Deploying DataHub via Docker and Moving to Production: Security Concerns and Best Practices

user-2 · March 4, 2024, 3:24pm

Hey folks!

We’re exploring DataHub for our company. I just tried the quickstart and it was GREAT.

We would like to deploy via Docker, but I saw this in the documentation: https://datahubproject.io/docs/quickstart/#move-to-production

But then there’s a "Deploying with Docker" guide? https://datahubproject.io/docs/docker

Would someone mind shedding a little light on deploying via Docker in production? Are there any security concerns?

user-2 · March 4, 2024, 3:24pm

In particular, this piece. Does this mean DataHub services can be accessed by anyone? Or just anyone with access to the machine Docker is running on?

Exposed Ports
DataHub’s services and its backend data stores use the Docker default behavior of binding to all interface addresses. This makes it useful for development but is not recommended in a production environment.

user-3 · March 4, 2024, 3:24pm

Would also love to learn more about this. Is there a guide covering the steps needed to move from Quickstart to production? (If not, I would write one once I’m finished.) We are currently thinking about using Kubernetes for production, but would also be open to continuing with Docker.

user-1 · March 4, 2024, 3:24pm

I can provide a bit of insight:

By default, DataHub services can be accessed by anyone on the network who can reach the Docker containers / Kubernetes pods. Using K8s (or cloud provider security policies), you can control which ports on each pod are exposed to the outside world using proxies. By default, the key services will appear on localhost:9002, localhost:8080, localhost:9092, localhost:3306, … and a few others on the local machine where DataHub is deployed.
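To make the "binding to all interface addresses" note from the docs concrete, here is a minimal Python sketch. It is not DataHub or Docker code, and the helper name `bound_address` is made up for illustration. It opens a listening socket on an OS-chosen free port, first on the loopback address only and then on all interfaces, and reports the address the socket ended up bound to.

```python
import socket


def bound_address(host: str) -> tuple[str, int]:
    """Bind a listening TCP socket to `host` on a free port and return (ip, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 asks the OS to pick any free port
        sock.listen(1)
        ip, port = sock.getsockname()
        return ip, port


if __name__ == "__main__":
    # Loopback only: reachable just from the machine itself.
    print("loopback:", bound_address("127.0.0.1"))
    # All interfaces: reachable from any network the machine is attached to,
    # unless a firewall or security rule blocks it.
    print("all interfaces:", bound_address("0.0.0.0"))
```

A service bound like the second case can be reached from other hosts on the same network, which is why the firewall or security-policy layer matters for anything that is not meant to be public.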

If you are not exposing the ports where DataHub runs (e.g. localhost:8080) outside of the node where DataHub is deployed, there is no security concern. But when you deploy, you are expected to understand how to set up security rules that prevent unauthorized access to the internal DataHub services by surfacing only the officially public endpoints (localhost:9002, where the DataHub frontend service is served).
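One rough way to sanity-check this is to run a small port probe from a machine other than the DataHub host. The sketch below assumes the four ports named earlier in this thread; the function names and the `DATAHUB_PORTS` list are illustrative, and a successful TCP connect only shows that a port is reachable, not that the service behind it is unprotected.

```python
import socket

# Ports named in this thread. Per the discussion above, only 9002
# (the frontend) is intended to be a public endpoint.
DATAHUB_PORTS = [9002, 8080, 9092, 3306]


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure, ...
        return False


def exposed_ports(host: str, ports=DATAHUB_PORTS, timeout: float = 2.0) -> list[int]:
    """Return the subset of `ports` that accept TCP connections on `host`."""
    return [port for port in ports if is_port_open(host, port, timeout)]


if __name__ == "__main__":
    # Replace with the address of the DataHub host, and run this from
    # another machine to see what the rest of the network can reach.
    host = "127.0.0.1"
    print(f"Reachable on {host}: {exposed_ports(host)}")
```

If anything other than 9002 shows up when probing from outside the node, that is a hint the security rules are looser than intended.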

If you only intend to expose DataHub to your organization (e.g. on a private network not accessible from the internet), you should generally be A.O.K. The disclaimers are primarily a warning to consider the security implications before making DataHub "production", and a reminder to look closely at how traffic reaches your DataHub instance, wherever it is hosted.

user-1 · March 4, 2024, 3:24pm

That being said, there is no mandated set of rules that dictates what constitutes a “production”-grade deployment. “Production-grade” will be unique to your organization’s requirements and needs.

user-2 · March 4, 2024, 3:24pm

This is super helpful and detailed. I suspected as much but wanted to confirm. THANK YOU!
