Automatic User Provisioning in DataHub with OIDC and GCP Integration

Original Slack Thread

Does DataHub support automated user provisioning with OIDC?
I don't want to manage DataHub users and group access by hand. I want DataHub to auto-provision new users into the correct groups based on their attributes.

Yes, see the options and documentation in an earlier section on that page here:

So I looked into this more. It seems GCP's OIDC doesn't support group claims. Do you know if it's possible to map GCP groups to DataHub groups, or do you need group claims for that to work?

Or even if I could attach a custom attribute such as "datahub_group" to my GCP users and have DataHub use that attribute to put users in the right groups, that would be enough for my needs.

Interacting with the API is certainly possible. The key point would be to create an action that runs when a new user logs in: the first login creates a new entity for the user, and that creation event would trigger the action. The action could be developed in the Actions Framework and then use the API, with some logic of your own, to map the user into a group.

Another option would be to write a script that uses the API to list the users and add them to groups, which you run ad hoc or on a schedule.
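
A minimal sketch of such a scheduled script, using only the Python standard library. The endpoint URL, the `listUsers` query, and the `addGroupMembers` mutation are written from memory of DataHub's GraphQL API and should be checked against the schema of your DataHub version. The `group_for_user` callback is a hypothetical hook for your own mapping logic.

```python
import json
import urllib.request

# Assumption: GMS GraphQL endpoint; adjust host and port for your deployment.
GRAPHQL_URL = "http://localhost:8080/api/graphql"

# Assumption: query and mutation shapes; verify against your GraphQL schema.
LIST_USERS = """
query listUsers($start: Int!, $count: Int!) {
  listUsers(input: {start: $start, count: $count}) {
    total
    users { urn username }
  }
}
"""

ADD_MEMBERS = """
mutation addGroupMembers($groupUrn: String!, $userUrns: [String!]!) {
  addGroupMembers(input: {groupUrn: $groupUrn, userUrns: $userUrns})
}
"""


def graphql(query, variables, token):
    """POST one GraphQL request and return the decoded "data" object."""
    body = json.dumps({"query": query, "variables": variables}).encode()
    req = urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    return payload["data"]


def plan_memberships(users, group_for_user):
    """Group user URNs by target group URN; users with no match are skipped."""
    plan = {}
    for user in users:
        group = group_for_user(user)
        if group:
            plan.setdefault(group, []).append(user["urn"])
    return plan


def sync(token, group_for_user, page_size=100):
    """Page through all users, then add each batch to its group."""
    users, start = [], 0
    while True:
        page = graphql(LIST_USERS, {"start": start, "count": page_size}, token)
        result = page["listUsers"]
        users.extend(result["users"])
        start += page_size
        if start >= result["total"] or not result["users"]:
            break
    for group_urn, user_urns in plan_memberships(users, group_for_user).items():
        graphql(ADD_MEMBERS, {"groupUrn": group_urn, "userUrns": user_urns}, token)
```

Calling `sync(token, group_for_user)` from cron or another scheduler would cover the "on a schedule" case. Keeping `plan_memberships` free of network calls makes the mapping logic easy to test on its own.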

Yeah, the latter would be more of a workaround, but creating my own custom action sounds pretty doable.

Without getting too deep in the weeds here, would my login action automatically see all of the user's OIDC metadata, like any custom attributes?

The action would not see any OIDC details such as other claims. You'd probably get the email address, because it is part of the DataHub user's URN, but that's about it.
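
Since the URN is likely the only user detail the action receives, the mapping logic has to work from that alone. Below is a rough sketch of that logic in plain Python, deliberately not wired into the Actions Framework. It assumes the user id inside the URN is the email address (which depends on how OIDC is configured), that the event is available as a dict with `entityType`, `operation`, and `entityUrn` keys, and that groups can be derived from the email domain. The `GROUP_BY_DOMAIN` table and its group names are hypothetical.

```python
CORPUSER_PREFIX = "urn:li:corpuser:"

# Hypothetical mapping from email domain to DataHub group URN.
GROUP_BY_DOMAIN = {
    "analytics.example.com": "urn:li:corpGroup:analysts",
    "eng.example.com": "urn:li:corpGroup:engineers",
}


def email_from_urn(urn):
    """Return the id part of a corpuser URN if it looks like an email, else None."""
    if not urn.startswith(CORPUSER_PREFIX):
        return None
    user_id = urn[len(CORPUSER_PREFIX):]
    return user_id if "@" in user_id else None


def group_for_new_user(event):
    """Return the target group URN for a user-creation event, or None."""
    # Assumed event shape: only react to newly created corpuser entities.
    if event.get("entityType") != "corpuser" or event.get("operation") != "CREATE":
        return None
    email = email_from_urn(event.get("entityUrn", ""))
    if email is None:
        return None
    domain = email.rsplit("@", 1)[1].lower()
    return GROUP_BY_DOMAIN.get(domain)
```

If the domain is not enough to pick a group, this is the point where the action would need to look up extra attributes elsewhere.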

Ah, I see. So I would probably have to have my action hit the GCP OIDC API to get more info, which means I'd need to get GCP creds in there somehow, which complicates things. All because GCP's OIDC provider still doesn't support group claims.

Yeah, my understanding is that GCP didn't really have the concept of a group in the first place. It was added later with the business tool suite called something like Workspace. If you happen to be running DataHub in GKE on GCP, then you could set up the service account for workloads to get the credentials to the pod. You end up adding some annotations in the Helm chart, if it's set up in your GKE cluster, and the pod gets the credentials for a service account which you have to create. There are no explicit credentials to worry about. It's definitely an advanced configuration to run, but there's no reason it couldn't work.