Restricting User Visibility of Platforms and Datasets in DataHub

Original Slack Thread

<@U06TM7M40RK> We have been able to set access control policies for viewing and editing datasets using domains. But users can still see all platforms on the homepage, and all datasets through the global filter and search. How can we force users to see ONLY the platforms and datasets they have access to?

To limit which platforms and datasets users can open from the homepage and from search results, scope DataHub's access control policies to the domains each user or group should work with. Note, however, that in its current state DataHub does not hide entities in search results that a user has no right to view. Users may still see the names of datasets and platforms even if they cannot open the details.

Here are the steps to configure access control policies to limit what users can see:

  1. Define Metadata Policies: Create metadata policies that specify which users or groups can access specific datasets and platforms.
  2. Assign Domains: Ensure that datasets and other entities are assigned to the appropriate domains.
  3. Apply Policies: Apply the policies to restrict user access to only those entities within the specified domains.
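The three steps above can be pictured with a small, self-contained model. This is only an illustrative sketch of domain-scoped view checks, not DataHub's actual policy engine: the Entity and Policy classes, the can_view function, the domain names "marketing" and "finance", and the second dataset URN are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    urn: str
    domain: str  # hypothetical: the domain the entity is assigned to (step 2)


@dataclass
class Policy:
    name: str
    domain: str  # hypothetical: the domain this policy is scoped to
    users: set = field(default_factory=set)
    groups: set = field(default_factory=set)
    privileges: set = field(default_factory=set)


def can_view(user, user_groups, entity, policies):
    """Return True if any policy grants 'view' on the entity's domain to the user."""
    for policy in policies:
        if "view" not in policy.privileges:
            continue
        if policy.domain != entity.domain:
            continue
        # The user matches either directly or through one of their groups.
        if user in policy.users or policy.groups & set(user_groups):
            return True
    return False


entities = [
    Entity("urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)", "marketing"),
    Entity("urn:li:dataset:(urn:li:dataPlatform:hive,OtherDataset,PROD)", "finance"),
]

policies = [
    Policy(
        name="Data Scientists Access",
        domain="marketing",
        groups={"data_scientists"},
        privileges={"view"},
    )
]

# Entities that user1 could open under this simplified model.
visible = [
    e.urn
    for e in entities
    if can_view("user1@example.com", ["data_scientists"], e, policies)
]
print(visible)
```

In this model a user outside the data_scientists group matches no policy and gets no view access, which mirrors the intent of restricting access by domain. It says nothing about whether entity names still appear in search.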

Step-by-Step Guide

  1. Create or Update Users and Groups:

    • Define users and groups in a YAML file and use the DataHub CLI to upsert them.
    # user.yaml
    - id: user1@example.com
      first_name: User
      last_name: One
      email: user1@example.com
      groups:
        - data_scientists
    
    # group.yaml
    id: data_scientists
    display_name: Data Scientists
    members:
      - user1@example.com
    
    # Shell: upsert the users and groups defined in the two files above
    datahub user upsert -f user.yaml
    datahub group upsert -f group.yaml
    
  2. Define Metadata Policies:

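    • Create a policy that specifies which users or groups can access specific datasets and platforms. Treat the YAML layout and the policy upsert command below as illustrative and confirm that your CLI version supports them; policies are commonly managed in the UI under Settings > Permissions > Policies or through the GraphQL API.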
    # policy.yaml
    type: metadata
    name: "Data Scientists Access"
    description: "Allow data scientists to access specific datasets"
    actors:
      users:
        - user1@example.com
      groups:
        - data_scientists
    privileges:
      - view
    resources:
      - urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)
    
    datahub policy upsert -f policy.yaml
    
  3. Apply Policies:

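    • Confirm that the policies are active, then log in as an affected user (for example user1@example.com) and check that entity pages outside the granted scope can no longer be opened. Names of other datasets and platforms may still appear in search, as noted above.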
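As an alternative for step 2, a policy can be expressed as a GraphQL request body. The sketch below only builds and prints that body; it sends nothing. The mutation name createPolicy, the input field names, the VIEW_ENTITY_PAGE privilege, and the DOMAIN filter field are written from memory of DataHub's GraphQL schema and should be checked against the schema of your deployment. The helper build_policy_request and the domain URN urn:li:domain:marketing are hypothetical.

```python
import json

# Assumed mutation shape; verify against your DataHub GraphQL schema.
CREATE_POLICY_MUTATION = """
mutation createPolicy($input: PolicyUpdateInput!) {
  createPolicy(input: $input)
}
"""


def build_policy_request(name, description, group_urns, domain_urn):
    """Build a GraphQL request body for a domain-scoped, view-only metadata policy."""
    policy_input = {
        "type": "METADATA",
        "name": name,
        "state": "ACTIVE",
        "description": description,
        "privileges": ["VIEW_ENTITY_PAGE"],  # assumed privilege identifier
        "actors": {
            "users": [],
            "groups": group_urns,
            "resourceOwners": False,
            "allUsers": False,
            "allGroups": False,
        },
        "resources": {
            "filter": {
                "criteria": [
                    # Assumed filter field: restrict the policy to one domain.
                    {"field": "DOMAIN", "values": [domain_urn], "condition": "EQUALS"}
                ]
            }
        },
    }
    return {"query": CREATE_POLICY_MUTATION, "variables": {"input": policy_input}}


body = build_policy_request(
    name="Data Scientists Access",
    description="Allow data scientists to view entities in one domain",
    group_urns=["urn:li:corpGroup:data_scientists"],
    domain_urn="urn:li:domain:marketing",  # hypothetical domain URN
)
print(json.dumps(body["variables"], indent=2))
```

If the schema matches, the resulting body could be sent as an authenticated POST to the instance's GraphQL endpoint. Scoping by domain rather than by individual dataset URN keeps the policy aligned with the domain-based setup described in the question.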
