Ingesting user information into DataHub using the `upsert` operation

Original Slack Thread

How do we ingest user information into DataHub? I associate owner information with datasets, but the user is not found in DataHub.

To ingest user information into DataHub, you can use the upsert operation to create or update user accounts. This process involves creating a YAML file with the user details and using the DataHub CLI to ingest this data. Here’s a step-by-step guide:

  1. Create a YAML File: Save the user information in a YAML file. For example, user.yaml might look like this:

    - id: bar@acryl.io
      first_name: The
      last_name: Bar
      email: bar@acryl.io
      slack: "@the_bar_raiser"
      description: "I like raising the bar higher"
      groups:
        - foogroup@acryl.io
    - id: datahub
      slack: "@datahubproject"
      phone: "1-800-GOT-META"
      description: "The DataHub Project"
      picture_link: "https://raw.githubusercontent.com/datahub-project/datahub/master/datahub-web-react/src/images/datahub-logo-color-stable.svg"
    
  2. Ingest User Data: Use the DataHub CLI to ingest the user data. Run the following command:

    datahub user upsert -f user.yaml
    

    This command will create new users or update existing ones based on the information provided in the YAML file.

  3. Verify the Ingestion: After running the command, you should see logs indicating the success of the operation, such as:

    Update succeeded for urn urn:li:corpuser:bar@acryl.io.
    Update succeeded for urn urn:li:corpuser:datahub.
    
  4. Associate Users with Datasets: Once the users are ingested, you can associate them with datasets as owners. Ensure that the user IDs or URNs used in the dataset metadata match those in the user information you ingested.
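
As a rough illustration of step 4, here is a minimal Python sketch that checks whether the owner URNs you plan to attach to datasets correspond to the user ids in your YAML file. It assumes the `urn:li:corpuser:<id>` format seen in the log lines above. The helper names `corpuser_urn` and `unknown_owners` are hypothetical and not part of DataHub.

```python
# Assumption: user URNs follow the "urn:li:corpuser:<id>" pattern shown in the CLI logs.
CORPUSER_PREFIX = "urn:li:corpuser:"


def corpuser_urn(user_id: str) -> str:
    """Build the URN string for a user id (hypothetical helper)."""
    return f"{CORPUSER_PREFIX}{user_id}"


def unknown_owners(owner_urns, ingested_ids):
    """Return the owner URNs that do not match any ingested user id."""
    known = {corpuser_urn(uid) for uid in ingested_ids}
    return [urn for urn in owner_urns if urn not in known]


# Ids taken from the example user.yaml; the second owner URN is a deliberate mismatch.
ingested_ids = ["bar@acryl.io", "datahub"]
owners = ["urn:li:corpuser:bar@acryl.io", "urn:li:corpuser:Bar"]
print(unknown_owners(owners, ingested_ids))
```

Any URN this prints is an owner reference with no matching id in the user file. The comparison is an exact, case-sensitive string match, so small differences such as `Bar` versus `bar@acryl.io` show up here.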

If the user is not found in DataHub, ensure that the user information is correctly formatted and that the ingestion process completed successfully. You can also check the DataHub UI under Settings > Access > Users & Groups to verify that the user has been created or updated.
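
To check that the ingestion completed for every user, one option is to scan the CLI output you captured for the success lines. The sketch below assumes the log format shown in step 3 ("Update succeeded for urn ..."); the exact wording may differ between DataHub versions. The function names `succeeded_urns` and `missing_users` are hypothetical.

```python
# Assumption: success lines look like the ones in step 3 of this thread.
SUCCESS_PREFIX = "Update succeeded for urn "


def succeeded_urns(cli_output: str) -> set:
    """Collect the URNs reported as successfully updated in the CLI output."""
    urns = set()
    for line in cli_output.splitlines():
        line = line.strip()
        if line.startswith(SUCCESS_PREFIX):
            # Drop the prefix and the trailing period that ends the log sentence.
            urns.add(line[len(SUCCESS_PREFIX):].rstrip("."))
    return urns


def missing_users(cli_output: str, expected_ids) -> list:
    """Return the expected user ids with no success line in the output."""
    done = succeeded_urns(cli_output)
    return [uid for uid in expected_ids if f"urn:li:corpuser:{uid}" not in done]


# Sample output with only one of the two users reported.
sample_output = "Update succeeded for urn urn:li:corpuser:bar@acryl.io.\n"
print(missing_users(sample_output, ["bar@acryl.io", "datahub"]))
```

An empty list means every expected user had a success line. Anything else points to the ids worth re-checking in the YAML file or in the UI.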

For more detailed information, you can refer to the DataHub CLI documentation (https://github.com/datahub-project/datahub/blob/master/docs/cli.md) and the tutorial on managing users and groups (https://datahubproject.io/docs/0.14.0/api/tutorials/owners).

Would you like more detailed guidance on any specific part of this process?

Can you define groups in the same users.yaml file?