Creating a New Custom Aspect in `metadata-models-custom` Module of DataHub

Original Slack Thread

<@U06TM7M40RK>
How do I create a new custom aspect in metadata-models-custom?

To create a new custom aspect in the metadata-models-custom module of DataHub, you can follow these steps:

  1. Define the Custom Aspect: Create a PDL (Pegasus Data Language) file to define your custom aspect. This file should be placed in the src/main/pegasus directory of the metadata-models-custom module.

  2. Add the Aspect to the Entity Registry: Update the entity-registry.yaml file to include your new aspect. This file is located under the registry directory.

  3. Build and Deploy the Custom Model: Use Gradle to build and deploy your custom model to DataHub.

Here is a detailed breakdown of each step:

Step 1: Define the Custom Aspect

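Create a PDL file (e.g., MyCustomAspect.pdl) under src/main/pegasus, in the directory path that matches its namespace (for the namespace below, src/main/pegasus/com/mycompany/datahub/schema/). Here is an example of what the PDL file might look like: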

namespace com.mycompany.datahub.schema

/**
 * Properties for my custom aspect
 */
@Aspect = {
  "name": "myCustomAspect"
}
record MyCustomAspect {
  /**
   * Unique Identifier
   */
  packageId: string
}
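
As a quick sanity check before editing the registry, the aspect name declared in the PDL file can be read back with a few lines of Python. This is only a sketch: the helpers aspect_name and record_name are hypothetical, and the regular expressions are a rough match for the example above, not a real PDL parser.

```python
import re
from typing import Optional

# A trimmed copy of the example PDL definition above.
PDL_TEXT = '''
namespace com.mycompany.datahub.schema

@Aspect = {
  "name": "myCustomAspect"
}
record MyCustomAspect {
  packageId: string
}
'''


def aspect_name(pdl: str) -> Optional[str]:
    """Return the "name" value inside the @Aspect annotation, or None."""
    match = re.search(r'@Aspect\s*=\s*\{[^}]*?"name"\s*:\s*"([^"]+)"', pdl)
    return match.group(1) if match else None


def record_name(pdl: str) -> Optional[str]:
    """Return the identifier that follows the `record` keyword, or None."""
    match = re.search(r'^\s*record\s+(\w+)', pdl, re.MULTILINE)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(aspect_name(PDL_TEXT), record_name(PDL_TEXT))
```

The value returned by aspect_name is the string that has to appear under aspects in entity-registry.yaml.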

Step 2: Add the Aspect to the Entity Registry

Edit the entity-registry.yaml file to include your new aspect. Here is an example:

entities:
  - name: dataset
    aspects:
      - myCustomAspect
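
A common mistake at this step is listing an aspect name in the registry that does not match any @Aspect name in the PDL files. The sketch below checks for that. It assumes the YAML has already been loaded into a plain dict (for example by a YAML parser), and missing_aspects is a hypothetical helper, not part of DataHub.

```python
from typing import Dict, List, Set

# The entity-registry.yaml example above, as a YAML loader would return it.
registry = {
    "entities": [
        {"name": "dataset", "aspects": ["myCustomAspect"]},
    ]
}


def missing_aspects(registry: dict, defined: Set[str]) -> Dict[str, List[str]]:
    """Map each entity name to the aspects it lists that are not in `defined`."""
    missing: Dict[str, List[str]] = {}
    for entity in registry.get("entities", []):
        absent = [a for a in entity.get("aspects", []) if a not in defined]
        if absent:
            missing[entity["name"]] = absent
    return missing


if __name__ == "__main__":
    # `defined` would be the set of @Aspect names found in the PDL files.
    print(missing_aspects(registry, {"myCustomAspect"}))
```

An empty result means every aspect listed in the registry has a matching definition.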

Step 3: Build and Deploy the Custom Model

Navigate to the metadata-models-custom directory and run the following commands to build and deploy your custom model:

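cd metadata-models-custom
# Build the custom model
../gradlew build
# Deploy the built model to DataHub
../gradlew :metadata-models-custom:modelDeploy
# Optionally build and install with an explicit version
../gradlew -PprojVersion=0.0.1 build
../gradlew -PprojVersion=0.0.1 install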

The -PprojVersion flag sets the version of the built artifact, and the modelDeploy task deploys it to DataHub.

Would you like more detailed guidance on any specific part of this process?

<@U06TM7M40RK>
In entity-registry.yaml, what is the id field? For example, id: mycompany-dq-model

In the entity-registry.yaml file, the id field is the unique identifier (name) of your registry. It drives naming and artifact generation, so pick a value that does not conflict with any other registry you create.

For example, in the following snippet:

id: mycompany-dq-model
entities:
  - name: dataset
    aspects:
      - customDataQualityRules

Here the id is set to mycompany-dq-model, which names this registry, that is, this particular set of entity and aspect definitions, and is used when naming the generated artifact.
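
Because the id has to be unique across registries, a small check can flag collisions before building. This is a sketch under the assumption that each registry file has been loaded into a dict; duplicate_ids is a hypothetical helper and the second and third registries are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List


def duplicate_ids(registries: Iterable[dict]) -> List[str]:
    """Return the registry ids that appear more than once, sorted."""
    counts = Counter(r.get("id") for r in registries)
    return sorted(i for i, n in counts.items() if i is not None and n > 1)


if __name__ == "__main__":
    registries = [
        {"id": "mycompany-dq-model", "entities": []},
        {"id": "mycompany-lineage-model", "entities": []},  # hypothetical
        {"id": "mycompany-dq-model", "entities": []},       # hypothetical clash
    ]
    print(duplicate_ids(registries))
```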

Would you like more details on how the id field is used or any other part of the entity-registry.yaml file?

<@U06TM7M40RK>
Do I need to define another id in the same file?

I’m not able to find any data sources to answer your question. You might want to specify the context or purpose for defining another ID in the same file. If this doesn’t help, please reach out to the DataHub support team for further assistance.