Using a Custom Classifier in DataHub: Implementation and Configuration Steps

Original Slack Thread

<@U06TM7M40RK> How can we use a custom classifier instead of the default DataHub classifier?

To use a custom classifier instead of the default DataHub classifier, you need to follow a few steps to integrate your custom logic into the DataHub classification framework. Here’s a summary of the process:

  1. Implement a Custom Classifier: You need to create a custom classifier by implementing the Classifier interface. This involves defining the classify method, which will contain your custom classification logic.

  2. Register the Custom Classifier: Once you have implemented your custom classifier, register it with the DataHub classifier registry by calling classifier_registry.register with a name and your class. The registration has to run in the same Python process as the ingestion, before the pipeline looks up its classifiers.

  3. Configure the Custom Classifier: Update your DataHub ingestion configuration to use your custom classifier. In the classifiers section of the ClassificationConfig, set type to the name you registered the classifier under.

Here is a basic outline of how you might implement and register a custom classifier:

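from typing import Any, Dict, List

from datahub.ingestion.glossary.classifier import Classifier
from datahub.ingestion.glossary.classifier_registry import classifier_registry


class MyCustomClassifier(Classifier):
    @classmethod
    def create(cls, config_dict: Dict[str, Any]) -> "MyCustomClassifier":
        # Assumption: recent DataHub versions also declare `create` as abstract
        # and call it with the `config` block from the recipe; parse it here
        return cls()

    def classify(self, columns: List[Any]) -> List[Any]:
        # Implement your custom classification logic here, then return the columns
        return columns


# Register the custom classifier under the name used as `type` in the recipe
classifier_registry.register("my_custom_classifier", MyCustomClassifier)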
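
As an illustration of what the classification logic inside classify might look like, here is a minimal, self-contained sketch. It deliberately does not import DataHub: ColumnSketch and its fields (name, sample_values, proposed_labels), the EMAIL_PATTERN regex, the min_ratio threshold, and the "Email_Address" label are all hypothetical stand-ins, not the real column type that DataHub passes to classify. It only shows the general shape: inspect each column, attach a proposal, return the list.

```python
import re
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-in for the column objects handed to classify();
# the real DataHub type and its field names differ.
@dataclass
class ColumnSketch:
    name: str
    sample_values: List[str]
    proposed_labels: List[str] = field(default_factory=list)


# Simple illustrative pattern, not a full email validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def classify_columns(columns: List[ColumnSketch], min_ratio: float = 0.8) -> List[ColumnSketch]:
    """Label a column as email when its name or most of its samples suggest it."""
    for column in columns:
        name_hit = "email" in column.name.lower()
        matches = sum(1 for value in column.sample_values if EMAIL_PATTERN.match(value))
        ratio = matches / len(column.sample_values) if column.sample_values else 0.0
        if name_hit or ratio >= min_ratio:
            column.proposed_labels.append("Email_Address")
    return columns


columns = classify_columns([
    ColumnSketch("contact", ["a@example.com", "b@example.org"]),
    ColumnSketch("user_email", []),
    ColumnSketch("city", ["Berlin", "Lima"]),
])
for column in columns:
    print(column.name, column.proposed_labels)
```

In a real classifier the same loop would live in the classify method and write its proposals onto whatever column objects DataHub supplies.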

To configure your ingestion to use this custom classifier, you would update your configuration file to include:

classification:
  enabled: true
  classifiers:
    - type: my_custom_classifier
      config: # Add any specific configuration for your custom classifier here
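
The name passed to register and the type value in the configuration have to match, because the classifier is looked up by that string. Here is a simplified, stand-alone sketch of such a lookup. It assumes nothing about DataHub's internals: SketchRegistry, EchoClassifier, build_classifiers, and the threshold config key are hypothetical names, and the configuration is written as a Python dict rather than YAML.

```python
from typing import Any, Dict, List, Optional, Type


class EchoClassifier:
    """Hypothetical classifier that only remembers its config."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config

    @classmethod
    def create(cls, config_dict: Optional[Dict[str, Any]]) -> "EchoClassifier":
        # An empty `config:` key in YAML parses to None, so normalise it.
        return cls(config_dict or {})


class SketchRegistry:
    """Hypothetical name-to-class registry, far simpler than a real plugin registry."""

    def __init__(self) -> None:
        self._classes: Dict[str, Type[EchoClassifier]] = {}

    def register(self, name: str, klass: Type[EchoClassifier]) -> None:
        if name in self._classes:
            raise ValueError(f"classifier {name!r} is already registered")
        self._classes[name] = klass

    def get(self, name: str) -> Type[EchoClassifier]:
        if name not in self._classes:
            raise KeyError(f"no classifier registered under {name!r}")
        return self._classes[name]


def build_classifiers(registry: SketchRegistry, classification: Dict[str, Any]) -> List[EchoClassifier]:
    """Instantiate every classifier listed in the config, if classification is enabled."""
    if not classification.get("enabled", False):
        return []
    return [
        registry.get(entry["type"]).create(entry.get("config"))
        for entry in classification.get("classifiers", [])
    ]


registry = SketchRegistry()
registry.register("my_custom_classifier", EchoClassifier)

# Dict form of a classification block, with a made-up config key.
recipe_classification = {
    "enabled": True,
    "classifiers": [{"type": "my_custom_classifier", "config": {"threshold": 0.8}}],
}
classifiers = build_classifiers(registry, recipe_classification)
print(type(classifiers[0]).__name__, classifiers[0].config)
```

A misspelled type fails at lookup time in this sketch, which is the kind of error to expect if the registration code never ran or the names differ.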

Would you like more detailed instructions or have any specific questions about this process?
