<@U06TM7M40RK> How can we use a custom classifier instead of the default DataHub classifier?
To use a custom classifier instead of the default DataHub classifier, you need to follow a few steps to integrate your custom logic into the DataHub classification framework. Here’s a summary of the process:
- Implement a Custom Classifier: You need to create a custom classifier by implementing the Classifier interface. This involves defining the classify method, which will contain your custom classification logic.
- Register the Custom Classifier: Once you have implemented your custom classifier, you need to register it with the DataHub classifier registry. This is done by adding your classifier to the classifier_registry in the DataHub codebase.
- Configure the Custom Classifier: Update your DataHub ingestion configuration to use your custom classifier. This involves specifying the type of your custom classifier in the classifiers section of the ClassificationConfig.
Here is a basic outline of how you might implement and register a custom classifier:
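from datahub.ingestion.glossary.classifier import Classifier
from datahub.ingestion.glossary.classifier_registry import classifier_registry


class MyCustomClassifier(Classifier):
    def classify(self, columns):
        # Implement your custom classification logic here
        return columns

    # Assumption: the Classifier base class also declares a create() factory
    # that receives the `config` block; check the base class in your version.
    @classmethod
    def create(cls, config_dict):
        return cls()


# Register the custom classifier
classifier_registry.register("my_custom_classifier", MyCustomClassifier)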
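To give a feel for what the classification logic itself might contain, here is a minimal sketch that runs without DataHub installed. SimpleColumn and EmailClassifier are hypothetical stand-ins invented for illustration; the column objects DataHub actually passes to classify have their own structure, which this does not reproduce. The 0.8 match ratio is likewise an arbitrary assumption.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Deliberately loose email pattern, good enough for a sketch.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class SimpleColumn:
    """Hypothetical stand-in for a column handed to a classifier."""
    name: str
    values: List[str]
    labels: List[str] = field(default_factory=list)


class EmailClassifier:
    """Hypothetical classifier: labels columns whose sample values look like emails."""

    def __init__(self, min_match_ratio: float = 0.8) -> None:
        self.min_match_ratio = min_match_ratio

    def classify(self, columns: List[SimpleColumn]) -> List[SimpleColumn]:
        for column in columns:
            if not column.values:
                continue  # nothing to sample, leave the column unlabeled
            matches = sum(1 for value in column.values if EMAIL_RE.match(value))
            if matches / len(column.values) >= self.min_match_ratio:
                column.labels.append("Email")
        return columns


columns = [
    SimpleColumn("contact", ["a@example.com", "b@example.org"]),
    SimpleColumn("city", ["Paris", "Lima"]),
]
classified = EmailClassifier().classify(columns)
```

The same shape (take a list of columns, annotate the ones that match, return the list) is what the classify method in a real custom classifier would follow, adapted to DataHub's own column type.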
To configure your ingestion to use this custom classifier, you would update your configuration file to include:
classification:
  enabled: true
  classifiers:
    - type: my_custom_classifier
      config: # Add any specific configuration for your custom classifier here
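To illustrate how the type string in that configuration relates to the registered name, here is a rough, self-contained sketch of the lookup. StandInRegistry, build_classifiers, and the threshold config key are all hypothetical and only mimic the register-then-look-up pattern; this is not DataHub's actual registry or instantiation code.

```python
from typing import Any, Dict, List


class StandInRegistry:
    """Hypothetical registry mapping a type name to a classifier class."""

    def __init__(self) -> None:
        self._classes: Dict[str, type] = {}

    def register(self, key: str, cls: type) -> None:
        if key in self._classes:
            raise KeyError(f"{key!r} is already registered")
        self._classes[key] = cls

    def get(self, key: str) -> type:
        return self._classes[key]


class MyCustomClassifier:
    """Stand-in classifier that just stores its config block."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config

    def classify(self, columns: List[Any]) -> List[Any]:
        return columns


registry = StandInRegistry()
registry.register("my_custom_classifier", MyCustomClassifier)

# Dict equivalent of the YAML above; "threshold" is a made-up config key.
classification = {
    "enabled": True,
    "classifiers": [
        {"type": "my_custom_classifier", "config": {"threshold": 0.5}},
    ],
}


def build_classifiers(classification: Dict[str, Any], registry: StandInRegistry) -> List[Any]:
    """Resolve each configured type name to a class and instantiate it."""
    if not classification.get("enabled"):
        return []
    return [
        registry.get(entry["type"])(entry.get("config") or {})
        for entry in classification["classifiers"]
    ]


classifiers = build_classifiers(classification, registry)
```

The point of the sketch is that the name passed to register and the type value in the recipe must match exactly; a typo in either leads to a failed lookup.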
Would you like more detailed instructions or have any specific questions about this process?
Sources:
- <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/glossary/classifier_registry.py|DataHub Classifier Registry>
- <https://datahubproject.io/docs/0.14.0/metadata-ingestion/docs/dev_guides/classification|DataHub Classification Documentation>
Hi <@U07GU0BJG4Q>, how did my answer work out for you? Need any more assistance? You can also tag <@U07FV3UHE4D> to let your fellow community members know you’re still looking for guidance.
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @Datahub Community Support if you need a person to take a look.
Hope this helps ! ><