Is it possible to get column-level descriptions stored as comments from Vertica tables?
Hey there!
<@U06TM7M40RK> Can you help with the questions in this thread?
Please continue to tag RunLLM (e.g., <@U06TM7M40RK>) to have it continue generating responses!
To retrieve column-level descriptions stored as comments from Vertica tables and have them reflected in DataHub, you need to ensure that the ingestion process is correctly configured to capture these comments as descriptions.
Based on the <https://github.com/datahub-project/datahub/blob/master/docs/api/datahub-apis.md|DataHub API documentation>, DataHub supports adding descriptions to columns, which can be done through various methods, including using GraphQL mutations. However, the specific process for automatically capturing comments from Vertica and mapping them to descriptions in DataHub is not explicitly detailed in the provided data sources.
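For the GraphQL route, DataHub exposes an updateDescription mutation that can target a single column by passing the column's field path as the sub-resource. The sketch below only builds and posts the request body. The input type name DescriptionUpdateInput, the /api/graphql endpoint path and the bearer-token header are assumptions to check against the GraphQL schema of your DataHub version; build_field_description_mutation and post_graphql are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict


def build_field_description_mutation(dataset_urn: str, field_path: str, description: str) -> Dict[str, Any]:
    """Build a GraphQL request body that sets the description of one dataset column."""
    # ASSUMPTION: input type and argument names follow DataHub's updateDescription mutation
    query = (
        "mutation updateDescription($input: DescriptionUpdateInput!) {"
        " updateDescription(input: $input) }"
    )
    return {
        "query": query,
        "variables": {
            "input": {
                "description": description,
                "resourceUrn": dataset_urn,
                "subResource": field_path,  # the column's fieldPath
                "subResourceType": "DATASET_FIELD",
            }
        },
    }


def post_graphql(server: str, token: str, body: Dict[str, Any]) -> Dict[str, Any]:
    """POST the body to <server>/api/graphql (assumed endpoint) using a personal access token."""
    request = urllib.request.Request(
        f"{server}/api/graphql",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping the body builder separate from the HTTP call makes it easy to inspect the payload before sending it, for example with build_field_description_mutation("urn:li:dataset:(urn:li:dataPlatform:vertica,db.public.orders,PROD)", "order_id", "Primary key").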
If you are using a custom ingestion job, you might need to ensure that the ingestion logic is set up to read the comments from the Vertica metadata and map them to the appropriate fields in DataHub. This might involve customizing the ingestion code or configuration to extract and apply these comments as descriptions.
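As a sketch of that mapping step, assuming the comments have already been read from Vertica as (schema, table, column, comment) rows, they can be grouped per table so the ingestion logic needs only one dictionary lookup per column. The function name build_comment_index and the row shape are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical row shape: (schema, table, column, comment)
CommentRow = Tuple[str, str, str, Optional[str]]


def build_comment_index(rows: Iterable[CommentRow]) -> Dict[str, Dict[str, str]]:
    """Group column comments by lower-cased "schema.table", then by lower-cased column name."""
    index: Dict[str, Dict[str, str]] = defaultdict(dict)
    for schema, table, column, comment in rows:
        if comment:  # skip NULL or empty comments
            index[f"{schema}.{table}".lower()][column.lower()] = comment
    return dict(index)
```

Lower-casing both keys is a design choice to avoid case mismatches between the names returned by the catalog and the names that end up in DataHub.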
Would you like more detailed guidance on how to configure or customize the ingestion process to achieve this? If so, please let me know, and I can provide further steps or code examples based on the available data sources.
Sources:
- <https://github.com/datahub-project/datahub/blob/master/docs/api/datahub-apis.md|DataHub API documentation>
I would like more detailed guidance on how to configure or customize the ingestion process to achieve that <@U06TM7M40RK>
To configure or customize the ingestion process to capture column-level descriptions stored as comments from Vertica tables and reflect them in DataHub, you can follow these steps:
Step-by-Step Guide
- Enable Descriptions in Ingestion Recipe: Ensure that your ingestion recipe is configured to include descriptions. This involves setting the appropriate flags in your configuration file.
- Custom Transformer: Implement a custom transformer to extract comments from Vertica and map them to column descriptions in DataHub.
- Run the Ingestion: Use the DataHub CLI to execute the ingestion process with the custom configuration.
Step 1: Enable Descriptions in Ingestion Recipe
Create or modify your ingestion recipe YAML file to include descriptions. Here is an example configuration for Vertica:
source:
  type: vertica
  config:
    host_port: "localhost:5433"
    database: "your_database"
    username: "your_username"
    password: "your_password"
    include_tables: true
    include_views: true
    include_columns: true
    include_descriptions: true # Ensure descriptions are included
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
Step 2: Implement a Custom Transformer
Create a custom transformer to extract comments from Vertica and map them to column descriptions. Here is a simplified example:
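from typing import List, Optional
from datahub.ingestion.api.common import PipelineContext
from datahub.ingestion.transformer.base_transformer import BaseTransformer, SingleAspectTransformer
from datahub.metadata.schema_classes import SchemaMetadataClass

class VerticaCommentToDescription(BaseTransformer, SingleAspectTransformer):
    def __init__(self, config: dict, ctx: PipelineContext):
        super().__init__()
        self.config = config
        self.ctx = ctx
    @classmethod
    def create(cls, config_dict: dict, ctx: PipelineContext) -> "VerticaCommentToDescription":
        return cls(config_dict, ctx)
    def entity_types(self) -> List[str]:
        # Only datasets carry the schemaMetadata aspect edited below
        return ["dataset"]
    def aspect_name(self) -> str:
        return "schemaMetadata"
    def transform_aspect(self, entity_urn: str, aspect_name: str, aspect: Optional[SchemaMetadataClass]) -> Optional[SchemaMetadataClass]:
        if aspect is None:
            return None
        for field in aspect.fields:
            # Extract the comment from Vertica and set it as the column description
            comment = self.get_comment_from_vertica(entity_urn, field.fieldPath)
            if comment:
                field.description = comment
        return aspect
    def get_comment_from_vertica(self, entity_urn: str, field_path: str) -> Optional[str]:
        # Implement logic to fetch the comment from Vertica for this dataset and column
        return None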
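To fill in get_comment_from_vertica, the method needs to know which table a field belongs to; assuming it also receives the dataset URN, the table can be parsed out of it. This sketch assumes Vertica dataset names end in schema.table, that field paths are plain column names (no nested types), and that comments were pre-loaded into a dict keyed by lower-cased "schema.table" whose values map lower-cased column names to comments. table_key_from_urn and lookup_comment are hypothetical helpers.

```python
import re
from typing import Dict, Optional

# Dataset URNs look like urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
_DATASET_URN = re.compile(r"^urn:li:dataset:\(urn:li:dataPlatform:([^,]+),(.+),([^,)]+)\)$")


def table_key_from_urn(dataset_urn: str) -> Optional[str]:
    """Return lower-cased "schema.table" from a dataset URN, or None if it cannot be parsed."""
    match = _DATASET_URN.match(dataset_urn)
    if match is None:
        return None
    parts = match.group(2).split(".")
    if len(parts) < 2:
        return None
    # ASSUMPTION: the last two name components are schema and table
    return ".".join(parts[-2:]).lower()


def lookup_comment(index: Dict[str, Dict[str, str]], dataset_urn: str, field_path: str) -> Optional[str]:
    """A possible body for get_comment_from_vertica, backed by a pre-loaded comment index."""
    key = table_key_from_urn(dataset_urn)
    if key is None:
        return None
    return index.get(key, {}).get(field_path.lower())
```

Loading all comments once and looking them up in memory avoids one Vertica round trip per field during ingestion.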
Step 3: Run the Ingestion
- Save the Custom Transformer: Save the VerticaCommentToDescription class in a Python file that the Python environment running the ingestion can import (for example, from a directory listed in PYTHONPATH).
- Update the Ingestion Recipe: Add the custom transformer to your ingestion recipe:
transformers:
  - type: path.to.your.VerticaCommentToDescription
- Execute the Ingestion: Run the ingestion command using the DataHub CLI:
datahub ingest -c path/to/your/ingestion_recipe.yaml
Additional Considerations
- Database Access: Ensure that your ingestion process has the necessary permissions to access Vertica metadata, including comments.
- Testing: Test the ingestion process with a small dataset to verify that comments are correctly mapped to descriptions.
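On the database-access point, the comments could be read straight from Vertica's catalog over an ordinary DB-API connection (for example, one from the vertica-python driver). In this sketch the query, including the v_catalog.comments column names such as child_object and the 'COLUMN' object type, is an assumption to verify against the COMMENTS system table documentation for your Vertica version; fetch_column_comments is a hypothetical helper. It uses :name parameters, which sqlite3 supports and vertica-python is assumed to support.

```python
from typing import Any, List, Tuple

# ASSUMPTION: table, column names and object_type value must be checked for your Vertica version
COLUMN_COMMENTS_SQL = (
    "SELECT object_schema, object_name, child_object, comment "
    "FROM v_catalog.comments "
    "WHERE object_type = 'COLUMN' AND object_schema = :schema"
)


def fetch_column_comments(connection: Any, schema: str) -> List[Tuple[Any, ...]]:
    """Return (schema, table, column, comment) rows for one schema via a DB-API 2.0 connection."""
    cursor = connection.cursor()
    try:
        cursor.execute(COLUMN_COMMENTS_SQL, {"schema": schema})
        return [tuple(row) for row in cursor.fetchall()]
    finally:
        cursor.close()
```

The ingestion user needs read access to the catalog table for this query to return anything, which is the permission the bullet above refers to.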
Would you like more detailed steps or further assistance on any specific part of this process?
Sources:
- <https://datahubspace.slack.com/archives/CV2UXSE9L/p1723600167.340519>
- <https://datahubspace.slack.com/archives/CUMUWQU66/p1726216517.161159>
- type: path.to.your.VerticaCommentToDescription
Give me an example of what the path would look like. Also, the option include_descriptions: true does not currently exist for Vertica. <@U06TM7M40RK>
To specify the path for a custom transformer in your ingestion recipe, you need to provide the fully qualified Python module and class name. This path should mirror the directory structure of your transformer code, starting from a directory that is on the Python import path (for example, one listed in PYTHONPATH).
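A dotted type such as a.b.ClassName is generally resolved by importing module a.b and reading ClassName from it. The sketch below shows that resolution with the standard library's importlib; it illustrates the idea and is not DataHub's own loader, whose details may differ. resolve_type is a hypothetical name.

```python
import importlib
from typing import Any


def resolve_type(path: str) -> Any:
    """Split "package.module.ClassName" at the last dot, import the module, return the class."""
    module_name, _, class_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path like 'module.ClassName', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

Running resolve_type("transformers.custom_transformer.VerticaCommentToDescription") from the same environment is a quick way to confirm the path is importable before starting an ingestion run.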
Example Path for Custom Transformer
Assuming your custom transformer is defined in a file named custom_transformer.py and is located in a directory named transformers, the path in your ingestion recipe would look like this:
transformers:
  - type: transformers.custom_transformer.VerticaCommentToDescription
Directory Structure Example
Here’s an example of how your directory structure might look:
/your_project_directory
    /transformers
        custom_transformer.py
    ingestion_recipe.yaml
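If you prefer to derive the type value from a file location, a small helper can turn a path relative to the project root into the dotted form. The helper name transformer_type_path is hypothetical, and it assumes the project root itself is on the Python import path.

```python
from pathlib import PurePosixPath


def transformer_type_path(module_file: str, class_name: str) -> str:
    """Turn a module file path, relative to the project root, into a dotted 'type' value."""
    path = PurePosixPath(module_file)
    if path.suffix != ".py":
        raise ValueError(f"expected a .py file, got {module_file!r}")
    # Drop the .py suffix and join the remaining path components with dots
    parts = path.with_suffix("").parts
    return ".".join((*parts, class_name))
```

One caveat on naming: if the Hugging Face transformers package is installed in the same environment, a local directory called transformers can be shadowed by it (especially without an __init__.py), so a more distinctive directory name avoids confusing import errors.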
Custom Transformer Code Example
Ensure your custom_transformer.py file contains the VerticaCommentToDescription class:
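from typing import List, Optional
from datahub.ingestion.api.common import PipelineContext
from datahub.ingestion.transformer.base_transformer import BaseTransformer, SingleAspectTransformer
from datahub.metadata.schema_classes import SchemaMetadataClass

class VerticaCommentToDescription(BaseTransformer, SingleAspectTransformer):
    def __init__(self, config: dict, ctx: PipelineContext):
        super().__init__()
        self.config = config
        self.ctx = ctx
    @classmethod
    def create(cls, config_dict: dict, ctx: PipelineContext) -> "VerticaCommentToDescription":
        return cls(config_dict, ctx)
    def entity_types(self) -> List[str]:
        # Only datasets carry the schemaMetadata aspect edited below
        return ["dataset"]
    def aspect_name(self) -> str:
        return "schemaMetadata"
    def transform_aspect(self, entity_urn: str, aspect_name: str, aspect: Optional[SchemaMetadataClass]) -> Optional[SchemaMetadataClass]:
        if aspect is None:
            return None
        for field in aspect.fields:
            # Extract the comment from Vertica and set it as the column description
            comment = self.get_comment_from_vertica(entity_urn, field.fieldPath)
            if comment:
                field.description = comment
        return aspect
    def get_comment_from_vertica(self, entity_urn: str, field_path: str) -> Optional[str]:
        # Implement logic to fetch the comment from Vertica for this dataset and column
        return None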
Note on include_descriptions
As you mentioned, the include_descriptions: true option does not exist for Vertica in the current DataHub configuration. Instead, you will need to rely on custom logic within your transformer to fetch and apply comments as descriptions.
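The "apply" half of that custom logic could look like the sketch below. Field is a hypothetical stand-in for DataHub's SchemaFieldClass (which has fieldPath and description attributes), comments maps lower-cased column names to comment text, and apply_comments is a hypothetical helper. Unlike the transformer above, it keeps descriptions the source already set unless overwrite=True.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Field:
    """Hypothetical stand-in for a schema field with a path and an optional description."""
    field_path: str
    description: Optional[str] = None


def apply_comments(fields: List[Field], comments: Dict[str, str], overwrite: bool = False) -> int:
    """Copy column comments into field descriptions and return how many fields changed."""
    changed = 0
    for field in fields:
        comment = comments.get(field.field_path.lower())
        if not comment:
            continue
        if field.description and not overwrite:
            continue  # keep a description the source or a user already provided
        if field.description != comment:
            field.description = comment
            changed += 1
    return changed
```

Defaulting to "fill only empty descriptions" avoids clobbering documentation that was edited in DataHub, while the overwrite flag lets Vertica remain the source of truth if that is what you want.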
Running the Ingestion
Once your transformer is set up and your ingestion recipe is configured, you can run the ingestion process using the DataHub CLI:
datahub ingest -c path/to/your/ingestion_recipe.yaml
Would you like more detailed steps or further assistance on any specific part of this process?
Sources:
- <https://datahubspace.slack.com/archives/CUMUWQU66/p1720625480.052599>
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @Datahub Community Support if you require a person to take a look.
Hope this helps! ><