Ingesting Stored Procedures in Oracle and Custom Ingestion Recipes in DataHub

Original Slack Thread

<@U06TM7M40RK> <@U07FV3UHE4D> Need a recipe for ingesting stored procedures in Oracle

To ingest stored procedures from Oracle into DataHub, you can start from the standard Oracle ingestion source, which extracts schemas, tables, and views. Direct support for parsing stored procedures for lineage extraction is not documented for the Oracle source, so you may need to handle stored procedures manually, for example by extracting their definitions yourself or converting them into DML statements for lineage (a sketch of one manual approach follows the recipe below).

Here’s a general approach to configure an ingestion recipe for Oracle, which you can adapt for stored procedures:

  1. Create an Ingestion Recipe: Write a YAML configuration file that specifies the source of the data and the destination (DataHub).

  2. Configure the Source: Use the oracle source type and provide necessary connection details.

  3. Run the Ingestion: Use the DataHub CLI to run the ingestion based on the recipe.

Example Configuration:

source:
  type: oracle
  config:
    host_port: your_oracle_host:1521
    database: your_database_name
    username: your_username
    password: your_password
    data_dictionary_mode: ALL
    include_views: true
    include_tables: true
    # Note: there is no stored-procedure-specific option here; see the note above
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080

Running the Ingestion:

datahub ingest -c path_to_your_recipe.yaml

For more detailed guidance on Oracle ingestion, you can refer to the <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/oracle|Oracle Ingestion Configuration>.
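
If you also want the procedure definitions themselves in DataHub, one manual workaround is to read them from Oracle's ALL_SOURCE dictionary view and push them in with the DataHub Python emitter. The sketch below is only an illustration of that idea, not a built-in feature of the Oracle source: it models each procedure as a dataset-style entity and stores its body as a custom property. The connection details, the YOUR_SCHEMA owner filter, and the choice of the "oracle" platform are placeholders to adapt. From the extracted body you could then pull out the DML statements and register lineage manually, as mentioned above.

# Hedged sketch: extract stored procedure source from Oracle's ALL_SOURCE view
# and attach it to dataset-style entities in DataHub as a custom property.
# Connection details, schema name, and platform choice are placeholders.
from collections import defaultdict

import oracledb  # python-oracledb driver

from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import DatasetPropertiesClass

conn = oracledb.connect(
    user="your_username",
    password="your_password",
    dsn="your_oracle_host:1521/your_service_name",
)
cursor = conn.cursor()
cursor.execute(
    """
    SELECT name, line, text
    FROM all_source
    WHERE owner = :owner AND type = 'PROCEDURE'
    ORDER BY name, line
    """,
    owner="YOUR_SCHEMA",
)

# Reassemble each procedure body from its individual source lines
procedures = defaultdict(list)
for name, _line, text in cursor:
    procedures[name].append(text)

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
for name, source_lines in procedures.items():
    urn = make_dataset_urn("oracle", f"YOUR_SCHEMA.{name}", "PROD")
    properties = DatasetPropertiesClass(
        description=f"Oracle stored procedure {name}",
        customProperties={"procedure_definition": "".join(source_lines)},
    )
    emitter.emit(MetadataChangeProposalWrapper(entityUrn=urn, aspect=properties))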

Would you like more detailed steps or additional examples on handling stored procedures specifically?

<@U06TM7M40RK> The above code printed the stored procedures; how can I show them on a dashboard?

I’m not able to find any data sources to answer your question. You might want to specify which dashboard tool you are using with DataHub. If this doesn’t help, please reach out to the DataHub support team for further assistance.

<@U06TM7M40RK> Need code for a custom ingestion source

To create a custom ingestion source in DataHub, you can follow these steps. This involves defining a custom source configuration, implementing the source class, and using it in an ingestion recipe. Here’s a high-level guide with example code:

  1. Define the Custom Source Configuration: Use Pydantic to define the configuration model for your custom source.

  2. Implement the Custom Source Class: Extend the Source class from DataHub and implement the logic to read and parse your data.

  3. Use the Custom Source in an Ingestion Recipe: Reference your custom source class in a YAML recipe and run the ingestion using the DataHub CLI.

Step 1: Define the Custom Source Configuration

from pydantic import BaseModel

class CustomCSVSourceConfig(BaseModel):
    file_path: str
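
For reference, the create classmethod in Step 2 builds this model from the config block of your recipe, so the YAML config maps to the model roughly like this (the path is just a placeholder):

config = CustomCSVSourceConfig.parse_obj({"file_path": "/path/to/your/csvfile.csv"})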

Step 2: Implement the Custom Source Class

import csv

from datahub.emitter.mce_builder import make_dataset_urn
from datahub.ingestion.api.common import PipelineContext
from datahub.ingestion.api.source import Source, SourceReport
from datahub.ingestion.api.workunit import MetadataWorkUnit
from datahub.metadata.schema_classes import (
    DatasetPropertiesClass,
    DatasetSnapshotClass,
    MetadataChangeEventClass,
)


class CustomCSVSource(Source):
    def __init__(self, config: CustomCSVSourceConfig, ctx: PipelineContext):
        super().__init__(ctx)
        self.config = config
        self.report = SourceReport()

    @classmethod
    def create(cls, config_dict, ctx):
        config = CustomCSVSourceConfig.parse_obj(config_dict)
        return cls(config, ctx)

    def get_workunits(self):
        with open(self.config.file_path, mode='r') as file:
            csv_reader = csv.DictReader(file)
            for row in csv_reader:
                dataroot = row['Dataroot']
                datafolder = row['Datafolder']
                datafield = row['Datafield']

                # Create URNs (only the field-level dataset is emitted below;
                # the root and folder URNs are shown for completeness and could
                # back parent/container entities if you need them)
                dataroot_urn = make_dataset_urn("custom", dataroot, "PROD")
                datafolder_urn = make_dataset_urn("custom", f"{dataroot}.{datafolder}", "PROD")
                datafield_urn = make_dataset_urn("custom", f"{dataroot}.{datafolder}.{datafield}", "PROD")

                # Create DatasetProperties
                dataset_properties = DatasetPropertiesClass(
                    description=f"Datafield {datafield} in folder {datafolder} under root {dataroot}",
                    customProperties={"source": "CSV"}
                )

                # Create DatasetSnapshot and wrap it in a MetadataChangeEvent
                dataset_snapshot = DatasetSnapshotClass(
                    urn=datafield_urn,
                    aspects=[dataset_properties]
                )
                mce = MetadataChangeEventClass(
                    proposedSnapshot=dataset_snapshot
                )

                # The pipeline expects work units, not raw MCEs
                yield MetadataWorkUnit(id=datafield_urn, mce=mce)

    def get_report(self):
        return self.report

    def close(self):
        pass
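
The source above assumes a CSV file with Dataroot, Datafolder, and Datafield columns, for example (illustrative values only):

Dataroot,Datafolder,Datafield
finance,invoices,invoice_id
finance,invoices,amount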

Step 3: Use the Custom Source in an Ingestion Recipe

source:
  type: custom_source.CustomCSVSource
  config:
    file_path: '/path/to/your/csvfile.csv'
sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'

Run the Ingestion:

datahub ingest -c path/to/your/recipe.yaml

This setup will allow you to ingest data from a CSV file using a custom source. You can adapt this example to fit your specific data source and requirements.
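
If you prefer to test the custom source without the CLI, the DataHub Python library can also run a pipeline programmatically. The snippet below is a minimal sketch that mirrors the recipe above; custom_source.CustomCSVSource is assumed to be an importable module path in your environment (for example, on PYTHONPATH).

from datahub.ingestion.run.pipeline import Pipeline

# Same recipe as the YAML above, expressed as a Python dict
pipeline = Pipeline.create(
    {
        "source": {
            "type": "custom_source.CustomCSVSource",
            "config": {"file_path": "/path/to/your/csvfile.csv"},
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }
)
pipeline.run()
pipeline.raise_from_status()  # fail loudly if the run reported errors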

For more detailed guidance, you can refer to the <https://datahubproject.io/docs/0.13.1/how/add-custom-ingestion-source|DataHub documentation on adding a custom ingestion source>.

Would you like more detailed steps or specific configurations for any of these points?

Hi :slight_smile:! Did this interaction resolve your issue, or do you still require assistance?

Please tag @RunLLM if you still need further assistance!

Or tag @Datahub Community Support if you require a person to take a look :slight_smile:

Hope this helps! ><