<@U06TM7M40RK> <@U07FV3UHE4D> Need recipe for ingesting stored procedures in oracle
To ingest stored procedures from Oracle into DataHub, you can use the Oracle ingestion source. Note, however, that the Oracle source does not document direct support for parsing stored procedures for lineage extraction, so you may need to handle stored procedures manually, for example by extracting their source text and converting it into DML statements before running lineage extraction.
Here’s a general approach to configure an ingestion recipe for Oracle, which you can adapt for stored procedures:
- Create an Ingestion Recipe: Write a YAML configuration file that specifies the source of the data and the destination (DataHub).
- Configure the Source: Use the oracle source type and provide the necessary connection details.
- Run the Ingestion: Use the DataHub CLI to run the ingestion based on the recipe.
Example Configuration:
source:
  type: oracle
  config:
    host_port: your_oracle_host:1521
    database: your_database_name
    username: your_username
    password: your_password
    data_dictionary_mode: ALL
    include_views: true
    include_tables: true
    # Add any additional configuration for stored procedures if needed
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080
Running the Ingestion:
datahub ingest -c path_to_your_recipe.yaml
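Since the Oracle source itself does not parse stored procedures, one manual option is to pull the procedure source text out of Oracle's data dictionary (ALL_SOURCE) and attach it to datasets in DataHub as custom properties, which at least makes the procedures visible and searchable. The sketch below is only an illustration under assumptions: it uses the python-oracledb driver, a hypothetical oracle.<schema>.<procedure> naming convention for the dataset URNs, and a DataHub REST endpoint at http://localhost:8080; adapt the query and URN scheme to your environment.

import oracledb  # assumption: python-oracledb driver is installed

from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import DatasetPropertiesClass

# Read stored procedure source text from Oracle's data dictionary
conn = oracledb.connect(user="your_username", password="your_password",
                        dsn="your_oracle_host:1521/your_database_name")
cursor = conn.cursor()
cursor.execute(
    """
    SELECT owner, name, text
    FROM all_source
    WHERE type = 'PROCEDURE'
    ORDER BY owner, name, line
    """
)

# Concatenate the per-line rows into one body per (owner, procedure)
procedures = {}
for owner, name, text in cursor:
    procedures.setdefault((owner, name), []).append(text)

# Emit each procedure as a dataset with its body stored as a custom property
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
for (owner, name), lines in procedures.items():
    urn = make_dataset_urn("oracle", f"{owner}.{name}", "PROD")  # hypothetical naming convention
    props = DatasetPropertiesClass(
        name=name,
        description=f"Stored procedure {owner}.{name} ingested from ALL_SOURCE",
        customProperties={"procedure_definition": "".join(lines)},
    )
    emitter.emit(MetadataChangeProposalWrapper(entityUrn=urn, aspect=props))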
For more detailed guidance on Oracle ingestion, you can refer to the <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/oracle|Oracle Ingestion Configuration>.
Would you like more detailed steps or additional examples on handling stored procedures specifically?
Sources:
- <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/oracle|Oracle Ingestion Configuration>
<@U06TM7M40RK> The above code printed the stored procs; how do I show them on a dashboard?
I’m not able to find any data sources to answer your question. You might want to specify which dashboard tool you are using with DataHub. If this doesn’t help, please reach out to the DataHub support team for further assistance.
<@U06TM7M40RK> need code for custom ingestion
To create a custom ingestion source in DataHub, you can follow these steps. This involves defining a custom source configuration, implementing the source class, and using it in an ingestion recipe. Here’s a high-level guide with example code:
- Define the Custom Source Configuration: Use Pydantic to define the configuration model for your custom source.
- Implement the Custom Source Class: Extend the Source class from DataHub and implement the logic to read and parse your data.
- Use the Custom Source in an Ingestion Recipe: Reference your custom source class in a YAML recipe and run the ingestion using the DataHub CLI.
Step 1: Define the Custom Source Configuration
from pydantic import BaseModel

class CustomCSVSourceConfig(BaseModel):
    file_path: str
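As a quick illustration (not part of the source itself), the config block from the recipe in Step 3 is what gets parsed into this model:

# Hypothetical check of how the recipe's "config" block maps onto the model
config = CustomCSVSourceConfig.parse_obj({"file_path": "/path/to/your/csvfile.csv"})
print(config.file_path)  # /path/to/your/csvfile.csv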
Step 2: Implement the Custom Source Class
import csv

from datahub.emitter.mce_builder import make_dataset_urn
from datahub.ingestion.api.common import PipelineContext
from datahub.ingestion.api.source import Source, SourceReport
from datahub.ingestion.api.workunit import MetadataWorkUnit
from datahub.metadata.schema_classes import (
    DatasetPropertiesClass,
    DatasetSnapshotClass,
    MetadataChangeEventClass,
)

class CustomCSVSource(Source):
    def __init__(self, config: CustomCSVSourceConfig, ctx: PipelineContext):
        super().__init__(ctx)
        self.config = config
        self.report = SourceReport()

    @classmethod
    def create(cls, config_dict, ctx):
        config = CustomCSVSourceConfig.parse_obj(config_dict)
        return cls(config, ctx)

    def get_workunits(self):
        with open(self.config.file_path, mode="r") as file:
            csv_reader = csv.DictReader(file)
            for row in csv_reader:
                dataroot = row["Dataroot"]
                datafolder = row["Datafolder"]
                datafield = row["Datafield"]

                # Create URNs (only the datafield dataset is emitted below;
                # the dataroot/datafolder URNs are available if you also want
                # to emit the parent levels)
                dataroot_urn = make_dataset_urn("custom", dataroot, "PROD")
                datafolder_urn = make_dataset_urn("custom", f"{dataroot}.{datafolder}", "PROD")
                datafield_urn = make_dataset_urn("custom", f"{dataroot}.{datafolder}.{datafield}", "PROD")

                # Create DatasetProperties
                dataset_properties = DatasetPropertiesClass(
                    description=f"Datafield {datafield} in folder {datafolder} under root {dataroot}",
                    customProperties={"source": "CSV"},
                )

                # Create DatasetSnapshot
                dataset_snapshot = DatasetSnapshotClass(
                    urn=datafield_urn,
                    aspects=[dataset_properties],
                )

                # Wrap the MetadataChangeEvent in a work unit so the pipeline can process it
                mce = MetadataChangeEventClass(proposedSnapshot=dataset_snapshot)
                wu = MetadataWorkUnit(id=datafield_urn, mce=mce)
                self.report.report_workunit(wu)
                yield wu

    def get_report(self):
        return self.report

    def close(self):
        pass
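To sanity-check the source locally before wiring it into a recipe, you can instantiate it directly. This is only a sketch under assumptions: a CSV file with a Dataroot,Datafolder,Datafield header at the path shown, and the class from Step 2 importable in the current module.

# Hypothetical local smoke test: iterate work units without running a full pipeline
from datahub.ingestion.api.common import PipelineContext

ctx = PipelineContext(run_id="custom-csv-smoke-test")
source = CustomCSVSource.create({"file_path": "/path/to/your/csvfile.csv"}, ctx)
for wu in source.get_workunits():
    print(wu.metadata.proposedSnapshot.urn)  # one dataset URN per CSV row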
Step 3: Use the Custom Source in an Ingestion Recipe
source:
  type: custom_source.CustomCSVSource
  config:
    file_path: '/path/to/your/csvfile.csv'
sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'
Run the Ingestion (the module defining CustomCSVSource, e.g. custom_source.py, must be importable, for example via your PYTHONPATH, so the CLI can resolve the custom_source.CustomCSVSource type):
datahub ingest -c path/to/your/recipe.yaml
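If you prefer running the recipe from Python rather than the CLI (for example inside a test), DataHub's Pipeline API can be used; the values below are just the same assumed settings from the recipe above.

from datahub.ingestion.run.pipeline import Pipeline

# Programmatic equivalent of `datahub ingest -c recipe.yaml` (assumed recipe values)
pipeline = Pipeline.create(
    {
        "source": {
            "type": "custom_source.CustomCSVSource",
            "config": {"file_path": "/path/to/your/csvfile.csv"},
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }
)
pipeline.run()
pipeline.raise_from_status()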
This setup will allow you to ingest data from a CSV file using a custom source. You can adapt this example to fit your specific data source and requirements.
For more detailed guidance, you can refer to the <https://datahubproject.io/docs/0.13.1/how/add-custom-ingestion-source|DataHub documentation on adding a custom ingestion source>.
Would you like more detailed steps or specific configurations for any of these points?
Sources:
- <https://datahubproject.io/docs/0.13.1/how/add-custom-ingestion-source|DataHub documentation on adding a custom ingestion source>
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @Datahub Community Support if you require a person to take a look.
Hope this helps! ><