I'm ingesting JSON schemas through the DataHub CLI. The run reports success, but when I view the result in the DataHub UI I can see only the raw data without the structure. Also, no lineage is visible.
This is my recipe YAML. All schemas are available in the Entity folder, including the referenced schemas.
pipeline_name: json_schema_ingestion
source:
  type: json-schema
  config:
    path: "C:/datahub/Entity/"
    use_id_as_base_uri: true
    platform: TMFSchemaRegistry # e.g. schemaregistry
    # platform_instance: <add a platform_instance if there are multiple schema repositories>
    stateful_ingestion:
      enabled: false # recommended to have this turned on
To address the issue of seeing only raw data without structure and no lineage in the DataHub UI after ingesting JSON schemas, let’s consider a few potential areas to investigate:
Schema Structure and References: Ensure that the JSON schemas are correctly structured and that all references are properly resolved. According to the <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/json-schema|JSON Schemas documentation>, the ingestion process should handle reference resolution based on the $ref keyword. Verify that all referenced schemas are accessible and correctly linked.
Stateful Ingestion: Your recipe has stateful_ingestion.enabled set to false. This is not directly related to whether the structure is visible; stateful ingestion mainly helps manage the lifecycle of entities across runs. You can set enabled to true, as the comment in the recipe recommends, and see whether it changes how the entities are reflected in the UI, but it is not expected to fix the missing structure on its own.
Platform Configuration: Ensure that platform and platform_instance (if applicable) are correctly configured. The platform is the name the schemas are filed under in DataHub (TMFSchemaRegistry in your recipe), and platform_instance is only needed when there are multiple schema repositories.
DataHub Version and Compatibility: Make sure that the DataHub version you are using is compatible with the features you expect. The <https://datahubproject.io/docs/0.14.0/metadata-ingestion/cli-ingestion|CLI Ingestion documentation> provides guidance on setting up and running ingestion pipelines.
Debugging and Logs: Run the ingestion process with debug mode enabled to capture detailed logs. This can provide insights into any issues during the ingestion process that might affect the visibility of the structure and lineage.
UI Configuration: Check if there are any UI settings or filters that might be affecting the display of the structure and lineage.
Would you like more detailed guidance on any of these steps or further assistance with troubleshooting this issue?
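As a rough way to check the "all referenced schemas are accessible" point before re-running the ingestion, here is a minimal Python sketch that walks every *.json file in a folder and reports each file-relative $ref whose target file is not on disk. It uses only the standard library and does not reproduce DataHub's own reference resolver; the function names iter_refs and missing_ref_files are made up for illustration.

```python
import json
from pathlib import Path
from urllib.parse import urldefrag


def iter_refs(node):
    """Yield every "$ref" string found anywhere inside a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def missing_ref_files(schema_dir):
    """Return (schema file name, $ref) pairs whose target file is not on disk."""
    problems = []
    for schema_file in sorted(Path(schema_dir).glob("*.json")):
        document = json.loads(schema_file.read_text(encoding="utf-8"))
        for ref in iter_refs(document):
            file_part, _fragment = urldefrag(ref)
            if not file_part:
                continue  # refs like "#/definitions/X" stay inside the same file
            if "://" in file_part:
                continue  # remote refs are out of scope for this sketch
            if not (schema_file.parent / file_part).exists():
                problems.append((schema_file.name, ref))
    return problems
```

Calling missing_ref_files("C:/datahub/Entity/") should return an empty list when every referenced file is present. On a case-insensitive file system such as the Windows default, this check will not catch a file name that differs from the $ref only in letter case.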
<@U06TM7M40RK> The json schema is as follows
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "HourPeriod.schema.json",
  "title": "HourPeriod",
  "definitions": {
    "HourPeriod": {
      "$id": "#HourPeriod",
      "type": "object",
      "description": "Hour interval",
      "properties": {
        "endHour": {
          "type": "string",
          "description": "The time when the status ends applying"
        },
        "startHour": {
          "type": "string",
          "description": "The time when the status starts applying"
        }
      },
      "allOf": [
        {
          "$ref": "./Extensible.schema.json#Extensible"
        }
      ]
    }
  }
}
The JSON schema you provided references another schema (Extensible.schema.json) through $ref. Here are a few things to check and consider:
Reference Resolution: Ensure that the referenced schema (Extensible.schema.json) is located in the same directory as HourPeriod.schema.json or in a path that is accessible based on your configuration. The $ref keyword is used to include definitions from other schemas, and the path must be correct for the reference to be resolved.
Schema Structure: The structure of your JSON schema seems correct, with a $schema identifier, $id, and a definitions section. Make sure that the Extensible.schema.json is also correctly structured and accessible.
Ingestion Configuration: Verify that your ingestion configuration in the recipe YAML file is set up to handle references. The use_id_as_base_uri: true setting should help with resolving references based on the $id property.
Debugging: If the structure is still not visible in the DataHub UI, consider enabling debug mode during ingestion to capture detailed logs. This can help identify any issues with reference resolution or schema parsing.
DataHub Version: Ensure that you are using a version of DataHub that supports the features you are using, such as JSON schema ingestion and reference resolution.
If you need more detailed guidance on any of these steps or further assistance with troubleshooting, please let me know!
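To make the reference-resolution check more concrete, the sketch below tests whether the fragment part of a file-relative $ref can actually be found in the target file. It assumes draft-07 conventions: a plain-name fragment such as #Extensible must match an "$id": "#Extensible" declared somewhere in the target, while a fragment starting with / is treated as a JSON Pointer. The helper names are hypothetical, array indices in pointers are not handled, and this is not how DataHub itself resolves references.

```python
import json
from pathlib import Path
from urllib.parse import urldefrag


def declared_anchors(node):
    """Collect plain-name fragments declared through "$id": "#Name" entries."""
    anchors = set()
    if isinstance(node, dict):
        identifier = node.get("$id")
        if isinstance(identifier, str) and "#" in identifier:
            name = identifier.split("#", 1)[1]
            if name:
                anchors.add(name)
        for value in node.values():
            anchors |= declared_anchors(value)
    elif isinstance(node, list):
        for item in node:
            anchors |= declared_anchors(item)
    return anchors


def pointer_exists(document, pointer):
    """Follow a JSON Pointer such as "/definitions/Extensible" through objects."""
    node = document
    for token in pointer.lstrip("/").split("/"):
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(node, dict) and token in node:
            node = node[token]
        else:
            return False
    return True


def fragment_resolves(schema_path, ref):
    """Check that the fragment of a file-relative $ref exists in its target file."""
    file_part, fragment = urldefrag(ref)
    schema_path = Path(schema_path)
    target = schema_path.parent / file_part if file_part else schema_path
    document = json.loads(target.read_text(encoding="utf-8"))
    if not fragment:
        return True  # the ref points at the whole document
    if fragment.startswith("/"):
        return pointer_exists(document, fragment)
    return fragment in declared_anchors(document)
```

For the schema in this thread, fragment_resolves("C:/datahub/Entity/HourPeriod.schema.json", "./Extensible.schema.json#Extensible") would be the call to try; a False result would suggest that Extensible.schema.json does not declare the #Extensible identifier.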
Given that both your HourPeriod.schema.json and Extensible.schema.json are in the same folder, and you are using acryl-datahub version 0.14.1, here are a few additional steps and considerations to ensure proper ingestion and visibility in the DataHub UI:
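Check Reference Path: Ensure that the reference path in your HourPeriod.schema.json is correct. The reference "$ref": "./Extensible.schema.json#Extensible" should point to the Extensible definition within Extensible.schema.json. Because the fragment #Extensible is a plain name rather than a JSON Pointer, it resolves only if Extensible.schema.json declares a matching "$id": "#Extensible", in the same way HourPeriod.schema.json declares "$id": "#HourPeriod". Double-check the spelling and letter case of both the file name and the fragment.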
Schema Validation: Validate both JSON schemas using a JSON schema validator to ensure there are no syntax errors or issues with the schema definitions.
Ingestion Logs: Run the ingestion process with increased verbosity or debug mode to capture detailed logs. This can help identify any issues with reference resolution or schema parsing. Look for any warnings or errors related to schema references.
DataHub UI Settings: After ingestion, ensure that there are no UI settings or filters that might be affecting the display of the schema structure and lineage.
Stateful Ingestion: Your recipe has stateful_ingestion.enabled set to false. Stateful ingestion helps manage the lifecycle of entities across runs, so you can set enabled to true and see whether it changes how the entities are reflected in the UI, but it is not directly related to whether the structure is visible.
Platform and Instance Configuration: Verify that the platform and platform_instance (if applicable) are correctly configured in your recipe. This ensures that the schemas are associated with the correct environment.
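For the schema validation step, a first pass can be done with the Python standard library alone. The sketch below only checks that each file in the folder is well-formed JSON (it would, for example, flag curly quotes pasted in place of straight double quotes); it does not validate the files against the draft-07 meta-schema, which needs a dedicated JSON Schema validator. The function name json_syntax_errors is made up for this example.

```python
import json
from pathlib import Path


def json_syntax_errors(schema_dir):
    """Return {file name: error message} for files that are not well-formed JSON."""
    errors = {}
    for schema_file in sorted(Path(schema_dir).glob("*.json")):
        try:
            json.loads(schema_file.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            # str(exc) includes the line and column for JSON decode errors.
            errors[schema_file.name] = str(exc)
    return errors
```

An empty dictionary from json_syntax_errors("C:/datahub/Entity/") means every file parsed; any entry names the file and the position of the first syntax problem.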
To run the DataHub ingest command in debug mode, you can use the --debug option, which enables detailed logging that can help you diagnose issues during the ingestion process. The option goes before the ingest subcommand, for example:
datahub --debug ingest -c your_ingestion_recipe.yml
This command executes the ingestion process using the specified recipe file (your_ingestion_recipe.yml) and prints detailed debug output to the console.
If you need further assistance or more detailed steps, feel free to ask!
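Debug output can be long, so it may help to save it to a file and skim only the relevant lines. The following Python sketch does that under two assumptions: that --debug is accepted as a global option before the ingest subcommand, and that lines of interest contain the words WARNING or ERROR or the text $ref. The exact log format is not specified in this thread, so the keyword list and all function names here are guesses to adjust.

```python
import subprocess


def build_debug_command(recipe_path):
    """Build the argument list for a debug-mode ingestion run."""
    # Assumption: --debug is a global option placed before the "ingest" subcommand.
    return ["datahub", "--debug", "ingest", "-c", str(recipe_path)]


def run_and_save_log(recipe_path, log_path):
    """Run the ingestion and write combined stdout and stderr to log_path."""
    with open(log_path, "w", encoding="utf-8") as log_file:
        completed = subprocess.run(
            build_debug_command(recipe_path),
            stdout=log_file,
            stderr=subprocess.STDOUT,
            check=False,  # inspect the log even when the run fails
        )
    return completed.returncode


def lines_mentioning(log_text, keywords=("WARNING", "ERROR", "$ref")):
    """Return the log lines that contain at least one of the keywords."""
    return [
        line
        for line in log_text.splitlines()
        if any(keyword in line for keyword in keywords)
    ]
```

A typical use would be to call run_and_save_log("your_ingestion_recipe.yml", "ingest_debug.log"), read the file back, and pass its text to lines_mentioning to narrow down messages about reference resolution or schema parsing.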