Hello Team,
While evaluating DataHub, I'm trying to ingest JSON schemas using the CLI. My recipe file is as follows:
pipeline_name: json_schema_ingestion
source:
  type: json-schema
  config:
    path: https://json.schemastore.org/petstore-v1.0.json # e.g. https://json.schemastore.org/petstore-v1.0.json
    platform: SchemaRegistry # e.g. schemaregistry
    # platform_instance:
    stateful_ingestion:
      enabled: true # recommended to have this turned on
When I run the command
datahub ingest -c
I get the error shown in the log file.
Could you please help me figure out the error?
To help you troubleshoot the issue with ingesting JSON schemas using the DataHub CLI, let’s break down the steps and potential issues based on the provided data sources.
Recipe File Configuration
Your recipe file seems to be correctly formatted for ingesting JSON schemas. Here is a quick review of the key components:
pipeline_name: json_schema_ingestion
source:
  type: json-schema
  config:
    path: https://json.schemastore.org/petstore-v1.0.json # e.g. https://json.schemastore.org/petstore-v1.0.json
    platform: SchemaRegistry # e.g. schemaregistry
    # platform_instance: <add a platform_instance if there are multiple schema repositories>
    stateful_ingestion:
      enabled: true # recommended to have this turned on
Common Issues and Solutions
Path Configuration: Ensure that the path field is correctly pointing to a valid JSON schema URL. The URL should be accessible and return a valid JSON schema.
Stateful Ingestion: The stateful_ingestion configuration is optional but recommended. Ensure that it is correctly configured.
Error Log Analysis
The error log you provided seems to be an HTML document rather than a JSON or plain text log. This might indicate that the URL provided in the path field is not returning the expected JSON schema but rather an HTML page. This can happen if the URL is incorrect or if there is a redirection happening.
Steps to Resolve
Verify URL: Ensure that the URL in the path field is correct and accessible. You can test this by opening the URL in a web browser to see if it returns the expected JSON schema.
Check Platform Registration: Ensure that the platform named in the platform field is registered in DataHub. If it is a custom platform, you can register it with the datahub put platform command.
Review Logs: If the error persists, review the DataHub CLI logs for more detailed error messages. Ensure that the logs are in a readable format (JSON or plain text) to help identify the issue.
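The URL check described in the steps above can also be scripted. The following is a minimal sketch using only the Python standard library; the helper names classify_body and check_schema_url are hypothetical and are not part of DataHub. It assumes a JSON schema is served as a JSON object and that an HTML page starts with a doctype or html tag.

```python
import json
import urllib.request


def classify_body(body: str) -> str:
    """Return 'json' for a JSON object, 'html' for an HTML page, else 'other'."""
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        head = body.lstrip().lower()
        if head.startswith("<!doctype html") or head.startswith("<html"):
            return "html"
        return "other"
    # A JSON schema document is expected to be a JSON object at the top level.
    return "json" if isinstance(parsed, dict) else "other"


def check_schema_url(url: str, timeout: float = 10.0) -> str:
    """Fetch the URL (following redirects) and classify what it returns."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    return classify_body(body)


if __name__ == "__main__":
    # URL taken from the recipe above; requires network access.
    print(check_schema_url("https://json.schemastore.org/petstore-v1.0.json"))
```

If this prints html, the path in the recipe is likely resolving to a web page or redirect rather than to the schema itself.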
Example Command
Here is an example command to run the ingestion:
datahub ingest -c <path_to_recipe_file>
Additional Help
If you need more detailed guidance or if the issue persists, please provide the exact error message from the DataHub CLI logs in a readable format.
Would you like more detailed guidance on any specific part of the process?
The error log file was provided in txt format. The error is as follows: {datahub.entrypoints:218} - Command failed: Failed to configure the source (json-schema): Subscripted generics cannot be used with class and instance checks
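For context, the wording of this message matches the TypeError that Python's typing module raises when a subscripted generic such as List[int] is passed to isinstance() or issubclass(). The sketch below reproduces the message outside DataHub; the helper name is hypothetical, and it does not show where in the json-schema source the check occurs. One possible cause, offered as an assumption rather than a diagnosis, is a version mismatch between the DataHub CLI and its Python dependencies, so the output of datahub version and the installed Python version may be worth including in a follow-up.

```python
from typing import List


def reproduce_generic_check_error() -> str:
    """Trigger the TypeError from an isinstance() check and return its message."""
    try:
        # Runtime checks against a subscripted generic are not supported.
        isinstance([], List[int])
    except TypeError as exc:
        return str(exc)
    return ""


if __name__ == "__main__":
    print(reproduce_generic_check_error())
```

Because the exception comes from a type check in Python code rather than from parsing the recipe, the YAML itself may be fine even though source configuration fails.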