DataHub: Ingesting Metadata from APIs through OpenAPI

Original Slack Thread

Hello, sorry for the basic question. Does DataHub allow for the ingestion of metadata from APIs?

Hey there! :wave:

<@U06TM7M40RK> Can you help with the questions in this thread?


Yes, DataHub does allow for the ingestion of metadata from APIs. You can use the OpenAPI connection to ingest metadata into DataHub: define the metadata aspects in JSON format and send a POST request to the /entities endpoint, which upserts them.

Here’s a brief overview of the process:

  1. Define Metadata Aspects: Each metadata aspect, such as SchemaMetadata or DatasetProperties, should be defined in JSON format.
  2. Use the /entities Endpoint: This endpoint is used for both reading and writing to the metadata graph. You can upsert metadata aspects by sending a POST request to this endpoint.
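
A minimal sketch of the two steps above, assuming a local DataHub instance. The host, token, and dataset URN are placeholders that do not come from this thread, and the /openapi/entities/v1/ path and the "__type" field follow the payload shape shown in the OpenAPI usage guide, so check them against the version you run. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

# Assumptions (not from the thread): adjust these to your own deployment.
GMS_URL = "http://localhost:8080"        # hypothetical DataHub GMS address
ENTITIES_PATH = "/openapi/entities/v1/"  # verify against your DataHub version


def build_upsert_request(entity_urn, description, token="<token>"):
    """Build (but do not send) a POST request that upserts one aspect."""
    # Step 1: define the metadata aspect as JSON.
    payload = [
        {
            "entityType": "dataset",
            "entityUrn": entity_urn,
            "aspect": {
                "__type": "DatasetProperties",
                "description": description,
            },
        }
    ]
    # Step 2: wrap it in a POST request to the /entities endpoint.
    return urllib.request.Request(
        url=GMS_URL + ENTITIES_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_upsert_request(
    "urn:li:dataset:(urn:li:dataPlatform:hive,example_table,PROD)",  # placeholder URN
    "Example description set through the OpenAPI endpoint",
)
print(request.get_method(), request.full_url)
```

To actually send it, pass the request to urllib.request.urlopen(request) once the host and token point at a real instance.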

For more detailed information, you can refer to the <https://datahubproject.io/docs/0.13.1/api/openapi/openapi-usage-guide|DataHub OpenAPI Guide> and the <https://github.com/datahub-project/datahub/blob/master/docs/api/openapi/openapi-usage-guide.md|OpenAPI usage examples>.

Would you like more detailed steps or additional examples on any specific part of this process?


<@U06TM7M40RK> So, to be clear: if we have a lot of APIs in our company, can I use the OpenAPI connection to connect to each of them, and will each API then be catalogued?

You can use the OpenAPI connection to bring API metadata into DataHub. However, the OpenAPI ingestion is typically used to ingest metadata about the APIs themselves, rather than to catalog the APIs directly as entities. Here’s how you can approach this:

  1. Ingest Metadata: You can ingest metadata about your APIs using the OpenAPI /entities endpoint. This involves defining metadata aspects such as SchemaMetadata or DatasetProperties in JSON format and sending them to DataHub.

  2. Cataloging APIs: While you can ingest metadata about APIs, DataHub does not inherently treat APIs as first-class entities like datasets or dashboards. You might need to define custom metadata models if you want to represent APIs as entities within DataHub.

  3. Use Cases: If your goal is to document and manage metadata about the APIs (e.g., endpoints, parameters, responses), the OpenAPI ingestion can be useful. However, if you want to track APIs as entities with lineage and relationships, you might need to extend DataHub’s metadata model.
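
One way to picture the approach above, as a rough sketch rather than an official DataHub feature: since APIs are not first-class entities, each endpoint is represented here as a dataset with a DatasetProperties aspect. The platform name "api", the URN naming scheme, the customProperties keys, and the sample spec are all made up for illustration. The "__type" field follows the payload shape in the OpenAPI usage guide and should be checked against your version.

```python
import json

# HTTP method keys that can appear under a path in an OpenAPI spec.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def endpoints_to_aspects(spec, platform="api", env="PROD"):
    """Return one dataset-shaped payload entry per (path, method) in the spec."""
    entries = []
    for path, operations in spec.get("paths", {}).items():
        for method, operation in operations.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            # Hypothetical naming scheme: <api title>.<METHOD><path>
            name = f"{spec['info']['title']}.{method.upper()}{path}"
            entries.append(
                {
                    "entityType": "dataset",
                    "entityUrn": f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})",
                    "aspect": {
                        "__type": "DatasetProperties",
                        "description": operation.get("summary", ""),
                        "customProperties": {"httpMethod": method.upper(), "path": path},
                    },
                }
            )
    return entries


# Tiny made-up spec standing in for one of your company's APIs.
spec = {
    "info": {"title": "orders"},
    "paths": {
        "/orders": {
            "get": {"summary": "List orders"},
            "post": {"summary": "Create an order"},
        },
    },
}
entries = endpoints_to_aspects(spec)
print(json.dumps(entries[0], indent=2))
```

The resulting list has the shape of a request body for the /entities endpoint. Tracking lineage or relationships between APIs would still need a custom metadata model, as noted in item 2.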

For more detailed guidance, you can refer to the <https://datahubproject.io/docs/0.14.0/api/openapi/openapi-usage-guide|DataHub OpenAPI Guide> and consider defining <https://datahubproject.io/docs/metadata-models-custom/|custom metadata models>.

Would you like more detailed steps or additional examples on any specific part of this process?


Hi :slight_smile:! Did this interaction resolve your issue, or do you still require assistance?

Please tag @RunLLM if you still need further assistance!

Or tag @Datahub Community Support if you require a person to take a look :slight_smile:

Hope this helps! ><