Creating an Ingestion Source and Handling Secrets in DataHub API Calls

Original Slack Thread

<@U06TM7M40RK> Can we create an ingestion source using an API call?

Yes, you can create an ingestion source using an API call by utilizing the DataHub GraphQL API. Specifically, you can use the createIngestionSource mutation to create a new ingestion source.

Here is an example of how to do this using a GraphQL mutation:

mutation {
  createIngestionSource(input: {
    name: "My Test Ingestion Source",
    type: "mysql",
    description: "My ingestion source description",
    schedule: {interval: "*/5 * * * *", timezone: "UTC"},
    config: {
      recipe: "{\"source\":{\"type\":\"mysql\",\"config\":{\"include_tables\":true,\"database\":null,\"password\":\"${MYSQL_PASSWORD}\",\"profiling\":{\"enabled\":false},\"host_port\":null,\"include_views\":true,\"username\":\"${MYSQL_USERNAME}\"}},\"pipeline_name\":\"urn:li:dataHubIngestionSource:f38bd060-4ea8-459c-8f24-a773286a2927\"}",
      version: "0.8.18",
      executorId: "mytestexecutor"
    }
  })
}

You can send this mutation as an HTTP POST request to the GraphQL endpoint (/api/graphql) with any HTTP client, such as curl or Postman, or with a GraphQL client library in your preferred programming language.
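As a rough sketch of what such a client does, the Python snippet below wraps a GraphQL operation in an HTTP POST request using only the standard library. The host and port in GRAPHQL_URL, the ACCESS_TOKEN placeholder, and the helper names build_graphql_request and send are assumptions made for illustration and are not part of DataHub. The `{ __typename }` query is a generic GraphQL connectivity check, not a DataHub-specific operation.

```python
import json
import urllib.request

# Assumed values for illustration only; replace with your own deployment details.
GRAPHQL_URL = "http://localhost:8080/api/graphql"  # hypothetical host and port
ACCESS_TOKEN = "<personal-access-token>"           # placeholder, not a real token


def build_graphql_request(url, token, query, variables=None):
    """Build (but do not send) an HTTP POST request carrying a GraphQL operation."""
    payload = {"query": query, "variables": variables or {}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# A trivial query that any GraphQL server can answer; swap in the
# createIngestionSource mutation once connectivity is confirmed.
request = build_graphql_request(GRAPHQL_URL, ACCESS_TOKEN, "query { __typename }")
# result = send(request)  # uncomment once the URL and token point at a real instance
```

Keeping request construction separate from sending makes it easy to inspect the exact body and headers before anything leaves your machine, which is also what Postman shows you in its request preview.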

Would you like more detailed steps or additional information on any specific part of the process?


<@U06TM7M40RK> If I am using Postman from a different machine, will DataHub secrets work, or will I have to hardcode them in the query?

When using Postman or any other client from a different machine to interact with DataHub’s API, you should not hardcode sensitive information such as secrets directly in your queries. Instead, you can leverage DataHub’s secret management capabilities to securely handle these values.

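According to the DataHub Ingestion documentation, secrets are securely stored within DataHub’s storage layer and can be referenced in your ingestion recipes using variable substitution. This means that you can create secrets in DataHub and reference them in your queries without exposing the actual secret values. Because the substitution happens on the DataHub side when the ingestion runs, not in the client, the references work regardless of which machine sends the request.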

Here’s how you can manage secrets:

  1. Create Secrets in DataHub: Use the DataHub UI to create secrets. Navigate to the ‘Secrets’ tab and create a new secret by providing a unique name and the value to be encrypted.

  2. Reference Secrets in Queries: In your ingestion recipes or API calls, reference these secrets using their names. For example, use ${MYSQL_USERNAME} and ${MYSQL_PASSWORD} in your configuration.

  3. Ensure Proper Permissions: Make sure that the user or service account making the API call has the necessary permissions to access these secrets. Only users with the Manage Secrets platform privilege can retrieve plaintext secret values using the GraphQL API.

By following these steps, you can securely manage and reference secrets without hardcoding them in your queries.
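To make step 2 concrete, here is a minimal Python sketch that builds the recipe with secret references instead of plaintext values and packages it as GraphQL variables, so the recipe string does not need hand-written escaping. The helper names build_recipe and build_variables are hypothetical. The input type name UpdateIngestionSourceInput is an assumption that should be checked against your DataHub version's schema. The field values mirror the mutation shown earlier in this thread, except that version is left out on the assumption that it is optional.

```python
import json

# GraphQL document that takes the whole input as a variable.
# UpdateIngestionSourceInput is an assumed type name; verify it in your schema.
CREATE_SOURCE = """
mutation createSource($input: UpdateIngestionSourceInput!) {
  createIngestionSource(input: $input)
}
"""


def build_recipe(username_secret, password_secret):
    """Return the recipe as a JSON string that references secrets by name only."""
    recipe = {
        "source": {
            "type": "mysql",
            "config": {
                # Placeholders such as ${MYSQL_PASSWORD}; no plaintext values here.
                "username": "${" + username_secret + "}",
                "password": "${" + password_secret + "}",
                "include_tables": True,
                "include_views": True,
                "profiling": {"enabled": False},
            },
        }
    }
    return json.dumps(recipe)


def build_variables(name, recipe_json, executor_id="mytestexecutor"):
    """Assemble the variables object sent alongside CREATE_SOURCE."""
    return {
        "input": {
            "name": name,
            "type": "mysql",
            "description": "My ingestion source description",
            "schedule": {"interval": "*/5 * * * *", "timezone": "UTC"},
            "config": {"recipe": recipe_json, "executorId": executor_id},
        }
    }


variables = build_variables(
    "My Test Ingestion Source",
    build_recipe("MYSQL_USERNAME", "MYSQL_PASSWORD"),
)
# This is the JSON body a client would POST to the GraphQL endpoint.
request_body = json.dumps({"query": CREATE_SOURCE, "variables": variables})
```

In Postman, the same split applies: paste the mutation into the query pane and the variables object into the GraphQL variables pane. Only the secret names ever appear in the request.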

Would you like more detailed steps or further assistance on any specific part of the process?


<@U06TM7M40RK> When should I run a query on the GMS endpoint or on some other endpoint?

Whether to run a query on the GMS (Generalized Metadata Service) endpoint or on another endpoint depends on your use case and on the type of data or metadata you need to access or manipulate.

  1. Use GMS Endpoint:

    • If you need to perform CRUD (Create, Read, Update, Delete) operations on metadata entities, the GMS endpoint is appropriate. According to the <https://datahubproject.io/docs/0.14.0/what/gms|DataHub documentation>, GMS provides a Rest.li API for accessing and managing metadata entities and their aspects.
    • When you need to interact with the metadata graph, such as querying relationships or lineage, the GMS endpoint is suitable.
  2. Use GraphQL Endpoint:

    • If you require a more flexible and efficient querying mechanism, especially for complex queries involving multiple entities, the GraphQL endpoint might be more appropriate. The <https://datahubproject.io/docs/0.14.0/datahub-graphql-core|DataHub GraphQL API> allows for graph-based representation and querying of metadata.
    • When you need to perform operations that involve multiple entities or aspects in a single query, GraphQL can be more efficient.
  3. Use OpenAPI Endpoint:

It might be helpful to consider the specific requirements of your application, such as the complexity of the queries, the need for real-time updates, and the preferred API style (REST vs. GraphQL), when deciding which endpoint to use.

Would you like more detailed steps or further assistance on any specific part of this process?

