Creating an Ingestion Source and Handling Secrets in DataHub API Calls

user-3 · January 13, 2025, 12:04am

<@U06TM7M40RK> Can we create an ingestion source using an API call?

user-1 · January 13, 2025, 12:04am

Yes, you can create an ingestion source using an API call by utilizing the DataHub GraphQL API. Specifically, you can use the createIngestionSource mutation to create a new ingestion source.

Here is an example of how to do this using a GraphQL mutation:

mutation {
  createIngestionSource(input: {
    name: "My Test Ingestion Source",
    type: "mysql",
    description: "My ingestion source description",
    schedule: {interval: "*/5 * * * *", timezone: "UTC"},
    config: {
      recipe: "{\"source\":{\"type\":\"mysql\",\"config\":{\"include_tables\":true,\"database\":null,\"password\":\"${MYSQL_PASSWORD}\",\"profiling\":{\"enabled\":false},\"host_port\":null,\"include_views\":true,\"username\":\"${MYSQL_USERNAME}\"}},\"pipeline_name\":\"urn:li:dataHubIngestionSource:f38bd060-4ea8-459c-8f24-a773286a2927\"}",
      version: "0.8.18",
      executorId: "mytestexecutor"
    }
  })
}

You can send this mutation to the GraphQL endpoint with an HTTP client such as curl or Postman, or with a GraphQL client library in your preferred programming language.
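As a rough illustration of sending the mutation above from a script, here is a minimal Python sketch that uses only the standard library. The endpoint URL, the token value, and the `host_port` value are assumptions for illustration and are not taken from this thread; replace them with the values for your deployment. The helper names `build_recipe`, `build_request`, and `send` are hypothetical.

```python
import json
import urllib.request


def build_recipe() -> str:
    """Serialize a recipe to the JSON string the mutation expects."""
    recipe = {
        "source": {
            "type": "mysql",
            "config": {
                "host_port": "localhost:3306",  # assumption: example value only
                "username": "${MYSQL_USERNAME}",  # secret reference, not a value
                "password": "${MYSQL_PASSWORD}",  # secret reference, not a value
            },
        }
    }
    return json.dumps(recipe)


def build_request(url: str, token: str, name: str, recipe: str) -> urllib.request.Request:
    """Build (but do not send) the HTTP POST carrying the mutation."""
    # json.dumps() quotes and escapes each string so it can be embedded
    # in the mutation text, as the hand-escaped recipe above does.
    mutation = (
        "mutation { createIngestionSource(input: {"
        f"name: {json.dumps(name)}, "
        'type: "mysql", '
        'schedule: {interval: "*/5 * * * *", timezone: "UTC"}, '
        f'config: {{recipe: {json.dumps(recipe)}, version: "0.8.18", '
        'executorId: "mytestexecutor"}'
        "}) }"
    )
    body = json.dumps({"query": mutation}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",  # assumption: token-based auth is enabled
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send(request: urllib.request.Request) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Assumed endpoint; adjust host, port, and path for your deployment.
GRAPHQL_URL = "http://localhost:8080/api/graphql"
```

Calling `send(build_request(GRAPHQL_URL, "<your-token>", "My Test Ingestion Source", build_recipe()))` would post the mutation. Building the recipe as a dictionary and serializing it with `json.dumps` avoids escaping the nested quotes by hand.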

Would you like more detailed steps or additional information on any specific part of the process?

Sources:

<https://github.com/datahub-project/datahub/blob/master/docs/ui-ingestion.md|DataHub UI Ingestion Documentation>

user-3 · January 13, 2025, 12:05am

<@U06TM7M40RK> If I am using Postman from a different machine, will DataHub secrets still work, or will I have to hardcode them in the query?

user-1 · January 13, 2025, 12:05am

When using Postman or any other client from a different machine to interact with DataHub’s API, you should not hardcode sensitive information such as secrets directly in your queries. Instead, you can leverage DataHub’s secret management capabilities to securely handle these values.

According to the DataHub Ingestion documentation, secrets are securely stored within DataHub’s storage layer and can be referenced in your ingestion recipes using variable substitution. This means that you can create secrets in DataHub and reference them in your queries without exposing the actual secret values.

Here’s how you can manage secrets:

- Create Secrets in DataHub: Use the DataHub UI to create secrets. Navigate to the ‘Secrets’ tab and create a new secret by providing a unique name and the value to be encrypted.
- Reference Secrets in Queries: In the recipe you send with your API call, reference these secrets by name, for example ${MYSQL_USERNAME} and ${MYSQL_PASSWORD}. DataHub resolves these references when the ingestion run executes, so it does not matter which machine sends the request.
- Ensure Proper Permissions: Make sure that the user or service account making the API call has the necessary permissions to access these secrets. Only users with the Manage Secrets platform privilege can retrieve plaintext secret values using the GraphQL API.

By following these steps, you can securely manage and reference secrets without hardcoding them in your queries.
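Before sending a recipe from a client machine, it can help to confirm that it contains only secret references and no literal credentials. The following is a minimal sketch of such a check. The helper names and the list of sensitive field names are hypothetical, and the regular expression assumes secret names consist of letters, digits, and underscores.

```python
import json
import re

# Matches a ${NAME} reference. Assumption: names use letters, digits, underscores.
SECRET_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def referenced_secrets(recipe_json: str) -> list[str]:
    """Return the sorted, de-duplicated secret names referenced in a recipe."""
    return sorted(set(SECRET_REF.findall(recipe_json)))


def hardcoded_fields(recipe_json: str, sensitive=("username", "password")) -> list[str]:
    """Return sensitive source-config fields whose value is not a ${NAME} reference."""
    config = json.loads(recipe_json)["source"]["config"]
    return [
        key
        for key in sensitive
        if isinstance(config.get(key), str) and not SECRET_REF.fullmatch(config[key])
    ]


example_recipe = json.dumps(
    {
        "source": {
            "type": "mysql",
            "config": {
                "username": "${MYSQL_USERNAME}",
                "password": "${MYSQL_PASSWORD}",
            },
        }
    }
)

print(referenced_secrets(example_recipe))
print(hardcoded_fields(example_recipe))
```

The names returned by `referenced_secrets` are the ones that need to exist as secrets in DataHub before the ingestion source runs. An empty result from `hardcoded_fields` suggests the checked fields hold references rather than plaintext values.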

Would you like more detailed steps or further assistance on any specific part of the process?

Sources:

Ingestion | DataHub (section on secrets)
<https://github.com/datahub-project/datahub/blob/master/docs/ui-ingestion.md|DataHub UI Ingestion Documentation>

user-3 · January 13, 2025, 12:05am

<@U06TM7M40RK> When should I run a query on the GMS endpoint versus some other endpoint?

user-1 · January 13, 2025, 12:05am

Whether to run a query on the GMS (Generalized Metadata Service) endpoint or another endpoint depends on your use case and on the type of data or metadata you need to access or manipulate.

Use GMS Endpoint:
- If you need to perform CRUD (Create, Read, Update, Delete) operations on metadata entities, the GMS endpoint is appropriate. According to the <https://datahubproject.io/docs/0.14.0/what/gms|DataHub documentation>, GMS provides a Rest.li API for accessing and managing metadata entities and their aspects.
- When you need to interact with the metadata graph, such as querying relationships or lineage, the GMS endpoint is suitable.
Use GraphQL Endpoint:
- If you require a more flexible and efficient querying mechanism, especially for complex queries involving multiple entities, the GraphQL endpoint might be more appropriate. The <https://datahubproject.io/docs/0.14.0/datahub-graphql-core|DataHub GraphQL API> allows for graph-based representation and querying of metadata.
- When you need to perform operations that involve multiple entities or aspects in a single query, GraphQL can be more efficient.
Use OpenAPI Endpoint:
- If you are integrating with external systems that prefer RESTful APIs, you might consider using the OpenAPI endpoints. The <https://github.com/datahub-project/datahub/blob/master/docs/api/openapi/openapi-usage-guide.md|DataHub OpenAPI Guide> provides details on how to use these endpoints for various operations.

When deciding which endpoint to use, consider the specific requirements of your application, such as the complexity of the queries, the need for real-time updates, and the preferred API style (REST vs. GraphQL).

Would you like more detailed steps or further assistance on any specific part of this process?

Sources:

<https://datahubproject.io/docs/0.14.0/what/gms|DataHub documentation>
<https://datahubproject.io/docs/0.14.0/datahub-graphql-core|DataHub GraphQL API>
<https://github.com/datahub-project/datahub/blob/master/docs/api/openapi/openapi-usage-guide.md|DataHub OpenAPI Guide>

datahub_team · January 13, 2025, 12:05am

Hi! Did this interaction resolve your issue, or do you still require assistance?

Please tag @RunLLM if you still need further assistance!

Or tag @Datahub Community Support if you require a person to take a look.

Hope this helps!
