Resolving Issues with EKS Pod Restart and Secret Persistence

Original Slack Thread

Hello everyone, I'm facing an issue where, after the EKS pods restarted, the secrets went unresolved. What's the guideline or existing solution to make the secrets persist? Thank you!!

Hey there! :wave:

<@U06TM7M40RK> Can you help with the questions in this thread?

:information_source: Please continue to tag RunLLM (e.g <@U06TM7M40RK>) to have it continue generating responses!

To address the issue of secrets not persisting across EKS pod restarts, there are several potential causes and solutions to consider:

  1. Secret Persistence Across Deployments:

    • According to <Slack Discussion>, secrets may not persist correctly across pod restarts or redeployments because they are stored in the container rather than in a database, so they can be lost when the pod is redeployed.
    • Solution: Ensure that secrets are properly managed and persisted across deployments. You might need a more robust secret management system integrated with Kubernetes, such as Kubernetes Secrets or an external secret manager like HashiCorp Vault.
  2. Encryption Key Issue:

    • As mentioned in <Slack Thread>, GMS (the Generalized Metadata Service) auto-generates an encryption key used for encrypting secrets. If GMS is redeployed, the encryption key might change, making previously entered secrets inaccessible.
    • Solution: Provide the encryption key yourself by updating the Helm chart, and ensure the key stays consistent across deployments (see the hedged values.yaml sketch below).
  3. Environment Variables:

    • If environment variables are being used for secrets and tokens, ensure they are correctly passed to all relevant pods.
    • Solution: Verify that the environment variables are correctly set in your Helm charts and Kubernetes manifests. Check that these variables are available in all pods involved in the ingestion process.
  4. Kubernetes Secrets:

    • If using Kubernetes Secrets, ensure they are correctly mounted into your pods. You can create a Kubernetes secret and mount it as a volume, or expose it as environment variables, in the relevant pods (a hedged consumption sketch follows the manifest below).
    • Example Kubernetes manifest snippet:
      apiVersion: v1
      kind: Secret
      metadata:
        name: my-secret
      type: Opaque
      data:
        MYSQL_USERNAME: base64-encoded-username
        MYSQL_PASSWORD: base64-encoded-password
      
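    • For reference, here is a hedged sketch of how such a Secret could be consumed by a pod, both as environment variables and as a mounted volume. The Deployment name, image tag, and mount path (datahub-gms-example, /mnt/secrets) are illustrative placeholders, not the actual DataHub chart resources:
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: datahub-gms-example  # illustrative name, not the real chart resource
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: datahub-gms-example
        template:
          metadata:
            labels:
              app: datahub-gms-example
          spec:
            containers:
              - name: gms
                image: linkedin/datahub-gms:v0.13.0  # illustrative image/tag
                # Expose individual keys of the Secret as environment variables
                env:
                  - name: MYSQL_USERNAME
                    valueFrom:
                      secretKeyRef:
                        name: my-secret
                        key: MYSQL_USERNAME
                  - name: MYSQL_PASSWORD
                    valueFrom:
                      secretKeyRef:
                        name: my-secret
                        key: MYSQL_PASSWORD
                # Or mount the whole Secret as read-only files under /mnt/secrets
                volumeMounts:
                  - name: secret-volume
                    mountPath: /mnt/secrets
                    readOnly: true
            volumes:
              - name: secret-volume
                secret:
                  secretName: my-secret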

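As a follow-up to point 2 (the encryption key): the datahub-helm chart exposes a global.datahub.encryptionKey section that can point GMS at a Kubernetes secret you create and manage yourself, so the key does not change when GMS is redeployed. Treat the exact key names below as an assumption and verify them against the values.yaml of your chart version:

global:
  datahub:
    encryptionKey:
      # Reference a pre-created secret so the key survives redeployments
      # (names are assumptions; confirm against your chart's values.yaml)
      secretRef: "datahub-encryption-secrets"
      secretKey: "encryption_key_secret"
      provisionSecret:
        enabled: false  # do not auto-generate the key on deploy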
Would you like more detailed steps or specific configurations for any of these points?

Sources:

  • <Slack Discussion>

<@U06TM7M40RK> For the Kafka ingestion recipe, how can I filter out the internal topics?

To filter out internal Kafka topics during ingestion, you can use the topic_patterns configuration in your Kafka ingestion recipe. This allows you to specify regex patterns to include or exclude specific topics.

Here’s how you can configure your ingestion recipe to exclude internal topics:

  1. Use the topic_patterns.deny Configuration:

    • You can specify a list of regex patterns to exclude internal topics. Internal topics often start with an underscore (_), so you can use a pattern like ^_.* to exclude them.
  2. Example Configuration:

    • Below is an example of how you might configure your Kafka ingestion recipe to exclude internal topics:
source:
  type: "kafka"
  config:
    connection:
      bootstrap: "localhost:9092"
      schema_registry_url: "http://localhost:8081"
    topic_patterns:
      allow:
        - ".*"  # Allow all topics
      deny:
        - "^_.*"  # Deny all topics starting with an underscore (internal topics)

This configuration will allow all topics except those that start with an underscore, effectively filtering out internal topics.
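
Assuming the recipe above is saved as kafka_recipe.yml (an illustrative filename) and extended with a sink section pointing at your DataHub instance, it can be run with the DataHub CLI:

datahub ingest -c kafka_recipe.yml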

If you need further assistance or more detailed guidance, feel free to ask!

<@U06TM7M40RK> How can I remove a dataset that has already been ingested?

To remove a dataset that has already been ingested into DataHub, you can use the DataHub CLI to perform either a soft delete or a hard delete. Here’s how you can do it:

  1. Soft Delete:

    • A soft delete will mark the dataset as removed, but it will still exist in the database. This is useful if you might want to restore the dataset later.
    • Command:
      datahub delete --urn "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)"
      
  2. Hard Delete:

    • A hard delete will permanently remove the dataset and all its metadata from the database.
    • Command:
      datahub delete --urn "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)" --hard
      
  3. Using Queries:

    • If you want to delete datasets based on a query, you can use the --query option. Be cautious with this, as it might select more datasets than intended (see the dry-run tip after this list).
    • Example:
      datahub delete --entity-type dataset --query "_tmp"
      
  4. Using the Python SDK:

    • You can also use the DataHub Python SDK to delete datasets programmatically.
    • Example:
      from datahub.emitter.mce_builder import make_dataset_urn
      from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
      
      graph = DataHubGraph(config=DatahubClientConfig(server="http://localhost:8080"))
      dataset_urn = make_dataset_urn(name="fct_users_created", platform="hive")
      
      # Soft-delete the dataset
      graph.delete_entity(urn=dataset_urn, hard=False)
      

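Tip for the query-based delete in point 3: the DataHub CLI supports a --dry-run flag that previews which entities would be affected without actually deleting anything; verify the flag is available in your CLI version before relying on it:

      datahub delete --entity-type dataset --query "_tmp" --dry-run
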
Would you like more detailed steps or further assistance on any of these methods?

<@U06TM7M40RK> Please provide an example of values.yaml if I need to use extraVolumeMounts to mount an EBS volume for GMS.

To mount an Amazon EBS volume for GMS (the Generalized Metadata Service) in a Kubernetes deployment using Helm, you can specify extraVolumes and extraVolumeMounts in your values.yaml file. Note that the PersistentVolumeClaim (PVC) backing the EBS volume is a separate Kubernetes manifest applied on its own; it does not belong inside values.yaml. Here’s an example configuration:

datahub-gms:
  enabled: true
  image:
    repository: linkedin/datahub-gms
    tag: "v0.13.0"  # Use the appropriate version
    pullPolicy: IfNotPresent

  resources:
    limits:
      memory: 1Gi
    requests:
      cpu: 500m
      memory: 1Gi

  service:
    type: ClusterIP
    port: 8080
    targetPort: http
    protocol: TCP
    name: http

  # Extra volumes and volume mounts referencing the EBS-backed PVC
  extraVolumes:
    - name: ebs-volume
      persistentVolumeClaim:
        claimName: ebs-pvc

  extraVolumeMounts:
    - name: ebs-volume
      mountPath: /mnt/ebs

The PVC for the EBS volume is defined in its own manifest (for example, ebs-pvc.yaml) and applied separately from the Helm release:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi  # Size of the EBS volume
  storageClassName: gp2  # EBS storage class

Explanation:

  • extraVolumes: Defines the additional volumes available to the GMS pod. Here, a volume named ebs-volume references a PersistentVolumeClaim (PVC) named ebs-pvc.
  • extraVolumeMounts: Specifies where the volume is mounted inside the container. In this example, the EBS volume is mounted at /mnt/ebs.
  • PersistentVolumeClaim (PVC): A standalone manifest that requests a 20Gi EBS volume with the gp2 storage class. Ensure this PVC is created in your Kubernetes cluster before upgrading the release.

Steps to Apply:

  1. Create the PVC: Apply the PVC manifest so that ebs-pvc exists in your Kubernetes cluster (see the example command after these steps).
  2. Update values.yaml: Add the extraVolumes and extraVolumeMounts configurations to your values.yaml.
  3. Deploy with Helm: Use the following command to apply the changes:
    helm upgrade --install datahub datahub/datahub --values values.yaml
    
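For step 1, assuming the PVC manifest above is saved as ebs-pvc.yaml (an illustrative filename), it can be applied before the Helm upgrade with:

    kubectl apply -f ebs-pvc.yaml

Once the GMS pod is running, you can confirm the volume is mounted by exec-ing into the pod and checking /mnt/ebs.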

Would you like more detailed steps or further assistance on any of these configurations?

Hi :slight_smile:! Did this interaction resolve your issue, or do you still require assistance?

Please tag @RunLLM if you still need further assistance!

Or tag @Datahub Community Support if you require a person to take a look :slight_smile:

Hope this helps! ><