Hello everyone, I'm facing an issue where, after the EKS pods restarted, the secrets went unresolved. What's the guideline or existing solution to make the secrets persist? Thank you!!
Hey there! 
<@U06TM7M40RK> Can you help with the questions in this thread?
Please continue to tag RunLLM (e.g. <@U06TM7M40RK>) to have it continue generating responses!
To address the issue of secrets not persisting across EKS pod restarts, there are several potential causes and solutions to consider:
- Secret Persistence Across Deployments:
  - According to <Slack Discussion>, secrets might not persist correctly across pod restarts or redeployments. This is because secrets are typically stored in the container and not in a database, which can lead to loss upon redeployment.
  - Solution: Ensure that secrets are properly managed and persisted across deployments. You might need to use a more robust secret management system integrated with Kubernetes, such as Kubernetes Secrets or an external secret manager like HashiCorp Vault.
- Encryption Key Issue:
  - As mentioned in <Slack Thread>, GMS (Generalized Metadata Service) auto-generates an encryption key used for encrypting secrets. If GMS is redeployed, the encryption key might change, causing previously input secrets to become inaccessible.
  - Solution: Provide the encryption key yourself by updating the Helm chart, and keep it consistent across deployments (see the values.yaml sketch below this list).
- Environment Variables:
  - If environment variables are being used for secrets and tokens, ensure they are correctly passed to all relevant pods.
  - Solution: Verify that the environment variables are correctly set in your Helm charts and Kubernetes manifests, and check that these variables are available in all pods involved in the ingestion process (see the extraEnvs sketch below this list).
- Kubernetes Secrets:
  - If using Kubernetes Secrets, ensure they are correctly mounted into your pods. You can create a Kubernetes Secret and mount it as a volume to the relevant pods.
  - Example Kubernetes manifest snippet:
apiVersion: v1
kind: Secret
metadata:
  name: my-secret
type: Opaque
data:
  MYSQL_USERNAME: base64-encoded-username
  MYSQL_PASSWORD: base64-encoded-password
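For the encryption key issue, one option is to pin the key through the Helm chart rather than letting GMS auto-generate it. A minimal values.yaml sketch, assuming a datahub Helm chart that exposes the key under global.datahub.encryptionKey and a pre-created Kubernetes Secret named datahub-encryption-secrets (the key names and the Secret name are assumptions; check the values.yaml of your chart version):
# Sketch: point GMS at a pre-created Secret holding the encryption key,
# so redeployments do not regenerate it.
global:
  datahub:
    encryptionKey:
      secretRef: "datahub-encryption-secrets"   # assumed name of the pre-created Secret
      secretKey: "encryption_key_secret"        # assumed key within that Secret
      provisionSecret:
        enabled: false   # assumed flag: disable chart-side auto-generation if your chart version supports it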
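For the environment-variable and Kubernetes Secrets points, here is a sketch of wiring the my-secret values above into a component's environment via the chart's extraEnvs hook (the component and variable names are illustrative, and the extraEnvs hook is assumed to be available for the component you are configuring):
datahub-gms:
  extraEnvs:
    - name: MYSQL_USERNAME
      valueFrom:
        secretKeyRef:
          name: my-secret        # the Secret created above
          key: MYSQL_USERNAME
    - name: MYSQL_PASSWORD
      valueFrom:
        secretKeyRef:
          name: my-secret
          key: MYSQL_PASSWORD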
Would you like more detailed steps or specific configurations for any of these points?
Sources:
- <Slack Discussion>
<@U06TM7M40RK> for a Kafka ingestion recipe, how can I filter out the internal topics?
To filter out internal Kafka topics during ingestion, you can use the topic_patterns configuration in your Kafka ingestion recipe. This allows you to specify regex patterns to include or exclude specific topics.
Here's how you can configure your ingestion recipe to exclude internal topics:
- Use the topic_patterns.deny configuration:
  - You can specify a list of regex patterns to exclude internal topics. Internal topics often start with an underscore (_), so you can use a pattern like ^_.* to exclude them.
- Example Configuration:
  - Below is an example of how you might configure your Kafka ingestion recipe to exclude internal topics:
source:
  type: "kafka"
  config:
    connection:
      bootstrap: "localhost:9092"
      schema_registry_url: "http://localhost:8081"
    topic_patterns:
      allow:
        - ".*"   # Allow all topics
      deny:
        - "^_.*" # Deny all topics starting with an underscore (internal topics)
This configuration will allow all topics except those that start with an underscore, effectively filtering out internal topics.
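If you run this recipe with the DataHub CLI rather than through the UI, you will typically also want a sink section; a minimal sketch, assuming a locally reachable GMS endpoint (the server URL is a placeholder):
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"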
If you need further assistance or more detailed guidance, feel free to ask!
Sources:
- <https://datahubproject.io/docs/0.13.1/generated/ingestion/sources/kafka#config-details|Kafka Ingestion Configuration>
- <https://github.com/datahub-project/datahub/blob/master/datahub-web-react/src/app/ingest/source/builder/RecipeForm/kafka.ts|GitHub Kafka Configuration>
<@U06TM7M40RK> how to remove a dataset that has already been ingested?
To remove a dataset that has already been ingested into DataHub, you can use the DataHub CLI to perform either a soft delete or a hard delete. Here’s how you can do it:
- Soft Delete:
  - A soft delete will mark the dataset as removed, but it will still exist in the database. This is useful if you might want to restore the dataset later.
  - Command:
datahub delete --urn "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)"
- Hard Delete:
  - A hard delete will permanently remove the dataset and all its metadata from the database.
  - Command:
datahub delete --urn "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)" --hard
- Using Queries:
  - If you want to delete datasets based on a query, you can use the --query option. Be cautious with this, as it might select more datasets than intended.
  - Example:
datahub delete --entity-type dataset --query "_tmp"
- Using the Python SDK:
  - You can also use the DataHub Python SDK to delete datasets programmatically.
  - Example:
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

graph = DataHubGraph(config=DatahubClientConfig(server="http://localhost:8080"))
dataset_urn = make_dataset_urn(name="fct_users_created", platform="hive")

# Soft-delete the dataset
graph.delete_entity(urn=dataset_urn, hard=False)
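As an alternative to deleting entities by hand: if the datasets you want to remove simply no longer exist at the source, stateful ingestion can soft-delete them automatically on the next run. A sketch, assuming a source that supports stateful ingestion and a reasonably recent CLI (the pipeline name and connection details are placeholders; verify support for your source in the docs):
pipeline_name: "my_kafka_ingestion"  # stateful ingestion needs a stable pipeline name (placeholder)
source:
  type: "kafka"
  config:
    connection:
      bootstrap: "localhost:9092"
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: true  # soft-delete entities missing from the latest run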
Would you like more detailed steps or further assistance on any of these methods?
Sources:
- <https://github.com/datahub-project/datahub/blob/master/docs/how/delete-metadata.md|DataHub CLI Delete Examples>
- <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/delete_dataset.py|DataHub Python SDK Example>
<@U06TM7M40RK> please provide an example of values.yaml if I need to use extraVolumeMounts to mount EBS for GMS
To mount an Amazon EBS volume for GMS (Generalized Metadata Service) in a Kubernetes deployment using Helm, you can specify extraVolumes and extraVolumeMounts in your values.yaml file. Here's an example configuration:
datahub-gms:
  enabled: true
  image:
    repository: linkedin/datahub-gms
    tag: "v0.13.0"  # Use the appropriate version
    pullPolicy: IfNotPresent
  resources:
    limits:
      memory: 1Gi
    requests:
      cpu: 500m
      memory: 1Gi
  service:
    type: ClusterIP
    port: 8080
    targetPort: http
    protocol: TCP
    name: http
  # Specify the extra volumes and volume mounts
  extraVolumes:
    - name: ebs-volume
      persistentVolumeClaim:
        claimName: ebs-pvc
  extraVolumeMounts:
    - name: ebs-volume
      mountPath: /mnt/ebs
  # Define the PersistentVolumeClaim (PVC) for the EBS volume
  persistence:
    enabled: true
    size: 20Gi        # Specify the size of the EBS volume
    storageClass: gp2 # Specify the storage class for EBS
# Example PVC definition (a separate Kubernetes manifest, applied outside of values.yaml)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: gp2
Explanation:
- extraVolumes: defines the additional volumes to attach to the pod. Here we specify a volume named ebs-volume that uses a PersistentVolumeClaim (PVC) named ebs-pvc.
- extraVolumeMounts: specifies where the volume should be mounted inside the container. In this example, the EBS volume is mounted at /mnt/ebs.
- PersistentVolumeClaim (PVC): an example PVC definition that requests a 20Gi EBS volume with the gp2 storage class. You need to ensure that this PVC is created in your Kubernetes cluster.
Steps to Apply:
- Create the PVC: ensure that the PVC (ebs-pvc) is created in your Kubernetes cluster.
- Update values.yaml: add the extraVolumes and extraVolumeMounts configurations to your values.yaml.
- Deploy with Helm: use the following command to apply the changes:
helm upgrade --install datahub datahub/datahub --values values.yaml
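If your cluster does not already have a matching StorageClass (EKS clusters usually ship a default gp2 class), below is a sketch of one backed by the EBS CSI driver; the name and parameters are illustrative, and the aws-ebs-csi-driver must be installed in the cluster:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2              # must match storageClassName in the PVC above
provisioner: ebs.csi.aws.com
parameters:
  type: gp2              # EBS volume type (gp2/gp3)
volumeBindingMode: WaitForFirstConsumer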
Would you like more detailed steps or further assistance on any of these configurations?
Sources:
- <https://datahubproject.io/docs/0.13.1/metadata-models-custom#deploying-to-a-remote-kubernetes-server|Deploying to a Remote Kubernetes Server (v0.13.1)>
- <https://datahubproject.io/docs/0.14.0/metadata-models-custom#deploying-to-a-remote-kubernetes-server|Deploying to a Remote Kubernetes Server (v0.14.0)>
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @Datahub Community Support if you require a person to take a look.
Hope this helps! ><