<@U06TM7M40RK> <@U07FV3UHE4D> @here ,
I want to write an ingestion script for my data sources, AWS Redshift and Glue, and enable the stats field for the selected source. Enabling it for just 1-2 tables would also work. I tried 2-3 approaches, but none of them enabled it. Please help me.
Thanks in advance.
To enable the stats field for specific tables in AWS Redshift and Glue, you need to ensure that profiling is correctly configured in your ingestion recipe. Here are the steps you can follow:
- Redshift Profiling Configuration:
  - According to the <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/redshift|Redshift documentation>, profiling can be enabled by setting the `profiling` configuration in your ingestion recipe. You can set `profile_table_level_only` to true if you want to limit profiling to table-level statistics without reading the data.
  Example configuration for Redshift:
    source:
      type: redshift
      config:
        host_port: "your-redshift-cluster:5439"
        database: "your_database"
        username: "your_username"
        password: "your_password"
        profiling:
          enabled: true
          profile_table_level_only: true
        table_pattern:
          allow:
            - "your_table_name"  # Specify the table you want to profile
- Glue Profiling Configuration:
  - For AWS Glue, ensure that profiling is enabled in the configuration. According to the <https://github.com/datahub-project/datahub/pull/10858|GitHub pull request>, you can enable profiling by setting the `enabled` flag in the `GlueProfilingConfig`.
  Example configuration for Glue:
    source:
      type: glue
      config:
        aws_region: "your-region"
        aws_access_key_id: "your-access-key-id"
        aws_secret_access_key: "your-secret-access-key"
        profiling:
          enabled: true
          profile_table_level_only: true
        table_pattern:
          allow:
            - "your_table_name"  # Specify the table you want to profile
- Run the Ingestion:
  - Use the DataHub CLI to run the ingestion with the configured recipe. Ensure that the CLI is installed and configured correctly.
    datahub ingest -c your_recipe.yaml
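Since the question asks for an ingestion script rather than a CLI run, the same recipe can also be driven from Python. This is a minimal sketch, assuming the `acryl-datahub` package with the Redshift plugin is installed and that a DataHub instance is reachable at the hypothetical address `http://localhost:8080`. `build_redshift_recipe` and `run_recipe` are illustrative helper names, and all connection values are placeholders:

```python
from typing import Any, Dict, List


def build_redshift_recipe(tables: List[str]) -> Dict[str, Any]:
    """Build a recipe dict mirroring the Redshift example configuration."""
    return {
        "source": {
            "type": "redshift",
            "config": {
                "host_port": "your-redshift-cluster:5439",
                "database": "your_database",
                "username": "your_username",
                "password": "your_password",
                # Table-level profiling only, as in the example recipe.
                "profiling": {"enabled": True, "profile_table_level_only": True},
                # Each entry is treated as a regex by DataHub.
                "table_pattern": {"allow": list(tables)},
            },
        },
        # Assumption: a local DataHub REST endpoint; point this at your server.
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    """Run the recipe in-process and fail loudly if ingestion reported errors."""
    # Imported lazily so the recipe builder works without DataHub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()
```

Calling `run_recipe(build_redshift_recipe(["your_table_name"]))` should then do roughly what the CLI command does. A Glue variant would swap the `source` block for the Glue example configuration.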
If you have already tried these configurations and the stats field is still not enabled, please ensure that:
- The tables you are trying to profile are accessible, and the ingestion user has the necessary permissions on them.
- The DataHub version you are using supports the profiling features for Redshift and Glue.
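It is also worth checking whether the `table_pattern` entries match anything at all. As far as I understand, DataHub treats each `allow` entry as a regular expression applied from the start of the table's qualified name (for Redshift the documentation describes a `database.schema.table` format), so a bare table name may filter out every table. The sketch below only imitates that behaviour with Python's `re.match`. `is_allowed` is a hypothetical helper, not a DataHub function, and the case-insensitive matching is an assumption:

```python
import re
from typing import List


def is_allowed(qualified_name: str, allow: List[str]) -> bool:
    """Return True if any allow regex matches from the start of the name."""
    return any(re.match(p, qualified_name, re.IGNORECASE) for p in allow)


name = "your_database.public.your_table_name"

# A bare table name does not match at the start of the qualified name.
print(is_allowed(name, ["your_table_name"]))

# A fully qualified regex, or one with a leading wildcard, does match.
print(is_allowed(name, [r"your_database\.public\.your_table_name"]))
print(is_allowed(name, [r".*\.your_table_name"]))
```

If the first check is the one that mirrors your recipe, switching the pattern to one of the other two forms would be a reasonable thing to try before re-running the ingestion.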
Would you like more detailed steps or additional troubleshooting tips?
Sources:
- <https://datahubproject.io/docs/0.14.0/generated/ingestion/sources/redshift|Redshift Documentation>
- <https://github.com/datahub-project/datahub/pull/10858|GitHub Pull Request on Glue Profiling>
Hint: Mention @runllm in the thread for followups.
Can you describe your problem in more detail? Did the table stats appear only partially, or not at all? It would also help significantly if you shared your recipe.
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @Datahub Community Support if you require a person to take a look.
Hope this helps! ><