How to ingest from Databricks? Give me an example that sets include_usage_statistics to false.
<@U06TM7M40RK>
To configure DataHub to ingest data from Databricks with include_usage_statistics set to false, you need to modify your ingestion recipe accordingly. Here is an example configuration:
source:
  type: unity-catalog
  config:
    workspace_url: 'https://your-databricks-workspace-url'
    token: 'your-databricks-token'
    include_hive_metastore: true
    warehouse_id: 'your-warehouse-id'
    include_usage_statistics: false  # Set this to false to disable usage statistics
    # other configurations
In this configuration:
- workspace_url is the URL of your Databricks workspace.
- token is your Databricks personal access token.
- include_hive_metastore is set to true to include the Hive Metastore; you can set it to false if not needed.
- warehouse_id is required if include_hive_metastore is set to true.
- include_usage_statistics is explicitly set to false to disable the ingestion of usage statistics.
Make sure to replace placeholders like 'https://your-databricks-workspace-url', 'your-databricks-token', and 'your-warehouse-id' with your actual Databricks details.
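For the recipe to be runnable it also needs a sink section telling the ingestion where to send the metadata. A minimal sketch, assuming a DataHub instance reachable at http://localhost:8080 (adjust the server URL, and add a token if your instance requires authentication):
sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'  # assumed local DataHub endpoint; replace with your own
With source and sink in one file, the recipe can be run with the DataHub CLI via datahub ingest -c <your-recipe-file>.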
Would you like more detailed steps or further assistance on a specific part of the process?
Sources:
- https://datahubspace.slack.com/archives/CUMUWQU66/p1719896965.332949
- https://datahubproject.io/docs/0.13.1/generated/ingestion/sources/databricks#config-details
profiling:
  method: ge
  enabled: true
I want to only get schemas x and y in catalog z <@U06TM7M40RK>
To configure DataHub to profile only specific schemas (e.g., x and y) within a specific catalog (e.g., z), you can use the schema_pattern and catalog_pattern configuration options in your ingestion recipe. Here’s how you can set it up:
source:
  type: unity-catalog
  config:
    workspace_url: 'https://your-databricks-workspace-url'
    token: 'your-databricks-token'
    include_hive_metastore: true
    warehouse_id: 'your-warehouse-id'
    profiling:
      method: "ge"
      enabled: true
    catalog_pattern:
      allow:
        - '^z$'  # Only include catalog 'z'
    schema_pattern:
      allow:
        - '^x$'  # Only include schema 'x'
        - '^y$'  # Only include schema 'y'
    # other configurations
In this configuration:
- catalog_pattern is set to allow only the catalog named z.
- schema_pattern is set to allow only the schemas named x and y.
- The profiling section is configured to use the ge method and is enabled.
Make sure to replace placeholders like 'https://your-databricks-workspace-url', 'your-databricks-token', and 'your-warehouse-id' with your actual Databricks details.
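Note that the entries under allow are regular expressions, and each pattern also accepts a deny list, with deny rules taking precedence over allow rules. A small sketch in the same style as above (the schema names here are purely illustrative):
schema_pattern:
  allow:
    - '^x.*'     # allow every schema whose name starts with 'x' (illustrative)
  deny:
    - '^x_tmp$'  # explicitly exclude this one, even though the allow rule matches it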
Would you like more detailed steps or further assistance on a specific part of the process?
Sources:
- <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/unity/config.py|Databricks Unity Catalog Configuration>
Will it impact other lineage?
<@U06TM7M40RK> Will it impact other lineage? I only want to profile a limited set of schemas, but I want to get all lineages.
It seems likely that configuring your ingestion recipe to profile only specific schemas using schema_pattern and catalog_pattern should not impact the lineage extraction for other schemas or catalogs. Profiling and lineage are generally separate processes in DataHub, and you can configure them independently.
To ensure that you still capture all lineage information while limiting profiling to specific schemas, you can configure your ingestion recipe as follows:
- Set Profiling Patterns: Use schema_pattern and catalog_pattern to limit profiling to specific schemas and catalogs.
- Ensure Lineage is Enabled: Make sure that lineage-related configurations are set to include all necessary tables and views for lineage extraction.
Here’s an example configuration:
source:
  type: unity-catalog
  config:
    workspace_url: 'https://your-databricks-workspace-url'
    token: 'your-databricks-token'
    include_hive_metastore: true
    warehouse_id: 'your-warehouse-id'
    profiling:
      method: "ge"
      enabled: true
    catalog_pattern:
      allow:
        - '^z$'  # Only profile catalog 'z'
    schema_pattern:
      allow:
        - '^x$'  # Only profile schema 'x'
        - '^y$'  # Only profile schema 'y'
    include_table_lineage: true  # Ensure lineage is enabled
    include_view_lineage: true   # Ensure view lineage is enabled
    # other configurations
In this setup:
- Profiling is limited to schemas x and y in catalog z.
- Lineage extraction is enabled for all tables and views, ensuring that you capture lineage information across your data ecosystem.
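Depending on your connector version, there may also be a dedicated filter for profiling itself (for example a profile_pattern field; the field name and its matching format are assumptions here, so confirm them against the linked config.py before relying on them). If it is available, it would let you keep catalog_pattern and schema_pattern open, so every asset and its lineage is ingested, while profiling stays limited to the schemas you care about:
source:
  type: unity-catalog
  config:
    # ... connection settings as above ...
    profiling:
      method: "ge"
      enabled: true
    # Assumed option: restrict profiling only, without filtering what gets ingested.
    profile_pattern:
      allow:
        - '^z\.x\..*'  # illustrative: assumes patterns match fully qualified catalog.schema.table names
        - '^z\.y\..*'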
Would you like more detailed steps or further assistance on a specific part of the process?
Sources:
- <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/unity/config.py|Databricks Unity Catalog Configuration>
Hi! Did this interaction resolve your issue, or do you still require assistance?
Please tag @RunLLM if you still need further assistance!
Or tag @Datahub Community Support if you would like a person to take a look.
Hope this helps! ><