Is it possible to omit (not ingest) all tables whose names start with "DEV_"?
Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!
- Are you using UI or CLI for ingestion?
- Which DataHub version are you using? (e.g. 0.12.0)
- What data source(s) are you integrating with DataHub? (e.g. BigQuery)
For sure, it looks like the teradata integration has the `table_pattern.deny` field.
You can use a regex to match anything starting with DEV_ and ignore it, which I think would look like this:
```yaml
table_pattern:
  deny:
    - 'DEV_.*'
```
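A quick way to sanity-check the pattern: DataHub's allow/deny patterns are regexes matched against the name. This is a simplified Python sketch of the filtering idea (my own illustration, not DataHub's actual code):

```python
import re

# Simplified sketch of deny-pattern filtering (illustration only; DataHub's
# real AllowDenyPattern has more options, e.g. allow lists and case flags).
deny_patterns = [r"DEV_.*"]

def is_allowed(name: str) -> bool:
    """A name is kept only if no deny regex matches it from the start."""
    return not any(re.match(p, name) for p in deny_patterns)

tables = ["DEV_orders", "PROD_orders", "customers"]
kept = [t for t in tables if is_allowed(t)]
print(kept)  # ['PROD_orders', 'customers']
```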
Thank you very much <@U05JJ9WESHL>. I've already ingested everything without any restrictions. Now I want to delete the tables whose names start with "DEV_". If I now ingest with the restriction:
```yaml
deny:
  - 'DEV_.*'
```
will the tables whose names start with "DEV_" be removed, or will I need to delete everything and ingest again with the restriction? In the latter case I would lose the information already inserted into DataHub.
Ingesting with a `deny` table_pattern won't remove already-ingested table metadata. One solution I can think of is to write a small script to retrieve all the URNs on the platform (with GraphQL) and delete the ones you don't need.
Thank you <@U0445MUD81W>!
Is there also a way to omit DBs whose names start with "DEV_"?
Similar to `table_pattern`; along with that, add `schema_pattern`:
```yaml
schema_pattern:
  deny:
    - 'DEV_.*'
```
In newer versions of DataHub it is `database_pattern`.
Thank you <@U0445MUD81W>!
As for writing a script to retrieve all the URNs and delete specific ones, I'll have to study a bit to get it working.
Another question <@U0445MUD81W>: where does DataHub store the information? I noticed that it uses MySQL, but where can I access this DB?
Here is a GraphQL query to get all URNs for a given platform (substitute the platform name for `{platform}`):
```graphql
{
  search(input: {
    type: DATASET,
    query: "*",
    start: 0,
    count: 10000,
    orFilters: [
      {
        and: [
          {
            field: "platform",
            values: ["{platform}"],
            condition: CONTAIN
          }
        ]
      }
    ]
  }) {
    start
    count
    total
    searchResults {
      entity {
        urn
        ... on Dataset {
          urn
        }
      }
    }
  }
}
```
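To act on that query programmatically, here is a rough Python sketch of the cleanup script idea. Assumptions to adjust for your deployment: GMS listening on `localhost:8080`, platform `teradata`, and deletion done by shelling out to the DataHub CLI's `datahub delete --urn` command:

```python
import json
import subprocess
import urllib.request

# Assumption: DataHub GMS GraphQL endpoint of a local quickstart deployment.
GMS_URL = "http://localhost:8080/api/graphql"

QUERY = """
{
  search(input: {type: DATASET, query: "*", start: 0, count: 10000,
    orFilters: [{and: [{field: "platform", values: ["teradata"], condition: CONTAIN}]}]}) {
    searchResults { entity { urn } }
  }
}
"""

def extract_urns(payload: dict) -> list:
    """Pull dataset URNs out of a GraphQL search response."""
    results = payload["data"]["search"]["searchResults"]
    return [r["entity"]["urn"] for r in results]

def fetch_urns() -> list:
    """POST the search query to GMS and return the matching URNs."""
    req = urllib.request.Request(
        GMS_URL,
        data=json.dumps({"query": QUERY}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_urns(json.load(resp))

def delete_urn(urn: str) -> None:
    # Hard-delete a single entity via the DataHub CLI.
    subprocess.run(["datahub", "delete", "--urn", urn, "--hard"], check=True)

if __name__ == "__main__":
    for urn in fetch_urns():
        if ".DEV_" in urn:  # crude filter: only drop the DEV_ tables
            delete_urn(urn)
```

This only sketches the flow; if your instance has authentication enabled you would also need to send a bearer token header.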
By default DataHub uses MySQL, Elasticsearch, and Kafka in the persistence layer.
MySQL runs in a container; in `docker ps` it looks something like this:
```
5704b3bb9a18 mariadb:10.5.8 "docker-entrypoint.s…" 2 months ago Up 28 hours (healthy) 0.0.0.0:3306->3306/tcp mysql
```
Thanks <@U0445MUD81W>,
Is there any way to access this DB?
There are two more things I couldn't do. The first is changing the default DataHub user (I followed the documentation on the official website, but I can't find the file to make the change). The second is backup, which gives an error when executing the command `datahub docker quickstart --backup`.
Yes, you can access it using any SQL client via the JDBC connector; default username: `datahub`, password: `datahub`.
Thanks! I’ll try it soon
Please take a look at this link to change the default DataHub credentials:
https://datahubproject.io/docs/authentication/changing-default-credentials/
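For reference, those docs have you define users in a `user.props` file (one `username:password` entry per line) mounted into the datahub-frontend container; the exact mount path depends on your deployment, so treat this as a sketch:

```
# user.props — one user per line, in username:password form
datahub:MyNewStrongPassword
```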
<@U0445MUD81W>, I noticed now that the lineage was not loaded. Was there an option missing in the ingestion configuration?
There are many lineage configs for the ingestion recipe, depending on your needs, e.g.:
```yaml
include_copy_lineage: false
include_table_lineage: true
include_tables: true
include_unload_lineage: false
```
Check out the full options here:
<https://datahubproject.io/docs/generated/ingestion/sources/mysql>
Thank you <@U0445MUD81W>, later I will try with this configuration:
```yaml
source:
  type: teradata
  config:
    host_port: 'xxxxxxxxxxxxxxxx:1025'
    username: xxxxxxxxx
    password: xxxxxxxxxxx
    stateful_ingestion:
      enabled: true
    table_lineage_mode: sql_based
    include_copy_lineage: false
    include_table_lineage: true
    include_tables: true
    include_unload_lineage: false
    schema_pattern:
      deny:
        - 'DEV_.*'
```
Previously I used this:
```yaml
source:
  type: teradata
  config:
    host_port: 'xxxxxxxxxxxxxx'
    username: xxxxxxx
    password: xxxxxxxxx
    include_table_lineage: true
    include_usage_statistics: true
    stateful_ingestion:
      enabled: true
```