How do I change the logo to customize the front-end?
If you just want to customize the logo, set the REACT_APP_LOGO_URL and REACT_APP_FAVICON_URL environment variables when deploying GMS:
https://datahubproject.io/docs/datahub-web-react/#theming
Where is the REACT_APP_LOGO_URL variable? In which .env file will I find it?
If you are using the DataHub Helm chart for your deployment
https://github.com/acryldata/datahub-helm/blob/master/charts/datahub/values.yaml
it goes under `extraEnvs`:
```yaml
enabled: true
image:
  repository: linkedin/datahub-gms
  # tag: "v0.11.0"  # defaults to .global.datahub.version
resources:
  limits:
    memory: 2Gi
  requests:
    cpu: 100m
    memory: 1Gi
# ...
extraEnvs:
  - name: REACT_APP_LOGO_URL
    value: <path>
```
How do I read upstream lineage using a Python script, and also establish the relationship between table and column lineages?
Take a look at these examples for creating lineage with the Python SDK.
For dataset-to-dataset lineage:
https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/lineage_dataset_job_dataset.py
For dataset column lineage:
https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/lineage_emitter_dataset_finegrained.py
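Both examples emit lineage between dataset URNs. The URN format is standard and easy to assemble by hand; a minimal sketch (the SDK ships a similar builder in `datahub.emitter.mce_builder`, but this helper name and standalone version are mine):

```python
def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Assemble a DataHub dataset URN from its parts.

    Produces the same shape the SDK examples use, e.g.
    urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)
    """
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# Hypothetical upstream/downstream pair for a lineage edge
upstream = make_dataset_urn("hive", "logging.events")
downstream = make_dataset_urn("hive", "analytics.daily_events")
```

These URN strings are what you pass to the lineage emitters in the linked examples.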
You can use GraphQL to retrieve existing lineage information:
```graphql
query getAllLineageEntities($input: SearchAcrossLineageInput!) {
  searchAcrossLineage(input: $input) {
    ...searchAcrossRelationshipResults
    __typename
  }
}

fragment searchAcrossRelationshipResults on SearchAcrossLineageResults {
  start
  count
  total
  searchResults {
    entity {
      ...searchResultFields
      ... on Dataset {
        __typename
      }
      __typename
    }
    degree
    __typename
  }
  facets {
    ...facetFields
    __typename
  }
  __typename
}

fragment searchResultFields on Entity {
  urn
  type
  ... on Dataset {
    name
    platform {
      ...platformFields
      __typename
    }
    properties {
      name
      __typename
    }
    deprecation {
      ...deprecationFields
      __typename
    }
    __typename
  }
  ... on CorpUser {
    username
    properties {
      displayName
      email
      __typename
    }
    __typename
  }
  ... on Dashboard {
    dashboardId
    properties {
      name
      description
      externalUrl
      __typename
    }
    __typename
  }
  ... on Chart {
    chartId
    properties {
      name
      externalUrl
      __typename
    }
    __typename
  }
  ... on Experiment {
    experimentId
    properties {
      name
      description
      externalUrl
      expType
      __typename
    }
    __typename
  }
  ... on DataFlow {
    flowId
    cluster
    properties {
      name
      description
      project
      externalUrl
      __typename
    }
    __typename
  }
  ... on DataJob {
    dataFlow {
      ...nonRecursiveDataFlowFields
      __typename
    }
    jobId
    properties {
      name
      description
      externalUrl
      __typename
    }
    __typename
  }
  ... on Container {
    properties {
      name
      description
      externalUrl
      __typename
    }
    subTypes {
      typeNames
      __typename
    }
    __typename
  }
  ... on DataPlatform {
    ...nonConflictingPlatformFields
    __typename
  }
  __typename
}

fragment platformFields on DataPlatform {
  urn
  type
  name
  properties {
    type
    displayName
    __typename
  }
  displayName
  __typename
}

fragment deprecationFields on Deprecation {
  deprecated
  __typename
}

fragment nonRecursiveDataFlowFields on DataFlow {
  urn
  type
  orchestrator
  flowId
  cluster
  properties {
    name
    project
    __typename
  }
  deprecation {
    ...deprecationFields
    __typename
  }
  __typename
}

fragment nonConflictingPlatformFields on DataPlatform {
  urn
  type
  name
  properties {
    displayName
    __typename
  }
  displayName
  __typename
}

fragment facetFields on FacetMetadata {
  field
  displayName
  aggregations {
    value
    count
    entity {
      urn
      type
      ... on DataPlatform {
        ...platformFields
        __typename
      }
      __typename
    }
    __typename
  }
  __typename
}
```
params (as a Python dict):
```python
variables = {
    "input": {
        "urn": urn,                 # dataset URN to start from
        "direction": l_direction,   # "UPSTREAM" or "DOWNSTREAM"
        "types": [],
        "query": "",
        "start": 0,
        "count": 1000,
        "orFilters": [
            {
                "and": [
                    {
                        "field": "degree",
                        "condition": "EQUAL",
                        "negated": False,
                        "values": ["1", "2", "3+"]
                    }
                ]
            }
        ]
    }
}
```
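To avoid hand-editing that dict per entity, you can generate it from the URN and direction; a small sketch (the function name is mine, and the payload shape simply mirrors the variables above):

```python
def build_lineage_variables(urn: str, direction: str, count: int = 1000) -> dict:
    """Build the variables payload for the getAllLineageEntities query.

    direction should be "UPSTREAM" or "DOWNSTREAM"; degrees 1, 2 and 3+
    are all included, matching the filter shown above.
    """
    return {
        "input": {
            "urn": urn,
            "direction": direction,
            "types": [],
            "query": "",
            "start": 0,
            "count": count,
            "orFilters": [
                {
                    "and": [
                        {
                            "field": "degree",
                            "condition": "EQUAL",
                            "negated": False,
                            "values": ["1", "2", "3+"],
                        }
                    ]
                }
            ],
        }
    }
```

You would then POST `{"query": <the query string>, "variables": build_lineage_variables(...)}` to your DataHub GraphQL endpoint (typically `/api/graphql`, with your access token in the Authorization header).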
I have extracted lineage between tables of a dataset. Now I want to create an ER diagram showing lineage between all tables using Python scripts. How should I proceed?
```python
# dataplatform, dataset, table and lin_stream are variables defined elsewhere
query = f"""
query searchAcrossLineage {{
  searchAcrossLineage(
    input: {{
      query: "*"
      urn: "urn:li:dataset:(urn:li:dataPlatform:{dataplatform},{dataset}.{table},PROD)"
      start: 0
      count: 10
      direction: {lin_stream}
      orFilters: [{{
        and: [
          {{
            condition: EQUAL
            negated: false
            field: "degree"
            values: ["1", "2", "3+"]
          }}
        ]
        # additional search filters can be included here as well
      }}]
    }}
  ) {{
    searchResults {{
      degree
      entity {{
        urn
        type
      }}
    }}
  }}
}}
"""
```
How do I add subTypes to this query, so the results are restricted to subTypes like table or view?
"input": {
"types": [],
"query": "",
"start": 0,
"count": 10,
"filters": [],
"orFilters": [
{
"and": [
{
"field": "typeNames",
"condition": "EQUAL",
"values": [
"View"
],
"negated": false
}
]
}
]
}
}```
I have created a Python file which extracts info about the datasets, columns, and lineage details of ingested data for any given platform, and returns a response in JSON form. For now I need to run that file manually to get the JSON response. How can I run that file automatically when DataHub starts, so it generates the required JSON, which is then used to build the ER diagram in the frontend?
I think generating JSON and building an ER diagram is a bit out of scope for DataHub; there are plenty of scheduling tools to achieve your goal. I suggest using cron or AWS Lambda.
Ok. But what if I need to run a Python file when starting datahub/docker/quickstart.sh, so that it runs along with it?
Technically it is possible if you make sure all DataHub components are up and running healthy before running the query, but I don't think it is a good idea to include user-specific custom code in the core DataHub start-up script. It's up to you how you want to implement it, though.
I'm just curious:
• Why can't you just run your script after running quickstart, once you've checked the service is up and running? It is one more command after the quickstart.
Because I want to automate the process.
For example, as soon as the metadata is ingested, it automatically creates the lineages of the dataset, tables, etc.
In the same way, I want to automate the process of getting metadata details like tables, columns, and lineages, to build a complete flow chart of lineages showing the full ER diagram of relationships as soon as the metadata is ingested.
How will you run the metadata ingestion jobs?
Using the UI or CLI.
The Python file which I want to run is a Flask API file. It contains the API script to extract metadata info about lineages.
How do I get the primary keys of tables in a dataset using a GraphQL query?