How to change the logo for customizing the front-end

Original Slack Thread

If you just want to customize the logo, set the REACT_APP_LOGO_URL and REACT_APP_FAVICON_URL variables when deploying GMS:
https://datahubproject.io/docs/datahub-web-react/#theming

Where is the REACT_APP_LOGO_URL variable? In which .env file will I find it?

If you are using the DataHub helm chart for your deployment:
https://github.com/acryldata/datahub-helm/blob/master/charts/datahub/values.yaml
it goes under extraEnvs:

```
  enabled: true
  image:
    repository: linkedin/datahub-gms
    # tag: "v0.11.0"  # defaults to .global.datahub.version
  resources:
    limits:
      memory: 2Gi
    requests:
      cpu: 100m
      memory: 1Gi
  # ...
  extraEnvs:
    - name: REACT_APP_LOGO_URL
      value: <path>
```

How can I read upstream lineage using a Python script, and also establish the relationship between table and column lineages?

Take a look at these examples for creating lineage with the Python SDK.
For dataset-to-dataset lineage:
https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/lineage_dataset_job_dataset.py
For dataset column (fine-grained) lineage:
https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/lineage_emitter_dataset_finegrained.py
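
For orientation, here is a minimal sketch of dataset-to-dataset lineage emission based on those examples; the platform, dataset names, and GMS address are placeholders, so treat the linked examples as authoritative:

```
# Minimal sketch of dataset-to-dataset lineage emission, based on the
# linked examples. Platform, dataset names, and the GMS endpoint are
# placeholders -- adjust them for your deployment.
from datahub.emitter.mce_builder import make_dataset_urn, make_lineage_mce
from datahub.emitter.rest_emitter import DatahubRestEmitter

upstream = make_dataset_urn(platform="hive", name="db.upstream_table", env="PROD")
downstream = make_dataset_urn(platform="hive", name="db.downstream_table", env="PROD")

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
emitter.emit_mce(make_lineage_mce(upstream_urns=[upstream], downstream_urn=downstream))
```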

You can use GraphQL to retrieve existing lineage information:

```
query getAllLineageEntities($input: SearchAcrossLineageInput!) {
  searchAcrossLineage(input: $input) {
    ...searchAcrossRelationshipResults
    __typename
  }
}

fragment searchAcrossRelationshipResults on SearchAcrossLineageResults {
  start
  count
  total
  searchResults {
    entity {
      ...searchResultFields
      ... on Dataset {
        __typename
      }
      __typename
    }
    degree
    __typename
  }
  facets {
    ...facetFields
    __typename
  }
  __typename
}

fragment searchResultFields on Entity {
  urn
  type
  ... on Dataset {
    name
    platform {
      ...platformFields
      __typename
    }
    properties {
      name
      __typename
    }
    deprecation {
      ...deprecationFields
      __typename
    }
    __typename
  }
  ... on CorpUser {
    username
    properties {
      displayName
      email
      __typename
    }
    __typename
  }
  ... on Dashboard {
    dashboardId
    properties {
      name
      description
      externalUrl
      __typename
    }
    __typename
  }
  ... on Chart {
    chartId
    properties {
      name
      externalUrl
      __typename
    }
    __typename
  }
  ... on Experiment {
    experimentId
    properties {
      name
      description
      externalUrl
      expType
      __typename
    }
    __typename
  }
  ... on DataFlow {
    flowId
    cluster
    properties {
      name
      description
      project
      externalUrl
      __typename
    }
    __typename
  }
  ... on DataJob {
    dataFlow {
      ...nonRecursiveDataFlowFields
      __typename
    }
    jobId
    properties {
      name
      description
      externalUrl
      __typename
    }
    __typename
  }
  ... on Container {
    properties {
      name
      description
      externalUrl
      __typename
    }
    subTypes {
      typeNames
      __typename
    }
    __typename
  }
  ... on DataPlatform {
    ...nonConflictingPlatformFields
    __typename
  }
  __typename
}

fragment platformFields on DataPlatform {
  urn
  type
  name
  properties {
    type
    displayName
    __typename
  }
  displayName
  __typename
}

fragment deprecationFields on Deprecation {
  deprecated
  __typename
}

fragment nonRecursiveDataFlowFields on DataFlow {
  urn
  type
  orchestrator
  flowId
  cluster
  properties {
    name
    project
    __typename
  }
  deprecation {
    ...deprecationFields
    __typename
  }
  __typename
}


fragment nonConflictingPlatformFields on DataPlatform {
  urn
  type
  name
  properties {
    displayName
    __typename
  }
  displayName
  __typename
}

fragment facetFields on FacetMetadata {
  field
  displayName
  aggregations {
    value
    count
    entity {
      urn
      type
      ... on DataPlatform {
        ...platformFields
        __typename
      }
      __typename
    }
    __typename
  }
  __typename
}
```

Params (the GraphQL variables, as a Python dict):

            "input": {
                "urn": f"{urn}",
                "direction": f"{l_direction}",
                "types": [],
                "query": "",
                "start": 0,
                "count": 1000,
                "orFilters": [
                    {
                        "and": [
                            {
                                "field": "degree",
                                "condition": "EQUAL",
                                "negated": False,
                                "values": [
                                    "1",
                                    "2",
                                    "3+"
                                ]
                            }
                        ]
                    }
                ]
            }
        }```
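
To run that query from Python, POST it to DataHub's GraphQL endpoint. A minimal sketch, assuming the GraphQL document above is held in a string named `query` and the params dict in `variables` (both names illustrative), against the default quickstart address:

```
import requests

# Sketch: execute the lineage query against DataHub's GraphQL API.
# `query` and `variables` are assumed to hold the GraphQL document and
# params shown above; the endpoint below is the quickstart default.
resp = requests.post(
    "http://localhost:8080/api/graphql",
    json={"query": query, "variables": variables},
    # headers={"Authorization": "Bearer <token>"},  # if metadata auth is enabled
)
resp.raise_for_status()
for result in resp.json()["data"]["searchAcrossLineage"]["searchResults"]:
    print(result["degree"], result["entity"]["urn"])
```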

I have extracted lineage between tables of a dataset. Now I want to create an ER diagram showing lineage between all tables using Python scripts. How should I proceed?

```
query = """
query searchAcrossLineage {
  searchAcrossLineage(
    input: {
      query: "*"
      urn: "urn:li:dataset:(urn:li:dataPlatform:""" + dataplatform + """,""" + dataset + """.""" + table + """,PROD)"
      start: 0
      count: 10
      direction: """ + lin_stream + """
      orFilters: [
        {
          and: [
            {
              condition: EQUAL
              negated: false
              field: "degree"
              values: ["1", "2", "3+"]
            }
          ]  # Additional search filters can be included here as well
        }
      ]
    }
  ) {
    searchResults {
      degree
      entity {
        urn
        type
      }
    }
  }
}
"""
```

How do I add subTypes to this query, resulting in subTypes like table or view?

  "input": {
    "types": [],
    "query": "",
    "start": 0,
    "count": 10,
    "filters": [],
    "orFilters": [
      {
        "and": [
          {
            "field": "typeNames",
            "condition": "EQUAL",
            "values": [
              "View"
            ],
            "negated": false
          }
        ]
      }
    ]
  }
}```
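
That input only filters by subtype. To also return the subtype of each hit, request `subTypes` in the selection set, the same way the `Container` case in the fragment above does. A sketch, assuming the input above is used with `searchAcrossEntities` and that your DataHub version exposes `subTypes` on `Dataset` (verify against your GraphQL schema):

```
# Sketch: return subTypes (e.g. "Table" or "View") in the results,
# mirroring the subTypes { typeNames } selection used for Container above.
# Verify field availability against your DataHub version's GraphQL schema.
subtype_query = """
query getDatasetsWithSubTypes($input: SearchAcrossEntitiesInput!) {
  searchAcrossEntities(input: $input) {
    searchResults {
      entity {
        urn
        type
        ... on Dataset {
          subTypes {
            typeNames
          }
        }
      }
    }
  }
}
"""
```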

I have created a Python file which extracts info about the datasets, columns, and lineage details of ingested data for any given platform, and returns a response in JSON form. For now I need to run that file manually to get the JSON response. How can I run that file automatically when DataHub is started, so it generates the required JSON, which is then used to build an ER diagram in the frontend?

I think generating JSON and building an ER diagram is a bit out of scope for DataHub; there are plenty of tools and schedulers to achieve your goal. I suggest using cron or AWS Lambda.

OK. But what if I need to run a Python file when starting datahub/docker/quickstart.sh, so that it runs alongside it?

Technically it is possible if you make sure all DataHub components are up and running healthy before running the query, but I don't think it is a good idea to include user-specific custom code in the core DataHub start-up script. It is up to you how you want to implement it, though.
I'm just curious:
• Why can't you just run your script after running quickstart, once you've checked the service is up and running? It is only one more command after the quickstart (see the sketch below).
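
For reference, that "check the service is up, then run" step can itself be scripted. A minimal sketch that polls the GMS health endpoint before launching the extraction script; the URL, timing values, and script name are assumptions for a default quickstart:

```
import subprocess
import time

import requests

GMS_HEALTH_URL = "http://localhost:8080/health"  # default quickstart GMS port

def wait_for_datahub(timeout_s=300, poll_s=5):
    """Poll GMS until it reports healthy, or give up after timeout_s."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            if requests.get(GMS_HEALTH_URL, timeout=5).status_code == 200:
                return
        except requests.RequestException:
            pass
        time.sleep(poll_s)
    raise TimeoutError("DataHub GMS did not become healthy in time")

wait_for_datahub()
subprocess.run(["python", "extract_lineage.py"], check=True)  # illustrative script name
```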

Because I want to automate the process.
For example, as soon as the metadata is ingested, it automatically creates the lineage of the datasets, tables, etc.
In the same way, I want to automate the process of getting metadata details like tables, columns, and lineage, to create a complete flow chart of lineage showing the full relationship ER diagram as soon as the metadata is ingested.

How will you run the metadata ingestion jobs?

Using the UI or the CLI.

The Python file I want to run is a Flask API file, which contains the API script to extract metadata about lineages.

How can I get the primary keys of tables in a dataset using a GraphQL query?
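
One possible approach: primary keys are recorded in the schema aspect, so a dataset query selecting `schemaMetadata { primaryKeys }` may return them. A hedged sketch; verify that your DataHub version's GraphQL schema exposes these fields, and note that the URN and endpoint below are placeholders:

```
import requests

# Sketch: primary keys are recorded in the schemaMetadata aspect.
# Field availability depends on your DataHub version's GraphQL schema;
# the URN and endpoint are placeholders.
primary_key_query = """
query getPrimaryKeys($urn: String!) {
  dataset(urn: $urn) {
    schemaMetadata {
      primaryKeys
    }
  }
}
"""

resp = requests.post(
    "http://localhost:8080/api/graphql",
    json={
        "query": primary_key_query,
        "variables": {"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)"},
    },
)
print(resp.json()["data"]["dataset"]["schemaMetadata"]["primaryKeys"])
```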