Configuring the `sink` in PowerBI connector for DataHub

Original Slack Thread

<@U06TM7M40RK> What can I do with the sink config in the PowerBI connector?

In the context of the PowerBI connector for DataHub, the sink configuration specifies where the ingested metadata should be sent. According to the <https://datahubproject.io/docs/0.14.0/ui-ingestion|DataHub documentation>, the sink can be configured to use either datahub-rest or datahub-kafka as the destination for the metadata.

For example, if you are using the datahub-rest sink, you would configure it with the server URL of your DataHub instance. This is where the metadata will be sent after being extracted from PowerBI. The configuration might look something like this:

    sink:
      type: "datahub-rest"
      config:
        server: "http://localhost:8080"

This configuration specifies that the metadata should be sent to a DataHub instance running on localhost at port 8080.
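
To make the shape of the sink section concrete, here is a minimal Python sketch that assembles the same recipe as a dictionary. The helper names `build_sink`, `build_recipe`, and `run_recipe` are hypothetical, the PowerBI credential values are placeholders, and `run_recipe` assumes the `acryl-datahub` package with the PowerBI plugin is installed. Treat it as an illustration under those assumptions, not the canonical way to run ingestion.

```python
from typing import Any, Dict, Optional


def build_sink(server: str, token: Optional[str] = None) -> Dict[str, Any]:
    """Build a datahub-rest sink section pointing at the given GMS URL."""
    config: Dict[str, Any] = {"server": server}
    if token:
        # Only needed when the DataHub instance enforces token authentication.
        config["token"] = token
    return {"type": "datahub-rest", "config": config}


def build_recipe(sink: Dict[str, Any]) -> Dict[str, Any]:
    """Combine a PowerBI source (placeholder credentials) with a sink."""
    return {
        "source": {
            "type": "powerbi",
            "config": {
                # Placeholders: substitute your own Azure AD app values.
                "tenant_id": "<tenant-id>",
                "client_id": "<client-id>",
                "client_secret": "<client-secret>",
            },
        },
        "sink": sink,
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    """Run the recipe in-process; requires acryl-datahub to be installed."""
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()


if __name__ == "__main__":
    recipe = build_recipe(build_sink("http://localhost:8080"))
    print(recipe["sink"])
```

The source and the sink are independent sections: switching the destination only means changing what `build_sink` returns, while the PowerBI source settings stay the same.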

If you need more detailed information or have specific questions about configuring the sink for PowerBI, please let me know!

Is it always `server: "http://localhost:8080"`? <@U06TM7M40RK>

The server configuration in the sink section is not always set to http://localhost:8080. The server URL should point to the DataHub GMS (Graph Metadata Service) endpoint that you are using, which can vary depending on your deployment setup.

According to the <https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/sink_docs/datahub.md|DataHub Rest Sink documentation>, the server URL can be configured differently based on your environment:

  1. Local Development: If you are running DataHub locally, you might use http://localhost:8080.

  2. Docker Environment: If DataHub is running in a Docker environment, you might use a Docker-specific hostname like http://datahub-gms:8080.

  3. Kubernetes Environment: If DataHub is deployed in a Kubernetes cluster, you might use a Kubernetes service name like http://datahub-datahub-gms.datahub.svc.cluster.local:8080.

  4. Hosted or Cloud Environment: If you are using a hosted DataHub instance, the server URL would be specific to your instance, such as https://<your-instance>.acryl.io/gms.

The server URL should be configured to match the specific environment where your DataHub instance is running. If you have any specific questions about configuring the server URL for your setup, feel free to ask!
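
As a rough illustration of choosing the URL per environment, the sketch below maps the example environments listed above to their example URLs and validates the result. The function name `resolve_server` and the environment keys are hypothetical, and the URLs are only the examples from this thread; your actual hostnames will likely differ.

```python
from typing import Optional
from urllib.parse import urlparse

# Example URLs taken from the list above; adjust to your own deployment.
EXAMPLE_GMS_URLS = {
    "local": "http://localhost:8080",
    "docker": "http://datahub-gms:8080",
    "kubernetes": "http://datahub-datahub-gms.datahub.svc.cluster.local:8080",
}


def resolve_server(environment: str, override: Optional[str] = None) -> str:
    """Return the GMS URL for an environment, preferring an explicit override."""
    url = override or EXAMPLE_GMS_URLS.get(environment)
    if url is None:
        raise ValueError(f"no GMS URL known for environment {environment!r}")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"not a valid http(s) URL: {url!r}")
    # Strip a trailing slash so paths can be appended consistently.
    return url.rstrip("/")


if __name__ == "__main__":
    print(resolve_server("kubernetes"))
```

A hosted instance has no fixed example URL, so it would be passed through `override`, for instance `resolve_server("hosted", override="https://<your-instance>.acryl.io/gms")`.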

I face this issue:

    raise PipelineInitError(f"Failed to {step}: {e}") from e
    datahub.ingestion.run.pipeline.PipelineInitError: Failed to configure the sink (datahub-rest): 💥 Failed to connect to DataHub with DataHubRestEmitter: configured to talk to http://datahub-datahub-gms.datahub.svc.cluster.local:8080
<@U06TM7M40RK>

The error message you’re encountering indicates that the datahub-rest sink is unable to connect to the DataHub GMS endpoint specified in your configuration. This issue can arise due to several reasons, such as network connectivity problems, incorrect server URL, or issues with the DataHub service itself.

Here are some steps you can take to troubleshoot and resolve the issue:

  1. Verify Network Connectivity: Ensure that the machine running the DataHub ingestion process can reach the DataHub GMS endpoint. You can test this by using tools like curl or ping to check connectivity.

    curl -v http://datahub-datahub-gms.datahub.svc.cluster.local:8080
    
  2. Check Server URL: Make sure the server URL in your sink configuration is correct and accessible. The URL should point to the DataHub GMS endpoint within your Kubernetes cluster.

  3. Kubernetes Service Configuration: Verify that the Kubernetes service for DataHub GMS is correctly configured and running. You can check the service status and endpoints using kubectl.

    kubectl get svc -n <namespace>
    
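  4. Port Forwarding: A `*.svc.cluster.local` hostname only resolves from inside the Kubernetes cluster. If you run the ingestion from outside the cluster (for example, from your laptop), forward the GMS port with something like `kubectl port-forward svc/datahub-datahub-gms 8080:8080 -n <namespace>` and set the sink server to http://localhost:8080 instead.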

  5. Logs and Errors: Check the logs of the DataHub GMS pod for any errors or issues that might be preventing it from accepting connections.

  6. SSL and Authentication: If your setup requires SSL or authentication, ensure that these are correctly configured in your sink configuration.
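
The connectivity check from step 1 can also be scripted from the same machine and Python environment that runs the ingestion. The following is a minimal sketch using only the standard library. The function name `check_gms` is hypothetical, and probing the `/health` path is an assumption about your GMS deployment; pass `path=""` to probe the base URL instead.

```python
import socket
import urllib.error
import urllib.request
from typing import Tuple


def check_gms(server: str, path: str = "/health", timeout: float = 5.0) -> Tuple[bool, str]:
    """Send one HTTP GET to the GMS URL and report whether it is reachable."""
    url = server.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, f"reachable, HTTP {resp.status}"
    except urllib.error.HTTPError as e:
        # The server answered, so DNS and networking work; the status may
        # point to an authentication or path problem instead.
        return True, f"reachable, but returned HTTP {e.code}"
    except urllib.error.URLError as e:
        reason = e.reason
        if isinstance(reason, socket.gaierror):
            return False, f"DNS lookup failed for {url}: {reason}"
        if isinstance(reason, ConnectionRefusedError):
            return False, f"connection refused by {url}"
        if isinstance(reason, (socket.timeout, TimeoutError)):
            return False, f"timed out connecting to {url}"
        return False, f"could not connect to {url}: {reason}"
    except OSError as e:
        # Covers read timeouts and connection resets after connecting.
        return False, f"connection error for {url}: {e}"


if __name__ == "__main__":
    ok, message = check_gms("http://datahub-datahub-gms.datahub.svc.cluster.local:8080")
    print("OK" if ok else "FAILED", "-", message)
```

A DNS failure usually means the script is running outside the cluster or the service name or namespace is wrong, while a refused connection or timeout points more toward the GMS pod, the service port, or a network policy.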

If you need more detailed assistance or have specific questions about any of these steps, please let me know!
