Hi,
If my superset using Single Sign on as login, how do I setup the recipe in datahub?
Hey there! Make sure your message includes the following information if relevant, so we can help more effectively!
<@U06TM7M40RK> Can you help with the questions in this thread?
Please continue to tag RunLLM (e.g. <@U06TM7M40RK>) to have it continue generating responses! We’re experimenting with a new AI assistant for questions in the <#CUMUWQU66> channel to help you track down helpful resources more quickly. Keep in mind that responses may not always be spot-on; feel free to ask it follow-up questions, and don’t forget to give its responses a 👍 or 👎 so other folks in the Community know if you’re still looking for help on the topic and to help us improve the assistant!
To help you more effectively, could you please provide the following information?
Once you share that information, I can offer more targeted assistance. In the meantime, here are some general insights on ingesting metadata from Superset when it is configured to use Single Sign-On (SSO).
General Insights
OAuth Authentication:
DataHub’s current Superset ingestion connector primarily supports basic authentication methods like username and password. OAuth support is not explicitly mentioned in the available documentation or codebase.
Workarounds:
Community Discussions:
Debugging:
Run the ingestion with the `--debug` flag to capture more information about the process.
Example Ingestion Recipe
Here is an example of a basic ingestion recipe for Superset:
```yaml
source:
  type: superset
  config:
    connect_uri: 'https://your-superset-instance/'
    username: 'your-username'
    password: 'your-password'
    provider: db
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: false
sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'
    token: 'your-personal-access-token'
```
Next Steps
Would you like more detailed steps or further assistance on any specific part of the process?
<@U06TM7M40RK>
Thank you for providing the details. Given that you are using the UI for ingestion with the latest version of DataHub and integrating with Superset, let’s focus on the configuration and potential issues related to Single Sign-On (SSO).
Steps to Configure Superset Ingestion with SSO
Verify API Endpoints:
Ensure that your Superset instance exposes the necessary API endpoints. The Superset connector extracts datasets by contacting the `/api/v1/dataset/` endpoint. You can check this endpoint manually to see whether it returns the expected data:
```shell
curl -u your-username:your-password https://your-superset-instance/api/v1/dataset/
```
Check Permissions:
Ensure that the user account used for the ingestion has the necessary permissions to access the datasets. There have been instances where permission issues prevented the ingestion from retrieving data. For example, the `can read on Dataset` permission might be required.
Enable Debug Logging:
Run the ingestion with the `--debug` flag to get detailed logs, which can help identify issues during the ingestion process:
```shell
datahub --debug ingest run -c <recipe-path> &> /tmp/superset-ingest.log
```
Inspect the Payload:
Add logging to inspect the payload being sent to DataHub. This can help determine if the payload is empty or if there are any issues with the data being retrieved from Superset.
Check Stateful Ingestion Configuration:
Ensure that your stateful ingestion configuration is correctly set up, i.e. that the `stateful_ingestion` settings are properly configured in your ingestion recipe.
Example Configuration
Ensure your configuration looks similar to the following:
```yaml
source:
  type: superset
  config:
    connect_uri: 'https://your-superset-instance/'
    username: 'your-username'
    password: 'your-password'
    provider: db
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: false
sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'
    token: 'your-token'
```
Debugging Steps
Verify API Response:
Manually check the response from the `/api/v1/dataset/` endpoint to ensure it returns the expected data:
```shell
curl -u your-username:your-password https://your-superset-instance/api/v1/dataset/
```
Enable Detailed Logging:
Enable detailed logging to capture more information about the ingestion process. You can log the payload being sent to DataHub by modifying the `superset.py` file at the point where the payload is created:
```python
import logging

logger = logging.getLogger(__name__)

# Add this line where the payload is created to log it
logger.info(f"payload = {payload}")
```
Run Ingestion Locally:
If you are running the ingestion in a Kubernetes environment, try running it locally to isolate any potential issues with the Kubernetes setup. Set up a local DataHub instance using the quickstart guide and run the ingestion using the CLI.
```shell
datahub docker quickstart
datahub ingest -c your-ingestion-recipe.yml
```
Check for Known Issues:
Review the <https://github.com/datahub-project/datahub/issues|DataHub GitHub issues> and the Slack community for any known issues or similar problems reported by other users.
Would you like more detailed steps or further assistance on any specific part of the process?
<@U06TM7M40RK>
```
Failed to get dashboard data: {"message":"Forbidden"}
```
The message is like this. When I tried to test via curl above, it mentioned:
```
{"msg":"Missing 'Bearer' type in 'Authorization' header. Expected 'Authorization: Bearer <JWT>'"}
```
The error message you’re encountering indicates that the Superset API expects a Bearer token for authentication, which is typical for setups using Single Sign-On (SSO). To resolve this, you’ll need to configure your DataHub ingestion recipe to use a Bearer token for authentication.
Steps to Configure Bearer Token Authentication
Obtain a Bearer Token:
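As a sketch of how a token could be obtained programmatically: recent Superset versions expose a `/api/v1/security/login` endpoint that exchanges credentials for a JWT access token. The helper below is illustrative only (the function names are made up, and with SSO in front of Superset this password login may be disabled, in which case you would obtain the JWT from your Identity Provider instead):

```python
import requests


def build_login_payload(username, password, provider="db"):
    """Request body for Superset's /api/v1/security/login endpoint."""
    return {
        "username": username,
        "password": password,
        "provider": provider,  # 'db', 'ldap', etc., depending on your setup
        "refresh": True,
    }


def get_superset_token(base_url, username, password, provider="db"):
    """Exchange credentials for a JWT access token from Superset."""
    resp = requests.post(
        f"{base_url}/api/v1/security/login",
        json=build_login_payload(username, password, provider),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]
```

The returned `access_token` is the value that would go after `Bearer ` in the `Authorization` header.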
Update Ingestion Recipe:
Configure the recipe to send the token in the `Authorization` header.
Example Ingestion Recipe with Bearer Token
Here’s an example of how you can configure your ingestion recipe to use a Bearer token:
```yaml
source:
  type: superset
  config:
    connect_uri: 'https://your-superset-instance/'
    provider: db
    extra_headers:
      Authorization: 'Bearer <your-jwt-token>'
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: false
sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'
    token: 'your-datahub-token'
```
Testing with cURL
To verify that your Bearer token is working correctly, you can test the API endpoint using cURL:
```shell
curl --location --request GET 'https://your-superset-instance/api/v1/dataset/' \
  --header 'Authorization: Bearer <your-jwt-token>'
```
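If you prefer Python over cURL for this check, here is a minimal sketch (the helper names are hypothetical; it assumes the standard Superset REST path):

```python
import requests


def bearer_headers(token):
    """Build the Authorization header for a JWT Bearer token."""
    return {"Authorization": f"Bearer {token}"}


def list_datasets(base_url, token):
    """Fetch /api/v1/dataset/ from Superset using Bearer auth."""
    resp = requests.get(
        f"{base_url}/api/v1/dataset/",
        headers=bearer_headers(token),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```

A 403 response from this call would point at a permissions problem rather than a malformed header.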
Debugging Steps
Verify Token Validity:
Ensure the token has not expired (check its `exp` claim).
Check API Permissions:
Enable Detailed Logging:
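For the token-validity check, you can decode the JWT payload locally to inspect its `exp` claim. This generic sketch only base64-decodes the payload; it does not verify the signature:

```python
import base64
import json
import time


def jwt_expiry(token):
    """Return the exp claim from a JWT's (unverified) payload segment."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload.get("exp")


def is_token_expired(token, now=None):
    """Compare the token's exp claim against the current Unix time."""
    exp = jwt_expiry(token)
    if exp is None:
        return False
    return (now if now is not None else time.time()) >= exp
```

If `is_token_expired` returns `True`, request a fresh token before re-running the ingestion.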
If the above steps do not resolve the issue, please provide more details about your setup, such as the Identity Provider you are using for SSO and any specific configurations you have applied.
Would you like more detailed steps or further assistance on any specific part of the process?
<@U06TM7M40RK>
```
'extra_headers\n'
'  extra fields not permitted (type=value_error.extra)\n'
```
The error message indicates that the `extra_headers` field is not recognized in the Superset source configuration. This suggests that the current version of the DataHub Superset connector does not support the `extra_headers` parameter directly.
Workaround for Bearer Token Authentication
Since the `extra_headers` parameter is not supported, one suggested approach is to pass the Bearer token through an environment variable, by setting the `REQUESTS_CA_BUNDLE` environment variable to include the token. (Note that `REQUESTS_CA_BUNDLE` is normally used to point the requests library at a CA certificate bundle, so verify this behavior in your environment before relying on it.)
Steps to Set Environment Variable for Bearer Token
Set Environment Variable:
Set the `REQUESTS_CA_BUNDLE` environment variable to include your Bearer token, so that the requests library used by DataHub can include it in the `Authorization` header.
Run Ingestion with Environment Variable:
Example Configuration
Here is an example of how you can set the environment variable and run the ingestion:
Setting Environment Variable
```shell
export REQUESTS_CA_BUNDLE="Bearer <your-jwt-token>"
```
Running Ingestion
```shell
datahub ingest -c your-ingestion-recipe.yml
```
Example Ingestion Recipe
Here is your ingestion recipe without the unsupported `extra_headers` parameter:
```yaml
source:
  type: superset
  config:
    connect_uri: 'https://your-superset-instance/'
    provider: db
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: false
sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'
    token: 'your-datahub-token'
```
If the above steps do not resolve the issue, please provide more details about your setup, such as the Identity Provider you are using for SSO and any specific configurations you have applied.
Would you like more detailed steps or further assistance on any specific part of the process?
<@U06TM7M40RK>
I see in the documentation that the superset recipe has an options field, but I can’t find the documentation for it. Help me understand it.
Certainly! The `options` parameter in the Superset ingestion recipe allows you to specify additional configuration options for the Superset connector. These options can be used to customize the behavior of the ingestion process.
Superset Ingestion Configuration
According to the Superset source configuration details in the DataHub documentation, the `options` parameter is an object that can include various settings to control the ingestion process.
Example Configuration with `options`
Here is an example of how you can use the `options` parameter in your Superset ingestion recipe:
```yaml
source:
  type: superset
  config:
    connect_uri: 'https://your-superset-instance/'
    username: 'your-username'
    password: 'your-password'
    provider: db
    options:
      # Add your custom options here
      key1: value1
      key2: value2
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: false
sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'
    token: 'your-datahub-token'
```
Available Options
The specific options you can include in the `options` parameter depend on the capabilities and requirements of the Superset connector.
Example with Custom Headers
If you need to add custom headers, such as a Bearer token for authentication, you can include them in the `options` parameter:
```yaml
source:
  type: superset
  config:
    connect_uri: 'https://your-superset-instance/'
    provider: db
    options:
      custom_headers:
        Authorization: 'Bearer <your-jwt-token>'
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: false
sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'
    token: 'your-datahub-token'
```
If the above steps do not resolve the issue, please provide more details about your setup, such as the Identity Provider you are using for SSO and any specific configurations you have applied.
Would you like more detailed steps or further assistance on any specific part of the process?