Troubleshooting AWS Glue ingestion failure due to missing access key

Original Slack Thread

Hi guys,
I am using Glue as a source to ingest metadata, and my ingestion is failing with this error:
Command failed: Failed to configure the source (glue): 'NoneType' object has no attribute 'access_key'
Any ideas how to fix it?

DataHub version: 0.12.1.3
Data source: Glue

Hey there! :wave: Make sure your message includes the following information if relevant, so we can help more effectively!

  1. Are you using UI or CLI for ingestion?
  2. Which DataHub version are you using? (e.g. 0.12.0)
  3. What data source(s) are you integrating with DataHub? (e.g. BigQuery)

What does your recipe look like?

Hey <@U01GZEETMEZ>. I am ingesting Glue metadata into DataHub with a Python script (ingest.py) that runs as a Glue job.
The job returns this error:
> File "/home/spark/.local/lib/python3.10/site-packages/datahub/ingestion/source/aws/aws_common.py", line 149, in get_session
> "AccessKeyId": current_credentials.access_key,
> AttributeError: 'NoneType' object has no attribute 'access_key'

Attachment: ingest.py (Python, 25 lines; preview only)
    from datahub.ingestion.run.pipeline import Pipeline
    from botocore.config import Config
    import boto3
    import yaml

Attachment: local-glue.yaml (YAML, 13 lines; preview only)
    source:
      type: glue
      config:
        # Coordinates
        aws_region: "eu-west-1"
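The last traceback line says the credentials object DataHub read inside `get_session` was `None`. In boto3, `Session.get_credentials()` returns `None` when no provider in the default credential chain supplies credentials, so one way to confirm the cause is to run that lookup yourself at the top of the Glue job. This is a minimal sketch, assuming boto3 is importable in the job and reusing the `eu-west-1` region from the recipe; `credentials_to_dict` and `check_default_chain` are hypothetical helper names, not DataHub functions.

```python
from typing import Any, Dict, Optional


def credentials_to_dict(creds: Optional[Any]) -> Dict[str, Optional[str]]:
    """Build the same dict shape the traceback shows, but fail with a readable message.

    `creds` should look like botocore's ReadOnlyCredentials (attributes
    access_key, secret_key, token), or be None when nothing was found.
    """
    if creds is None:
        raise RuntimeError(
            "No AWS credentials found by boto3's default chain; check the Glue "
            "job's IAM role or pass explicit keys in the recipe"
        )
    return {
        "AccessKeyId": creds.access_key,
        "SecretAccessKey": creds.secret_key,
        "SessionToken": creds.token,
    }


def check_default_chain(region: str = "eu-west-1") -> Dict[str, Optional[str]]:
    """Run boto3's default credential lookup (roughly what a recipe without keys relies on)."""
    # Imported here so credentials_to_dict works even without boto3 installed.
    import boto3

    session = boto3.Session(region_name=region)
    creds = session.get_credentials()  # None if no provider supplied credentials
    frozen = creds.get_frozen_credentials() if creds is not None else None
    # The returned dict contains secrets: check it, but do not log it.
    return credentials_to_dict(frozen)
```

If `check_default_chain()` raises in the Glue job, the problem is the job's credential setup rather than the DataHub recipe or sink.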

How were you expecting authentication to work? There’s nothing set in the recipe, so are there some environment variables available?
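To answer the environment-variable question from inside the job, the sketch below reports which of the standard credential-related variables that botocore's default chain consults are set, without printing their values. The list is an assumption kept deliberately short; shared config files and the instance metadata service are also part of the chain but are not environment variables, so an all-"not set" report does not by itself prove the chain is empty.

```python
import os
from typing import Dict, Mapping, Optional

# Credential-related environment variables consulted by botocore's default chain.
# Not exhaustive: config files and the instance metadata service are not covered.
AWS_CREDENTIAL_ENV_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_PROFILE",
    "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI",
    "AWS_CONTAINER_CREDENTIALS_FULL_URI",
    "AWS_WEB_IDENTITY_TOKEN_FILE",
    "AWS_ROLE_ARN",
)


def credential_env_report(env: Optional[Mapping[str, str]] = None) -> Dict[str, bool]:
    """Return which variables are set (non-empty), never their values."""
    env = os.environ if env is None else env
    return {name: bool(env.get(name)) for name in AWS_CREDENTIAL_ENV_VARS}


if __name__ == "__main__":
    for name, present in credential_env_report().items():
        print(f"{name}: {'set' if present else 'not set'}")
```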

For the GMS (sink) configuration, I am passing the server and token, but the error is about the AWS access key.

Do I need to pass the AWS credentials (aws_access_key_id, aws_secret_access_key, etc.) in the recipe? boto3 should fetch them automatically when the script runs inside AWS Glue.
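On the recipe question: the Glue source config accepts `aws_access_key_id`, `aws_secret_access_key` and `aws_session_token` next to `aws_region`, while the GMS `server` and `token` belong to the `datahub-rest` sink and play no part in AWS authentication. A minimal sketch, assuming the recipe is built in Python instead of loaded from YAML; `build_glue_recipe` and `run_recipe` are hypothetical names, and where the key values come from (a job parameter, a secrets store) is left to the caller and not shown.

```python
from typing import Any, Dict, Optional


def build_glue_recipe(
    aws_region: str,
    gms_server: str,
    gms_token: Optional[str] = None,
    aws_access_key_id: Optional[str] = None,
    aws_secret_access_key: Optional[str] = None,
    aws_session_token: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a Glue -> DataHub REST recipe as a dict.

    AWS keys are added only when both the key id and secret are given;
    otherwise the source is left to resolve credentials itself (boto3's chain).
    """
    source_config: Dict[str, Any] = {"aws_region": aws_region}
    if aws_access_key_id and aws_secret_access_key:
        source_config["aws_access_key_id"] = aws_access_key_id
        source_config["aws_secret_access_key"] = aws_secret_access_key
        if aws_session_token:
            source_config["aws_session_token"] = aws_session_token

    # Sink auth (GMS server/token) is independent of the AWS credentials above.
    sink_config: Dict[str, Any] = {"server": gms_server}
    if gms_token:
        sink_config["token"] = gms_token

    return {
        "source": {"type": "glue", "config": source_config},
        "sink": {"type": "datahub-rest", "config": sink_config},
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    # Imported here so build_glue_recipe can be used without DataHub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()
```

Keeping the recipe builder separate from the runner makes it easy to inspect which auth keys actually ended up in the source config before ingestion starts.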

The DataHub pipeline is not able to fetch the AWS credentials.
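A complementary check is to ask AWS STS which principal the job's Python process is authenticated as, using `get_caller_identity()`. If it returns the Glue job's role ARN, credentials are reachable from the process; if botocore raises `NoCredentialsError`, the default chain is empty, which matches the DataHub traceback. A minimal sketch, assuming boto3 is available; `describe_caller` and `describe_current_caller` are hypothetical helper names.

```python
from typing import Any, Optional


def describe_caller(sts_client: Any) -> str:
    """Return the ARN the process is authenticated as.

    `sts_client` is anything with a boto3-style get_caller_identity() method.
    """
    identity = sts_client.get_caller_identity()
    return identity["Arn"]


def describe_current_caller(region: str = "eu-west-1") -> Optional[str]:
    # Imported here so describe_caller can be exercised without boto3 installed.
    import boto3
    from botocore.exceptions import NoCredentialsError

    try:
        return describe_caller(boto3.client("sts", region_name=region))
    except NoCredentialsError:
        # Nothing in the default chain: the same root cause as the traceback.
        return None
```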