Configuring SLF4J to Emit Structured JSON Logs in DataHub Pods

Original Slack Thread

We are currently running DataHub 0.12.1 on Kubernetes, with gms, frontend, mce-consumer, and mae-consumer pods.

Is there any way to configure SLF4J to emit structured JSON logs from these pods, rather than the simple line-based format?

We would prefer to add a suitable logging configuration at runtime, but we build our own containers so we could also configure it at the build stage, if that is the only option.

Thanks.

I think building your own container is the only option to change the log format for gms and frontend.

gms:
https://github.com/datahub-project/datahub/blob/master/metadata-service/war/src/main/resources/logback.xml

frontend:
https://github.com/datahub-project/datahub/blob/master/datahub-frontend/conf/logback.xml
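
For reference, here is a minimal sketch of what a JSON-emitting logback.xml could look like, assuming the logstash-logback-encoder library is added to the container image (it is not bundled with DataHub by default). The appender name is illustrative, and the real logback.xml files linked above define additional appenders and per-package loggers that you would want to carry over rather than simply replace:

```xml
<configuration>
  <!-- Console appender that writes one JSON object per log event instead of the
       default pattern layout. -->
  <appender name="CONSOLE_JSON" class="ch.qos.logback.core.ConsoleAppender">
    <!-- LogstashEncoder is provided by the logstash-logback-encoder library and
         emits timestamp, level, logger, thread, message, MDC and stack traces. -->
    <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
  </appender>

  <root level="INFO">
    <appender-ref ref="CONSOLE_JSON"/>
  </root>
</configuration>
```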

Ah, thanks for that pointer <@U0445MUD81W>, I will look into it. I also see the mae-consumer, mce-consumer, and upgrade containers mentioned. I think that our observability team will be pleased if we can get JSON logs out of DataHub.

[attached screenshot: image.png]

Yes, if you are deploying mae-consumer and mce-consumer as separate pods, you may have to change their logging configuration as well.
The upgrade container is optional; it is only used for the upgrade process and might not produce the logs needed for application monitoring or observability.

For reference, we have our own public issue tracker for this request here: https://phabricator.wikimedia.org/T363856

We had an issue recently where the mae-consumer started being extremely verbose after some kind of Kafka message corruption, and the resulting log torrent caused a backlog in our ELK-based logging stack. I have been asked to look into the possibility of getting JSON logs out.
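
As a side note on the verbosity problem: independently of the output format, logback can cap the level of a single chatty logger without changing the root level, which limits the blast radius of this kind of log torrent. A minimal sketch follows; the package name below is a placeholder, not the actual logger that caused the flood:

```xml
<!-- Raise the threshold for one noisy package only; everything else keeps the
     root level configured elsewhere in logback.xml. The package name here is a
     placeholder for illustration. -->
<logger name="com.example.noisy.consumer" level="WARN"/>
```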

Thanks again. I’ll let you know how we get on.