Handling Trailing Whitespace in URNs During LDAP Ingestion

Original Slack Thread

<@U06TM7M40RK> in the ingestion recipe for LDAP source, I’m getting the below error -
{'error': 'Unable to emit metadata to DataHub GMS: Error: cannot provide an URN with leading or trailing whitespace', 'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'message': 'Error: cannot provide an URN with leading or trailing whitespace', 'status': 400, 'id': 'urn:li:corpuser:cisweingest01 '}}
for certain users which have trailing whitespace. I want to modify the recipe to exclude such IDs from the ingestion

DataHub GMS rejects any URN with leading or trailing whitespace, which is why the user ID 'cisweingest01 ' (note the trailing space) fails with a 400 error. The sources available for this answer do not mention a built-in LDAP recipe option that filters out such IDs. As a workaround, you can trim or exclude the affected entries before they reach the ingestion pipeline.

Here are some steps you can take:

  1. Preprocess the Data: Before ingestion, ensure that any user IDs with trailing whitespace are either trimmed or excluded. This can be done by modifying the LDAP source code or by adding a preprocessing step in your ingestion pipeline.

  2. Modify the LDAP Source Code: If you can change the code of the DataHub LDAP ingestion source, add a check where the URN is constructed so that leading and trailing whitespace is stripped from the user ID before the URN is built.

  3. Use a Custom Script: If modifying the source code is not feasible, consider using a custom script to preprocess the data. This script can iterate over the user entries and remove any trailing whitespace before passing the data to the ingestion pipeline.

  4. Regex Filtering: If your ingestion framework supports regex filtering, you can use a regex pattern to exclude entries with trailing whitespace. However, this feature is not explicitly mentioned in the provided data sources for LDAP ingestion.
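As a rough illustration of steps 1 to 4, here is a minimal Python sketch of the trim-or-exclude logic. It is standalone and does not call any DataHub API: the function names (has_edge_whitespace, partition_user_ids, make_corpuser_urn) are hypothetical, and where you hook this in (a patched LDAP source or a separate preprocessing script) is an assumption that depends on your setup. Only the urn:li:corpuser:<id> shape is taken from the error message above.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical helper pattern: matches an ID that starts or ends with whitespace.
_EDGE_WHITESPACE = re.compile(r"^\s|\s$")


def has_edge_whitespace(user_id: str) -> bool:
    """Return True if the ID has leading or trailing whitespace."""
    return bool(_EDGE_WHITESPACE.search(user_id))


def partition_user_ids(user_ids: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split IDs into (clean, rejected) so rejected ones can be logged and skipped."""
    clean: List[str] = []
    rejected: List[str] = []
    for uid in user_ids:
        # Empty IDs are rejected too, since they cannot form a usable URN.
        if not uid or has_edge_whitespace(uid):
            rejected.append(uid)
        else:
            clean.append(uid)
    return clean, rejected


def make_corpuser_urn(user_id: str, trim: bool = True) -> str:
    """Build a corpuser URN, trimming whitespace first unless trim=False."""
    if trim:
        user_id = user_id.strip()
    if not user_id or has_edge_whitespace(user_id):
        raise ValueError(f"invalid user id for URN: {user_id!r}")
    return f"urn:li:corpuser:{user_id}"


if __name__ == "__main__":
    ids = ["alice", "cisweingest01 ", " bob"]
    clean, rejected = partition_user_ids(ids)
    print("ingest:", [make_corpuser_urn(u) for u in clean])
    print("skipped:", rejected)
```

Excluding (partition_user_ids) matches the original request to skip such IDs. Trimming (make_corpuser_urn with trim=True) keeps the users but changes the ID that ends up in DataHub, so pick one approach deliberately.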

If you need more detailed guidance on implementing these steps, please let me know!
