Troubleshooting OpenAPI Errors: Structured Properties and Dataset Retrieval

Original Slack Thread

Hi, we have DataHub (v0.13.0) running in AWS (EKS, MSK, RDS, OpenSearch). I added a structured property to a dataset, following the example in the docs (https://datahubproject.io/docs/api/openapi/openapi-structured-properties/). I can retrieve the dataset properties with the GraphQL API, but the OpenAPI call GET /v2/entity/dataset/{urn} fails with the error below. However, GET /v2/entity/{entityName}/{entityUrn} works.

```{
  "cause2": "java.lang.RuntimeException: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize value of type `java.util.ArrayList<java.lang.Object>` from Object value (token `JsonToken.START_OBJECT`)\n at [Source: (String)\"{\"properties\":[{\"propertyUrn\":\"urn:li:structuredProperty:my.test.MyProperty01\",\"values\":[{\"string\":\"foo\"}]}],\"__type\":\"StructuredProperties\"}\"; line: 1, column: 119] (through reference chain: io.datahubproject.openapi.generated.StructuredProperties$StructuredPropertiesBuilder[\"properties\"]->java.util.ArrayList[0]->io.datahubproject.openapi.generated.StructuredPropertyValueAssignment$StructuredPropertyValueAssignmentBuilder[\"values\"]->java.util.ArrayList[0])",
  "cause1": "java.lang.RuntimeException: Failed to batch get entities with urns: [urn:li:dataset:(urn:li:dataPlatform:redshift,test_db_2.test_schema_2.test_table_2,PROD)], projectedAspects: [editableSchemaMetadata, container, testResults, siblings, access, datasetUpstreamLineage, viewProperties, datasetProperties, globalTags, browsePathsV2, embed, schemaMetadata, datasetKey, domains, subTypes, datasetProfile, deprecation, browsePaths, incidentsSummary, datasetUsageStatistics, structuredProperties, ownership, dataPlatformInstance, datasetDeprecation, editableDatasetProperties, glossaryTerms, institutionalMemory, upstreamLineage, operation, forms, status]",
  "servlet": "openapiServlet",
  "cause3": "com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize value of type `java.util.ArrayList<java.lang.Object>` from Object value (token `JsonToken.START_OBJECT`)\n at [Source: (String)\"{\"properties\":[{\"propertyUrn\":\"urn:li:structuredProperty:my.test.MyProperty01\",\"values\":[{\"string\":\"foo\"}]}],\"__type\":\"StructuredProperties\"}\"; line: 1, column: 119] (through reference chain: io.datahubproject.openapi.generated.StructuredProperties$StructuredPropertiesBuilder[\"properties\"]->java.util.ArrayList[0]->io.datahubproject.openapi.generated.StructuredPropertyValueAssignment$StructuredPropertyValueAssignmentBuilder[\"values\"]->java.util.ArrayList[0])",
  "cause0": "jakarta.servlet.ServletException: Request processing failed: java.lang.RuntimeException: Failed to batch get entities with urns: [urn:li:dataset:(urn:li:dataPlatform:redshift,test_db_2.test_schema_2.test_table_2,PROD)], projectedAspects: [editableSchemaMetadata, container, testResults, siblings, access, datasetUpstreamLineage, viewProperties, datasetProperties, globalTags, browsePathsV2, embed, schemaMetadata, datasetKey, domains, subTypes, datasetProfile, deprecation, browsePaths, incidentsSummary, datasetUsageStatistics, structuredProperties, ownership, dataPlatformInstance, datasetDeprecation, editableDatasetProperties, glossaryTerms, institutionalMemory, upstreamLineage, operation, forms, status]",
  "message": "jakarta.servlet.ServletException: Request processing failed: java.lang.RuntimeException: Failed to batch get entities with urns: [urn:li:dataset:(urn:li:dataPlatform:redshift,test_db_2.test_schema_2.test_table_2,PROD)], projectedAspects: [editableSchemaMetadata, container, testResults, siblings, access, datasetUpstreamLineage, viewProperties, datasetProperties, globalTags, browsePathsV2, embed, schemaMetadata, datasetKey, domains, subTypes, datasetProfile, deprecation, browsePaths, incidentsSummary, datasetUsageStatistics, structuredProperties, ownership, dataPlatformInstance, datasetDeprecation, editableDatasetProperties, glossaryTerms, institutionalMemory, upstreamLineage, operation, forms, status]",
  "url": "/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aredshift%2Ctest_db_2.test_schema_2.test_table_2%2CPROD%29",
  "status": "500"
}```
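
For anyone reproducing the failing call, here is a minimal Python sketch of how the percent-encoded path in the `url` field above can be built from the dataset URN. The helper name `entity_path` is hypothetical and not part of any DataHub client; only the path shape and the URN come from this thread.

```python
from urllib.parse import quote

# URN of the dataset, copied from the error message in this thread.
DATASET_URN = (
    "urn:li:dataset:(urn:li:dataPlatform:redshift,"
    "test_db_2.test_schema_2.test_table_2,PROD)"
)


def entity_path(entity_name: str, urn: str) -> str:
    """Build an /openapi/v2/entity/{entityName}/{entityUrn} path (hypothetical helper)."""
    # safe="" makes quote() percent-encode ':', '(', ')' and ',' as well.
    return f"/openapi/v2/entity/{entity_name}/{quote(urn, safe='')}"


print(entity_path("dataset", DATASET_URN))
```

The printed path should match the `url` value in the error response, which makes it easy to confirm that the 500 is not caused by a mis-encoded URN.
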
Also, I tried to remove the structured property using the curl command shown in the same document, and it failed with Error 415 Unsupported Media Type. See the request log below (slightly redacted for privacy):

```* Preparing request to https://data-hub-test.ic1.org/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aredshift%2Ctest_db_2.test_schema_2.test_table_2%2CPROD%29/structuredProperties
* Current time is 2024-04-03T20:25:06.472Z
* Enable automatic URL encoding
* Using default HTTP version
* Enable timeout of 30000ms
* Disable SSL validation
* Found bundle for host data-hub-test.ic1.org: 0x1200582f780 [can multiplex]
* Re-using existing connection! (#9) with host data-hub-test.ic1.org
* Connected to data-hub-test.ic1.org (10.233.52.234) port 443 (#9)
* Using Stream ID: 5 (easy handle 0x12008571600)
* TLSv1.2 (OUT), TLS header, Supplemental data (23):

> PATCH /openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aredshift%2Ctest_db_2.test_schema_2.test_table_2%2CPROD%29/structuredProperties HTTP/2
> Host: data-hub-test.ic1.org
> user-agent: insomnia/8.6.0
> accept: application/json
> content-type: application/json-patch+json
> authorization: Bearer <redacted>
> content-length: 183

* TLSv1.2 (OUT), TLS header, Supplemental data (23):

| {
| 	"patch": [
| 		{
| 			"op": "remove",
| 			"path": "/properties/urn:li:structuredProperty:my.test.MyProperty01"
| 		}
| 	],
| 	"arrayPrimaryKeys": {
| 		"properties": [
| 			"propertyUrn"
| 		]
| 	}
| }

* We are completely uploaded and fine
* TLSv1.2 (IN), TLS header, Supplemental data (23):

< HTTP/2 415
< date: Wed, 03 Apr 2024 20:25:06 GMT
< content-type: application/octet-stream
< content-length: 0
< accept: application/json-patch+json
< server: Jetty (11.0.19)
< accept-patch: application/json-patch+json```
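
The same PATCH request can be reconstructed in Python, which may help when comparing clients. This is only a sketch that builds the request without sending it; it does not explain the 415. The helper name `build_remove_request` is hypothetical, and the host, headers, and body are copied from the log above.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://data-hub-test.ic1.org"  # host taken from the log above
DATASET_URN = (
    "urn:li:dataset:(urn:li:dataPlatform:redshift,"
    "test_db_2.test_schema_2.test_table_2,PROD)"
)
PROPERTY_URN = "urn:li:structuredProperty:my.test.MyProperty01"


def build_remove_request(token: str) -> Request:
    """Build, but do not send, the PATCH request from the log (hypothetical helper)."""
    body = {
        "patch": [{"op": "remove", "path": f"/properties/{PROPERTY_URN}"}],
        "arrayPrimaryKeys": {"properties": ["propertyUrn"]},
    }
    url = (
        f"{BASE_URL}/openapi/v2/entity/dataset/"
        f"{quote(DATASET_URN, safe='')}/structuredProperties"
    )
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json-patch+json",
            "Authorization": f"Bearer {token}",
        },
    )


req = build_remove_request("<redacted>")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out here so the sketch has no side effects.
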

Thoughts?

When I call openapi/v2/entity/structuredProperty, it appears to return all of the values of the structured property:

```	"value": {
		"allowedValues": [
			{
				"value": {
					"string": "foo"
				},
				"description": "test foo value"
			},
			{
				"value": {
					"string": "bar"
				},
				"description": "test bar value"
			}
		],
		"qualifiedName": "my.test.MyProperty01",
		"displayName": "MyProperty01",
		"valueType": "urn:li:dataType:datahub.string",
		"description": "test description",
		"entityTypes": [
			"urn:li:entityType:datahub.dataset"
		],
		"cardinality": "MULTIPLE"
	},```

However, when I call `openapi/v2/entity/{entityName}/{entityUrn}` for the dataset with the structured property attached, it returns only *one* of the values of the attached structured property:
```   "structuredProperties": {
      "value": {
        "properties": [
          {
            "propertyUrn": "urn:li:structuredProperty:my.test.MyProperty01",
            "values": [
              {
                "string": "foo"
              }
            ]
          }
        ]
      }
    },```
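
One possible reading of the two responses, based only on their field names: the first lists the `allowedValues` of the property definition, while the second lists the `values` assigned to this particular dataset. The Python sketch below extracts both from trimmed copies of the payloads so they can be compared side by side. It assumes every value is carried under a `string` key, as in the output above, and both function names are hypothetical.

```python
PROPERTY_URN = "urn:li:structuredProperty:my.test.MyProperty01"

# Trimmed copy of the structuredProperty response quoted above.
definition = {
    "value": {
        "allowedValues": [
            {"value": {"string": "foo"}, "description": "test foo value"},
            {"value": {"string": "bar"}, "description": "test bar value"},
        ],
        "qualifiedName": "my.test.MyProperty01",
        "cardinality": "MULTIPLE",
    }
}

# Trimmed copy of the dataset's structuredProperties aspect quoted above.
dataset_aspect = {
    "value": {
        "properties": [
            {"propertyUrn": PROPERTY_URN, "values": [{"string": "foo"}]}
        ]
    }
}


def allowed_values(defn: dict) -> list:
    """Values the definition permits (assumes string-typed values)."""
    return [entry["value"]["string"] for entry in defn["value"]["allowedValues"]]


def assigned_values(aspect: dict, property_urn: str) -> list:
    """Values attached to the dataset for one property (assumes string-typed values)."""
    for prop in aspect["value"]["properties"]:
        if prop["propertyUrn"] == property_urn:
            return [value["string"] for value in prop["values"]]
    return []


print(allowed_values(definition))
print(assigned_values(dataset_aspect, PROPERTY_URN))
```

If only `foo` was assigned to the dataset, the difference between the two lists would be expected under this reading; that is an assumption, since the thread does not show the original assignment request.
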
When using GraphQL to retrieve the dataset with the structured property attached, the values are not accessible.
Here is the query; I could not find a way to target the actual values. Note that under _structuredProperties.properties.structuredProperty.definition.allowedValues.value_ there are no options except `__typename`. I would have expected the values to be returned here.
```{
  dataset(urn:"urn:li:dataset:(urn:li:dataPlatform:redshift,test_db_2.test_schema_2.test_table_2,PROD)"){
    type
    properties{
      name
      description
      qualifiedName
    }
    structuredProperties{
      properties{
        structuredProperty{
          urn
          definition{
            qualifiedName
            displayName
            description
            cardinality
            allowedValues{
              description
              value{
                __typename
              }
            }
          }
        }
      }
    }
  }
}```
Here is the output. Notice that the descriptions are there, but not the values under _allowedValues.value_.
```{
  "data": {
    "dataset": {
      "type": "DATASET",
      "properties": {
        "name": "test_table_2",
        "description": "Test dataset created via OpenAPI",
        "qualifiedName": "test_db_2.test_schema_2.test_table_2"
      },
      "structuredProperties": {
        "properties": [
          {
            "structuredProperty": {
              "urn": "urn:li:structuredProperty:my.test.MyProperty01",
              "definition": {
                "qualifiedName": "my.test.MyProperty01",
                "displayName": "MyProperty01",
                "description": "test description",
                "cardinality": "MULTIPLE",
                "allowedValues": [
                  {
                    "description": "test foo value",
                    "value": {
                      "__typename": "StringValue"
                    }
                  },
                  {
                    "description": "test bar value",
                    "value": {
                      "__typename": "StringValue"
                    }
                  }
                ]
              }
            }
          }
        ]
      }
    }
  },
  "extensions": {}
}```
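
Since the output reports `"__typename": "StringValue"`, the `value` field may be a union or interface type, in which case GraphQL only exposes its concrete fields through an inline fragment. The Python sketch below holds such a query and a small extractor for its response. The field name `stringValue` is an assumption that does not appear anywhere in this thread and should be checked against the schema; `allowed_strings` is a hypothetical helper.

```python
# Query using an inline fragment on the concrete type reported by __typename.
QUERY = """
{
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:redshift,test_db_2.test_schema_2.test_table_2,PROD)") {
    structuredProperties {
      properties {
        structuredProperty {
          urn
          definition {
            allowedValues {
              description
              value {
                __typename
                # ASSUMPTION: StringValue exposes a field named stringValue.
                ... on StringValue { stringValue }
              }
            }
          }
        }
      }
    }
  }
}
"""


def allowed_strings(response: dict) -> list:
    """Collect string values from a response shaped like the output above."""
    found = []
    properties = response["data"]["dataset"]["structuredProperties"]["properties"]
    for prop in properties:
        definition = prop["structuredProperty"]["definition"]
        for allowed in definition["allowedValues"]:
            value = allowed["value"]
            # Skip entries of other types or without the assumed field.
            if value.get("__typename") == "StringValue" and "stringValue" in value:
                found.append(value["stringValue"])
    return found
```

The extractor tolerates responses where the assumed field is missing, so it returns an empty list for the output shown above rather than raising an error.
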
I am trying to understand why there are differences across the API calls and what one would or should expect to be returned. Please advise.