Adding platform_instance capability to a custom data platform for dataset extraction and configuration updates

Original Slack Thread

Hi! Is there a possibility to add platform_instance capability to a custom data platform so it extracts the instance from the dataset urn?

```
[
  {
    "auditHeader": null,
    "proposedSnapshot": {
      "com.linkedin.pegasus2avro.metadata.snapshot.DataPlatformSnapshot": {
        "urn": "urn:li:dataPlatform:Backstage",
        "aspects": [
          {
            "com.linkedin.pegasus2avro.dataplatform.DataPlatformInfo": {
              "datasetNameDelimiter": ".",
              "name": "Backstage",
              "type": "RELATIONAL_DB",
              "logoUrl": "https://github.com/backstage/backstage/blob/master/microsite/static/logo_assets/png/Backstage_Identity_Assets_Artwork_RGB_04_Icon_Teal.png?raw=true"
            }
          }
        ]
      }
    },
    "proposedDelta": null
  }
]
```
This is the configuration JSON we use to ingest the custom platform. We're running DataHub 0.10.5. The dataset URN has the same structure as for the MySQL data platform.
datasetURN: (urn:li:dataPlatform:Backstage,GitHub.<db>.<table>,CORP)
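
For reference, the URN above can be written out in full with plain string formatting. This is a minimal sketch in Java, assuming the usual `urn:li:dataset:` prefix (omitted in the message above), that the leading `GitHub` segment is the intended platform instance, and using the `.` delimiter from the platform config. The helper name `DatasetUrns.datasetUrn` and the values `mydb` and `mytable` (standing in for `<db>` and `<table>`) are hypothetical.

```java
// Hypothetical helper for illustration; it only formats strings and is not part of DataHub.
final class DatasetUrns {

    // Builds urn:li:dataset:(urn:li:dataPlatform:{platform},{instance}.{db}.{table},{env})
    static String datasetUrn(String platform, String instance, String db, String table, String env) {
        // Join the name segments with "." to match datasetNameDelimiter in the platform config.
        String name = String.join(".", instance, db, table);
        return "urn:li:dataset:(urn:li:dataPlatform:" + platform + "," + name + "," + env + ")";
    }

    public static void main(String[] args) {
        // Placeholder db and table names (assumptions, not from the thread).
        System.out.println(datasetUrn("Backstage", "GitHub", "mydb", "mytable", "CORP"));
    }
}
```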

(two image attachments)

The first image is missing the platform_instance, and the second shows that some parsing of the URN does take place.

It should work. <@U04NUK1721W> can provide input.

```
        val schemaMetadata = SchemaMetadata()
            .apply {
                setVersion(0L)
                schemaName = "<not used>"
                platform = DataPlatformUrn(databaseMetadataProperties.defaultPlatform)
                platformSchema = SchemaMetadata.PlatformSchema.create(OtherSchema().apply { rawSchema = "" })
                created = AuditStamp().setTime(Instant.now().toEpochMilli())
                    .setActor(Urn.createFromString("urn:li:corpuser:ingestion"))
                hash = ""
                fields = SchemaFieldArray(datasetColumns.map {
                    SchemaField().apply {
                        fieldPath = it
                        type = SchemaFieldDataType().setType(SchemaFieldDataType.Type.create(FixedType()))
                        nativeDataType = "Field"
                    }
                })
            }
        val mcpw = MetadataChangeProposalWrapper.builder()
            .entityType("dataset")
            .entityUrn(datasetUrn)
            .upsert()
            .aspect(schemaMetadata)
            .build()

        restEmitterClient.emit(mcpw, object : Callback {
            override fun onCompletion(response: MetadataWriteResponse?) {
                if (response?.isSuccess == false)
                    logger.error("Failed to emit to $datasetUrn due to " + response.responseContent.toString())
            }

            override fun onFailure(ex: Throwable?) {
                logger.error("Failed to emit to $datasetUrn due to " + ex?.message)
            }
        })
        return true
    }
```
This is how we create the dataset.
```
{
    search(input: {type: DATA_PLATFORM_INSTANCE, query: "", count: 10}) {
        start
        count
        total
        searchResults {
            entity {
                urn
                type
                ... on DataPlatformInstance {
                    urn
                    type
                    instanceId
                }
            }
        }
    }
}
```
This query does not return any DataPlatformInstance, not even for MySQL.
```
{
    searchAcrossEntities(
        input: {types: [], query: "", start: 0, count: 10, orFilters: []}
    ) {
        start
        count
        total
        facets {
            aggregations {
                value
                count
                entity {
                    ... on DataPlatformInstance {
                        urn
                        type
                        instanceId
                    }
                }
            }
            field
            displayName
        }
    }
}
```
However, this query does show aggregated results for the MySQL platform instances.

What changes in the UI do you want to see regarding the platform instance? In any case, you should update your createDataset method to also emit a DataPlatformInstance aspect for your dataset URN. I’m not familiar with the Java emitter, but I expect you should be able to do something like:

```
    ...
    .aspect(new DataPlatformInstance().setPlatform(platformUrn).setInstance(instanceUrn))
    .build();
```
Note that the `platformUrn` should look like `urn:li:dataPlatform:{platformName}` and the `instanceUrn` should look like `"urn:li:dataPlatformInstance:({platformUrn},{instanceName})"`
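
The two URN shapes described in the note can be assembled with plain string formatting. This is a minimal sketch in Java. The class and method names (`PlatformInstanceUrns`, `platformUrn`, `instanceUrn`) are hypothetical, and `GitHub` is assumed to be the instance name, taken from the dataset URN earlier in the thread. The resulting strings would presumably still need to be converted to URN objects (for example with `Urn.createFromString`, as in the dataset-creation code) before being passed to the aspect setters.

```java
// Hypothetical helper for illustration; it only formats strings and is not part of the DataHub client.
final class PlatformInstanceUrns {

    // Builds urn:li:dataPlatform:{platformName}
    static String platformUrn(String platformName) {
        return "urn:li:dataPlatform:" + platformName;
    }

    // Builds urn:li:dataPlatformInstance:({platformUrn},{instanceName})
    static String instanceUrn(String platformName, String instanceName) {
        return "urn:li:dataPlatformInstance:(" + platformUrn(platformName) + "," + instanceName + ")";
    }

    public static void main(String[] args) {
        System.out.println(platformUrn("Backstage"));
        // prints "urn:li:dataPlatform:Backstage"
        System.out.println(instanceUrn("Backstage", "GitHub"));
        // prints "urn:li:dataPlatformInstance:(urn:li:dataPlatform:Backstage,GitHub)"
    }
}
```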

Thank you Andrew! It fixed the problem.