In the Java SDK for Datahub, how do i programmatically create a Dataset URN and emit it to Datahub GMS?
To programmatically create a Dataset URN and emit it to DataHub GMS using the Java SDK, you can follow these steps:
- Create a Dataset URN: use the UrnUtils.toDatasetUrn method, which takes the data platform, dataset name, and environment as parameters (a URN-only sketch follows this list).
- Prepare the metadata: build the dataset's schema with the SchemaMetadata and SchemaField classes, setting the schema name, platform, version, and fields.
- Create a MetadataChangeProposalWrapper: wrap the schema metadata in a metadata change proposal, specifying the entity type, entity URN, and the aspect to be upserted.
- Emit to DataHub GMS: use the RestEmitter to send the metadata change proposal to the DataHub GMS server.
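For the first step on its own, here is a minimal sketch (assuming the com.linkedin.common.urn and com.linkedin.common classes shipped with the DataHub Java client); both calls below produce the same URN string, urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD):

import com.linkedin.common.FabricType;
import com.linkedin.common.urn.DataPlatformUrn;
import com.linkedin.common.urn.DatasetUrn;
import com.linkedin.common.urn.UrnUtils;

public class DatasetUrnSketch {
  public static void main(String[] args) {
    // Shorthand: platform name, dataset name, environment (fabric).
    DatasetUrn urn = UrnUtils.toDatasetUrn("hive", "fct_users_deleted", "PROD");

    // Equivalent typed constructor.
    DatasetUrn sameUrn =
        new DatasetUrn(new DataPlatformUrn("hive"), "fct_users_deleted", FabricType.PROD);

    System.out.println(urn);     // urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)
    System.out.println(sameUrn); // same output
  }
}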
Here is a complete, end-to-end code example based on the provided data sources:
import com.linkedin.common.AuditStamp;
import com.linkedin.common.urn.CorpuserUrn;
import com.linkedin.common.urn.DataPlatformUrn;
import com.linkedin.common.urn.DatasetUrn;
import com.linkedin.common.urn.UrnUtils;
import com.linkedin.schema.OtherSchema;
import com.linkedin.schema.SchemaField;
import com.linkedin.schema.SchemaFieldArray;
import com.linkedin.schema.SchemaFieldDataType;
import com.linkedin.schema.SchemaMetadata;
import com.linkedin.schema.StringType;
import datahub.client.MetadataWriteResponse;
import datahub.client.rest.RestEmitter;
import datahub.event.MetadataChangeProposalWrapper;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class DatasetAdd {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {

    // Create the Dataset URN (platform, dataset name, environment)
    DatasetUrn datasetUrn = UrnUtils.toDatasetUrn("hive", "fct_users_deleted", "PROD");
    CorpuserUrn userUrn = new CorpuserUrn("ingestion");
    AuditStamp lastModified = new AuditStamp().setTime(1640692800000L).setActor(userUrn);

    // Prepare the schema metadata aspect
    SchemaMetadata schemaMetadata = new SchemaMetadata()
        .setSchemaName("customer")
        .setPlatform(new DataPlatformUrn("hive"))
        .setVersion(0L)
        .setHash("") // required field; empty if you don't compute a schema hash
        .setPlatformSchema(SchemaMetadata.PlatformSchema.create(
            new OtherSchema().setRawSchema("__insert raw schema here__")))
        .setLastModified(lastModified);

    SchemaFieldArray fields = new SchemaFieldArray();
    fields.add(new SchemaField()
        .setFieldPath("address.zipcode")
        .setType(new SchemaFieldDataType()
            .setType(SchemaFieldDataType.Type.create(new StringType())))
        .setNativeDataType("VARCHAR(50)")
        .setDescription("This is the zipcode of the address.")
        .setLastModified(lastModified));
    schemaMetadata.setFields(fields);

    // Wrap the aspect in a MetadataChangeProposalWrapper
    MetadataChangeProposalWrapper mcpw = MetadataChangeProposalWrapper.builder()
        .entityType("dataset")
        .entityUrn(datasetUrn)
        .upsert()
        .aspect(schemaMetadata)
        .build();

    // Emit to DataHub GMS over REST
    String token = ""; // Add your token here
    RestEmitter emitter = RestEmitter.create(b -> b.server("http://localhost:8080").token(token));
    Future<MetadataWriteResponse> response = emitter.emit(mcpw, null);
    System.out.println(response.get().getResponseContent());
  }
}
This example demonstrates how to create a dataset URN, prepare the schema metadata, wrap it in a metadata change proposal, and emit it to the DataHub GMS using the Java SDK.
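As a variation (a sketch, not taken verbatim from the linked example): if GMS runs locally without authentication, RestEmitter.createWithDefaults() targets http://localhost:8080 with no token, and because the emitter implements Closeable you can scope it with try-with-resources. The helper name emitLocally below is just for illustration:

import datahub.client.MetadataWriteResponse;
import datahub.client.rest.RestEmitter;
import datahub.event.MetadataChangeProposalWrapper;

public class EmitWithDefaults {

  // Emits an already-built proposal to a local GMS using the client's default settings.
  static String emitLocally(MetadataChangeProposalWrapper mcpw) throws Exception {
    // createWithDefaults() points at http://localhost:8080 with no auth token.
    try (RestEmitter emitter = RestEmitter.createWithDefaults()) {
      MetadataWriteResponse response = emitter.emit(mcpw, null).get();
      return response.getResponseContent();
    }
  }
}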
Would you like more details on any specific part of this process?
Sources:
- DatasetAdd.java: https://github.com/datahub-project/datahub/blob/master/metadata-integration/java/examples/src/main/java/io/datahubproject/examples/DatasetAdd.java
- Dataset | DataHub Dataset