<@U06TM7M40RK> Does the Python SDK support setting custom or structured properties on a dataset field?
Summary:
Yes, in part. The DataHub Python SDK supports setting both custom properties and structured properties at the dataset level. At the field level (i.e., a column/schema field), only structured properties are supported; custom properties are available only at the dataset (asset) level.
Custom Properties
- Custom properties are key-value pairs (strings) that can be attached to datasets, but not to individual fields/columns.
- You can set custom properties on a dataset using the Python SDK, as shown in the <https://datahubproject.io/docs/0.14.0/api/tutorials/custom-properties|official API tutorial> and <https://datahubproject.io/docs/0.14.0/features/feature-guides/properties|feature guide>.
- Example for dataset-level custom properties:
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.metadata.schema_classes import DatasetPropertiesClass
from datahub.emitter.rest_emitter import DatahubRestEmitter

# Connect to the DataHub REST endpoint.
emitter = DatahubRestEmitter("http://localhost:8080")

# Build the URN of the dataset that will receive the properties.
dataset_urn = make_dataset_urn(platform="hive", name="example_dataset", env="PROD")

# Custom properties are a string-to-string map on the dataset properties aspect.
dataset_properties = DatasetPropertiesClass(
    description="A sample dataset",
    customProperties={"customKey1": "customValue1", "customKey2": "customValue2"},
)

# Wrap the aspect in a change proposal and emit it.
metadata_event = MetadataChangeProposalWrapper(
    entityUrn=dataset_urn,
    aspect=dataset_properties,
)
emitter.emit(metadata_event)
- Custom properties cannot be set on schema fields/columns. Attempts to do so will result in errors, as SchemaMetadataClass does not accept a customProperties argument (see the Slack thread listed under Sources).
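Because custom property values are strings, metadata that starts out as numbers, booleans, or lists has to be stringified before it goes into customProperties. Here is a minimal sketch of one way to do that using only the standard library. The helper name `to_custom_properties` and the choice to JSON-encode non-string values are assumptions for illustration; they are not part of the DataHub SDK.

```python
import json
from typing import Any, Dict


def to_custom_properties(metadata: Dict[str, Any]) -> Dict[str, str]:
    """Coerce arbitrary metadata into a str -> str mapping.

    Hypothetical helper: strings pass through unchanged, None values are
    dropped, and everything else is JSON-encoded (an illustrative choice).
    """
    properties: Dict[str, str] = {}
    for key, value in metadata.items():
        if value is None:
            continue  # skip missing values rather than storing "null"
        if isinstance(value, str):
            properties[str(key)] = value
        else:
            properties[str(key)] = json.dumps(value, sort_keys=True)
    return properties


custom_properties = to_custom_properties(
    {
        "owner_team": "analytics",
        "retention_days": 30,
        "pii": False,
        "tags": ["a", "b"],
        "notes": None,
    }
)
```

The resulting dictionary could then be passed as `customProperties=custom_properties` when building the `DatasetPropertiesClass` in the dataset-level example.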
Structured Properties
- Structured properties are more advanced, typed, and validated properties that can be attached to both datasets and individual fields (columns).
- You can define structured properties and then associate them with a dataset or a specific field using YAML and the CLI, or programmatically via the SDK or OpenAPI.
- Example YAML for associating a structured property with a field:
- urn: urn:li:dataset:(urn:li:dataPlatform:postgres,mydb.myschema.mytable,PROD)
  schema:
    fields:
      - id: my_column
        structured_properties:
          io.acryl.privacy.retentionTime: 30
  structured_properties:
    io.acryl.privacy.retentionTime: 30
Then upsert with:
datahub dataset upsert -f dataset.yaml
(<https://datahubproject.io/docs/0.14.0/api/tutorials/structured-properties#set-structured-property-to-a-dataset|Structured Properties API Tutorial>)
- Python SDK: While most SDK examples focus on dataset-level structured properties, the YAML/CLI approach is the recommended and documented way to set structured properties on fields. There is currently no direct Python SDK helper for setting structured properties on fields, but you can use the CLI or OpenAPI for this purpose (<https://datahubproject.io/docs/0.14.0/api/tutorials/structured-properties|structured properties guide>).
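If the field-level assignment needs to be driven from Python, one option that stays within the documented YAML/CLI path is to generate the YAML file from a script and then invoke the CLI. The sketch below uses only the standard library and assumes the YAML layout from the example above; the function names `render_dataset_yaml` and `upsert_command` are hypothetical, and only the field-level property is rendered.

```python
import json
from typing import List, Union


def render_dataset_yaml(
    dataset_urn: str,
    field_id: str,
    property_name: str,
    value: Union[str, int, float],
) -> str:
    """Render a dataset YAML document that sets one structured property on one field."""
    # Quote string values so YAML does not reinterpret them; numbers are written bare.
    rendered = json.dumps(value) if isinstance(value, str) else str(value)
    lines = [
        f"- urn: {dataset_urn}",
        "  schema:",
        "    fields:",
        f"      - id: {field_id}",
        "        structured_properties:",
        f"          {property_name}: {rendered}",
    ]
    return "\n".join(lines) + "\n"


def upsert_command(yaml_path: str) -> List[str]:
    """Build the CLI invocation shown in the answer as an argument list."""
    return ["datahub", "dataset", "upsert", "-f", yaml_path]


yaml_text = render_dataset_yaml(
    "urn:li:dataset:(urn:li:dataPlatform:postgres,mydb.myschema.mytable,PROD)",
    "my_column",
    "io.acryl.privacy.retentionTime",
    30,
)
command = upsert_command("dataset.yaml")
```

To apply it, write `yaml_text` to `dataset.yaml` and run `command` (for example with `subprocess.run(command, check=True)`) in an environment where the `datahub` CLI is installed and configured. The structured property itself is assumed to have been defined beforehand.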
Would you like more detail or a step-by-step example for a specific method (Python SDK, CLI, or OpenAPI) for setting structured properties on a dataset field?
Sources:
- <https://datahubproject.io/docs/0.14.0/api/tutorials/custom-properties|Custom Properties API Tutorial>
- <https://datahubproject.io/docs/0.14.0/api/tutorials/structured-properties|Structured Properties API Tutorial>
- <https://datahubproject.io/docs/0.14.0/features/feature-guides/properties|Properties Feature Guide>
- <https://datahubspace.slack.com/archives/CUMUWQU66/p1723523969.029449|Slack: customProperties not supported on schema fields>
- <https://datahubproject.io/docs/0.14.0/api/tutorials/structured-properties#set-structured-property-to-a-dataset|Structured Properties YAML Example>
Hi <@U08NZEMJUMR>, how did my answer work out for you? Need any more assistance? You can also tag <@U07FV3UHE4D> to let your fellow community members know you’re still looking for guidance.