<@U06TM7M40RK> I'm using DataHub 0.14.04. If the entity is a dataset and I want to create a custom aspect, what rules apply to the URN I should use?
To create a custom aspect for a dataset entity in DataHub 0.14.0, you need to follow the URN (Uniform Resource Name) conventions specified by DataHub. Here are the key rules and guidelines for constructing URNs:
- URN Structure: The general structure of a URN in DataHub is:
urn:<Namespace>:<Entity Type>:<ID>
  - Namespace: Typically, li is used as the namespace, but it can be customized for your organization.
  - Entity Type: This specifies the type of entity, such as dataset.
  - ID: This is the unique identifier for the entity within the specified namespace and entity type.
- Dataset URN Example: For a dataset, the URN might look like:
urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
  - platform: The data platform, e.g., kafka, hdfs.
  - name: The name of the dataset.
  - env: The environment, e.g., PROD, DEV.
- Restrictions: Certain characters are reserved and should not be used in URN fields:
  - Commas (,)
  - Parentheses ( and )
  - Colons (:)
- Custom Aspect: When creating a custom aspect, you need to supply the URN of the dataset entity that the aspect will be attached to. Ensure that this URN adheres to the above structure and restrictions.
Here is an example of a URN for a dataset in a Kafka platform in the production environment:
urn:li:dataset:(urn:li:dataPlatform:kafka,PageViewEvent,PROD)
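The structure and restrictions above can be sketched as a small helper. This is a minimal illustration in plain Python, not DataHub's own implementation: the function name build_dataset_urn and the RESERVED set are assumptions made for this example. In real code, the DataHub Python SDK's make_dataset_urn (in datahub.emitter.mce_builder) is likely the better choice.

```python
# Hypothetical helper (not part of DataHub): assembles a dataset URN and
# rejects the reserved characters listed above.
RESERVED = set(",():")


def build_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    for label, value in (("platform", platform), ("name", name), ("env", env)):
        bad = sorted(RESERVED.intersection(value))
        if not value or bad:
            raise ValueError(f"invalid {label} {value!r}: empty or contains {bad}")
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


print(build_dataset_urn("kafka", "PageViewEvent", "PROD"))
# urn:li:dataset:(urn:li:dataPlatform:kafka,PageViewEvent,PROD)
```

Rejecting bad input up front, rather than escaping it, keeps the sketch short; whether escaping is acceptable depends on how your ingestion pipeline treats such names.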
Would you like more detailed steps or examples on how to set up and configure these components?
Sources:
- <https://github.com/datahub-project/datahub/blob/master/docs/what/urn.md|What is URN?>
- <https://datahubproject.io/docs/0.14.0/what/urn|URN Documentation>
Hint: Mention @runllm in the thread for followups.
<@U06TM7M40RK> Is it correct if I build the URN as shown below?
The URN you provided follows the general structure and rules for constructing a URN in DataHub. Here is a breakdown of your URN:
urn:li:dataset:(urn:li:dataPlatform:sql,query-1-1-1-1,PROD)
- Namespace: li
- Entity Type: dataset
- ID: This is a tuple consisting of:
  - Data Platform: sql
  - Dataset Name: query-1-1-1-1
  - Environment: PROD
Based on the provided data sources, your URN appears to be correctly formatted. It adheres to the structure and restrictions specified for URNs in DataHub.
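The same breakdown can be reproduced programmatically. The sketch below uses a regular expression written for this thread's URN shape only; the pattern and the function name parse_dataset_urn are assumptions for illustration, not DataHub's parser.

```python
import re

# Illustrative pattern for urn:<ns>:dataset:(urn:<ns>:dataPlatform:<platform>,<name>,<env>).
# Each field excludes the reserved characters: commas, parentheses, and colons.
DATASET_URN = re.compile(
    r"^urn:(?P<namespace>[^:]+):dataset:"
    r"\(urn:(?P=namespace):dataPlatform:"
    r"(?P<platform>[^,():]+),(?P<name>[^,():]+),(?P<env>[^,():]+)\)$"
)


def parse_dataset_urn(urn: str) -> dict:
    match = DATASET_URN.match(urn)
    if match is None:
        raise ValueError(f"not a dataset URN: {urn!r}")
    return match.groupdict()


parts = parse_dataset_urn("urn:li:dataset:(urn:li:dataPlatform:sql,query-1-1-1-1,PROD)")
print(parts["namespace"], parts["platform"], parts["name"], parts["env"])
# li sql query-1-1-1-1 PROD
```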
Would you like more detailed steps or further assistance on any of these points?
Sources:
- <https://github.com/datahub-project/datahub/blob/master/docs/what/urn.md|What is URN?>
<@U06TM7M40RK> Can I use 1.1.1.1 instead of query-1-1-1-1?
Using 1.1.1.1 instead of query-1-1-1-1 in your URN is possible but not recommended. Periods (.) are not listed as reserved characters in the provided data sources, so they are not prohibited, and qualified dataset names such as database.table commonly contain them. They can, however, be misinterpreted in certain contexts, for example by tooling that splits names on periods, so the cautious practice is to avoid special characters that might cause parsing issues.
Here is the URN with 1.1.1.1:
urn:li:dataset:(urn:li:dataPlatform:sql,1.1.1.1,PROD)
While this URN might technically be valid, it is safer to use alphanumeric characters and hyphens to avoid any potential issues. If you need to use a specific format, consider using hyphens or underscores instead of periods.
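If you decide to follow that advice, the substitution can be automated. The sketch below keeps letters, digits, hyphens, and underscores and replaces everything else with a hyphen; the function name sanitize_dataset_name and the allowed character set are assumptions for this example, not a rule imposed by DataHub.

```python
import re


def sanitize_dataset_name(name: str) -> str:
    # Replace each run of disallowed characters with a single hyphen,
    # then trim hyphens left at either end.
    return re.sub(r"[^A-Za-z0-9_-]+", "-", name).strip("-")


print(sanitize_dataset_name("1.1.1.1"))
# 1-1-1-1
```

Note that this mapping is lossy: 1.1 and 1-1 end up with the same name, so it only suits cases where such collisions cannot occur.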
Would you like more detailed steps or further assistance on any of these points?
Sources:
- <https://datahubspace.slack.com/archives/CUMUWQU66/p1717504405.515149>
<@U06TM7M40RK> Then is it OK to use 1-1-1-1 instead of 1.1.1.1?
Yes, using 1-1-1-1 instead of 1.1.1.1 is a good approach. Hyphens (-) are safe to use and will not cause parsing issues. Here is the updated URN with 1-1-1-1:
urn:li:dataset:(urn:li:dataPlatform:sql,1-1-1-1,PROD)
This URN adheres to the structure and restrictions specified for URNs in DataHub and should work without any issues.
Would you like more detailed steps or further assistance on any of these points?
Sources:
- <https://datahubspace.slack.com/archives/CUMUWQU66/p1717504405.515149>
<@U06TM7M40RK> Will an error occur if I use query instead of sql in the platform position?
Using query instead of sql in the platform part of the URN is not recommended and will likely cause issues. The platform part of the URN should correspond to a recognized data platform in DataHub. Common examples include mysql, hive, kafka, etc.
Here is the URN with query:
urn:li:dataset:(urn:li:dataPlatform:query,1-1-1-1,PROD)
Since query is not a recognized data platform, this could lead to errors or unexpected behavior in DataHub. It is important to use a valid data platform identifier that DataHub recognizes.
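One way to catch an unrecognized platform before any metadata is emitted is to check it against the platform names you know exist in your instance. In the sketch below, KNOWN_PLATFORMS is a hypothetical, hand-maintained set seeded with the platforms named in this thread; it is not a list obtained from DataHub, and platform_urn is an illustrative name.

```python
# Hypothetical allow-list: replace with the platforms registered in your instance.
KNOWN_PLATFORMS = {"mysql", "hive", "kafka", "hdfs"}


def platform_urn(platform: str) -> str:
    if platform not in KNOWN_PLATFORMS:
        raise ValueError(
            f"unknown platform {platform!r}; expected one of {sorted(KNOWN_PLATFORMS)}"
        )
    return f"urn:li:dataPlatform:{platform}"


print(platform_urn("kafka"))
# urn:li:dataPlatform:kafka
```

With this set, platform_urn("query") raises ValueError instead of silently producing a URN that points at a platform the instance does not know.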
Would you like more detailed steps or further assistance on any of these points?
Sources:
- <https://github.com/datahub-project/datahub/blob/master/docs/what-is-datahub/datahub-concepts.md|DataHub Concepts>
<@U06TM7M40RK> So sql is fine, right?
Yes, using sql as the platform part of the URN should not cause any issues, provided that sql corresponds to a data platform your DataHub instance recognizes. Here is the URN with sql:
urn:li:dataset:(urn:li:dataPlatform:sql,1-1-1-1,PROD)
This URN adheres to the structure and restrictions specified for URNs in DataHub and should work without any issues.
Would you like more detailed steps or further assistance on any of these points?
Sources:
- <https://datahubspace.slack.com/archives/CUMUWQU66/p1710081738.221519>
- <https://datahubspace.slack.com/archives/CUMUWQU66/p1724811760.177249>