[RFC] Advance Entity Field Set to Stage 1 #2461

Open
wants to merge 1 commit into
base: main

tinnytintin10
Contributor

This PR advances the Entity Field Set RFC (0049) from Stage 0 (strawperson) to Stage 1 (draft).

Changes Since Stage 0

Since the initial Stage 0 proposal (PR #2434), the following additions have been made:

  • Added a "Usage" section highlighting how the entity field set enables normalized entity data querying and its role in the upcoming security solution inventory experience

  • Added "Source data" section explaining how the field set's taxonomy allows entity modeling from any data source

  • Added "Concerns" section addressing potential challenges (To Do)

  • Added subject matter experts to the "People" section

  • Created YAML schema definition in the rfcs/text/0049/ directory

Next Steps

After advancing to Stage 1, we plan to:

  1. Implement experimental field definitions in the ECS schema
  2. Gather feedback from early adopters
  3. Refine the field definitions based on practical usage
  4. Begin work toward Stage 2 criteria

@tinnytintin10
Contributor Author

@mjwolf, to what level of detail are we supposed to document usage and source data sections in a stage 1 RFC? Does the current level of detail I provide suffice? Also, for the concerns section, are we supposed to update that during the PR review process or upfront (I guess a mix of both but wanted to clarify)? Thanks!

tinnytintin10 marked this pull request as ready for review April 1, 2025 12:51
tinnytintin10 requested a review from a team as a code owner April 1, 2025 12:51
tinnytintin10 requested a review from hop-dev April 1, 2025 13:02
field sets (e.g., host), this field should mirror the corresponding *.name value.
example: my-production-database, web-server-01, payment-processing-queue

- name: entity.url

Should we consider entity.reference here? url is quite specific.

can be preserved in entity.raw.
example: i-04ff5d36be3d6896c, arn:aws:s3:::my-bucket, projects/123456789/locations/us-central1/instances/my-db

- name: entity.source

What about provider? I couldn't find any example of a *.source field.

multi_fields:
- name: text
type: text
short: The human-readable name of the entity.

It's not very human-readable in the examples. Do we need an additional entity.title to capture the human-readable title of the entity? For example, the EC2 instance maxcold-ec2 with the title "EC2 instance for testing".

<!--
* Stage 1: https://github.com/elastic/ecs/pull/NNN
...
-->
* Stage 0: https://github.com/elastic/ecs/pull/2434
Contributor

You can leave this in, for historical reference

The following are the people that consulted on the contents of this RFC.

* Author: @tinnytintin10
Contributor

You can leave this info from Stage 0 in, along with the sections below. The RFC stages are intended to build on each other, so you can keep relevant data from past stages.


<!--
Contributor

It's not necessarily required, but if you leave in the comments for future stages, it'll be easier for you to update this doc when you get to those stages.

level: core
type: keyword
short: Source module or integration that provided the entity data.
description: >
Contributor

I think the document source can usually be determined from event.dataset or the index it's in. Do you know of any cases where the existing data isn't enough to determine the source?
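
To make the question concrete, here is a minimal Python sketch of inferring the source from existing data. It assumes the document is a plain nested dict and that the index name follows the data stream naming scheme `{type}-{dataset}-{namespace}`. `infer_source` is a hypothetical helper for illustration only, not part of ECS or the RFC.

```python
from typing import Optional


def infer_source(doc: dict, index_name: str) -> Optional[str]:
    """Hypothetical helper: guess the data source without an entity.source field."""
    # Prefer event.dataset when the document carries it.
    dataset = (doc.get("event") or {}).get("dataset")
    if dataset:
        return dataset

    # Assumption: data stream names look like {type}-{dataset}-{namespace},
    # e.g. "logs-aws.cloudtrail-default".
    parts = index_name.split("-")
    if len(parts) == 3 and all(parts):
        return parts[1]

    # Neither signal is available; this is the case the question asks about.
    return None
```

If a helper like this returns `None` for real data, that would be a case where entity.source adds information the existing fields do not.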

title: Entity
group: 2
type: group
short: Fields to describe various types of entities across IT environments.
Member

Nitpick: What does "IT environments" mean here? We might already have a term to describe the space in which entities can be found.

"IT environments" commonly refers to a company's internal environments, whereas this field set should also capture entities from services and external references.

Comment on lines +22 to +28
A unique identifier for the entity. When multiple identifiers exist, this should be
the most stable and commonly used identifier that: 1) persists across the entity's
lifecycle, 2) ensures uniqueness within its scope, 3) is commonly used for queries
and correlation, and 4) is readily available in most observations (logs/events).
For entities with dedicated field sets (e.g., host, user), this value should match
the corresponding *.id field. Alternative identifiers (e.g., ARNs values in AWS, URLs)
can be preserved in entity.raw.
Member

this value should match the corresponding *.id field. Alternative identifiers (e.g., ARNs values in AWS, URLs) can be preserved in entity.raw

In particular, the mention of ARN values in AWS seems to contradict what was previously suggested as guidance for entity IDs. Am I misreading something?

Comment on lines +46 to +48
A standardized high-level classification of the entity. This provides a normalized way
to group similar entities across different providers or systems. There will be an
allowed set of values maintained for this field to ensure consistency.
Member

This contradicts what was previously mentioned in the document

The entity.type field needs a controlled vocabulary to maintain consistency and interoperability. However, an overly restrictive list might limit the field set's utility for emerging technologies and use cases.
Potential solution: Establish a governance process for entity.type values, including an initial set of well-defined types and a mechanism for proposing and reviewing new types. Document a clear taxonomy with examples to guide users in selecting appropriate types.

Do we really want an allowed set of values maintained for this field, or only guidance? Also, the wording "there will be" reads like a promise made in documentation. Maybe it's best to avoid it?

Supports existence queries, exact value matches, and simple aggregations.
dynamic: true

- name: entity.risk
Member

Should we also map *risk.calculated_level, *risk.calculated_score and *risk.calculated_score_norm? Or add a reference to risk? And maybe also update that page to mention entity.risk.

provides more granular classification than entity.type. While entity.type provides a normalized
classification across different systems, entity.sub_type preserves the provider-specific
categorization.
example: aws_s3_bucket, gcp_cloud_storage_bucket, azure_blob_container, aws_lambda_function

Maybe we should favor the original asset type string as defined by the vendor, and only create one if there isn't any?

This list would be:

| name | vendor name |
| --- | --- |
| aws_s3_bucket | AWS::S3::Bucket |
| gcp_cloud_storage_bucket | storage.googleapis.com/Bucket |
| azure_blob_container | Microsoft.Storage/storageAccounts/blobServices/containers |
| aws_lambda_function | AWS::Lambda::Function |
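
A rough Python sketch of the "vendor string first" rule suggested above, using only the four pairs from the list. `VENDOR_TYPE_BY_NAME` and `resolve_sub_type` are hypothetical names for illustration and are not defined anywhere in the RFC.

```python
from typing import Optional

# Pairs taken from the list above: RFC example name -> vendor-defined type string.
VENDOR_TYPE_BY_NAME = {
    "aws_s3_bucket": "AWS::S3::Bucket",
    "gcp_cloud_storage_bucket": "storage.googleapis.com/Bucket",
    "azure_blob_container": "Microsoft.Storage/storageAccounts/blobServices/containers",
    "aws_lambda_function": "AWS::Lambda::Function",
}


def resolve_sub_type(vendor_type: Optional[str], fallback_name: str) -> str:
    """Hypothetical rule: keep the vendor's own string, invent a name only if none exists."""
    if vendor_type:
        return vendor_type
    return fallback_name
```

Under this rule, entity.sub_type for an S3 bucket would hold `AWS::S3::Bucket`, and a made-up name would appear only for sources that define no asset type of their own.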

romulets added a commit to romulets/kibana that referenced this pull request Apr 11, 2025