[RFC] Advance Entity Field Set to Stage 1 #2461
@mjwolf, to what level of detail are we supposed to document the usage and source data sections in a Stage 1 RFC? Does the current level of detail suffice? Also, for the concerns section, are we supposed to update that during the PR review process or up front (I guess a mix of both, but wanted to clarify)? Thanks!
field sets (e.g., host), this field should mirror the corresponding *.name value.
example: my-production-database, web-server-01, payment-processing-queue

- name: entity.url
Do we consider entity.reference here? url is quite specific.
can be preserved in entity.raw.
example: i-04ff5d36be3d6896c, arn:aws:s3:::my-bucket, projects/123456789/locations/us-central1/instances/my-db

- name: entity.source
What about provider? I couldn't find any example of a *.source field.
multi_fields:
- name: text
  type: text
short: The human-readable name of the entity.
It's not very human-readable in the examples. Do we need an additional entity.title to capture the human-readable title of the entity? For example, EC2 instance maxcold-ec2 with the title "EC2 instance for testing".
<!--
* Stage 1: https://github.com/elastic/ecs/pull/NNN
...
-->
* Stage 0: https://github.com/elastic/ecs/pull/2434
You can leave this in, for historical reference
The following are the people that consulted on the contents of this RFC.

* Author: @tinnytintin10
You can leave this info from stage 0 in, along with the sections below. The RFC stages are intended to build on each other, so you can keep relevant data from past stages.
<!--
It's not necessarily required, but if you leave in the comments for future stages, it'll be easier for you to update this doc when you get to those stages.
level: core
type: keyword
short: Source module or integration that provided the entity data.
description: >
I think the document source can usually be determined from event.dataset or the index it's in. Do you know of any cases where the existing data isn't enough to determine the source?
title: Entity
group: 2
type: group
short: Fields to describe various types of entities across IT environments.
Nitpick: What are "IT environments"? We might need a term to describe the space in which entities can be found.
"IT environments" commonly refers to environments inside a company, whereas this field set should also capture entities from services and external references.
A unique identifier for the entity. When multiple identifiers exist, this should be
the most stable and commonly used identifier that: 1) persists across the entity's
lifecycle, 2) ensures uniqueness within its scope, 3) is commonly used for queries
and correlation, and 4) is readily available in most observations (logs/events).
For entities with dedicated field sets (e.g., host, user), this value should match
the corresponding *.id field. Alternative identifiers (e.g., ARNs values in AWS, URLs)
can be preserved in entity.raw.
this value should match the corresponding *.id field. Alternative identifiers (e.g., ARNs values in AWS, URLs) can be preserved in entity.raw
In particular, the mention of ARN values in AWS seems to contradict what was previously suggested as guidance for entity IDs. Am I misreading something?
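To make the mirroring rule in the quoted description concrete, here is a minimal sketch of a consistency check. It assumes documents are plain nested dictionaries; the function name `mirror_mismatches` and the document shape are hypothetical and not part of the RFC. The values reuse examples from the field descriptions.

```python
from typing import Any, Dict, List


def mirror_mismatches(doc: Dict[str, Any], field_set: str) -> List[str]:
    """Return the keys ("id", "name") where entity.* differs from <field_set>.*.

    Keys missing on either side are skipped, since the rule only applies
    when both values are present.
    """
    entity = doc.get("entity", {})
    dedicated = doc.get(field_set, {})
    mismatches = []
    for key in ("id", "name"):
        if key in entity and key in dedicated and entity[key] != dedicated[key]:
            mismatches.append(key)
    return mismatches


# Hypothetical document shape; the values reuse examples from the RFC text.
doc = {
    "host": {"id": "i-04ff5d36be3d6896c", "name": "web-server-01"},
    "entity": {"id": "i-04ff5d36be3d6896c", "name": "web-server-01"},
}
print(mirror_mismatches(doc, "host"))  # prints []
```

A check like this only covers the "should match the corresponding *.id field" part of the description; it does not settle which identifier should be primary when an ARN and an instance ID both exist.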
A standardized high-level classification of the entity. This provides a normalized way
to group similar entities across different providers or systems. There will be an
allowed set of values maintained for this field to ensure consistency.
This contradicts what was previously mentioned in the document:
The entity.type field needs a controlled vocabulary to maintain consistency and interoperability. However, an overly restrictive list might limit the field set's utility for emerging technologies and use cases.
Potential solution: Establish a governance process for entity.type values, including an initial set of well-defined types and a mechanism for proposing and reviewing new types. Document a clear taxonomy with examples to guide users in selecting appropriate types.
Do we really want an allowed set of values maintained for this field, or only guidance? Also, the wording "there will be" reads as if we are promising things in documentation. Maybe it's better to avoid it?
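The difference between an enforced allowed set and guidance only can be sketched as follows. This is an illustration, not the RFC's design: the values in `ALLOWED_ENTITY_TYPES` are hypothetical placeholders (the actual vocabulary is not listed in this thread), and `check_entity_type` is an invented helper name.

```python
from typing import Tuple

# Hypothetical allowed values; the real entity.type vocabulary is not defined here.
ALLOWED_ENTITY_TYPES = frozenset(
    {"bucket", "database", "function", "host", "queue", "user"}
)


def check_entity_type(value: str, strict: bool) -> Tuple[str, bool]:
    """Normalize an entity.type value and report whether it is in the vocabulary.

    strict=True models an enforced allowed set: unknown values raise ValueError.
    strict=False models guidance only: unknown values pass through, flagged False.
    """
    normalized = value.strip().lower()
    known = normalized in ALLOWED_ENTITY_TYPES
    if strict and not known:
        raise ValueError(f"entity.type {normalized!r} is not in the allowed set")
    return normalized, known
```

In guidance mode, a new type from an emerging technology is still accepted and can be reviewed later, which matches the governance process described in the Concerns text. In strict mode it is rejected until the vocabulary is updated.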
Supports existence queries, exact value matches, and simple aggregations.
dynamic: true

- name: entity.risk
Should we also map *risk.calculated_level, *risk.calculated_score and *risk.calculated_score_norm? Or add a reference to risk? And maybe also update that page to mention entity.risk.
provides more granular classification than entity.type. While entity.type provides a normalized
classification across different systems, entity.sub_type preserves the provider-specific
categorization.
example: aws_s3_bucket, gcp_cloud_storage_bucket, azure_blob_container, aws_lambda_function
Maybe we should favor the original asset type string as defined by the vendor, and only create our own when the vendor doesn't provide one?
With that approach, the list would be:
name | vendor name |
---|---|
aws_s3_bucket | AWS::S3::Bucket |
gcp_cloud_storage_bucket | storage.googleapis.com/Bucket |
azure_blob_container | Microsoft.Storage/storageAccounts/blobServices/containers |
aws_lambda_function | AWS::Lambda::Function |
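The "prefer the vendor's string, fall back to a generated name" suggestion could look roughly like this. The lookup table is taken from the table above; the function name `resolve_sub_type` and its parameters are hypothetical, and this is a sketch of the reviewer's idea rather than anything the RFC specifies.

```python
from typing import Optional

# Generated names from the RFC example mapped to the vendor strings in the table above.
VENDOR_TYPE_BY_NAME = {
    "aws_s3_bucket": "AWS::S3::Bucket",
    "gcp_cloud_storage_bucket": "storage.googleapis.com/Bucket",
    "azure_blob_container": "Microsoft.Storage/storageAccounts/blobServices/containers",
    "aws_lambda_function": "AWS::Lambda::Function",
}


def resolve_sub_type(generated_name: str, vendor_type: Optional[str] = None) -> str:
    """Pick an entity.sub_type value, preferring the vendor-defined string.

    Order of preference: an explicit vendor_type from the source data, then the
    table lookup, and only then the generated name.
    """
    if vendor_type:
        return vendor_type
    return VENDOR_TYPE_BY_NAME.get(generated_name, generated_name)


print(resolve_sub_type("aws_s3_bucket"))  # prints AWS::S3::Bucket
```

One trade-off worth noting: vendor strings use mixed separators and casing, so queries across providers would rely on entity.type for grouping rather than on any shared sub_type naming pattern.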
This PR advances the Entity Field Set RFC (0049) from Stage 0 (strawperson) to Stage 1 (draft).
Changes Since Stage 0
Since the initial Stage 0 proposal (PR #2434), the following additions have been made:
* Added a "Usage" section highlighting how the entity field set enables normalized entity data querying and its role in the upcoming security solution inventory experience
* Added a "Source data" section explaining how the field set's taxonomy allows entity modeling from any data source
* Added a "Concerns" section addressing potential challenges (To Do)
* Added subject matter experts to the "People" section
* Created a YAML schema definition in the rfcs/text/0049/ directory
Next Steps
After advancing to Stage 1, we plan to: