-
Notifications
You must be signed in to change notification settings - Fork 447
Update Entity Fields RFC for 9.2 Stack Release #2513
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
🤖 GitHub commentsExpand to view the GitHub comments
Just comment with:
|
Hi Mike, Thanks for the proposal! Is there a reason for having the new |
Good question. I don't know when this was added, or how it is used, but I found it in the existing mappings in 8.19.1.
|
Add example for entity.attributes.Managed
| Field | Type | Description | | ||
|-------|------|-------------| | ||
| entity.definition_id | keyword | Used Elastic solutions (e.g., Security, Observability) to denote the ID of the entity definition which is used to extract entity details from ingested logs, events, intelligence, and other data types. Use of this value is reserved, and ECS producers, including data ingestion pipelines, must not populate this field| | ||
| entity.definition_id | keyword | Used by Elastic solutions (e.g., Security, Observability) to denote the version of the entity definition which is used to extract entity details from ingested logs, events, intelligence, and other data types. Use of this value is reserved, and ECS producers, including data ingestion pipelines, must not populate this field| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
entity.definition_id
is repeated here
|
||
| Field | Type | Description | | ||
|-------|------|-------------| | ||
| entity.definition_id | keyword | Used Elastic solutions (e.g., Security, Observability) to denote the ID of the entity definition which is used to extract entity details from ingested logs, events, intelligence, and other data types. Use of this value is reserved, and ECS producers, including data ingestion pipelines, must not populate this field| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too sure about this field being reserved. There are no other fields in ECS that are reserved for use by Elastic, and I don't really know if it makes sense to have them. Since the intention of ECS is a common schema that's shared with others, I don't if this would make sense to include.
Are there alternatives to adding this, such as using a custom field that's not defined in ECS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree, it's an internal-only field - no need to have it defined in the ECS spec. We already intend to publish the entity store schema in our docs. Any problem with using entity.Definition_id
as a custom field? (nesting a custom leaf field under an ECS-defined root object entity.*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MikePaquette as we speak we are storing this information under entity.Metadata.EngineType
. What do you think about keeping it under metadata?
rfcs/text/0049-entity-fields.md
Outdated
| entity.reference | keyword | A URI, URL, or other direct reference to access or locate the entity in its source system. This could be an API endpoint, web console URL, or other addressable location. Format may vary by entity type and source system. | | ||
| entity.attributes.* | object | Normalized entity attributes using capitalized field names (e.g., `entity.attributes.StorageClass`, `entity.attributes.MfaEnabled`). Use this field set when you need specific data types, advanced search capabilities, or normalized values across different providers/sources. The capitalization pattern indicates these are entity-specific fields that won't be enumerated in the ECS schema. | | ||
| entity.raw.* | flattened | Original, unmodified fields from the source system stored in a flattened format that maintains basic searchability. While `entity.attributes` should be used for normalized fields requiring advanced queries, this field preserves all source metadata with basic search capabilities. Supports existence queries, exact value matches, and simple aggregations. | | ||
| entity.attributes.* | object | A set of static or semi-static attributes of the entity. Usually boolean or keyword field data types. Examples include: `entity.attributes.Storage_class`, `entity.attributes.Mfa_enabled` , `entity.attributes.Privileged` , `entity.attributes.Granted_permissions` , `entity.attributes.Known_redirect` , `entity.attributes.Asset` , `entity.attributes.Managed` ,`entity.attrbitues.Os_current` , `entity.attibutes.Os_patch_current` , `entity.attributes.Oauth_consent_restriction`). Use this field set when you need to track static or semi-static characterstics of an entity for advanced searching and correlation of normalized values across different providers/sources and entity types. Note the initial capitalization pattern for Examples indicates that any such fields are custom entity-specific fields that won't be enumerated in the ECS schema, and won't collide with any fields that may be defined by ECS in the future. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| entity.attributes.* | object | A set of static or semi-static attributes of the entity. Usually boolean or keyword field data types. Examples include: `entity.attributes.Storage_class`, `entity.attributes.Mfa_enabled` , `entity.attributes.Privileged` , `entity.attributes.Granted_permissions` , `entity.attributes.Known_redirect` , `entity.attributes.Asset` , `entity.attributes.Managed` ,`entity.attrbitues.Os_current` , `entity.attibutes.Os_patch_current` , `entity.attributes.Oauth_consent_restriction`). Use this field set when you need to track static or semi-static characterstics of an entity for advanced searching and correlation of normalized values across different providers/sources and entity types. Note the initial capitalization pattern for Examples indicates that any such fields are custom entity-specific fields that won't be enumerated in the ECS schema, and won't collide with any fields that may be defined by ECS in the future. | | |
| entity.attributes | object | A set of static or semi-static attributes of the entity. Usually boolean or keyword field data types. Examples include: `entity.attributes.Storage_class`, `entity.attributes.Mfa_enabled` , `entity.attributes.Privileged` , `entity.attributes.Granted_permissions` , `entity.attributes.Known_redirect` , `entity.attributes.Asset` , `entity.attributes.Managed` ,`entity.attrbitues.Os_current` , `entity.attibutes.Os_patch_current` , `entity.attributes.Oauth_consent_restriction`). Use this field set when you need to track static or semi-static characterstics of an entity for advanced searching and correlation of normalized values across different providers/sources and entity types. Note the initial capitalization pattern for Examples indicates that any such fields are custom entity-specific fields that won't be enumerated in the ECS schema, and won't collide with any fields that may be defined by ECS in the future. | |
The existing flattened fields in ECS don't use .*
in the name. I think it makes sense to remove here too, to be consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mjwolf what do you mean by "flattened" here? As proposed, the object will contain keyword or boolean leaf fields, and not the flattened field datatype. No problem with removing the .*
though, thanks.
|
||
For ECS producers, such as Beats, Elastic Agent integrations, ingest pipelines, and other methods for shipping data to Elastic, the `entity.*` fields are expected to be nested as follows: | ||
- If the entity type is one of host, user, service, cloud, orchestrator), then the entity fields should be nested under the respecitve root field set, for example `host.entity.*` , `user.entity.*`, etc. | ||
- If the entity type is not one of the above, then that `entity.*` fields should be nested under a new root-level object, called `generic`, as `generic.entity.*` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think generic
will have any other fields apart from generic.entity.*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not foreseen. Can anyone else think of a reason why we'd use the root field set generic.*
for another purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't think of any other reason
|
||
### ECS Consumers or Data Stores | ||
|
||
For ECS consumers, such as the Elastic Security Solution entity store indices, the `entity.*` fields should be used directly at the root of the events. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you expand on what this means, I don't think I understand it. What do you mean by "used". Why can a data store only use entity.*
if the producers can write to other top level fieldsets?
I think this is also an implementation specific detail that doesn't need to be part of ECS, so maybe it can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we can remove this from the ECS spec, but the idea is that the entity store schema will include entity.* at the root level in each document. this will allow us and users to query the entity store for entity ID's (and other attributes) w/o burdening them with knowing under which entity class {host, user, service, generic, etc.} it might be stored.
typo Co-authored-by: Uri Weisman <68195305+uri-weisman@users.noreply.github.com>
fix typos Co-authored-by: Rômulo Farias <romulodefarias@gmail.com>
|
||
| Field | Type | Description | | ||
|-------|------|-------------| | ||
| entity.definition_id | keyword | Used Elastic solutions (e.g., Security, Observability) to denote the ID of the entity definition which is used to extract entity details from ingested logs, events, intelligence, and other data types. Use of this value is reserved, and ECS producers, including data ingestion pipelines, must not populate this field| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MikePaquette as we speak we are storing this information under entity.Metadata.EngineType
. What do you think about keeping it under metadata?
|-------|------|-------------| | ||
| entity.definition_id | keyword | Used Elastic solutions (e.g., Security, Observability) to denote the ID of the entity definition which is used to extract entity details from ingested logs, events, intelligence, and other data types. Use of this value is reserved, and ECS producers, including data ingestion pipelines, must not populate this field| | ||
| entity.definition_id | keyword | Used by Elastic solutions (e.g., Security, Observability) to denote the version of the entity definition which is used to extract entity details from ingested logs, events, intelligence, and other data types. Use of this value is reserved, and ECS producers, including data ingestion pipelines, must not populate this field| | ||
| entity.schema_version | keyword | Denotes the version of the entity schema,as published in Elastic Security documentation, to which this entity information conforms. Usually conforms to the Elastic Stack version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a new guideline? Or is it something that already happens? I'm not aware of an entity schema published in Elastic Security Docs that usually conforms to elastic stack version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I found this in the existing index mappings 8.19.1, and was not aware of it. Included here in case it was already in use, but it does not appear to be used, so we can remove it from this RFC.
| entity.reference | keyword | A URI, URL, or other direct reference to access or locate the entity in its source system. This could be an API endpoint, web console URL, or other addressable location. Format may vary by entity type and source system. | | ||
| entity.attributes.* | object | Normalized entity attributes using capitalized field names (e.g., `entity.attributes.StorageClass`, `entity.attributes.MfaEnabled`). Use this field set when you need specific data types, advanced search capabilities, or normalized values across different providers/sources. The capitalization pattern indicates these are entity-specific fields that won't be enumerated in the ECS schema. | | ||
| entity.raw.* | flattened | Original, unmodified fields from the source system stored in a flattened format that maintains basic searchability. While `entity.attributes` should be used for normalized fields requiring advanced queries, this field preserves all source metadata with basic search capabilities. Supports existence queries, exact value matches, and simple aggregations. | | ||
| entity.attributes | object | A set of static or semi-static attributes of the entity. Usually boolean or keyword field data types. Examples include: `entity.attributes.Storage_class`, `entity.attributes.Mfa_enabled` , `entity.attributes.Privileged` , `entity.attributes.Granted_permissions` , `entity.attributes.Known_redirect` , `entity.attributes.Asset` , `entity.attributes.Managed` ,`entity.attributes.Os_current` , `entity.attributes.Os_patch_current` , `entity.attributes.Oauth_consent_restriction`). Use this field set when you need to track static or semi-static characterstics of an entity for advanced searching and correlation of normalized values across different providers/sources and entity types. Note the initial capitalization pattern for Examples indicates that any such fields are custom entity-specific fields that won't be enumerated in the ECS schema, and won't collide with any fields that may be defined by ECS in the future. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current pattern of snake case with capital first letter isn't friendly for most coding tools and linters, since it's not a standard pattern.
Could we adopt a pattern such as PascalCase
. Reading the documentation over capitizaion for non ecs fields I see precedent for it, since it's mentioned both in HAProxy and NGINX examples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mjwolf do you have a position on this topic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ECS uses snake case for multiword fields, so I think it makes the most sense to keep using that. OTel semantic conventions also uses snake case.
I think for these examples in documentation it should keep using snake case. For the actual implementations, we don't require it, so other cases are allowed (as long as it starts with a capital).
| entity.reference | keyword | A URI, URL, or other direct reference to access or locate the entity in its source system. This could be an API endpoint, web console URL, or other addressable location. Format may vary by entity type and source system. | | ||
| entity.attributes.* | object | Normalized entity attributes using capitalized field names (e.g., `entity.attributes.StorageClass`, `entity.attributes.MfaEnabled`). Use this field set when you need specific data types, advanced search capabilities, or normalized values across different providers/sources. The capitalization pattern indicates these are entity-specific fields that won't be enumerated in the ECS schema. | | ||
| entity.raw.* | flattened | Original, unmodified fields from the source system stored in a flattened format that maintains basic searchability. While `entity.attributes` should be used for normalized fields requiring advanced queries, this field preserves all source metadata with basic search capabilities. Supports existence queries, exact value matches, and simple aggregations. | | ||
| entity.attributes | object | A set of static or semi-static attributes of the entity. Usually boolean or keyword field data types. Examples include: `entity.attributes.Storage_class`, `entity.attributes.Mfa_enabled` , `entity.attributes.Privileged` , `entity.attributes.Granted_permissions` , `entity.attributes.Known_redirect` , `entity.attributes.Asset` , `entity.attributes.Managed` ,`entity.attributes.Os_current` , `entity.attributes.Os_patch_current` , `entity.attributes.Oauth_consent_restriction`). Use this field set when you need to track static or semi-static characterstics of an entity for advanced searching and correlation of normalized values across different providers/sources and entity types. Note the initial capitalization pattern for Examples indicates that any such fields are custom entity-specific fields that won't be enumerated in the ECS schema, and won't collide with any fields that may be defined by ECS in the future. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand what you mean "static or semi-static attributes" as opposed to lifecycle
and behaviour
fields, but maybe we have a better way of describing what do we mean by that? By the static or semi-static
definition, first_seen
and issued_at
would also fit under it, wouldn't it?
As I see attributes should be used to describe non temporal entity properties that could not be expressed in other parts of the entity field set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point, perhaps we can add "non-temporal" to the definition of the entity.attributes.*
fields? Would that help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so!
| entity.attributes | object | A set of static or semi-static attributes of the entity. Usually boolean or keyword field data types. Examples include: `entity.attributes.Storage_class`, `entity.attributes.Mfa_enabled` , `entity.attributes.Privileged` , `entity.attributes.Granted_permissions` , `entity.attributes.Known_redirect` , `entity.attributes.Asset` , `entity.attributes.Managed` ,`entity.attributes.Os_current` , `entity.attributes.Os_patch_current` , `entity.attributes.Oauth_consent_restriction`). Use this field set when you need to track static or semi-static characterstics of an entity for advanced searching and correlation of normalized values across different providers/sources and entity types. Note the initial capitalization pattern for Examples indicates that any such fields are custom entity-specific fields that won't be enumerated in the ECS schema, and won't collide with any fields that may be defined by ECS in the future. | | ||
| entity.lifecycle.* | object | A set of temporal characteristics of the entity. Usually date field data type. Examples include: `entity.lifecycle.First_seen`, `entity.lifecycle.Last_activity` , `entity.lifecycle.Issued_at` , `entity.lifecycle.Last_password_change` ,etc. ). Use this field set when you need to track temporal characterstics of an entity for advanced searching and correlation of normalized values across different providers/sources and entity types. Note the initial capitalization pattern indicates that any such fields are custom entity-specific fields that won't be enumerated in the ECS schema, and won't collide with any fields that may be defined by ECS in the future. | | ||
| entity.behavior.* | object | A set of ephemeral characteristics of the entity, derived from observed behaviors during a specific time period. Behaviors are usually captured in event logs under fields such as `event.action` and other fields, but this field set captures "attributified" behavior indicators, using semantics like "this behavior was seen one or more times during this time period." Sytems using this field set may need to force a "reset" of these behavioral indicators at the end of their current period. Usually boolean field data type. Examples include: `entity.behavior.Used_usb_device`, `entity.behavior.Brute_force_victim` , `entity.behavior.New_country_login` ,etc. ). Use this field set when you need to capture and track ephemeral characterstics of an entity for advanced searching, correlation of normalized values across different providers/sources and entity types. Note the initial capitalization pattern indicates that any such fields are custom entity-specific fields that won't be enumerated in the ECS schema, and won't collide with any fields that may be defined by ECS in the future. | | ||
| entity.raw.* | object | Original, unmodified fields from the source system stored in a flattened format that maintains basic searchability. While `entity.attributes` should be used for normalized fields requiring advanced queries, this field preserves all source metadata with basic search capabilities. Supports existence queries, exact value matches, and simple aggregations. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be of type flattened
instead of object
, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so for entity.raw
. We've agreed to remove the .*
from the objects, and this would be the same.
|
||
For ECS producers, such as Beats, Elastic Agent integrations, ingest pipelines, and other methods for shipping data to Elastic, the `entity.*` fields are expected to be nested as follows: | ||
- If the entity type is one of host, user, service, cloud, orchestrator), then the entity fields should be nested under the respecitve root field set, for example `host.entity.*` , `user.entity.*`, etc. | ||
- If the entity type is not one of the above, then that `entity.*` fields should be nested under a new root-level object, called `generic`, as `generic.entity.*` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't think of any other reason
This approach ensures backward compatibility, maintains existing ECS patterns, and preserves the rich metadata capabilities of established field sets while allowing the entity fields to provide a consistent way to identify and categorize *all* entity types. The entity fields serve as a complementary layer that enables unified entity representation, particularly for entity types that don't have dedicated field sets. | ||
|
||
## Usage | ||
## Field Re-use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe Field Usage
it's still a good section name since we are also talking about ECS consumers
using entity.*
as root
clarification. Co-authored-by: Rômulo Farias <romulodefarias@gmail.com>
two typos. Co-authored-by: Rômulo Farias <romulodefarias@gmail.com>
extraneous ")" Co-authored-by: Rômulo Farias <romulodefarias@gmail.com>
1. What does this PR do?
Adds newly proposed
entity.*
fields to the Entity Fields RFCAdds currently used
entity.*
fields to this RFCProposes a nested location in the naming hierarchy for
entity.*
fields populated by ECS producersProposes a root namespace location for
entity.*
fields when used by ECS consumers and entity-related data stores.2. Which ECS fields are affected/introduced?