---
SPDX-FileCopyrightText: © 2023 Menacit AB <[email protected]>
SPDX-License-Identifier: CC-BY-SA-4.0
title: "Logging course: Enrichment"
author: "Joel Rangsmo <[email protected]>"
footer: "© Course authors (CC BY-SA 4.0)"
description: "Basics of automated enrichment in logging course"
keywords:
color: "#ffffff"
class:
style: |
  section.center {
    text-align: center;
  }
  table strong {
    color: #d63030;
  }
  table em {
    color: #2ce172;
  }
---
"Enrichment" is the process of improving the value of our logs.
Often this means providing useful context for analysts and machines alike.
We've already played around with adding GeoIP information.
Let's look at some more examples and how to implement them using Logstash and OpenSearch.
- IP reputation
- IP type (residential, cloud, proxy, etc.)
- Current host patch level
- Vulnerability scan and/or Shodan results
- All kinds of CMDB data!
- Role description
- Employment location / Timezone
- Occurrence in data leaks
- Contact information
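Enrichment can be performed during ingestion or at search-time.
As with field parsing, both approaches have their pros and cons.
Current relevance vs. historic accuracy: search-time enrichment uses the context data as it is now, while ingestion-time enrichment preserves it as it was when the event occurred.
Logstash ships with several filter plugins useful for ingestion-time enrichment: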
- GeoIP and user agent
- DNS (forward/reverse lookups)
- Translate
- JDBC and Memcached
- HTTP client
...and as always, "ruby"!
# Forward lookup
$ host suspicious.example.com
suspicious.example.com has address 93.184.215.14
suspicious.example.com has IPv6 address 2606:2800:21f:cb07:6820:80da:af6b:8b2c
# Reverse lookup
$ host 93.184.215.14
14.215.184.93.in-addr.arpa domain name pointer suspicious.example.com.
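The reverse lookup above asks for a PTR record stored under a special name: the IPv4 octets in reverse order plus "in-addr.arpa". IPv6 uses reversed nibbles under "ip6.arpa". The rough sketch below assumes Python and the operating system's resolver. Its helpers (hypothetical names) derive that name and perform forward and reverse lookups. Results are memoized with `functools.lru_cache`, since a DNS round trip for every event adds up quickly.

```python
import ipaddress
import socket
from functools import lru_cache
from typing import Optional, Tuple


def ptr_query_name(ip: str) -> str:
    """Return the DNS name a reverse (PTR) lookup queries for an IP address."""
    return ipaddress.ip_address(ip).reverse_pointer


@lru_cache(maxsize=4096)
def reverse_lookup(ip: str) -> Optional[str]:
    """Resolve an IP address to a hostname, or None if the lookup fails."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


@lru_cache(maxsize=4096)
def forward_lookup(hostname: str) -> Tuple[str, ...]:
    """Resolve a hostname to its IPv4/IPv6 addresses, or () if it fails."""
    try:
        results = socket.getaddrinfo(hostname, None)
    except OSError:
        return ()
    # getaddrinfo returns one entry per address family/socket type; deduplicate
    return tuple(sorted({sockaddr[0] for *_rest, sockaddr in results}))


print(ptr_query_name("93.184.215.14"))  # 14.215.184.93.in-addr.arpa
```

`lru_cache` ignores DNS TTLs and also remembers failed lookups, so a long-running process may return stale answers. Inside a pipeline, Logstash's "dns" filter performs the same kind of lookups and has its own cache settings.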
Erghh - less talk, more examples!
Contents of "/var/ioc/evil_ip.csv":
157.245.96.121,Observed in logs during 2022 Xmplify incident
185.120.19.98,Associated with Explum spear phishing campaign
194.61.40.74,Has been trying to brute-force our VPN for years!
[...]
if [source][ip] {
  translate {
    source => "[source][ip]"
    target => "ip_related_to_incident"
    dictionary_path => "/var/ioc/evil_ip.csv"
  }
}
[...]
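Conceptually, the translate filter treats the CSV as a dictionary (first column as key, second as value) and copies the matching value into the target field. Below is a minimal Python sketch of that behaviour. It is not Logstash's actual implementation, it assumes a two-column CSV without a header row, and the function names are hypothetical:

```python
import csv
import io
from typing import Dict, TextIO


def load_dictionary(handle: TextIO) -> Dict[str, str]:
    """Load a two-column "key,value" CSV into a lookup dictionary."""
    return {row[0]: row[1] for row in csv.reader(handle) if len(row) >= 2}


def enrich(event: dict, dictionary: Dict[str, str]) -> dict:
    """Set "ip_related_to_incident" if the event's source IP is in the dictionary."""
    source_ip = event.get("source", {}).get("ip")
    # Mirrors the "if [source][ip]" conditional: events without an IP are skipped
    if source_ip and source_ip in dictionary:
        event["ip_related_to_incident"] = dictionary[source_ip]
    return event


# Two rows from the CSV above, read from memory instead of /var/ioc/evil_ip.csv
iocs = load_dictionary(io.StringIO(
    "157.245.96.121,Observed in logs during 2022 Xmplify incident\n"
    "185.120.19.98,Associated with Explum spear phishing campaign\n"
))
event = enrich({"url": "/internal/nuke_control.aspx", "source": {"ip": "185.120.19.98"}}, iocs)
print(event["ip_related_to_incident"])  # Associated with Explum spear phishing campaign
```

If no key matches, the target field is simply left unset (unless the filter's `fallback` option is configured). That makes enriched events easy to find with an `exists` query.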
[...]
"must": [
  {
    "match_phrase": {
      "tags.keyword": "web_server_access"
    }
  },
  {
    "exists": {
      "field": "ip_related_to_incident"
    }
  }
]
[...]
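The fragment above is the "must" part of a bool query. The Python sketch below shows the full request, assuming the elided parts are just the usual "query"/"bool" wrapper. The cluster address, the "logs-web_servers-*" index pattern and the helper names are assumptions for illustration. The sketch only prepares the request, since sending it requires credentials and TLS settings for your cluster:

```python
import json
import urllib.request


def build_incident_query() -> dict:
    """Wrap the "must" clauses above in a complete bool query."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match_phrase": {"tags.keyword": "web_server_access"}},
                    {"exists": {"field": "ip_related_to_incident"}},
                ]
            }
        }
    }


def search_request(base_url: str, index_pattern: str, body: dict) -> urllib.request.Request:
    """Prepare (but don't send) a POST to the _search endpoint."""
    return urllib.request.Request(
        f"{base_url}/{index_pattern}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical cluster address and index pattern
request = search_request("https://opensearch:9200", "logs-web_servers-*", build_incident_query())
# Sending it requires authentication/TLS settings matching your cluster, e.g.:
# with urllib.request.urlopen(request) as response:
#     for hit in json.load(response)["hits"]["hits"]:
#         print(hit["_source"]["source"]["ip"], hit["_source"]["ip_related_to_incident"])
```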
[...]
"hits" : [
  {
    "_index" : "logs-web_servers-2023.11.20",
    "_id" : "6C0B74sB7PKVx7m-L2xx",
    "_score" : 1.0048822,
    "_source" : {
      "url" : "/internal/nuke_control.aspx",
      "ip_related_to_incident" : "Associated with Explum spear phishing campaign",
      "source" : {
        "ip" : "185.120.19.98",
        [...]
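While OpenSearch relies heavily on parsing/enrichment during ingestion, there are some neat things we can do at search-time.
One example is a "terms lookup": storing lists of values in a document and referencing them from queries. Contents of "ioc.json":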
{
  "known_evil_ip_addresses": [
    "34.76.96.55",
    "198.235.24.39",
    "157.245.96.121",
    "143.198.117.36"
  ],
  "scripted_http_clients": [
    "curl",
    "Go-http-client",
    "Python Requests",
    "Nmap Network Scanner"
  ]
}
$ curl \
"${BASE_URL}/mylookupdata/_doc/ioc" \
--request PUT --data @ioc.json \
--header 'Content-Type: application/json'
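Instead of maintaining the IP list by hand, the lookup document could be generated from the same CSV file used with the translate filter earlier. The minimal sketch below assumes the CSV format shown above, and its helper names are hypothetical. Other keys in the document (such as "scripted_http_clients") are carried over unchanged:

```python
import csv
import io
import json
from typing import Dict, List, TextIO


def ips_from_csv(handle: TextIO) -> List[str]:
    """Return the first column (IP address) of each row in an IOC CSV."""
    return [row[0] for row in csv.reader(handle) if row]


def build_lookup_document(ips: List[str], existing: Dict) -> Dict:
    """Copy the lookup document, replacing its IP list with the given IPs."""
    document = dict(existing)  # keeps other lists, e.g. "scripted_http_clients"
    # Deduplicated and sorted so repeated runs produce identical documents
    document["known_evil_ip_addresses"] = sorted(set(ips))
    return document


csv_rows = io.StringIO(
    "157.245.96.121,Observed in logs during 2022 Xmplify incident\n"
    "185.120.19.98,Associated with Explum spear phishing campaign\n"
)
document = build_lookup_document(ips_from_csv(csv_rows), {"scripted_http_clients": ["curl"]})
print(json.dumps(document, indent=2))
```

Writing the output to "ioc.json" and re-running the `curl` command updates the document. Since a terms lookup fetches the referenced document when the query runs, subsequent searches should use the new list.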
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "tags.keyword": "web_server_access"
          }
        },
        {
          "terms": {
            "source.ip": {
              "index": "mylookupdata",
              "id": "ioc",
              "path": "known_evil_ip_addresses"
            }
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "raw_user_agent": {
              "query": "CensysInspect"
            }
          }
        }
      ],
      "should": [
        {
          "terms": {
            "user_agent.name": {
              "index": "mylookupdata",
              "id": "ioc",
              "path": "scripted_http_clients"
            }
          }
        }
      ]
    }
  }
}
[...]
"must": [
  {
    "match_phrase": {
      "tags.keyword": "web_server_access"
    }
  },
  {
    "terms": {
      "source.ip": {
        "index": "mylookupdata",
        "id": "ioc",
        "path": "known_evil_ip_addresses"
      }
    }
  }
],
[...]
[...]
"must_not": [
  {
    "match": {
      "raw_user_agent": {
        "query": "CensysInspect"
      }
    }
  }
],
[...]
[...]
"should": [
  {
    "terms": {
      "user_agent.name": {
        "index": "mylookupdata",
        "id": "ioc",
        "path": "scripted_http_clients"
      }
    }
  }
]
[...]
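To make the clause semantics concrete: a document must match every "must" clause and none of the "must_not" clauses. "should" clauses, in a bool query that also has "must" clauses, are optional and only raise the relevance score. The Python sketch below approximates this locally. It is an illustration under simplifying assumptions, not how OpenSearch evaluates queries:

- A case-insensitive substring check stands in for the analyzed "match" query.
- A single bonus point stands in for real scoring.
- The event layout and function name are hypothetical.

```python
from typing import Dict, List


def evaluate(events: List[Dict], lookup: Dict[str, List[str]]) -> List[Dict]:
    """Approximate the bool query: filter on must/must_not, rank on should."""
    evil_ips = set(lookup["known_evil_ip_addresses"])
    scripted_clients = set(lookup["scripted_http_clients"])
    matches = []
    for event in events:
        # must: every clause has to match
        if "web_server_access" not in event.get("tags", []):
            continue
        if event.get("source", {}).get("ip") not in evil_ips:
            continue
        # must_not: a match excludes the event entirely
        if "censysinspect" in event.get("raw_user_agent", "").lower():
            continue
        # should: optional, only affects ranking
        bonus = 1 if event.get("user_agent", {}).get("name") in scripted_clients else 0
        matches.append((bonus, event))
    # sort() is stable, also with reverse=True: ties keep their original order
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [event for _bonus, event in matches]


lookup = {
    "known_evil_ip_addresses": ["143.198.117.36"],
    "scripted_http_clients": ["curl"],
}
events = [
    {"tags": ["web_server_access"], "source": {"ip": "143.198.117.36"},
     "raw_user_agent": "curl/8.1.2", "user_agent": {"name": "curl"}},
]
print(len(evaluate(events, lookup)))  # 1
```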
[...]
"hits" : {
  "total" : {
    "value" : 28,
    "relation" : "eq"
  },
  "max_score" : 2.0053382,
  "hits" : [
    {
      "_index" : "logs-web_servers-2023.11.20",
      "_id" : "53JE6osBQrucVyA5EqK1",
      "_score" : 2.0053382,
      "_source" : {
        "request_method" : "GET",
        "request_path" : "/admin.php",
        "raw_user_agent" : "curl/8.1.2",
        "source" : {
          "ip" : "143.198.117.36",
          "geo" : {
            "country_iso_code" : "US",
            "continent_code" : "NA",
            "country_name" : "United States"
          }
          [...]
Search pipelines and Painless scripts may help with more advanced search-time enrichment, but they are a bit out of scope for this course.
Since the fork, Elastic has added "runtime fields" to the proprietary Elasticsearch.
The "lookup" type of runtime field acts a bit like a JOIN statement in a traditional SQL database.
Very useful for enrichment, and OpenSearch is working on a similar solution.
{
  "query": {
    "match": {
      "ids_alert_title": {
        "query": "exploit attempt"
      }
    }
  },
  "runtime_mappings": {
    "cve_details": {
      "type": "lookup",
      "target_index": "myvulns",
      "input_field": "related_cve",
      "target_field": "id",
      "fetch_fields": [
        "cvss_score",
        "description",
        "included_in_kev"
      ]
    }
  },
  "fields": [
    "cve_details"
  ]
}
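Conceptually, the "lookup" runtime field behaves like a left join performed at search time. For each hit, the value of "related_cve" is matched against the "id" field of documents in "myvulns", and the listed "fetch_fields" are attached as "cve_details". The rough Python illustration below uses in-memory lists in place of the two indices. The function name and document shapes are hypothetical, and the response format Elasticsearch actually returns differs:

```python
from typing import Any, Dict, List


def lookup_join(hits: List[Dict], target_docs: List[Dict], input_field: str,
                target_field: str, fetch_fields: List[str], name: str) -> List[Dict]:
    """Attach selected fields from matching target documents to each hit."""
    # Index the target documents by their join key once, like a hash join
    by_key: Dict[Any, List[Dict]] = {}
    for doc in target_docs:
        if target_field in doc:
            by_key.setdefault(doc[target_field], []).append(doc)

    joined = []
    for hit in hits:
        key = hit.get(input_field)
        matches = by_key.get(key, []) if key is not None else []
        # Left join: hits without a match are kept (with an empty list here)
        details = [{f: doc[f] for f in fetch_fields if f in doc} for doc in matches]
        joined.append({**hit, name: details})
    return joined
```

Because the join happens at query time, updating a document in "myvulns" (for example a changed "included_in_kev" flag) is reflected in the next search without re-indexing the alerts.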
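Stored events can also be read back into Logstash on a schedule using the "opensearch" input plugin, for example to re-run enrichment with up-to-date data. Here, all "logs-*" indices are read daily at 03:00:
input {
  opensearch {
    hosts => ["https://opensearch:9200"]
    user => "logger"
    password => "G0d="
    ssl => true
    ssl_certificate_verification => false
    schedule => "00 03 * * *"
    index => "logs-*"
    query => '{"query": {"match_all": {}}}'
  }
}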
[...]
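A scheduled job like the one above trades some historic accuracy for current relevance: old events are re-evaluated against today's context data. The minimal sketch below shows the core step, assuming events and the IOC dictionary are already loaded into Python (the function name is hypothetical). It re-applies the lookup and returns only events whose enrichment changed, so fewer documents need to be written back:

```python
import copy
from typing import Dict, List


def re_enrich(events: List[Dict], dictionary: Dict[str, str],
              field: str = "ip_related_to_incident") -> List[Dict]:
    """Re-apply an IP lookup; return updated copies of events that changed."""
    changed = []
    for event in events:
        source_ip = event.get("source", {}).get("ip")
        new_value = dictionary.get(source_ip) if source_ip else None
        if event.get(field) == new_value:
            continue  # enrichment still current, nothing to rewrite
        updated = copy.deepcopy(event)  # leave the original event untouched
        if new_value is None:
            # IOC no longer listed: drop the stale enrichment
            updated.pop(field, None)
        else:
            updated[field] = new_value
        changed.append(updated)
    return changed
```

Whether stale enrichment should be removed (as done here) or kept for the record is exactly the relevance-vs-accuracy decision discussed earlier.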
Doing all that processing ain't free and will add latency.
Increased query and storage costs.
Complexity in ingestion pipelines increases the risk of disturbances.
You've hopefully tasted the sweet fruit of possibilities!
Most organizations have tons of potentially useful data lying around - let's use it!
Computers are cheap, humans are not.