---
SPDX-FileCopyrightText: © 2023 Menacit AB <[email protected]>
SPDX-License-Identifier: CC-BY-SA-4.0
title: "Logging course: Enrichment"
author: "Joel Rangsmo <[email protected]>"
footer: "© Course authors (CC BY-SA 4.0)"
description: "Basics of automated enrichment in logging course"
keywords:
  - logging
  - siem
  - course
color: "#ffffff"
class: invert
style: |
  section.center { text-align: center; }
  table strong { color: #d63030; }
  table em { color: #2ce172; }
---

Enriching logs

Aiding our data analysis

---


"Enrichment" is the process of improving the value of our logs.

Often this means providing useful context for analysts and machines alike.

We've already played around with adding GeoIP information.

Let's look at some more examples and how to implement them in OpenSearch.

---


What about that source/dest?

  • IP reputation
  • IP type (residential, cloud, proxy, etc.)
  • Current host patch level
  • Vulnerability scan and/or Shodan results
  • All kinds of CMDB data!

---


Let's not forget humans!

  • Role description
  • Employment location / Timezone
  • Occurrence in data leaks
  • Contact information

---


Enrichment can be performed during ingestion or at search-time.

As with field parsing, both approaches have their pros and cons.

Current relevance vs. historic accuracy.

---


Useful filter plugins

  • GeoIP and user agent
  • DNS (forward/reverse lookups)
  • Translate
  • JDBC and Memcached
  • HTTP client

...and as always, "ruby"!
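
As an illustration, the DNS plugin can resolve PTR records during ingestion. A minimal sketch (the field names and cache size are assumptions, not from the course environment):

```
filter {
  # Only attempt a lookup if we actually have a source IP
  if [source][ip] {
    # Copy the IP into the field that will hold the resolved name
    mutate {
      copy => { "[source][ip]" => "[source][domain]" }
    }
    # Replace the copied IP in-place with its PTR record, if any
    dns {
      reverse => ["[source][domain]"]
      action => "replace"
      hit_cache_size => 1000
    }
  }
}
```

Caching successful lookups matters here, as an uncached DNS query per event adds noticeable ingestion latency.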

---


# Forward lookup
$ host suspicious.example.com

suspicious.example.com has address 93.184.215.14
suspicious.example.com has IPv6 address
2606:2800:21f:cb07:6820:80da:af6b:8b2c

# Reverse lookup
$ host 93.184.215.14

14.215.184.93.in-addr.arpa domain name pointer
suspicious.example.com.

Erghh - less talk, more examples!

---


/var/ioc/evil_ip.csv

157.245.96.121,Observed in logs during 2022 Xmplify incident
185.120.19.98,Associated with Explum spear phishing campaign
194.61.40.74,Has been trying to brute-force our VPN for years!

Logstash filter pipeline

[...]

if [source][ip] {
  translate {
    source => "[source][ip]"
    target => "ip_related_to_incident"
    dictionary_path => "/var/ioc/evil_ip.csv"
  }
}

[...]

---


[...]

"must": [
  {
    "match_phrase": {
      "tags.keyword": "web_server_access"
    }
  },
  {
    "exists": {
      "field": "ip_related_to_incident"
    }
  }
]

[...]
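
Assembled, the full search body for the fragment above could look like this (a sketch; the file name and index pattern are assumptions):

```json
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "tags.keyword": "web_server_access"
          }
        },
        {
          "exists": {
            "field": "ip_related_to_incident"
          }
        }
      ]
    }
  }
}
```

Saved as `query.json`, it could be submitted with something like `curl "${BASE_URL}/logs-web_servers-*/_search" --data @query.json --header 'Content-Type: application/json'`.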

---


[...]

"hits" : [
  {
    "_index" : "logs-web_servers-2023.11.20",
    "_id" : "6C0B74sB7PKVx7m-L2xx",
    "_score" : 1.0048822,
    "_source" : {
      "url" : "/internal/nuke_control.aspx",
      "ip_related_to_incident" : "Associated with Explum spear phishing campaign",
      "source" : {
        "ip" : "185.120.19.98",
      [...]

While OpenSearch relies heavily on parsing/enrichment during ingestion, there are some neat things we can do at search-time.

---


{
  "known_evil_ip_addresses": [
    "34.76.96.55",
    "198.235.24.39",
    "157.245.96.121",
    "143.198.117.36"
  ],
  "scripted_http_clients": [
    "curl",
    "Go-http-client",
    "Python Requests",
    "Nmap Network Scanner"
  ]
}
$ curl \
  "${BASE_URL}/mylookupdata/_doc/ioc" \
  --request PUT --data @ioc.json \
  --header 'Content-Type: application/json'

---


{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "tags.keyword": "web_server_access"
          }
        },
        {
          "terms": {
            "source.ip": {
              "index": "mylookupdata",
              "id": "ioc",
              "path": "known_evil_ip_addresses"
            }
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "raw_user_agent": {
              "query": "CensysInspect"
            }
          }
        }
      ],
      "should": [
        {
          "terms": {
            "user_agent.name": {
              "index": "mylookupdata",
              "id": "ioc",
              "path": "scripted_http_clients"
            }
          }
        }
      ]
    }
  }
}

---


[...]

   "must": [
     {
       "match_phrase": {
         "tags.keyword": "web_server_access"
       }
     },
     {
       "terms": {
         "source.ip": {
           "index": "mylookupdata",
           "id": "ioc",
           "path": "known_evil_ip_addresses"
         }
       }
     }
   ],

[...]

---


[...]

  "must_not": [
    {
      "match": {
        "raw_user_agent": {
          "query": "CensysInspect"
        }
      }
    }
  ],

[...]

---


[...]

  "should": [
    {
      "terms": {
        "user_agent.name": {
          "index": "mylookupdata",
          "id": "ioc",
          "path": "scripted_http_clients"
        }
      }
    }
  ]

[...]

---


[...]

  "hits" : {
    "total" : {
      "value" : 28,
      "relation" : "eq"
    },
    "max_score" : 2.0053382,
    "hits" : [
      {
        "_index" : "logs-web_servers-2023.11.20",
        "_id" : "53JE6osBQrucVyA5EqK1",
        "_score" : 2.0053382,
        "_source" : {
"request_method" : "GET",
          "request_path" : "/admin.php",
          "raw_user_agent" : "curl/8.1.2",
          "source" : {
            "ip" : "143.198.117.36",
            "geo" : {
              "country_iso_code" : "US",
              "continent_code" : "NA",
              "country_name" : "United States"
            }

[...]

---


Search pipelines and Painless scripts may be able to help, but they are a bit out of scope for this course.

---


Since the fork, Elastic has added a feature called "runtime fields" to the proprietary Elasticsearch.

The "lookup" type acts a bit like JOIN statements do in traditional SQL databases.

It is very useful for enrichment, and OpenSearch is working on a similar solution.

---


{
  "query": {
    "match": {
      "ids_alert_title": {
        "query": "exploit attempt"
      }
    }
  },
  "runtime_mappings": {
    "cve_details": {
      "type": "lookup",
      "target_index": "myvulns",
      "input_field": "related_cve",
      "target_field": "id", 
      "fetch_fields": [
        "cvss_score",
        "description",
        "included_in_kev"
      ]
    } 
  }
}

---


The middle path

input {
  opensearch {
    hosts => ["https://opensearch:9200"]
    user => "logger"
    password => "G0d="
    ssl => true
    ssl_certificate_verification => false

    schedule => "00 03 * * *"
    index => "logs-*"
    query => '{"query": {"match_all": {}}}'
  }
}

[...]
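
The elided part of such a pipeline would typically hold filter and output stages. An output writing the re-processed events back could be sketched like this (the index name is an assumption):

```
output {
  opensearch {
    hosts => ["https://opensearch:9200"]
    user => "logger"
    password => "G0d="
    ssl => true
    ssl_certificate_verification => false

    # Write re-enriched events to a separate index
    index => "logs-enriched-%{+YYYY.MM.dd}"
  }
}
```

This way, previously ingested logs can be periodically re-enriched with up-to-date lookup data.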

---


Beware of the cost

Doing all that processing ain't free and will add latency.

It also increases query and storage costs.

Complexity in ingestion pipelines increases the risk of disturbances.

---


Conclusion

You've hopefully tasted the sweet fruit of possibilities!

Most organizations have tons of potentially useful data lying around - let's use it!

Computers are cheap, humans are not.
