---
SPDX-FileCopyrightText: © 2023 Menacit AB <[email protected]>
SPDX-License-Identifier: CC-BY-SA-4.0
title: "Logging course: Enrichment"
author: "Joel Rangsmo <[email protected]>"
footer: "© Course authors (CC BY-SA 4.0)"
description: "Basics of automated enrichment in logging course"
keywords:
  - logging
  - siem
  - course
color: "#ffffff"
class: invert
style: |
  section.center { text-align: center; }
  table strong { color: #d63030; }
  table em { color: #2ce172; }
---

Enriching logs

Aiding our data analysis

---


"Enrichment" is the process of improving the value of our logs.

Often this means providing useful context for analysts and machines alike.

We've already played around with adding GeoIP information.

Let's look at some more examples and how to implement them in OpenSearch.

---


What about that source/dest?

  • IP reputation
  • IP type (residential, cloud, proxy, etc.)
  • Current host patch level
  • Vulnerability scan and/or Shodan results
  • All kinds of CMDB data!

---


Let's not forget humans!

  • Role description
  • Employment location / Timezone
  • Occurrence in data leaks
  • Contact information

---


Enrichment can be performed during ingestion or at search-time.

As with field parsing, both approaches have their pros and cons.

Current relevance vs. historic accuracy.

---


Useful filter plugins

  • GeoIP and user agent
  • DNS (forward/reverse lookups)
  • Translate
  • JDBC and Memcached
  • HTTP client

...and as always, "ruby"!
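
As an illustration, the DNS plugin can resolve PTR records during ingestion. A minimal sketch (the field names and cache size are assumptions, not from the course environment):

```
filter {
  # Only attempt a lookup if we actually have a source IP
  if [source][ip] {
    # Copy the IP into the field that will hold the resolved name
    mutate {
      copy => { "[source][ip]" => "[source][domain]" }
    }
    # Replace the copied IP in-place with its PTR record, if any
    dns {
      reverse => ["[source][domain]"]
      action => "replace"
      hit_cache_size => 1000
    }
  }
}
```

Caching successful lookups matters here, as an uncached DNS query per event adds noticeable ingestion latency.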

---


# Forward lookup
$ host suspicious.example.com

suspicious.example.com has address 93.184.215.14
suspicious.example.com has IPv6 address
2606:2800:21f:cb07:6820:80da:af6b:8b2c

# Reverse lookup
$ host 93.184.215.14

14.215.184.93.in-addr.arpa domain name pointer
suspicious.example.com.

Erghh - less talk, more examples!

---


/var/ioc/evil_ip.csv

157.245.96.121,Observed in logs during 2022 Xmplify incident
185.120.19.98,Associated with Explum spear phishing campaign
194.61.40.74,Has been trying to brute-force our VPN for years!

Logstash filter pipeline

[...]

if [source][ip] {
  translate {
    source => "[source][ip]"
    target => "ip_related_to_incident"
    dictionary_path => "/var/ioc/evil_ip.csv"
  }
}

[...]

---


[...]

"must": [
  {
    "match_phrase": {
      "tags.keyword": "web_server_access"
    }
  },
  {
    "exists": {
      "field": "ip_related_to_incident"
    }
  }
]

[...]
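
Assembled, the full search body for the fragment above could look like this (a sketch; the file name and index pattern are assumptions):

```json
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "tags.keyword": "web_server_access"
          }
        },
        {
          "exists": {
            "field": "ip_related_to_incident"
          }
        }
      ]
    }
  }
}
```

Saved as `query.json`, it could be submitted with something like `curl "${BASE_URL}/logs-web_servers-*/_search" --data @query.json --header 'Content-Type: application/json'`.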

---


[...]

"hits" : [
  {
    "_index" : "logs-web_servers-2023.11.20",
    "_id" : "6C0B74sB7PKVx7m-L2xx",
    "_score" : 1.0048822,
    "_source" : {
      "url" : "/internal/nuke_control.aspx",
      "ip_related_to_incident" : "Associated with Explum spear phishing campaign",
      "source" : {
        "ip" : "185.120.19.98",
      [...]

While OpenSearch relies heavily on parsing/enrichment during ingestion, there are some neat things we can do at search-time.

---


{
  "known_evil_ip_addresses": [
    "34.76.96.55",
    "198.235.24.39",
    "157.245.96.121",
    "143.198.117.36"
  ],
  "scripted_http_clients": [
    "curl",
    "Go-http-client",
    "Python Requests",
    "Nmap Network Scanner"
  ]
}
$ curl \
  "${BASE_URL}/mylookupdata/_doc/ioc" \
  --request PUT --data @ioc.json \
  --header 'Content-Type: application/json'

---


{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "tags.keyword": "web_server_access"
          }
        },
        {
          "terms": {
            "source.ip": {
              "index": "mylookupdata",
              "id": "ioc",
              "path": "known_evil_ip_addresses"
            }
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "raw_user_agent": {
              "query": "CensysInspect"
            }
          }
        }
      ],
      "should": [
        {
          "terms": {
            "user_agent.name": {
              "index": "mylookupdata",
              "id": "ioc",
              "path": "scripted_http_clients"
            }
          }
        }
      ]
    }
  }
}

---


[...]

   "must": [
     {
       "match_phrase": {
         "tags.keyword": "web_server_access"
       }
     },
     {
       "terms": {
         "source.ip": {
           "index": "mylookupdata",
           "id": "ioc",
           "path": "known_evil_ip_addresses"
         }
       }
     }
   ],

[...]

---


[...]

  "must_not": [
    {
      "match": {
        "raw_user_agent": {
          "query": "CensysInspect"
        }
      }
    }
  ],

[...]

---


[...]

  "should": [
    {
      "terms": {
        "user_agent.name": {
          "index": "mylookupdata",
          "id": "ioc",
          "path": "scripted_http_clients"
        }
      }
    }
  ]

[...]

---


[...]

  "hits" : {
    "total" : {
      "value" : 28,
      "relation" : "eq"
    },
    "max_score" : 2.0053382,
    "hits" : [
      {
        "_index" : "logs-web_servers-2023.11.20",
        "_id" : "53JE6osBQrucVyA5EqK1",
        "_score" : 2.0053382,
        "_source" : {
"request_method" : "GET",
          "request_path" : "/admin.php",
          "raw_user_agent" : "curl/8.1.2",
          "source" : {
            "ip" : "143.198.117.36",
            "geo" : {
              "country_iso_code" : "US",
              "continent_code" : "NA",
              "country_name" : "United States"
            }

[...]

---


Search pipelines and Painless scripts may be able to help, but they are a bit out of scope for this course.

---


Since the fork, Elastic has added a feature called "runtime fields" to the proprietary Elasticsearch.

The "lookup" type acts a bit like JOIN statements do in traditional SQL databases.

It is very useful for enrichment, and OpenSearch is working on a similar solution.

---


{
  "query": {
    "match": {
      "ids_alert_title": {
        "query": "exploit attempt"
      }
    }
  },
  "runtime_mappings": {
    "cve_details": {
      "type": "lookup",
      "target_index": "myvulns",
      "input_field": "related_cve",
      "target_field": "id", 
      "fetch_fields": [
        "cvss_score",
        "description",
        "included_in_kev"
      ]
    } 
  }
}

---


The middle path

input {
  opensearch {
    hosts => ["https://opensearch:9200"]
    user => "logger"
    password => "G0d="
    ssl => true
    ssl_certificate_verification => false

    schedule => "00 03 * * *"
    index => "logs-*"
    query => '{"query": {"match_all": {}}}'
  }
}

[...]
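
The elided part of such a pipeline would typically hold filter and output stages. An output writing the re-processed events back could be sketched like this (the index name is an assumption):

```
output {
  opensearch {
    hosts => ["https://opensearch:9200"]
    user => "logger"
    password => "G0d="
    ssl => true
    ssl_certificate_verification => false

    # Write re-enriched events to a separate index
    index => "logs-enriched-%{+YYYY.MM.dd}"
  }
}
```

This way, previously ingested logs can be periodically re-enriched with up-to-date lookup data.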

---


Beware of the cost

Doing all that processing ain't free and will add latency.

It also increases query and storage costs.

Complexity in ingestion pipelines increases the risk of disturbances.

---


Conclusion

You've hopefully tasted the sweet fruit of possibilities!

Most organizations have tons of potentially useful data lying around - let's use it!

Computers are cheap, humans are not.
