@CTIBurn0ut
Contributor

Summary

Fixes #5309 by ensuring we do not use CrowdStrike actor identifiers (e.g., LABYRINTHCHOLLIMA) as Intrusion Set names when processing actor associations found in collections like indicators, reports, yara_master, and snort_suricata_master.

Instead, when an object payload includes actors: ["<CS_ACTOR_ID>"], we resolve that identifier to the actor's canonical display name (and aliases where relevant) via CrowdStrike’s actor entity endpoint and use that name consistently when creating/updating the corresponding Intrusion Set.

Problem

Multiple collections return an actors field containing CrowdStrike internal actor IDs, not the human-readable name. The connector previously treated the value as the Intrusion Set name, which led to Intrusion Sets being renamed back and forth:

  • Correct name when imported via the actors collection (e.g., "Wicked Panda")
  • Incorrect identifier name when imported via indicators/reports/etc. (e.g., "WICKEDPANDA" / "LABYRINTHCHOLLIMA")

This broke entity stability and enrichment workflows dependent on consistent STIX IDs.

Changes

  • Resolve actor identifiers from actors[] to canonical actor names before creating/updating Intrusion Sets.
  • Add lightweight caching to avoid repeated lookups for the same actor identifier within a connector run.
  • Ensure we do not overwrite an existing Intrusion Set name with an unresolved identifier if resolution fails (failsafe behavior).
  • Preserve existing relationship creation behavior, except for correcting the referenced Intrusion Set identity/name.
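
The resolution, caching, and failsafe behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the connector's actual code: `ActorNameCache` and the resolver signature are hypothetical names, and the resolver is assumed to return the actor entity as a dict with a `name` key (or `None` when nothing is found).

```python
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger(__name__)

# Hypothetical resolver signature: takes an actor identifier and returns the
# actor entity as a dict (with a "name" key), or None when nothing is found.
ActorResolver = Callable[[str], Optional[dict]]


class ActorNameCache:
    """Resolve actor identifiers to display names, cached for one connector run."""

    def __init__(self, resolver: ActorResolver) -> None:
        self._resolver = resolver
        self._cache: Dict[str, Optional[str]] = {}

    def resolve(self, actor_id: str) -> Optional[str]:
        """Return the canonical name, or None if the identifier cannot be resolved."""
        if actor_id in self._cache:
            logger.debug("Actor cache hit for '%s'", actor_id)
            return self._cache[actor_id]
        try:
            entity = self._resolver(actor_id)
        except Exception:
            logger.exception("Failed to resolve actor identifier '%s'", actor_id)
            entity = None
        name = entity.get("name") if isinstance(entity, dict) else None
        # Failures are cached as None too, so a bad identifier costs one lookup per run.
        self._cache[actor_id] = name or None
        logger.debug("Resolved actor '%s' to %r", actor_id, self._cache[actor_id])
        return self._cache[actor_id]
```

With a helper like this, the caller would skip creating or renaming the Intrusion Set whenever `resolve()` returns `None`, which is the failsafe described in the third bullet.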

Why this approach

CrowdStrike’s actors array contains identifiers, and the correct name must be retrieved by querying the actor definition endpoint (GetIntelActorEntities). This makes the Intrusion Set naming stable across all collections while remaining faithful to the source.

Testing

Manual / functional verification:

  • Ingest indicators (and/or reports) that contain actors: ["<CS_ACTOR_ID>"]
  • Confirm Intrusion Sets are created/linked using the resolved actor name, not the identifier
  • Re-run ingestion across multiple collections and confirm Intrusion Set names do not change
  • Confirm expected relationships to Intrusion Sets remain intact

Logs:

  • Added debug logs showing actor-id → actor-name resolution (and cache hits) to aid troubleshooting.

Notes / Follow-ups

  • If an API key can read actors but not indicators (or vice versa), this change avoids breaking ingestion by failing safely when actor resolution isn’t possible.
  • Future enhancement: persist actor-id → name cache between runs (optional), if we see repeated resolution costs.

@CTIBurn0ut CTIBurn0ut requested a review from Kakudou December 13, 2025 22:21
Member

@Kakudou Kakudou left a comment

Hi,

Thanks for the PR.
Unfortunately, I have to mark it as "Request Changes" for a few reasons.

The first and most important one:
These fixes do not seem to work. I still see Intrusion Sets being created with the slug/ID as their name.
They were retrieved using the "indicator" scope only.

Regarding the report part, it feels over-engineered and contains a lot of unused code.
I was unable to reach that code path, even after ingesting a full day of historical data.
Could you provide some insight about your configuration for this part, so I can retrieve the same dataset as you and observe the same behavior?

Comment on lines +132 to +141
elif isinstance(actor, str) and actor:
    if actor in self._actor_cache:
        actor_entity = self._actor_cache[actor]
    elif self.actor_resolver is not None:
        try:
            actor_entity = self.actor_resolver(actor)
        except Exception:
            logger.exception("Failed to resolve actor identifier '%s'", actor)
            actor_entity = None
        self._actor_cache[actor] = actor_entity
Member

I let the ingestion run over a long period (timestamp at 0) to try to catch that case.
I was unable to enter that elif. It feels a little over-engineered, since the case doesn't seem to occur on my side.

Were you able to trigger it? Any insight into the conf/dataset you are using, so I can replicate it on my side?

Member

I agree. To me, this part of the code is unnecessary because "actors" always seems to be represented as a dict and contain all the information we need (especially the name) when retrieving reports.

related_indicators_with_related_entities,
self.report_guess_relations,
malwares_from_field=malwares_from_field,
actor_resolver=self.reports_api_cs.get_actor_entity_by_id,
Member

As with all the previous comments, this looks over-engineered; I wasn't able to go through it a single time.

Member

I agree. To me, this part of the code is unnecessary because "actors" always seems to be represented as a dictionary and contain all the information we need (especially the name) when retrieving reports.

Comment on lines +71 to +73
# NOTE: FalconPy Intel exposes GetIntelActorEntities; the underlying client is `self.cs_intel`.
# The response is expected to be a dict with a `body` that contains `resources`.
response = self.cs_intel.get_intel_actor_entities(
Member

As in the previous comment, I never enter 'actor_resolver', so this method is never called.

Also, it looks like get_intel_actor_entities doesn't exist in FalconPy, as we can see:
https://www.falconpy.io/Service-Collections/Intel.html?highlight=GetIntelActor#getintelactorentities

It looks like you should have used get_actor_entities instead.
But once again, in my test cases, I've never encountered any reports going through 'actor_resolver'.
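
For reference, a call through get_actor_entities could look roughly like the sketch below. It is only an illustration under assumptions: `fetch_actor_entities` is a hypothetical helper, `intel_client` stands in for the FalconPy Intel service instance (`self.cs_intel` in this PR), and the identifiers passed in are assumed to be ones the endpoint accepts. The response is treated as a dict with `status_code` and a `body` containing `resources` and `errors`.

```python
from typing import Any, Dict, List, Optional


def fetch_actor_entities(
    intel_client: Any,
    actor_ids: List[str],
    fields: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Return actor entities for the given IDs, or [] on an error response.

    `intel_client` is assumed to expose get_actor_entities(ids=..., fields=...),
    as the FalconPy Intel service class does.
    """
    if not actor_ids:
        return []
    response = intel_client.get_actor_entities(
        ids=actor_ids,
        fields=fields or ["__basic__"],
    )
    body = response.get("body") or {}
    # Treat any non-200 status or reported error as "nothing resolved".
    if response.get("status_code") != 200 or body.get("errors"):
        return []
    return body.get("resources") or []
```

Returning an empty list on errors keeps the caller's failsafe simple: no entity means no rename.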

Comment on lines +125 to +127
# Reports may provide actors as either full actor entities (dict) or as
# CrowdStrike actor identifiers (str). For identifiers, resolve via the
# provided resolver (if any) to get the canonical actor name.
Member

I was unable to verify that statement, which leaves a lot of 'unused code' in my review.
I need more insight into how to trigger it.

I was also unable to get a single actor matching elif isinstance(actor, str) and actor.

Comment on lines +62 to +64
if fields is None:
    # Start with basic – can switch to "__full__" if you need more.
    fields = ["__basic__"]
Member

if not cleaned_slugs:
    return ""

conditions = [f"name:'{slug}'" for slug in cleaned_slugs]
Member

Shouldn't we filter on slug instead of name?
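
A slug-based version of the filter might look like the sketch below. It is a standalone illustration rather than the method from this PR, and it rests on two assumptions: that `slug` is a filterable field on the actors endpoint, and that a comma between FQL conditions acts as a logical OR.

```python
from typing import Iterable


def build_slug_filter(slugs: Iterable[str]) -> str:
    """Build an FQL filter matching any of the given actor slugs.

    Assumes `slug` is filterable on the actors endpoint and that a comma
    between conditions acts as OR in FQL.
    """
    # Strip whitespace, drop empty values, and de-duplicate while keeping order.
    cleaned = list(dict.fromkeys(s.strip() for s in slugs if s and s.strip()))
    if not cleaned:
        return ""
    return ",".join(f"slug:'{slug}'" for slug in cleaned)
```

For example, `build_slug_filter(["wicked-panda", "labyrinth-chollima"])` yields `slug:'wicked-panda',slug:'labyrinth-chollima'`.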

fql_filter = self.build_slug_filter(cleaned_slugs)

return self.get_combined_actor_entities(

This comment was marked as outdated.

Member

I'm not convinced by the use of
[image]

As described, it's used to retrieve IDs based on an FQL query,

while:
[image]
is used to retrieve the actor.

So, to find the 'real name' we need the actor, not the IDs.

Member

But you're right about this:
"Furthermore, I'm getting 400 errors in my tests using this API when resolving actors associated to an indicator [{'code': 400, 'message': 'field last_modified_timestamp is not available for sorting'}]"

I got it too, even with query_actor_entities!

Development

Successfully merging this pull request may close these issues.

[crowdstrike] Intrusion Set Names Losing Formatting (Missing Spaces / Forced Uppercase)

4 participants