
Conversation

@itsmvd (Collaborator) commented Mar 11, 2025

This PR adds a new LLM-powered feature to automatically generate forensic reports (stored as a Story) from starred events.

  • Adds a "Generate forensic report" button that appears when starred events filter is active
  • Processes starred events through an LLM provider to create a forensic analysis report
  • Stores the generated report as a Story
  • Implements metrics for tracking events processed and stories created (see the sketch after this list)
  • Limits initial implementation to 1000 starred events per report (will be expanded once we add max_token_count tracking for LLM providers)
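
(For illustration, a hypothetical sketch of those metrics, assuming prometheus_client-style counters; the names below are invented and the actual metric names in the PR may differ.)

    # Hypothetical sketch only; real metric names/labels in the PR may differ.
    from prometheus_client import Counter

    EVENTS_PROCESSED = Counter(
        "llm_report_events_processed_total",
        "Starred events processed for report generation",
    )
    STORIES_CREATED = Counter(
        "llm_report_stories_created_total",
        "Story objects created from generated reports",
    )

    # Usage after a successful generation, e.g.:
    # EVENTS_PROCESSED.inc(total_events_count)
    # STORIES_CREATED.inc()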

[Screenshot: new button in EventList.vue (only visible when viewing starred events)]

[Screenshot: example LLM-generated report from starred events]

@itsmvd marked this pull request as ready for review March 11, 2025 16:46
@itsmvd (Author) commented Mar 11, 2025

Two notes:

@jaegeral (Collaborator) commented:

Comment from the sidelines: I believe keeping the limit at 1000 events is plenty. Even if we increase the number of events usable for LLMs in general, having some kind of limit here is just fine.

@jkppr added the "LLM AI all the things" label on Mar 29, 2025
Code context (EventList.vue):

    this.isSummaryLoading = false
    })
    },
    generateForensicReport() {
Collaborator:
Do you really want to call it a Forensic Report? The term could be misunderstood, so maybe something more specific to what this function is doing: keyEventReport?

@itsmvd (Author) replied Apr 2, 2025:

Fair point. In that case I suggest we go with generateStarredEventsReport, and call the feature llm_starred_events_report etc. Agreed?

Collaborator:

agreed

@itsmvd requested a review from jkppr April 7, 2025 11:38
@jkppr (Collaborator) left a review:

round 2

@jkppr:

The new placement makes sense to me. However, please remove all mentions of any AI/LLM context, since this util should be usable independently (e.g. the default title).

@jkppr:

Please rename the variables as discussed: llm_starred_events_report etc.

@jkppr:

I think for the prod deployment we also need to add the prompt file to the "# Fetch default Timesketch config files" step in the contrib/deploy_timesketch.sh and .ps1 files.

Comment on lines +1007 to +1008:

    this.warningSnackBar('This feature is currently limited to a 1000 starred events, try setting a timerange filter. ' +
      'This limit will be increased soon.', 10000);
@jkppr:

What about making this limit configurable in timesketch.conf? That would allow admins to increase it themselves at any time if they like.

In general: let's remove the "This limit will be increased soon." part from the message. It is out of the control of the TS operator or user, and we don't have a timeline for when we can increase it. Not having it is, long-term, more reliable than having it and never increasing the limit.
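
(For illustration, a minimal sketch of that suggestion; the config key name below is invented.)

    # Illustrative only: LLM_MAX_STARRED_EVENTS is a hypothetical key.
    from flask import current_app

    DEFAULT_STARRED_EVENTS_LIMIT = 1000

    def get_starred_events_limit():
        """Read the per-report event cap from timesketch.conf, with a default."""
        return current_app.config.get(
            "LLM_MAX_STARRED_EVENTS", DEFAULT_STARRED_EVENTS_LIMIT
        )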

@itsmvd (Author):

Will remove that last part of the message. I'm not a fan of making it configurable for admins at the moment, because if they increase the limit beyond what fits in the chosen LLM provider's context window, the feature will break. Better that we control this setting for now, until we can support such granular settings.

Code context (docstring):

    sketch: The Sketch object containing events to analyze.
    **kwargs: Additional arguments including:
        - form: Form data containing query and filter information.
        - datastore: OpenSearch instance for querying.
@jkppr:

On the backend, you actually don't need the datastore object to query. You only need the sketch_id and can then use the wrapper I mentioned above.

@itsmvd (Author):

Which wrapper "mentioned above" are you referring to? I can't seem to find it.

@jkppr:

Your generate_prompt function takes a datastore object as an input argument, but it doesn't need it: you can just do from timesketch.lib.datastore.opensearch import OpenSearchDataStore and call search() on an instance from within the function, without it being handed over as an argument. There is only one datastore used in Timesketch, and that is not going to change anytime soon.
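
(A rough sketch of that suggestion; the exact OpenSearchDataStore constructor and search() signature should be checked against the Timesketch source, and the starred-events filter below is illustrative.)

    # Sketch only: instantiate the datastore inside the function instead of
    # receiving it as an argument. Constructor/search() details may differ
    # from the actual Timesketch API.
    from timesketch.lib.datastore.opensearch import OpenSearchDataStore

    def generate_prompt(self, sketch, **kwargs):
        datastore = OpenSearchDataStore()  # assumes config-driven defaults
        results = datastore.search(
            sketch_id=sketch.id,
            query_string="*",
            # Hypothetical filter for starred events; the real shape lives
            # in the existing query/export utils.
            query_filter={"chips": [{"type": "label", "value": "__ts_star"}]},
        )
        ...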

Comment on lines +184 to +185:

    if not form:
        raise ValueError("Missing 'form' data in kwargs")
@jkppr:

If form is a required argument, why is it in **kwargs and not in the normal args list?
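
(Illustrative sketch of that suggestion, using the names from the diff.)

    # Sketch: promote `form` from **kwargs to an explicit parameter so the
    # requirement is visible in the signature.
    def generate_prompt(self, sketch, form, **kwargs):
        if not form:
            raise ValueError("Missing 'form' data")
        ...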

Code context:

    )

    if events_df is None or events_df.empty:
        return "No events to analyze for forensic report."
@jkppr:

So this will be the response handled by the LLM API endpoint. IIRC it will then still call the "process results" function? Something you want to address here?

@itsmvd (Author):

This will be fixed by introducing a custom SkipLLMError exception. I have a commit that will address this issue for this feature as well as for the llm_summarize feature, which I'll send once the changes in #3379 have been merged.
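
(A minimal sketch of such an exception, assuming the API endpoint catches it to skip the "process results" step; the eventual commit may differ.)

    # Sketch only; the eventual SkipLLMError may carry more context.
    class SkipLLMError(Exception):
        """Raised when there is nothing meaningful for the LLM to process."""

    # Then, instead of returning a plain string:
    #     if events_df is None or events_df.empty:
    #         raise SkipLLMError("No events to analyze for the report.")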

@jkppr:

#3379 was merged.

Code context:

        timeline_ids=timeline_ids,
    )

    total_events_count = len(events_df)
@jkppr:

Unlikely, since the icon only shows when there are starred events, but maybe catch the case where there are no results.
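
(A minimal guard along those lines, reusing the variable names from the diff and the SkipLLMError idea discussed above.)

    # Sketch: bail out early when the query returned no events.
    total_events_count = len(events_df)
    if total_events_count == 0:
        raise SkipLLMError("No starred events found to analyze.")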

