extract_single_snapshot(df, day)
Extract and format a single snapshot from a DataFrame for a given day.
-
df:pd.DataFrame
The input DataFrame containing network data.
-
day:int
The specific day for which to extract the snapshot.
-
str
A formatted string representing the snapshot for the given day.
aggregate_edges(df, gt)
Aggregate edges in a DataFrame while counting packets and adding labels.
-
df:pd.DataFrame
The input DataFrame containing network data.
-
gt:pd.DataFrame
The ground truth labels for source IPs.
-
pd.DataFrame
An aggregated DataFrame with added packet count and labels for edges.
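The aggregation described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the column names `src_ip`, `dst_port`, and `label`, the one-row-per-packet assumption, the "unknown" fill value, and the helper name `aggregate_edges_sketch` are all hypothetical.

```python
import pandas as pd

def aggregate_edges_sketch(df: pd.DataFrame, gt: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: one output row per (src_ip, dst_port) edge."""
    # Count rows per edge; assumes each input row is one packet.
    edges = (
        df.groupby(["src_ip", "dst_port"])
          .size()
          .reset_index(name="packets")
    )
    # Attach ground-truth labels by source IP; unlabeled sources get "unknown".
    edges = edges.merge(gt[["src_ip", "label"]], on="src_ip", how="left")
    edges["label"] = edges["label"].fillna("unknown")
    return edges

df = pd.DataFrame({
    "src_ip": ["10.0.0.1", "10.0.0.1", "10.0.0.1", "10.0.0.2"],
    "dst_port": [23, 23, 80, 23],
})
gt = pd.DataFrame({"src_ip": ["10.0.0.1"], "label": ["mirai"]})
edges = aggregate_edges_sketch(df, gt)
```

A left join keeps edges whose source IP has no ground-truth entry, which seems the safer default when labels cover only part of the traffic.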
get_contacted_dst_ports(df)
Get the total number of contacted destination ports per source IP.
-
df:pd.DataFrame
The input DataFrame containing network data.
-
pd.DataFrame
A DataFrame with the total number of contacted destination ports per source IP.
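One way to compute such a per-source port count is a grouped distinct count. The sketch below assumes columns named `src_ip` and `dst_port` and assumes that "total number" means the number of distinct ports. The helper name and the output column `n_dst_ports` are hypothetical.

```python
import pandas as pd

def contacted_dst_ports_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: distinct destination ports per source IP."""
    return (
        df.groupby("src_ip")["dst_port"]
          .nunique()                # distinct ports, not packet count
          .rename("n_dst_ports")
          .to_frame()
    )

df = pd.DataFrame({
    "src_ip": ["10.0.0.1", "10.0.0.1", "10.0.0.1", "10.0.0.2"],
    "dst_port": [23, 23, 80, 23],
})
ports = contacted_dst_ports_sketch(df)
```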
get_stats_per_dst_port(df)
Get general statistics of packets per destination port per source IP.
-
df:pd.DataFrame
The input DataFrame containing network data.
-
pd.DataFrame
A DataFrame with general statistics of packets per destination port per source IP.
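Statistics of this kind can be built in two steps: count packets per (source IP, destination port) pair, then summarize those counts for each source IP. The sketch below assumes one row per packet and columns `src_ip` and `dst_port`. The choice of mean, min, and max is an assumption, since the actual set of statistics is not listed here.

```python
import pandas as pd

def stats_per_dst_port_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: per-source summary of packets sent to each port."""
    # Step 1: packets per (src_ip, dst_port) pair.
    per_edge = df.groupby(["src_ip", "dst_port"]).size().rename("packets")
    # Step 2: summarize the per-port counts for each source IP.
    return per_edge.groupby(level="src_ip").agg(["mean", "min", "max"])

df = pd.DataFrame({
    "src_ip": ["10.0.0.1", "10.0.0.1", "10.0.0.1", "10.0.0.2"],
    "dst_port": [23, 23, 80, 23],
})
stats = stats_per_dst_port_sketch(df)
```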
get_contacted_src_ips(df)
Get the total number of source IPs contacting each destination port.
-
df:pd.DataFrame
The input DataFrame containing network data.
-
pd.DataFrame
A DataFrame with the total number of source IPs contacting each destination port.
get_stats_per_src_ip(df)
Get general statistics of packets per source IP per destination port.
-
df:pd.DataFrame
The input DataFrame containing network data.
-
pd.DataFrame
A DataFrame with general statistics of packets per source IP per destination port.
get_contacted_dst_ips(df, dummy=False)
Get the total number of contacted darknet IPs per source IP or destination port.
-
df:pd.DataFrame
The input DataFrame containing network data.
-
dummy:bool, optional
If True, calculates the total number of contacted darknet IPs per destination port, by default False.
-
pd.DataFrame
A DataFrame with the total number of contacted darknet IPs per source IP or destination port.
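The effect of the `dummy` flag can be pictured as a switch of the grouping key. The sketch below is an illustration under assumptions: the columns `src_ip`, `dst_port`, and `dst_ip`, the output column `n_dst_ips`, and the helper name are hypothetical, and "total number" is read as a distinct count.

```python
import pandas as pd

def contacted_dst_ips_sketch(df: pd.DataFrame, dummy: bool = False) -> pd.DataFrame:
    """Hypothetical sketch: distinct darknet IPs per source IP or per port."""
    # dummy=True groups by destination port instead of source IP.
    key = "dst_port" if dummy else "src_ip"
    return df.groupby(key)["dst_ip"].nunique().rename("n_dst_ips").to_frame()

df = pd.DataFrame({
    "src_ip": ["10.0.0.1", "10.0.0.1", "10.0.0.1", "10.0.0.2"],
    "dst_port": [23, 23, 80, 23],
    "dst_ip": ["192.0.2.1", "192.0.2.2", "192.0.2.1", "192.0.2.1"],
})
per_source = contacted_dst_ips_sketch(df)
per_port = contacted_dst_ips_sketch(df, dummy=True)
```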
get_stats_per_dst_ip(df, dummy=False)
Get general statistics of packets per destination IP per source IP or destination port.
-
df:pd.DataFrame
The input DataFrame containing network data.
-
dummy:bool, optional
If True, calculates statistics per destination IP per destination port, by default False.
-
pd.DataFrame
A DataFrame with general statistics of packets per destination IP per source IP or destination port.
get_packet_statistics(df, by='src_ip')
Get general packet statistics per source IP or destination port.
-
df:pd.DataFrame
The input DataFrame containing network data.
-
by:str, optional
The column by which to group the packet statistics ('src_ip' or 'dst_port'), by default 'src_ip'.
-
pd.DataFrame
A DataFrame with general packet statistics per source IP or destination port.
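A grouped aggregation with a validated `by` argument might look like the sketch below. The per-packet size column `pkt_len`, the chosen statistics (count, mean, max), and the helper name are assumptions for illustration only. The statistics actually returned may differ.

```python
import pandas as pd

def packet_statistics_sketch(df: pd.DataFrame, by: str = "src_ip") -> pd.DataFrame:
    """Hypothetical sketch: packet statistics grouped by `by`."""
    if by not in ("src_ip", "dst_port"):
        raise ValueError("by must be 'src_ip' or 'dst_port'")
    # `pkt_len` is a hypothetical per-packet size column.
    return df.groupby(by)["pkt_len"].agg(["count", "mean", "max"])

df = pd.DataFrame({
    "src_ip": ["10.0.0.1", "10.0.0.1", "10.0.0.1", "10.0.0.2"],
    "dst_port": [23, 23, 80, 23],
    "pkt_len": [40, 60, 40, 100],
})
by_source = packet_statistics_sketch(df)
by_port = packet_statistics_sketch(df, by="dst_port")
```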
uniform_features(df, lookup, node_type)
Uniformly format and index a features DataFrame based on a node lookup.
-
df:pd.DataFrame
The input DataFrame containing node features.
-
lookup:dict
A dictionary mapping node names to IDs.
-
node_type:str
The type of nodes in the DataFrame (e.g., 'src_ip', 'dst_port').
-
pd.DataFrame
A uniformly formatted and indexed DataFrame of node features.
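The idea of aligning a feature table to a node lookup can be sketched as: map node names to IDs, index by ID, and reindex so that every node in the lookup has a row. The zero fill for nodes without features, the `node_id` index name, and the helper name are assumptions, not documented behavior.

```python
import pandas as pd

def uniform_features_sketch(df: pd.DataFrame, lookup: dict, node_type: str) -> pd.DataFrame:
    """Hypothetical sketch: features indexed by node ID, one row per node."""
    # Translate node names to integer IDs; names missing from the lookup are dropped.
    ids = df[node_type].map(lookup)
    keep = ids.notna()
    out = df.loc[keep].drop(columns=[node_type])
    out.index = pd.Index(ids[keep].astype(int), name="node_id")
    # One row per node in ID order; nodes without features get zeros (assumption).
    return out.reindex(sorted(lookup.values()), fill_value=0)

df = pd.DataFrame({"src_ip": ["10.0.0.2", "10.0.0.1"], "n_dst_ports": [1, 2]})
lookup = {"10.0.0.1": 0, "10.0.0.2": 1, "10.0.0.3": 2}
feats = uniform_features_sketch(df, lookup, "src_ip")
```

With this layout, row `i` of the feature table corresponds to node `i`, which is the property a downstream model that indexes nodes by ID would rely on.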
generate_adjacency_matrices(flist, weighted=True)
Generate adjacency matrices from a list of DataFrame files.
-
flist:list
A list of file paths, each containing a DataFrame of network data.
-
weighted:bool, optional
If True, the edges in the generated matrices will be weighted, by default True.
-
list
A list of torch sparse tensors representing the adjacency matrices.
drop_duplicates(x)
Remove consecutive duplicate elements from a NumPy array.
-
x:numpy.ndarray
The input NumPy array from which consecutive duplicates will be removed.
-
numpy.ndarray
A NumPy array with consecutive duplicate elements removed.
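Removing consecutive duplicates differs from `numpy.unique`: a value may reappear later, as long as it is not adjacent to itself. A minimal sketch for 1-D input follows. The helper name is hypothetical and this is not necessarily how the function is implemented.

```python
import numpy as np

def drop_consecutive_duplicates_sketch(x):
    """Hypothetical sketch: keep an element only if it differs from its predecessor."""
    x = np.asarray(x)
    if x.size == 0:
        return x
    # The first element is always kept; later ones only when the value changes.
    keep = np.concatenate(([True], x[1:] != x[:-1]))
    return x[keep]

deduped = drop_consecutive_duplicates_sketch(np.array([1, 1, 2, 2, 2, 1, 3, 3]))
```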
split_array(arr, step=1000)
Split a NumPy array into smaller sub-arrays of a specified step size.
-
arr:numpy.ndarray
The input NumPy array to be split.
-
step:int, optional
The size of each sub-array, by default 1000.
-
list
A list of NumPy sub-arrays obtained by splitting the input array.
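Fixed-size chunking of this kind can be sketched with plain slicing. The last chunk is shorter when the array length is not a multiple of `step`. The helper name is hypothetical.

```python
import numpy as np

def split_array_sketch(arr, step=1000):
    """Hypothetical sketch: consecutive chunks of at most `step` elements."""
    return [arr[i:i + step] for i in range(0, len(arr), step)]

chunks = split_array_sketch(np.arange(10), step=4)
```

Note that `numpy.array_split` takes a number of sections rather than a chunk size, so it is not a drop-in replacement for this behavior.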
generate_negatives(anomaly_num, active_source, active_dest, real_edges)
Generate negative edges for self-supervised training.
-
anomaly_num:int
Number of negative edges to generate.
-
active_source:numpy.ndarray
Array of active source nodes.
-
active_dest:numpy.ndarray
Array of active destination nodes.
-
real_edges:numpy.ndarray
Array of real edges in the graph.
-
torch.Tensor
A tensor containing the generated negative edges.
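Negative sampling here means picking (source, destination) pairs among active nodes that do not occur as real edges. The sketch below illustrates the idea with NumPy only and returns an array rather than a `torch.Tensor`. It assumes `real_edges` has shape (n, 2), adds a hypothetical `seed` parameter, and enumerates all candidate pairs, which is only practical for small graphs.

```python
import numpy as np

def generate_negatives_sketch(anomaly_num, active_source, active_dest, real_edges, seed=0):
    """Hypothetical sketch: sample node pairs that are not real edges."""
    rng = np.random.default_rng(seed)
    real = {(int(s), int(d)) for s, d in np.asarray(real_edges)}
    # All pairs of active nodes that are absent from the real edge set.
    candidates = [
        (int(s), int(d))
        for s in np.unique(active_source)
        for d in np.unique(active_dest)
        if (int(s), int(d)) not in real
    ]
    if anomaly_num > len(candidates):
        raise ValueError("not enough non-edges to sample from")
    # Sample without replacement so no negative edge is repeated.
    picked = rng.choice(len(candidates), size=anomaly_num, replace=False)
    return np.array([candidates[i] for i in picked], dtype=np.int64).reshape(-1, 2)

real_edges = np.array([[0, 10], [1, 11]])
neg = generate_negatives_sketch(3, np.array([0, 1, 2]), np.array([10, 11]), real_edges)
```

For large graphs, rejection sampling (draw random pairs, discard those that are real edges) avoids building the full candidate list.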
get_self_supervised_edges(X_to_predict, cuda, ns)
Get self-supervised edges for training.
-
X_to_predict:torch.Tensor
The input adjacency matrix for which self-supervised edges are generated.
-
cuda:bool
Indicates whether to use CUDA (GPU) for tensor operations.
-
ns:int
Number of negative samples to generate for each positive edge.
-
tuple
A tuple containing the generated negative edges tensor and the index tensor.
load_single_file(file, day)
Load and preprocess a single data file.
-
file:str
The path to the data file to load.
-
day:int
The day associated with the loaded data.
-
pd.DataFrame
A DataFrame containing the preprocessed data.
apply_packets_filter(df, min_packets)
Apply a packet count filter to a DataFrame.
-
df:pd.DataFrame
The input DataFrame containing packet data.
-
min_packets:int
The minimum number of packets a source IP must have to be retained.
-
pd.DataFrame
A DataFrame with the packet count filter applied.
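A filter of this kind can be expressed as a grouped count broadcast back to the rows. The sketch assumes one row per packet, a column named `src_ip`, and an inclusive threshold. The helper name is hypothetical.

```python
import pandas as pd

def apply_packets_filter_sketch(df: pd.DataFrame, min_packets: int) -> pd.DataFrame:
    """Hypothetical sketch: keep rows whose source IP sent at least min_packets."""
    # Rows per source IP, repeated on every row of that source.
    counts = df.groupby("src_ip")["src_ip"].transform("size")
    return df[counts >= min_packets]

df = pd.DataFrame({
    "src_ip": ["10.0.0.1", "10.0.0.1", "10.0.0.1", "10.0.0.2"],
    "dst_port": [23, 23, 80, 23],
})
filtered = apply_packets_filter_sketch(df, min_packets=2)
```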
apply_port_filter(df, max_ports)
Apply a port count filter to a DataFrame.
-
df:pd.DataFrame
The input DataFrame containing packet data.
-
max_ports:int
The maximum number of ports to retain in the "dst_port" column.
-
pd.DataFrame
A DataFrame with the port count filter applied.
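One plausible reading of this filter is "keep only the `max_ports` most frequently contacted ports". The sketch below follows that reading and drops rows for all other ports. Whether the actual function drops those rows or maps them to a catch-all value is not stated here, so this is an assumption, as is the helper name.

```python
import pandas as pd

def apply_port_filter_sketch(df: pd.DataFrame, max_ports: int) -> pd.DataFrame:
    """Hypothetical sketch: keep rows for the max_ports most frequent ports."""
    # Rank ports by row count and keep the top max_ports.
    top_ports = df["dst_port"].value_counts().nlargest(max_ports).index
    return df[df["dst_port"].isin(top_ports)]

df = pd.DataFrame({"dst_port": [23, 23, 23, 80, 80, 443]})
filtered = apply_port_filter_sketch(df, max_ports=2)
```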