Skip to content

Commit

Permalink
Vuneravility Analyst
Browse files Browse the repository at this point in the history
In this example, we first load data from the KDD Cup 1999 dataset, which contains labeled network activity logs. Then, we perform the necessary data preprocessing, such as removing non-relevant columns and encoding categorical variables.

After normalizing the data, we build an Autoencoder model using the TensorFlow Keras library. The Autoencoder is trained to reconstruct the input data with the least possible loss. Next, we calculate the reconstruction error on the training data and set a threshold for anomaly detection based on a high percentile of the reconstruction error.

Finally, we use the set threshold to predict anomalies in the test data and evaluate the performance of the model using classification metrics. This advanced approach allows us to detect intrusions in a network environment using unsupervised learning techniques based on Autoencoders.
  • Loading branch information
TrinityBerserker authored Apr 15, 2024
1 parent 4c7eec7 commit d1555be
Showing 1 changed file with 51 additions and 0 deletions.
51 changes: 51 additions & 0 deletions vulnerabilidad5.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Cargar los datos del conjunto de datos KDD Cup 1999
data = pd.read_csv("kddcup.csv")

# Preprocesamiento de datos
# Eliminar columnas no relevantes y codificar variables categóricas

# Separar características y etiquetas
X = data.drop("label", axis=1)
y = data["label"]

# Normalizar los datos
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Dividir los datos en conjuntos de entrenamiento y prueba
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Construir el modelo de Autoencoder
model = Sequential([
Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
Dense(32, activation='relu'),
Dense(64, activation='relu'),
Dense(X_train.shape[1], activation='sigmoid')
])

model.compile(optimizer='adam', loss='mse')

# Entrenar el modelo de Autoencoder
model.fit(X_train, X_train, epochs=10, batch_size=32, validation_data=(X_test, X_test))

# Reconstruir los datos de entrada y calcular el error de reconstrucción
X_train_pred = model.predict(X_train)
train_mse = np.mean(np.power(X_train - X_train_pred, 2), axis=1)

# Establecer un umbral para la detección de anomalías
threshold = np.percentile(train_mse, 95)

# Predecir anomalías en los datos de prueba
X_test_pred = model.predict(X_test)
test_mse = np.mean(np.power(X_test - X_test_pred, 2), axis=1)
y_pred = (test_mse > threshold).astype(int)

# Imprimir métricas de evaluación del modelo
print(classification_report(y_test, y_pred))

0 comments on commit d1555be

Please sign in to comment.