This project implements AI-based targeted password guessing using transformer models (T5, BART, and Llama) to predict passwords based on user information and context. The system can be trained on custom datasets and used to generate password guesses for security research and penetration testing purposes.
Requirements:

- Python 3.8+
- PyTorch
- Transformers
- Datasets
- Other dependencies listed in `requirements.txt`
Installation:

- Clone the repository.
- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```

- If you use gated models such as Llama, set your Hugging Face token in `tunners/constants.py`.
Project structure:

```
├── config.json              # Configuration for training and testing
├── generate_guesses.py      # Main script for password generation
├── tunne_models.py          # Model training script
├── generators/
│   ├── generator.py         # Base generator class
│   └── llama_generator.py   # Llama-specific generator
├── tunners/
│   ├── bart_tunner.py       # BART fine-tuning
│   ├── llama_tunner.py      # Llama fine-tuning
│   ├── t5_tunner.py         # T5 fine-tuning
│   ├── tunner.py            # Base tuner class
│   └── constants.py         # Training constants and hyperparameters
└── dev/                     # Development files
```
The main configuration is defined in `config.json`, with separate sections for training and testing. An entry in the training section:

```json
{
  "id": 0,
  "name": "T5",
  "description": "T5 training",
  "model_name": "google-t5/t5-base",
  "model_path": "t5/model",
  "training_set_path": "datasets/train.jsonl"
}
```

An entry in the testing section:

```json
{
  "id": 0,
  "name": "T5",
  "model_path": "t5/model",
  "test_set_path": "datasets/test.jsonl",
  "guesses": 1000,
  "temperature": 1.0,
  "top_k": 50,
  "top_p": 0.9
}
```

Each line of a dataset file should be a JSON object containing an input-output pair:
```json
{"input": "user@example.com", "output": "password123"}
```

Test data includes additional metadata:

```json
{"input": "user@example.com", "output": "password123", "email": "user@example.com"}
```

The input can also carry richer user information, as long as it fits within the maximum input length (`MAX_MODEL_INPUTS_LENGHT`, 128 by default):

```json
{"input": "Username: user@example.com\nName: John Smith\nNationality: Unknown\nGender: Male\nYear: 1984", "output": "password123"}
```

Train a model using the configuration file:
```bash
python tunne_models.py config.json <model_id>
```

where `<model_id>` selects the model configuration in your JSON file:

- `0` for T5
- `1` for BART
- `2` for Llama
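The README does not show how `config.json` names its two top-level sections or how the ID lookup is implemented. As a hedged sketch, assume (hypothetically) that the file holds `"train"` and `"test"` lists of entries like the examples above, and that `<model_id>` is matched against each entry's `"id"` field. `load_config` and `select_model_config` are illustrative names, not functions from this project.

```python
import json
from typing import Any, Dict


def load_config(path: str) -> Dict[str, Any]:
    """Read config.json (or any JSON file) into a dictionary."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def select_model_config(config: Dict[str, Any], section: str, model_id: int) -> Dict[str, Any]:
    """Return the entry in config[section] whose "id" equals model_id.

    Hypothetical layout: config[section] is a list of entries such as the
    T5 training and testing examples shown earlier.
    """
    for entry in config[section]:
        if entry["id"] == model_id:
            return entry
    raise KeyError(f"no entry with id {model_id} in section {section!r}")
```

Under those assumptions, `select_model_config(load_config("config.json"), "test", 0)` would return the T5 testing entry.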
Use the trained models to generate password guesses:
```bash
python generate_guesses.py config.json <model_id>
```

Training parameters can be modified in `tunners/constants.py`:

- `MAX_MODEL_INPUTS_LENGHT`: maximum input length (default: 128)
- `LEARNING_RATE`: learning rate for training (default: 3e-5)
- `NUM_TRAIN_EPOCH`: number of training epochs (default: 2)
- `PER_DEVICE_TRAIN_BATCH_SIZE`: training batch size per device (default: 16)
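As a rough worked example of what the default epoch and batch-size values mean for run length, the sketch below estimates the number of optimizer steps for a given training-set size. It assumes, for illustration, a Hugging Face `Trainer`-style loop with no gradient accumulation, where the final partial batch of each epoch is kept (the `Trainer` default). `estimate_optimizer_steps` is a hypothetical helper, not part of this project.

```python
import math

# Defaults quoted in this README for tunners/constants.py.
NUM_TRAIN_EPOCH = 2
PER_DEVICE_TRAIN_BATCH_SIZE = 16


def estimate_optimizer_steps(num_examples: int, num_devices: int = 1) -> int:
    """Rough optimizer-step count for a Trainer-style loop.

    Assumes no gradient accumulation and that the last, partial batch of
    each epoch is kept rather than dropped.
    """
    global_batch = PER_DEVICE_TRAIN_BATCH_SIZE * num_devices
    batches_per_epoch = math.ceil(num_examples / global_batch)
    return batches_per_epoch * NUM_TRAIN_EPOCH
```

With 100,000 training pairs on one device, that is 6,250 batches per epoch and 12,500 steps over the default 2 epochs.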
Configure generation behavior in the test configuration:
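- `guesses`: number of password guesses to generate
- `temperature`: controls randomness (higher values produce more random guesses)
- `top_k`: restricts sampling to the k most likely tokens at each step
- `top_p`: nucleus sampling threshold; sampling is restricted to the smallest set of most likely tokens whose cumulative probability reaches `top_p`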
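Assuming guesses are produced by sampling, the three parameters act together at every decoding step. The plain-Python sketch below illustrates this over a toy token-to-logit dictionary, applying the filters in the order commonly used by Hugging Face `transformers` sampling (temperature, then top-k, then top-p). It is not the code this project or the library runs, and `sample_next_token` is a hypothetical name.

```python
import math
import random
from typing import Dict, Optional


def sample_next_token(
    logits: Dict[str, float],
    temperature: float = 1.0,
    top_k: int = 50,
    top_p: float = 0.9,
    rng: Optional[random.Random] = None,
) -> str:
    """Draw one token from toy logits using temperature, top-k and top-p.

    Illustrative sketch only: real models produce logits over a whole
    vocabulary tensor, and libraries implement these filters differently.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()

    # Temperature: divide logits before the softmax; values above 1.0
    # flatten the distribution, values below 1.0 sharpen it.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}

    # Top-k: keep only the k highest-scoring tokens (0 disables the filter).
    ranked = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]

    # Softmax over the survivors (subtracting the max for numerical stability).
    max_logit = ranked[0][1]
    weights = [math.exp(logit - max_logit) for _, logit in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p: keep the smallest prefix of the ranked tokens whose
    # cumulative probability reaches top_p.
    kept_tokens, kept_probs, cumulative = [], [], 0.0
    for (tok, _), p in zip(ranked, probs):
        kept_tokens.append(tok)
        kept_probs.append(p)
        cumulative += p
        if cumulative >= top_p:
            break

    # random.choices treats weights as relative, so no renormalization is needed.
    return rng.choices(kept_tokens, weights=kept_probs, k=1)[0]
```

Setting `top_k=1`, or a very small `top_p`, reduces this to greedy selection of the single most likely token. Raising `temperature` spreads probability toward less likely tokens before either filter is applied.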
The system provides built-in evaluation metrics including:
- Password match accuracy
- Levenshtein distance analysis
- Generation time statistics
- Success rate at different guess counts
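The project's own evaluation code is not shown here. The following is a minimal sketch of two of the listed metrics under an assumed data layout: one ranked list of guesses per test user, paired with that user's true password. `levenshtein` and `success_rate_at` are illustrative names.

```python
from typing import Dict, List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn a into b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def success_rate_at(
    guesses_per_user: Sequence[List[str]],
    targets: Sequence[str],
    cutoffs: Sequence[int],
) -> Dict[int, float]:
    """Fraction of users whose true password is among their first k guesses."""
    if not targets:
        raise ValueError("targets must not be empty")
    rates = {}
    for k in cutoffs:
        hits = sum(target in guesses[:k] for guesses, target in zip(guesses_per_user, targets))
        rates[k] = hits / len(targets)
    return rates
```

For example, `levenshtein("password123", "password1234")` is 1, and a user whose password is the second guess counts as a hit at cutoff 2 but not at cutoff 1.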
This tool is designed for:
- Security research
- Penetration testing with proper authorization
- Educational purposes
Please use responsibly and only on systems you own or have explicit permission to test.
This project is for educational and research purposes. Please ensure compliance with your local laws and regulations when using this tool.