| 1 | | -# mlbridge-machine_learning |
| 1 | +# DNS Alert Model |
| 2 | + |
| 3 | +This directory contains the code for training and evaluating a binary |
| 4 | +classifier that raises an alert when a person queries a malicious domain. |
| 5 | + |
| 6 | +The `notebooks` directory contains the Jupyter Notebook that walks through the |
| 7 | +training procedure. The `saved_models` directory contains the model that |
| 8 | +achieved the highest validation accuracy during training. |
| 9 | + |
| 10 | +## Training |
| 11 | + |
| 12 | +The deep-learning model is trained on a COVID-19 Cyber Threat Coalition |
| 13 | +Blacklist for malicious domains that can be found |
| 14 | +[here](https://blacklist.cyberthreatcoalition.org/vetted/domain.txt) and on a |
| 15 | +list of benign domains from DomCop that can be found |
| 16 | +[here](https://www.domcop.com/top-10-million-domains). |
| 17 | + |
| 18 | +Currently, the pre-trained model has been trained on the top 500 domain names |
| 19 | +from both of these datasets. The final version of the pre-trained model will be |
| 20 | +trained on both datasets in their entirety. |
| 21 | + |
| 22 | +The dataset was created by combining the malicious and benign domains. It was |
| 23 | +split as follows: |
| 24 | +- Train Set: 80% of the dataset |
| 25 | +- Validation Set: 10% of the dataset |
| 26 | +- Test Set: 10% of the dataset |
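
The combine-and-split step described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names `build_dataset` and `split_dataset`, the label convention (1 for malicious, 0 for benign), the fixed seed, the placeholder domain names, and the reading of "top 500" as 500 names per list are all assumptions made for the example.

```python
import random


def build_dataset(malicious, benign):
    """Combine both lists into (domain, label) pairs.

    Assumed label convention: 1 = malicious, 0 = benign.
    """
    return [(d, 1) for d in malicious] + [(d, 0) for d in benign]


def split_dataset(samples, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle the samples and split them into train, validation and test sets.

    Whatever remains after the train and validation slices becomes the test set.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Placeholder domain names standing in for the real lists (hypothetical data).
malicious = [f"bad{i}.example" for i in range(500)]
benign = [f"good{i}.example" for i in range(500)]

train, val, test = split_dataset(build_dataset(malicious, benign))
print(len(train), len(val), len(test))
```

Shuffling before slicing matters here because the combined list starts with all malicious domains followed by all benign ones; without it, the validation and test sets would contain only benign samples.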
| 27 | + |
| 28 | +## Accuracy |
| 29 | + |
| 30 | +The accuracy for the Train Set, Validation Set and Test Set is as follows: |
| 31 | + |
| 32 | +| Metric | Train Set | Validation Set | Test Set | |
| 33 | +|----------|-------------|----------------|----------| |
| 34 | +| Accuracy | 99.25 % | 98.00 % | 98.00 % | |
| 35 | + |
| 36 | +The training graphs, confusion matrices and other metrics can be found in the |
| 37 | +`training_dns_alert_model.ipynb` notebook in the `notebooks` directory. |