Third-party repositories that offer game downloads are notorious for hosting trojanized applications. Once downloaded and active on a mobile device, trojans pose a serious threat to corporate and government networks. Because malware can obfuscate its network traffic and subvert host-based defenses (e.g., anti-virus, mobile device managers), traditional network and host intrusion detection systems become less effective at providing situational awareness into wireless networks. As an alternative, an application-aware, network-based malware monitoring method has been developed and has shown promise. Your task is to evaluate how accurately this approach provides situational awareness into wireless networks by modifying a framework developed by a project team from a past year to test hundreds of legitimate and trojanized applications on a physical device.
The framework uses the Android Application Exerciser Monkey along with a smartphone and a network monitor (i.e., a PC) to create ICMP ping profiles for one hundred pieces of malware and one hundred pieces of legitimate software. You will need to modify this framework to extract layer 1, layer 2 (i.e., 802.11), and layer 3 parameters and to create datasets with meaningful features that can be used by machine learning to demonstrate the ability to discern malware operation from legitimate operation.
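As a rough illustration of what "meaningful features" could look like for the ICMP part of a profile, the sketch below summarizes one ping profile as a small feature dictionary. The function name `ping_profile_features` and the chosen statistics are assumptions made for illustration, not part of the existing framework, and layer 1 and layer 2 parameters are not covered here.

```python
import statistics


def ping_profile_features(rtts_ms, sent):
    """Summarize one ping profile as a feature dict (hypothetical helper).

    rtts_ms: round-trip times in milliseconds, one per echo reply received.
    sent:    number of echo requests issued; missing replies count as loss.
    """
    if sent <= 0:
        raise ValueError("sent must be positive")
    received = len(rtts_ms)
    features = {"loss_rate": 1.0 - received / sent}
    if received == 0:
        # No replies at all: the delay statistics are undefined.
        features.update(mean_ms=None, std_ms=None, min_ms=None,
                        max_ms=None, median_ms=None)
        return features
    features.update(
        mean_ms=statistics.fmean(rtts_ms),
        std_ms=statistics.pstdev(rtts_ms),  # population standard deviation
        min_ms=min(rtts_ms),
        max_ms=max(rtts_ms),
        median_ms=statistics.median(rtts_ms),
    )
    return features
```

One such dictionary per test run, labeled as malware or legitimate, would form one row of the dataset.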
The overall goal is to detect Android malware automatically using machine learning. The project can be divided into three parts: sample collection, traffic testing, and model training.
We plan to collect 100 Android malware samples and 100 legitimate apps as the target dataset. To make the dataset robust, we will collect apps of different types, including games, social media, and others.
To keep the test environment clean, we will install and test the samples one at a time. More specifically, we will install each app with adb and extract the package list, then run and test each app with random events generated by the Monkey tool. After installing a malware sample on the Android device, we will use a separate laptop to ping the device and monitor all traffic between the device and the laptop. More specifically, we will ping the device and wait for the reply; the delay between request and reply reflects the current traffic condition. To reduce random error, we will repeat the individual test 100 times over a 5-minute period for each sample. After that, we will use Wireshark to analyze the captured packets and extract useful information.
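The per-app cycle described above could be sketched roughly as follows. The helper names `app_test_commands` and `parse_ping_rtts`, the default event count, and the seed are hypothetical choices for illustration. The adb and monkey invocations use only the standard `install`, `pm list packages`, `monkey -p/-s/-v`, and `uninstall` forms, and the parser assumes the usual `time=... ms` field printed by ping for each reply.

```python
import re

# Matches the per-reply delay field, e.g. "time=12.3 ms" or "time<1ms".
_RTT_PATTERN = re.compile(r"time[=<]([\d.]+)\s*ms")


def app_test_commands(apk_path, package, events=500, seed=1):
    """Build adb command lines for one install / exercise / uninstall cycle.

    The commands are returned as argument lists (e.g. for subprocess.run)
    rather than executed, so the sequence can be inspected first.
    """
    return [
        ["adb", "install", apk_path],
        ["adb", "shell", "pm", "list", "packages"],
        # Monkey sends `events` pseudo-random events to the given package.
        ["adb", "shell", "monkey", "-p", package,
         "-s", str(seed), "-v", str(events)],
        ["adb", "uninstall", package],
    ]


def parse_ping_rtts(ping_output):
    """Return the round-trip times (ms) found in captured ping output."""
    return [float(m.group(1)) for m in _RTT_PATTERN.finditer(ping_output)]
```

Fixing the Monkey seed makes the event stream repeatable across runs, which may help when comparing repeated measurements of the same sample.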
First we will try the random forest algorithm, then we will try other models to find the one with the best performance. Since we have a relatively large dataset, there is no need to use 10-fold cross-validation. We plan to use 70% of the data as the training set, 10% as the validation set, and 20% as the test set.
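A minimal sketch of the 70/10/20 split, using only the Python standard library, is shown below. The function name `split_70_10_20` is hypothetical, and splitting separately within each class (so that malware and legitimate samples keep the same ratio in every partition) is an assumption on our part rather than a stated requirement.

```python
import random


def split_70_10_20(samples, labels, seed=0):
    """Shuffle within each class, then split 70% / 10% / 20%.

    Returns three lists of (sample, label) pairs: train, validation, test.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)

    train, val, test = [], [], []
    for label, group in by_label.items():
        rng.shuffle(group)
        # Integer arithmetic avoids floating-point rounding surprises.
        n_train = len(group) * 70 // 100
        n_val = len(group) * 10 // 100
        train += [(s, label) for s in group[:n_train]]
        val += [(s, label) for s in group[n_train:n_train + n_val]]
        test += [(s, label) for s in group[n_train + n_val:]]
    return train, val, test
```

The resulting partitions could then be passed to a random forest implementation such as scikit-learn's `RandomForestClassifier`. If repeated measurements of the same app are kept as separate rows, splitting by app rather than by row would keep one app's measurements from appearing in both the training and test sets.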