
Diploma thesis:Malware Detection
Author:Pluskal Ondřej
Supervisor:Ing. Jan Šedivý CSc.
Keywords:
Abstract:The goal of this thesis was to create new weights for an existing behavioral malware detector. The antivirus company AVG provided us with their databases, consisting of nearly ten million samples. We extracted the databases into a dataset suitable for supervised machine learning. We analyzed the current classifier and came to the conclusion that it is a binary classifier using a linear function. We introduced a novel metric designed to capture the specifics of the given task. The algorithm of choice is a soft-margin SVM, and as the solver we used the LIBOCAS library. Because of the large size of the dataset, we developed a new approach that stores binary feature vectors as bit arrays. By enhancing the LIBOCAS library to handle bit arrays, we managed not only to maintain but to improve the computation time compared to the double representation. Moreover, thanks to the compact representation of the feature vectors, we reduce RAM usage by up to a factor of 64. Using a sophisticated method of optimizing the regularization constant, in which different feature vectors are assigned different settings, we managed to significantly improve the performance of the malware classifier. In the future we would like to incorporate sequential analysis into the classification process and create a new validation system using cloud computing.
Submitted:May 2013
More info:
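
The bit-array idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's LIBOCAS modification: the function names `pack_bits` and `linear_score`, the 64-bit word layout, and the sample weights and bias are assumptions made for illustration only. The sketch packs binary features one bit each into 64-bit words (a double would use 64 bits per feature, which is where a factor of up to 64 can come from) and evaluates a linear classifier by summing the weights at the set bits.

```python
from array import array

WORD_BITS = 64  # assumed word size; one feature per bit


def pack_bits(features):
    """Pack a sequence of 0/1 features into unsigned 64-bit words.

    Bit i of word j holds feature number 64 * j + i.
    """
    n_words = (len(features) + WORD_BITS - 1) // WORD_BITS
    words = array("Q", [0] * n_words)
    for idx, value in enumerate(features):
        if value:
            words[idx // WORD_BITS] |= 1 << (idx % WORD_BITS)
    return words


def linear_score(words, weights, bias=0.0):
    """Return bias + <weights, x> for a bit-packed binary vector x.

    Only set bits contribute, so the loop visits each active feature once.
    """
    score = bias
    for j, word in enumerate(words):
        base = j * WORD_BITS
        while word:
            low = word & -word                      # isolate lowest set bit
            score += weights[base + low.bit_length() - 1]
            word ^= low                             # clear that bit
    return score


# Hypothetical example: 65 binary features, so two 64-bit words are needed.
features = [1, 0, 1, 1] + [0] * 60 + [1]
weights = [0.5, -1.0, 2.0, -0.25] + [0.0] * 60 + [1.5]

packed = pack_bits(features)
score = linear_score(packed, weights, bias=-1.0)

# Reference computation on the unpacked (dense) representation.
dense_score = -1.0 + sum(w * x for w, x in zip(weights, features))
```

Here `score` and `dense_score` agree, while the packed vector occupies two 8-byte words instead of 65 doubles. A positive score could be read as the "malware" side of the linear decision function, though the actual threshold and labeling used by the detector are not stated in the abstract.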