Abstract
Metamorphic malware variants that share the same malicious behavior (i.e., belong to the same family) can obfuscate themselves so that they look different from one another. This structural variation forces traditional signature-matching techniques to maintain a huge signature database in order to detect them. To detect malware effectively and efficiently among large numbers of executables, we need to partition the files into groups that identify their respective families. In addition, the grouping criterion should be chosen so that it can also be applied to classify unknown files encountered on computers. This paper presents a study of malware and benign executables in groups, with the aim of detecting unknown malware with high accuracy. We studied the sizes of malware generated by three popular second-generation (metamorphic) malware creation kits, namely G2, PS-MPC and NGVCK, and observed that the sizes of any two malware samples generated by the same kit differ only slightly. Hence, we grouped the executables by size using the Optimal k-Means Clustering algorithm and used the resulting groups to select promising features for training classifiers (Random Forest, J48, LMT, FT and NBT) to detect malware variants and unknown malware. We find that detecting malware on the basis of their respective file sizes yields an accuracy of up to 99.11% from the classifiers.
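The size-based grouping step described above can be illustrated with a minimal sketch. It uses plain Lloyd's k-means on scalar file sizes, which is a stand-in and not necessarily the Optimal k-Means variant used in the paper. The function names `kmeans_1d` and `assign_group`, the choice of k = 3, and the sample sizes are all hypothetical and chosen for illustration only. Feature selection and per-group classifier training are not shown.

```python
from typing import List, Tuple


def kmeans_1d(sizes: List[float], k: int, iters: int = 100) -> Tuple[List[float], List[int]]:
    """Plain Lloyd's k-means on scalar values (file sizes in bytes)."""
    if not 1 <= k <= len(sizes):
        raise ValueError("k must be between 1 and len(sizes)")
    ordered = sorted(sizes)
    # Deterministic initialisation: spread the initial centroids over the sorted sizes.
    if k == 1:
        centroids = [float(ordered[len(ordered) // 2])]
    else:
        centroids = [float(ordered[i * (len(ordered) - 1) // (k - 1)]) for i in range(k)]
    labels = [0] * len(sizes)
    for _ in range(iters):
        # Assignment step: each size goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(s - centroids[c])) for s in sizes]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [s for s, lab in zip(sizes, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


def assign_group(size: float, centroids: List[float]) -> int:
    """Route an unseen file to the size group with the nearest centroid."""
    return min(range(len(centroids)), key=lambda c: abs(size - centroids[c]))


# Hypothetical file sizes in bytes, invented for illustration (not data from the paper).
sizes = [410, 425, 433, 1200, 1230, 1255, 9800, 9900, 10050]
centroids, labels = kmeans_1d(sizes, k=3)
print(labels)
print(assign_group(1210, centroids))
```

In a pipeline of the kind the abstract outlines, an unknown executable would first be routed to a size group with something like `assign_group`, and then scored by a classifier trained on that group's selected features.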
URL
http://arxiv.org/abs/1606.06908