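Domestic Classification Index: TP393.0 University Code: 10213 International Classification Index (U.D.C.): 621.3 Confidentiality: Public
Master's Thesis in Engineering
Research on Statistics-Based Network Traffic Anomaly Detection Techniques
Master's Candidate: Guoxiang Cao
Supervisor: Associate Prof. Yuxin Ding
Degree Applied For: Master of Engineering
Discipline and Specialty: Computer Science and Technology
Affiliation: Shenzhen Graduate School
Date of Defence: December 2011
Degree-Conferring Institution: Harbin Institute of Technology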
Classified Index: TP393.0
U.D.C: 621.3
Thesis for the Master Degree in Engineering
NETWORK TRAFFIC ANOMALY DETECTION TECHNIQUES BASED ON STATISTICS
Candidate: Guoxiang Cao
Supervisor: Associate Prof. Yuxin Ding
Academic Degree Applied for: Master of Engineering
Specialty: Computer Science & Technology
Affiliation: Shenzhen Graduate School
Date of Defence: December 2011
Degree-Conferring Institution: Harbin Institute of Technology
Abstract (Chinese)
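A network traffic anomaly is a deviation of traffic from its normal model. Anomalies have many causes, such as malicious attacks, network equipment failures, and legitimate but bursty user behavior. The goal of network anomaly detection is to detect anomalies promptly so that network administrators can take appropriate measures to keep the network running normally and preserve quality of service. Current detection methods fall mainly into three groups: statistics-based methods, data-streaming-based methods, and machine-learning-based methods. Because statistics-based methods have a number of advantages, this thesis focuses on them. Specifically, it studies three statistical methods: the ASTUTE (A Short-Timescale Uncorrelated-Traffic Equilibrium) method, which is based on flow independence and stationarity; the Kalman filter method; and the EWMA (Exponentially Weighted Moving Average) method, which is based on time-series forecasting.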
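From another perspective, network anomaly detection methods can also be divided into two classes: methods that detect changes in traffic volume and methods that detect changes in traffic feature distributions. The latter have more advantages, and the change in a feature distribution is usually measured with information entropy. This thesis implements both kinds of method with the EWMA algorithm.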
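To make the entropy measure concrete, the following is a minimal Python sketch, not the thesis's implementation. It assumes the traffic feature is the destination port and that packets have already been grouped into time bins; the function name `feature_entropy` and the sample port lists are hypothetical.

```python
import math
from collections import Counter


def feature_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = -sum(p * log2(p)) over the observed feature values
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical destination ports observed in two time bins.
normal_bin = [80, 443, 53, 80, 22, 443, 8080, 53]
scan_bin = list(range(1000, 1064))  # a port scan touches many distinct ports

print("normal bin entropy:", feature_entropy(normal_bin))
print("scan bin entropy:  ", feature_entropy(scan_bin))
```

A port scan spreads traffic over many distinct ports, so the entropy of the destination-port distribution rises even when the byte count barely changes, which is the kind of change a purely volume-based detector can miss.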
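A survey of the field reveals two problems in network traffic anomaly detection. First, network traffic and anomalous events are highly varied, and no single general-purpose detector suits every scenario, so it is necessary to establish which detector suits which scenario. Second, detection thresholds are currently set from experience or theory and are mostly fixed; adjusting the threshold adaptively remains a hard problem.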
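This thesis studies and implements the ASTUTE, Kalman, and EWMA (volume-based and feature-based) algorithms and runs a large number of experiments on two data sources: the labeled MAWI Lab dataset and a dataset captured on a local host with Wireshark. It analyzes the experimental results carefully and performs a manual root-cause analysis of the anomalous events the algorithms detect. The results show that the three methods detect different kinds of anomalies: ASTUTE is good at detecting anomalies that produce many small IP flows, whereas Kalman and EWMA are good at detecting anomalies that produce a few large IP flows. Based on this analysis, the thesis optimizes the ASTUTE assessment value using the idea of a deviation score, removes noisy data from the sliding window of the EWMA algorithm, and smooths the ASTUTE assessment value with the EWMA formula. The experimental results show that the resulting algorithms are effective.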
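For illustration, the ASTUTE assessment value is commonly written as K' = (mean(δ) / std(δ)) · sqrt(F), where δ_f is the change in volume of flow f between two consecutive time bins and F is the number of flows; an alarm is raised when |K'| exceeds a threshold K. The Python sketch below computes only this basic form. It does not reproduce the deviation-score optimization or the EWMA smoothing mentioned in the abstract, whose details are not given here. The function name, the flow keys, the sample volumes, and the threshold value are all hypothetical.

```python
import math
from statistics import mean, stdev

K_THRESHOLD = 3.0  # hypothetical alarm threshold


def astute_assessment_value(prev_bin, next_bin):
    """Basic ASTUTE assessment value for two consecutive bins.

    Each bin maps a flow key to its volume; a flow missing from a bin
    is treated as having volume 0 in that bin.
    """
    flows = set(prev_bin) | set(next_bin)
    deltas = [next_bin.get(f, 0) - prev_bin.get(f, 0) for f in flows]
    if len(deltas) < 2:
        return 0.0
    mu, sigma = mean(deltas), stdev(deltas)
    if sigma == 0:
        # All flows changed by exactly the same amount.
        return 0.0 if mu == 0 else math.copysign(math.inf, mu)
    return mu / sigma * math.sqrt(len(deltas))


prev_bin = {"f1": 10, "f2": 12, "f3": 9, "f4": 11}
quiet_bin = {"f1": 11, "f2": 10, "f3": 10, "f4": 11}     # uncorrelated changes
busy_bin = {"f1": 13, "f2": 14, "f3": 13, "f4": 13}      # many small flows grow together
big_flow_bin = {"f1": 1010, "f2": 12, "f3": 9, "f4": 11}  # one large flow jumps

for name, nxt in [("quiet", quiet_bin), ("busy", busy_bin), ("big flow", big_flow_bin)]:
    k = astute_assessment_value(prev_bin, nxt)
    print(name, "alarm" if abs(k) > K_THRESHOLD else "ok")
```

In this toy data the correlated growth of many small flows crosses the threshold, while a jump in a single large flow does not, which matches the abstract's observation that ASTUTE and the volume-based detectors catch different kinds of anomalies.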
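Keywords: network traffic anomaly detection; short-timescale uncorrelated-traffic equilibrium model; Kalman filter; exponentially weighted moving average; deviation score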
Abstract
A network traffic anomaly is a deviation of traffic from a normal model. Anomalies have many causes, for example malicious attacks, network equipment outages, and abrupt but normal user behavior. The objective of anomaly detection is to detect an anomaly as soon as it happens so that network administrators can act on it to keep the network in working order and preserve quality of service. There are currently three main kinds of anomaly detection methods: methods based on statistics, methods based on data streaming, and methods based on machine learning. Because of the advantages of statistics-based methods, this thesis analyzes three of them in depth: the ASTUTE (A Short-Timescale Uncorrelated-Traffic Equilibrium) method, based on flow independence and stationarity; the Kalman filter method; and the EWMA (Exponentially Weighted Moving Average) method, based on time series analysis.
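As a rough illustration of Kalman-filter-based detection, the Python sketch below tracks a single traffic-volume series and flags bins whose normalized innovation (prediction error divided by its predicted standard deviation) is large. It is a sketch under stated assumptions, not the thesis's detector: the scalar random-walk model, the noise variances `q` and `r`, the threshold, and the sample series are all hypothetical. Flagged bins are treated as missing measurements so that one spike does not contaminate later predictions, which is a design choice of this sketch.

```python
import math


def kalman_detect(measurements, q=1.0, r=25.0, threshold=3.0):
    """Return indices of bins whose normalized innovation exceeds `threshold`.

    Assumed scalar random-walk model:
        state:       x_t = x_{t-1} + w_t,  Var(w_t) = q
        measurement: z_t = x_t + v_t,      Var(v_t) = r
    """
    x, p = float(measurements[0]), r  # initial estimate and its variance
    alarms = []
    for i in range(1, len(measurements)):
        z = measurements[i]
        p_pred = p + q            # predict: the state estimate itself is unchanged
        innovation = z - x        # measurement minus prediction
        s = p_pred + r            # predicted variance of the innovation
        if abs(innovation) / math.sqrt(s) > threshold:
            alarms.append(i)
            p = p_pred            # skip the update, as for a missing measurement
            continue
        gain = p_pred / s         # Kalman gain
        x = x + gain * innovation
        p = (1.0 - gain) * p_pred
    return alarms


# Hypothetical bytes-per-bin series with one spike.
bytes_per_bin = [100, 104, 98, 101, 97, 103, 99, 102, 400, 101, 98]
print(kalman_detect(bytes_per_bin))
```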
From another perspective, network anomaly detection methods can be divided into two types: volume-based methods and traffic-feature-based methods. Feature-based methods have more advantages than volume-based ones; they use entropy to measure changes in the traffic feature distribution. This thesis implements both kinds of method using the EWMA algorithm.
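The EWMA idea can be sketched in a few lines of Python. The forecast follows the standard recursion s_t = α·x_t + (1 − α)·s_{t−1}, and a bin is flagged when the observation deviates from the forecast by more than k times a smoothed absolute error. This is a hedged sketch rather than the thesis's implementation: the values of `alpha`, `k`, and `warmup`, the fixed threshold rule, and the sample series are assumptions. The same function can be applied to a volume series or to a per-bin entropy series.

```python
def ewma_detect(series, alpha=0.3, k=3.0, warmup=5):
    """Return indices whose deviation from the EWMA forecast exceeds k * smoothed error."""
    alarms = []
    forecast = float(series[0])  # EWMA of past observations
    dev = 0.0                    # EWMA of past absolute forecast errors
    for i in range(1, len(series)):
        x = series[i]
        err = abs(x - forecast)
        if i >= warmup and err > k * dev:
            # Flag the bin and keep it out of the baseline.
            alarms.append(i)
            continue
        dev = alpha * err + (1 - alpha) * dev
        forecast = alpha * x + (1 - alpha) * forecast
    return alarms


# Hypothetical bytes-per-bin series with one spike.
bytes_per_bin = [100, 104, 98, 101, 97, 103, 99, 102, 400, 101, 98]
print(ewma_detect(bytes_per_bin))
```

Keeping flagged bins out of the running averages stops a single spike from inflating the baseline, at the cost of adapting more slowly if the traffic level genuinely shifts.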
There are two problems in the field of network anomaly detection. First, because traffic features and anomaly types vary so widely, no general detection method applies in all situations, so we need to know which detection method suits which situation. Second, the detection threshold is set either by experience or as a fixed value derived from theory. In this thesis we study how to adjust the threshold adaptively.
Our main contributions are as follows. The thesis implements three kinds of methods: ASTUTE, Kalman, and EWMA (volume-based and entropy-based). We run a large number of experiments on two datasets: a labeled dataset from MAWI Lab and a dataset captured on a local host with Wireshark. We analyze the experimental results in detail and perform root cause analysis of the anomalies. Experiments show that the three methods can detect different kinds