Gordon Cormack, School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
Thomas Lynam, School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
TREC's Spam Filtering Track (Cormack & Lynam, 2005) introduces a standard testing framework that is designed to model a spam filter's usage as closely as possible, to measure quantities that reflect the filter's effectiveness for its intended purpose, and to yield controlled and statistically valid results. The TREC Spam Filter Evaluation Toolkit is free software that, given a corpus and a filter, automatically runs the filter on each message in the corpus, compares the result to the gold standard for the corpus, and reports effectiveness measures with 95% confidence limits. The corpus consists of a chronological sequence of email messages and a gold standard judgement for each message. We are concerned here with the creation of appropriate corpora for use with the toolkit.
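The evaluation loop described above can be sketched in a few lines. This is an illustration only, not the toolkit's code: the `evaluate` and `wilson_limits` names, the in-memory corpus of `(message, gold_label)` pairs, and the single-argument `classify` callable are all hypothetical, and the Wilson score interval is an assumed stand-in for whatever method the toolkit uses to compute its 95% confidence limits.

```python
import math
from typing import Callable, Dict, Iterable, Tuple


def wilson_limits(errors: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% confidence limits for the proportion errors/n.

    Assumption: Wilson score interval; the toolkit's own method may differ.
    """
    if n == 0:
        return (0.0, 1.0)  # no data: the proportion is unconstrained
    p = errors / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def evaluate(corpus: Iterable[Tuple[str, str]],
             classify: Callable[[str], str]) -> Dict[str, dict]:
    """Run `classify` over (message, gold_label) pairs in corpus order
    and count disagreements with the gold standard, per gold label."""
    totals = {"ham": 0, "spam": 0}
    errors = {"ham": 0, "spam": 0}
    for message, gold in corpus:
        totals[gold] += 1
        if classify(message) != gold:
            errors[gold] += 1
    return {
        label: {
            "messages": totals[label],
            "misclassified": errors[label],
            "limits_95": wilson_limits(errors[label], totals[label]),
        }
        for label in ("ham", "spam")
    }


# Toy, made-up corpus in chronological order (not real data).
toy_corpus = [
    ("buy pills now", "spam"),
    ("meeting at noon", "ham"),
    ("cheap pills", "spam"),
    ("pills for grandma, love mom", "ham"),
]
report = evaluate(toy_corpus, lambda m: "spam" if "pills" in m else "ham")
print(report["ham"]["misclassified"], report["spam"]["misclassified"])
```

Every count in the report is relative to the gold standard, so any error in the gold labels is charged to the filter. That dependence is why the quality of the corpus matters as much as the quality of the filter.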
It is a simple matter to capture all the email delivered to a recipient or a set of recipients. Using this captured email in a public corpus, as for the other TREC tasks, is not so simple. Few individuals are willing to publish their email, because doing so would compromise their privacy and the privacy of their correspondents. So we are left with the choice between using an artificial public collection of messages and using a more realistic collection that must be kept private.
Artificial collections (…, 2003; Androutsopoulos et al., 2000; Michelakis et al., 2004) may be created by using mailing list messages as opposed to personal email, by selecting non-sensitive messages from a real email collection, by mixing messages from diverse sources, or by obfuscating genuine messages. All of these approaches conflict with our design criteria (that real filter usage be modelled as closely as possible) and may compromise the very information that filters use to discriminate ham from spam, either by removing pertinent details or by introducing extraneous information that may aid or hinder the filter.
Experience
We have employed this technique on a collection of 49,086 private email messages. G0 was captured from the recipient's feedback to a spam filter. Figure 1 illustrates the five revision steps forming G1 through G5, the final gold standard. S→H is the number of message classifications revised from spam to ham; H→S is the opposite. Note that G0 had 421 spam messages incorrectly classified as ham. Left uncorrected, these errors would cause the evaluation kit to overreport the false positive rate of the filters by this amount: a factor of seventy for the best filters and a factor of 2.4 for the worst. In other words, the results captured from user feedback alone (G0) are not accurate enough to form a useful gold standard.
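[Figure 1: Revisions to the gold standard, G0 through G5, with S→H and H→S counts per revision step. Surviving row labels: G0→G1, G2→G3, G4→G5, G0→G5. Surviving values: 483 (after G0→G1), 1015 (after G2→G3).]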
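The arithmetic behind the inflation factors can be made concrete with a small sketch. The 421 mislabelled messages come from the text; everything else is assumed. The `inflation_factor` helper is hypothetical, the true false positive counts (6 and 300) are illustrative values chosen only so that the ratios land near the quoted factors, and the sketch makes the simplifying assumption that the filter correctly flags every one of the 421 mislabelled spam messages, so each is scored as a false positive.

```python
def inflation_factor(true_false_positives: int,
                     mislabelled_spam_flagged: int) -> float:
    """Ratio of reported to true false positives when spam that the gold
    standard wrongly labels as ham is correctly flagged by the filter."""
    if true_false_positives <= 0:
        raise ValueError("need at least one true false positive")
    reported = true_false_positives + mislabelled_spam_flagged
    return reported / true_false_positives


MISLABELLED_SPAM = 421  # spam labelled as ham in G0 (from the text)

# Hypothetical true false positive counts, not taken from the paper.
for name, true_fp in [("strong filter", 6), ("weak filter", 300)]:
    factor = inflation_factor(true_fp, MISLABELLED_SPAM)
    print(f"{name}: reported false positives inflated {factor:.1f}x")
```

Under these assumptions the strong filter's false positive count is inflated roughly seventy-fold and the weak filter's about 2.4-fold. The better the filter, the more the gold standard's own errors dominate its measured error.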