Article Abstract
A Spam Text Filtering Method Based on the Vector Space Model
Garbage text classification filtering method Based on VSM
  
Chinese keywords (translated): vector space model  spam text  classification  filtering
English keywords: VSM  the garbage text  classification  filtering
Funding: National Natural Science Foundation of China (61305088)
Author affiliation
Wu Wei  School of Software and Service Outsourcing, Suzhou Institute of Industrial Technology, Suzhou, Jiangsu 215104
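Abstract (translated from the Chinese):
      To meet the specific requirements of identifying spam text, this paper applies the idea of VSM-based text clustering, combines it with the characteristics of the existing TFIDF algorithm, and proposes a feature-item extraction algorithm based on VSM and an improved TFIDF. The method amplifies the weights of feature items that cluster strongly in spam text while effectively reducing the influence that the imbalance in sample size between the two classes of data has on the results, which improves the efficiency and accuracy of spam text filtering. It offers a new, improved algorithm option for spam text identification.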
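For reference, the standard building blocks named in the abstract can be written as follows: TF-IDF term weights, a document vector in the vector space model, and cosine similarity against a class vector. Here $\mathrm{tf}_{t,d}$ is the count of term $t$ in document $d$, $N$ the number of training documents, $n_t$ the number of documents containing $t$, $m$ the vocabulary size, and $\vec{c}$ a class vector such as a class centroid. The abstract does not state the improved weighting, so the last line is only a hypothetical example (not the author's formula) of amplifying spam-associated terms using document frequencies normalized by class size, where $N_s, N_h$ are the numbers of spam and normal documents and $n_{t,s}, n_{t,h}$ the numbers of each that contain $t$.

```latex
w_{t,d} = \mathrm{tf}_{t,d}\,\log\frac{N}{n_t}, \qquad
\vec{d} = (w_{1,d}, w_{2,d}, \dots, w_{m,d})

\operatorname{sim}(\vec{d}, \vec{c}) =
  \frac{\sum_{t=1}^{m} w_{t,d}\, w_{t,c}}{\lVert \vec{d} \rVert\, \lVert \vec{c} \rVert}

a_t = \frac{n_{t,s}/N_s}{n_{t,s}/N_s + n_{t,h}/N_h}, \qquad
w'_{t,d} = w_{t,d}\,(1 + a_t)
```

Dividing each document frequency by its class size compares the *share* of spam and normal documents containing a term, so the larger class does not dominate the factor $a_t \in [0, 1]$ simply because it has more samples.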
English abstract:
      A feature-item extraction algorithm based on VSM and an improved TFIDF is proposed. It is designed around the requirements of recognizing and scoring spam text, applies the text-clustering idea of VSM, and builds on the characteristics of the existing TFIDF algorithm. The algorithm not only amplifies the weights of feature items that cluster in spam text but also effectively reduces the effect that the difference in sample size between the two classes of data has on the result, improving the efficiency and accuracy of spam text filtering. It provides a new, improved algorithm option for spam text identification.