Abstract: A feature item extraction algorithm based on the vector space model (VSM) and an improved TFIDF is proposed. It is designed around the requirements of recognizing and scoring spam text, applying VSM-based text clustering and building on an analysis of the characteristics of the existing TFIDF algorithm. The algorithm not only amplifies the weights of feature items in spam text clustering but also effectively reduces the influence that the difference in sample counts between the two classes of data has on the result, improving both the efficiency and the accuracy of spam text filtering. It offers a new, improved algorithm option for spam text identification.