0xDE (11011110) wrote,

Christmas ham

While visiting relatives for Christmas, I heard a pretty damning account from one of my cousins (who works for a company that develops spam filtering software) about the uselessness of recent Ph.D.s in this area. If I understand the issue correctly, there is a big mismatch between the typical machine learning / information retrieval model of the spam filtering problem and the actual behavior of spammers. The model assumes a relatively static corpus of spam and ham messages, from which one must learn to filter the spam with the best possible combination of precision and recall. Real spammers, on the other hand, actively seek out holes in spam filtering software, blast as much spam as possible through any hole they find until it is patched or the system learns to filter it, and then move on to the next hole.

In connection with this, I found a paper by Tom Fawcett that made very similar points nearly a decade ago. But it's easy to find recent and highly cited works that don't take Fawcett's lessons to heart.
Tags: academia