While visiting relatives for Christmas, I heard a pretty damning account from one of my cousins, who works for a company that develops spam filtering software, about the uselessness of recent Ph.D.s in this area. If I understand the issue correctly, there is a big mismatch between typical machine learning / information retrieval models of the spam filtering problem and the actual behavior of spammers. The academic model assumes a relatively static corpus of spam and ham messages, from which one must learn to filter the spam with the best possible combination of precision and recall. Real spammers, by contrast, actively seek out holes in spam filtering software, blast as much spam as possible through any hole they find until it is patched or the system learns to filter it, and then move on to the next hole.
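To make the mismatch concrete, here is a toy sketch (all messages and token rules are invented for illustration, not taken from any real filter): a filter trained once on a static corpus scores well on spam drawn from that same corpus, but an adaptive spammer just probes until a message slips through.

```python
# Toy illustration of the static-corpus assumption vs. an adaptive spammer.
# The corpora and the token-count rule below are made up for this sketch.
from collections import Counter

spam_corpus = ["buy cheap pills now", "cheap pills online", "win money now"]
ham_corpus = ["meeting at noon", "lunch tomorrow", "project status update"]

# "Training": count tokens seen in spam.
spam_tokens = Counter(w for m in spam_corpus for w in m.split())

def is_spam(message, threshold=1):
    # Static rule: flag a message containing enough known spam tokens.
    return sum(spam_tokens[w] > 0 for w in message.split()) >= threshold

# On spam drawn from the same distribution, the filter looks great...
print(is_spam("cheap pills now"))          # flagged

# ...but a spammer who probes the filter just needs one evasive variant
# with no token overlap, then blasts it until the filter is updated.
evasive = "v1agra great pr1ce today"
print(is_spam(evasive))                    # sails through
```

The point is not that real filters are this naive, but that a fixed train/test evaluation never measures the probe-and-exploit loop the paragraph above describes.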
In connection with this, I found a paper by Tom Fawcett that made very similar points nearly a decade ago. But it's easy to find recent and highly-cited works that don't take Fawcett's lessons to heart.