Spam Filters

July 9, 2008

I always wondered how spam filters got so good at catching spam while letting good email through. Paul Graham apparently figured the whole thing out a while ago. If you’re interested, check out his somewhat technical explanation of spam filters.

If not, here’s the gist:

Take 4000 spam emails and 4000 good emails. Count how often each word appears in each category. For instance, “hiking” would probably appear only in the good emails, while a term like “sexy” would appear almost entirely in the spam emails. Then use those counts to calculate the probability that a new email is spam, based on the words it contains. Paul Graham noted that an email containing the words “sex” and “sexy” is 99.97% likely to be spam.
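Here’s a rough Python sketch of that gist. It’s a simplified take, not Graham’s actual code: the tokenizer, doubling the good-email counts, clamping each word to between 0.01 and 0.99, giving unseen words a default of 0.4, and keeping only the 15 most decisive words are all assumptions loosely modeled on his approach. The last step combines the per-word probabilities the way a naive Bayes classifier does. With hypothetical per-word probabilities of 0.97 and 0.99 (made-up values for illustration), that formula gives about 0.9997.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Simplified tokenizer (an assumption): lowercase, then keep runs of
    # letters, digits, apostrophes, dollar signs, and dashes.
    return re.findall(r"[a-z0-9'$-]+", text.lower())

def word_spam_probs(spam, good, good_weight=2, min_count=5):
    """Estimate, for each word, how likely an email containing it is spam."""
    spam_counts = Counter(w for email in spam for w in tokenize(email))
    good_counts = Counter(w for email in good for w in tokenize(email))
    probs = {}
    for word in spam_counts.keys() | good_counts.keys():
        b = spam_counts[word]
        # Weight good-email counts more heavily to cut false positives (assumption).
        g = good_counts[word] * good_weight
        if b + g < min_count:
            continue  # too rare to trust
        spam_freq = min(1.0, b / len(spam))
        good_freq = min(1.0, g / len(good))
        p = spam_freq / (spam_freq + good_freq)
        # Never let a single word be completely certain either way (assumption).
        probs[word] = max(0.01, min(0.99, p))
    return probs

def combine(probs):
    # Naive-Bayes-style combination: prod(p) / (prod(p) + prod(1 - p)).
    spamminess = math.prod(probs)
    hamminess = math.prod(1 - p for p in probs)
    return spamminess / (spamminess + hamminess)

def spam_probability(text, probs, unknown=0.4, n_interesting=15):
    # Look up each distinct word; unseen words get a slightly innocent default.
    ps = [probs.get(w, unknown) for w in set(tokenize(text))]
    # Keep the words whose probabilities sit farthest from a neutral 0.5.
    ps.sort(key=lambda p: abs(p - 0.5), reverse=True)
    return combine(ps[:n_interesting])

# Tiny made-up corpus, just to exercise the functions.
spam = ["sexy singles want to meet you", "hot sexy deals click here", "sexy sexy free money"]
good = ["going hiking this weekend", "hiking trip photos attached", "lunch tomorrow then hiking"]
probs = word_spam_probs(spam, good, min_count=1)  # corpus is tiny, so no rarity cutoff
print(spam_probability("sexy free money", probs))      # close to 1
print(spam_probability("hiking this weekend", probs))  # close to 0
print(round(combine([0.97, 0.99]), 4))                 # 0.9997
```

A real filter would train on thousands of emails per category (like the 4000 above) and keep the rarity cutoff, since a word seen once or twice says very little.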

One Response to “Spam Filters”

  1. Karl Fisher Says:

    so will a sexy spam filter catch this sexy comment?

