Spam Filters

July 9, 2008

I always wondered how spam filters got so good at catching spam while letting good emails through. Paul Graham apparently figured the whole thing out a while ago. If you’re interested, check out his somewhat technical explanation of spam filters.

If not, here’s the gist:

Take 4000 spam emails and 4000 good emails. Figure out how often each word appears in each category. For instance, “hiking” would probably appear only in the good emails, while terms like “sexy” would appear almost entirely in the spam emails. Then, using those per-word frequencies, calculate the chances that a new email is spam from the words it contains. Paul Graham noted that an email with the words “sex” and “sexy” is 99.97% likely to be spam.
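For the curious, here is a rough Python sketch of that idea. It is only my own illustration, not Paul Graham's actual code: the function names, the whitespace tokenizing, and the clamping of probabilities to the 0.01 to 0.99 range are all assumptions, and the 0.97 and 0.99 word scores are illustrative values picked because they land on roughly the 99.97% figure when combined.

```python
from collections import Counter
from math import prod


def word_spam_probabilities(spam_emails, good_emails):
    """Estimate, for each word, how strongly it points to spam.

    Hypothetical helper: counts the share of emails in each pile that
    contain the word, then compares the two shares.
    """
    spam_counts = Counter(w for e in spam_emails for w in set(e.lower().split()))
    good_counts = Counter(w for e in good_emails for w in set(e.lower().split()))

    probs = {}
    for word in set(spam_counts) | set(good_counts):
        spam_rate = spam_counts[word] / len(spam_emails)
        good_rate = good_counts[word] / len(good_emails)
        p = spam_rate / (spam_rate + good_rate)
        # Assumption: clamp so no single word is ever treated as absolute proof.
        probs[word] = min(0.99, max(0.01, p))
    return probs


def combine(word_probs):
    """Combine per-word spam probabilities, treating words as independent."""
    spam = prod(word_probs)
    good = prod(1 - p for p in word_probs)
    return spam / (spam + good)


if __name__ == "__main__":
    probs = word_spam_probabilities(
        ["sexy offer", "sexy deal"],      # tiny stand-in for the spam pile
        ["hiking trip", "hiking deal"],   # tiny stand-in for the good pile
    )
    print(probs["sexy"], probs["hiking"], probs["deal"])

    # Illustrative scores for "sex" and "sexy": prints roughly 99.97%
    print(f"{combine([0.97, 0.99]):.2%}")
```

The interesting part is combine: two words that are each only "pretty spammy" on their own add up to near certainty together, which is why the filter can be so confident about a whole email.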


One Response to “Spam Filters”

  1. Karl Fisher Says:

    so will a sexy spam filter catch this sexy comment?
