OK, What Now?
As I noted in the post update, I thought that I'd come up with a clever way to eliminate a lot of these nonsense-domain spams, by blocking the letter "q" followed by anything other than a "u," "a," or a period, comma or space (so we could still write "Iraq"). Here's how I implemented it: q[ua\ \.\,]
Unfortunately, in testing it, I get a large number of false positives (about 25%). When I look at the comments flagged by the new filter, I don't even see a "q" in the comment. What's the problem?
Posted by Rand Simberg at April 01, 2006 05:27 PM
Comments
I can't help with the programming, but if you get it to work you might want to add "i". Otherwise you'll be blocking "Iraqi."
Posted by Stephen Macklin at April 1, 2006 06:48 PM
1) What tool exactly is doing the regex comparison? (Some of them have oddities.)
2) Can we see one of the false positives?
Nothing stands out as bizarre there.
Posted by Al at April 1, 2006 07:02 PM
You don't need backslashes inside square brackets. Also, you need a negative character class, assuming the regex should match-to-block.
In addition to the previous commenter's point, you'll also block '.../foo/iraq/something' unless you add '/' to the bracketed list.
The correct syntax to match a url to be blocked is
q[^ua .,]
Tested by grepping /usr/share/dict/words on a SUSE Linux box.
Posted by Glenn at April 1, 2006 07:07 PM
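For anyone who wants to see the difference between the two patterns side by side, here is a minimal sketch in Python. It assumes Python's re module, which may not behave exactly like whatever tool the blog software uses. The sample strings, the flagged() helper and the "extended" pattern (which adds "i" and "/" per the comments above) are made up for illustration.

```python
import re

# The pattern as originally posted: a positive character class, so it
# matches "q" FOLLOWED BY u, a, space, period or comma.
original = re.compile(r"q[ua\ \.\,]")

# Glenn's correction: a negated class, so it matches "q" followed by
# anything EXCEPT u, a, space, period or comma.
corrected = re.compile(r"q[^ua .,]")

# Hypothetical extension that also allows "i" (Iraqi) and "/" (URL paths).
extended = re.compile(r"q[^uai .,/]")

def flagged(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None

# Illustrative inputs only, not taken from real comments.
samples = [
    "a quick question",
    "Iraq, today",
    "Iraqi forces",
    "http://example.com/foo/iraq/something",
    "buy at xqzvk.com",
]

for text in samples:
    print(text,
          flagged(original, text),
          flagged(corrected, text),
          flagged(extended, text))
```

The original pattern flags ordinary text like "a quick question" and lets the nonsense domain through, which is the reverse of what was intended. The corrected pattern catches the nonsense domain but still trips on "Iraqi" and on "iraq/" inside a URL unless those characters are added to the bracketed list.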