BeBits
AGMSBayesianSpam
Talkback
Ignoring Attachments for Spam Detection
 By Alexander G M Smith - Posted on October 8, 2002 - 21:17:21   (#3498)
 Current version when comment was posted: 1.49
nerfherder (Jeremy) asked about just looking at headers and message body, rather than the whole thing. It's one of the features I'm thinking of adding for the next version, if I can make use of the Mail Kit to conveniently break up the messages into parts. If that works, it will be easy. I'm also going to look at putting in the related only-download-headers feature.

The Ottawa International Animation Festival is over now, which kind of delayed any progress last week. What a blast: besides a solid 5 days (9am to 11pm+) of cartoons, I got to see John Kricfalusi (Ren and Stimpy), Mo Willems (Sheep in the Big City) and David Fine (Bob and Margaret) in the writing-for-TV-series workshop, and met several other interesting animators. But now I have the next week and a half for anti-spam measures! Guess that's kind of a promise that I'll get something out for you guys by then.

- Alex

Spurious Messages Bug and Progress Bar Bug
 By Alexander G M Smith - Posted on October 8, 2002 - 08:47:48   (#3493)
 Current version when comment was posted: 1.49
SteveH wrote:
> I did notice that after dealing with mail I am sometimes left with a spurious message in the new mail windows.

It's a problem with the Mail Daemon Replacement and/or the BFS indices. It happens even without the spam detector. The MDR team knows about it and has several theories but hasn't fixed it yet.

I get rid of the incorrect new message display by quitting the MDR (right click on the mailbox icon in the deskbar to get the menu) then restarting it (start up the E-mail preferences and hit the save button). I sometimes also have to do a query in Tracker to find all e-mail with status containing "new", then open all those messages in BeMail and quit BeMail (that marks them as "read").

The progress bar is also incorrect while downloading e-mail and spam checking it. It may even be downloading messages twice, possibly because the spam detector reads the whole message and then something else reads it again.

- Alex

suggestion
 By nerfherder - Posted on October 6, 2002 - 09:29:41   (#3460)
 Current version when comment was posted: 1.49
Have you considered not looking at the attachments as part of the email, just the body and the header information?

--jeremy

Great app + some feedback
 By SteveH - Posted on October 3, 2002 - 13:07:05   (#3427)
 Current version when comment was posted: 1.49
Just installed last night - very impressed, although I need to do a little training to get rid of false positives.

I did notice that after dealing with mail I am sometimes left with a spurious message in the new mail windows. This eventually clears itself when I reboot but I had never had a problem like this before.

About More Suggestions...
 By Alexander G M Smith - Posted on September 28, 2002 - 12:28:59   (#3353)
 Current version when comment was posted: 1.47
I think a white list of sender e-mails should be a separate BeMail add-on. I'd expect that it would be an enhancement of the current rules filter (which is also known as "Match Header" in the e-mail preferences settings), but with the option of allowing through mails "from:" trusted people. You can do this right now, but awkwardly, if you have one rules filter for each person. If it's not from a trusted person, then it would look at the spam probability ratio which had been added by the AGMSBayesianSpam filter, which would be run earlier in the filter chain. That way it keeps things nice and modular. Incidentally, have a look at http://www.bebits.com/app/2726 (Dean Richie's BeSpamFilter which has correspondent lists). Maybe it does what you want.
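To make the ordering concrete, here is a minimal Python sketch of the chain described above: trusted senders are let through first, and only then is the spam ratio consulted. It isn't BeMail add-on code; the function name, the 0.5 threshold and the example addresses are all made up for illustration.

```python
def is_wanted(sender, spam_ratio, trusted_senders, threshold=0.5):
    """Return True if the message should be treated as genuine.

    Hypothetical helper: `spam_ratio` stands in for the value an earlier
    filter in the chain has already attached to the message.
    """
    # Step 1: mail from a trusted person is always let through.
    if sender.strip().lower() in trusted_senders:
        return True
    # Step 2: everyone else is judged by the spam ratio computed earlier.
    return spam_ratio < threshold


# Example white list (made-up addresses, stored in lower case).
trusted = {"list@example.org", "friend@example.com"}
```

With this, is_wanted("List@Example.org", 0.99, trusted) lets the message through even though it looks spammy, while the same ratio from an unknown sender is rejected.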

As for automatically moving messages, it would be the job of the rule filter to do it. Unfortunately it currently can only test string attributes, not numeric ones like the spam ratio. However, you can tell the AGMSBayesianSpam filter to add "[Spam xx%]" in front of the subject and then you can use the rule filter to look for "[Spam*]" in the subject, and move or delete or flag messages. Personally, I just have the Tracker show the spam ratio column, sort by it, and quickly weed out the spam, then go back to sorting by thread+date and reading messages.
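The subject-tag workaround could look roughly like the Python sketch below. The exact tag format the filter writes, the rounding, and the 98% cut-off are assumptions for illustration. A regular expression is used for the match because the rule filter's own pattern syntax isn't reproduced here.

```python
import re

# Assumed tag shape: "[Spam 97%] original subject".
SPAM_TAG = re.compile(r"^\[Spam (\d{1,3})%\]")


def tag_subject(subject, spam_ratio):
    """Prefix the subject with the spam ratio as a whole percentage."""
    percent = round(spam_ratio * 100)
    return f"[Spam {percent}%] {subject}"


def tagged_percent(subject):
    """Return the percentage from a tagged subject, or None if untagged."""
    match = SPAM_TAG.match(subject)
    return int(match.group(1)) if match else None


def choose_action(subject, delete_at=98):
    """Decide what a later string-only rule could do with the message."""
    percent = tagged_percent(subject)
    if percent is None:
        return "keep"
    return "trash" if percent >= delete_at else "flag"
```

The point of the design is that the numeric ratio gets turned into text, so a rule that can only test string attributes still has something to match on.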

I was thinking of doing drag and drop earlier. Unfortunately it hits the Tracker maximum of 100 files passed to an external program, just as Open With... and the file requester do. I'll put it in anyway since it should make it more convenient, and I'll need half a dozen different ways of marking spam just to see what's best.

- Alex

More suggestions...
 By jaf - Posted on September 28, 2002 - 01:01:48   (#3351)
 Current version when comment was posted: 1.47
Hi again,

After using AGMSBayesianSpam for a bit longer, I have a few more suggestions for improvements:

1) How about allowing the user to specify a "whitelist" of email addresses, so that any email that comes "From" one of these addresses is automatically considered Not Spam, no matter what it contains? (This would be useful, since certain emails, e.g. from mailing lists, do look similar to spam and thus sometimes get miscategorized)

2) There may be a way to do this now, I'm not sure: automatically move any email with spam-level greater than a certain percentage (e.g. 98%) into the trash, or automatically delete them, etc. (I tried adding a "match header" filter to do this, but I don't think it worked. If there is some way of chaining the AGMS filter together so that other filters can use the results of its analysis by examining the attributes it adds to the email, that would be very powerful)

3) Have areas in the AGMS GUI that you can drag an email's icon onto in order to quickly add it to the database as an example of Spam or Genuine email. (opening a file requester is a pain as there are thousands of emails in my email folder...) multi-icon drags should be supported too, of course :^)

That's all for now, keep up the great work :^)

Filtering HTML Tags
 By Alexander G M Smith - Posted on September 27, 2002 - 12:21:00   (#3343)
 Current version when comment was posted: 1.47
Hmmm, definitely worth thinking about, at least for simple HTML tags (and if they use a greater than or less than in their text, too bad). But first, I should get it to parse mail messages into MIME labeled components and then only look for words in text/* ones. Then do the HTML processing for text/HTML ones. Full HTML decoding would mean essentially HTML rendering, perhaps via Lynx or some other HTML to text converter? Sounds like a job for a translator... Anyone know of some existing HTML to UTF-8 plain text converters?
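For what it's worth, the plan above (split into MIME parts, keep only text/* ones, strip simple tags from text/html) can be sketched with Python's standard library. This is only an illustration of the idea, not Mail Kit code. The names words_to_scan and strip_tags are invented, header words are left out, and the tag stripping is the crude kind, not real HTML rendering.

```python
from email import message_from_string
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects only the text found between HTML tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html):
    """Return the visible text of an HTML fragment, whitespace-normalised."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def words_to_scan(raw_message):
    """Return the words from text/* parts only, skipping attachments."""
    msg = message_from_string(raw_message)
    words = []
    for part in msg.walk():
        # Multipart containers and non-text parts (images, binaries) are skipped.
        if part.is_multipart() or part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        text = payload.decode(charset, errors="replace")
        if part.get_content_subtype() == "html":
            text = strip_tags(text)
        words.extend(text.split())
    return words
```

A message with a plain text part, an HTML part and a base64 attachment would yield only the words from the first two, with the HTML tags removed.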

sorry for being a bitch
 By rain - Posted on September 26, 2002 - 21:12:44   (#3332)
 Current version when comment was posted: 1.47
but wouldn't having an option to filter out html tags be a lot easier for the end user? and having it turned on by default would be clever, since I think most "regular users" wouldn't care to check this. Suddenly they might have their Outlook-using friend's e-mails classified as spam.

Correction - Spreadsheet Mode
 By Alexander G M Smith - Posted on September 26, 2002 - 12:25:23   (#3328)
 Current version when comment was posted: 1.47
When I say use Gobe Productive to edit the database file, I mean use the spreadsheet mode (see the readme for details). That lets you sort the word list in ways you want, and even chart it.

HTML in E-Mails as a Spam Indicator
 By Alexander G M Smith - Posted on September 26, 2002 - 12:23:39   (#3327)
 Current version when comment was posted: 1.47
rain wrote:
> Cause when I let it scan an e-mail it actually sees them as words, which can cause it to believe that any e-mail containing html is a spam-mail.

It's a good indicator of spam for some users, so that's a feature, not a bug :-). The problem is that the initial database I made uses example messages from various BeOS mailing lists, which don't have HTML attachments. You just have to feed it about 300 example genuine messages that use HTML to counteract that bias, then it will know what YOUR typical genuine message looks like. Or manually edit the database file with StyledEdit or Gobe Productive to remove the HTML tags that look spammy.
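To see why feeding in genuine HTML messages counteracts the bias, here is a toy Python calculation. It is a simplified frequency ratio in the general style of Bayesian spam filters, not necessarily the formula AGMSBayesianSpam uses, and every number in it is invented.

```python
def word_spam_ratio(word, spam_counts, genuine_counts, n_spam, n_genuine):
    """Estimate how spammy a word is: 0.0 = genuine only, 1.0 = spam only.

    Hypothetical helper; the counts are "messages containing the word" and
    n_spam / n_genuine are the (non-zero) numbers of example messages.
    """
    spam_freq = spam_counts.get(word, 0) / n_spam
    genuine_freq = genuine_counts.get(word, 0) / n_genuine
    if spam_freq + genuine_freq == 0:
        return 0.5  # unseen word: no evidence either way
    return spam_freq / (spam_freq + genuine_freq)


# Initial database: the HTML tag word shows up only in spam examples.
before = word_spam_ratio("<font", {"<font": 200}, {}, 300, 300)

# After adding 300 genuine HTML messages, 250 of which contain the tag.
after = word_spam_ratio("<font", {"<font": 200}, {"<font": 250}, 300, 600)
```

In this made-up case the tag goes from looking like pure spam to being only mildly spammy, which is the effect the extra genuine examples are meant to have.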

html and addons
 By rain - Posted on September 25, 2002 - 20:50:48   (#3313)
 Current version when comment was posted: 1.47
this is a good idea, thanks for trying to make it happen :)

He wasn't talking about "open-with", he was talking about making two Tracker add-ons. You can find out how to make Tracker add-ons in BeBook->Tracker->Tracker Add-on Protocol.
However, these will show up with any file, not just e-mails. But as soon as the Tracker becomes file-type aware when it comes to add-ons, this problem will go away.
Until then, can't we agree to name all email related add-ons "E-mail - Addon name"? For example "E-mail - Classify as Spam" or "E-mail - Blacklist sender".

Also, one complaint. It should filter out all HTML tags from the emails. When I let it scan an e-mail it actually sees them as words, which can cause it to believe that any e-mail containing HTML is a spam-mail.

Open With... Suggestion
 By Alexander G M Smith - Posted on September 24, 2002 - 14:06:44   (#3303)
 Current version when comment was posted: 1.47
That's a good idea, jaf. Now how do I actually get two open-with items in Tracker? Time for some more research...

Actually, the user interface for marking spam/genuine is unexplored territory. There's also the idea of an extra Delete Spam button in BeMail. Or incestuously using the spam rating of a new message to also add it as an example (I suspect that things could go statistically wrong with that). And zillions of other things to try. And sound effects; I wonder if we could get the rights to the Monty Python Spam sketch.

Excellent!
 By jaf - Posted on September 24, 2002 - 09:37:40   (#3296)
 Current version when comment was posted: 1.47
This is going to be an extremely useful app, thanks! :^) My only suggestion is to add a way to select a set of one or more email files in Tracker, then right click to bring up the pop-up menu... in the Add-Ons submenu there would be two new menu items: "Classify as Genuine" and "Classify as Spam"; selecting one of these would immediately add the selected emails to the filter's database. This (with a hotkey-equivalent for each of the two entries of course ;^)) would make classifying new spams and emails as they come in very quick.

Anyway, great job, and I look forward to upcoming versions.

yea! spam
 By nerfherder - Posted on September 23, 2002 - 21:13:39   (#3287)
 Current version when comment was posted: 1.47
This is great, I will be happy to receive some spam now : )

--jeremy

Oops - Bug in Step 5B
 By Alexander G M Smith - Posted on September 23, 2002 - 20:34:45   (#3286)
 Current version when comment was posted: 1.47
Forgot to mention that you should hit the Purge button, then you'll see the list of words. It doesn't show anything until it needs to do something (to avoid loading the database when there isn't anything to do).

 
Unless otherwise noted, everything is copyright © 1999-2002 Fifth Ace Productions, LLC. All Rights Reserved.