Hi all,

I wonder if any of you could help me with this. I'm storing the source emails in a database after fetching them over IMAP. I run a cron job every 24 hours that scans the spam folder for spam emails so the Bayes database gets updated.

Here is what I run from the cron job:

/usr/local/cpanel/3rdparty/bin/sa-learn -p ~/.spamassassin/user_prefs --spam ~/mail/mydomain.com/myaccount/.spam/{cur,new}


Since I'm storing the emails in a database, can I run a PHP script from the cron job to update the Bayes database?

Example:

/usr/local/cpanel/3rdparty/bin/sa-learn -p ~/.spamassassin/user_prefs --spam ~/public_html/username/myfolder/update_bayes.php

Would that work to update the Bayes database so SpamAssassin can learn from it?


Oh right, if I use this:

update_bayes.php

<?php

ini_set('display_errors', '1');
ini_set('display_startup_errors', '1');
error_reporting(E_ALL);

require_once('../Spamassassin/Client.php');
require_once('../Spamassassin/Client/Exception.php');
require_once('../Spamassassin/Client/Result.php');

// $link (a PDO connection), $unread and $isSpam are assumed to be set earlier.
// Select the stored source itself; count(*) would not return a `header` column.
$spam_mailbox = $link->prepare("SELECT header FROM `Spam` WHERE readtype = ?");
$spam_mailbox->execute([$unread]);
$row = $spam_mailbox->fetch(PDO::FETCH_ASSOC);
$header = $row['header'];

// Write the stored source to a file, then read it back from the same path.
$file = '/home/user/myfolder/spam.txt';
file_put_contents($file, $header);
$email_content = file_get_contents($file);

$params = array(
    "hostname" => "localhost",
    "port" => "783",
    "user" => "root");

$sa = new Spamassassin\Client($params);

// Train the message as spam or ham depending on how it was marked.
if ($isSpam == 'Spam') {
    $sa->report($email_content);
}
else if ($isSpam == 'Not Spam') {
    $sa->revoke($email_content);
}
?>

 

Example command for the cron job:

/usr/local/cpanel/3rdparty/bin/sa-learn -p ~/.spamassassin/user_prefs --spam ~/public_html/username/myfolder/update_bayes.php


Would that update the Bayes database if I run the PHP script from the cron job, so sa-learn can scan my emails and learn from them?

And if the server goes offline, would sa-learn lose the information before the server gets back online?

Not sure what server you are talking about, but if you add some error checking to your script you can determine whether the update was successful. If it was not, you can schedule a one-time rerun after some specified amount of time, or just wait for the next regularly scheduled cron run.
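A minimal sketch of that kind of error checking, assuming the training step is started from PHP as an external command. The helper name `runAndCheck` is hypothetical; the only thing it relies on is that a command reports failure through a non-zero exit status.

```php
<?php
// Hypothetical helper: run a shell command and report whether it succeeded.
// Returns ['ok' => bool, 'exitCode' => int, 'output' => string].
function runAndCheck(string $command): array
{
    $lines = [];
    $exitCode = 0;

    // exec() fills $lines with the command's output and $exitCode with its
    // exit status; "2>&1" folds error messages into the captured output.
    exec($command . ' 2>&1', $lines, $exitCode);

    return [
        'ok'       => $exitCode === 0,
        'exitCode' => $exitCode,
        'output'   => implode("\n", $lines),
    ];
}
```

In a cron script, the caller could log `$result['output']` with `error_log()` when `$result['ok']` is false and leave the messages in place, so that the next scheduled run picks them up again.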

I think maybe you're doing something that is unnecessary.  Assuming you're using this class, then it seems like using the report function will automatically train the message so there's no reason to run sa-learn on it again.

If you did want to use sa-learn, then you cannot just pass the path to your PHP script to it, sa-learn does not understand PHP.  You would need to run your script first to generate a file sa-learn does understand, then run sa-learn with the path to that file.
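As a rough sketch of that two-step approach: first write each stored message to its own file, then point sa-learn at the directory. The function names, the `msg-<id>.eml` naming scheme and the shape of `$messages` are assumptions made for illustration, and the sketch assumes the database holds the full raw source of each email. The `-p`, `--spam` and `--ham` options are the ones sa-learn documents.

```php
<?php
// Write each raw message to its own file so sa-learn can read it.
// $messages maps a (hypothetical) database id to the raw email source.
function exportMessages(array $messages, string $dir): array
{
    if (!is_dir($dir) && !mkdir($dir, 0700, true)) {
        throw new RuntimeException("Cannot create directory: $dir");
    }

    $paths = [];
    foreach ($messages as $id => $raw) {
        // One file per message, named after its id, so files never collide.
        $path = $dir . '/msg-' . (int) $id . '.eml';
        if (file_put_contents($path, $raw) === false) {
            throw new RuntimeException("Cannot write: $path");
        }
        $paths[] = $path;
    }
    return $paths;
}

// Build the sa-learn command line for a directory of exported messages.
// $type must be 'spam' or 'ham'; every path is shell-escaped.
function buildSaLearnCommand(string $saLearn, string $prefs, string $type, string $dir): string
{
    if ($type !== 'spam' && $type !== 'ham') {
        throw new InvalidArgumentException("Unknown type: $type");
    }
    return escapeshellarg($saLearn)
        . ' -p ' . escapeshellarg($prefs)
        . ' --' . $type
        . ' ' . escapeshellarg($dir);
}
```

The resulting string can then be passed to `exec()` or `shell_exec()` once the export has finished, so sa-learn only ever sees plain message files and never the PHP script itself.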

3 hours ago, kicken said:

I think maybe you're doing something that is unnecessary.  Assuming you're using this class, then it seems like using the report function will automatically train the message so there's no reason to run sa-learn on it again.

If you did want to use sa-learn, then you cannot just pass the path to your PHP script to it, sa-learn does not understand PHP.  You would need to run your script first to generate a file sa-learn does understand, then run sa-learn with the path to that file.

 

What do you mean when you say that what I'm doing is unnecessary?

Yes, I am using the class, but I can also use shell_exec, which works the same way as the class:

$output = shell_exec('/usr/local/cpanel/3rdparty/bin/sa-learn -p /home/username/.spamassassin/user_prefs --spam /home/username/public_html/test/new');
echo "<pre>" . htmlspecialchars((string) $output) . "</pre>";


I have been using it and it works great, but there is a problem: I need to create a new file in order to run the command.

If I report two emails as spam and then report one of them as not spam, how is sa-learn supposed to know which email I reported as not spam if I reuse the same filename in the /home/username/public_html/test/new folder?

Edited by mark107
1 hour ago, mark107 said:

What do you mean when you say that what I'm doing is unnecessary?

Because if I am understanding that class correctly, it already runs the message through the appropriate training when you call either the report, revoke, or learn methods.  Since the message gets run through the training at that point, there's no need to do it again using sa-learn later.

sa-learn is for a more passive setup where you just put messages into folders and have sa-learn periodically train on those folders.  It sounds like you're doing active training on individual messages instead, which means there is no need for the passive training.

Of course, we don't know all the details of what you're trying to do/build but based on what's been provided so far it sounds like you're trying to make it way more complicated than it needs to be.  For example, it kind of sounds like you're:

  1. Scanning a mailbox for messages
  2. Storing those messages in a DB and deleting them from the mailbox
  3. Pulling those messages back out of the DB
  4. Using sa-learn on them to train the filter.

If your PHP code that scans the mailbox is calling that report method, then you don't need steps 3 or 4 at all.  If you're not using PHP to report the message and want to use sa-learn, then it'd be simpler to just do that before you delete the messages from the mailbox, i.e.:

  • Use sa-learn to train the filter
  • Scan the mailbox for messages
  • Store messages in DB and delete them from the mailbox.
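The order in that list could be sketched roughly as follows. Each step is passed in as a callable because the real implementations (sa-learn, IMAP, the database) are specific to your setup; `processMailbox` and its parameters are hypothetical names for illustration only.

```php
<?php
// Sketch of the train-first order: train, scan, store, then delete.
// $train  : fn(): bool              run sa-learn, return true on success
// $fetch  : fn(): array             return [uid => raw message]
// $store  : fn($uid, $raw): bool    save one message in the database
// $delete : fn($uid): void          remove one message from the mailbox
function processMailbox(callable $train, callable $fetch, callable $store, callable $delete): int
{
    // 1. Train while the messages are still in the mail folder.
    if (!$train()) {
        // Leave the mailbox untouched so the next cron run can retry.
        return 0;
    }

    $stored = 0;
    // 2. Scan the mailbox for messages.
    foreach ($fetch() as $uid => $raw) {
        // 3. Store first, and delete only what was stored successfully.
        if ($store($uid, $raw)) {
            $delete($uid);
            $stored++;
        }
    }
    return $stored;
}
```

Deleting only after a successful store means a database failure never costs you a message, and skipping the run when training fails keeps untrained spam available for the next attempt.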

 

Yes, this is what I am doing right now: moving the emails to the spam folder when I mark them as spam. I know it can easily be done by inserting the data into the Spam DB table and deleting the emails from any other DB table. I need to inform sa-learn that I have marked the emails as spam, and I know that I don't need to use sa-learn later when I report the emails as spam or revoke them as ham.

When I report emails as spam or not spam, I have two choices:

1. Create a file, write the header to it, and run sa-learn to scan these spam messages.

2. Call IMAP to move the emails to the junk folder and run sa-learn to scan these spam messages.


As I can see, the files have already been created in "/home/username/mail/mydomain.com/myaccount/.spam/", so there is no need to create the same file with the same output. That will keep me from wasting disk space.

It sounds to me like I need to call IMAP with the UID to move the emails to the junk folder, then run sa-learn later to scan the emails automatically and train the filter. Is that correct?

Edited by mark107

If you want to use sa-learn to scan your spam / junk folders in your mail directory, then your PHP code should not be doing anything related to spam control outside of possibly moving the message to the spam folder.  There's no need to deal with that SpamAssassin library and report/revoke the messages; sa-learn will process them whenever it is next run.

Otherwise, continue to use your SpamAssassin library's report/revoke functions as you process messages and forget about sa-learn.

 

Yes, I do, but I will use sa-learn to scan the spam / junk folders in my mail directory later on. I will let the cron job do the work automatically.

It sounds to me like I need to call IMAP to move the emails to the junk folder and let the cron job run sa-learn later to scan these spam messages in the mail directory. Is that correct?

Edited by mark107