v18 MTA/SMTP general questions

Hi everyone,

I'm a home user running v18.04 MR-4.  I have a few internal servers and services that use XG as a relay MTA outbound to AWS SES and this has been working fine for a while. Unfortunately, a battery on the RAID card in an old Dell server started going bad causing iDRAC to spit out alerts every minute. And, apparently, for some reason I haven't figured out yet, these e-mails all failed with a Bounce error. And... I didn't notice this for a couple of days.

So, first question... is there any way to bulk delete the few thousand failed messages in the mail spool?  A search of the forum indicates that was not easy to do in v17, but I've not seen anything about v18.

Second question... I've got two messages stuck in the spool... one in a failed state, one in a queued state. Deleting from the GUI doesn't do anything. I was hoping restarting the MTA service would un-stick them. But how the blasted blazes do you restart the SMTP and MTA services in v18?  I think in v17 it had something to do with an awarren mta service... but I do not believe that is used in v18. I was hoping to do this without rebooting the device.

Third question... if I tail the smtpd_main.log, it is adding about a thousand lines a minute. Is this normal, or is it related to the massive number of failed items in the spool? Are there any other good logs to identify why all those messages ended up with a Bounce error?

Thanks,

Gary



[edited by: emmosophos at 1:27 AM (GMT -8) on 25 Feb 2021]
  • Hello Gary,

    Thank you for contacting the Sophos Community.

    1. Yes, you can run a script located under /scripts/mail to mass delete the emails in the spool.

    2. To restart the services for MTA in v18 you need to run # service smtpd:restart -ds nosync

    3. It depends on your mail flow and on what type of lines you're seeing, but that volume is usually not normal. Lines such as "queue-runner process running" are normal to see. That said, issues with the spool will make the smtpd_main.log quite chatty.

    Regards,


     
    Emmanuel (EmmoSophos)
    Community Support Engineer | Sophos Technical Support
  • Thanks Emmanuel, appreciate the information!  One follow-up question for you though... did I just completely miss the documentation where those service commands and mail scripts are explained?

  • And actually, that may not have done it.  I ran the script "delete_invalid_from_spool" and now the GUI shows nothing in the spool... which is great.  I then restarted the smtpd service using the command you gave. I then did a tail on the smtpd_main.log and I'm still getting about a thousand new lines every couple of seconds.

    Basically a ton of repeats of this type of message...

    13740 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    13740 Considering: [email]
    13740 unique = [email]
    13740 LOG: retry_defer MAIN
    13740   == [email] routing defer (-51): retry time not reached
    2021-02-25 08:53:13.035 [13740] PX6jiL-xHy4L6-01 ==[email] routing defer (-51): retry time not reached
    13740 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    13740 After routing:
    13740   Local deliveries:
    13740   Remote deliveries:
    13740   Failed addresses:
    13740   Deferred addresses:
    13740     [email]
    13741 locking /sdisk/spool/output//db/retry.lockfile
    13741 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
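
With output this repetitive, it can help to collapse the log into counts of distinct lines to see what actually dominates. A minimal sketch, assuming the usual sed, sort, uniq and head tools are present on the device shell. `summarize_log` is a made-up helper name, and the line formats it strips are only the two shown in the excerpt above (a bare PID prefix, or a timestamp followed by a bracketed PID).

```shell
# Hypothetical helper: reads log text on stdin and prints the most
# frequent lines with their counts, highest first.
# $1 = how many distinct lines to show (default 10).
summarize_log() {
    sed -E \
        -e 's/^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:.]+ \[[0-9]+\] //' \
        -e 's/^[0-9]+ +//' |
        sort | uniq -c | sort -rn | head -n "${1:-10}"
}
```

Something like `tail -n 5000 /log/smtpd_main.log | summarize_log` would then show the top ten line shapes. The log path here is an assumption, so substitute wherever smtpd_main.log lives on your device.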

    I went and checked the /sdisk/spool/output/db location just to see what was there.

    • retry - 40K
    • wait-notification_smtp - 12K
    • wait-smarthost_smtp - 5.6M

    So the queue is empty, but those files seem to show quite a bit of mail the SMTP service is still trying to send. I tried to peek into those files and found some references to a smart host I haven't used in almost a year, which would explain why it can't send... the host doesn't exist any more.

    How do I empty those out?  Can I just shut down the SMTPD service and then delete those files?

    Thanks!

    Gary

  • Ah, or do I need to just go delete the 8,232 files in the /var/spool/output/input directory?  Or probably some combination of deleting those files and clearing out the db contents most likely? 

  • Hello Gary,

    Thank you for the follow-up.

    Can you try the following:

    rm -f retry retry.lockfile

    rm -f wait-remote_smtp wait-remote_smtp.lockfile

    rm -f wait-static_smtp wait-static_smtp.lockfile

    Then restart smtpd

    service smtpd:restart -ds nosync

    Delete the spool retry emails:

    rm -rf /sdisk/spool/output/db/retry*

    Regards,


     
    Emmanuel (EmmoSophos)
    Community Support Engineer | Sophos Technical Support
  • Before I try this, I have to ask... what is going to happen with those 8,000-plus messages in the output/input directory? Is it going to pick those up and try to send them all? Because that could get ugly. It's all junk, so should I delete everything there before restarting smtpd?

    Thanks! 

  • I was just about to give this a whirl and take the chance on testing my mail client's ability to handle 8k messages at once when I realized the underlying MTA is Exim. Since I know nothing about Exim, Google was rather helpful in what I ended up doing.

    # service smtpd:stop -ds nosync
    200 OK
    # exim -bp | awk '{print $3}' | while IFS= read -r line; do
    > exim -Mrm $line
    > done
    ... // lots of verbose removing of queue messages
    # rm /sdisk/spool/output/db/*
    # service smtpd:start -ds nosync
    200 OK

    What was surprising to me was that exim had a thousand or so actual messages in queue, but XG thought the queue was empty.  When I cleared the exim queue manually, all of the content of /sdisk/spool/output/input was deleted. Since I honestly have no clue how exim works, I'm not sure if that directory is the queue or just contains the physical queue contents while the queue list itself resides somewhere else.  I also noticed that emptying the queue did not clear out any of the /sdisk/spool/output/db/ files which is why I had to explicitly delete them.

    Anyway, after doing this the smtpd_main.log is quiet except for the expected "process running" messages and mail sent through XG processes insanely fast, kind of like when the device was brand new.
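
One note on the loop above: `exim -bp` prints a header line per message followed by indented recipient lines and a blank line, so `awk '{print $3}'` also emits empty values for the non-header lines, and `exim -Mrm` gets called with no ID for each of them. A slightly tighter variant, offered as a sketch rather than a tested XG procedure, keeps only the header lines. `queue_ids` is a made-up helper name, and the filter assumes the standard Exim listing format in which the first field is the time in queue (for example 25m, 4h or 3d).

```shell
# Hypothetical helper: reads `exim -bp` output on stdin and prints one
# message ID per line. Only header lines are kept: their first field is
# the time in queue (digits followed by m, h or d) and field 3 is the ID.
queue_ids() {
    awk '$1 ~ /^[0-9]+[mhd]$/ && NF >= 3 { print $3 }'
}

# Intended use (left commented out, since it deletes every queued message):
#   exim -bp | queue_ids | while IFS= read -r id; do exim -Mrm "$id"; done
```

Running `exim -bp | queue_ids | wc -l` first gives a count of what would be removed before committing to the delete.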