On implementation:
You have identified the control methods. You can turn graylisting on or off globally or by target domain, and you can also turn it off for specific traffic with an exception. In this respect, UTM will do whatever you want it to do.
On the complications:
The world's legitimate mail is coming from fewer and fewer sources: Gmail, Outlook.com, Proofpoint, Mimecast, Cisco, etc. These organizations are so large that you cannot keep track of the IP addresses that they are using, and it is possible that the IP address will change on every delivery attempt. Users in this forum complained that they were seeing 24-hour delays from Outlook.com with graylisting enabled. Other posts have indicated that Sophos maintains an internal exception list for these big organizations, and that they updated their list to correct the Outlook.com problem. The best solution would be to have exceptions defined using SPF syntax, so you can say "bypass graylisting for anything coming from Outlook.com servers". (Not because they are immune from bad behavior by their clients, but because they are not going to be scared away by graylisting, so it is useless as a defense for messages coming from them.)
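To make the SPF-style exception idea concrete, here is a minimal sketch in Python. It is not a UTM feature and not how Sophos builds its internal list. It only shows the membership test you would run once a provider's SPF record had been expanded into address ranges. The networks are reserved documentation ranges used as placeholders, and the function name is made up for illustration.

```python
import ipaddress

# Hypothetical networks standing in for the ip4:/ip6: ranges that a large
# provider publishes through SPF. These are reserved documentation ranges,
# not real Outlook.com or Gmail addresses.
EXEMPT_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def bypass_greylisting(client_ip: str) -> bool:
    """Return True if the connecting IP falls inside any exempt network."""
    addr = ipaddress.ip_address(client_ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed lists are safe to test in one pass.
    return any(addr in net for net in EXEMPT_NETWORKS)


if __name__ == "__main__":
    for ip in ("192.0.2.25", "203.0.113.9", "2001:db8::1"):
        print(ip, bypass_greylisting(ip))
```

In practice the hard part is keeping the range list current, since the SPF records of the big providers change and use nested include: mechanisms. That is the same maintenance problem described above.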
On my UTM configuration, I have never tried to use it. On another environment with weak spam filtering, I have turned off graylisting and seen no significant change in spam levels. So I am a skeptic.
On data analysis:
The SMTP logs are rather difficult to parse into coherent data. I have just completed a redesign of my log parsing tools while chasing the antispam check failure errors that are mentioned in a new post that I started this morning. It should be possible to collect data about messages that are not retried when graylisting is off, but it will require significant effort. Send me a PM if you want my code (which uses a SQL database).
My tests show that greylisting results in a 5.7% reduction in email volume because of spammers not retrying.
For all of the emails greylisted (about 15% of all emails seen), the ctasd spam test is performed twice. Using the 'SMTP' tab in Mail Manager, I see that not one of the resent emails was quarantined or rejected - all were delivered to the addressee.
My conclusion is that greylisting "costs" more in terms of CPU cycles, but that it reduces the number of spams in the quarantine.
For me, the reduction is not significant, but I will leave the test running for another few weeks to see if the greylisting rate drops from 15% to something less than 5%. I'll also use the technique described in "List all domains we've sent email - Whitelist" to create a greylisting Exception.
Cheers - Bob
Interesting data analysis. I think we can infer that if a message fails the spam check, the sender is given a PermFail (do not retry) response instead of a graylisting (try later) response.
Once a message is successfully retransmitted, what is exempted from graylisting during the next 30 days?
For the first two possibilities, I would expect the cache miss rate to be much higher than your experience of 15%.
The local-part (username) of the Envelope From often contains BATV, SRS, or VERP encoding. I would expect this to also increase the graylisting rate due to cache misses.
"I think we can infer that if a message fails the spam check, the sender is given a PermFail (do not retry) response instead of a graylisting (try later) response."
That seems to be the case, but it probably depends on the selections for 'Reject at SMTP time' and 'Confirmed spam action'. It's curious that the anti-spam and anti-malware scans are repeated when greylisted emails are resent and successfully considered, even when both anti-spam actions are set to 'Blackhole' and 'Reject at SMTP time' is set to 'Spam'. I see that in a client's SMTP log where they've used greylisting for years.
Statistics from this client, which sends and receives many hundreds of emails per day: 1.3% of "SMTP connection from" log lines led to greylisting, and only 43.7% of those were resent. Such a small percentage was greylisted because many connections (77.7% of SMTP connections) were rejected at SMTP time before they could have been temporarily rejected (greylisted). 96.6% of rejections are from RBL, RDNS and address verification. Only 3.3% of rejections are from anti-spam, so the greylisting percentage would probably be about the same if 'Reject at SMTP time' were set to 'Confirmed Spam' and anti-spam actions were set to 'Quarantine'.
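To put the two greylisting figures on the same footing (share of all SMTP connections), here is the arithmetic as a small Python snippet. It uses only the percentages quoted above. The variable and function names are just for illustration.

```python
# Figures quoted in the post, as percentages.
greylisted_pct = 1.3   # share of all SMTP connections that were greylisted
resent_pct = 43.7      # share of greylisted connections that were resent


def share_of_all(outer_pct: float, inner_pct: float) -> float:
    """Convert 'inner_pct of outer_pct' into a percentage of all connections."""
    return outer_pct * inner_pct / 100


resent_of_all = share_of_all(greylisted_pct, resent_pct)
never_retried_of_all = greylisted_pct - resent_of_all

print(f"greylisted and resent:        {resent_of_all:.2f}% of connections")
print(f"greylisted and never retried: {never_retried_of_all:.2f}% of connections")
```

So roughly 0.57% of connections were greylisted and later resent, and roughly 0.73% were greylisted and never came back.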
It seems safe to suggest that once greylisting has been used for a while, fewer than 2% of SMTP connections would be temporarily rejected. The 15% I saw in my post above covered only the first 4 days after enabling greylisting.
The (IP, Sender, Recipient) triplet is cached. Sender in that triplet is the content of the MAIL FROM: line, which is sent prior to RCPT TO:. The From header field comes after DATA in the SMTP conversation, so it is not what gets cached.
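For anyone who wants to see the triplet logic spelled out, here is a minimal Python sketch of a greylisting cache keyed on (IP, MAIL FROM, RCPT TO). It is an illustration of the general technique, not the UTM's actual implementation. The 5-minute retry delay is an assumption, the 30-day exemption is the figure mentioned earlier in this thread, and the class and method names are made up.

```python
import time

RETRY_DELAY = 5 * 60              # assumed minimum wait before a retry is accepted
EXEMPT_SECONDS = 30 * 24 * 3600   # 30-day exemption, per the figure in this thread


class GreylistCache:
    """Toy greylisting cache keyed on the (IP, envelope sender, recipient) triplet."""

    def __init__(self):
        self.first_seen = {}   # triplet -> time of the first (deferred) attempt
        self.exempt_until = {} # triplet -> time at which the exemption expires

    def check(self, ip, mail_from, rcpt_to, now=None):
        """Return 'accept' or 'defer' for a single delivery attempt."""
        now = time.time() if now is None else now
        key = (ip, mail_from, rcpt_to)

        expiry = self.exempt_until.get(key)
        if expiry is not None and now < expiry:
            return "accept"            # triplet already passed greylisting

        first = self.first_seen.get(key)
        if first is None:
            self.first_seen[key] = now
            return "defer"             # first sighting: temporary rejection

        if now - first >= RETRY_DELAY:
            self.exempt_until[key] = now + EXEMPT_SECONDS
            del self.first_seen[key]
            return "accept"            # sender retried after the delay
        return "defer"                 # retried too soon


if __name__ == "__main__":
    cache = GreylistCache()
    t = ("192.0.2.1", "alice@example.org", "bob@example.com")
    print(cache.check(*t, now=0))      # first attempt
    print(cache.check(*t, now=600))    # retry after 10 minutes
    # Same sender domain, but a VERP-style local part: a different triplet.
    print(cache.check("192.0.2.1", "bounce-123@example.org", "bob@example.com", now=700))
```

With a key this strict, a change in either the sending IP or the local part of the envelope sender is a cache miss, which is why BATV, SRS, or VERP encoding and rotating provider IPs would be expected to push the greylisting rate up.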
Cheers - Bob