Today I had a kind of deja vu with a 2015 problem of mine: https://community.sophos.com/products/unified-threat-management/f/web-protection-web-filtering-application-visibility-control/46798/ad-sso-stopped-working
But so much has changed since then (several updates to UTM and Windows, our switch from a 2003-level AD to a 2012-level AD, the changes MS made to SMB implementations in reaction to WannaCry, and much more).
The problem: users cannot surf the web because they keep getting the proxy authentication dialog, and even entering their credentials does not help.
- We tried "sync now" under Definitions & Users - Advanced - AD group membership synchronisation. It went through but did not help.
- Similarly, we tried "prefetch now" of directory users. This went through but did not help.
- Under Definitions & Users - Authentication services - Server - (our AD server), I can perform a test authentication without any problems: The test passes and group memberships are listed correctly.
The only thing that has helped so far was to switch to a proxy profile without AD SSO.
So what has happened?
Last night, we ran Up2Date from version 9.501-5 to version 9.502-4. So obviously, one suspicion is that the new software version causes the problem. But it might also be that the accompanying reboot merely made an older problem surface. So I checked: the last reboot before that was on 13 June, and since then the following updates were installed on the AD server: KB4022726 (June-2017 monthly security quality rollup), KB4022717 (June-2017 security quality update), KB4021558 (cumulative security update for IE11), KB890830 (June-2017 malware removal tool). None of these seems suspicious to me (and I could not find any reports that they were), but maybe someone else knows better? At least the culprit of 2015 (KB3002657) is not installed - but maybe some other update managed to sneak in with the same effect?
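In case anyone wants to repeat this kind of check, here is a minimal sketch of how it could be scripted. It assumes the list of installed updates is available as plain text (for example pasted from the update history). The function name and the watch list are made up for illustration; only KB3002657 comes from the 2015 thread.

```python
import re

# Hypothetical watch list: only KB3002657 (the 2015 culprit) is taken from the post.
SUSPECT_KBS = {"KB3002657"}

def find_suspect_updates(listing, suspects=SUSPECT_KBS):
    """Return the suspect KB IDs that appear in a free-text update listing."""
    # Collect every KB number mentioned in the text, normalised to upper case.
    installed = {kb.upper() for kb in re.findall(r"\bKB\d+\b", listing, flags=re.IGNORECASE)}
    # Keep only those that are on the watch list.
    return sorted(installed & set(suspects))

# The four updates installed on the AD server since 13 June, as listed above.
installed_since_reboot = "KB4022726, KB4022717, KB4021558, KB890830"
print(find_suspect_updates(installed_since_reboot))
```

This only tells you whether a known-bad update is present; it cannot say anything about updates nobody has reported yet.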
I have one last ace up my sleeve: we run High Availability, and one node is still on version 9.501-5. I may shut down the 9.502-4 node this evening so that the older node takes over. I did not try this right away because, even with immediate HA take-over, VPN tunnels break for a split second, and several dozen remote users then have to reconnect (and complain). Still, I do hope that a different solution is possible, as I suppose really rolling back the 9.502-4 node as well would be a bit of a hassle?
UPDATE: I didn't read my mail
It is only now that I noticed the following mail, sent shortly after the version update:
There was an error synchronizing subscribed groups. The Sophos UTM will continue to operate with a locally cached copy of the data but will be unable to update from Directory Services until the issue is resolved.
Error was:
failed to run samba command on (our AD domain), exiting now
--
HA Status : HA MASTER (node id: 2)
System Uptime : 0 days 0 hours 36 minutes
System Load : 0.19
System Version : Sophos UTM 9.502-4
Please refer to the manual for detailed instructions.
Basing my search on this, I found hints that I should unjoin (using wrong credentials) and then rejoin (though nothing had changed with the credentials).