This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

Recent Up2Date failures

In the last 24 hours, I've received a bunch of Up2Date prefetch failure alerts from my UTM.  The delay between failure and restoration alerts varies - the shortest was 15 minutes while the longest was nearly five hours.  The fact that I'm getting these alerts at all is unusual - I normally only see this once or twice a year - but that fact that it's been happening on and off for over a day now, makes me wonder if there's a problem with the Up2Date servers.

Sure enough, the Up2Date log shows the following repeated error:

2021:08:20-01:35:04 astaro-2 audld[27684]: Starting Secured Up2Date Package Downloader
2021:08:20-01:35:04 astaro-2 audld[27684]: Using static update server list in HA mode
2021:08:20-01:35:20 astaro-2 audld[27684]: Secured Up2date Authentication
2021:08:20-01:35:29 astaro-2 audld[27684]: >=========================================================================
2021:08:20-01:35:29 astaro-2 audld[27684]: ILLEGAL XML doc received
2021:08:20-01:35:29 astaro-2 audld[27684]: 
2021:08:20-01:35:29 astaro-2 audld[27684]:  1. Modules::Logging::msg:46() /</sbin/audld.plx>Modules/Logging.pm
2021:08:20-01:35:29 astaro-2 audld[27684]:  2. Modules::Audld::Authentication::XML::check:45() /</sbin/audld.plx>Modules/Audld/Authentication/XML.pm
2021:08:20-01:35:29 astaro-2 audld[27684]:  3. Modules::Audld::Authentication::_authenticate:168() /</sbin/audld.plx>Modules/Audld/Authentication.pm
2021:08:20-01:35:29 astaro-2 audld[27684]:  4. Modules::Audld::Authentication::start:66() /</sbin/audld.plx>Modules/Audld/Authentication.pm
2021:08:20-01:35:29 astaro-2 audld[27684]:  5. main::main:187() audld.pl
2021:08:20-01:35:29 astaro-2 audld[27684]:  6. main::top-level:40() audld.pl
2021:08:20-01:35:29 astaro-2 audld[27684]: |=========================================================================
2021:08:20-01:35:29 astaro-2 audld[27684]: Authentication Server us1.utmu2d.sophos.com: 'ERR_XX' 'Server sent a mis-formed XML document: <u2d>error</u2d>
2021:08:20-01:35:29 astaro-2 audld[27684]: 
2021:08:20-01:35:29 astaro-2 audld[27684]: '
2021:08:20-01:35:29 astaro-2 audld[27684]: <=========================================================================

I checked the Sophos support page but I don't see any mention of problems there, nor do I see any recent posts about it here, which prompts me to wonder if the problem is local to my network, but I run multiple services including a work VPN 24x7, and nothing else is reporting any sort of hiccups.

Is anyone else seeing similar behavior?  As a mere Free Home License user, I don't have any access to Support, but if I'm not the only one seeing this, then hopefully someone else will raise a flag.



This thread was automatically locked due to age.
Parents
  • Hi Jon,

    I haven't seen this problem at any of my clients (all over North America), nor here, 350 miles north of you.

    Since the problem was with us1, force an attempt from us2 at the command line with:

    audld.plx --server us2.utmu2d.sophos.com:443

    If that works, then, since this isn't a paid license, you could change the order in the fallbackservers line in /etc/up2date/up2date.conf.

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • Thanks for responding, Bob.  I didn't include the extra log messages, but it was affecting all the servers equally so forcing a particular update server would probably have been fruitless.

    But it's resolved now, and the resolution might surprise you. Slight smile

    When I started to dig further, I found that my standby node was in some sort of weird state - the active node showed it as "Ready", but the console was completely unresponsive.  So I rebooted it, and upon rebooting, it got as far as the "Hit F2 for details" splash screen, then hung completely.  I tried rebooting it a couple more times with the same behavior, then abandoned it and restored from a backup taken 48 hours earlier, after which it happily came back up and registered itself as the standby node.

    But interestingly, the moment I first rebooted that non-responsive standby node, the primary node reported that Up2Date prefetch was working again, and it's been solid ever since.  I wonder if perhaps the standby node had gotten itself into a weird "half-takeover" state in which it was attempting to contact the Up2Date servers itself using the same MAC address as the active node (I have them configured with a single shared MAC because my ISP requires a modem reboot if the connected MAC address changes) and was thus conflicting with the active node's attempts to connect.

    Anyway, whatever it was, it was resolved by that reboot and subsequent restore of the standby node..

    Cheers,
      Jon.

Reply
  • Thanks for responding, Bob.  I didn't include the extra log messages, but it was affecting all the servers equally so forcing a particular update server would probably have been fruitless.

    But it's resolved now, and the resolution might surprise you. Slight smile

    When I started to dig further, I found that my standby node was in some sort of weird state - the active node showed it as "Ready", but the console was completely unresponsive.  So I rebooted it, and upon rebooting, it got as far as the "Hit F2 for details" splash screen, then hung completely.  I tried rebooting it a couple more times with the same behavior, then abandoned it and restored from a backup taken 48 hours earlier, after which it happily came back up and registered itself as the standby node.

    But interestingly, the moment I first rebooted that non-responsive standby node, the primary node reported that Up2Date prefetch was working again, and it's been solid ever since.  I wonder if perhaps the standby node had gotten itself into a weird "half-takeover" state in which it was attempting to contact the Up2Date servers itself using the same MAC address as the active node (I have them configured with a single shared MAC because my ISP requires a modem reboot if the connected MAC address changes) and was thus conflicting with the active node's attempts to connect.

    Anyway, whatever it was, it was resolved by that reboot and subsequent restore of the standby node..

    Cheers,
      Jon.

Children
No Data