DO NOT INSTALL 9.703-2!!!

My lab system was Up2Dated to 9.703-2 on Thursday evening at 10PM CDT (UTC -0500) and all connection with the outside world immediately stopped.  My local connection would work normally for a few minutes at a time and then everything would lock up for a few minutes.  I could not identify the problem with top, but did see a lot of zombie confd processes.  I lost the entire day of Friday because my wife has a big project due next week and was working via Microsoft Teams all day with her colleagues.

I will suggest to Sophos that the file be removed from the ftp site. Grumble.

Cheers - Bob

  • Ugly.  I was unprepared for disaster recovery with my wife working from home.  I found out that my USB stick, which hadn't been used in over a year, was dead, as was the monitor connected to the UTM, which probably hadn't been turned on since I replaced the computer several years ago.  Oh, and I was reminded that the client who borrowed my portable DVD burner had never returned it.  Here's an extract from the case I have open with Sophos Support...

    My initial attempt to fix this problem was to restore from a backup made automatically the morning before the 9.703 Up2Date was applied.  That had no effect, so I rebooted the UTM (a UTM 320 running as a generic PC).  Again, the problems continued.

    Note: I don't remember if I changed /etc/asg five years ago after installing an ssi ISO or if I changed it before installing an asg ISO.  That might be something to test: https://community.sophos.com/products/unified-threat-management/f/hardware-installation-up2date-licensing/10917/asg-425-display-with-homelicense/32959#32959

    First, more description of the situation.  Both Reporting and the logs showed that there was no more traffic on the External interface after the reboot following the application of the Up2Date at 22:00 local time on 09 April.

    Something was causing things to lock up for several minutes and then work for several minutes.  I decided that I would capture all of the logs from 2020 using WinSCP.

    When the "lock" was on:

    1. I couldn't log into WebAdmin, or, if already logged in, could do nothing or, if something had been started, it was hung.  The same was true with WinSCP.
    2. When trying to ping my laptop from the console, I got a message that the action was not allowed (sorry, don't remember the exact wording) or that the network was unreachable.  I couldn't even ping 10.x.y.34, the IP of the Internal interface.
    3. From my laptop, I got something like "Not found" when I tried to ping 10.x.y.34.

    Strangely, top on the console continued running.  I was surprised that there were so many confd zombies.  Another big user of CPU was mdw - which made no sense to me as I was changing nothing.  At one point, during a lock, I noticed httpproxy take 95% of one CPU, so I waited for WebAdmin to be responsive again and disabled Web Filtering and Snort.  That made no difference and the lock-work cycle continued.
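For anyone wanting to put a number on the zombies seen in top, here is a minimal sketch. It assumes the output of `ps -eo stat=,comm=` has been saved from the console and is analysed on another machine; the function name `count_zombies` and the sample data are hypothetical and not taken from the affected UTM.

```python
from collections import Counter


def count_zombies(ps_output: str) -> Counter:
    """Count zombie processes per command name.

    Expects text in the shape produced by `ps -eo stat=,comm=`:
    one process per line, state field first, command name second.
    A state that starts with "Z" marks a zombie (defunct) process.
    """
    zombies = Counter()
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].startswith("Z"):
            # Some ps versions append " <defunct>" to the name of a zombie.
            name = parts[1].strip().replace(" <defunct>", "")
            zombies[name] += 1
    return zombies


# Hypothetical sample, for illustration only.
sample = """\
Ss   init
Z    confd <defunct>
S    httpproxy
Z    confd <defunct>
R    top
"""
print(count_zombies(sample))
```

Running this against snapshots taken during a "lock" and during a working period would show whether the zombie count grows with each cycle.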

    Finally, I was able to get all of the 2020 logs from /var/log, re-imaged with 9.702 (asg ISO) and restored from backup.  All is now running normally as it was prior to installing 9.703.

    Cheers - Bob

  • In reply to BAlfson:

    Hi Bob,

    Thanks for a thorough walkthrough of your issues.

    I installed 9.703 on one SG 210 when it came out and have not seen any issues yet - no explosions.

    I run it as ASG (Software) on the appliance to use the home / partner license :)

    Looking forward to hearing your feedback ;)

    Happy Easter ;)

  • In reply to twister5800:

    I just updated a UTM 220 with ASG (Software) to 9.703 too, and there are no issues there either.

  • In reply to twister5800:

    Hello, just to second Bob: I also had this problem. Although I cannot verify it technically as he was able to (due to my lack of knowledge), it felt exactly as Bob described, extremely sluggish and strange, as I posted here:
    https://community.sophos.com/products/unified-threat-management/b/blog/posts/utm-up2date-9-703-released

    The effects of the upgrade were for me the same as for Bob. Maybe as additional info: my hardware is a Fujitsu-PC with home edition. Intel i5-4590, 12GB RAM, 1x Intel NIC onboard, 2x Intel NIC I210-T1, 1x HDD.

  • Mine bonked too. I scheduled the update for early Saturday morning. Woke up to discover no Internet. I swapped in my cold spare then reinstalled 9.702-1 from a freshly downloaded ISO on the production box. Then restored a daily backup. I swapped my production box (Zotac ZBOX-CI325NANO-U) back in this morning (with 9.702-1). Performing DR every once in a while is a good thing, I guess...

  • In reply to BAlfson:

    Hi  

    I have also just tried an upgrade and the exact same thing happened, although two hours after it performed the upgrade it sent out a backup file.

    I have had no connection to the outside world or to any of the VLANs internally.

    I also noticed that the interfaces would all shut down (no lights) and then start back up again after a few minutes.

    I am now having to re-image the entire SG310.

    I am realising that trusting Sophos to do their job is not working out well (what with the RED issue).

    It is a great product, but they keep on screwing up.

  • I too have had serious issues with 9.703.  On the first update attempt, the system's BIOS could not find the boot drive.  I replaced the drive and reinstalled up to 9.702.  I tried the update again.  This time the system rebooted, but could not get to the Internet or to any internal IP address.  I started a PING on the UTM towards an internal IP.  The pings would either be UNREACHABLE or would be Operation Not Permitted.  It looked like the Middleware was continuously restarting.

  • In reply to Todd Allison:

    Ahh, uff, then I have really been lucky so far.

    I have installed 9.703 at 5 different locations up to today, and not a single site has had the problems you describe here. Strange.

  • In reply to jprusch:

    You may well have been lucky with this. I must also say that out of the 3 units I upgraded, only the most complex of them went completely south.

    The 2 x SG135 both upgraded without a hitch (thankfully).

    the 1 x SG310 had similar symptoms to 

    I am starting to feel that although Sophos have some great products, there are still some serious shortcomings with their QA process, and when we do log tickets with their support, the response has not been very reassuring.

    On top of that, when they do release updates for the UTM, these updates seem to be flawed in some critical way.

    I used to be an evangelist for the product range, but this is being tarnished by the lack of attention to their customers (part of me jokes that of course we are cyber secure: if we have no firewall, we are secure!).  I say "used to be": unless Sophos pull something out of the hat soon, they will lose a considerable portion of their market.

  • In reply to Argo:

    Seems there is something seriously broken in 9.703.

    Since we updated to that version, I have had to go down to MTU=1320 at several sites to reach resources outside the LAN.

    This happens with SG210, SG230, SG135w, SG115w, SG105 and a software appliance as well.

    Since the rest of the equipment in the networks didn't change, I suspect something is wrong with MSS and/or MTU handling.

    Maybe I can do a test with a mangle rule?
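As background for such a test, the relationship between MTU and TCP MSS can be sketched with some simple arithmetic. This is only an illustration and assumes plain IPv4 and TCP headers without options (20 bytes each); the helper name `mss_for_mtu` is made up for this example, and tunnels or VLAN encapsulation would change the numbers.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options (assumption)
TCP_HEADER = 20   # bytes, TCP header without options (assumption)


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one IPv4 packet of the given MTU."""
    if mtu <= IPV4_HEADER + TCP_HEADER:
        raise ValueError("MTU too small to carry any TCP payload")
    return mtu - IPV4_HEADER - TCP_HEADER


# Compare the common Ethernet default with the workaround value from this thread.
for mtu in (1500, 1320):
    print(f"MTU {mtu} -> MSS {mss_for_mtu(mtu)}")
```

Under these assumptions, an MTU of 1320 corresponds to an MSS of 1280, so that is the value an MSS clamp would need to match for the same effect.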

  • Hmmm, Sophos is still rolling this update out to the firewalls. I just got mails from several UTMs saying that the firmware is ready to install....

  • In reply to twister5800:

    Today, I can report ugly behaviour from 9.703 UTMs: high CPU usage (previously 7-15% during normal activity, now always up to 55% on our SG210).

    Sluggish internet access, sometimes minutes with no DNS / Web, like something is "stuck".

  • In reply to jprusch:

    What about system-log and fallback-logs?! ;)

  • In reply to twister5800:

    You mean me looking at that?

  • In reply to jprusch:

    Yes :-)

    I have upgraded the devices I have in my lab with no issues at all, but I can see that others are all seeing the same problem. I just wonder what could be wrong.

    Often the system log and fallback logs are good places to look.

    Could you maybe post how they look?

    I think many people here are curious about what the h... is wrong with the update.