UTM 9.601 - RED issues!

Since upgrading all our customers to 9.601, a large share of them have been complaining that their REDs disconnect and reconnect with no discernible pattern.

It started for all of them the very night we upgraded to 9.601, and they are all on different ISPs and located in different places around the country.

I spent 2 hours with Sophos support today, and they have now escalated the case.

Will return with an update....

Suspicious entries in the log (though all connected REDs log these before connecting):

2019:03:06-15:15:38 fw01-2 red_server[17509]: SELF: Cannot do SSL handshake on socket accept from 'xxx.xxx.xxx.xxx': SSL connect accept failed because of handshake problems

2019:03:06-15:15:46 fw01-2 red2ctl[12420]: Missing keepalive from reds3:0, disabling peer xxx.xxx.xxx.xxx

I know the last line is written before the tunnel disconnects, because there was no "PING/PONG" answer...
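
To check whether the drops really follow no pattern, the "Missing keepalive" lines can be pulled out of a saved copy of the RED log and counted per interface and hour of day. This is only a rough Python sketch that assumes the log format shown in the two lines above; the function names are made up for illustration and are not part of any Sophos tooling.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed format, taken from the log lines quoted above:
# 2019:03:06-15:15:46 fw01-2 red2ctl[12420]: Missing keepalive from reds3:0, ...
KEEPALIVE = re.compile(
    r"^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) \S+ red2ctl\[\d+\]: "
    r"Missing keepalive from (?P<iface>reds\d+):\d+"
)


def missed_keepalives(lines):
    """Yield (timestamp, interface) for each 'Missing keepalive' line."""
    for line in lines:
        m = KEEPALIVE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y:%m:%d-%H:%M:%S")
            yield ts, m.group("iface")


def drops_per_hour(lines):
    """Count missed keepalives per (interface, hour of day)."""
    return Counter((iface, ts.hour) for ts, iface in missed_keepalives(lines))


# The two lines quoted above, used as sample input.
sample = [
    "2019:03:06-15:15:38 fw01-2 red_server[17509]: SELF: Cannot do SSL "
    "handshake on socket accept from 'xxx.xxx.xxx.xxx': SSL connect accept "
    "failed because of handshake problems",
    "2019:03:06-15:15:46 fw01-2 red2ctl[12420]: Missing keepalive from "
    "reds3:0, disabling peer xxx.xxx.xxx.xxx",
]
print(drops_per_hour(sample))
```

Feeding it a whole day of log lines instead of `sample` shows at a glance whether one interface or one time window stands out.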

One customer has 2 x RED 50: one is 100% stable and the other drops at random intervals. We replaced the unstable one with a new RED 50, but the same thing occurs.

  • In reply to Peter Riederer:

    I would strongly suggest that you start debugging your UTM connection.

    Start by dumping the traffic on ports 3410 and 3400 to a file (ring buffer) and extracting it.

    Then analyse the time frame in which the connection drops and does not come back.

     

    My guess is that this is, as mentioned before, not related to the general RED issue in UTM 9.6.

    https://community.sophos.com/kb/en-us/134398

    Maybe you should open another thread and post your output there.

  • In reply to LuCar Toni:

    I can also confirm that we were able to solve the overflow issues on a RED15W by disabling the unified firmware. The customer had recurring interruptions. This really has nothing to do with ISPs.... It's the really bad quality of your software, which isn't tested sufficiently and causes so many headaches for so many customers... You should really think about rolling out security and feature updates separately, so that customers could still satisfy their security needs while installing features later...

  • In reply to seroal:

    And what we get now after disabling the unified firmware is this message:

     

    [INFO-184] RED server not running - restarted

     

    It appears every hour...

  • In reply to seroal:

    Plan a reboot; that fixed the issue here....

    If you run HA, reboot both nodes.

  • In reply to twister5800:

    Thank you, we will try that!

  • In reply to seroal:

    Another 10 days later, the first location with a RED15 is offline again:

    2019:09:15-21:56:48 vpn red_server[5533]: RED15_LOC1: No ping for 30 seconds, exiting.
    2019:09:15-21:56:48 vpn red_server[5533]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="RED15_LOC1" forced="0"
    2019:09:15-21:56:48 vpn red_server[5533]: RED15_LOC1 is disconnected.
    2019:09:15-21:56:48 vpn red_server[4919]: SELF: (Re-)loading device configurations
    2019:09:15-21:56:49 vpn red2ctl[4930]: Overflow happened on reds1:0
    2019:09:15-21:56:49 vpn red2ctl[4930]: Missing keepalive from reds1:0, disabling peer x.x.x.x

    And still no update from Sophos about a fix!

    We are disabling the unified firmware tonight and rebooting the UTM.

  • In reply to Peter Riederer:

    Did you open a Support Case? 

  • In reply to Peter Riederer:

    Thanks. Please DM me the case number once it is created.

  • In reply to LuCar Toni:

    We finally disabled the unified firmware successfully on Monday and opened a case via our partner.

    The first response was to exchange the RED device, but I told the partner that all three devices are affected, so this cannot be a device issue. Now we are waiting for a reply.

  • In reply to William Fraley:

    William Fraley

    My problem is resolved. There is a known issue related to unified firmware.

    From a root shell (su -), run:

    cc get red use_unified_firmware

    If the value returned is 1, run:

    cc set red use_unified_firmware 0

    The REDs will update and reboot.

    Confirm the value is 0 by rerunning the get command above.

     

    NOT A PERMANENT FIX. The issue needs to be addressed in Sophos UTM firmware permanently.

     


    Does anybody (including Sophos staff) know whether this will work with UTM 9.7?

    Best regards 

    Alex 

  • In reply to Alexander Busch:

    Hi  

    This specific issue regarding RED 50 devices was resolved in UTM v9.605 (https://community.sophos.com/kb/en-us/134398).

    Regards,

  • In reply to FloSupport:

    Correct, but there are other issues with RED15s, which lose their connection after some time. That's the reason I am asking.
     
    P.S. I just want to have an appropriate tool at hand after the update, because a downgrade is not easy.
     
    Best regards
    Alex
  • In reply to Alexander Busch:

    Hi  

    I followed up with the team, and the new RED unified firmware handles routing differently. As a result, customers who previously had working configurations on the legacy firmware (with the RED WAN IP overlapping with a listed split network) will experience issues on the new unified firmware.

    Could you please confirm whether you are using the RED in a split mode configuration and, if so, check that your RED WAN IP does not overlap with a listed split network subnet?
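
The overlap check asked for here can be done by hand, but with several REDs a small script may be quicker. Below is a minimal Python sketch using only the standard `ipaddress` module; the function name and the addresses are hypothetical examples, not values from any real configuration.

```python
import ipaddress


def overlapping_split_networks(red_wan_ip, split_networks):
    """Return the listed split networks that contain the RED's WAN IP."""
    ip = ipaddress.ip_address(red_wan_ip)
    return [
        net
        for net in split_networks
        # strict=False accepts entries written with host bits set,
        # e.g. "192.168.1.1/24"
        if ip in ipaddress.ip_network(net, strict=False)
    ]


# Hypothetical example: the RED's WAN interface sits in 192.168.1.0/24,
# which is also listed as a split network.
print(overlapping_split_networks("192.168.1.20", ["10.0.0.0/8", "192.168.1.0/24"]))
```

An empty list means the WAN IP falls outside every listed split network; any entry that comes back is a candidate for the routing conflict described above.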

    Thanks,

  • In reply to FloSupport:

    This is not the case, because we have RED 15s running in standard/unified mode that exhibit the same behavior. Just today I had a disconnect event in all six of our remote offices running RED 15s that have been updated to 9.605-1. It took anywhere from 20 to 45 minutes for each of them to come back online, and they all dropped at random times throughout the day between 10:15 a.m. and 2:50 p.m.

    But I do think you've hit the nail on the head when you said it handles routing differently. Maybe it's time to go back to the old way of routing until you get the bugs resolved in the new way.