UTM 9.601 - RED issues!

Since we upgraded all our customers to 9.601, a large share of them have been complaining that their REDs disconnect and reconnect at random, with no discernible pattern.

For all of them it started the very night we upgraded to 9.601, even though they are on different ISPs and located in different parts of the country.

I spent two hours with Sophos Support today, and they have now escalated the case.

I will post an update when I have one.

Suspicious entries in the log (although all connected REDs log these before connecting):

2019:03:06-15:15:38 fw01-2 red_server[17509]: SELF: Cannot do SSL handshake on socket accept from 'xxx.xxx.xxx.xxx': SSL connect accept failed because of handshake problems

2019:03:06-15:15:46 fw01-2 red2ctl[12420]: Missing keepalive from reds3:0, disabling peer xxx.xxx.xxx.xxx

I know the last line is logged right before the tunnel disconnects, because there was no PING/PONG answer.

One customer has two RED 50s: one is 100% stable, while the other drops at random intervals. We replaced the unstable unit with a new RED 50, but the same thing happens.
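To check whether another site shows the same pattern, the two suspicious message types can be tallied per peer address from an exported log. This is a minimal sketch, assuming the log has been saved as plain text in the `timestamp host process[pid]: message` format quoted above; the function name `summarize` and the regular expressions are hypothetical helpers, not a Sophos tool.

```python
import re
from collections import Counter

# Hypothetical patterns, modeled only on the red_server / red2ctl lines quoted in this thread.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) \S+ (?P<proc>[\w-]+)\[\d+\]: (?P<msg>.*)$"
)
PATTERNS = {
    # red2ctl: "Missing keepalive from reds3:0, disabling peer <ip>"
    "missing_keepalive": re.compile(r"Missing keepalive from [^,]+, disabling peer (?P<peer>\S+)"),
    # red2ctl: "Received keepalive from reds2:0, enabling peer <ip>"
    "received_keepalive": re.compile(r"Received keepalive from [^,]+, enabling peer (?P<peer>\S+)"),
    # red_server: "Cannot do SSL handshake on socket accept from '<ip>': ..."
    "ssl_failures": re.compile(r"Cannot do SSL handshake on socket accept from '(?P<peer>[^']+)'"),
}


def summarize(lines):
    """Count each suspicious message type per peer address (a sketch, not an official parser)."""
    counts = {key: Counter() for key in PATTERNS}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank lines and anything not in the assumed format
        msg = match.group("msg")
        for key, pattern in PATTERNS.items():
            hit = pattern.search(msg)
            if hit:
                counts[key][hit.group("peer")] += 1
    return counts


SAMPLE = [
    "2019:03:06-15:15:38 fw01-2 red_server[17509]: SELF: Cannot do SSL handshake on socket accept "
    "from 'xxx.xxx.xxx.xxx': SSL connect accept failed because of handshake problems",
    "2019:03:06-15:15:46 fw01-2 red2ctl[12420]: Missing keepalive from reds3:0, disabling peer xxx.xxx.xxx.xxx",
]

if __name__ == "__main__":
    for kind, per_peer in summarize(SAMPLE).items():
        print(kind, dict(per_peer))
```

A peer whose "missing keepalive" count keeps climbing alongside SSL handshake failures is flapping in the way described above; a stable RED should show few or none of either.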

  • The workaround does work, until it doesn't. Seven out of ten were able to stabilize with the workaround, three had to be RMA'd.

    I mention this lest anyone get upset if the workaround doesn't work for them; if it does work for you, it's a win. Thanks for everyone's input to the forum, it makes life less stressful in most cases.

  • This also worked for me.

  • Today the tunnel of one RED was down again (with use_unified_firmware = 0). It was the same one that had been problematic after updating to 9.601-5.

    Disabling and re-enabling the RED did not fix the issue. After switching back to the unified firmware, all REDs are up again (use_unified_firmware = 1).

    I think it has to be something different, as the issue still appears after one day (with or without the unified firmware).

    However, it is related to version 9.601, as I did not have any issues before the update.

  • Hallo somi and welcome to the UTM Community!

    Based on what others have said above, I would push Sophos Support to RMA the failing RED.

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • Hey Bob,

    Thanks, we already replaced that RED with a brand-new RED 15, with the same result.

    Regards,

    Michael

  • ...It seems to be a timing issue. When I disable the RED on the UTM for 5 minutes, it works fine shortly after I re-enable it.

    No 'stabilizing peers' or similar messages in the log, just 'tunnel up' -> PING PONG, PING PONG.

    The tunnel is then stable until the DSL line does its 24-hour reconnect, which brings it out of 'sync'.

    From then on, until I disable and re-enable the RED, I see: boot -> stabilizing peers -> 5 ICMP packets go through the tunnel (yay!) -> unstable peers -> reboot (oh no!)
    ... over and over again.

  • Hi all,

     

    Sorry for the delay, I've been on vacation for a week; my nerves could not stand it anymore ;-)

    I also did the "cc set red use_unified_firmware 0" before I left, and can confirm it solved ALL MY ISSUES.

    One customer had two RED 50s: one was very unstable and the other was completely offline, so we set up temporary SG 115s with IPsec just to keep the customer running.

    After I have disabled the new unified firmware, both RED 50's are back and 100% stable!

    Sophos Support claims that there are no issues with this, but please keep referring to this community thread so they can see that there actually are problems.

    I enabled RED debugging with Support and inserted a USB key for debug logging into the RED 50, but nothing important showed up.

    We have the unified firmware enabled for several other customers who have no issues with it, so it's odd. I suspect some TTL or IPS issue related to the different ISPs.

    EDIT:

    Some other issues have been found, and it seems Sophos is looking into them:

    community.sophos.com/.../9-601-5-update-killed-red-50-home-site-dns-resolution

  • We had the same issues and the same errors with our RED 50s after updating to 9.601. We have two: one worked fine, while the other randomly disconnected our users. Tech Support rolled back the firmware on both REDs, and so far so good.

  • Just a quick update from support, which I got back today:

    This issue is going to be resolved in 9.602 which is the upcoming firmware. But I do not have any ETA for that yet. You may follow our community thread: community.sophos.com/.../utm-blog for the release notes and update information.

  • Reporting the same issues here. We have two RED50s in our setup: one in the same US state as the main UTM but on a different ISP, and one in Hyderabad, India.

    Both were working fine after upgrade to 9.601 on 2019-03-10.

    Then, as of 2019-03-30, the USA RED50 dropped offline. Troubleshooting the issue showed all of the same symptoms as reported here.

    Log:

    2019:03:30-19:51:23 astaro red_server[11907]: A34XXXXXXXXXF19: command '{"data":{"key1":"GwQ1b43erVLU4RzintYPM\/T1lczqkZWvfag7yHSQBVo=","key_active":0},"type":"SET_KEY_REQ"}'
    2019:03:30-19:51:23 astaro red_server[11907]: A34XXXXXXXXXF19: Sending json message {"data":{},"type":"SET_KEY_REP"}
    2019:03:30-19:51:24 astaro red_server[11907]: A34XXXXXXXXXF19: No ping for 30 seconds, exiting.
    2019:03:30-19:51:24 astaro red_server[11907]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A34XXXXXXXXXF19" forced="0"
    2019:03:30-19:51:24 astaro red_server[11907]: A34XXXXXXXXXF19 is disconnected.
    2019:03:30-19:51:24 astaro red_server[4803]: SELF: (Re-)loading device configurations
    2019:03:30-19:51:25 astaro red2ctl[4813]: Overflow happened on reds2:0
    2019:03:30-19:51:25 astaro red2ctl[4813]: Missing keepalive from reds2:0, disabling peer 24.x.y.63
    2019:03:30-19:51:28 astaro red2ctl[4813]: Received keepalive from reds2:0, enabling peer 24.x.y.63
    2019:03:30-19:51:40 astaro red2ctl[4813]: Missing keepalive from reds2:0, disabling peer 24.x.y.63

    2019:03:30-19:52:25 astaro red_server[12523]: SELF: Cannot do SSL handshake on socket accept from '24.x.y.63': SSL connect accept failed because of handshake problems

    2019:03:30-19:53:06 astaro red_server[12528]: SELF: Cannot do SSL handshake on socket accept from '24.x.y.63': SSL wants a read first

     

    Further on:

    2019:03:30-19:58:52 astaro red_server[13844]: SELF: New connection from 24.a.b.162 with ID A34XXXXXXXXXF19 (cipher RC4-SHA), rev1
    2019:03:30-19:58:52 astaro red_server[13844]: A34XXXXXXXXXF19: connected OK, pushing config
    2019:03:30-19:59:23 astaro red_server[13844]: A34XXXXXXXXXF19: No ping for 30 seconds, exiting.
    2019:03:30-19:59:23 astaro red_server[13844]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A34XXXXXXXXXF19" forced="0"
    2019:03:30-19:59:23 astaro red_server[13844]: A34XXXXXXXXXF19 is disconnected.

    It then went silent.

    The LCD screen shows the unit going through a procedure similar to the one reported above by Twister5800:

    Network Setup
    ID A34xxxxxxxxxxxx
    Network Setup
    Try wan1
    Try wan2
    Firmware update 1/6 downloading
    Try Prov. Server
    Network Setup
    Try wan1
    Try wan2
    Firmware update 1/6 downloading
    Try Prov. Server
    Shutting down…

     

    We replaced the problematic RED50 with a backup RED10, and it has been stable thus far. I'm not sure whether the RED10 uses the unified firmware. I checked our config from the console, and it does show "red use_unified_firmware=1".

    We have no issue so far with the India RED50 unit. Keeping our fingers crossed and waiting for the new firmware.
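The 2019-03-30 excerpt above shows a RED connecting ("connected OK, pushing config") and then being dropped about 30 seconds later ("No ping for 30 seconds, exiting"). A short script can flag such short-lived sessions in a saved red_server log. This is a minimal sketch, assuming the log format quoted in this thread; `short_sessions` and the 60-second `threshold_s` default are hypothetical choices, not Sophos settings.

```python
import re
from datetime import datetime

# Timestamp format as it appears in the quoted logs, e.g. "2019:03:30-19:58:52".
TS_FORMAT = "%Y:%m:%d-%H:%M:%S"
# Hypothetical patterns, modeled only on the red_server lines quoted in this thread.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) \S+ red_server\[\d+\]: (?P<msg>.*)$"
)
UP_RE = re.compile(r"^(?P<red>\w+): connected OK")
DOWN_RE = re.compile(r"^(?P<red>\w+) is disconnected\.")


def short_sessions(lines, threshold_s=60):
    """Return (red_id, connected_at, seconds_up) for sessions shorter than threshold_s."""
    connected = {}  # RED ID -> time of the last "connected OK"
    flagged = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # ignore red2ctl lines, blank lines, etc.
        ts = datetime.strptime(match.group("ts"), TS_FORMAT)
        msg = match.group("msg")
        up = UP_RE.match(msg)
        if up:
            connected[up.group("red")] = ts
            continue
        down = DOWN_RE.match(msg)
        if down and down.group("red") in connected:
            start = connected.pop(down.group("red"))
            seconds_up = int((ts - start).total_seconds())
            if seconds_up < threshold_s:
                flagged.append((down.group("red"), start, seconds_up))
    return flagged


SAMPLE = [
    "2019:03:30-19:58:52 astaro red_server[13844]: A34XXXXXXXXXF19: connected OK, pushing config",
    "2019:03:30-19:59:23 astaro red_server[13844]: A34XXXXXXXXXF19: No ping for 30 seconds, exiting.",
    "2019:03:30-19:59:23 astaro red_server[13844]: A34XXXXXXXXXF19 is disconnected.",
]

if __name__ == "__main__":
    for red_id, start, seconds_up in short_sessions(SAMPLE):
        print(f"{red_id}: up at {start}, dropped after {seconds_up}s")
```

On the sample lines this reports a single 31-second session, matching the connect/drop cycle described in the post. Running it over a longer log shows how often a given RED goes through that cycle.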