
UTM 9.601 - RED issues!

Since we upgraded all our customers to 9.601, a large share of them have been complaining about REDs disconnecting and reconnecting with no discernible pattern.

It started for all of them the very night we upgraded to 9.601, and they are all on different ISPs and located in different places around the country.

I spent 2 hours with Sophos Support today, and they have now escalated the case.

Will return with an update....

Suspicious entries in the log, although all connected REDs log these before connecting:

2019:03:06-15:15:38 fw01-2 red_server[17509]: SELF: Cannot do SSL handshake on socket accept from 'xxx.xxx.xxx.xxx': SSL connect accept failed because of handshake problems

2019:03:06-15:15:46 fw01-2 red2ctl[12420]: Missing keepalive from reds3:0, disabling peer xxx.xxx.xxx.xxx

I know the last line is written before the tunnel disconnects, because there was no "PING/PONG" answer...
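For anyone unfamiliar with the keepalive idea, here is a minimal sketch of the kind of timeout check that would produce a "Missing keepalive" message. This is not Sophos's implementation; the class and method names are hypothetical, and the 30-second default is only an assumption taken from the "No ping for 30 seconds, exiting." message quoted elsewhere in this thread.

```python
class KeepaliveWatchdog:
    """Hypothetical helper: remember the last keepalive and flag a silent peer."""

    def __init__(self, timeout_s=30.0):
        # Assumed timeout, based on "No ping for 30 seconds, exiting."
        self.timeout_s = timeout_s
        self.last_seen = None

    def on_ping(self, now):
        """Record the time (in seconds) at which a PING/PONG was seen."""
        self.last_seen = now

    def is_expired(self, now):
        """Return True once the peer has been silent for timeout_s or longer."""
        if self.last_seen is None:
            return False  # nothing received yet, so nothing to time out
        return now - self.last_seen >= self.timeout_s


watchdog = KeepaliveWatchdog()
watchdog.on_ping(now=100.0)
print(watchdog.is_expired(now=110.0))  # peer answered 10 s ago
print(watchdog.is_expired(now=135.0))  # peer silent for 35 s
```

The point is simply that the "disabling peer" line is a consequence of the missing answer, not its cause.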

One customer has 2 x RED 50: one is 100% stable and the other drops at random intervals. We replaced the unstable one with a new RED 50, but the same thing occurs.



This thread was automatically locked due to age.
  • Another spontaneous "overflow". Luckily after hours...

    2019:09:05-16:52:16 neo-2 red_server[6026]: A35xxxxxxxxxxxx: Sending json message {"data":{"seq":39230},"type":"PONG"}
    2019:09:05-16:52:47 neo-2 red_server[6026]: A35xxxxxxxxxxxx: No ping for 30 seconds, exiting.
    2019:09:05-16:52:47 neo-2 red_server[6026]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A35xxxxxxxxxxxx" forced="0"
    2019:09:05-16:52:47 neo-2 red_server[6026]: A35xxxxxxxxxxxx is disconnected.
    2019:09:05-16:52:47 neo-2 red_server[21506]: SELF: (Re-)loading device configurations
    2019:09:05-16:52:47 neo-2 red2ctl[21514]: Overflow happened on reds2:0
    2019:09:05-16:52:47 neo-2 red2ctl[21514]: Missing keepalive from reds2:0, disabling peer 195.xxxxxxxx
    2019:09:05-16:52:53 neo-2 red2ctl[21514]: Received keepalive from reds2:0, enabling peer 195.xxxxxxxx
    2019:09:05-16:56:23 neo-2 red2ctl[21514]: Missing keepalive from reds2:0, disabling peer 195.xxxxxxxx
    2019:09:05-17:04:25 neo-2 red_server[21506]: SELF: (Re-)loading device configurations
    2019:09:05-17:08:19 neo-2 red_server[2455]: SELF: Cannot do SSL handshake on socket accept from '195.xxxxxxxx': SSL connect accept failed because of handshake problems

     

    We'll see if and when it comes back up...
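To put numbers on outages like this one, the red2ctl lines can be paired up with a small script. The following is a rough sketch that assumes the log format is exactly what is pasted above; the function names are made up for illustration and nothing here is a Sophos tool.

```python
import re
from datetime import datetime

# Assumed line layout: "<YYYY:MM:DD-HH:MM:SS> <host> <process>[<pid>]: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<proc>\w+)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)
PEER_RE = re.compile(
    r"(?:Missing|Received) keepalive from (?P<iface>\S+), "
    r"(?P<action>disabling|enabling) peer"
)


def parse_line(line):
    """Return (timestamp, process, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y:%m:%d-%H:%M:%S")
    return ts, m.group("proc"), m.group("msg")


def peer_outages(lines):
    """Pair 'disabling peer' with the next 'enabling peer' per interface.

    Returns (iface, down_at, up_at) tuples; up_at is None if still down.
    """
    down_since = {}
    outages = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None or parsed[1] != "red2ctl":
            continue
        ts, _proc, msg = parsed
        m = PEER_RE.search(msg)
        if m is None:
            continue
        iface = m.group("iface")
        if m.group("action") == "disabling":
            down_since.setdefault(iface, ts)  # keep the first "down" time
        elif iface in down_since:
            outages.append((iface, down_since.pop(iface), ts))
    outages.extend((iface, ts, None) for iface, ts in down_since.items())
    return outages


# Lines copied from the log excerpt above.
SAMPLE = """\
2019:09:05-16:52:47 neo-2 red2ctl[21514]: Overflow happened on reds2:0
2019:09:05-16:52:47 neo-2 red2ctl[21514]: Missing keepalive from reds2:0, disabling peer 195.xxxxxxxx
2019:09:05-16:52:53 neo-2 red2ctl[21514]: Received keepalive from reds2:0, enabling peer 195.xxxxxxxx
2019:09:05-16:56:23 neo-2 red2ctl[21514]: Missing keepalive from reds2:0, disabling peer 195.xxxxxxxx
""".splitlines()

for iface, down_at, up_at in peer_outages(SAMPLE):
    seconds = (up_at - down_at).total_seconds() if up_at else None
    print(iface, down_at, up_at, seconds)
```

Run against a full day of the RED log, this would give a list of drop and recovery times to hand to support.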

  • I assume this is not related to the firmware, because overflows and similar messages have been part of the RED protocol for many years and are most likely related to ISP issues.

    The RED protocol relies heavily on bursts of UDP data,
    and some ISPs do not handle that well in certain scenarios.

    It looks like it could actually crash when the RED fires off some data.

    If the SG/XG receives too little UDP data for a short time and then gets all of it at once, such overflows happen. Maybe you should check with the ISP (restart the router, ask the ISP to reset the line, etc.).

    This matters especially if you have only one RED connected. Try to get another RED and check whether this also happens with another RED or at another location.

    If yes, it could be related to the ISP on the SG side; if not, it could be related to the ISP on the RED side.

     

    __________________________________________________________________________________________________________________

  • Thanks for the input. I should tell you that we've had these problems for months now and spent hours on the phone with Sophos "Premium" Support, who ended up RMAing the unit. The replacement is what is online now, and we are seeing the same issues. 

    We have of course also talked to our ISP (Deutsche Telekom at both locations, but separate circuits). They cannot detect any drops in service or sudden data bursts resulting in disconnects, which makes sense: we have had the same circuits since 2016. The RED worked fine for 3 years; as for many others here, the random disconnects started with 9.601-5.

    I very much doubt that this is an ISP-related issue. Even if the original disconnect were somehow related to drops on the part of the ISP, it would not explain the SSL handshake failure, or why the configuration fails to load, or how the RED manages to come back online at some point...

  • Disabled the unit and re-enabled it after 10 minutes; it is now back online. The outage lasted just over an hour...

  • Totally agree that this can't be an ISP issue. We replaced all 3 old RED devices with brand-new RED 15s and the problem is still there, and we are using M-NET as our provider.

    Last evening a RED 15 stopped working after being up for 10 days. This was the last RED 15 with MTU 1500. Now trying the good old workarounds to get the location online again.

     

    2019:09:05-19:00:12 vpn red_server[11278]: RED15: No ping for 30 seconds, exiting.
    2019:09:05-19:00:12 vpn red_server[11278]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="RED15" forced="0"
    2019:09:05-19:00:12 vpn red_server[11278]: RED15 is disconnected.
    2019:09:05-19:00:12 vpn red_server[4919]: SELF: (Re-)loading device configurations
    2019:09:05-19:00:14 vpn red2ctl[4930]: Overflow happened on reds2:0
    2019:09:05-19:00:14 vpn red2ctl[4930]: Missing keepalive from reds2:0, disabling peer x.x.x.x
    2019:09:05-19:00:17 vpn red2ctl[4930]: Received keepalive from reds2:0, enabling peer x.x.x.x

     

    2019:09:05-19:13:52 vpn red_server[6476]: SELF: Cannot do SSL handshake on socket accept from 'x.x.x.x': SSL connect accept failed because of handshake problems SSL wants a read first
    2019:09:05-19:13:52 vpn red_server[6477]: SELF: Cannot do SSL handshake on socket accept from 'x.x.x.x': SSL connect accept failed because of handshake problems SSL wants a read first
    2019:09:05-19:37:55 vpn red_server[12483]: SELF: Cannot do SSL handshake on socket accept from 'x.x.x.x': SSL wants a read first

     

    And there is another strange thing: we get connections from an IP address we don't know:


    2019:09:05-22:50:12 vpn red_server[30085]: SELF: Cannot do SSL handshake on socket accept from '193.188.22.56': SSL accept attempt failed with unknown error error:140760FC:SSL routines:SSL23_GET_CLIENT_HELLO:unknown protocol

    I already did a whois lookup, and the address belongs to well-web.net, registered to a private person?!
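A quick way to spot such strangers is to count the handshake failures per source address and compare them against the addresses of your own REDs. This is a small sketch under the assumption that the message text always looks like the lines above; the function names are made up, and the redacted `x.x.x.x` stands in for the known RED address.

```python
import re
from collections import Counter

# Matches the source address in the red_server handshake error quoted above.
HANDSHAKE_RE = re.compile(
    r"Cannot do SSL handshake on socket accept from '(?P<peer>[^']+)'"
)


def handshake_failures_by_peer(lines):
    """Count SSL handshake failures per source address."""
    counts = Counter()
    for line in lines:
        m = HANDSHAKE_RE.search(line)
        if m:
            counts[m.group("peer")] += 1
    return counts


def unknown_peers(counts, known_peers):
    """Return the failing addresses that are not in the set of known REDs."""
    return sorted(peer for peer in counts if peer not in known_peers)


# Lines copied from the log excerpts in this post.
SAMPLE = [
    "2019:09:05-19:13:52 vpn red_server[6476]: SELF: Cannot do SSL handshake on socket accept from 'x.x.x.x': SSL connect accept failed because of handshake problems SSL wants a read first",
    "2019:09:05-19:13:52 vpn red_server[6477]: SELF: Cannot do SSL handshake on socket accept from 'x.x.x.x': SSL connect accept failed because of handshake problems SSL wants a read first",
    "2019:09:05-19:37:55 vpn red_server[12483]: SELF: Cannot do SSL handshake on socket accept from 'x.x.x.x': SSL wants a read first",
    "2019:09:05-22:50:12 vpn red_server[30085]: SELF: Cannot do SSL handshake on socket accept from '193.188.22.56': SSL accept attempt failed with unknown error error:140760FC:SSL routines:SSL23_GET_CLIENT_HELLO:unknown protocol",
]

counts = handshake_failures_by_peer(SAMPLE)
print(counts)
print(unknown_peers(counts, known_peers={"x.x.x.x"}))
```

An address that only ever shows up with "unknown protocol" errors is probably something other than a RED talking to the port, for example a scanner, but that is a guess and not something the log proves.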

  • Peter Riederer said:
    Totally agree that this can't be an ISP issue. We replaced all 3 old RED devices with brand-new RED 15s and the problem is still there, and we are using M-NET as our provider.

    ...

    Agreed. I have a separate IPsec tunnel to the DSL routers that sit in front of some of our REDs (for management reasons), and that connection stayed up while the RED connection was down. So the ISP theory seems very unlikely to me.

    Best regards

    Alex


  • The location has been offline for 17 hours now!!

    I deleted and re-added the RED 15 at 10 AM, and now, after 3 hours of waiting, it is online again. What the f....!!

    I really hope a fix is coming very, very soon. Otherwise we will throw out all our Sophos products in the next lifecycle and replace the REDs with third-party products as well.

    And I don't understand why Sophos is not telling customers what is going on here and when the issues will finally be fixed!

     

    Does anybody have experience establishing an IPsec tunnel from a FritzBox to a Sophos UTM?

  • The game goes on...

    The next location has been offline since 14:26, after 9 days of uptime! Modem and Internet are up and running; only the RED is not reconnecting!

     

  • What happens if you switch back to the old firmware?

    __________________________________________________________________________________________________________________

  • You mean this: cc set red use_unified_firmware 0 ?

    I thought that is only for RED 50 devices, isn't it?

  • Do you mean going back to 9.510, Toni?  Won't that old backup have old, incorrect unlock codes requiring deleting and re-installing the REDs?

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • Hi  and  

    Would it be possible to raise a support case (if you haven't raised one already) and PM me your case IDs?

    It seems that both of your issues are related to RED 15s and UTM v9.605.

    I would like to follow up so that further investigation can be performed.

    Regards,


    Florentino
    Director, Global Community & Digital Support
