This discussion has been locked.

UTM 9.601 - RED issues!

Since upgrading all our customers to 9.601, a large share of them have been complaining about REDs disconnecting and reconnecting with no apparent pattern.

It started for all of them on the very night we upgraded to 9.601, and they are all on different ISPs and located in different places around the country.

I spent 2 hours with Sophos support today, and they have now escalated the case.

Will return with an update....

Suspicious entries in the log - but all connected REDs do this before connection:

2019:03:06-15:15:38 fw01-2 red_server[17509]: SELF: Cannot do SSL handshake on socket accept from 'xxx.xxx.xxx.xxx': SSL connect accept failed because of handshake problems

2019:03:06-15:15:46 fw01-2 red2ctl[12420]: Missing keepalive from reds3:0, disabling peer xxx.xxx.xxx.xxx

I know the last line is written before the tunnel disconnects, because there was no "PING/PONG" answer...
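
To see how often these two messages occur, the log can be scanned and the events counted. The following is only a minimal sketch: the line pattern and the two search strings are taken from the log lines quoted above, while the names `LINE_RE`, `EVENTS`, `classify` and `count_events` are made up for this illustration.

```python
import re
from collections import Counter

# One log line: timestamp, host, process[pid], message (layout as quoted above).
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<proc>\w+)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)

# Substrings copied from the log lines in this post; the keys are hypothetical labels.
EVENTS = {
    "handshake_failed": "Cannot do SSL handshake",
    "keepalive_missing": "Missing keepalive from",
}


def classify(line):
    """Return (timestamp, event_name) for a known event, otherwise None."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    for name, needle in EVENTS.items():
        if needle in m.group("msg"):
            return m.group("ts"), name
    return None


def count_events(lines):
    """Count known events across an iterable of log lines."""
    hits = (classify(line) for line in lines)
    return Counter(hit[1] for hit in hits if hit)


sample = [
    "2019:03:06-15:15:38 fw01-2 red_server[17509]: SELF: Cannot do SSL handshake "
    "on socket accept from 'xxx.xxx.xxx.xxx': SSL connect accept failed because of handshake problems",
    "2019:03:06-15:15:46 fw01-2 red2ctl[12420]: Missing keepalive from reds3:0, "
    "disabling peer xxx.xxx.xxx.xxx",
]
print(count_events(sample))
```

Comparing these counts between a stable and an unstable RED might show whether the handshake failures are normal background noise, as suspected above, or tied to the drops.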

One customer has 2 x RED 50: one is 100% stable and the other fluctuates at random intervals. We replaced the unstable one with a new RED 50, but the same thing occurs.



  • Same issues here after the 9.601-5 UTM update. 2x RED50 Rev 1. They drop on multiple ISPs at varying intervals and for varying lengths of time. I was advised to re-create the RED in UTM. I have done this, but the problems still persist. I was sent two replacement RED50s. The first one has been replaced and a new config created, but the problem persists. The ISPs' modems have been replaced, although the ISPs were reluctant to do so. One of the REDs won't recognize the presence of the ISP on WAN1 at all.

    We are losing a lot of productivity and business. We do a sizeable portion of our business via teleconferencing.

    Support ticket numbers:

    8710435

    8707203

    8707207

     

    The tech alluded to a potential issue with REDs after the update to 9.601-5.

  • My problem is resolved. There is a known issue related to unified firmware.

    From a root shell (su -), check the current setting:

    cc get red use_unified_firmware

    If the value returned is 1, disable unified firmware:

    cc set red use_unified_firmware 0

    The REDs will update and reboot.

    Confirm the value is now 0 by rerunning the get command above.

     

    NOT A PERMANENT FIX. The issue needs to be addressed in Sophos UTM firmware permanently.
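
The check-then-set sequence above could be wrapped in a small script so that the value is only changed when it is actually 1. This is a rough sketch under assumptions not confirmed in this thread: that `cc get red use_unified_firmware` prints just the value (for example `1`), and that Python 3 is available where the script runs. The function names `run_cc` and `disable_unified_firmware` are hypothetical; only the two `cc` commands come from the post.

```python
import subprocess


def run_cc(*args):
    """Run the UTM `cc` tool with the given arguments and return its stdout."""
    result = subprocess.run(
        ["cc", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def disable_unified_firmware(run=run_cc):
    """Apply the workaround only if unified firmware is currently enabled.

    Returns True if the setting was changed, False if it was not 1 to begin with.
    ASSUMPTION: `cc get` prints the bare value, possibly with surrounding whitespace.
    """
    current = run("get", "red", "use_unified_firmware").strip()
    if current != "1":
        return False
    run("set", "red", "use_unified_firmware", "0")
    # Re-read the value to confirm the change, as the post recommends.
    confirmed = run("get", "red", "use_unified_firmware").strip()
    if confirmed != "0":
        raise RuntimeError(f"setting did not change, still {confirmed!r}")
    return True
```

Passing the command runner in as a parameter keeps the logic testable without touching a live UTM. Expect the REDs to update and reboot after the change, as noted above.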

  • Hi Bob,

    I actually did not receive a PM from you, but anyway: the first B is 9.605. In this scenario, given that the REDs are not running the unified firmware prior to the update and are not connected during the update, they will not receive a faulty unified firmware, only the fixed unified firmware of 9.605, so they will not run into the problem. Setting the unified firmware to 0 is therefore not necessary in this case.

    The REDs are disabled to prevent them from receiving a faulty firmware during the update process; once on 9.605, that is no longer a problem.

    Jan

  • Sorry, Jan, I don't see what I'm not understanding, but I can't reconcile your last post with:


    I just read your response to my PM, and my confusion remains.

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • William Fraley said:

    My problem is resolved. There is a known issue related to unified firmware.

    from su -

    cc get red use_unified_firmware

    if value returned = 1

    cc set red use_unified_firmware 0

    reds will update and reboot

    confirm value is 0 rerunning get command above

     

    NOT A PERMANENT FIX. The issue needs to be addressed in Sophos UTM firmware permanently.

     


    Anybody (including Sophos Staff) know if this will work with UTM 9.7?

    Best regards 

    Alex 

    -

  • Hi  

    This specific issue regarding RED 50 devices was resolved in UTM v9.605 (https://community.sophos.com/kb/en-us/134398).

    Regards,


    Florentino
    Director, Global Community & Digital Support

  • FloSupport said:

    Hi  

    This specific issue regarding RED 50 devices was resolved in UTM v9.605 (https://community.sophos.com/kb/en-us/134398).

    Regards,

    Correct, but there are other issues with RED 15 devices, which lose their connection after some time. That's the reason I am asking.
     
    P.S. Just to have an appropriate tool after the update, because a downgrade is not easy.
     
    Best regards
    Alex

    -

  • Hi  

    I followed up with the team, and the new RED unified firmware handles routing differently. As a result, customers who previously had working configurations on the legacy firmware (with the RED WAN IP overlapping with a listed split network) will experience issues on the new unified firmware.

    Could you please confirm whether you are using the RED in a split mode configuration and, if so, check that your RED WAN IP does not overlap with a listed split network subnet?

    Thanks,


    Florentino
    Director, Global Community & Digital Support
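
The overlap check described in this post can be sketched with Python's standard `ipaddress` module. The function name `overlapping_split_networks` and the addresses in the example are hypothetical; substitute the RED's actual WAN IP and the split networks listed in its configuration.

```python
import ipaddress


def overlapping_split_networks(wan_ip, split_networks):
    """Return the split networks (as given) that contain the RED's WAN IP."""
    wan = ipaddress.ip_address(wan_ip)
    return [
        net
        for net in split_networks
        # strict=False also accepts entries written with host bits set,
        # e.g. "192.168.1.1/24".
        if wan in ipaddress.ip_network(net, strict=False)
    ]


# Hypothetical values for illustration only.
print(overlapping_split_networks("192.168.1.20", ["10.0.0.0/8", "192.168.1.0/24"]))
# prints ['192.168.1.0/24']
```

An empty list means the WAN IP does not fall inside any listed split network, so this particular cause can probably be ruled out.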

  • This is not the case, because we have RED 15's running in standard/unified mode that exhibit this same behavior. I just had a disconnect event today in all six of our remote offices running RED 15's that have been updated to 9.605-1. It took anywhere from 20 to 45 minutes for each of them to come back online and they all did this at random times throughout the day between 10:15 a.m. and 2:50 p.m.

    But I do think you've hit the nail on the head when you said it handles routing differently. Maybe it's time to go back to the old way of routing until you get the bugs resolved in the new way.

  • Hi FloSupport,

    Thank you for following up on that. All of our REDs are deployed in standard mode, so at least that is/was not the problem. One page earlier in this thread, other people describe the sporadic disconnection problem too, so I'm not the only one affected by this.
    I'll try to find out elsewhere whether that workaround is available. Fortunately there are some people here with a lab; I can't test everything in our production environment. The fact is that since the release of the unified firmware for the RED, there have been problems with the REDs, and not all of them are resolved today.

    Best Regards

    Alex

    -

  • Did they get to the bottom of what caused this in the logs, as it was a cause of the disconnect?

    2019:09:03-09:46:32 sophos-2 red_server[4626]: SELF: RED10rev1 fw version set to 14
    2019:09:03-09:46:32 sophos-2 red_server[4626]: SELF: RED10rev2 local fw version set to 5214R2
    2019:09:03-09:46:32 sophos-2 red_server[4626]: SELF: RED10rev2 fw version set to 2005R2
    2019:09:03-09:46:32 sophos-2 red_server[4626]: SELF: RED15(w) fw version set to 1-424-7131d4e52-e9f0c31
    2019:09:03-09:46:32 sophos-2 red_server[4626]: SELF: RED50 fw version set to 1-424-7131d4e52-0000000
    2019:09:03-09:46:32 sophos-2 red_server[4626]: SELF: IO::Socket::SSL Version: 1.953
    2019:09:03-09:46:32 sophos-2 red_server[4626]: SELF: Startup - waiting 15 seconds ...
    2019:09:03-09:46:32 sophos-2 red2ctl[4635]: Starting REDv2 control daemon
    2019:09:03-09:46:47 sophos-2 red_server[7747]: UPLOAD: Uploader process starting
    2019:09:03-09:46:47 sophos-2 red_server[4626]: SELF: (Re-)loading device configurations
    2019:09:03-09:46:48 sophos-2 red_server[4626]: A3502xxxxxxxxxx: New device
    2019:09:03-09:46:48 sophos-2 red_server[4626]: A3502xxxxxxxxxx: Staging config for upload
    2019:09:03-09:46:48 sophos-2 red_server[4626]: A350XXXXXXXXXXX: New device
    2019:09:03-09:46:48 sophos-2 red_server[4626]: A350XXXXXXXXXXX: Staging config for upload
    2019:09:03-09:46:48 sophos-2 red_server[7747]: [A3502xxxxxxxxxx] Config has not changed, no need to upload to registry service
    2019:09:03-09:46:48 sophos-2 red_server[7747]: [A350XXXXXXXXXXX] Config has not changed, no need to upload to registry service

  • Below is a copy of the entries in my RED logs that happened yesterday, when my six remote offices went down randomly throughout the day. The log entries were the same for each RED 15 device, but I've replaced any IP or MAC identifiers with dashes for security purposes.

    2019:10:03-14:49:39 oscar red_server[12883]: xxxxxxxxxxxxxxxxx: No ping for 30 seconds, exiting.
    2019:10:03-14:49:39 oscar red_server[12883]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="xxxxxxxxxxxxxxx" forced="0"
    2019:10:03-14:49:39 oscar red_server[12883]: xxxxxxxxxxxxxxx is disconnected.
    2019:10:03-14:49:39 oscar red_server[4610]: SELF: (Re-)loading device configurations
    2019:10:03-14:49:41 oscar red2ctl[4629]: Overflow happened on reds2:0
    2019:10:03-14:49:41 oscar red2ctl[4629]: Missing keepalive from reds2:0, disabling peer 174.xxx.xxx.xxx
    2019:10:03-14:49:44 oscar red2ctl[4629]: Received keepalive from reds2:0, enabling peer 174.xxx.xxx.xxx
    2019:10:03-15:05:25 oscar red_server[4610]: SELF: (Re-)loading device configurations
    2019:10:03-15:13:36 oscar red_server[793]: Allow TLS 1.2 only
    2019:10:03-15:13:43 oscar red_server[793]: SELF: Cannot do SSL handshake on socket accept from '174.xxx.xxx.xxx': SSL connect accept failed because of handshake problems SSL wants a read first
    2019:10:03-15:17:53 oscar red2ctl[4629]: Missing keepalive from reds2:0, disabling peer 174.xxx.xxx.xxx
    2019:10:03-15:17:56 oscar red2ctl[4629]: Received keepalive from reds2:0, enabling peer 174.xxx.xxx.xxx
    2019:10:03-15:45:40 oscar red_server[22351]: Allow TLS 1.2 only
    2019:10:03-15:45:40 oscar red_server[22351]: SELF: Cannot do SSL handshake on socket accept from '174.xxx.xxx.xxx': SSL connect accept failed because of handshake problems
    2019:10:03-15:45:42 oscar red_server[22358]: Allow TLS 1.2 only
    2019:10:03-15:45:42 oscar red_server[22358]: SELF: New connection from 174.xxx.xxx.xxx with ID --------------- (cipher AES256-GCM-SHA384), rev1
    2019:10:03-15:45:42 oscar red_server[22358]: xxxxxxxxxxxxxxx: connected OK, pushing config
    2019:10:03-15:45:43 oscar red_server[4610]: SELF: (Re-)loading device configurations
    2019:10:03-15:45:43 oscar red_server[22358]: xxxxxxxxxxxxxxx: command '{"data":{"version":"0"},"type":"INIT_CONNECTION"}'
    2019:10:03-15:45:43 oscar red_server[22358]: xxxxxxxxxxxxxxx: Initializing connection running protocol version 0
    2019:10:03-15:45:43 oscar red_server[22358]: xxxxxxxxxxxxxxx: Sending json message {"data":{},"type":"WELCOME"}
    2019:10:03-15:45:45 oscar red_server[22358]: xxxxxxxxxxxxxxx: command '{"data":{},"type":"CONFIG_REQ"}'
    2019:10:03-15:45:45 oscar red_server[22358]: xxxxxxxxxxxxxxx: Sending json message {"data":{"pin":"","fullbr_dns":"","split_networks":"1.2.3.4","lan2_vids":"","lan4_vids":"","local_networks":"","tunnel_id":2,"manual2_netmask":24,"asg_cert":"[removed]","manual_address":"0.0.0.0","bridge_proto":"none","unlock_code":"ocht5rc2","password":"","manual2_defgw":"0.0.0.0","prev_unlock_code":"ocht5rc2","manual_netmask":24,"lan3_vids":"","version_r2":"2005R2","mac_filter_type":"none","mac":"xx:xx:xx:xx:xx:xx","dial_string":"*99#","manual2_address":"0.0.0.0","version_ng_red50":"1-424-7131d4e52-0000000","manual_dns":"0.0.0.0","lan1_mode":"unused","username":"","activate_modem":0,"tunnel_compression_algorithm":"lzo","version_red50":"1-424-7131d4e52-0000000","fullbr_domains":"","htp_server":"66.xx.xx.xx","uplink_balancing":"failover","asg_key":"[removed]","type":"red15","deployment_mode":"online","uplink2_mode":"dhcp","version_red15":"1-424-7131d4e52-e9f0c31","manual2_...L1496
    2019:10:03-15:45:49 oscar red_server[22358]: id="4201" severity="info" sys="System" sub="RED" name="RED Tunnel Up" red_id="xxxxxxxxxxxxxxx" forced="0"
    2019:10:03-15:45:50 oscar red_server[22358]: xxxxxxxxxxxxxxx: command '{"data{"wan1_ip":"192.168.xxx.xxx","mobile_signal_strength":"","wan2_ip":"","uplink":"WAN1","uplink_state":"0"},"type":"STATUS"}'
    2019:10:03-15:45:50 oscar red2ctl[4629]: Overflow happened on reds2:0
    2019:10:03-15:45:50 oscar red2ctl[4629]: Missing keepalive from reds2:0, disabling peer 174.xxx.xxx.xxx
    2019:10:03-15:45:52 oscar red_server[4610]: SELF: (Re-)loading device configurations
    2019:10:03-15:45:53 oscar red2ctl[4629]: Received keepalive from reds2:0, enabling peer 174.xxx.xxx.xxx
    2019:10:03-15:45:57 oscar red_server[4610]: SELF: (Re-)loading device configurations 
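
To put numbers on outages like this one, the "RED Tunnel Down" and "RED Tunnel Up" events can be paired per device. The sketch below assumes the log layout shown above (timestamp first, then `name="..."` and `red_id="..."` fields); the names `TS_FORMAT`, `EVENT_RE` and `outages` are invented for this example.

```python
import re
from datetime import datetime

# Timestamp layout as it appears in the log lines above.
TS_FORMAT = "%Y:%m:%d-%H:%M:%S"
EVENT_RE = re.compile(r'^(\S+) .*name="RED Tunnel (Down|Up)" red_id="([^"]*)"')


def outages(lines):
    """Yield (red_id, down_time, up_time) for each Down event followed by an Up."""
    down_since = {}
    for line in lines:
        m = EVENT_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m.group(1), TS_FORMAT)
        state, red_id = m.group(2), m.group(3)
        if state == "Down":
            # Keep the earliest Down if several arrive before the next Up.
            down_since.setdefault(red_id, ts)
        elif red_id in down_since:
            yield red_id, down_since.pop(red_id), ts


sample = [
    '2019:10:03-14:49:39 oscar red_server[12883]: id="4202" severity="info" '
    'sys="System" sub="RED" name="RED Tunnel Down" red_id="xxxxxxxxxxxxxxx" forced="0"',
    '2019:10:03-15:45:49 oscar red_server[22358]: id="4201" severity="info" '
    'sys="System" sub="RED" name="RED Tunnel Up" red_id="xxxxxxxxxxxxxxx" forced="0"',
]
for red_id, down, up in outages(sample):
    print(red_id, (up - down).total_seconds())
```

For the two events quoted above this reports 3370 seconds, i.e. the tunnel was down for 56 minutes and 10 seconds.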
