UTM 9.601 - RED issues!

Since upgrading all our customers to 9.601, a large share of them have been complaining that their REDs disconnect and reconnect with no recognizable pattern.

It started for all of them the very night we upgraded to 9.601, and they are all on different ISPs and located in different places around the country.

I spent 2 hours with Sophos support today, and they have now escalated the case.

Will return with an update....

Suspicious entries in the log (although all connected REDs log these before connecting):

2019:03:06-15:15:38 fw01-2 red_server[17509]: SELF: Cannot do SSL handshake on socket accept from 'xxx.xxx.xxx.xxx': SSL connect accept failed because of handshake problems

2019:03:06-15:15:46 fw01-2 red2ctl[12420]: Missing keepalive from reds3:0, disabling peer xxx.xxx.xxx.xxx

I know the last line is written before the tunnel disconnects, because there was no "PING/PONG" answer...
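
One way to check whether these two messages cluster around a particular peer is to tally them from a saved log excerpt. The following is a minimal Python sketch, not a Sophos tool: the regular expressions are derived only from the two log lines quoted above, and the category names and the `tally_events` function are made up for illustration.

```python
import re
from collections import Counter

# Patterns derived from the two log lines quoted in this post.
# The category names on the left are hypothetical labels, not Sophos terms.
PATTERNS = {
    "ssl_handshake_failed": re.compile(
        r"red_server\[\d+\]: SELF: Cannot do SSL handshake "
        r"on socket accept from '([^']+)'"
    ),
    "missing_keepalive": re.compile(
        r"red2ctl\[\d+\]: Missing keepalive from \S+, disabling peer (\S+)"
    ),
}


def tally_events(lines):
    """Count (category, peer) pairs across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(name, match.group(1))] += 1
    return counts


# Usage (the file name is an assumption):
#   with open("red.log") as handle:
#       for (category, peer), n in tally_events(handle).most_common():
#           print(category, peer, n)
```

If one peer dominates the "missing_keepalive" count while the handshake failures are spread evenly, that would support the observation that the handshake message is normal connection noise and the keepalive message is the one tied to real disconnects.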

One customer has 2 x RED 50: one is 100% stable and the other drops at random intervals. We replaced the unstable one with a new RED 50, but the same thing occurs.

  • In reply to FloSupport:

    Hi FloSupport,

    Thank you for following up on that. All of our REDs are deployed in standard mode, so at least that is/was not the problem. One page earlier in this thread, some folks describe the sporadic disconnection problem too, so I'm not the only one affected by this.
    I'll try to find out elsewhere whether that workaround is available. Fortunately, some people here have a lab; I can't test everything in our production environment. The fact is that since the release of the unified firmware for the RED, there have been problems with the REDs, and not all of them are resolved today.

    Best Regards

    Alex

  • In reply to Alexander Busch:

    Did they get to the bottom of what caused this in the logs, given that it was a cause of the disconnect?

    2019:09:03-09:46:32 sophos-2 red_server[4626]: SELF: RED10rev1 fw version set to 14
    2019:09:03-09:46:32 sophos-2 red_server[4626]: SELF: RED10rev2 local fw version set to 5214R2
    2019:09:03-09:46:32 sophos-2 red_server[4626]: SELF: RED10rev2 fw version set to 2005R2
    2019:09:03-09:46:32 sophos-2 red_server[4626]: SELF: RED15(w) fw version set to 1-424-7131d4e52-e9f0c31
    2019:09:03-09:46:32 sophos-2 red_server[4626]: SELF: RED50 fw version set to 1-424-7131d4e52-0000000
    2019:09:03-09:46:32 sophos-2 red_server[4626]: SELF: IO::Socket::SSL Version: 1.953
    2019:09:03-09:46:32 sophos-2 red_server[4626]: SELF: Startup - waiting 15 seconds ...
    2019:09:03-09:46:32 sophos-2 red2ctl[4635]: Starting REDv2 control daemon
    2019:09:03-09:46:47 sophos-2 red_server[7747]: UPLOAD: Uploader process starting
    2019:09:03-09:46:47 sophos-2 red_server[4626]: SELF: (Re-)loading device configurations
    2019:09:03-09:46:48 sophos-2 red_server[4626]: A3502xxxxxxxxxx: New device
    2019:09:03-09:46:48 sophos-2 red_server[4626]: A3502xxxxxxxxxx: Staging config for upload
    2019:09:03-09:46:48 sophos-2 red_server[4626]: A350XXXXXXXXXXX: New device
    2019:09:03-09:46:48 sophos-2 red_server[4626]: A350XXXXXXXXXXX: Staging config for upload
    2019:09:03-09:46:48 sophos-2 red_server[7747]: [A3502xxxxxxxxxx] Config has not changed, no need to upload to registry service
    2019:09:03-09:46:48 sophos-2 red_server[7747]: [A350XXXXXXXXXXX] Config has not changed, no need to upload to registry service

  • In reply to FloSupport:

    Below is a copy of the entries in my RED logs that happened yesterday, when my six remote offices went down randomly throughout the day. The log entries were the same for each RED 15 device, but I've replaced any IP or MAC identifiers with dashes for security purposes.

    2019:10:03-14:49:39 oscar red_server[12883]: xxxxxxxxxxxxxxxxx: No ping for 30 seconds, exiting.
    2019:10:03-14:49:39 oscar red_server[12883]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="xxxxxxxxxxxxxxx" forced="0"
    2019:10:03-14:49:39 oscar red_server[12883]: xxxxxxxxxxxxxxx is disconnected.
    2019:10:03-14:49:39 oscar red_server[4610]: SELF: (Re-)loading device configurations
    2019:10:03-14:49:41 oscar red2ctl[4629]: Overflow happened on reds2:0
    2019:10:03-14:49:41 oscar red2ctl[4629]: Missing keepalive from reds2:0, disabling peer 174.xxx.xxx.xxx
    2019:10:03-14:49:44 oscar red2ctl[4629]: Received keepalive from reds2:0, enabling peer 174.xxx.xxx.xxx
    2019:10:03-15:05:25 oscar red_server[4610]: SELF: (Re-)loading device configurations
    2019:10:03-15:13:36 oscar red_server[793]: Allow TLS 1.2 only
    2019:10:03-15:13:43 oscar red_server[793]: SELF: Cannot do SSL handshake on socket accept from '174.xxx.xxx.xxx': SSL connect accept failed because of handshake problems SSL wants a read first
    2019:10:03-15:17:53 oscar red2ctl[4629]: Missing keepalive from reds2:0, disabling peer 174.xxx.xxx.xxx
    2019:10:03-15:17:56 oscar red2ctl[4629]: Received keepalive from reds2:0, enabling peer 174.xxx.xxx.xxx
    2019:10:03-15:45:40 oscar red_server[22351]: Allow TLS 1.2 only
    2019:10:03-15:45:40 oscar red_server[22351]: SELF: Cannot do SSL handshake on socket accept from '174.xxx.xxx.xxx': SSL connect accept failed because of handshake problems
    2019:10:03-15:45:42 oscar red_server[22358]: Allow TLS 1.2 only
    2019:10:03-15:45:42 oscar red_server[22358]: SELF: New connection from 174.xxx.xxx.xxx with ID --------------- (cipher AES256-GCM-SHA384), rev1
    2019:10:03-15:45:42 oscar red_server[22358]: xxxxxxxxxxxxxxx: connected OK, pushing config
    2019:10:03-15:45:43 oscar red_server[4610]: SELF: (Re-)loading device configurations
    2019:10:03-15:45:43 oscar red_server[22358]: xxxxxxxxxxxxxxx: command '{"data":{"version":"0"},"type":"INIT_CONNECTION"}'
    2019:10:03-15:45:43 oscar red_server[22358]: xxxxxxxxxxxxxxx: Initializing connection running protocol version 0
    2019:10:03-15:45:43 oscar red_server[22358]: xxxxxxxxxxxxxxx: Sending json message {"data":{},"type":"WELCOME"}
    2019:10:03-15:45:45 oscar red_server[22358]: xxxxxxxxxxxxxxx: command '{"data":{},"type":"CONFIG_REQ"}'
    2019:10:03-15:45:45 oscar red_server[22358]: xxxxxxxxxxxxxxx: Sending json message {"data":{"pin":"","fullbr_dns":"","split_networks":"1.2.3.4","lan2_vids":"","lan4_vids":"","local_networks":"","tunnel_id":2,"manual2_netmask":24,"asg_cert":"[removed]","manual_address":"0.0.0.0","bridge_proto":"none","unlock_code":"ocht5rc2","password":"","manual2_defgw":"0.0.0.0","prev_unlock_code":"ocht5rc2","manual_netmask":24,"lan3_vids":"","version_r2":"2005R2","mac_filter_type":"none","mac":"xx:xx:xx:xx:xx:xx","dial_string":"*99#","manual2_address":"0.0.0.0","version_ng_red50":"1-424-7131d4e52-0000000","manual_dns":"0.0.0.0","lan1_mode":"unused","username":"","activate_modem":0,"tunnel_compression_algorithm":"lzo","version_red50":"1-424-7131d4e52-0000000","fullbr_domains":"","htp_server":"66.xx.xx.xx","uplink_balancing":"failover","asg_key":"[removed]","type":"red15","deployment_mode":"online","uplink2_mode":"dhcp","version_red15":"1-424-7131d4e52-e9f0c31","manual2_...L1496
    2019:10:03-15:45:49 oscar red_server[22358]: id="4201" severity="info" sys="System" sub="RED" name="RED Tunnel Up" red_id="xxxxxxxxxxxxxxx" forced="0"
    2019:10:03-15:45:50 oscar red_server[22358]: xxxxxxxxxxxxxxx: command '{"data":{"wan1_ip":"192.168.xxx.xxx","mobile_signal_strength":"","wan2_ip":"","uplink":"WAN1","uplink_state":"0"},"type":"STATUS"}'
    2019:10:03-15:45:50 oscar red2ctl[4629]: Overflow happened on reds2:0
    2019:10:03-15:45:50 oscar red2ctl[4629]: Missing keepalive from reds2:0, disabling peer 174.xxx.xxx.xxx
    2019:10:03-15:45:52 oscar red_server[4610]: SELF: (Re-)loading device configurations
    2019:10:03-15:45:53 oscar red2ctl[4629]: Received keepalive from reds2:0, enabling peer 174.xxx.xxx.xxx
    2019:10:03-15:45:57 oscar red_server[4610]: SELF: (Re-)loading device configurations 

  • In reply to Brian Stilts:

    Hi Brian and welcome to the UTM Community!

    Rather than dashes, obfuscate IPs like 84.XX.YY.121, 10.X.Y.100, 192.168.X.200 and 172.2X.Y.51.  That lets us see immediately which IPs are local and which are identical.
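
A minimal Python sketch of that obfuscation scheme, for anyone who wants to mask a log before posting it. The `obfuscate_ips` function and its regular expression are illustrative assumptions, not an official tool; the sketch keeps the first and last octets (and the private-range prefixes) so readers can still tell local from public addresses.

```python
import re

# Matches dotted-quad IPv4 addresses; octet ranges are not validated.
IPV4 = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b")


def _mask(match):
    a, b, c, d = match.groups()
    if a == "192" and b == "168":
        return f"192.168.X.{d}"          # 192.168.0.0/16 stays recognizable
    if a == "10":
        return f"10.X.Y.{d}"             # 10.0.0.0/8
    if a == "172" and 16 <= int(b) <= 31:
        return f"172.{b[0]}X.Y.{d}"      # 172.16.0.0/12, e.g. 172.2X.Y.51
    return f"{a}.XX.YY.{d}"              # everything else (public)


def obfuscate_ips(text):
    """Mask the middle of every IPv4 address found in text."""
    return IPV4.sub(_mask, text)
```

Note that two different addresses sharing the same first and last octet end up looking identical after masking, so check for such collisions before drawing conclusions from the masked log.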

    Cheers - Bob

  • In reply to BAlfson:

    Updated my previous post to include IP obfuscation as specified.

  • In reply to Brian Stilts:

    I have already contacted Sophos support, but I wanted to share my observations here as well.

    Our customer uses the RED 15 in "Standard/Split" mode, and overlapping WAN IPs are not the problem in his case either. For some weeks now, the RED 15 (firmware version 9.605-1) that connects the branch office to the SG 210 in our customer's head office has been disconnecting at random, taking the VPN tunnel down. It doesn't happen every day; the RED can run for 14 days without any trouble, but then, out of nowhere, it loses the connection and stays offline for 30 to 60 minutes. Deactivating and re-activating the RED's interface in the UTM admin panel helps, but this is not always an option: it can only be done from another location, not from the branch office itself while its internet connection is down. I have already reduced the MTU to 1400, without success.

    Today, the problem occurred again. The RED was offline from 8:16 AM to 9:12 AM. Here is the relevant passage from the RED log:

    2019:10:10-08:15:45 vpn red_server[11001]: A3602XXXXXXXXXX: command '{"data":{"seq":42112},"type":"PING"}'
    2019:10:10-08:15:45 vpn red_server[11001]: A3602XXXXXXXXXX: Sending json message {"data":{"seq":42112},"type":"PONG"}
    2019:10:10-08:16:16 vpn red_server[11001]: A3602XXXXXXXXXX: No ping for 30 seconds, exiting.
    2019:10:10-08:16:16 vpn red_server[11001]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A3602XXXXXXXXXX" forced="0"
    2019:10:10-08:16:16 vpn red_server[11001]: A3602XXXXXXXXXX is disconnected.
    2019:10:10-08:16:16 vpn red_server[4647]: SELF: (Re-)loading device configurations
    2019:10:10-08:16:18 vpn red2ctl[4659]: Overflow happened on reds4:0
    2019:10:10-08:16:18 vpn red2ctl[4659]: Missing keepalive from reds4:0, disabling peer 37.24.xxx.xxx
    2019:10:10-08:16:21 vpn red2ctl[4659]: Received keepalive from reds4:0, enabling peer 37.24.xxx.xxx
    2019:10:10-09:11:51 vpn red_server[20876]: SELF: Cannot do SSL handshake on socket accept from '37.24.xxx.xxx': SSL connect accept failed because of handshake problems
    2019:10:10-09:11:51 vpn red_server[20877]: SELF: Cannot do SSL handshake on socket accept from '37.24.xxx.xxx': SSL connect accept failed because of handshake problems
    2019:10:10-09:11:54 vpn red_server[20882]: SELF: New connection from 37.24.xxx.xxx with ID A3602XXXXXXXXXX (cipher AES256-GCM-SHA384), rev1
    <30>Oct 10 09:11:54 red_server[20882]: A3602XXXXXXXXXX: connected OK, pushing config
    2019:10:10-09:11:56 vpn red_server[20882]: A3602XXXXXXXXXX: command '{"data":{"version":"0"},"type":"INIT_CONNECTION"}'
    2019:10:10-09:11:56 vpn red_server[20882]: A3602XXXXXXXXXX: Initializing connection running protocol version 0
    2019:10:10-09:11:56 vpn red_server[20882]: A3602XXXXXXXXXX: Sending json message {"data":{},"type":"WELCOME"}
    2019:10:10-09:11:57 vpn red_server[20882]: A3602XXXXXXXXXX: command '{"data":{},"type":"CONFIG_REQ"}'
    2019:10:10-09:11:57 vpn red_server[20882]: A3602XXXXXXXXXX: Sending json message {"data":{"pin":"","fullbr_dns":"","split_networks":"192.168.48.0/24 192.168.1.0/24 1.2.3.4", ...}
    2019:10:10-09:12:02 vpn red_server[20882]: A3602XXXXXXXXXX: command '{"data":{"key1":"R645ggLTzrxwXcapf27r7C+UMOexSoJpTjKCAUmmsCE=","key0":"4onPa3XPBDXHQpWtyJ41eTOH+UQDXTZm3Wpm4HPfc\/k=","key_active":0},"type":"SET_KEY_REQ"}'
    2019:10:10-09:12:02 vpn red_server[20882]: A3602XXXXXXXXXX: Sending json message {"data":{},"type":"SET_KEY_REP"}
    2019:10:10-09:12:03 vpn red2ctl[4659]: Overflow happened on reds4:0
    2019:10:10-09:12:03 vpn red2ctl[4659]: Missing keepalive from reds4:0, disabling peer 37.24.xxx.xxx
    2019:10:10-09:12:03 vpn red_server[20882]: A3602XXXXXXXXXX: command '{"data":{"seq":0},"type":"PING"}'
    2019:10:10-09:12:03 vpn red_server[20882]: id="4201" severity="info" sys="System" sub="RED" name="RED Tunnel Up" red_id="A3602XXXXXXXXXX" forced="0"
    2019:10:10-09:12:03 vpn red_server[20882]: A3602XXXXXXXXXX: Sending json message {"data":{"seq":0},"type":"PONG"}
    2019:10:10-09:12:04 vpn red_server[20882]: A3602XXXXXXXXXX: command '{"data":{"wan1_ip":"192.168.178.21","mobile_signal_strength":"","wan2_ip":"","uplink":"WAN1","uplink_state":"0"},"type":"STATUS"}'
    2019:10:10-09:12:06 vpn red2ctl[4659]: Received keepalive from reds4:0, enabling peer 37.24.xxx.xxx
    2019:10:10-09:12:09 vpn red_server[4647]: SELF: (Re-)loading device configurations
    2019:10:10-09:12:19 vpn red_server[20882]: A3602XXXXXXXXXX: command '{"data":{"seq":1},"type":"PING"}'
    2019:10:10-09:12:19 vpn red_server[20882]: A3602XXXXXXXXXX: Sending json message {"data":{"seq":1},"type":"PONG"}
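
To put numbers on outages like the one above, the "RED Tunnel Down" and "RED Tunnel Up" events can be paired per RED ID. This is a minimal Python sketch under the assumption that the log lines have the timestamp and `name="..." red_id="..."` layout shown in this excerpt; the `outages` function is a made-up helper, not part of any Sophos tooling.

```python
import re
from datetime import datetime

# Timestamp and event layout as seen in the red.log excerpt above.
EVENT = re.compile(
    r'^\s*(\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) .*'
    r'name="RED Tunnel (Up|Down)" red_id="([^"]+)"'
)


def outages(lines):
    """Return (red_id, down_time, up_time, duration) for each Down/Up pair."""
    down_since = {}
    result = []
    for line in lines:
        match = EVENT.search(line)
        if not match:
            continue
        when = datetime.strptime(match.group(1), "%Y:%m:%d-%H:%M:%S")
        state, red_id = match.group(2), match.group(3)
        if state == "Down":
            down_since[red_id] = when
        elif red_id in down_since:
            start = down_since.pop(red_id)
            result.append((red_id, start, when, when - start))
    return result


# Usage (the file name is an assumption):
#   with open("red.log") as handle:
#       for red_id, start, end, duration in outages(handle):
#           print(red_id, start, end, duration)
```

Run against the two event lines in the excerpt above (Down at 08:16:16, Up at 09:12:03), this yields a single outage of 55 minutes and 47 seconds.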

    We have had a lot of problems with the RED in recent weeks, and our customer is already very angry because every time the RED is down, his employees are unable to work. This means high costs and lost productivity for our customer, and a lot of frustration for the people in the branch office, who can't do anything during that time.

    Regards

    Stefan

  • In reply to Stefan Gierke:

    Hallo Stefan and welcome to the UTM Community!

    Did you read through earlier posts in this thread?  Have you tried the following?

    cc set red use_unified_firmware 0

    Cheers - Bob

  • In reply to BAlfson:

    Hello Bob, thank you very much for your reply!

    Yes, I read through the entire thread today, but I have not tried the command so far because the breakdown of the VPN connection only happens sporadically in our case (sometimes only once in two weeks). But I will try to set the unified firmware value to 0 tomorrow and then see how the RED will behave in the coming days. I will let you know whether the disconnect occurs again in the next week.

    Thanks again and best regards

    Stefan

  • Hi All,

    Since the advisory came out, I have been having issues with a RED50, which was replaced under RMA (so now I have 2 units).

    The first issue was an inability to configure itself to use an FQDN (and work); it would only configure and work with a public IP address.

    The second issue I had was that I was unable to ping/communicate with the RED50 or any device beyond the RED50.

    Basically the RED50 firmware was being an unruly teenager.

    I spoke with Support, who were initially very good (UK side) and said they would escalate to their second (or is it third?) line. Then support fell flat. Finally I was assigned to one of the techs on the East Coast of the USA; we exchanged emails for some time and had one phone call, as the time difference was an issue.

    • At no point was I informed of the Advisory (https://community.sophos.com/kb/en-us/134398); I had to find it on here (in this post, I think).
    • I also found out about the FQDN issue, which I tested in-house on the 'faulty' unit.
    • This issue does not happen on the XG (I tested with my own XG and then realised that the 'faulty' unit was not faulty).

    This does have the ring of QC/QA not performing for the SG UTM software (similar to the Microsoft Windows updates test dept., which is a shadow of its former self).

    The problem existed on 9.602 and 9.605 (virtual and hardware-based units). Only when 9.7 came out did I test further, and I can confirm that all my issues were fixed.

    Although I did notice that after I ran "cc set red use_unified_firmware 1", on initial reboot it didn't work as it should (stating it was unable to configure itself), physically switching it off (using the power cable) fixed the issue.

    Good news: my customer (who bought this unit just prior to the advisory) can now use the RED50 (at last).

  • In reply to Argo:

    You must be sure to test some further :-)

    We have now exchanged (RMA'ed) 26 RED 50 devices. Some of them broke after just a month, and they were all running 9.605-1, which supposedly should have fixed it. 9.7 "just" came out, but be careful: 9.605-1 should also have "fixed" the RED-50-IS-NOW-BRICKED-BUG, and did not. Give it a month with 9.7 and let's see if anything is fixed :-)

    Also this very morning, a RED 50 just crashed, and was showing "Booting..." and never came any further :-(

    Just for the record, when you get a RED 50 RMA, you now receive this with the new RMA device:

  • In reply to twister5800:

    Regarding 9.7:

    https://community.sophos.com/kb/en-us/134717

    Sophos is investigating reports from some customers experiencing RED site-to-site tunnel issues after upgrading from v9.605 to v9.7.

  • In reply to neildonaldson:

    Yes, that was with 9.700-4; it's fixed in 9.700-5 :-)

     

  • In reply to twister5800:

    When did 9.7 come out?? The UTM I'm looking at right now still says 

    Firmware version:   9.605-1

    and no updates are available. To add insult to injury - one of the Red 15wi tunnels to a site office has just gone down again (despite the disabling of the Unified Firmware). Seems to only be good for a max of two weeks and I have to send a tech back out ... kinda glad I don't have 50 (or more) of these like other blokes. Starting to re-think the entire network infrastructure at this point. Having this drag on for months is ridiculous.

  • In reply to Dread:

    The release will be rolled out in phases.

    • In phase 1 you can download the update package from the download area.
    • In phase 2 we will make it available via our Up2Date servers in several stages.
    • In phase 3 we will make it available via our Up2Date servers to all remaining installations.

    So, I think Sophos is still in phase 1. See https://community.sophos.com/products/unified-threat-management/b/blog/posts/utm-up2date-9-700-released for the download links.

    Best regards

    Alex

  • In reply to Alexander Busch:

    I am glad this forum is here!

    As I have an open ticket with support and have done since the original advisory came out, I would have expected Sophos to tell me about the updates (both of them)!

    I have now replied to the emails asking for more information.