
UTM 9.601 - RED issues!

Since upgrading all our customers to 9.601, many of them have been complaining about REDs disconnecting and reconnecting with no discernible pattern.

It started for all of them the night we upgraded to 9.601, and they are all on different ISPs and located in different places around the country.

I spent 2 hours with Sophos support today, and they have now escalated the case.

Will return with an update....

Suspicious entries in the log - but all connected REDs do this before connection:

2019:03:06-15:15:38 fw01-2 red_server[17509]: SELF: Cannot do SSL handshake on socket accept from 'xxx.xxx.xxx.xxx': SSL connect accept failed because of handshake problems

2019:03:06-15:15:46 fw01-2 red2ctl[12420]: Missing keepalive from reds3:0, disabling peer xxx.xxx.xxx.xxx

I know the last line is written before the tunnel disconnects, because there was no "PING/PONG" answer...
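
To see whether the drops really follow no pattern, it can help to count the "Missing keepalive" events per peer. The following is a minimal sketch, assuming the red2ctl lines always look like the single sample quoted above; the regular expression and the function name `count_keepalive_drops` are illustrative and not part of any Sophos tool.

```python
import re
from collections import Counter

# Pattern for the red2ctl line quoted above. The field layout is inferred from
# that single sample (assumption) and may not cover every variant.
KEEPALIVE_RE = re.compile(
    r"^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) (?P<host>\S+) red2ctl\[\d+\]: "
    r"Missing keepalive from (?P<iface>\S+), disabling peer (?P<peer>\S+)$"
)


def count_keepalive_drops(lines):
    """Return a Counter of missed-keepalive events keyed by (interface, peer)."""
    drops = Counter()
    for line in lines:
        match = KEEPALIVE_RE.match(line.strip())
        if match:
            drops[(match["iface"], match["peer"])] += 1
    return drops


# The two log lines from the post above, unchanged.
sample = [
    "2019:03:06-15:15:38 fw01-2 red_server[17509]: SELF: Cannot do SSL handshake "
    "on socket accept from 'xxx.xxx.xxx.xxx': SSL connect accept failed because "
    "of handshake problems",
    "2019:03:06-15:15:46 fw01-2 red2ctl[12420]: Missing keepalive from reds3:0, "
    "disabling peer xxx.xxx.xxx.xxx",
]

for (iface, peer), count in count_keepalive_drops(sample).items():
    print(f"{iface} -> {peer}: {count} missed keepalive(s)")
```

Feeding it a full day of log lines and comparing the counts per interface would show whether one RED is affected far more than the others.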

One customer has 2 x RED 50: one is 100% stable and the other drops at random intervals. We replaced the unstable one with a new RED 50, but the same thing occurs.



  • Same issues here after the 9.601-5 UTM update. 2x RED50 Rev 1. They drop on multiple ISPs at varying intervals and for varying lengths of time. I was advised to re-create the RED in UTM. I did so, but the problems persist. I was sent two replacement RED50s. The first one has been installed with a newly created config, but the problem persists. The ISPs' modems have been replaced, although the ISPs were reluctant to do so. One of the REDs won't recognize the presence of the ISP on WAN1 at all.

    We are losing a lot of productivity and business. We do a sizeable portion of our business via teleconferencing.

    Support Tickets#

    8710435

    8707203

    8707207

     

    The tech alluded to a potential issue with REDs after the update to 9.601-5.

  • My problem is resolved. There is a known issue related to unified firmware.

    As root (su -), run:

    cc get red use_unified_firmware

    If the value returned is 1, run:

    cc set red use_unified_firmware 0

    The REDs will update and reboot.

    Confirm the value is 0 by rerunning the get command above.

     

    NOT A PERMANENT FIX. The issue needs to be addressed in Sophos UTM firmware permanently.
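
The decision in the workaround above can be sketched as a small helper. This is only an illustration: it assumes `cc get red use_unified_firmware` prints the bare value (`0` or `1`), which the post implies but does not show, and the function name `workaround_commands` is hypothetical. It only builds the command strings taken from the post; it does not run anything on the UTM.

```python
def workaround_commands(current_value):
    """List the follow-up commands for a given `cc get red use_unified_firmware` value.

    Assumption: the value arrives as the bare text "0" or "1".
    """
    if current_value.strip() == "1":
        return [
            "cc set red use_unified_firmware 0",  # REDs will update and reboot
            "cc get red use_unified_firmware",    # confirm the value is now 0
        ]
    # Already 0 (or unexpected output): nothing to change.
    return []


for command in workaround_commands("1"):
    print(command)
```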

  •  can correct me with more information, but as far as I know, you need to set this CC switch on the master, and it should be synced to the other appliance (like all CC switches).

    Also, this switch should be "off" after a firmware update, but you should check it after the update.

     

    Sophos has been rolling this switch out gradually over the last few updates.

    Currently, the unified firmware will only be applied to installations with fewer than 20 RED devices configured.


  • LuCar Toni said:

     can correct me with more information, but as far as I know, you need to set this CC switch on the master, and it should be synced to the other appliance (like all CC switches).

    Also, this switch should be "off" after a firmware update, but you should check it after the update.

    Currently, the unified firmware will only be applied to installations with fewer than 20 RED devices configured.

     

     

    Hi all,

     

    In my experience, UTM 9.601 to 9.603 will ENABLE unified firmware. We had a dozen VERY angry customers :-(

     

     and  20 is WAY too big a number. We have 6 devices in Germany and we are located in Denmark; can you even imagine the problems we were facing? And first of all, Sophos did not recognize this as a firmware issue, but merely asked "What have you done wrong?", "What is your licensing number?", "Do you have power to the device?"

    Please reduce the number to 3; if even one device fails, the rollout should stop immediately!

     

    Sorry for writing in big letters, but you should know how many problems this has caused us = money (Which I'm not aware of, as I am not the boss!=) ;)

    -----

    Best regards
    Martin

    Sophos XGS 2100 @ Home | Sophos v20 Architect

  • I just got this from Sophos Support:

    1. Are any other REDs in danger of being bricked, or is it just the RED 50?
    - So far we have only seen RED 50 but this doesn't rule out RED 15
    2. In High Availability, does use_unified_firmware need to be set to 0 on all nodes?
    - No it would replicate the command
    3. Instead of physically unplugging REDs, would it suffice to disable the RED server objects in WebAdmin before applying the Up2Date and then enable them after 9.604 has had use_unified_firmware set to 0?
    - In theory yes if you disable the service in the UTM or turn off the RED devices, then they won't be able to get the Firmware Update, so they won't be able to contact the server, and once you re-enable the services they will not search for a new firmware update

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA

  • In 9.605 Issues Resolved:

    "NUTM-10962 [RED] Fix for RED50 does not start up after firmware update for most scenarios"

     

    Can Sophos (or anyone else) comment as to whether the "CC SET RED USE_UNIFIED_FIRMWARE 0" is still required with this update and in future? Has the unified firmware bug for the RED50 truly been squashed?

    Cheers,

    Garth

  • Hi All,


    The new unified RED firmware included in UTM 9.605 includes a fix for the issue which some of you have reported when upgrading the firmware on the RED 50.


    However, you should be aware that there is still the possibility that the issue will occur during the update to 9.605 if the RED 50 has the older firmware installed. This will not occur with further firmware updates to your system in the future and unfortunately, can also not be bypassed by using the previous workaround of disabling the unified firmware. This is due to the fact that the issue is within the firmware update process of the old firmware.


    During our tests, we were only able to reproduce this issue with RED 50 devices which are under significant load. As this is a race condition issue, we cannot guarantee that you will not run into this issue again if you have experienced it before, nor can we predict, if it will occur in any particular scenario. However, the issue is certainly less likely to occur in scenarios where the RED 50 is not under load during the update process, and so, if possible, we would advise that you disconnect the local network behind the RED during the update.


    We apologize for the inconvenience this issue has caused.


    Jan

  • garth1138 said:

    In 9.605 Issues Resolved:

    "NUTM-10962 [RED] Fix for RED50 does not start up after firmware update for most scenarios"

     

    Can Sophos (or anyone else) comment as to whether the "CC SET RED USE_UNIFIED_FIRMWARE 0" is still required with this update and in future? Has the unified firmware bug for the RED50 truly been squashed?

    Cheers,

    Garth

     

     

    yes for us it is still required with the RED50!

  • Jan, this is confusing.  Are you saying that the RED 50s will get bricked after Up2Dating to 9.605 no matter what precaution one takes or what workaround one uses to set use_unified_firmware to 0 before the REDs are brought back online?

    Cheers - Bob

     
  • Hi there!

    We updated both of our SG125s to 9.604-2.

    Our 5 RED15 devices went offline.

    The workaround brought 4 of them back online.

    But 1 device will not come back!!! It is business-critical!

    I set the MTU of the interface to 1400. It didn't help.

    I deleted the RED and re-created it. It didn't help.

    I see a permanent loop in the log:

    2019:07:30-07:56:55 rkdfw001-1 red_server[19875]: SELF: New connection from X.X.X.X with ID XXXXXXXXXX (cipher AES256-GCM-SHA384), rev1
    2019:07:30-07:56:55 rkdfw001-1 red_server[19875]: XXXXXXXXXX: connected OK, pushing config
    2019:07:30-07:56:56 rkdfw001-1 red_server[19875]: XXXXXXXXXX: command '{"data":{"version":"0"},"type":"INIT_CONNECTION"}'
    2019:07:30-07:56:56 rkdfw001-1 red_server[19875]: XXXXXXXXXX: Initializing connection running protocol version 0
    2019:07:30-07:56:56 rkdfw001-1 red_server[19875]: XXXXXXXXXX: Sending json message {"data":{},"type":"WELCOME"}
    2019:07:30-07:56:57 rkdfw001-1 red_server[19875]: XXXXXXXXXX: command '{"data":{},"type":"CONFIG_REQ"}'
    2019:07:30-07:56:57 rkdfw001-1 red_server[19875]: XXXXXXXXXX: Sending json message {"data":{"pin":"","fullbr_dns":"","split_networks":"1.2.3.4","lan2_vids":"","lan4_vids":"","local_networks":"","tunnel_id":2,"manual2_netmask":24,"asg_cert":"[removed]","manual_address":"0.0.0.0","bridge_proto":"none","unlock_code":"xxxxxxx","password":"","manual2_defgw":"0.0.0.0","prev_unlock_code":"rlvqiegl","manual_netmask":24,"lan3_vids":"","version_r2":"2005R2","mac_filter_type":"none","mac":"xxx","dial_string":"*99#","manual2_address":"0.0.0.0","version_ng_red50":"5211","manual_dns":"0.0.0.0","lan1_mode":"unused","username":"","activate_modem":"0","tunnel_compression_algorithm":"lzo","version_red50":"5211","fullbr_domains":"","uplink_balancing":"failover","asg_key":"[removed]","type":"red15","deployment_mode":"online","uplink2_mode":"dhcp","version_red15":"5211","manual2_dns":"0.0.0.0","lan2_mode":"unused","debug_level":0,"local_networks_target":"","fai...L1421
    2019:07:30-07:56:59 rkdfw001-1 red_server[19875]: XXXXXXXXXX: command '{"data":{"device":"RED15","type":"tar.gz","version":"5211"},"type":"FW_FILE_REQ"}'
    2019:07:30-07:56:59 rkdfw001-1 red_server[19875]: XXXXXXXXXX: Requesting firmware file /usr/share/red-firmware//red15-v5211.tar.gz
    2019:07:30-07:57:13 rkdfw001-1 red_server[19875]: XXXXXXXXXX: Sending json message {"data":{},"type":"FW_FILE_FIN"}
    2019:07:30-07:57:13 rkdfw001-1 red_server[19875]: XXXXXXXXXX: command '{"data":{"device":"RED15","type":"md5sum","version":"5211"},"type":"FW_FILE_REQ"}'
    2019:07:30-07:57:13 rkdfw001-1 red_server[19875]: XXXXXXXXXX: Requesting firmware file /usr/share/red-firmware//red15-v5211.md5sum
    2019:07:30-07:57:13 rkdfw001-1 red_server[19875]: XXXXXXXXXX: Sending json message {"data":{},"type":"FW_FILE_FIN"}
    2019:07:30-07:57:15 rkdfw001-1 red_server[19875]: XXXXXXXXXX: command '{"data":{"message":"Successfully downloaded firmware version 5211 from UTM"},"type":"DISCONNECT"}'
    2019:07:30-07:57:15 rkdfw001-1 red_server[19875]: XXXXXXXXXX: Disconnecting: Successfully downloaded firmware version 5211 from UTM
    2019:07:30-07:57:15 rkdfw001-1 red_server[19875]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="XXXXXXXXXX" forced="1"
    2019:07:30-07:57:15 rkdfw001-1 red_server[19875]: XXXXXXXXXX is disconnected.

    How can I fix this?

    Best Regards!

    Phill
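
The loop described in the post above (connect, download firmware 5211, disconnect, repeat) can be spotted by counting how often the same RED reports a successful download of the same version. The following is a rough sketch: the regular expression is inferred from the "Disconnecting: Successfully downloaded firmware version ..." line in that excerpt, and the function name `find_firmware_loops` and the threshold of 3 are arbitrary assumptions for illustration.

```python
import re
from collections import Counter

# Matches the "Disconnecting: Successfully downloaded firmware version ..." line
# from the red_server log above. Layout inferred from that excerpt (assumption).
FW_DISCONNECT_RE = re.compile(
    r"red_server\[\d+\]: (?P<red_id>\S+): Disconnecting: "
    r"Successfully downloaded firmware version (?P<version>\S+) from UTM"
)


def find_firmware_loops(lines, threshold=3):
    """Return {(red_id, version): count} for REDs that fetched the same firmware
    at least `threshold` times (threshold is an arbitrary illustrative default)."""
    downloads = Counter()
    for line in lines:
        match = FW_DISCONNECT_RE.search(line)
        if match:
            downloads[(match["red_id"], match["version"])] += 1
    return {key: n for key, n in downloads.items() if n >= threshold}


# Synthetic sample: the line from the log above, repeated with made-up timestamps
# to imitate three passes through the loop.
sample = [
    f"2019:07:30-07:{minute}:15 rkdfw001-1 red_server[19875]: XXXXXXXXXX: "
    "Disconnecting: Successfully downloaded firmware version 5211 from UTM"
    for minute in ("57", "58", "59")
]

for (red_id, version), count in find_firmware_loops(sample).items():
    print(f"RED {red_id} downloaded firmware {version} {count} times")
```

A RED that appears in the result keeps downloading the firmware without ever coming up on it, which matches the symptom reported here.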

  • Hi BAlfson, All,

    That is not entirely what I am saying. The issue with the RED50 is that the old firmware has problems applying a new firmware image. Switching off the unified firmware therefore might not prevent the issue if the RED50 already has the unified firmware installed, because the issue can then occur when the non-unified firmware is applied. Unfortunately, the same can happen with the new MR5 firmware as well, as it will be applied by the old, faulty firmware.

    What we have found in our testing is that the issue is likely to occur only when the RED50 is under load at the moment the firmware update starts, hence the recommendation to take the network behind the RED50 offline for the update. This does not mean taking the RED50 itself offline; it needs to stay online to get the new firmware.

    Once the RED50 has the MR5 firmware installed, the issue is fixed and subsequent firmware updates will not require these steps.

    Hope this provides a little more background.

     

    Jan

  • Still not comfortable, Jan...

    What is MR5 firmware?  Does this also fix the same problem with the RED 15s?  Many RED 50s and RED 15s were knocked offline and some RED 50s were bricked.    What is "the issue" or are these the same issue?

    How does one "put the network behind the RED50 offline for the update" and where is that recommendation to be seen?

    Cheers - Bob

     
  • We have 3 RED50s deployed in other states. There is no IT staff at the remote sites. If these things go down, we have no email, CRM, telephones, nothing. I've been holding off on updates due to this very problem.

    What's my guaranteed, iron-clad way of updating without any unforeseen downtime?

     

  • Hi All,

    2019-08-06 See my final version posted today

    UPDATED 2019-08-01

    I've had several messages back and forth with Sophos folks.  As Jan Weber says in a post, 9.605 fixes the problem with REDs and the only danger is updating the RED firmware when the RED is under a heavy load.  I have suggested that the following instructions be added to the information about the Up2Date (I in blue dot) and the blog post about the 9.605 Up2Date:

    In order to ensure that there's no problem with the update of firmware in RED devices, do the following with two planned outages:

     1. Outage 1 - Up2Date to 9.604:
         A. In WebAdmin, disable all RED Servers for RED appliances.
         B. Apply Up2Dates through 9.604.
         C. At the command line: cc set red use_unified_firmware 0
         D. In WebAdmin, enable all RED Servers for RED appliances.
     2. Outage 2 - Up2Date to 9.605:
         A. Disconnect all LAN connections from all REDs, leaving the RED online but with no connection to local clients.
         B. Apply the 9.605 Up2Date.
         C. After the Up2Date is complete, reconnect the disconnected LAN cables to the REDs.

    Cheers - Bob

     
  • Hello everyone,

     

    One question about taking the heavy load off the RED: most of our REDs are in branch offices (which is what the RED is built for, I would think) without any IT personnel there, and a few hundred miles away from us. Normally, we apply the Sophos updates in timeframes when these offices are empty and there should be no load at all, but to be sure we of course want to disable all connections.

    The question: would it help to disable the interfaces of the RED on the Sophos side without disabling the RED itself? Or would this still produce "malicious" workload on the RED side?

     

    Thanks!

     

    Tobias

  • Thanks to Jan for the information.

    But concretely, what do we have to do to reuse the RED50s that can no longer connect?

    Our head office is in Italy and we have a RED50 in Turkey that has been unusable since updating to 9.6: the device stays in BOOTING, then the error LED turns on, and then it restarts.

    The RED50 is not under a maintenance contract: how should I proceed?

    Thanks in advance.

     

    Fabio 

  • Hi Tobias,

    Disabling the RED interfaces will not help. If there is no traffic from the branch through the RED because of the time of day, that could already provide the 'less load' scenario. However, to be sure, it is advisable to disconnect all LAN ports on the RED itself.

    Jan

  • Hi Fabio,

    The only option to recover a RED50 that is in this state is via an RMA with support.

    Jan

  • Hi All,

    A small correction to this: Step 1 is only necessary if you are not already on UTM 9.6. If you are already on 9.6, Step 2 is the only one you need to perform. If you are not on 9.6 yet, you can also merge steps 1 and 2 and go straight to 9.605.

    Instructions coming from 9.5x:

    A. In WebAdmin, disable all RED Servers for RED appliances
    B. Apply Up2Dates through 9.605
    C. At the command line: cc set red use_unified_firmware 0
    D. In WebAdmin, enable all RED Servers for RED appliances

    Instructions when on 9.6x:

    A. Disconnect all LAN connections from all REDs, leaving the RED online but with no connection to local clients
    B. Apply Up2Dates through 9.605
    C. Reconnect all disconnected LAN cables to the REDs

     

    When you change the cc setting for the unified firmware and enable the RED Servers again, the REDs will apply the non-unified firmware and hence might run into the issue. If you want to switch away from the unified firmware, please do so after installing 9.605, or disconnect the LAN connections of the RED before applying the switch. The issue can only occur while the old firmware is applying another firmware image; it does not matter whether that image is the unified firmware or not, only which firmware is currently running on the RED.

     

    Jan
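
The two checklists in the post above can be written down as a small lookup keyed on the currently installed UTM version. This is a sketch under assumptions: version strings are taken to look like `9.604-2` (the format used elsewhere in this thread), `9.510-5` in the usage below is a made-up example, and the function name `upgrade_steps` is hypothetical. The step texts are copied from the post; nothing here talks to a UTM.

```python
import re

# Checklists as given in the post above.
FROM_9_5X = [
    "In WebAdmin, disable all RED Servers for RED appliances",
    "Apply Up2Dates through 9.605",
    "At the command line: cc set red use_unified_firmware 0",
    "In WebAdmin, enable all RED Servers for RED appliances",
]
ON_9_6X = [
    "Disconnect all LAN connections from all REDs, leaving the RED online "
    "but with no connection to local clients",
    "Apply Up2Dates through 9.605",
    "Reconnect all disconnected LAN cables to the REDs",
]


def upgrade_steps(version):
    """Pick the checklist for a UTM version string such as '9.604-2'.

    Assumption: versions have the form 9.NNN with an optional -build suffix.
    """
    match = re.match(r"^9\.(\d{3})(?:-\d+)?$", version)
    if not match:
        raise ValueError(f"unrecognized UTM version: {version!r}")
    minor = int(match.group(1))
    if minor >= 605:
        return []  # already on 9.605 or later: neither checklist applies
    if minor >= 600:
        return ON_9_6X
    if minor >= 500:
        return FROM_9_5X
    raise ValueError(f"no checklist in this thread covers {version!r}")


for number, step in enumerate(upgrade_steps("9.604-2"), start=1):
    print(f"{number}. {step}")
```

Note the caveat in the same post: re-enabling the RED Servers after changing the cc setting can itself trigger a firmware change on the REDs, so the lookup is a reminder of the steps, not a guarantee.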

  • Hi Jan,

     

    But is it possible to start an RMA procedure without a maintenance contract?

     

    thanks

     

    fabio

  • Fabio Giacobbe said:

    Hi Jan,

     

    But is it possible to start an RMA procedure without a maintenance contract?

     

    thanks

     

    fabio

     

    If you have a license for the REDs (Network Protection), you should be covered ;-)

    -----

    Best regards
    Martin

    Sophos XGS 2100 @ Home | Sophos v20 Architect