This discussion has been locked.

RED50 - We lose our connection sporadically - what shall we do? 'overflow, missing keepalive, self re-loading'

Hey Community,

For the past two weeks we have had an unstable connection: a few times a day our connection to the UTM gets dropped. There is no specific time of day or activity that seems to trigger it.

All we can see is that a new connection is requested, the old one is released and disconnected, and then the RED connects again. This is followed by an overflow and a missing keepalive on reds1.
Next we receive a keepalive, and everything seems fine again.

Below are two logs: the first is from a disconnect of about one second;
the second (below the line) is from one that lasted about a minute.

We really need some advice, help, tips or tricks.

UTM is 9.403-4

2016:08:05-17:17:54 astaro red_server[18646]: SELF: New connection from 123.123.123.123 with ID A12312312312312 (cipher AES256-GCM-SHA384), rev1
2016:08:05-17:17:55 astaro red_server[18646]: A12312312312312: already connected, releasing old connection.
2016:08:05-17:17:56 astaro red_server[29422]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A12312312312312" forced="1"
2016:08:05-17:17:56 astaro red_server[29422]: A12312312312312 is disconnected.
2016:08:05-17:18:01 astaro red_server[18646]: A12312312312312: connected OK, pushing config
2016:08:05-17:17:58 astaro red2ctl[4301]: Overflow happened on reds1:0
2016:08:05-17:17:59 astaro red2ctl[4301]: Missing keepalive from reds1:0, disabling peer 123.123.123.123
2016:08:05-17:18:02 astaro red2ctl[4301]: Received keepalive from reds1:0, enabling peer 123.123.123.123
2016:08:05-17:18:04 astaro red_server[18646]: A12312312312312: command 'UMTS_STATUS value=OK'
2016:08:05-17:18:04 astaro red_server[18646]: A12312312312312: command 'PORTSTATE 1E04,1004,1004,1004,1E04'
2016:08:05-17:18:04 astaro red_server[18646]: A12312312312312: PORTSTATE LAN1: 1Gb/s,LAN2: Down,LAN3: Down,LAN4: Down
2016:08:05-17:18:06 astaro red_server[18646]: A12312312312312: command 'PING 0 uplink=WAN uplinkstate=0'
2016:08:05-17:18:06 astaro red_server[18646]: id="4201" severity="info" sys="System" sub="RED" name="RED Tunnel Up" red_id="A12312312312312" forced="0"
2016:08:05-17:18:06 astaro red_server[18646]: A12312312312312: PING remote_tx=0 local_rx=0 diff=0
2016:08:05-17:18:06 astaro red_server[18646]: A12312312312312: PONG local_tx=0
2016:08:05-17:18:07 astaro red_server[4291]: SELF: (Re-)loading device configurations
2016:08:05-17:18:20 astaro red_server[18646]: A12312312312312: command 'PORTSTATE 1E04,1004,1004,1004,1E04'
2016:08:05-17:18:20 astaro red_server[18646]: A12312312312312: PORTSTATE LAN1: 1Gb/s,LAN2: Down,LAN3: Down,LAN4: Down
2016:08:05-17:18:21 astaro red_server[18646]: A12312312312312: command 'PING 0 uplink=WAN uplinkstate=0'
2016:08:05-17:18:21 astaro red_server[18646]: A12312312312312: PING remote_tx=0 local_rx=0 diff=0
2016:08:05-17:18:21 astaro red_server[18646]: A12312312312312: PONG local_tx=0
2016:08:05-17:18:34 astaro red_server[18646]: A12312312312312: command 'PORTSTATE 1E04,1004,1004,1004,1E04'
2016:08:05-17:18:34 astaro red_server[18646]: A12312312312312: PORTSTATE LAN1: 1Gb/s,LAN2: Down,LAN3: Down,LAN4: Down
2016:08:05-17:18:34 astaro red_server[18646]: A12312312312312: command 'PING 0 uplink=WAN uplinkstate=0'
2016:08:05-17:18:34 astaro red_server[18646]: A12312312312312: PING remote_tx=0 local_rx=0 diff=0
2016:08:05-17:18:34 astaro red_server[18646]: A12312312312312: PONG local_tx=0

______________________________________________________________________________

2016:08:05-18:05:18 astaro red_server[27294]: SELF: New connection from 123.123.123.123 with ID A12312312312312 (cipher AES256-GCM-SHA384), rev1
2016:08:05-18:05:33 astaro red_server[27294]: A12312312312312: already connected, releasing old connection.
2016:08:05-18:05:35 astaro red_server[18646]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A12312312312312" forced="1"
2016:08:05-18:05:35 astaro red_server[18646]: A12312312312312 is disconnected.
2016:08:05-18:05:38 astaro red2ctl[4301]: Overflow happened on reds1:0
2016:08:05-18:05:38 astaro red2ctl[4301]: Missing keepalive from reds1:0, disabling peer 123.123.123.123
2016:08:05-18:05:41 astaro red2ctl[4301]: Received keepalive from reds1:0, enabling peer 123.123.123.123
2016:08:05-18:05:47 astaro red_server[27316]: SELF: New connection from 79.214.245.190 with ID A12312312312312 (cipher AES256-GCM-SHA384), rev1
2016:08:05-18:05:47 astaro red_server[27316]: A12312312312312: already connected, releasing old connection.
2016:08:05-18:05:48 astaro red_server[27316]: A12312312312312: seems to be still connected, exiting.
2016:08:05-18:06:06 astaro red_server[27294]: A12312312312312: connected OK, pushing config
2016:08:05-18:06:09 astaro red_server[4291]: SELF: (Re-)loading device configurations
2016:08:05-18:06:11 astaro red2ctl[4301]: Missing keepalive from reds1:0, disabling peer 123.123.123.123
2016:08:05-18:06:16 astaro red_server[4291]: SELF: (Re-)loading device configurations
2016:08:05-18:06:36 astaro red_server[27294]: A12312312312312: No ping for 30 seconds, exiting.
2016:08:05-18:06:36 astaro red_server[27294]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A12312312312312" forced="0"
2016:08:05-18:06:36 astaro red_server[27294]: A12312312312312 is disconnected.
2016:08:05-18:06:57 astaro red_server[27567]: SELF: New connection from 123.123.123.123 with ID A12312312312312 (cipher AES256-GCM-SHA384), rev1
2016:08:05-18:06:57 astaro red_server[27567]: A12312312312312: connected OK, pushing config
2016:08:05-18:07:01 astaro red_server[27567]: A12312312312312: command 'UMTS_STATUS value=OK'
2016:08:05-18:07:01 astaro red_server[27567]: A12312312312312: command 'PORTSTATE 1E04,1004,1004,1004,1E04'
2016:08:05-18:07:01 astaro red_server[27567]: A12312312312312: PORTSTATE LAN1: 1Gb/s,LAN2: Down,LAN3: Down,LAN4: Down
2016:08:05-18:07:01 astaro red_server[27567]: A12312312312312: command 'PING 0 uplink=WAN uplinkstate=0'
2016:08:05-18:07:01 astaro red_server[27567]: id="4201" severity="info" sys="System" sub="RED" name="RED Tunnel Up" red_id="A12312312312312" forced="0"
2016:08:05-18:07:01 astaro red_server[27567]: A12312312312312: PING remote_tx=0 local_rx=0 diff=0
2016:08:05-18:07:01 astaro red_server[27567]: A12312312312312: PONG local_tx=0
2016:08:05-18:07:02 astaro red2ctl[4301]: Overflow happened on reds1:0
2016:08:05-18:07:02 astaro red2ctl[4301]: Missing keepalive from reds1:0, disabling peer 123.123.123.123
2016:08:05-18:07:05 astaro red2ctl[4301]: Received keepalive from reds1:0, enabling peer 123.123.123.123
2016:08:05-18:07:08 astaro red_server[4291]: SELF: (Re-)loading device configurations
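
The length of each outage in logs like the ones above can be estimated by pairing every "RED Tunnel Down" event with the next "RED Tunnel Up" event for the same RED ID. The following is only a rough sketch: the function name `outages` and the regular expression are illustrative assumptions based on the log lines shown here, not part of any Sophos tool, and the log format may differ on other firmware versions.

```python
import re
from datetime import datetime

# Matches the timestamp, tunnel state and RED ID in event lines such as:
# 2016:08:05-17:17:56 astaro red_server[29422]: id="4202" ... name="RED Tunnel Down" red_id="A123..." forced="1"
EVENT = re.compile(
    r'(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) .*'
    r'name="RED Tunnel (?P<state>Up|Down)" red_id="(?P<red>[^"]+)"'
)


def outages(lines):
    """Return (red_id, down_time, seconds_down) for each Down -> Up pair.

    If several Down events arrive before the next Up, the first one is
    treated as the start of the outage (an assumption of this sketch).
    """
    down_since = {}
    result = []
    for line in lines:
        match = EVENT.search(line)
        if match is None:
            continue  # not a tunnel up/down event
        ts = datetime.strptime(match.group("ts"), "%Y:%m:%d-%H:%M:%S")
        red = match.group("red")
        if match.group("state") == "Down":
            down_since.setdefault(red, ts)
        elif red in down_since:
            start = down_since.pop(red)
            result.append((red, start, (ts - start).total_seconds()))
    return result


if __name__ == "__main__":
    sample = [
        '2016:08:05-17:17:56 astaro red_server[29422]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A12312312312312" forced="1"',
        '2016:08:05-17:18:06 astaro red_server[18646]: id="4201" severity="info" sys="System" sub="RED" name="RED Tunnel Up" red_id="A12312312312312" forced="0"',
    ]
    for red, start, seconds in outages(sample):
        print(f"{red} down at {start} for {seconds:.0f} s")
```

Run against a full RED log, this would give a list of outages per device, which makes it easier to see whether the gaps between disconnects are really shrinking. Note that it measures the time between the logged events, which is not necessarily the interruption users notice.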



  • I am seeing exactly the same issue with a RED 15w that was just installed. I suspect it was introduced by one of the most recent firmware revisions, because I have several other clients running older firmware with their REDs and no disconnects at all. I will open a support case for this tomorrow and report back here if I get a resolution.

    Thanks,
    Hugh

  • hjherron6, is there any news?

    The time between the disconnects is getting shorter and shorter.
    We still have no idea what the cause is, and no idea for a workaround either.

    Any help is welcome.

  • Hi there. I came across your post and wanted to share my workaround for the issue you describe. I worked on this for several weeks, the past two with Sophos tech support.

    It looks like there is an issue with the 9.4 software and the RED 50 specifically, because our RED 15 boxes did not have this problem. Tech support has noted the bug and I'm awaiting a fix. They stated that 9.405 was supposed to fix it; in a nutshell, it didn't.

    The only way I could get this to stabilize was to downgrade to 9.356 (the latest 9.3 build) and restore my config file from backup. Since doing this the system has been up and running with no issues whatsoever.

    Hope this helps. It's not a solution, but at least it's a way to get your system back to normal without a remote site dropping sporadically all the time.

    Good luck.

  • One last thing: I also lambasted them for not issuing advisories or posting here, on the bulletin board they themselves provide, when issues like this arise. I wasted way too much of my time messing with this issue on and off, with both the ISP and Sophos, to narrow down where the problem was.

    Hopefully this will result in more proactive notifications from Sophos... or at least one can hope.

  • Hi,

    I currently have a case open for this and it has been escalated to the engineering team. After reviewing my logs, which are similar to yours, they have confirmed that something is wrong. I will update as soon as I have made more progress.

    Thanks,
    Hugh

  • Sophos support identified the issue I was having as a problem with the built-in wifi on the RED15w. Unfortunately this seems to be a different scenario from yours, since yours is not a wireless unit, correct? I am replacing the 15w with a separate RED15 and AP15 to see whether that resolves the issue, and I will report back here once I know.

  • Yes, mine is a RED50 that is having the issue, no wifi. But it's good to know that more than one unit is affected. I'm just extremely frustrated with half-baked software that is supposed to be for enterprise use!

  • Hi,

    I just wanted to add that I am also seeing a similar issue on one of our RED 50s.

    At first it was randomly disconnecting for varying periods of time. Now it seems to be connected on the users' end, in that they have a connection via our VPN, but the UTM RED management page shows it as offline, although the last contact time keeps resetting.

    The logs show the same cycle of connecting and reconnecting that you have, but I also get this error:

    2016:10:14-10:05:39 gw-1 red_server[7613]: Self: SSL connect accept failed because of handshake problems SSL wants a read first

    As with everyone else, I started seeing this soon after updating the UTM firmware a few weeks ago.
    What's strange is that we have several RED 50s out at different sites, all configured pretty much the same way, but only this one is showing the problem.
    I wonder whether the RED hasn't updated correctly, and that is why the UTM won't accept its connection.

    I'm going to open my own support ticket and will update here if I find anything.

  • Hey Guys

    I have exactly the same problem with the first RED15w I have put in place.

    The disconnects came in bursts of 20-50 in a row. Sometimes the routing was lost after such a burst.

    It was much worse two weeks ago, when the only fix was to unplug both the router and the RED15w from power and plug them back in.

    It has now stabilized somewhat and "only" happens 1-2 times a day.

    But it's no fun, and I'm lucky the customer only has 2 people working at that site...

    Any news or suggestions?

    Cheers Janbo

    _________

    Yesterday - today was still tomorrow...

  • Hey Community

    After a few days with almost no outages, today it's really bad. This is what's happening. Any ideas?

    (By the way, I'm assuming that the WAN connection itself is not bad or unstable ;-))


    2016:10:17-13:14:17 gw01 red_server[7103]: A360173640F8577: command 'PING 0 uplink=WAN'
    2016:10:17-13:14:17 gw01 red_server[7103]: A360173640F8577: PING remote_tx=0 local_rx=0 diff=0
    2016:10:17-13:14:17 gw01 red_server[7103]: A360173640F8577: PONG local_tx=0
    2016:10:17-13:14:19 gw01 red_server[7103]: A360173640F8577: command 'SYSSTATE unstable peer using stabilization timeout 30'
    2016:10:17-13:14:19 gw01 red_server[7103]: A360173640F8577: command 'SYSSTATE last stable peer status:'
    2016:10:17-13:14:19 gw01 red_server[7103]: A360173640F8577: command 'SYSSTATE 0 weight: 0 remote: 217.92.246.60 (dev 3), RX: miss 0/0, TX: miss 0/2'
    2016:10:17-13:14:19 gw01 red_server[7103]: A360173640F8577: command 'SYSSTATE current peer status:'
    2016:10:17-13:14:19 gw01 red_server[7103]: A360173640F8577: command 'SYSSTATE 0 weight: 0 remote: 217.92.246.60 (dev 3), RX: miss 0/0, TX: miss 0/2'
    2016:10:17-13:14:19 gw01 red_server[7103]: A360173640F8577: command 'CON_CLOSE reason=unstable_peer'
    2016:10:17-13:14:19 gw01 red_server[7103]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A360173640F8577" forced="1"
    2016:10:17-13:14:19 gw01 red_server[7103]: A360173640F8577 is disconnected.
    2016:10:17-13:14:23 gw01 red_server[7234]: SELF: New connection from 91.60.13.210 with ID A360173640F8577 (cipher AES256-GCM-SHA384), rev1
    2016:10:17-13:14:23 gw01 red_server[7234]: A360173640F8577: connected OK, pushing config
    2016:10:17-13:14:28 gw01 red_server[7234]: A360173640F8577: command 'UMTS_STATUS value=OK'
    2016:10:17-13:14:28 gw01 red_server[7234]: A360173640F8577: command 'PING 0 uplink=WAN'
    2016:10:17-13:14:28 gw01 red_server[7234]: id="4201" severity="info" sys="System" sub="RED" name="RED Tunnel Up" red_id="A360173640F8577" forced="0"
    2016:10:17-13:14:28 gw01 red_server[7234]: A360173640F8577: PING remote_tx=0 local_rx=0 diff=0
    2016:10:17-13:14:28 gw01 red_server[7234]: A360173640F8577: PONG local_tx=0
    2016:10:17-13:14:43 gw01 red_server[7234]: A360173640F8577: command 'PING 0 uplink=WAN'
    2016:10:17-13:14:43 gw01 red_server[7234]: A360173640F8577: PING remote_tx=0 local_rx=0 diff=0
    2016:10:17-13:14:43 gw01 red_server[7234]: A360173640F8577: PONG local_tx=0

    _________

    Yesterday - today was still tomorrow...
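
The logs in this thread contain two details that are easy to miss when reading them by eye: the source address in each "New connection from ..." line, and the reason given in a 'CON_CLOSE reason=...' command. A small script could collect both per RED ID, for example to check whether the RED's public address changes around a disconnect. This is only a sketch: `summarize` and both patterns are illustrative assumptions derived from the lines quoted above, and nothing in this thread establishes that an address change is related to the drops.

```python
import re
from collections import defaultdict

# "SELF: New connection from 91.60.13.210 with ID A360173640F8577 (cipher ...)"
NEW_CONN = re.compile(
    r"New connection from (?P<ip>\d+\.\d+\.\d+\.\d+) with ID (?P<red>\w+)"
)
# "A360173640F8577: command 'CON_CLOSE reason=unstable_peer'"
CLOSE = re.compile(r"(?P<red>\w+): command 'CON_CLOSE reason=(?P<reason>\w+)'")


def summarize(lines):
    """Collect distinct source IPs and all close reasons per RED ID."""
    source_ips = defaultdict(list)
    close_reasons = defaultdict(list)
    for line in lines:
        match = NEW_CONN.search(line)
        if match:
            ips = source_ips[match.group("red")]
            if match.group("ip") not in ips:
                ips.append(match.group("ip"))  # keep first-seen order
            continue
        match = CLOSE.search(line)
        if match:
            close_reasons[match.group("red")].append(match.group("reason"))
    return dict(source_ips), dict(close_reasons)


if __name__ == "__main__":
    sample = [
        "2016:10:17-13:14:19 gw01 red_server[7103]: A360173640F8577: command 'CON_CLOSE reason=unstable_peer'",
        "2016:10:17-13:14:23 gw01 red_server[7234]: SELF: New connection from 91.60.13.210 with ID A360173640F8577 (cipher AES256-GCM-SHA384), rev1",
    ]
    ips, reasons = summarize(sample)
    print("source IPs:", ips)
    print("close reasons:", reasons)
```

If a RED shows more than one source address, or the same close reason again and again, that may be worth including when opening a support case.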