
RED50 - We lose our connection sporadically - what shall we do? 'overflow, missing keepalive, self re-loading'

Hey Community,

For the past two weeks we have had an unstable connection: a few times a day our connection to the UTM gets dropped. We can't tie it to any specific time or activity.

All we can see is that a new connection is requested, the old one is released and disconnected, and then the RED connects again. This is followed by an overflow and a missing keepalive on reds1.
Next we receive a keepalive, and everything seems fine again.

Below are two logs: the first one was a one-second disconnect;
the second one (below the line) lasted about a minute.

We really need some advice, help, tips or tricks.

UTM is 9.403-4

2016:08:05-17:17:54 astaro red_server[18646]: SELF: New connection from 123.123.123.123 with ID A12312312312312 (cipher AES256-GCM-SHA384), rev1
2016:08:05-17:17:55 astaro red_server[18646]: A12312312312312: already connected, releasing old connection.
2016:08:05-17:17:56 astaro red_server[29422]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A12312312312312" forced="1"
2016:08:05-17:17:56 astaro red_server[29422]: A12312312312312 is disconnected.
2016:08:05-17:18:01 astaro red_server[18646]: A12312312312312: connected OK, pushing config
2016:08:05-17:17:58 astaro red2ctl[4301]: Overflow happened on reds1:0
2016:08:05-17:17:59 astaro red2ctl[4301]: Missing keepalive from reds1:0, disabling peer 123.123.123.123
2016:08:05-17:18:02 astaro red2ctl[4301]: Received keepalive from reds1:0, enabling peer 123.123.123.123
2016:08:05-17:18:04 astaro red_server[18646]: A12312312312312: command 'UMTS_STATUS value=OK'
2016:08:05-17:18:04 astaro red_server[18646]: A12312312312312: command 'PORTSTATE 1E04,1004,1004,1004,1E04'
2016:08:05-17:18:04 astaro red_server[18646]: A12312312312312: PORTSTATE LAN1: 1Gb/s,LAN2: Down,LAN3: Down,LAN4: Down
2016:08:05-17:18:06 astaro red_server[18646]: A12312312312312: command 'PING 0 uplink=WAN uplinkstate=0'
2016:08:05-17:18:06 astaro red_server[18646]: id="4201" severity="info" sys="System" sub="RED" name="RED Tunnel Up" red_id="A12312312312312" forced="0"
2016:08:05-17:18:06 astaro red_server[18646]: A12312312312312: PING remote_tx=0 local_rx=0 diff=0
2016:08:05-17:18:06 astaro red_server[18646]: A12312312312312: PONG local_tx=0
2016:08:05-17:18:07 astaro red_server[4291]: SELF: (Re-)loading device configurations
2016:08:05-17:18:20 astaro red_server[18646]: A12312312312312: command 'PORTSTATE 1E04,1004,1004,1004,1E04'
2016:08:05-17:18:20 astaro red_server[18646]: A12312312312312: PORTSTATE LAN1: 1Gb/s,LAN2: Down,LAN3: Down,LAN4: Down
2016:08:05-17:18:21 astaro red_server[18646]: A12312312312312: command 'PING 0 uplink=WAN uplinkstate=0'
2016:08:05-17:18:21 astaro red_server[18646]: A12312312312312: PING remote_tx=0 local_rx=0 diff=0
2016:08:05-17:18:21 astaro red_server[18646]: A12312312312312: PONG local_tx=0
2016:08:05-17:18:34 astaro red_server[18646]: A12312312312312: command 'PORTSTATE 1E04,1004,1004,1004,1E04'
2016:08:05-17:18:34 astaro red_server[18646]: A12312312312312: PORTSTATE LAN1: 1Gb/s,LAN2: Down,LAN3: Down,LAN4: Down
2016:08:05-17:18:34 astaro red_server[18646]: A12312312312312: command 'PING 0 uplink=WAN uplinkstate=0'
2016:08:05-17:18:34 astaro red_server[18646]: A12312312312312: PING remote_tx=0 local_rx=0 diff=0
2016:08:05-17:18:34 astaro red_server[18646]: A12312312312312: PONG local_tx=0

______________________________________________________________________________

2016:08:05-18:05:18 astaro red_server[27294]: SELF: New connection from 123.123.123.123 with ID A12312312312312 (cipher AES256-GCM-SHA384), rev1
2016:08:05-18:05:33 astaro red_server[27294]: A12312312312312: already connected, releasing old connection.
2016:08:05-18:05:35 astaro red_server[18646]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A12312312312312" forced="1"
2016:08:05-18:05:35 astaro red_server[18646]: A12312312312312 is disconnected.
2016:08:05-18:05:38 astaro red2ctl[4301]: Overflow happened on reds1:0
2016:08:05-18:05:38 astaro red2ctl[4301]: Missing keepalive from reds1:0, disabling peer 123.123.123.123
2016:08:05-18:05:41 astaro red2ctl[4301]: Received keepalive from reds1:0, enabling peer 123.123.123.123
2016:08:05-18:05:47 astaro red_server[27316]: SELF: New connection from 79.214.245.190 with ID A12312312312312 (cipher AES256-GCM-SHA384), rev1
2016:08:05-18:05:47 astaro red_server[27316]: A12312312312312: already connected, releasing old connection.
2016:08:05-18:05:48 astaro red_server[27316]: A12312312312312: seems to be still connected, exiting.
2016:08:05-18:06:06 astaro red_server[27294]: A12312312312312: connected OK, pushing config
2016:08:05-18:06:09 astaro red_server[4291]: SELF: (Re-)loading device configurations
2016:08:05-18:06:11 astaro red2ctl[4301]: Missing keepalive from reds1:0, disabling peer 123.123.123.123
2016:08:05-18:06:16 astaro red_server[4291]: SELF: (Re-)loading device configurations
2016:08:05-18:06:36 astaro red_server[27294]: A12312312312312: No ping for 30 seconds, exiting.
2016:08:05-18:06:36 astaro red_server[27294]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A12312312312312" forced="0"
2016:08:05-18:06:36 astaro red_server[27294]: A12312312312312 is disconnected.
2016:08:05-18:06:57 astaro red_server[27567]: SELF: New connection from 123.123.123.123 with ID A12312312312312 (cipher AES256-GCM-SHA384), rev1
2016:08:05-18:06:57 astaro red_server[27567]: A12312312312312: connected OK, pushing config
2016:08:05-18:07:01 astaro red_server[27567]: A12312312312312: command 'UMTS_STATUS value=OK'
2016:08:05-18:07:01 astaro red_server[27567]: A12312312312312: command 'PORTSTATE 1E04,1004,1004,1004,1E04'
2016:08:05-18:07:01 astaro red_server[27567]: A12312312312312: PORTSTATE LAN1: 1Gb/s,LAN2: Down,LAN3: Down,LAN4: Down
2016:08:05-18:07:01 astaro red_server[27567]: A12312312312312: command 'PING 0 uplink=WAN uplinkstate=0'
2016:08:05-18:07:01 astaro red_server[27567]: id="4201" severity="info" sys="System" sub="RED" name="RED Tunnel Up" red_id="A12312312312312" forced="0"
2016:08:05-18:07:01 astaro red_server[27567]: A12312312312312: PING remote_tx=0 local_rx=0 diff=0
2016:08:05-18:07:01 astaro red_server[27567]: A12312312312312: PONG local_tx=0
2016:08:05-18:07:02 astaro red2ctl[4301]: Overflow happened on reds1:0
2016:08:05-18:07:02 astaro red2ctl[4301]: Missing keepalive from reds1:0, disabling peer 123.123.123.123
2016:08:05-18:07:05 astaro red2ctl[4301]: Received keepalive from reds1:0, enabling peer 123.123.123.123
2016:08:05-18:07:08 astaro red_server[4291]: SELF: (Re-)loading device configurations
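
To put numbers on how long each drop lasts, the "RED Tunnel Down" and "RED Tunnel Up" events in excerpts like the two above can be paired up. The following is only a rough Python sketch, not a Sophos tool: the function name `tunnel_outages` and the regular expression are assumptions based purely on the line format visible in these logs. It measures the time between the logged Down and Up events, which is not necessarily how long users noticed the drop.

```python
import re
from datetime import datetime

# Matches the id="4201"/"4202" event lines as they appear in the excerpts above.
EVENT_RE = re.compile(
    r'^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) .*'
    r'name="RED Tunnel (?P<state>Up|Down)" red_id="(?P<red_id>[^"]+)"'
)

def tunnel_outages(lines):
    """Return (red_id, down_time, seconds_until_up) for each Down -> Up pair."""
    down_since = {}  # red_id -> timestamp of the first unanswered "Down"
    outages = []
    for line in lines:
        m = EVENT_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%Y:%m:%d-%H:%M:%S")
        red_id = m["red_id"]
        if m["state"] == "Down":
            # Keep the earliest Down if several are logged before the next Up.
            down_since.setdefault(red_id, ts)
        elif red_id in down_since:
            start = down_since.pop(red_id)
            outages.append((red_id, start, (ts - start).total_seconds()))
    return outages

# Event lines copied from the two excerpts above.
SAMPLE = [
    '2016:08:05-17:17:56 astaro red_server[29422]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A12312312312312" forced="1"',
    '2016:08:05-17:18:06 astaro red_server[18646]: id="4201" severity="info" sys="System" sub="RED" name="RED Tunnel Up" red_id="A12312312312312" forced="0"',
    '2016:08:05-18:05:35 astaro red_server[18646]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A12312312312312" forced="1"',
    '2016:08:05-18:06:36 astaro red_server[27294]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A12312312312312" forced="0"',
    '2016:08:05-18:07:01 astaro red_server[27567]: id="4201" severity="info" sys="System" sub="RED" name="RED Tunnel Up" red_id="A12312312312312" forced="0"',
]

for red_id, start, seconds in tunnel_outages(SAMPLE):
    print(f"{red_id} down at {start:%H:%M:%S} for {seconds:.0f}s")
# A12312312312312 down at 17:17:56 for 10s
# A12312312312312 down at 18:05:35 for 86s
```

Run against a full day of red.log, the same function would give a list of outages that can be compared with ISP or line-monitoring data.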



  • Hi 

    I just wanted to add that I am also seeing a similar issue on one of our RED 50s.

    It used to disconnect randomly for varying periods of time. Now it seems to be connected on the users' end (they have connectivity via our VPN), but UTM RED management shows it as offline, although the last contact time keeps resetting.

    The logs show the same cycle of connecting and reconnecting that you have, but I also get an error that says:

    2016:10:14-10:05:39 gw-1 red_server[7613]: Self: SSL connect accept failed because of handshake problems SSL wants a read first
     
    As with everyone else, I started seeing this soon after updating the UTM firmware a few weeks ago.
    What's strange is that we have several RED 50s out at different sites, all configured pretty much the same way, but only this one is showing the problem.
    I wonder if the RED hasn't updated correctly, which would explain why the UTM won't accept its connection.

    I'm going to open my own support ticket. I'll update here if I find anything.
     
     

     

  • Hey Guys

    I have exactly the same problem with the first RED15w I have in place.

    The disconnections came in bursts of 20-50 in a row. Sometimes routing was lost after such a burst.

    It was much worse two weeks ago and could only be resolved by unplugging the router and the RED15w from power and plugging them back in.

    It has now stabilized somewhat and "only" happens 1-2 times a day.

    But it's no fun and I'm lucky the customer only has 2 people working on that site...

    Any news or suggestions?

    Cheers Janbo

    _________

    Yesterday - today was still tomorrow...

  • Just a heads up, I never had a fix in 9.404-5.  I wound up rolling back to 9.356-3 in order for our devices to work properly again.  I've not had a problem since with any RED boxes dropping.

    According to support, the fix for this issue is supposedly in the 9.407-3 release, which came out a short while ago.  I have not tested it myself, since I've got other items to work on instead of "breaking" the network on purpose between both sites.

    Anyone here that has this issue and is running the latest 9.407-3 release, please let the rest of us know if you still have problems.

    Thanks!

     

    J

  • Hi All,

    Today I experienced a very similar issue on both a RED 15 and a site-to-site RED, and it appeared to be resolved when I disabled Tunnel Compression. Could either of you check whether you have it enabled and, if so, turn it off and see what happens?

    Hopefully this may help!

    Emile

    Sadly, I checked and our affected RED isn't using Tunnel Compression.

    I've talked to the office that is hosting the RED to make sure neither they nor their ISP have started blocking any ports, and they say they haven't.

    I'm hoping this is just due to the UTM update and that a fix will come out soon, as we don't really want to roll back everything just for this one misbehaving RED.

     

    James

  • Hi James

    We have already raised a ticket with the ISP to check the line, so I can't yet say whether disabling Tunnel Compression will help (I don't want to change two things at once). I'll give an update in a few days.

    My problem is that sometimes it gets very bad and occurs several times a day, and sometimes it runs fine for 2 or 3 days.

    Can you tell us how often and how sporadically it occurs with your RED?

     

    Cheers, Janbo

    _________

    Yesterday - today was still tomorrow...

  • Hi Janbo 

    For us, with this particular RED 50, it's been non-stop since the beginning of last week. It was intermittent for a week or two after we updated our UTM, and once we updated a second time it started doing what it does now. According to the logs on the UTM, the RED is in a constantly repeating cycle of connecting, disconnecting and reconnecting. This means the RED never shows as up in RED management, but the last seen counter gets reset every 60-100 seconds or so.

    What's strange is that the RED is in fact connected: the users on site who are plugged into it report that they have full RDP access to our servers, but that it feels like it pauses every now and then.

    All very strange.

    Thanks

    James 

  • Anyone here that has this issue and is running the latest 9.407-3 release AND still having these problems?

     

    Thanks


    Jason

  • Hi Jason,

    Please provide me the ticket# with support so that I can look into the steps taken by support and check the bug ID. 

    Thanks

    Sachin Gurung
    Team Lead | Sophos Technical Support
    Knowledge Base  |  @SophosSupport  |  Video tutorials
    Remember to like a post.  If a post (on a question thread) solves your question use the 'This helped me' link.

  • What are you asking?  I simply asked if anyone upgraded to the newest firmware and if this issue is recurring?

     

    My original ticket for this was #6327481 but it was closed since they said it was a bug to be fixed in a future release.

     

    Jason

  •  Hi Jason, 

    I wanted the ticket# to look into the case history. Thanks for that.

    All, if the issue is not resolved in the latest release, ask support to open a session and disable fast_failover on the RED. Monitor the RED tunnel after the fast_failover option has been disabled.

    If the issue still persists after the firmware upgrade, collect kernel.log, red.log, and a tcpdump capture on ports 3410 and 3400, and send them to support. Request an escalation once you have provided the required information.

    Thanks

    Sachin Gurung
    Team Lead | Sophos Technical Support
    Knowledge Base  |  @SophosSupport  |  Video tutorials
    Remember to like a post.  If a post (on a question thread) solves your question use the 'This helped me' link.
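
For anyone preparing that capture, here is one way the tcpdump invocation for the two ports mentioned above might be put together. This is a hedged sketch only: the helper name `red_capture_command`, the interface `any`, and the output file name are assumptions of mine, not something support specified. Only the two port numbers come from the post above.

```python
import shlex

def red_capture_command(interface="any", outfile="red_capture.pcap",
                        ports=(3400, 3410)):
    """Build a tcpdump argument list that captures only the given ports.

    interface and outfile are placeholder assumptions; adjust them to the
    uplink interface and a path with enough free space on your system.
    """
    # Standard pcap filter expression: "port 3400 or port 3410"
    bpf = " or ".join(f"port {p}" for p in ports)
    # -n: no name resolution, -i: interface, -w: write raw packets to a file
    return ["tcpdump", "-n", "-i", interface, "-w", outfile, bpf]

# Print a copy-and-paste form of the command.
print(shlex.join(red_capture_command()))
# tcpdump -n -i any -w red_capture.pcap 'port 3400 or port 3410'
```

Starting the capture shortly before an expected drop and stopping it after the tunnel comes back should keep the file small enough to attach to a ticket.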

  • We're having this issue now, but only at one site.  75% of the sites have the exact same setup, yet only one is showing this issue.  The overflow happens randomly and at least once an hour, causing the UTM to think the RED50 is dead and reset the tunnel.  It's definitely not an endpoint issue, as we have set up PingPlotter and it shows a clean path all the way through while this is happening.

    I have a ticket open, so hopefully they get this resolved.  I'll mention fast_failover to them, if that's still an option under the latest software version.
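
To back up a statement like "at least once an hour" in a ticket, the overflow messages can be counted per hour. A minimal Python sketch follows, assuming the red2ctl line format shown in the original post's logs; the function name `overflows_per_hour` is made up for illustration.

```python
import re
from collections import Counter

# Captures the date+hour prefix and the interface from red2ctl overflow lines.
OVERFLOW_RE = re.compile(
    r'^(\d{4}:\d{2}:\d{2}-\d{2}):\d{2}:\d{2} \S+ red2ctl\[\d+\]: '
    r'Overflow happened on (\S+)'
)

def overflows_per_hour(lines):
    """Count 'Overflow happened' messages per (hour, interface)."""
    counts = Counter()
    for line in lines:
        m = OVERFLOW_RE.match(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts

# Overflow lines copied from the logs in the original post.
SAMPLE = [
    "2016:08:05-17:17:58 astaro red2ctl[4301]: Overflow happened on reds1:0",
    "2016:08:05-18:05:38 astaro red2ctl[4301]: Overflow happened on reds1:0",
    "2016:08:05-18:07:02 astaro red2ctl[4301]: Overflow happened on reds1:0",
]

for (hour, iface), n in sorted(overflows_per_hour(SAMPLE).items()):
    print(hour, iface, n)
# 2016:08:05-17 reds1:0 1
# 2016:08:05-18 reds1:0 2
```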
