RED50 - We lose our connection sporadically - what shall we do? 'overflow, missing keepalive, self re-loading'

Question

Hey Community, 
 for the past two weeks our connection has been unstable: a few times a day the connection to the UTM gets dropped. We can't tie it to specific times or actions. 
 All we can see is that a new connection is requested, the old one is released and disconnected, and then it connects again. This is followed by an overflow and a missing keepalive on reds1. Then we receive a keepalive again, and everything seems fine. 
 Below are two logs: the first is from a 1-second disconnect; the second (below the separator line) is from one that lasted about a minute. 
 We really need some advice, help, tips or tricks. 
 UTM is 9.403-4 
 
2016:08:05-17:17:54 astaro red_server[18646]: SELF: New connection from 123.123.123.123 with ID A12312312312312 (cipher AES256-GCM-SHA384), rev1
2016:08:05-17:17:55 astaro red_server[18646]: A12312312312312: already connected, releasing old connection.
2016:08:05-17:17:56 astaro red_server[29422]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A12312312312312" forced="1"
2016:08:05-17:17:56 astaro red_server[29422]: A12312312312312 is disconnected.
2016:08:05-17:18:01 astaro red_server[18646]: A12312312312312: connected OK, pushing config
2016:08:05-17:17:58 astaro red2ctl[4301]: Overflow happened on reds1:0
2016:08:05-17:17:59 astaro red2ctl[4301]: Missing keepalive from reds1:0, disabling peer 123.123.123.123
2016:08:05-17:18:02 astaro red2ctl[4301]: Received keepalive from reds1:0, enabling peer 123.123.123.123
2016:08:05-17:18:04 astaro red_server[18646]: A12312312312312: command 'UMTS_STATUS value=OK'
2016:08:05-17:18:04 astaro red_server[18646]: A12312312312312: command 'PORTSTATE 1E04,1004,1004,1004,1E04'
2016:08:05-17:18:04 astaro red_server[18646]: A12312312312312: PORTSTATE LAN1: 1Gb/s,LAN2: Down,LAN3: Down,LAN4: Down
2016:08:05-17:18:06 astaro red_server[18646]: A12312312312312: command 'PING 0 uplink=WAN uplinkstate=0'
2016:08:05-17:18:06 astaro red_server[18646]: id="4201" severity="info" sys="System" sub="RED" name="RED Tunnel Up" red_id="A12312312312312" forced="0"
2016:08:05-17:18:06 astaro red_server[18646]: A12312312312312: PING remote_tx=0 local_rx=0 diff=0
2016:08:05-17:18:06 astaro red_server[18646]: A12312312312312: PONG local_tx=0
2016:08:05-17:18:07 astaro red_server[4291]: SELF: (Re-)loading device configurations
2016:08:05-17:18:20 astaro red_server[18646]: A12312312312312: command 'PORTSTATE 1E04,1004,1004,1004,1E04'
2016:08:05-17:18:20 astaro red_server[18646]: A12312312312312: PORTSTATE LAN1: 1Gb/s,LAN2: Down,LAN3: Down,LAN4: Down
2016:08:05-17:18:21 astaro red_server[18646]: A12312312312312: command 'PING 0 uplink=WAN uplinkstate=0'
2016:08:05-17:18:21 astaro red_server[18646]: A12312312312312: PING remote_tx=0 local_rx=0 diff=0
2016:08:05-17:18:21 astaro red_server[18646]: A12312312312312: PONG local_tx=0
2016:08:05-17:18:34 astaro red_server[18646]: A12312312312312: command 'PORTSTATE 1E04,1004,1004,1004,1E04'
2016:08:05-17:18:34 astaro red_server[18646]: A12312312312312: PORTSTATE LAN1: 1Gb/s,LAN2: Down,LAN3: Down,LAN4: Down
2016:08:05-17:18:34 astaro red_server[18646]: A12312312312312: command 'PING 0 uplink=WAN uplinkstate=0'
2016:08:05-17:18:34 astaro red_server[18646]: A12312312312312: PING remote_tx=0 local_rx=0 diff=0
2016:08:05-17:18:34 astaro red_server[18646]: A12312312312312: PONG local_tx=0
 ______________________________________________________________________________ 
2016:08:05-18:05:18 astaro red_server[27294]: SELF: New connection from 123.123.123.123 with ID A12312312312312 (cipher AES256-GCM-SHA384), rev1
2016:08:05-18:05:33 astaro red_server[27294]: A12312312312312: already connected, releasing old connection.
2016:08:05-18:05:35 astaro red_server[18646]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A12312312312312" forced="1"
2016:08:05-18:05:35 astaro red_server[18646]: A12312312312312 is disconnected.
2016:08:05-18:05:38 astaro red2ctl[4301]: Overflow happened on reds1:0
2016:08:05-18:05:38 astaro red2ctl[4301]: Missing keepalive from reds1:0, disabling peer 123.123.123.123
2016:08:05-18:05:41 astaro red2ctl[4301]: Received keepalive from reds1:0, enabling peer 123.123.123.123
2016:08:05-18:05:47 astaro red_server[27316]: SELF: New connection from 79.214.245.190 with ID A12312312312312 (cipher AES256-GCM-SHA384), rev1
2016:08:05-18:05:47 astaro red_server[27316]: A12312312312312: already connected, releasing old connection.
2016:08:05-18:05:48 astaro red_server[27316]: A12312312312312: seems to be still connected, exiting.
2016:08:05-18:06:06 astaro red_server[27294]: A12312312312312: connected OK, pushing config
2016:08:05-18:06:09 astaro red_server[4291]: SELF: (Re-)loading device configurations
2016:08:05-18:06:11 astaro red2ctl[4301]: Missing keepalive from reds1:0, disabling peer 123.123.123.123
2016:08:05-18:06:16 astaro red_server[4291]: SELF: (Re-)loading device configurations
2016:08:05-18:06:36 astaro red_server[27294]: A12312312312312: No ping for 30 seconds, exiting.
2016:08:05-18:06:36 astaro red_server[27294]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A12312312312312" forced="0"
2016:08:05-18:06:36 astaro red_server[27294]: A12312312312312 is disconnected.
2016:08:05-18:06:57 astaro red_server[27567]: SELF: New connection from 123.123.123.123 with ID A12312312312312 (cipher AES256-GCM-SHA384), rev1
2016:08:05-18:06:57 astaro red_server[27567]: A12312312312312: connected OK, pushing config
2016:08:05-18:07:01 astaro red_server[27567]: A12312312312312: command 'UMTS_STATUS value=OK'
2016:08:05-18:07:01 astaro red_server[27567]: A12312312312312: command 'PORTSTATE 1E04,1004,1004,1004,1E04'
2016:08:05-18:07:01 astaro red_server[27567]: A12312312312312: PORTSTATE LAN1: 1Gb/s,LAN2: Down,LAN3: Down,LAN4: Down
2016:08:05-18:07:01 astaro red_server[27567]: A12312312312312: command 'PING 0 uplink=WAN uplinkstate=0'
2016:08:05-18:07:01 astaro red_server[27567]: id="4201" severity="info" sys="System" sub="RED" name="RED Tunnel Up" red_id="A12312312312312" forced="0"
2016:08:05-18:07:01 astaro red_server[27567]: A12312312312312: PING remote_tx=0 local_rx=0 diff=0
2016:08:05-18:07:01 astaro red_server[27567]: A12312312312312: PONG local_tx=0
2016:08:05-18:07:02 astaro red2ctl[4301]: Overflow happened on reds1:0
2016:08:05-18:07:02 astaro red2ctl[4301]: Missing keepalive from reds1:0, disabling peer 123.123.123.123
2016:08:05-18:07:05 astaro red2ctl[4301]: Received keepalive from reds1:0, enabling peer 123.123.123.123
2016:08:05-18:07:08 astaro red_server[4291]: SELF: (Re-)loading device configurations

sachingurung · Answer

Hi Jason, 
 I wanted the ticket# to look into the case history. Thanks for that. 
 All, if the issue is not resolved in the latest release, ask Support to open a remote session and disable fast_failover on the RED. Monitor the RED tunnel after fast_failover has been disabled. 
 If the issue still persists after the firmware upgrade, collect kernel.log and red.log plus a tcpdump capture on ports 3400 and 3410, and send them to Support. Request an escalation once you have provided the required information. 
 Thanks
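
For anyone monitoring the tunnel after a change like this, a small script can make the drops easier to count than reading red.log by eye. The following is only a rough sketch, assuming the log has one entry per line in the same format as the excerpts in the question (the "RED Tunnel Down" / "RED Tunnel Up" events with a red_id field). The function name tunnel_outages is made up for illustration and is not part of any Sophos tooling.

```python
import re
from datetime import datetime

# Matches the timestamp and the tunnel event fields as they appear in the
# log excerpts in this thread; other firmware versions may log differently.
EVENT_RE = re.compile(
    r'^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) \S+ red_server\[\d+\]: '
    r'.*name="RED Tunnel (?P<state>Up|Down)" red_id="(?P<red_id>[^"]+)"'
)

def tunnel_outages(lines):
    """Return (red_id, down_time, up_time, seconds) for each Down -> Up pair."""
    down_since = {}   # red_id -> time of the first unmatched "Tunnel Down"
    outages = []
    for line in lines:
        m = EVENT_RE.match(line.strip())
        if not m:
            continue  # not a tunnel up/down event
        ts = datetime.strptime(m["ts"], "%Y:%m:%d-%H:%M:%S")
        red_id = m["red_id"]
        if m["state"] == "Down":
            # Keep the earliest Down if several arrive before the next Up.
            down_since.setdefault(red_id, ts)
        elif red_id in down_since:
            start = down_since.pop(red_id)
            outages.append((red_id, start, ts, (ts - start).total_seconds()))
    return outages
```

To use it, pass any iterable of log lines, for example an open file handle on a saved copy of red.log, and compare the number and length of outages before and after the change. Note that it measures the time between the logged Down and Up events, which can be longer than the interruption users actually notice.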