
Disconnect Loop RED 15

Hi,

I have a very strange problem with the new RED 15.

Setting:

UTM 9.350-12 at the main office

RED 15 with static IP behind an LTE-router at the remote location

After the first configuration everything works fine. But after some hours the RED disconnects and reconnects every minute.

After a reboot of the UTM (or if I deactivate the RED for some hours) the connection is stable for some hours.

Here are some lines out of the RED log:

2015:11:10-16:42:48 che-igw01 red_server[20657]: A350124B7XXXXXX: command 'PING 0 uplink=WAN'
2015:11:10-16:42:48 che-igw01 red_server[20657]: A350124B7XXXXXX: PING remote_tx=0 local_rx=0 diff=0
2015:11:10-16:42:48 che-igw01 red_server[20657]: A350124B7XXXXXX:: PONG local_tx=0
2015:11:10-16:42:52 che-igw01 red_server[20939]: SELF: New connection from 2.200.175.176 with ID A350124B7XXXXXX: (cipher AES256-GCM-SHA384), rev1
2015:11:10-16:42:52 che-igw01 red_server[20939]: A350124B7XXXXXX:: already connected, releasing old connection.
2015:11:10-16:42:52 che-igw01 red_server[20657]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A350124B7XXXXXX" forced="1"
2015:11:10-16:42:52 che-igw01 red_server[20657]: A350124B7XXXXXX: is disconnected.
2015:11:10-16:42:52 che-igw01 red2ctl[4266]: Overflow happened on reds2:0
2015:11:10-16:42:52 che-igw01 red2ctl[4266]: Missing keepalive from reds2:0, disabling peer 2.200.XXX.XXX
2015:11:10-16:42:52 che-igw01 red_server[4255]: SELF: (Re-)loading device configurations
2015:11:10-16:42:53 che-igw01 red_server[20939]: A350124B7XXXXXX:: connected OK, pushing config
2015:11:10-16:42:53 che-igw01 red_server[20939]: A350124B7XXXXXX:: Sending PEERS+178.15.XXX.XXX
2015:11:10-16:42:57 che-igw01 red_server[20939]: A350124B7XXXXXX:: command 'UMTS_STATUS value=OK'
2015:11:10-16:42:57 che-igw01 red_server[20939]: A350124B7XXXXXX:: command 'PING 0 uplink=WAN'
2015:11:10-16:42:57 che-igw01 red_server[20939]: id="4201" severity="info" sys="System" sub="RED" name="RED Tunnel Up" red_id="A350124B7XXXXXX:" forced="0"
2015:11:10-16:42:57 che-igw01 red_server[20939]: A350124B7XXXXXX:: PING remote_tx=0 local_rx=0 diff=0
2015:11:10-16:42:57 che-igw01 red_server[20939]: A350124B7XXXXXX:: PONG local_tx=0
2015:11:10-16:42:58 che-igw01 red_server[4255]: SELF: (Re-)loading device configurations
2015:11:10-16:42:59 che-igw01 red2ctl[4266]: Missing keepalive from reds2:0, disabling peer 2.200.XXX.XXX
2015:11:10-16:43:02 che-igw01 red2ctl[4266]: Received keepalive from reds2:0, enabling peer 2.200.XXX.XXX
2015:11:10-16:43:11 che-igw01 red_server[20939]: A350124B7XXXXXX:: command 'PING 0 uplink=WAN'
2015:11:10-16:43:11 che-igw01 red_server[20939]: A350124B7XXXXXX:: PING remote_tx=0 local_rx=0 diff=0
2015:11:10-16:43:11 che-igw01 red_server[20939]: A350124B7XXXXXX:: PONG local_tx=0
2015:11:10-16:43:26 che-igw01 red_server[20939]: A350124B7XXXXXX:: command 'PING 0 uplink=WAN'
2015:11:10-16:43:26 che-igw01 red_server[20939]: A350124B7XXXXXX:: PING remote_tx=0 local_rx=0 diff=0
2015:11:10-16:43:26 che-igw01 red_server[20939]: A350124B7XXXXXX:: PONG local_tx=0
2015:11:10-16:43:30 che-igw01 red_server[21136]: SELF: New connection from 2.200.XXX.XXX with ID A350124B7XXXXXX: (cipher AES256-GCM-SHA384), rev1
2015:11:10-16:43:30 che-igw01 red_server[21136]: A350124B7XXXXXX:: already connected, releasing old connection.
2015:11:10-16:43:30 che-igw01 red_server[20939]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A350124B7XXXXXX:" forced="1"
2015:11:10-16:43:31 che-igw01 red_server[20939]: A350124B7XXXXXX: is disconnected.
2015:11:10-16:43:31 che-igw01 red_server[4255]: SELF: (Re-)loading device configurations
2015:11:10-16:43:32 che-igw01 red2ctl[4266]: Overflow happened on reds2:0
2015:11:10-16:43:32 che-igw01 red2ctl[4266]: Missing keepalive from reds2:0, disabling peer 2.200.XXX.XXX
2015:11:10-16:43:32 che-igw01 red_server[21136]: A350124B7XXXXXX:: connected OK, pushing config
2015:11:10-16:43:32 che-igw01 red_server[21136]: A350124B7XXXXXX:: Sending PEERS+178.15.XXX.XXX
2015:11:10-16:43:35 che-igw01 red2ctl[4266]: Overflow happened on reds2:0
2015:11:10-16:43:35 che-igw01 red2ctl[4266]: Missing keepalive from reds2:0, disabling peer 2.200.XXX.XXX
2015:11:10-16:43:35 che-igw01 red_server[21136]: A350124B7XXXXXX:: command 'UMTS_STATUS value=OK'
2015:11:10-16:43:35 che-igw01 red_server[21136]: A350124B7XXXXXX:: command 'PING 0 uplink=WAN'
2015:11:10-16:43:35 che-igw01 red_server[21136]: id="4201" severity="info" sys="System" sub="RED" name="RED Tunnel Up" red_id="A350124B7XXXXXX:" forced="0"
2015:11:10-16:43:35 che-igw01 red_server[21136]: A350124B7XXXXXX:: PING remote_tx=0 local_rx=0 diff=0
2015:11:10-16:43:35 che-igw01 red_server[21136]: A350124B7XXXXXX:: PONG local_tx=0
2015:11:10-16:43:41 che-igw01 red2ctl[4266]: Received keepalive from reds2:0, enabling peer 2.200.XXX.XX

Other RED devices (RED10) at the same UTM work fine.
Any ideas?
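To put a number on the "every minute" loop, the RED log can be scanned for the `RED Tunnel Up` / `RED Tunnel Down` event lines. The following is only a rough sketch: the function names are made up for illustration, the sample lines are copied from the excerpt above, and stripping the trailing `:` from `red_id` is an assumption based on how the ID appears in this particular log.

```python
import re
from datetime import datetime

# Matches event lines such as:
# 2015:11:10-16:42:52 ... name="RED Tunnel Down" red_id="..." forced="1"
EVENT_RE = re.compile(
    r'^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) .*'
    r'name="RED Tunnel (?P<state>Up|Down)" red_id="(?P<red_id>[^"]*)"'
)

def tunnel_events(lines):
    """Return (timestamp, state, red_id) for every Tunnel Up/Down line."""
    events = []
    for line in lines:
        m = EVENT_RE.search(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y:%m:%d-%H:%M:%S")
            # Assumption: a trailing ':' in red_id is log noise, so drop it.
            events.append((ts, m.group("state"), m.group("red_id").rstrip(":")))
    return events

def seconds_between_downs(events):
    """Seconds between consecutive Tunnel Down events."""
    downs = [ts for ts, state, _ in events if state == "Down"]
    return [int((b - a).total_seconds()) for a, b in zip(downs, downs[1:])]

# Four event lines taken from the log excerpt above.
sample = [
    '2015:11:10-16:42:52 che-igw01 red_server[20657]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A350124B7XXXXXX" forced="1"',
    '2015:11:10-16:42:57 che-igw01 red_server[20939]: id="4201" severity="info" sys="System" sub="RED" name="RED Tunnel Up" red_id="A350124B7XXXXXX:" forced="0"',
    '2015:11:10-16:43:30 che-igw01 red_server[20939]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A350124B7XXXXXX:" forced="1"',
    '2015:11:10-16:43:35 che-igw01 red_server[21136]: id="4201" severity="info" sys="System" sub="RED" name="RED Tunnel Up" red_id="A350124B7XXXXXX:" forced="0"',
]

events = tunnel_events(sample)
print(seconds_between_downs(events))  # [38]
```

For the excerpt above, the two Tunnel Down events are 38 seconds apart.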



This thread was automatically locked due to age.
  • Could you please check the kernel log? Are there many lines 'auto-removing peer / Autoadd peer' for the respective RED15s?
  • Hi,

    you are right. This is part of my kernel log. Roughly every 45 seconds the peer is auto-added and auto-removed. Between these events there are a lot of martian source events. My ISP said that they don't block or restrict any ports.

    2015:12:22-11:16:02 che-igw01 kernel: [491552.771984] reds3: Autoadd peer 1 (from 188.103.XXX.XXX:3410 to 178.15.XXX.XXX:3410)
    2015:12:22-11:16:02 che-igw01 kernel: [491553.055368] net_ratelimit: 3 callbacks suppressed
    2015:12:22-11:16:02 che-igw01 kernel: [491553.055424] IPv4: martian source 188.103.XXX.XXX from 178.15.252.218, on dev eth2
    2015:12:22-11:16:02 che-igw01 kernel: [491553.055704] ll header: 00000000: ff ff ff ff ff ff 00 50 56 87 04 aa 08 06 .......PV.....
    2015:12:22-11:16:03 che-igw01 kernel: [491553.554991] reds3: auto-removing peer 188.103.XXX.XXX:3410
    2015:12:22-11:16:03 che-igw01 kernel: [491554.055227] IPv4: martian source 188.103.XXX.XXX from 178.15.252.218, on dev eth2
    2015:12:22-11:16:03 che-igw01 kernel: [491554.055239] ll header: 00000000: ff ff ff ff ff ff 00 50 56 87 04 aa 08 06 .......PV.....
    2015:12:22-11:16:04 che-igw01 kernel: [491555.055404] IPv4: martian source 188.103.XXX.XXX from 178.15.XXX.XXX, on dev eth2
    .....
    2015:12:22-11:16:43 che-igw01 kernel: [491594.280905] RX: decryption failed
    2015:12:22-11:16:44 che-igw01 kernel: [491594.773703] RX: decryption failed
    2015:12:22-11:16:44 che-igw01 kernel: [491595.072628] reds3: auto-removing peer 188.103.XXX.XXX:3410
    2015:12:22-11:16:44 che-igw01 kernel: [491595.281360] RX: decryption failed
    2015:12:22-11:16:45 che-igw01 kernel: [491595.779812] reds3: Autoadd peer 0 (from 188.103.XXX.XXX:3410 to 178.15.XXX.XXX:3410)
    2015:12:22-11:16:45 che-igw01 kernel: [491596.073977] IPv4: martian source 188.103.XXX.XXX from 178.15.XXX.XXX, on dev eth2
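To answer the "are there many lines" question without reading the kernel log by eye, the four message types seen in the excerpt above can simply be counted. This is a minimal sketch: the substrings are taken from the log lines in this post, while the category names and the function name are invented for illustration.

```python
from collections import Counter

# Substrings copied from the kernel log lines above; the keys are
# illustrative labels, not anything the UTM itself uses.
PATTERNS = {
    "autoadd": "Autoadd peer",
    "auto-remove": "auto-removing peer",
    "martian": "martian source",
    "decryption-failed": "RX: decryption failed",
}

def count_kernel_events(lines):
    """Count how many lines contain each of the known message types."""
    counts = Counter()
    for line in lines:
        for label, needle in PATTERNS.items():
            if needle in line:
                counts[label] += 1
                break  # each line belongs to at most one category
    return counts

# A few lines taken from the kernel log excerpt above.
sample = [
    "2015:12:22-11:16:02 che-igw01 kernel: [491552.771984] reds3: Autoadd peer 1 (from 188.103.XXX.XXX:3410 to 178.15.XXX.XXX:3410)",
    "2015:12:22-11:16:02 che-igw01 kernel: [491553.055424] IPv4: martian source 188.103.XXX.XXX from 178.15.252.218, on dev eth2",
    "2015:12:22-11:16:03 che-igw01 kernel: [491553.554991] reds3: auto-removing peer 188.103.XXX.XXX:3410",
    "2015:12:22-11:16:43 che-igw01 kernel: [491594.280905] RX: decryption failed",
    "2015:12:22-11:16:44 che-igw01 kernel: [491595.072628] reds3: auto-removing peer 188.103.XXX.XXX:3410",
    "2015:12:22-11:16:45 che-igw01 kernel: [491595.779812] reds3: Autoadd peer 0 (from 188.103.XXX.XXX:3410 to 178.15.XXX.XXX:3410)",
]

counts = count_kernel_events(sample)
print(dict(counts))
```

Running the same function over a full day of kernel log would show whether the add/remove pairs really recur at a steady rate.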

  • @reFresh: Those 'martian source' events should normally not be happening. As you wrote before, you tried different ISPs for the RED connection, so maybe there is an issue or a misconfiguration on the UTM side.
    Do you have multiple uplinks and use features like link aggregation, uplink balancing or similar?
    Also, when the UTM has only a single uplink, the RED configuration '2nd UTM hostname' should not be filled but left empty.
  • Thanks for your reply, joney. Yes, maybe there is a misconfiguration on the UTM side, but I have no idea what it could be. Multiple RED 10s work fine, and the RED 15 works fine too, but only for the first couple of hours before the disconnect loop starts. We don't use link aggregation, uplink balancing or multipath rules. There is no second uplink, and the "2nd UTM hostname" field is empty.
  • I'm also still getting the same problem. Very similar kernel logs as reFresh. 

    2016:03:22-15:12:46 gateway kernel: [1271257.703141] reds4: Autoadd peer 0 (from 220.xxx.90.xxx:3410 to 220.xxx.70.xxx:3410)
    2016:03:22-15:13:23 gateway kernel: [1271295.003595] reds4: auto-removing peer 220.xxx.90.xxx:3410
    2016:03:22-15:14:10 gateway kernel: [1271341.704244] reds4: Autoadd peer 0 (from 220.xxx.90.xxx:3410 to 220.xxx.70.xxx:3410)
    2016:03:22-15:15:45 gateway kernel: [1271436.670208] reds4: Autoadd peer 0 (from 220.xxx.90.xxx:3410 to 220.xxx.70.xxx:3410)
    2016:03:22-15:16:02 gateway kernel: [1271453.666149] RX: decryption failed
    2016:03:22-15:16:02 gateway kernel: [1271453.896073] RX: decryption failed
    2016:03:22-15:16:02 gateway kernel: [1271453.937344] RX: decryption failed
    2016:03:22-15:16:03 gateway kernel: [1271454.494169] reds4: auto-removing peer 220.xxx.90.xxx:3410
    2016:03:22-15:16:03 gateway kernel: [1271454.665877] RX: decryption failed
    2016:03:22-15:16:04 gateway kernel: [1271455.166114] reds4: Autoadd peer 0 (from 220.xxx.90.xxx:3410 to 220.xxx.70.xxx:3410)

    Any fixes I can use yet? I'm about to throw the RED15s (and RED50s) in the bin.

    I've started buying up RED10s off eBay to get me out of trouble with my existing clients, since they are now out of production...

    ------------

    Kevin

  • Yes, there is some new information I can give.

    We have been working with the support team for the past two months to solve the problem. Support implemented a few fixes on our firewall that solved the disconnect bug completely.

    We had another bug where connected REDs seemed to be online, but no device behind the RED could be reached. This bug was also fixed by Sophos for most devices. Only one RED15 device still has this bug, while a RED10 in its place works fine. So this RED15 gets swapped for a RED10 to "solve" the problem completely.

    Yesterday we received an email from Sophos informing us that all these fixes are implemented in UTM 9.4.

  • Thanks for the new details. I'm looking forward to the 9.4 update.

    In my case the current situation is like this:

    At one UTM my RED 15 works fine, and at another UTM (same version) the RED 15 disconnects frequently.

    The ISP is the same for both locations, but the connection type differs (copper and fibre), and so does the ISP router.

    The ISP said they don't block any ports.

    Hope the 9.4-version will solve this issue.

  • Did you test my workaround, where you simply ping a device behind the RED that disconnects? That should help you until 9.4 fully arrives.

    The Up2Date file is available for download on the Astaro FTP server: ftp.astaro.de/.../
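The ping workaround mentioned above can be left running as a small script on a machine at the main office. This is only a sketch under assumptions: it relies on the Linux `ping` options `-c` (count) and `-W` (timeout in seconds), the 10-second interval is a guess rather than a tested value, and `192.0.2.10` is a placeholder for a real device behind the RED.

```python
import subprocess
import time

def ping_command(host, timeout_s=2):
    """Build a single-packet ping command (Linux iputils syntax assumed)."""
    return ["ping", "-c", "1", "-W", str(timeout_s), host]

def keep_tunnel_busy(host, interval_s=10, rounds=None,
                     run=subprocess.run, sleep=time.sleep):
    """Ping `host` every `interval_s` seconds.

    Runs forever when `rounds` is None; otherwise stops after `rounds`
    pings. Returns the number of pings that were answered.
    """
    answered = 0
    done = 0
    while rounds is None or done < rounds:
        result = run(ping_command(host),
                     stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if result.returncode == 0:
            answered += 1
        done += 1
        sleep(interval_s)
    return answered

# Example call (placeholder address of a device behind the RED):
# keep_tunnel_busy("192.0.2.10")
```

The `run` and `sleep` parameters exist only so the loop can be tried out without sending real pings.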

  • Hello,

    I have a similar problem.

    We have two RED 15s and one RED 10. The RED 10 works, but the two RED 15s suddenly stopped working last night.

    The RED 15s are configured via USB stick, i.e. with a static IP.

    I have now downloaded the config from the UTM again and loaded it onto the RED 15 via USB stick, and suddenly it works again.

    This is what the log looks like at the time of the failure:

    2016:08:26-00:34:39 lhfw1 red2ctl[4235]: Missing keepalive from reds3:0, disabling peer xxx
    2016:08:26-00:34:39 lhfw1 red2ctl[4235]: Missing keepalive from reds2:0, disabling peer xxx
    2016:08:26-00:34:40 lhfw1 red_server[4240]: SELF: (Re-)loading device configurations
    2016:08:26-00:34:55 lhfw1 red_server[32314]: A3501B10xxx: No ping for 30 seconds, exiting.
    2016:08:26-00:34:55 lhfw1 red_server[32314]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A3501B10xxx" forced="0"
    2016:08:26-00:34:55 lhfw1 red_server[32314]: A3501B10xxx is disconnected.
    2016:08:26-00:35:01 lhfw1 red_server[18675]: A3200F15xxx: No ping for 30 seconds, exiting.
    2016:08:26-00:35:01 lhfw1 red_server[18675]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A3200F15xxx" forced="0"
    2016:08:26-00:35:01 lhfw1 red_server[18675]: A3200F15xxx is disconnected.
    2016:08:26-00:35:01 lhfw1 red_server[4240]: SELF: (Re-)loading device configurations
    2016:08:26-00:35:03 lhfw1 red_server[6200]: A3501B1xxx: No ping for 30 seconds, exiting.
    2016:08:26-00:35:03 lhfw1 red_server[6200]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A3501B17FAxxx" forced="0"
    2016:08:26-00:35:03 lhfw1 red_server[6200]: A3501B17xxx is disconnected.
    2016:08:26-00:35:58 lhfw1 redctl[17480]: key length: 32
    2016:08:26-00:35:58 lhfw1 redctl[17481]: key length: 32
    2016:08:26-00:35:58 lhfw1 red_server[17478]: SELF: New connection from xxx with ID A3200F154xxx (cipher AES256-GCM-SHA384), rev1<30>Aug 26 00:35:58 red_server[17478]: A3200F15xxx: connected OK, pushing config
    2016:08:26-00:36:01 lhfw1 red_server[17485]: SELF: New connection from xxx with ID A3200F1541xxx (cipher AES256-GCM-SHA384), rev1<30>Aug 26 00:36:01 red_server[17485]: A3200F1541xxx: already connected, releasing old connection.
    2016:08:26-00:36:01 lhfw1 red_server[17478]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A3200F154xxx" forced="1"
    2016:08:26-00:36:01 lhfw1 red_server[17478]: A3200F154xxx is disconnected.
    2016:08:26-00:36:02 lhfw1 redctl[17488]: key length: 32
    2016:08:26-00:36:02 lhfw1 redctl[17489]: key length: 32
    2016:08:26-00:36:02 lhfw1 red_server[17485]: A3200F1541xxx: connected OK, pushing config
    2016:08:26-00:36:06 lhfw1 red_server[17485]: A3200F1541xxx: command 'UMTS_STATUS value=OK'
    2016:08:26-00:36:06 lhfw1 red_server[17485]: A3200F1541xxx: command 'PING 0 uplink=WAN'
    2016:08:26-00:36:06 lhfw1 red_server[17485]: id="4201" severity="info" sys="System" sub="RED" name="RED Tunnel Up" red_id="A3200F154xxx" forced="0"
    2016:08:26-00:36:06 lhfw1 red_server[17485]: A3200F1541xxx: PING remote_tx=0 local_rx=0 diff=0
    2016:08:26-00:36:06 lhfw1 red_server[17485]: A3200F1541xxx: PONG local_tx=0
    2016:08:26-00:36:06 lhfw1 red_server[4240]: SELF: (Re-)loading device configurations
    2016:08:26-00:36:22 lhfw1 red_server[17485]: A3200F1541xxx: command 'PING 10 uplink=WAN'
    2016:08:26-00:36:22 lhfw1 red_server[17485]: A3200F1541xxx: PING remote_tx=10 local_rx=10 diff=0
    2016:08:26-00:36:22 lhfw1 red_server[17485]: A3200F1541xxx: PONG local_tx=1

    Does anyone have an idea what the cause was?

    Thank you
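One way to separate the two failure patterns in a log like the one above is to tally the `RED Tunnel Down` events by their `forced` flag. In the excerpts in this thread, `forced="0"` follows "No ping for 30 seconds, exiting" and `forced="1"` follows "already connected, releasing old connection". That reading is inferred from these logs only, not from Sophos documentation, and the function name below is made up for illustration.

```python
import re
from collections import Counter

# Captures the forced flag of every Tunnel Down event line.
DOWN_RE = re.compile(r'name="RED Tunnel Down" red_id="[^"]*" forced="([01])"')

def downs_by_forced(lines):
    """Count Tunnel Down events per value of the forced flag."""
    tally = Counter()
    for line in lines:
        m = DOWN_RE.search(line)
        if m:
            tally[m.group(1)] += 1
    return tally

# Tunnel Down lines taken from the log excerpt above.
sample = [
    '2016:08:26-00:34:55 lhfw1 red_server[32314]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A3501B10xxx" forced="0"',
    '2016:08:26-00:35:01 lhfw1 red_server[18675]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A3200F15xxx" forced="0"',
    '2016:08:26-00:35:03 lhfw1 red_server[6200]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A3501B17FAxxx" forced="0"',
    '2016:08:26-00:36:01 lhfw1 red_server[17478]: id="4202" severity="info" sys="System" sub="RED" name="RED Tunnel Down" red_id="A3200F154xxx" forced="1"',
]

tally = downs_by_forced(sample)
print(dict(tally))  # {'0': 3, '1': 1}
```

In this excerpt, three of the four Tunnel Down events have `forced="0"`, each one preceded by a "No ping for 30 seconds" line.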