High CPU load leads to packet loss + kernel spinning in nf_conntrack_tuple_taken

We currently have a problem that occurs routinely between 06:20 and 07:00 am. What we have found is that the SG310 appliance starts losing packets as the load rises. Packet loss under high load is somewhat understandable, but we can't find the source of the load: there is no clear indication that an unusually large number of packets is passing through the firewall.


We see very high software interrupt (softirq) load on both network cards, which completely saturates the corresponding CPU cores. During the day we handle about 90000 connections without any problems. The conntrack count (conntrackd -s) shows lower values than our daytime peaks. Traffic on the firewall (iftop) and on our switches does not show anything unusual.
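To put numbers on the softirq load described above, one option is to diff two snapshots of /proc/softirqs and look at the per-CPU NET_RX increase. The following is only a minimal sketch: the function names are made up for illustration, and it assumes the appliance exposes the standard Linux /proc/softirqs file and has a Python interpreter available.

```python
import time


def parse_softirqs(text):
    """Parse the text of /proc/softirqs into (cpu_names, {irq_name: [counts]})."""
    lines = text.strip().splitlines()
    cpus = lines[0].split()
    counts = {}
    for line in lines[1:]:
        name, _, values = line.partition(":")
        counts[name.strip()] = [int(v) for v in values.split()]
    return cpus, counts


def softirq_delta(before, after, kind="NET_RX"):
    """Return {cpu: increase of `kind` between two snapshots}."""
    _, old = parse_softirqs(before)
    cpus, new = parse_softirqs(after)
    return {cpu: n - o for cpu, o, n in zip(cpus, old[kind], new[kind])}


def sample_live(interval=1.0, path="/proc/softirqs"):
    """Take two live snapshots `interval` seconds apart (Linux only)."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return softirq_delta(before, after)
```

Calling sample_live() once during the day and once inside the 06:20 to 07:00 window would show whether the NET_RX rate itself jumps, or whether the same packet rate simply becomes more expensive to process.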

Perf shows:

+  25.46%      ksoftirqd/0  [kernel.kallsyms]                [k] nf_conntrack_tuple_taken
+  23.68%      kworker/1:1  [kernel.kallsyms]                [k] nf_conntrack_tuple_taken
+  10.17%      ksoftirqd/1  [kernel.kallsyms]                [k] nf_conntrack_tuple_taken
+   9.69%       conntrackd  [kernel.kallsyms]                [k] nf_conntrack_tuple_taken
+   2.26%        confd.plx  [kernel.kallsyms]                [k] nf_conntrack_tuple_taken
+   2.14%          swapper  [kernel.kallsyms]                [k] nf_conntrack_tuple_taken
+   2.01%          mdw.plx  [kernel.kallsyms]                [k] nf_conntrack_tuple_taken
+   1.62%        confd.plx  libperl.so                       [.] Perl_hv_common
+   1.31%      ksoftirqd/0  [kernel.kallsyms]                [k] hash_conntrack_raw
+   1.15%      kworker/1:1  [kernel.kallsyms]                [k] hash_conntrack_raw
+   0.69%      ksoftirqd/0  [kernel.kallsyms]                [k] nf_ct_tuple_equal
+   0.67%      kworker/1:1  [kernel.kallsyms]                [k] nf_ct_tuple_equal
+   0.62%          mdw.plx  libperl.so                       [.] Perl_hv_common
+   0.57%      ksoftirqd/0  [kernel.kallsyms]                [k] nf_ct_invert_tuple
+   0.56%      ksoftirqd/1  [kernel.kallsyms]                [k] hash_conntrack_raw
+   0.51%      kworker/1:1  [kernel.kallsyms]                [k] nf_ct_invert_tuple
+   0.50%         postgres  [kernel.kallsyms]                [k] nf_conntrack_tuple_taken
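As far as I can tell from the kernel source, nf_conntrack_tuple_taken() walks one bucket of the conntrack hash table to check whether a tuple is already in use, so its cost grows with the length of the hash chains. A rough way to check for overlong chains is to compare the number of tracked connections with the number of hash buckets. This is a sketch under assumptions: the helper names are hypothetical, and it assumes the standard sysctl files nf_conntrack_count and nf_conntrack_buckets exist under /proc/sys/net/netfilter on this kernel.

```python
from pathlib import Path

# Assumed location of the conntrack sysctls on a stock Linux kernel.
PROC = Path("/proc/sys/net/netfilter")


def read_int(name, base=PROC):
    """Read one integer sysctl value, e.g. nf_conntrack_count."""
    return int((Path(base) / name).read_text().strip())


def chain_load(count, buckets):
    """Average hash entries per bucket.

    Each tracked connection is hashed twice (original and reply tuple),
    hence the factor of 2.
    """
    return 2 * count / buckets


def report(base=PROC):
    """Print the current average chain length (Linux only)."""
    count = read_int("nf_conntrack_count", base)
    buckets = read_int("nf_conntrack_buckets", base)
    print(f"{count} connections, {buckets} buckets, "
          f"avg chain length {chain_load(count, buckets):.2f}")
```

An average far above 1 would only be a hint, not proof: a few very long chains (for example many connections sharing nearly identical tuples) can hurt even when the average looks fine.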

We have disabled all reporting and reduced retention.


There is nothing strange in the logs. mdw and the service monitor go crazy because the appliance has general network problems at that point (reaching the internet and/or the internal network).


The packet filter log does not show irregular traffic hitting the appliance.


When we fail over, the second unit immediately starts spiking with load (it had no load before), so this rules out a hardware defect.


It is either an internal cron job that makes the load explode via some bug in conntrack handling (it happens before the admbs maintenance!), or we have packets hitting the firewall (or empty/dangling connections) whose source we can't find.

It is as if the cause is completely hidden from us.

1. Any ideas for debugging?

2. Where would we see if this is a "new connection" issue?


When the problem occurs again, is there a safe way to deactivate conntrack?
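On the "new connection" question, one place to look is /proc/net/stat/nf_conntrack, which holds per-CPU conntrack counters in hexadecimal under a header row of column names. The sketch below sums a column across CPUs and turns two snapshots into a rate. The function names are hypothetical, and which columns are populated is an assumption: older kernels fill the "new" column, while newer ones may report zero there, in which case "insert" is the closer substitute.

```python
def parse_ct_stats(text):
    """Sum each column of /proc/net/stat/nf_conntrack over all CPU rows.

    The first line holds the column names; every following line is one
    CPU with hexadecimal counters. Note that "entries" is a global value
    repeated on every row, so its sum is not meaningful.
    """
    lines = text.strip().splitlines()
    header = lines[0].split()
    totals = dict.fromkeys(header, 0)
    for line in lines[1:]:
        for name, value in zip(header, line.split()):
            totals[name] += int(value, 16)
    return totals


def rate(before, after, column, seconds):
    """Events per second for `column` between two snapshots."""
    old = parse_ct_stats(before)[column]
    new = parse_ct_stats(after)[column]
    return (new - old) / seconds
```

Besides "new" or "insert", rising "insert_failed", "drop", "early_drop" or "search_restart" values during the problem window would point at conntrack table pressure rather than plain packet volume.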



This thread was automatically locked due to age.
  • Hi, Max, and welcome to the UTM Community!

    Assuming that you got Sophos Support involved, can you update us on the cause of the problem?  I would have tried a restore of a configuration backup from before this problem started.

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA