This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

UTM slow - confd.plx restarting continuously - high CPU - DNS queries timing out

Symptoms:

  • High CPU
  • DNS queries timing out
  • confd.plx restarting continuously
  • Numerous errors in confd.log and confd-debug.log

Since DNS-queries are timing out none of the clients can do much on the WWW.


 

Top snippets:

 

top - 13:23:36 up 3:12, 1 user, load average: 0.54, 0.61, 0.87
Tasks: 155 total, 2 running, 151 sleeping, 0 stopped, 2 zombie
Cpu(s): 58.3%us, 3.8%sy, 0.0%ni, 35.5%id, 1.5%wa, 0.0%hi, 0.8%si, 0.0%st
Mem: 4037776k total, 3686020k used, 351756k free, 118996k buffers
Swap: 2097148k total, 0k used, 2097148k free, 1688392k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
32165 postgres 20 0 608m 60m 14m R 95 1.5 0:07.26 postgres
32167 root 20 0 0 0 0 Z 16 0.0 0:00.58 confd.plx <defunct>
32169 root 20 0 67692 47m 1568 S 4 1.2 0:00.11 confd.plx
6078 httpprox 20 0 1229m 711m 15m S 3 18.0 14:03.55 httpproxy
4669 postgres 20 0 564m 25m 23m S 2 0.6 0:44.51 postgres
4480 snort 15 -5 149m 93m 3280 S 1 2.4 1:19.01 snort
4335 root 20 0 116m 88m 3108 S 1 2.2 0:45.30 named


top - 13:22:45 up 3:11, 1 user, load average: 0.53, 0.62, 0.89
Tasks: 154 total, 1 running, 151 sleeping, 0 stopped, 2 zombie
Cpu(s): 10.2%us, 2.7%sy, 0.0%ni, 86.1%id, 0.7%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 4037776k total, 3637864k used, 399912k free, 118780k buffers
Swap: 2097148k total, 0k used, 2097148k free, 1687744k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
32096 root 20 0 0 0 0 Z 15 0.0 0:00.55 confd.plx <defunct>
32097 root 20 0 67692 47m 1568 S 3 1.2 0:00.10 confd.plx
3888 root 20 0 12104 10m 2568 S 2 0.3 2:14.05 selfmonng.plx
4669 postgres 20 0 564m 25m 23m S 1 0.6 0:44.29 postgres
6078 httpprox 20 0 1229m 711m 15m S 1 18.0 14:01.23 httpproxy
3 root 20 0 0 0 0 S 0 0.0 0:23.67 ksoftirqd/0
3413 upsd 20 0 2420 620 376 S 0 0.0 0:02.45 usbhid-ups


 

confd.log

2017:06:03-14:39:28 utm confd[3330]: W main::cleanup_sessions:1038() => id="3100" severity="warn" sys="System" sub="confd" name="timeout: removing session" user="system" srcip="127.0.0.1" sid="MvqaGDewxZanMJxGwpgD" facility="system" client="websecreporter"
2017:06:03-14:39:28 utm confd[3330]: W main::cleanup_sessions:1038() => id="3100" severity="warn" sys="System" sub="confd" name="timeout: removing session" user="system" srcip="127.0.0.1" sid="PacitgDMItOdLcPDtoaI" facility="system" client="httpproxy"
2017:06:03-14:39:28 utm confd[3330]: W main::cleanup_sessions:1038() => id="3100" severity="warn" sys="System" sub="confd" name="timeout: removing session" user="system" srcip="127.0.0.1" sid="PRZIUUWbDgBcyfihpgho" facility="system" client="websecreporter"
2017:06:03-14:39:28 utm confd[3330]: W main::cleanup_sessions:1038() => id="3100" severity="warn" sys="System" sub="confd" name="timeout: removing session" user="system" srcip="127.0.0.1" sid="ryXbhyPUcPlZOZHvlGJJ" facility="system" client="websecreporter"
2017:06:03-14:39:28 utm confd[3330]: W main::cleanup_sessions:1038() => id="3100" severity="warn" sys="System" sub="confd" name="timeout: removing session" user="system" srcip="127.0.0.1" sid="sFwRauMfraAYcEVebJMS" facility="system" client="websecreporter"
2017:06:03-14:39:28 utm confd[3330]: W main::cleanup_sessions:1038() => id="3100" severity="warn" sys="System" sub="confd" name="timeout: removing session" user="system" srcip="127.0.0.1" sid="tuJFxZMjmTMNRaIMkMWx" facility="system" client="websecreporter"

confd-debug.log

2017:06:03-14:43:09 utm confd[13776]: D Session::terminate:290() => id="3100" severity="debug" sys="System" sub="confd" name="closing session" user="system" srcip="127.0.0.1" sid="fVCoGqJzWcxvAMmqpfNl" facility="system" client="awed [master]" call="logout" function="logout"
2017:06:03-14:43:09 utm confd[13776]: D sys::DESTROY:231() => id="3100" severity="debug" sys="System" sub="confd" name="worker process exiting" user="system" srcip="127.0.0.1" facility="system" client="awed [master]"
2017:06:03-14:43:09 utm confd[13779]: D Role::authenticate:185() => id="3106" severity="debug" sys="System" sub="confd" name="authentication successful" user="system" srcip="127.0.0.1" sid="PQTsCyFxQcgrqRhEoBNO" facility="system" client="awed [master]" call="new"<31>Jun 3 14:43:09 confd[13779]: D sys::AUTOLOAD:303() => id="3100" severity="debug" sys="System" sub="confd" name="external call" user="system" srcip="127.0.0.1" facility="system" client="awed [master]" lock="none" method="get_SID"
2017:06:03-14:43:09 utm confd[13779]: D sys::AUTOLOAD:303() => id="3100" severity="debug" sys="System" sub="confd" name="external call" user="system" srcip="127.0.0.1" facility="system" client="awed [master]" lock="none" method="get_objects"
2017:06:03-14:43:12 utm confd[13779]: D sys::AUTOLOAD:303() => id="3100" severity="debug" sys="System" sub="confd" name="external call" user="system" srcip="127.0.0.1" facility="system" client="awed [master]" lock="none" method="logout"
2017:06:03-14:43:12 utm confd[13779]: D Session::terminate:290() => id="3100" severity="debug" sys="System" sub="confd" name="closing session" user="system" srcip="127.0.0.1" sid="PQTsCyFxQcgrqRhEoBNO" facility="system" client="awed [master]" call="logout" function="logout"
2017:06:03-14:43:12 utm confd[13779]: D sys::DESTROY:231() => id="3100" severity="debug" sys="System" sub="confd" name="worker process exiting" user="system" srcip="127.0.0.1" facility="system" client="awed [master]"
2017:06:03-14:43:12 utm confd[13780]: D Role::authenticate:185() => id="3106" severity="debug" sys="System" sub="confd" name="authentication successful" user="system" srcip="127.0.0.1" sid="PcQGchybtWFXUUaNcMEb" facility="system" client="awed [master]" call="new"<31>Jun 3 14:43:12 confd[13780]: D sys::AUTOLOAD:303() => id="3100" severity="debug" sys="System" sub="confd" name="external call" user="system" srcip="127.0.0.1" facility="system" client="awed [master]" lock="none" method="get_SID"

http.log (numerous entries like this one)

2017:06:03-12:59:57 utm httpproxy[6078]: id="0002" severity="info" sys="SecureWeb" sub="http" name="web request blocked" action="block" method="CONNECT" srcip="10.xx.xx.xx" dstip="" user="" group="" ad_domain="" statuscode="502" cached="0" profile="REF_HttProContaInterNetwo12 (Profile - eReaders)" filteraction="REF_HttCffFilteEread (Filter - eReaders)" size="0" request="0xde4d1600" url="https://s3.amazonaws.com/" referer="" error="Host not found" authtime="0" dnstime="7" cattime="299" avscantime="0" fullreqtime="321950" device="0" auth="0" ua="" exceptions="" overridereputation="1" category="177" reputation="trusted" categoryname="Content Server"


 

I recently (sometime within the past week) updated to version 9.500-9. In addition, I created a number of additional Web Protection Policies and Filter Actions yesterday. However, it was running reasonably well with the new policies and actions for most of the day yesterday.

I've rebooted several times today and for the first couple of minutes the UTM is usable, but then it deteriorates and becomes unresponsive.



This thread was automatically locked due to age.
  • Hi Arie,

    Looking at the present condition of the top output, the Load Average and Memory is normal which means the UTM is not overloaded with the heavy flow of traffic. I would still like to know what is the model of UTM and how many concurrent users are connected? Is it an HA deployment?

    Keep a close eye on what causes the confd to create the high CPU condition. I think it requires more monitoring than specific troubleshooting steps. Arrange a down time and give a restart to the appliance, in case there are any mdw and confd processes that are stuck and causing high CPU.

    Cheers-

    Sachin Gurung
    Team Lead | Sophos Technical Support
    Knowledge Base  |  @SophosSupport  |  Video tutorials
    Remember to like a post.  If a post (on a question thread) solves your question use the 'This helped me' link.

  • This is a shot in the dark, Arie - what happens if you rebuild the PostgreSQL databases? Caution, although this does not affect the log files, it does delete all of the history in graphs and Reporting.

    /etc/init.d/postgresql92 rebuild

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • Since the device is dealing with LAN <-> WAN only and the ISP doesn't provide more than 3 Mbps, I'm not sure I would classify this as heavy traffic. The number of requests is also relatively low.

    It does seem to be web filtering, as disabling it drops the CPU-load to below 50%. Re-enabling it increases the load to 80-90%.

    I've also disabled activity logging on all Filter Actions, but that made little (if any) difference on the CPU-load.

    The CPU is a dual-core Atom D2500 running at 1.86GHz. Perhaps that's insufficient?

  • I'll give that (i.e. rebuilding the DB) a try if nothing else fixes it...

  • Ah, there was a typo in my previous answer. I mean the Load average and memory seems normal which indicates that the traffic is normal on the device. I have edited my previous comment. 

    What is the AV scan mode configured for Web Filtering in the Anti Virus tab, when you edit the filter actions. Configure it to single scan for maximum performance.

    Thanks

    Sachin Gurung
    Team Lead | Sophos Technical Support
    Knowledge Base  |  @SophosSupport  |  Video tutorials
    Remember to like a post.  If a post (on a question thread) solves your question use the 'This helped me' link.

  • Hi, can you please tell if the pharming protection is enabled? Just curious.

  • Dual AV scan was enabled for most Filter Actions. After I changed it to "single scan" the CPU-load dropped by 10-15% for 15-30 minutes, but after that it inched up again.

    However, since Wednesday the average CPU-load has been relatively - around 50%. This happened after I disabled Web Protection and re-enabled it.

    Postgres still jumps to 99% regularly, so I might take a look at that next.

  • GolfBisho said:

    Hi, can you please tell if the pharming protection is enabled? Just curious.

     

    Where would I find the Pharming Protection option?

  • At the bottom of the 'Misc' tab in 'Filtering Options' in 'Web Protection'.

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA