
Huge CPU loads on edge servers related to concurrency?

Hello all,

Recently we have been getting really high loads sporadically on one or two servers, and I am trying to figure out whether it's related to our concurrency limits being set too high. Here is an excerpt from prstat when the load goes really high:

   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
  5288 pmx       149M  119M run     46    0   0:00:13 2.3% pmx-milter/2
  5301 pmx       149M  119M run     47    0   0:00:13 2.3% pmx-milter/2
  5344 pmx       145M  109M run     38    0   0:00:10 2.2% pmx-milter/2
  5338 pmx       145M  110M run     46    0   0:00:10 2.2% pmx-milter/2
  5307 pmx       149M  119M run     39    0   0:00:13 2.2% pmx-milter/2
  5292 pmx       149M  119M run     48    0   0:00:13 2.2% pmx-milter/2
  5325 pmx       145M  110M run     48    0   0:00:10 2.2% pmx-milter/2
  5348 pmx       145M  107M run     48    0   0:00:09 2.2% pmx-milter/2
  5295 pmx       149M  119M run     45    0   0:00:13 2.2% pmx-milter/2
  5289 pmx       145M  115M run     48    0   0:00:11 2.2% pmx-milter/2
  5304 pmx       149M  119M run     44    0   0:00:13 2.2% pmx-milter/2
  5305 pmx       149M  119M run     45    0   0:00:13 2.1% pmx-milter/2
  5298 pmx       149M  119M run     47    0   0:00:13 2.1% pmx-milter/2
  5309 pmx       149M  119M run     48    0   0:00:12 2.1% pmx-milter/2
  5334 pmx       145M  105M cpu0    55    0   0:00:09 2.1% pmx-milter/2
  5306 pmx       149M  119M run     54    0   0:00:13 2.0% pmx-milter/2
  5294 pmx       145M  113M run     39    0   0:00:10 2.0% pmx-milter/2
  5332 pmx       145M  109M run     44    0   0:00:10 2.0% pmx-milter/2
  5327 pmx       145M  105M run     54    0   0:00:09 2.0% pmx-milter/2
  5336 pmx       145M  102M run     54    0   0:00:08 2.0% pmx-milter/2
  5346 pmx       145M  108M run     47    0   0:00:09 2.0% pmx-milter/2
  5323 pmx       145M  107M run     49    0   0:00:09 2.0% pmx-milter/2
  5287 pmx       149M  119M run     54    0   0:00:12 2.0% pmx-milter/2
  5293 pmx       149M  118M run     54    0   0:00:12 2.0% pmx-milter/2
  5296 pmx       149M  119M run     48    0   0:00:12 2.0% pmx-milter/2
  5318 pmx       145M  106M run     46    0   0:00:09 1.9% pmx-milter/2
  4906 pmx       151M  140M run     48    0   0:00:41 1.9% pmx-milter/2
  5354 pmx       145M  107M run     49    0   0:00:09 1.9% pmx-milter/2
  5320 pmx       145M  102M run     44    0   0:00:08 1.9% pmx-milter/2
  5285 pmx       149M  119M run     46    0   0:00:13 1.9% pmx-milter/2
  5290 pmx       149M  119M run     45    0   0:00:13 1.9% pmx-milter/2
  5330 pmx       145M   98M run     38    0   0:00:08 1.9% pmx-milter/2
  5291 pmx       149M  119M run     49    0   0:00:13 1.9% pmx-milter/2
  5299 pmx       149M  119M run     45    0   0:00:12 1.8% pmx-milter/2
  4872 pmx       152M  140M run     45    0   0:00:38 1.8% pmx-milter/2
  5286 pmx       149M  119M run     49    0   0:00:14 1.8% pmx-milter/2
  5341 pmx       145M  108M run     39    0   0:00:09 1.8% pmx-milter/2
  5316 pmx       145M  109M run     49    0   0:00:10 1.8% pmx-milter/2
  5352 pmx       145M  107M run     39    0   0:00:09 1.8% pmx-milter/2
  5343 pmx       145M  107M run     49    0   0:00:09 1.7% pmx-milter/2
  5308 pmx       145M  115M run     49    0   0:00:11 1.7% pmx-milter/2
  4898 pmx       152M  141M run     48    0   0:00:36 1.5% pmx-milter/2
  5350 pmx       145M  102M run     44    0   0:00:08 1.5% pmx-milter/2
  5371 pmx        24M   23M run     54    0   0:00:08 1.5% pmx-reports-con/1
  4909 pmx       155M  144M run     47    0   0:01:10 1.5% pmx-milter/2
  4911 pmx       173M  163M run     45    0   0:02:40 1.4% pmx-milter/2
  4907 pmx       153M  142M run     46    0   0:00:51 1.3% pmx-milter/2
  5415 pmx        15M   14M run     45    0   0:00:00 1.2% ppm/1
  4900 pmx       152M  140M sleep   59    0   0:00:37 0.9% pmx-milter/2
  4903 pmx       152M  141M sleep   59    0   0:00:43 0.8% pmx-milter/2
  4901 pmx       152M  142M sleep   59    0   0:00:40 0.7% pmx-milter/2
  4887 pmx       151M  141M sleep   59    0   0:00:34 0.5% pmx-milter/2
  5328 postfix  2856K 2056K sleep   59    0   0:00:00 0.3% cleanup/1
  4730 postfix  3856K 2584K sleep   59    0   0:00:00 0.3% smtpd/1
Total: 279 processes, 688 lwps, load averages: 51.07, 33.42, 21.40

I am thinking the loads are caused by an increase in spam, but what can we do to keep the system from getting pegged by the milter processes? Should we lower the concurrency? It is currently set to 50. The system is a Sun Fire V480 with 8 GB of RAM and two 1.2 GHz SPARC CPUs.
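
For what it's worth, a one-liner like this can total up what the milters are using from a single prstat sample (a rough sketch; field 9 is the CPU column in the default prstat layout shown above):

    # Take one prstat sample and total the CPU% of all pmx-milter processes.
    prstat -s cpu -n 100 1 1 | \
        awk '/pmx-milter/ { sub(/%/, "", $9); cpu += $9; n++ }
             END { printf "%d pmx-milter procs, %.1f%% CPU total\n", n, cpu }'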

Thanks

  • Hello mrdky,

    This is a fairly complex question as there can be multiple issues contributing to the load.

    $ pmx-throughput

    If you run this as the pmx user, it gives you a brief minute-by-minute summary of the day's mail volume (depending on your logrotate).

    You may be able to identify whether you are receiving a legitimate mail spike during these time periods.
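
    If you are logged in as root, you can drop to the pmx account for this, for example:

        # Run the throughput report as the pmx service account.
        su - pmx -c pmx-throughput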

    Does the system take a long time to recover from these high system loads?

    It may also be worth your while to contact Support with a Support Request so they can analyze the different settings in pmx.conf and advise you on some tuning.

    Generally there isn't much of an issue with Postfix concurrency settings. Again, Support could advise you better once the whole system is analyzed.
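
    As a quick sanity check you can also inspect the stock Postfix limits yourself with postconf (the parameter names below are the standard Postfix ones; a pmx-managed configuration may override them elsewhere, so treat this as a sketch):

        # Show the current process/delivery concurrency limits.
        postconf default_process_limit default_destination_concurrency_limit

        # Tighten a limit if needed (writes to main.cf), then reload.
        postconf -e default_process_limit=50
        postfix reload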

    Cheers,

    MarkJD.

  • Thanks MarkJD, I will take your advice and have Sophos look at the server config. The load peaks for 20 minutes or so and then returns to normal. We have monitoring in place that sends us alerts when the load peaks, so I have been trying to figure out what is hitting us during these periods of high load, and whether it is an influx of inbound spam or something internal sending outbound. Can you recommend anything that can print inbound and outbound stats? I guess we can hack up the pmx-throughput script if necessary to do this. Thanks for the command, it is really helpful.
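
    In the meantime, something like this might do for a rough per-hour delivery count straight out of the maillog (a sketch; it assumes the standard Postfix "status=sent" log lines with default syslog timestamps):

        # Count delivered messages per hour from the Postfix maillog.
        grep 'status=sent' /var/log/maillog | \
            awk '{ print $1, $2, substr($3, 1, 2) ":00" }' | sort | uniq -c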

  • http://jimsun.linxnet.com/postfix_contrib.html

    I've used pflogsumm.pl in the past and it is a great way to get lots of good information about your mail flow:

    • Total number of:
      • Messages received, delivered, forwarded, deferred, bounced and rejected
      • Bytes in messages received and delivered
      • Sending and Recipient Hosts/Domains
      • Senders and Recipients
      • Optional SMTPD totals for number of connections, number of hosts/domains connecting, average connect time and total connect time
    • Per-Day Traffic Summary (for multi-day logs)
    • Per-Hour Traffic (daily average for multi-day logs)
    • Optional Per-Hour and Per-Day SMTPD connection summaries
    • Sorted in descending order:
      • Recipient Hosts/Domains by message count, including:
        • Number of messages sent to recipient host/domain
        • Number of bytes in messages
        • Number of defers
        • Average delivery delay
        • Maximum delivery delay
      • Sending Hosts/Domains by message and byte count
      • Optional Hosts/Domains SMTPD connection summary
      • Senders by message count
      • Recipients by message count
      • Senders by message size
      • Recipients by message size
      with an option to limit these reports to the top nn.
    • A Semi-Detailed Summary of:
      • Messages deferred
      • Messages bounced
      • Messages rejected
    • Summaries of warnings, fatal errors, and panics
    • Summary of master daemon messages
    • Optional detail of messages received, sorted by domain, then sender-in-domain, with a list of recipients-per-message.
    • Optional output of "mailq" run

    As you can see, there are lots of extremely useful metrics that can be extracted from your maillog with this tool. It is not a Sophos tool, but it is free.

    I'm sure this would be a quick way to summarize what might be going on. 
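
    For example, a minimal run against today's log entries might look like this (double-check the option names against the script's --help output for your version):

        # Summarize today's traffic; limit host/user tables to the top 10 entries.
        perl pflogsumm.pl -d today -h 10 -u 10 /var/log/maillog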

    Cheers,

    MarkJD.

  • Cool, thanks again. I'll give pflogsumm.pl a shot; it looks very promising.

    Thanks for the help.
