Upgrading? Need advice? Let us know!

I want to ask you, our esteemed customers, what information Sophos could provide to make upgrading easier. This applies to upgrades from any version, but I'd love to be able to address people's concerns about the upgrade from Enterprise Console version 3 to version 4 especially.

We've got a great couple of upgrade guides, but what else can we do? What would you like to know more about?

Want to tell us how to do our jobs? :smileyvery-happy: We'd love to hear your suggestions, so let's talk!

Thanks,

Lil

:1039


This thread was automatically locked due to age.
  • Hi Matt,

    Thanks for your comments.

    I am one of the SUM developers, and hopefully I'll be able to shed some light on what is happening in your case.

    Each additional subscription does indeed put a significant load on SUM, as SUM will treat each independently when performing consistency checks on the products. SUM is generally designed to be very careful about the correctness of downloads. This means that if for instance you have the SAV XP product in more than one subscription, then when creating and distributing the CIDs it will indeed use significant CPU time to check the correctness of the SAV XP product multiple times.

    However, none of this should affect the functioning of SUM for three reasons:

    Firstly, even if you have the same product in multiple subscriptions, SUM will only download the files once, so you shouldn't suffer from additional network overhead.

    Secondly, (apart from the load on your server) it doesn't matter if the update takes longer than the check time (which is actually 10 minutes by default). If an update is already occurring, then the check is skipped.

    Thirdly, SUM will only do the CPU-heavy distribute if the product has actually changed. This should only be a few times each day.

    I've put together a test case on a virtual machine to try to replicate your configuration: I've got 7 subscriptions each with a variety of products in it (including some of our more obscure platforms).

    On my system the initial update took nearly 30 minutes. This is a little slow (due to the multiple complex subscriptions), but it did complete successfully.

    Subsequent to this, a check for updates takes about a minute. Though I've just noticed a defect in SUM that means this is taking longer than it should: hopefully we'll get this down to a few seconds in a maintenance release.

    To summarise: yes, if you've got lots of subscriptions SUM can take a while to update, but this shouldn't compromise its functioning or the protection of your systems.

    I'll contact you directly to see if we can understand what you are trying to do with your configuration, and whether we can come up with an improvement for you, or whether this is a case we are going to need to optimise in future SUM development.

    I hope that was informative. Please feel free to post again if you would like more clarification.

    Cheers,

    John Reynolds

    :1749
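
The behaviour John describes in the reply above (a scheduled check is skipped while an update is still running, and the CPU-heavy distribute only happens when a product has changed) can be illustrated with a minimal sketch. This is not SUM's actual code: the class name, method names and the version-string comparison are all assumptions made for illustration.

```python
import threading


class UpdateManagerSketch:
    """Hypothetical model of the check cycle; not SUM's real implementation."""

    def __init__(self):
        self._busy = threading.Lock()
        self._last_version = {}  # product name -> last distributed version
        self.distributed = []    # (product, version) pairs actually distributed

    def check(self, upstream):
        """Run one scheduled check. `upstream` maps product name -> version."""
        # If an update is already in progress, skip this check entirely.
        if not self._busy.acquire(blocking=False):
            return "skipped"
        try:
            changed = [
                product
                for product, version in upstream.items()
                if self._last_version.get(product) != version
            ]
            # Only changed products trigger the expensive distribute step.
            for product in changed:
                self._distribute(product, upstream[product])
            return "updated" if changed else "no-change"
        finally:
            self._busy.release()

    def _distribute(self, product, version):
        # Stand-in for the CPU-heavy checksum and CID rebuild.
        self.distributed.append((product, version))
        self._last_version[product] = version
```

Under these assumptions, a check that finds the same versions as last time does no distribute work at all, which is the behaviour the rest of the thread is trying to confirm on a real system.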
  • Oops, I've just realised that I didn't comment on your point about the email notification. Sorry about that.

    SUM operates on a very different model to EMLibrary: SUM itself doesn't have a user interface, and all management, reporting and alerting is done by integration with SEC.

    You can get some email notifications of updating state from SEC. However, you are correct that this is more limited than the email alerts from EMLibrary. If there is enough demand for this (thanks for letting us know!) then this will be improved in a future SEC release.

    The ides that an endpoint has can be seen from the computer details view of the endpoint. I realise that this isn't always as convenient as the email alerting.

    If you select 'View Update Manager Details' on a SUM, then various information (including the times of the last real binary and protection data updates) is available.

    Apart from the issue that you've come across, SUM is much faster at checking for updates than EMLibrary, which is why it now checks for ide updates frequently. This should reduce the latency of getting ide updates to your systems.

    Cheers,

    John Reynolds

    :1750
  • Hi John,

    Some interesting points. Can I just pick up on a few parts from that answer:

    Though SUM only actually downloads the files once, it's not the download that's CPU intensive. I'm assuming (please correct me if I see the mechanics of this incorrectly) that you download changes to the C:\Documents and Settings\All Users\Application Data\Sophos\Update Manager\Update Manager\Warehouse folder updating the checksum 'fileliststore.dat'. So there's a necessary checksum done on this folder which is 425MB's for a subscription of 4 packages on my test system.

    Once the warehouse is built, the local distribution CID folders are updated under c:\Documents and Settings\All Users\Application Data\Sophos\Update Manager\Update Manager\CIDs. Once again you checksum and rebuild these folders. With two subscriptions (one of 4 packages and one of 1 package), the 425MB on my test platform becomes 830MB: 667MB for the CID containing all four packages and 163MB for the CID containing just one.

    Looking at what's happening here, I'd have to argue with the timings. I'm not seeing "SUM will only do the CPU-heavy distribute if the product has actually changed". At each 15 minute interval I'm seeing SUM do a full checksum on the warehouse folder (the 650MB's), which it seems to thrash for around 60 seconds, followed by a period of around 7-10 minutes where the CPU load is very high but there's very little disk activity. Is this the "defect in SUM that means this is taking longer than it should" that you too are seeing?

    John, you've got my details, perhaps you can PM me and we can perhaps ping ideas around on how to improve this?

    Matt

    :1756
  • It's possible that something is going wrong on your system; it certainly shouldn't be as bad as what you seem to be seeing.

    On my (slow) test VM with 7 complex subscriptions, a full 'update now' (which forces a paranoid check of all the files) takes less than half an hour, a standard ide update takes about 5 minutes, and a no-update check takes about a minute. My comment about the bug referred only to that last figure: the no-update check should only take seconds.

    I think the high CPU load you are seeing is actually SUM doing multiple checksum runs due to the multiple subscriptions. However, with a reasonable amount of RAM most of this will be cached, so the work is CPU bound rather than disk bound.

    If you are really seeing this every update period, then something is amiss. It shouldn't be checksumming things if nothing in the upstream warehouse has changed. It could however just be that you've been unlucky and there have been updates during the time you've been looking at it. I'd have to have a look at the logs to determine this.

    Cheers,

    John Reynolds

    :1757
  • Hi John,

    Is 1.5GB's sufficient? I'm currently showing 586MB's free according to taskman. CPU is maxed out right now by SophosUpdateMgr.exe. OK, I admit that the CPU isn't up to your spec, but it's easily adequate and runs MD5's on the warehouse folder in a fraction of the time that SUM seems to take. I did say that it thrashes the disk for about 60 seconds (I'd say that's roughly OK for a quick checksum of 450MB's) and then maxes out the CPU alone for the remaining time. So you may be right that it's cached, but it's still very busy even when no update is required.

    I am indeed seeing this on every update check, and on this system at the moment there are just the two subscriptions as detailed, with 4 packages in one and 1 in another - nothing particularly complex. When I set up the full subscription, the server was not usable at all, maxed out by SophosUpdateMgr.exe 24/7.

    Perhaps I'm significantly more affected by this bug?

    Matt

    :1759
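
For anyone wanting to reproduce the baseline comparison Matt mentions above (a plain MD5 pass over the warehouse folder), here is a minimal sketch. It only measures how long a straightforward checksum of a directory tree takes; it makes no claim about how SUM computes or stores its own checksums. The function names are made up for this example, and the folder to scan is whatever path you pass in.

```python
import hashlib
import os
import time


def md5_tree(root, chunk_size=1024 * 1024):
    """Return (hex digest over all files under root, total bytes read)."""
    digest = hashlib.md5()
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subfolders in a deterministic order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Mix in the relative path so renamed files change the digest.
            digest.update(os.path.relpath(path, root).encode("utf-8"))
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(chunk_size), b""):
                    digest.update(chunk)
                    total += len(chunk)
    return digest.hexdigest(), total


def timed_md5_tree(root):
    """Return (hex digest, total bytes, elapsed seconds) for one pass."""
    start = time.perf_counter()
    hexdigest, total = md5_tree(root)
    return hexdigest, total, time.perf_counter() - start
```

Running `timed_md5_tree` twice in a row on the same folder gives a rough idea of how much the second pass benefits from the file cache, which is the disk bound versus CPU bound distinction discussed above.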
    I have the feeling that there's something wrong beyond design - I've used a 512 MB/2GHz neither-end notebook as a test SUM (not many subscriptions, but a CID on a USB stick). Interactive work became impossible when the disk refused DMA and reverted to PIO, with interrupts eating half the CPU, and SUM still managed to complete its tasks. Interactive work - of course impossible, but at this point the machine was unusable even without SUM.

    Thanks for your input, Matt, but please distribute it more evenly - I might get bored next week :smileywink: More on this and your other topics tomorrow.

    Christian

    :1761
  • Hi Christian,

    I'm talking about a server with SCSI-3 disks. DMA/PIO - not the issue here.

    'Please distribute it more evenly' - huh?

    Matt

    :1767
  • Hello Matt,

    'Please distribute it more evenly'

    Oh - you brought up a number of issues on the same day and I'm struggling to keep up with the discussions. You should save something for next week :smileywink:

    'I'm talking about a server' - yup, I'm aware. All I wanted to say is that although there's potential for improvement (to put it euphemistically) it should be usable as it is.

    Took a look at our main server: Warehouse is 500+MB and the server is 4GB, 4 Xeon 2.33GHz xSeries. SophosUpdateManager uses up to 25% CPU for a period of slightly less than 60 seconds. Whenever there is some update it seems to do a lot of unnecessary work - decoding everything and starting deployment only to discover [...] but no new files were decoded and Deployment [...] has already been done [...]. Takes about 7 minutes minimum. Looks like we get roughly the same numbers. I agree this is - to use another euphemism - suboptimal.

    OTOH - the server can take it even though it's hosting another resource hog.

    Christian

    :1773
  • Hi Christian,

    Yep, you're seeing the same issues on a machine that's considerably more powerful than my server. There is a problem here so we need to find it and get it rectified.

    I'd be curious about a test that perhaps you could carry out, or maybe you're already doing it. I find that if I distribute a CID across a slow link (a VPN WAN connection works great for this), the CPU rises and stays high much longer, whereas you'd expect completely the opposite. As the ability to get at the data reduces, the CPU usage should fall with it, i.e. if it cannot get to data rapidly, the CPU cannot checksum as quickly and therefore CPU usage drops. If I were to speculate, I'd say that there are threads that are watching other threads and it's the watching threads causing the CPU drain rather than the action threads. I've seen this exact same behaviour in one of my own systems in the past and it turned out to be one of those 'doh!' moments - fixed that and the system was light as a feather.

    Matt

    :1775
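
Matt's speculation above, that watcher threads rather than worker threads could be burning the CPU, describes a general pattern that is easy to show in isolation. The sketch below is purely illustrative and says nothing about how SUM is actually written: all names are invented, and `time.sleep` stands in for a slow transfer over a WAN link.

```python
import threading
import time


def polling_watcher(done, counter):
    # Busy-wait: spins until the flag is set, using CPU the whole time.
    while not done.is_set():
        counter[0] += 1


def blocking_watcher(done, counter):
    # Blocks inside wait() and wakes up once when the flag is set.
    done.wait()
    counter[0] += 1


def run(watcher, work_seconds=0.2):
    """Run one watcher alongside simulated slow work; return its loop count."""
    done = threading.Event()
    counter = [0]
    thread = threading.Thread(target=watcher, args=(done, counter))
    thread.start()
    time.sleep(work_seconds)  # stands in for a slow transfer
    done.set()
    thread.join()
    return counter[0]
```

With the polling watcher, the slower the simulated work, the more iterations the watcher burns, which matches the "CPU stays high for longer over a slow link" symptom. The blocking watcher does one unit of work regardless of how long the transfer takes.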
  • Hi Matt,

    It does sound like there is something strange going on.

    If you look at the SUM trace logs (application data\Sophos\Update Manager\logs\SUMTrace*.log) you'll get a (rather too) detailed look at what SUM is doing (with timestamps, so you can see how long things are taking).

    For a simple update check (where nothing has changed), none of the Decode, CID Generation or CID Deploy operations should take more than a second or two, as SUM should nearly immediately work out that it's the same version and skip the operation. The bug I mentioned earlier is that the GatherCurrencyData operations (which examine software versions in detail to correlate with endpoints) are taking longer than they should (around 6 seconds not 1 second), as they are being redone each time.

    In this case, the check shouldn't take a long time. I've just run another test with a complex set-up on a VM, and it took about 2 minutes, using on average about 50% CPU on the single core of that machine.

    If you are seeing significantly different behaviour to this, then something else is going wrong, and we'll need to investigate more.

    Cheers,

    John Reynolds

    :1794
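
The defect John describes above, an expensive operation being redone on every check even though the version has not changed, is the kind of problem usually addressed by caching the result per version. The following is a minimal sketch of that idea only. The class and its `gather` callback are hypothetical and are not the real GatherCurrencyData code or its planned fix.

```python
class CurrencyCache:
    """Hypothetical cache of per-version results, keyed by (product, version)."""

    def __init__(self, gather):
        self._gather = gather  # expensive function: (product, version) -> data
        self._cache = {}
        self.calls = 0         # how many times the expensive function ran

    def get(self, product, version):
        key = (product, version)
        if key not in self._cache:
            # Only do the expensive examination for unseen versions.
            self.calls += 1
            self._cache[key] = self._gather(product, version)
        return self._cache[key]
```

With a cache like this, repeated no-change checks would reuse the stored result, and the expensive step would run again only when a product's version actually changes.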