
Upgrading? Need advice? Let us know!

I want to ask you, our esteemed customers, what information Sophos could provide to make upgrading easier. This applies to upgrades from any version, but I'd love to be able to address people's concerns about the upgrade from Enterprise Console version 3 to version 4 especially.

We've got a great couple of upgrade guides, but what else can we do? What would you like to know more about?

Want to tell us how to do our jobs? :smileyvery-happy: We'd love to hear your suggestions, so let's talk!

Thanks,

Lil

:1039


This thread was automatically locked due to age.
  • John,

    Looking at my SUMTrace logs, I see for example:

    2010-03-10 16:15:01 : Wed Mar 10 16:15:01 2010 - No action
    2010-03-10 16:15:01 : Wed Mar 10 16:15:01 2010 - No action
    2010-03-10 16:15:02 : Wed Mar 10 16:15:02 2010 - No action
    2010-03-10 16:16:00 : Wed Mar 10 16:16:00 2010 - No action
    2010-03-10 16:16:01 : Wed Mar 10 16:16:01 2010 - No action
    2010-03-10 16:16:01 : Wed Mar 10 16:16:01 2010 - No action
    2010-03-10 16:16:02 : Wed Mar 10 16:16:02 2010 - No action
    2010-03-10 16:17:00 : Wed Mar 10 16:17:00 2010 - No action
    2010-03-10 16:17:01 : Wed Mar 10 16:17:01 2010 - No action
    2010-03-10 16:17:01 : Wed Mar 10 16:17:01 2010 - No action
    2010-03-10 16:17:02 : Wed Mar 10 16:17:02 2010 - No action
    2010-03-10 16:18:01 : Wed Mar 10 16:18:01 2010 - No action
    2010-03-10 16:18:02 : Wed Mar 10 16:18:02 2010 - No action
    2010-03-10 16:18:02 : Wed Mar 10 16:18:02 2010 - No action
    2010-03-10 16:18:02 : Wed Mar 10 16:18:02 2010 - No action
    2010-03-10 16:19:01 : Wed Mar 10 16:19:01 2010 - No action
    2010-03-10 16:19:02 : Wed Mar 10 16:19:02 2010 - No action
    2010-03-10 16:19:02 : Wed Mar 10 16:19:02 2010 - No action
    2010-03-10 16:19:02 : Wed Mar 10 16:19:02 2010 - No action

    You can see that it produces 4 entries every minute (I have 4 packages in two subscriptions, so that makes sense), each with 'No action'. Even though the log states no action, SophosUpdateMgr is still busy. So it looks like the schedule is running every minute of every day rather than 'every 15 minutes' as configured for 'threat detection data updating'. Is that right?

    Also, was an update to SophosUpdateMgr.exe released today? The timestamp still shows last November, but I noticed that the app shut down and restarted today, something that hasn't happened since the last reboot, and the total CPU time in taskman, which was over 100 hours, has gone back to just 36 minutes so far.

    Matt

    :1813
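A quick way to check the "4 entries every minute" observation is to tally the 'No action' lines per minute. The following is a minimal sketch that assumes the trace lines look exactly like the excerpt above; the function name `no_action_per_minute` is made up for illustration and is not part of any Sophos tool.

```python
from collections import Counter

def no_action_per_minute(lines):
    """Count 'No action' entries per minute in SUMTrace-style lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line.endswith("- No action"):
            continue
        # Each line starts with 'YYYY-MM-DD HH:MM:SS'; the first
        # 16 characters ('YYYY-MM-DD HH:MM') identify the minute.
        counts[line[:16]] += 1
    return counts

# Lines copied from the log excerpt quoted in the post above.
sample = [
    "2010-03-10 16:16:00 : Wed Mar 10 16:16:00 2010 - No action",
    "2010-03-10 16:16:01 : Wed Mar 10 16:16:01 2010 - No action",
    "2010-03-10 16:16:01 : Wed Mar 10 16:16:01 2010 - No action",
    "2010-03-10 16:16:02 : Wed Mar 10 16:16:02 2010 - No action",
    "2010-03-10 16:17:00 : Wed Mar 10 16:17:00 2010 - No action",
    "2010-03-10 16:17:01 : Wed Mar 10 16:17:01 2010 - No action",
    "2010-03-10 16:17:01 : Wed Mar 10 16:17:01 2010 - No action",
    "2010-03-10 16:17:02 : Wed Mar 10 16:17:02 2010 - No action",
]

counts = no_action_per_minute(sample)
for minute, n in sorted(counts.items()):
    print(minute, n)
```

A steady count per minute only shows how often the schedulers wake up; it says nothing about how much work SUM does in between.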
  • Curiously I'd be interested in a test perhaps you could carry out

    Yeah, sure that's what I get paid for (last time I got a shirt from Sophos was in 2005, but it's good quality and still looks great) :smileyvery-happy:

    You are distributing/pushing from your SUM to a remote share over a slow link? Even if this is not what you want me to test the idea sparked my interest.

    So I created a share on a laptop and throttled the adapter to 10Mbit full duplex (shame, as I have 1Gb connectivity). Initial populating of the CIDs (Recommended - Win, Mac and Linux) "never" finished 'cause the laptop somehow lost the default gateway before SUM could fill the CIDs, and it always starts from the beginning after a failed attempt (it seems to check cidsync.upd, which is the last file written). The server's (2003 EE, SP2 - BTW) performance was not noticeably affected; CPU usage showed the usual pattern. With the adapter set back to full speed, the next update cycle filled the share. Watched two updates (new IDEs) today and saw nothing out of the ordinary on the server.

    Decided to install a child SUM on the machine (why did it have to be Win 7? Doesn't exactly facilitate SUM install) and am now waiting for SUM to complete its first update, which will not be soon: link utilization is about 500kbit (sic!) per second. You'll have to wait until tomorrow for the results. I wonder whether using a child SUM is preferable to distribution when a slow link is involved.

    Christian

    :1825
  • Thanks Christian,

    Child SUM is not being implemented until I can get the master sorted. I also have some satellite offices where the file server carrying the CID is well below the required CPU spec (it doesn't need to be high-spec; it's only file serving and fast enough for their needs).

    Many of the remote sites have ADSL lines connected via VPN, so the link speeds to those are considerably below 10Mbps. To one office in Singapore we can get around 80-100Kbps max. CPU is maxed out here for the complete duration of the update to that site, whereas I would have expected the load to be relatively light, as SUM just cannot get to the data as quickly as it can checksum it.

    Matt

    :1826
  • Hi Matt,

    The 'No action' part isn't relevant to SUM being busy. All these entries mean is that one of the schedulers has woken up (which they do every minute - yes, it's a pretty crude algorithm) and decided it doesn't need to trigger an action as it's not appropriate to the schedule (e.g. for the protection data on a 15 minute cycle, you'll usually get 14 no actions, then an action).

    More relevant are the log entries for what SUM is actually doing. There should only be a few IDE releases a day, so SUM should largely be idle. What would be concerning is if something has gone wrong on your system such that the checks when there are no updates (which by default occur every 15 minutes) are taking significant time.

    I don't think there has been a significant upgrade to SUM, but our operations group may have tinkered with something that caused SUM to go through the self-update process to make sure it is on the correct version.

    Cheers,

    John Reynolds

    :1832
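The wake-every-minute behaviour John describes can be illustrated with a small simulation. This is only a sketch of the idea, not SUM's actual scheduler code; `simulate_scheduler` and its parameters are hypothetical names.

```python
def simulate_scheduler(interval_minutes, total_minutes):
    """Wake once per minute; act only on every interval_minutes-th wake-up."""
    log = []
    for minute in range(1, total_minutes + 1):
        if minute % interval_minutes == 0:
            log.append("Action")      # the schedule is due on this wake-up
        else:
            log.append("No action")   # woke up, nothing to trigger
    return log

# One scheduler on a 15 minute cycle, observed for 30 minutes.
log = simulate_scheduler(15, 30)
print(log[:15].count("No action"), "no actions, then", log[14])
```

With several schedulers each waking every minute, the trace would show several 'No action' lines per minute, which would be consistent with the four entries per minute in Matt's log.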
  • John,

    I'm not sure what the tinkering was, but I'm now seeing periods of 'no action' between each 15 minute cycle, whereas I was only seeing 1 or maybe 2 previously. It's calmed down a bit at the moment and I'm seeing some free CPU time. Checking without an IDE update is lasting around 40-60 seconds, and with an IDE and deployment to local CIDs around 2-3 minutes. Whatever happened, it's a considerable improvement at the moment.

    I'm going to put a CID on a remote site and look at the CPU usage during the distribution. I'll post up the SUMTrace from an update (once initial deployment has completed) to see if that issue has improved. Can you think of any reason (knowing the code) why CPU load would increase when bandwidth decreases?

    Matt

    :1833
  • You're welcome, Matt

    Didn't see the high CPU usage on my server. SophosUpdateMgr.exe maxes out one core (so it's 25% here) for about one minute but between those it's quiet. Ah - I see the discussion has resumed.

    Meanwhile some mulling over Remote CID vs. Child SUM. Maybe John could comment on this. Looks like initial deployment of a CID might "never" complete when connection errors are frequent (i.e. the interval between errors is shorter than the time needed to deploy the whole subscription). The distributing SUM seems to start all over again after an interruption. The child SUM resumed fetching data to the warehouse - at least my test suggests it.

    As for ADSL - isn't upstream even slower than downstream (guess that's why there's an A in ADSL)? And in distributing - what does SUM have to read from the CID when updating? Would this affect performance?

    Of course, if the remote server's CPU is below spec you are caught between a rock and a hard place.

    Just wondering, Matt, (but perhaps you do not want to disclose this information) how the remote offices are managed over a link like this (I don't say it's slow, I remember when we celebrated the upgrade of our BITNET link from 4.800 to 9.600 baud)? How did you get Sophos on the machines in the first place? Guess Sophos wouldn't recommend synching the CID by some other less expensive means (are there even?).

    Christian

    :1834
  • Hi Christian,

    Yes, upstream rates are much lower, but when copying files (via UNC) you may as well just consider the upstream rate, as the acks between packets slow the transfer down to upstream rates all the time. As I said, one site is in Singapore where they have a 2Mbps/512Kbps ADSL (it's the best they can get in their district) and I'm looking at around 300ms latency, which to me gives around 80Kbps throughput over a VPN link. I have a 10Mb LAN point to the internet (10Mb megastream link directly into Telehouse).

    CID deployment works fine to the sites (okay, it's slow but hey it gets there....). PCs all report back to my central EM config for monitoring, which works just fine. I can remote manage PCs/servers using RDP, which again is fine even on the slower links.

    What interests me more here though is that the CPU usage is very high constantly during remote deployment and I just wouldn't expect it to be. If I was crunching numbers on remote files at 80Kbps, I'd expect to have plenty of CPU spare while waiting for the file to come back, but that's not what I'm seeing. This makes me think we've got a runaway thread somewhere in the update manager that's eating CPU where it doesn't need to.

    Matt

    :1835
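A rough way to see why latency rather than line rate can dominate a file copy: if a protocol only keeps a fixed amount of data in flight per round trip, throughput is capped at that amount divided by the round-trip time. The sketch below uses the 300ms latency from Matt's post; the 4 KB in-flight figure is purely an assumption for illustration, not a measured property of UNC copies or of SUM.

```python
def max_throughput_kbps(bytes_in_flight, rtt_seconds):
    """Upper bound in kilobits/s when bytes_in_flight are sent per round trip."""
    return bytes_in_flight * 8 / rtt_seconds / 1000

# Assumed: 4 KB outstanding per round trip (illustrative only),
# 300 ms round-trip time as reported for the Singapore link.
estimate = max_throughput_kbps(4096, 0.3)
print(f"about {estimate:.0f} kbit/s")
```

Under those assumptions the ceiling comes out at roughly 109 kbit/s, the same order of magnitude as the 80-100Kbps Matt reports, and the link's nominal bandwidth does not appear in the formula at all.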
  • Hi Matt,

    I'm not sure what happened with your SUM. Either you were unlucky and looked at the trace logs when updates were actually happening, or something unknown went wrong and a restart of the service fixed it. The times you are reporting now are closer to what I'd expect, except that, as I mentioned before, the no-update check is a bit longer than it should be. Hopefully we'll improve this. You should be able to get the times down by using fewer subscriptions.

    As to the CPU usage when deploying to a remote link, SUM just uses standard Windows file i/o when talking to UNC. It's possible that with a slow link there are more callbacks in the Windows level code, but I've not profiled this.

    Cheers,

    John Reynolds

    :1836
  • Christian,

    As to the performance of distributing:

    The distributing SUM will start again after an error, but shouldn't have to redo the parts it has done before. It's possible that this isn't as robust as it could be though, as an unreliable link to the CID isn't something we've specifically designed for.

    A child SUM should be efficient in resuming after failure (currently only on a per-file basis, though we're looking to improve this), so using a child SUM might be more effective if your link is highly unreliable. This will also stop the CID being in an inconsistent state for periods when the link is lost.

    SUM will read from remote CIDs to check what is there and verify the structure of the CID.

    If you've got a really slow link you could always post a DVD with the warehouse to the branch office and get SUM to update from that. :)

    Cheers,

    John Reynolds

    :1837
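To illustrate what resuming "on a per-file basis" can mean in general, here is a minimal sketch of a copy routine that skips files already transferred intact and only renames a file into place once it is complete. It is not SUM's implementation; `sync_per_file` and the `.part` suffix are hypothetical choices made for this example.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path

def sync_per_file(source, dest):
    """Copy files from source to dest, skipping those already copied intact.

    Returns the relative paths of the files copied during this run.
    """
    copied = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src_file.relative_to(source)
        dst_file = dest / rel
        if dst_file.exists() and filecmp.cmp(src_file, dst_file, shallow=False):
            continue  # finished in an earlier attempt, nothing to redo
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        tmp = dst_file.with_name(dst_file.name + ".part")
        shutil.copyfile(src_file, tmp)
        tmp.replace(dst_file)  # rename into place only once complete
        copied.append(rel.as_posix())
    return copied

# Demo: the destination already holds one file from an interrupted run.
with tempfile.TemporaryDirectory() as d:
    src, dst = Path(d) / "src", Path(d) / "dst"
    (src / "sub").mkdir(parents=True)
    (src / "a.txt").write_text("one")
    (src / "sub" / "b.txt").write_text("two")
    dst.mkdir()
    (dst / "a.txt").write_text("one")
    first_run = sync_per_file(src, dst)
    second_run = sync_per_file(src, dst)
    print(first_run, second_run)
```

Note that comparing full contents means reading the destination file back, which is itself expensive over a slow link; a real tool would more likely compare sizes or stored checksums.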
  • "If you've got a really slow link you could always post a DVD with the warehouse to the branch office and get SUM to update from that. :smileyhappy:" - I don't think that's very good advice, John. Zero-day or near-zero-day infections are the most common issues that I deal with.

    This is all rock and hard-place stuff. SUM's too much for some servers to run in child mode, and CID deployment is brutal on slow links (but an Explorer file copy between the same servers isn't). I may think about going back to the good-ol'-days approach of using sget in a batch script.

    By the way, I've never said that my links to remote sites are unreliable. I very rarely have a failure during CID deployment. Large updates do take time but succeed, by my estimate, over 99% of the time.

    Matt

    :1838