This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

Issue: Sophos Central Admin – US-West region - Delays with the enforcement of Central policies on managed endpoints.

**UPDATE 9** A root cause analysis KBA has been published; see the knowledge base article for the latest.

**UPDATE 8** As part of a routine database maintenance task, customers may notice a few intermittent install and policy rendering failures. Please retry before contacting support. 7/17/2017 8:00 AM PST

**UPDATE 7** Some customers may notice a few intermittent install failures. Please retry before contacting Sophos Support. 7/14/2017 2:00 PM PST

**UPDATE 6** Installations are being processed normally and service is restored. Please re-download the installer from Central. 7/14/2017 9:00 AM PST

**UPDATE 5** Installations are now working as of July 13, 2017 19:00 UTC-5. See knowledge base article for the latest.

**UPDATE 4** New installs are likely to still fail. http://centralstatus.sophos.com/#!/ has the latest update.

**UPDATE 3** System is now processing backlogs. Please see last updates here.

**UPDATE 2** The issue is ongoing; apologies. It impacts all areas within Central that rely on MCS communication between client and Central. 7/13/2017 8:00 AM PST

**UPDATE** Development has identified root cause and is working on a fix. 

Hello,

We are seeing delays with policy changes and enforcement in Sophos Central (US-West region), as well as installation failures because new endpoint installations are unable to register initially. Our engineers are working to restore normal latency. Please note that your endpoints remain protected. Updates will be provided on this thread.

KBA: https://community.sophos.com/kb/en-us/126477

Thank you,

Bob



  • We are still experiencing issues and delays when logging in to the Sophos Central Admin Console. I am unable to update the desktops that already have Sophos installed; the update keeps failing. I have tried to reinstall and it fails. I have restarted and it still fails.

    Is there another solution? Our Security Team is becoming really concerned about the stability of Sophos.

    Thank you,

    Aisha Smith

     

     

  • Thanks for all of your feedback on this subject.  We've heard you. 

    Our Sophos Central team has committed to providing more current and detailed technical information on our Central Status page.  The page has been recently updated and reflects the status of the last two days, including the incident that occurred earlier today. 

    As always, we will provide a complete Root Cause Analysis (RCA) when we are fully confident that we have completely addressed the issues affecting performance.  Until then, please refer to the Sophos Central status page for the most current and accurate information.  

    We are happy to speak with you, your customers or your management teams if you still need more information. 

    Michael Anderson, SVP Global Services

    michael.anderson@sophos.com   +1 408 334 7300  

  • Status page is showing green, but I'm still getting the same old issues:

    - UI is very slow, lots of spinning circles

    - Policy enforcement seems randomly applied

    - Installing the encryption module seems to be hanging (logs show failure to connect to US-West)

    - Also, correct me if I'm wrong, but I am fairly certain that in the past, when users tried to unsuspend BitLocker while the encryption policy was applied, BitLocker would re-suspend itself. Well, apparently not anymore! I can unsuspend my BitLocker at the moment, no problem.

     

     

     

     

  • Slow policy enforcement and UI again; the "last activity" fields haven't updated in 11 hours.

    Can the status page remain yellow or red until it's actually fixed permanently? It shouldn't be green until the issues are actually resolved correctly.

  • Yeah I pushed an encryption policy yesterday and the machine still hasn't picked it up.

  • OK, I can now confirm that there are still massive delays with US-West policies being pushed. A computer that I applied an encryption policy to yesterday (8/14) only just now, at 2:30 PM on 8/15, triggered a medium event alert ('device that should be encrypted is not') and then prompted the endpoint to reboot for the encryption process to begin.

  • MichaelAnderson said:

    Thanks for all of your feedback on this subject.  We've heard you. 

    Our Sophos Central team has committed to providing more current and detailed technical information on our Central Status page.  The page has been recently updated and reflects the status of the last two days, including the incident that occurred earlier today. 

    As always, we will provide a complete Root Cause Analysis (RCA) when we are fully confident that we have completely addressed the issues affecting performance.  Until then, please refer to the Sophos Central status page for the most current and accurate information.  

    We are happy to speak with you, your customers or your management teams if you still need more information. 

    Michael Anderson, SVP Global Services

    michael.anderson@sophos.com   +1 408 334 7300  

     

     

     

    For the last three days your clients here in the forum have said that they cannot push policies, yet the console shows 3 days of normal operation. Why is that? Are delayed policy updates now considered normal operation? For example, the policy on my computer hasn't been updated since 8/13/17 despite me making changes yesterday to the global policy.

    Are you still working on resolving this? If so, can you make a note in the console and clear it out once we have a product that has its basic functionality back? We are coming up on 6 weeks of constant problems with policies not pushing and install problems...

  • I have a theory. This is an over-provisioning issue. They're adding too many customers and not adding (paying more for) additional AWS/IaaS resources quickly enough. They won't acknowledge there is an issue until enough customers complain via the regular ticket/phone support channels. This is why Germany/Ireland didn't have the problem even though they're running the same Sophos software. The difference is the infrastructure.

     

    Again, just a theory as to why this affected one region but not others.

  • This seems very plausible. 

    I'm curious: regarding the uptick in US customers, do you think this is related to April's announcement of Sophos Central, overall market penetration, Gartner/Forrester/NSS Labs reports, relationships with more US channel partners, some kind of combination, etc.? 

  • I think the general growth and therefore load in the US region on the back of the ransomware outbreaks is certainly part of it.  

    If I were a gambling man, I would bet on a new region being made available in the US. Germany was added in addition to Ireland pretty early on, mainly as a result of German data storage laws, so it can be done, but it would probably require a reasonable amount of infrastructure work for monitoring and deployment, and then there is the test effort. There are probably more things to consider now than there were then, as these systems grow in complexity.

    The act of moving existing data would also be work so I would imagine all new accounts choosing a NA region would land on a new region.  

    I'm sure any new region would most likely require a Central release, and if those happen every 3 weeks, a new region is likely to arrive on that timeframe rather than by making a few config changes in AWS.

    In the meantime, reducing message processing seems like the most obvious solution, which would probably require an endpoint release of the MCS component.

    I'm sure once it's "fixed" from all sides the issue will be gone for good.

    Just my 2 cents!

     

  • I had a phone call with the Director of Product Management for Sophos Central a few days ago and can confirm most of your suspicions. The issues are on their end, due to poor database logic and a bad installer they released. This caused the endpoints to call out to Sophos Central and, when they got no response, instead of backing off they called out more frequently, effectively DDoS'ing themselves as others have mentioned. What compounded this issue recently (and forgive me for probably not using the correct wording, as I do not work with databases/SQL at all) is that they were trying to upgrade the database communications to asynchronous messaging, database querying updates, etc. For Germany and Ireland this went fine because there are not many clients on there. But US-WEST had the major issues that we are seeing today, and they are apparently trying to roll the change back now.

    What bothers me most is that he then went on to say that they knew 8 months ago that the database logic/infrastructure wouldn't be sustainable for a high load of clients. With a chuckle he said that it was a great problem for them and started bragging about "triple digit percent growth in such a short period" and how "this is a great problem for us (Sophos)". I responded that this is a terrible problem for your clients and not funny. Meanwhile we bear the brunt of this by not being able to effectively push policies to our machines for 6 weeks. 6 weeks and still no resolution... He assured me they're working around the clock to get this fixed and it's their top priority. 6 weeks... The Sophos status page reports "normal operations" and we've heard nothing otherwise from Sophos, yet there is a banner across the top of this screen saying there are still issues at US-WEST. Why can the status page not accurately reflect your issues? This is not fixed, so stop saying it is. We aren't asking for much, just a functioning service that we pay a lot for and better communication.

    They're opening a new datacenter in the Midwest, but are only putting new clients there to "ensure the best experience possible", which I won't even comment further on. I was also told that they're investing in better notifications for their clients. It will be a welcome addition for Sophos to be able to email us when they're down.

    Is anyone else still having issues? I still cannot push policy updates to anyone's machine since August 13, 2017. Now I'm getting high alerts daily for our file servers, print servers, terminal servers and several user computers. I keep getting alerts saying that "one or more services is missing", yet they are all there, and from my own troubleshooting it points to a communication error with their cloud. Anyone else seeing that?
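
For readers unfamiliar with the "backing off" behavior mentioned in the post above, here is a minimal Python sketch of exponential backoff with jitter, the usual way a client avoids hammering a server that is not responding. This is not Sophos's MCS code; the function names (`backoff_delays`, `call_with_backoff`) and the parameters (`base`, `cap`, `attempts`) are hypothetical and chosen only for illustration.

```python
import random


def backoff_delays(base=1.0, cap=300.0, attempts=6, rng=None):
    """Yield one randomized wait (in seconds) per attempt.

    The upper bound doubles on every attempt, up to `cap`, and the actual
    wait is drawn uniformly from [0, bound] ("full jitter") so that many
    clients do not all retry at the same moment.
    All names and defaults here are illustrative assumptions.
    """
    rng = rng or random.Random()
    for attempt in range(attempts):
        bound = min(cap, base * (2 ** attempt))
        yield rng.uniform(0.0, bound)


def call_with_backoff(send, sleep, **kwargs):
    """Call `send()` up to `attempts` times, waiting longer after each failure.

    `send` returns True on success. `sleep` is passed in (for example
    `time.sleep`) so the waiting can be replaced in tests.
    """
    for delay in backoff_delays(**kwargs):
        if send():
            return True
        sleep(delay)  # wait before the next attempt instead of retrying at once
    return False
```

With `sleep=time.sleep`, each failed call makes the client wait for a randomized interval whose upper bound grows, which is the opposite of the retry storm described in the post.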