Issue: Cloud Web Gateway unable to establish a connection with the cloud

**UPDATE 6** Statement from Product Management in KBA: https://community.sophos.com/kb/en-us/126926 

**UPDATE 5** ChromeOS/Chrome browser agent performance should be back to normal, though there might still be some delay in event reporting during peak hours. Ongoing issues with CWG agents (delays or gaps in event reporting) are still being investigated.

**UPDATE 4** Incoming reports indicate the issue is still present.

**UPDATE 3** As of this morning, the outage is confirmed as resolved. Backlog of events should now be processed and operation should be at 100%. Please let us know below if you are still seeing this issue.

**UPDATE 2** The backlog of queued events is finishing synchronization; once this is complete, service should be restored.

**UPDATE** Chromebooks with the extension enabled are unable to browse the web.

Hello,

Currently, Cloud Web Gateway agents are unable to establish a connection to the cloud and may report a status of “Security Enabled Activity Logs Delayed”. Work is under way to restore service. Updates will be posted in this thread.

Thank you,

Bob



This thread was automatically locked due to age.
  • Our logs are about 24 hours behind again.  The last update to my ticket told me that the delay is during peak hours:

    "Right now we are seeing that the peak hours seem to be East Coast business hours, 5:30am to 3:30pm PDT / 8:30am to 6:30pm EDT / 2:30pm to 12:30am CEST."

    If that is the case, why do the logs never catch up during "off-hours"?  I also received this, but it led to more questions than answers:

    "This is the info I have gotten from our L3/Dev team; please contact us if you have any more questions.

    Recently, Sophos experienced an outage impacting our Central Web Gateway infrastructure. This outage affected the ability of Central Web Gateway agents to communicate with Sophos Central Admin, resulting in an initial period where event logging and reporting were disrupted. This disruption did not affect the operation of Windows and macOS clients, as they were still able to filter and block web traffic using local copies of web policies. However, the ChromeOS and Chrome browser agents were impacted because they require a connection to the cloud at all times. Chrome agents were unable to retrieve and filter web traffic while the outage was ongoing.

    The initial outage was caused by a bug in the Central Web Gateway services that manifests itself only under high load. Although Sophos services have redundancy of capacity, using multiple servers in multiple data centers within each, when one server became unresponsive due to the bug, additional load was put on other servers and a domino effect occurred. As a result of this outage Sophos will be reviewing service deployment processes to ensure that this type of incident does not occur again in the future.

    Resolution of the initial outage led to a follow-on period of outage as our systems recovered. The initial outage meant that most Central Web Gateway agents had built up a large backlog of queued event reports. Agents are designed to queue up events in the event of an outage or loss of internet connectivity to prevent loss of reporting data whenever possible. After the initial outage was resolved, our cloud services saw events coming in at five times the normal rate. This load caused further communications problems for agents. In addition, we believe that it also led to some loss of event or report data here and there, but this should be minimal.

    At the current time, processing of agent events and reports has returned to normal, although there may be delays in reporting and event logging during peak times due to current infrastructure limitations. Sophos engineers are working to improve the efficiency of communication and processing between the Central Web Gateway agents and cloud services.  This work will take a few weeks to complete and fully test. Once it has been rolled out, following our improved deployment processes, performance will improve and delays during peak times will be significantly reduced."

    This would imply that the policy issues only affected Chrome, but I can say first-hand that this is false.  I am very concerned that the work to upgrade/repair the system is still projected to be weeks out.  I suppose we are very lucky it wasn't the AV definition servers that failed, but this is still a major impact.  This has been ongoing for about a month now.  What is the delay in improving the infrastructure?
  • I also just discovered that the CWG is not functioning normally.  I tried various categories under the test site and found about 50% do not BLOCK/WARN as specified.  It is particularly alarming that 'Phishing and Fraud' is allowed.  I strongly warn anyone considering this product to look very closely at issues such as this before making a decision.  We are stuck in a 3-year contract with this broken and under-supported software.
