This discussion has been locked.

HTTPS decryption: Some users cannot browse site: Certificate expired yesterday

We are seeing a strange situation again on our XG430 running SFOS 19.0.1; it already happened last week:

Some users browse to a website that has no decryption exception on our firewall.

The browser (Firefox or Chrome) shows an error that the site is not secure. If you check the details, you can see that the website has been re-encrypted by the firewall root CA (so far, so normal), but the certificate shown expired yesterday. That is why the page is blocked. From our debugging so far, this only affects some users behind an SD-RED60.

I can see that the users who hit the issue with a website do not appear in the TLS logs; instead, their requests appear only in the Webfilter logs.

Users who can access the websites show up in both the Webfilter and TLS logs.

Here is one example, for the website handelsregister.de.

Here it works:

[screenshot]

Decrypted by Sophos EP as the last instance. Cert valid until May 10th.

Let's check the real certificate chain:

[screenshot]

And this is what the endpoints behind the SD-RED60 see:

[screenshot]
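The same comparison can be made from a command line instead of the browser's certificate dialog. This is only a sketch: `cert_summary` is a hypothetical helper name, and it assumes `openssl` is available on the client being tested. It prints the subject, issuer and expiry date of whatever certificate the client is actually presented, which shows whether the firewall's CA or the site's real CA signed it and when it expires.

```shell
# Hypothetical helper: read a PEM certificate on stdin and print its
# subject, issuer and expiry date (notAfter).
cert_summary() {
    openssl x509 -noout -subject -issuer -enddate
}

# Example (run on an affected client, not on the firewall):
# openssl s_client -connect www.handelsregister.de:443 \
#     -servername www.handelsregister.de </dev/null 2>/dev/null | cert_summary
```

Running it once from a working client and once from a client behind the SD-RED60 should make the difference in issuer and notAfter visible side by side.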



  • May be related to NC-100265, which was resolved in 19.5 GA.
    Certificates created by the XG for HTTPS decryption expire after two years.  During the upgrade to 18.0 GA (released Jan 2020) we cleared the cert cache, but it has not been cleared automatically since.  So people who upgraded to or installed 18.0 GA really early and are not yet on 19.5 GA may experience this problem.

    The workaround is relatively simple.

    Web Service will be interrupted for a minute or two, so do this during off hours.  Non web traffic will not be affected.

    touch /var/certcache/.clear_all_certs_on_reload
    service -ds nosync awarrenhttp:restart

    If this does not resolve the problem, it may have a different cause, complicated by the fact that you have XG, RED, and EP all potentially trying to do HTTPS decryption.
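Before clearing, it can help to gauge how much of the cache predates the two-year mark. The following is a sketch only: `list_stale_certs` is a hypothetical helper name, it assumes the `find` available in the appliance shell supports `-mtime`, and it treats a file's modification time as a rough proxy for when the cached certificate was generated, which is an assumption rather than documented behaviour.

```shell
# Hypothetical helper: print files in a cert cache directory whose
# modification time is more than N days in the past.
# Assumptions: find supports -type and -mtime; file mtime roughly
# reflects when the cached certificate was generated.
list_stale_certs() {
    dir="${1:-/var/certcache}"   # cache path as used in the workaround above
    days="${2:-730}"             # roughly two years
    find "$dir" -type f -mtime +"$days"
}

# Example: count the old entries without deleting anything.
# list_stale_certs /var/certcache 730 | wc -l
```

This is read-only; the actual reset is still the touch plus restart from the workaround.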

  • Thank you all for your posts.

    I think you described the exact issue here.

    I checked the cert cache, which I was not aware of.

    I can see the two certs for handelsregister.de: one for the root domain, one for www.

    The root domain currently redirects to www, and the cached www cert expired yesterday, as can be seen in the browser error.

    XG430_WP02_SFOS 19.0.1 MR-1-Build365# ls /var/certcache/*handelsregister.de* -ali
    23241307 -rw-------    1 root     0             1814 Mar 20 14:43 /var/certcache/handelsregister.de#1
    23069512 -rw-------    1 root     0             1679 Jan 15  2021 /var/certcache/www.handelsregister.de#1  (the expired one)

    We have never been on 18.0 GA. I think we started with 18.0.1 or 18.0.2, but we came from 17.5.12.

    In answer to the questions above: the endpoint only decrypts on my admin workstation, which is on EAP. I took the screenshots on that workstation. Yes, the endpoint root CA makes it even more complicated.

    But the root cause lies elsewhere.

  • Going by the certificates in that path, I could identify the same issue for other websites that were reported today.

    Now was a good time to reset the cache folder.

    XG430_WP02_SFOS 19.0.1 MR-1-Build365# cd /var/certcache
    XG430_WP02_SFOS 19.0.1 MR-1-Build365# ls |wc -l
    172639
    XG430_WP02_SFOS 19.0.1 MR-1-Build365# touch /var/certcache/.clear_all_certs_on_reload
    XG430_WP02_SFOS 19.0.1 MR-1-Build365# service -ds nosync awarrenhttp:restart
    200 OK
    XG430_WP02_SFOS 19.0.1 MR-1-Build365# ls |wc -l
    0

  • You don't need to have run 18.0 GA to experience this.  You just need to have run anything after 17.5, two or more years ago.

    Based on the screenshot of the file date, I suspect the workaround I posted won't fix it.  But it would be good to try anyway to rule it out.

  • We upgraded from 18.0 MR1 to 18.0 MR4 on 09.01.2021. The expired certificates were created just after that date.

    We did the upgrade with a re-image and a backup restore, guided by a Sophos Senior Sales SE.

    On 15.01.2021 (the timestamp of the cached cert files) we brought the AUX node back into the HA cluster.

    I can say that exactly because I have all the steps written down in a document.

    Another expired cert:

    23070120 -rw-------    1 root     0             2256 Jan 15  2021 /var/certcache/www.hiddenforprivacy.de#1

  • Will the cert cache folder not fill up again? To test, I confirmed I was using the classic web proxy and browsed some of the sites reported today. The cert cache is still completely empty: 0 files.

    Btw: what does HA sync do? Shouldn't the peer node either have a blank cache folder or be synced with the contents of the primary node?

    XG430_WP02_SFOS 19.0.1 MR-1-Build365# ssh -F /static/ha/hauser.conf hauser@10.1.178.2
    2023-03-20 22:01:52Z Warning: Permanently added '10.1.178.2' (ED25519) to the list of known hosts.
    XG430_WP02_SFOS 19.0.1 MR-1-Build365# cd /var/certcache
    XG430_WP02_SFOS 19.0.1 MR-1-Build365# ls |wc -l
    171661

    XG430_WP02_SFOS 19.0.1 MR-1-Build365# ls -ali www.handelsregister.de#1
    49291009 -rw-------    1 root     0             1814 Jan 14  2021 www.handelsregister.de#1

  • "Will the cert cache folder not fill up again?"

    In the meantime some files have been generated in there again. It probably just takes some time; they are not created at the moment the HTTPS websites are accessed.

  • AFAIK the certcache is not sync'd between nodes.
    The certcache will gather more and more certificates.

    The problem of certificates created more than two years ago still being used is resolved in 19.5 GA.  So sometime in the next two years you will need to upgrade.  By that time v21 or something might be out.  :)

    Did that resolve the problem?

  • "Did that resolve the problem?"

    What resolved what?

    We're about to upgrade from 19.0.1 to 19.5.1 in 10 days.

    Clearing the cache worked.

    We also found out that the affected users were using the classic web proxy. Switching them over to the TLS engine fixed the issue, too.

    For other readers: NC-100265 (Web): Expired certificates in certcache are used rather than generating new ones. Fixed in 19.0.2 or 19.5.

    We're on XG430 (SFOS 19.0.1 MR-1-Build365).

    In the initial post I wrongly wrote we were on 19.0.2.