Random loss of connection to Internet websites, although pinging URLs still works

Hi guys.
I know there are some similar threads, but this case is different and very strange.

Randomly, once or twice a day, for about one to five minutes, we lose our connection to the Internet.
Right after that, everything goes back to normal.

A few things to note:

1. We can still ping URLs.

2. DNS seems to work.

3. Accessing URLs that are in the DMZ doesn't work either.

4. I'm not sure whether there are more outages of shorter duration, but this is what I know of from my customers.

Any help will be appreciated. :)

Goldy

  • Shalom Goldy,

    1. Not sure what you mean.  Are these pings to FQDNs on the Internet?

    2. My gut feeling is that cached FQDNs work but that your ISP has a problem and will not allow resolution of un-cached FQDNs.  How is DNS configured compared to DNS best practice?

    3. Are you running split DNS?  Is the Internal-to-DMZ traffic handled by the UTM's web proxy?

    4. Is this happening at multiple locations?

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • Hi Bob. :)

    1. Pinging 8.8.8.8 or YouTube (for example) works.

    2. I don't think it's a DNS issue, since DNS works fine and resolution is OK (ping google.com, nslookup...).

    3. Strangely, it comes and goes randomly, lasting from a few seconds to 2 minutes.

    4. When it happens, it affects my whole LAN.

    5. When it happens, I can't connect to the firewall (web) either.

    6. When it happens, I can't connect to the web service of my mail (in the DMZ).

    7. In all cases, ping still works fine.

    It seems to be some kind of issue with TCP, since ICMP works.
    Very strange :)

    Thanks

    Yaron Gold
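
The pattern described above (ICMP keeps working while TCP sessions fail) could be captured in a timestamped log, which may help pin down the outage windows. This is only a minimal sketch, assuming a Linux client on the LAN with iputils `ping`, coreutils `timeout`, and `bash` installed. The function names `classify` and `probe` are made up for illustration and are not part of any Sophos tooling.

```shell
#!/bin/sh
# Hypothetical helpers: compare an ICMP probe with a TCP probe.

# classify <icmp_ok> <tcp_ok>: map two 0/1 probe results to a verdict.
classify() {
    case "$1$2" in
        11) echo "ok" ;;
        10) echo "tcp-failure" ;;     # ping answers, TCP connect does not
        01) echo "icmp-failure" ;;    # TCP connects, ping does not
        *)  echo "down" ;;
    esac
}

# probe <host> <port>: one ICMP echo and one TCP connect, 2 s limit each.
probe() {
    icmp=0
    tcp=0
    ping -c 1 -W 2 "$1" >/dev/null 2>&1 && icmp=1
    # bash's /dev/tcp pseudo-device opens a TCP connection; no data is sent.
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null && tcp=1
    echo "$(date '+%Y-%m-%d %H:%M:%S') $1:$2 $(classify "$icmp" "$tcp")"
}
```

Something like `while true; do probe google.com 443; sleep 5; done >> probe.log` would then show whether the reported outages line up with "tcp-failure" entries.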

     

  • What does Sophos Support say about this strange phenomenon, Yaron?

    Cheers - Bob

     
  • They don't say anything for now. :D

    Right now they are first trying to resolve another issue:

    Root partition is filling up - please check. Current usage: 85%
    HA Status          : HA SLAVE (node id: 2)

    This has something to do with Postgres on my Slave device
    (that's what they think).

    Maybe it's all connected.

    I hope so...

  • What version are you on?

    What result do you get from the following?  Also, run the same command after switching to the Slave with ha_utils ssh.

    du -shx /var/storage/* | sort -rh | head -10

    Cheers - Bob

     
  • Hi Bob.

    9.703-3

    > 3.1G    /var/storage/swapfile
    > 701M    /var/storage/chroot-http
    > 132M    /var/storage/chroot-clientlessvpn
    > 94M     /var/storage/chroot-smtp
    > 29M     /var/storage/chroot-revers
    > 17M     /var/storage/chroot-pop3
    > 3.9M    /var/storage/chroot-ftp
    > 16K
    > 16K     /var/storage/agent
    > 4.0K    /var/storage/pgsql92

     

    [Y]

  • Looks good.  What about on the Slave?

    Cheers - Bob

     
  • We are having the exact same issue. Several times a week the web proxy stops working and people are unable to browse the web. When it happens, the web proxy log shows only the following; no other traffic is logged.

     

    2020:07:17-16:44:12 firewall-1 httpproxy[7061]: id="0003" severity="info" sys="SecureWeb" sub="http" request="(nil)" function="confd_config_reload_func" file="confd-client.c" line="658" message="reloading config done, new version 30092"

    2020:07:17-16:45:23 firewall-1 httpproxy[7061]: id="0003" severity="info" sys="SecureWeb" sub="http" request="(nil)" function="confd_config_reload_func" file="confd-client.c" line="594" message="reloading config"

    2020:07:17-16:45:24 firewall-1 httpproxy[7061]: id="0003" severity="info" sys="SecureWeb" sub="http" request="(nil)" function="confd_config_reload_func" file="confd-client.c" line="658" message="reloading config done, new version 30094"

    2020:07:17-16:45:36 firewall-1 httpproxy[7061]: id="0003" severity="info" sys="SecureWeb" sub="http" request="(nil)" function="confd_config_reload_func" file="confd-client.c" line="594" message="reloading config"

    2020:07:17-16:45:36 firewall-1 httpproxy[7061]: id="0003" severity="info" sys="SecureWeb" sub="http" request="(nil)" function="confd_config_reload_func" file="confd-client.c" line="658" message="reloading config done, new version 30095"

    2020:07:17-16:45:56 firewall-1 httpproxy[7061]: id="0003" severity="info" sys="SecureWeb" sub="http" request="(nil)" function="confd_config_reload_func" file="confd-client.c" line="594" message="reloading config"

    2020:07:17-16:45:57 firewall-1 httpproxy[7061]: id="0003" severity="info" sys="SecureWeb" sub="http" request="(nil)" function="confd_config_reload_func" file="confd-client.c" line="658" message="reloading config done, new version 30096"

    2020:07:17-16:45:58 firewall-1 httpproxy[7061]: id="0003" severity="info" sys="SecureWeb" sub="http" request="(nil)" function="confd_config_reload_func" file="confd-client.c" line="594" message="reloading config"

    2020:07:17-16:45:59 firewall-1 httpproxy[7061]: id="0003" severity="info" sys="SecureWeb" sub="http" request="(nil)" function="confd_config_reload_func" file="confd-client.c" line="658" message="reloading config done, new version 30097"

    2020:07:17-16:46:22 firewall-1 httpproxy[7061]: id="0003" severity="info" sys="SecureWeb" sub="http" request="(nil)" function="confd_config_reload_func" file="confd-client.c" line="594" message="reloading config"

    2020:07:17-16:46:23 firewall-1 httpproxy[7061]: id="0003" severity="info" sys="SecureWeb" sub="http" request="(nil)" function="confd_config_reload_func" file="confd-client.c" line="658" message="reloading config done, new version 30100"

    After about 5 minutes the proxy starts working again without my doing anything, and people are able to browse the Internet again. During the web proxy outage all other traffic (firewall and NAT rules) works fine.
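
To see how tightly the reloads cluster around an outage, the completed reloads can be counted per minute. The following is a minimal sketch that relies only on the line format shown in the excerpt above. The function name `count_reloads_per_minute` is hypothetical, and the log path in the usage note is an assumption.

```shell
#!/bin/sh
# count_reloads_per_minute: read httpproxy log lines on stdin and print
# "<count> <YYYY:MM:DD-HH:MM>" for every minute with completed reloads.
count_reloads_per_minute() {
    # Keep only "reloading config done" lines; the first 16 characters of
    # the timestamp field identify the minute.
    awk '/message="reloading config done/ { print substr($1, 1, 16) }' |
        sort |
        uniq -c |
        awk '{ print $1, $2 }'
}
```

It could be run as `count_reloads_per_minute < /var/log/http.log`, assuming that is where the web proxy log lives on the appliance. A minute with several reloads would be a candidate outage window.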

    Running version 9.703-3.

    Space on Master and Slave:

    <M> firewall:/home/login # du -shx /var/storage/* | sort -rh | head -10
    13G /var/storage/chroot-http
    3.8G /var/storage/pgsql92
    3.1G /var/storage/swapfile
    291M /var/storage/logfilter
    144M /var/storage/cores
    95M /var/storage/chroot-smtp
    94M /var/storage/chroot-reverseproxy
    88M /var/storage/chroot-clientlessvpn
    18M /var/storage/chroot-pop3
    7.9M /var/storage/samba
    <M> firewall:/home/login # ha_utils ssh

    <S> firewall:/home/login # du -shx /var/storage/* | sort -rh | head -10
    3.8G /var/storage/pgsql92
    3.1G /var/storage/swapfile
    947M /var/storage/chroot-http
    289M /var/storage/logfilter
    168M /var/storage/cores
    125M /var/storage/WAF20181030.pcap
    108M /var/storage/WAF.pcap
    92M /var/storage/chroot-smtp
    88M /var/storage/chroot-clientlessvpn
    73M /var/storage/chroot-reverseproxy
    <S> firewall:/home/login #

    Franc.

  • Hoi Franc,

    About four years ago, member Adam Mickiewicz reported a similar reload issue.  He found that the reloads were caused by DNS Host and DNS Group definitions in the Transparent Mode Skiplist re-resolving near the same time.  Could this be your issue?

    What do you learn from the following compared to your log lines above?

    zgrep '17\-16\:45' /var/log/confd/2020/07/*17*|grep resolve|grep -oP 'objname=".*?"'|sort -n|uniq -c
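
For readers unfamiliar with the one-liner, its filtering and counting stages can be written out step by step. This is only a sketch: the function name `count_resolved_objects` is hypothetical, and the portable pattern `objname="[^"]*"` is swapped in for the original `grep -oP` expression, which should match the same text.

```shell
#!/bin/sh
# count_resolved_objects: read confd log lines on stdin and print, for each
# object name seen on a line mentioning "resolve", how often it occurred.
count_resolved_objects() {
    grep resolve |                    # only lines about (re-)resolving
        grep -o 'objname="[^"]*"' |   # extract the objname="..." token
        sort |                        # group identical names together
        uniq -c |                     # count each group
        sort -rn                      # most frequent first
}
```

Used in place of the tail of the pipeline, this would read `zgrep '17\-16\:45' /var/log/confd/2020/07/*17* | count_resolved_objects`. A name with a high count in the minute of the outage would point at a DNS Host or DNS Group definition that re-resolves often.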

    Cheers - Bob

     
  • Hoi Bob,

    here are the results of your command:

    <M> firewall:/home/login # zgrep '17\-16\:45' /var/log/confd/2020/07/*17*|grep resolve|grep -oP 'objname=".*?"'|sort -n|uniq -c
    1 objname="iprep4.t.ctmail.com"
    <M> firewall:/home/login #

    Groeten,

    Franc.

  • I checked it as well:

    1 objname="iprep4.t.ctmail.com"
    1 objname="microsoft_9"

  • Sounds like both of you guys should get a case open with Sophos Support.

    Cheers - Bob

     
  • We already have. Sophos is taking a look next week.

  • Thanks Bob.

    Already did. :)

    I'll let you know when something new comes up.

  • Hi Franc.

    Could you please check how full your root partition is?
    Along with the issue above, we are getting "Root partition is filling up - please check. Current usage: 86%".
    I suspect it's related to the connection issue.
    The blame seems to lie with the Postgres database, but we don't know why.
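
The alert concerns the root partition, while the `du` command earlier in the thread inspects `/var/storage`, which may be a different filesystem, so it could be worth looking at `/` directly. This is a minimal sketch, assuming standard `df`, `du`, and `awk` are available in the appliance shell. The helper names `usage_percent` and `top_dirs` are hypothetical.

```shell
#!/bin/sh
# usage_percent: read `df -P` output on stdin and print the use percentage
# of the first listed filesystem as a bare number (e.g. 85).
usage_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# top_dirs <path>: ten largest directories (in KiB) under <path>, staying on
# the same filesystem (-x) so separately mounted partitions are not counted.
top_dirs() {
    du -xk "$1" 2>/dev/null | sort -rn | head -10
}
```

Running `df -P / | usage_percent` on both the Master and the Slave would show whether both nodes are near the alert threshold, and `top_dirs /` would list what is taking the space on the root filesystem itself.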