[Solved] WAN UP Link Error

Hi guys!

I am facing an annoying issue...

For some reason, my WAN interface reports its status as UP but with a link error.

When this happens, my backup 3G connection takes over. After a while the WAN comes up again, and I get a flood of emails saying the link went down and came back up.

I have searched the forum and found other reports like this one, and the solution there was to set the port on the ISP modem to a fixed speed matching the UTM port.

However, my situation is different, and I don't know whether something like the above applies here.

My connection is this: Phone cable-->ADSL modem (router in bridge mode) -->Sophos WAN interface via Ethernet

The WAN interface is PPPoE.

The thing is that the LED on the modem that indicates whether it is synchronized with the broadband line stays on, so the modem does have a connection.

However, since the UTM keeps going into link error, I keep losing the Internet connection.

Two nights ago I performed a factory reset on the Sophos (BTW, a note in the GUI that a factory reset erases all logs would be a good reminder; I would have saved those 15 months' worth of logs before the reset...) and restored the configuration from a backup. Things seemed to stabilize, so I thought I had fixed it.

Yesterday evening the same thing started again. In addition, the modem had broadband disconnects (about 130, according to my ISP). I performed a factory reset on the modem, set it up again, and things again appeared stable.

This morning there were about 20 more up/down emails in my mailbox (the 3G kicks in when this happens) between 6 a.m. and 7 a.m.

From that time on it has been stable...

Do you guys have any ideas?

Which log should I check when this happens?

Does the fixed port speed apply to my situation? If so, where do I set the speed?

Thanks a lot in advance



  • Hi,

    Status UP but Link Error is reported when the UTM is able to communicate with the directly connected peer but the ISP connection is down or an incorrect IP address is configured. Since this is a PPPoE link, the IP address is assigned dynamically.
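As a rough mental model of the explanation above, here is a hypothetical sketch (not UTM's actual code) of how the two checks combine: whether the Ethernet carrier to the directly connected modem is up, and whether the PPPoE peer/upstream beyond it answers. The names `classify_wan`, `carrier_up` and `peer_reachable` are illustrative only.

```python
from enum import Enum


class WanStatus(Enum):
    DOWN = "Down"
    UP = "Up"
    LINK_ERROR = "Up / Link error"


def classify_wan(carrier_up: bool, peer_reachable: bool) -> WanStatus:
    """Hypothetical model of the status shown for a PPPoE WAN interface.

    carrier_up: the Ethernet link between the UTM and the modem is up.
    peer_reachable: the PPPoE session / ISP side beyond the modem answers.
    """
    if not carrier_up:
        # No carrier on the cable to the modem at all.
        return WanStatus.DOWN
    if not peer_reachable:
        # The cable to the modem is fine, but nothing beyond it responds.
        return WanStatus.LINK_ERROR
    return WanStatus.UP
```

In this model a steady modem sync LED only tells you the DSL side is trained; it says nothing about whether the PPPoE session is being answered, which is why "UP / Link error" can appear while the modem looks healthy.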

    Monitor fallback.log and kernel.log for any information about the error. To verify the connection, take a packet capture (pcap) and check the packet exchange.
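When monitoring fallback.log, it helps to turn the `irqd` interface transitions into a list of outages with durations. This is a minimal sketch that assumes the line format seen in the excerpts posted in this thread (`<timestamp> utm [daemon:info] irqd[...]:  pppN: down|up`); the function name `outages` is hypothetical, and where the log lives on the appliance is left to you.

```python
import re
from datetime import datetime

# Matches irqd lines such as:
# 2016:11:11-16:48:28 utm [daemon:info] irqd[6154]:  ppp0: down
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) .*?\b(?P<iface>ppp\d+): (?P<state>up|down)\s*$"
)
TS_FORMAT = "%Y:%m:%d-%H:%M:%S"


def outages(lines):
    """Return (iface, went_down, came_up, seconds) for each down->up pair.

    An 'up' with no preceding 'down' is ignored, and a 'down' that never
    comes back up within the given lines is not reported.
    """
    down_since = {}
    result = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m["ts"], TS_FORMAT)
        iface, state = m["iface"], m["state"]
        if state == "down":
            # Keep the first 'down' if several arrive before an 'up'.
            down_since.setdefault(iface, ts)
        elif iface in down_since:
            start = down_since.pop(iface)
            result.append((iface, start, ts, (ts - start).total_seconds()))
    return result
```

Run against the fallback.log excerpt posted later in the thread, it pairs `ppp0: down` at 16:48:28 with `ppp0: up` at 16:48:40, a 12-second outage. Grouping many such outages by hour would show whether they cluster like the 6 a.m. to 7 a.m. burst described above.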

    Thanks

    Sachin Gurung
    Team Lead | Sophos Technical Support
    Knowledge Base  |  @SophosSupport  |  Video tutorials
    Remember to like a post.  If a post (on a question thread) solves your question use the 'This helped me' link.

  • Thanks a lot for your help.

    My ISP says there is no problem on their end, but if I want this escalated as a case to their tech department, their policy is that I disconnect everything except the modem and leave it like that for at least 4 hours while they monitor for disconnects (bull, if you ask me...).

    When I get home, I will also try setting the WAN link speed to a fixed 100 Mbit, since I know the modem's ports are 10/100, and see if that helps (I doubt it, though...).

    I could do it remotely, but I suspect it would drop the VPN and leave the house without Internet until I get back in the afternoon.
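After forcing the speed, you can confirm what the port actually ended up at. This is a minimal sketch assuming a generic Linux sysfs layout (`/sys/class/net/<iface>/speed` and `duplex`); the interface name `eth1` and the helper names are hypothetical. It only reads the settings. On a generic Linux box the speed would be forced with `ethtool -s eth1 speed 100 duplex full autoneg off`, but on UTM the change presumably belongs in WebAdmin so the configuration system keeps it.

```python
from pathlib import Path


def link_settings(iface, sysfs_root="/sys/class/net"):
    """Read the current speed (Mbit/s) and duplex of a Linux network interface.

    Note: on many drivers reading 'speed' fails or returns -1 while the link is down.
    """
    base = Path(sysfs_root) / iface
    speed = int((base / "speed").read_text().strip())
    duplex = (base / "duplex").read_text().strip()
    return speed, duplex


def matches_modem(iface, expected_speed=100, expected_duplex="full",
                  sysfs_root="/sys/class/net"):
    """True if the interface runs at the speed/duplex the modem port expects."""
    speed, duplex = link_settings(iface, sysfs_root)
    return speed == expected_speed and duplex == expected_duplex
```

A mismatch here (for example 100/half on one side and 100/full on the other) is the classic symptom the fixed-speed advice is meant to prevent.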

    Thanks also for the hints regarding the logs... 

    I was looking at the Service Monitor Daemon log, and this is what I saw a couple of hours ago when I had another disconnect:

    2016:11:11-13:58:37 utm service_monitor[17784]: id="4000" severity="info" sys="System" sub="loadbalancing" name="REF_NetAvaOpendns ICMP 208.67.222.222 changed state to ONLINE"
    2016:11:11-13:58:37 utm service_monitor[17784]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Set Availability Group REF_NetAvaOpendns to 208.67.222.222"
    2016:11:11-13:58:37 utm service_monitor[17784]: id="4000" severity="info" sys="System" sub="loadbalancing" name="REF_NetAvaOpendns ICMP 208.67.220.220 changed state to ONLINE"
    2016:11:11-13:58:37 utm service_monitor[17784]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Group UP"
    2016:11:11-13:58:37 utm service_monitor[17784]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Set Availability Group REF_NetAvaGoogledns to 8.8.8.8"
    2016:11:11-13:58:38 utm service_monitor[17784]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Set Availability Group REF_NetAvaOpendns to 208.67.220.220"
    2016:11:11-13:58:40 utm service_monitor[17784]: id="4000" severity="info" sys="System" sub="loadbalancing" name="REF_NetAvaTellasdns ICMP 62.169.194.47 changed state to OFFLINE"
    2016:11:11-13:58:40 utm service_monitor[17784]: id="4000" severity="info" sys="System" sub="loadbalancing" name="REF_NetAvaTellasdns ICMP 62.169.194.47 changed state to OFFLINE"
    2016:11:11-13:58:40 utm service_monitor[17784]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Set Availability Group REF_NetAvaTellasdns to 62.169.194.48"
    2016:11:11-13:58:40 utm service_monitor[17784]: id="4000" severity="info" sys="System" sub="loadbalancing" name="Set Availability Group REF_NetAvaTellasdns to 62.169.194.47"
    I realized that my ISP's DNS servers do not respond to ICMP, so I removed them from the monitoring hosts and left only Google and OpenDNS. Let's see if this does any better.
    I also removed their DNS servers from the Forwarders page, though I don't think this will change anything.
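To see which monitoring hosts are actually flapping, the `service_monitor` state changes can be tallied per host. This is a sketch that assumes the `name="<group> ICMP <ip> changed state to ONLINE|OFFLINE"` format shown in the excerpt above; `state_changes` is a hypothetical helper, and the "Set Availability Group" lines are deliberately ignored.

```python
import re
from collections import Counter

# Matches service_monitor lines such as:
# ... name="REF_NetAvaTellasdns ICMP 62.169.194.47 changed state to OFFLINE"
STATE_RE = re.compile(
    r'name="(?P<group>\S+) ICMP (?P<host>[0-9.]+) changed state to (?P<state>ONLINE|OFFLINE)"'
)


def state_changes(lines):
    """Count ONLINE/OFFLINE transitions per (group, host, state)."""
    counts = Counter()
    for line in lines:
        m = STATE_RE.search(line)
        if m:
            counts[(m["group"], m["host"], m["state"])] += 1
    return counts
```

A host that keeps going OFFLINE while the other checks in the same period stay ONLINE, as the ISP DNS server did here, points at the monitoring target rather than the line itself.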
     
    P.S.: Yesterday I reset my modem and set it up again, just for the heck of it, and I also added an additional address in Sophos so that I can reach the modem's web UI.
    So when this happens again, at least I will be able to see whether the modem is properly connected to the ISP, which I couldn't do before.
    P.P.S.: In the kernel log I see only this:
    2016:11:11-13:58:13 utm kernel: [58347.261509] Loading kernel module for a network device with CAP_SYS_MODULE (deprecated).  Use CAP_NET_ADMIN and alias netdev-ppp0 instead.

    Does it say anything to you?
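The kernel line itself appears to be a deprecation notice about how the kernel module for a network device is requested, and is generally harmless. Its practical value here is the timestamp: each occurrence names the device being set up (`netdev-ppp0`, `netdev-ppp1`), so it marks when a PPP interface was (re)created. This sketch assumes the exact message format quoted in this thread and uses a hypothetical helper, `ppp_creations`, to pull out wall-clock time, kernel uptime and device.

```python
import re

# Matches kernel lines such as:
# 2016:11:11-16:48:34 utm kernel: [68573.003893] Loading kernel module ... alias netdev-ppp0 instead.
KERNEL_RE = re.compile(
    r"^(?P<ts>\S+) \S+ kernel: \[\s*(?P<uptime>\d+\.\d+)\].*alias netdev-(?P<dev>\w+) instead\.$"
)


def ppp_creations(lines):
    """Return (wall_clock, uptime_seconds, device) for each netdev module-load notice."""
    events = []
    for line in lines:
        m = KERNEL_RE.match(line.strip())
        if m:
            events.append((m["ts"], float(m["uptime"]), m["dev"]))
    return events
```

For the two kernel lines posted later in the thread, this yields ppp0 at 16:48:34 and ppp1 about 10 seconds later, which lines up with the ppp0/ppp1 transitions in the fallback log from the same disconnect.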
     
    Sophos XG Home Licence.

    Machine: Checkpoint 3100 appliance (Intel Atom C2558 CPU, 6GB Ram, 250GB sata SSD)

  • Hello, again...

    Another short disconnect happened...

    This is the fallback log:

    2016:11:11-16:48:20 utm [daemon:info] irqd[6154]:  ppp1 ppp <pointopoint,multicast,noarp> group 0 
    2016:11:11-16:48:20 utm [daemon:info] nwd[32118]:  Waiting for MDW cycle to end
    2016:11:11-16:48:28 utm [daemon:info] irqd[6154]:  ppp1 ppp <pointopoint,multicast,noarp,up,running,lowerup> group 0 
    2016:11:11-16:48:28 utm [daemon:info] irqd[6154]:  ppp1: detected 1 queue(s), 'network' cpuset
    2016:11:11-16:48:28 utm [daemon:info] irqd[6154]:  ppp1:0: affinity irq=0x3 rps/xps=0x3
    2016:11:11-16:48:28 utm [daemon:info] irqd[6154]:  ppp1: up
    2016:11:11-16:48:28 utm [daemon:info] irqd[6154]:  ppp0 ppp <pointopoint,multicast,noarp> group 0 
    2016:11:11-16:48:28 utm [daemon:info] irqd[6154]:  ppp0: down
    2016:11:11-16:48:30 utm [daemon:info] cssd[17184]:  [     (nil)] epoll_loop (epoll.c:358) starting exit cleanup
    2016:11:11-16:48:30 utm [local0:debug] ctipd:  Request to stop ctipd while 1 ctipd processes are running
    2016:11:11-16:48:30 utm [local0:info] [ctipd] [17206]: Caught SIGTERM
    2016:11:11-16:48:30 utm [local0:info] [ctipd] [17206]: Stopping
    2016:11:11-16:48:30 utm [daemon:info] irqd[6154]:  ifb0 ether 9e:xx:98:xx:76:xx <broadcast,noarp> group 0 
    2016:11:11-16:48:30 utm [daemon:info] irqd[6154]:  ifb0: down
    2016:11:11-16:48:30 utm [daemon:info] irqd[6154]:  ifb1 ether 9a:xx:77:xx:6d:xx <broadcast,noarp> group 0 
    2016:11:11-16:48:30 utm [daemon:info] irqd[6154]:  ifb1: down
    2016:11:11-16:48:30 utm [daemon:info] irqd[6154]:  ifb1 ether 9a:xx:77:xx:6d:xx <broadcast,noarp> group 0 
    2016:11:11-16:48:30 utm [daemon:info] irqd[6154]:  ifb0 ether 9e:xx:98:xx:76:xx <broadcast,noarp> group 0 
    2016:11:11-16:48:30 utm [daemon:info] irqd[6154]:  ifb0 ether xx:xx:00:xx:4f:xx <broadcast,noarp> group 0 
    2016:11:11-16:48:30 utm [daemon:info] irqd[6154]:  ifb1 ether ee:xx:33:xx:fb:f4 <broadcast,noarp> group 0 
    2016:11:11-16:48:30 utm [daemon:info] irqd[6154]:  ifb0 ether xx:xx:00:xx:4f:xx <broadcast,noarp,up,running,lowerup> group 0 
    2016:11:11-16:48:30 utm [daemon:info] irqd[6154]:  ifb0: detected 1 queue(s), 'network' cpuset
    2016:11:11-16:48:30 utm [daemon:info] irqd[6154]:  ifb0:0: affinity irq=0x3 rps/xps=0x3
    2016:11:11-16:48:30 utm [daemon:info] irqd[6154]:  ifb0: up
    2016:11:11-16:48:30 utm [daemon:info] irqd[6154]:  ifb1 ether ee:xx:33:xx:fb:f4 <broadcast,noarp,up,running,lowerup> group 0 
    2016:11:11-16:48:30 utm [daemon:info] irqd[6154]:  ifb1: detected 1 queue(s), 'network' cpuset
    2016:11:11-16:48:30 utm [daemon:info] irqd[6154]:  ifb1:0: affinity irq=0x3 rps/xps=0x3
    2016:11:11-16:48:30 utm [daemon:info] irqd[6154]:  ifb1: up
    2016:11:11-16:48:30 utm [daemon:info] cssd[17184]:  [     (nil)] epoll_exit (epoll.c:139) epoll subsystem shutting down
    2016:11:11-16:48:30 utm [daemon:info] cssd[17184]:  [     (nil)] epoll_exit (epoll.c:152) epoll subsystem shut down
    2016:11:11-16:48:31 utm [local0:info] [ctipd] [17206]: CIpRepCache::Save() - Saved to file /tmp/ctipd.cache
    2016:11:11-16:48:31 utm [local0:info] [ctipd] [17206]: CIpRepCache::Save() - Saved to file /tmp/ctipd.cache_v6
    2016:11:11-16:48:33 utm [local0:info] [ctipd] [17206]: Done
    2016:11:11-16:48:33 utm [local0:debug] ctipd:  Trying to stop ctipd (1 processes still alive)
    2016:11:11-16:48:34 utm [local0:debug] ctipd:  Stopping ctipd took 4 seconds
    2016:11:11-16:48:34 utm [daemon:info] irqd[6154]:  ppp0 ppp <pointopoint,multicast,noarp> group 0 
    2016:11:11-16:48:40 utm [daemon:info] irqd[6154]:  ppp0 ppp <pointopoint,multicast,noarp> group 0 
    2016:11:11-16:48:40 utm [daemon:info] irqd[6154]:  ppp0 ppp <pointopoint,multicast,noarp> group 0 
    2016:11:11-16:48:40 utm [daemon:info] irqd[6154]:  ppp0 ppp <pointopoint,multicast,noarp,up,running,lowerup> group 0 
    2016:11:11-16:48:40 utm [daemon:info] irqd[6154]:  ppp0: detected 1 queue(s), 'network' cpuset
    2016:11:11-16:48:40 utm [daemon:info] irqd[6154]:  ppp0:0: affinity irq=0x3 rps/xps=0x3
    2016:11:11-16:48:40 utm [daemon:info] irqd[6154]:  ppp0: up
    2016:11:11-16:48:40 utm [daemon:info] nwd[32118]:  Interface ppp1 is up and link is back up  
    2016:11:11-16:48:40 utm [daemon:info] nwd[32118]:  Interface ifb0 is up and link is back up  
    2016:11:11-16:48:40 utm [daemon:info] nwd[32118]:  Interface ifb1 is up and link is back up  
    2016:11:11-16:48:40 utm [daemon:info] nwd[32118]:  Interface ppp0 is up and link is back up  
    2016:11:11-16:48:41 utm [daemon:info] nwd[32118]:  Waiting for MDW cycle to end
    2016:11:11-16:48:41 utm [daemon:info] cssd[10904]:  [     (nil)] main (cssd.c:345) starting up...
    2016:11:11-16:48:41 utm [daemon:info] cssd[10904]:  [     (nil)] read_config (cssd.c:115) reading config
    2016:11:11-16:48:41 utm [daemon:info] cssd[10904]:  [     (nil)] main (cssd.c:362) initializing Sophos virus scanner engine
    2016:11:11-16:48:41 utm [daemon:info] irqd[6154]:  ifb0 ether xx:xx:00:xx:4f:xx <broadcast,noarp> group 0 
    2016:11:11-16:48:41 utm [daemon:info] irqd[6154]:  ifb0: down
    2016:11:11-16:48:41 utm [daemon:info] irqd[6154]:  ifb1 ether ee:15:33:19:fb:f4 <broadcast,noarp> group 0 
    2016:11:11-16:48:41 utm [daemon:info] irqd[6154]:  ifb1: down
    2016:11:11-16:48:41 utm [daemon:info] irqd[6154]:  ifb1 ether ee:15:33:19:fb:f4 <broadcast,noarp> group 0 
    2016:11:11-16:48:41 utm [daemon:info] irqd[6154]:  ifb0 ether xx:xx:00:xx:4f:xx <broadcast,noarp> group 0 
    2016:11:11-16:48:41 utm [daemon:info] irqd[6154]:  ifb0 ether 9e:xx:0a:xx:59:xx <broadcast,noarp> group 0 
    2016:11:11-16:48:41 utm [daemon:info] irqd[6154]:  ifb1 ether be:xx:61:xx:bb:xx <broadcast,noarp> group 0 
    2016:11:11-16:48:41 utm [daemon:info] irqd[6154]:  ifb2 ether xx:e9:xx:89:xx:8b <broadcast,noarp> group 0 
    2016:11:11-16:48:41 utm [daemon:info] irqd[6154]:  ifb0 ether 9e:xx:0a:xx:59:xx <broadcast,noarp,up,running,lowerup> group 0 
    2016:11:11-16:48:41 utm [daemon:info] irqd[6154]:  ifb0: detected 1 queue(s), 'network' cpuset
    2016:11:11-16:48:41 utm [daemon:info] irqd[6154]:  ifb0:0: affinity irq=0x3 rps/xps=0x3
    2016:11:11-16:48:41 utm [daemon:info] irqd[6154]:  ifb0: up
    2016:11:11-16:48:41 utm [daemon:info] irqd[6154]:  ifb1 ether be:xx:61:xx:bb:xx <broadcast,noarp,up,running,lowerup> group 0 
    2016:11:11-16:48:41 utm [daemon:info] irqd[6154]:  ifb1: detected 1 queue(s), 'network' cpuset
    2016:11:11-16:48:41 utm [daemon:info] irqd[6154]:  ifb1:0: affinity irq=0x3 rps/xps=0x3
    2016:11:11-16:48:41 utm [daemon:info] irqd[6154]:  ifb1: up
    2016:11:11-16:48:41 utm [daemon:info] irqd[6154]:  ifb2 ether xx:e9:xx:89:xx:8b <broadcast,noarp,up,running,lowerup> group 0 
    2016:11:11-16:48:41 utm [daemon:info] irqd[6154]:  ifb2: detected 1 queue(s), 'network' cpuset
    2016:11:11-16:48:41 utm [daemon:info] irqd[6154]:  ifb2:0: affinity irq=0x3 rps/xps=0x3
    2016:11:11-16:48:41 utm [daemon:info] irqd[6154]:  ifb2: up
    2016:11:11-16:48:44 utm [daemon:info] irqd[6154]:  ppp1 ppp <pointopoint,multicast,noarp> group 0 
    2016:11:11-16:48:44 utm [daemon:info] irqd[6154]:  ppp1: down
    2016:11:11-16:48:44 utm [daemon:info] irqd[6154]:  ppp1 ppp <pointopoint,multicast,noarp> group 0 
    2016:11:11-16:48:45 utm [daemon:info] irqd[6154]:  ifb0 ether 9e:xx:0a:xx:59:xx <broadcast,noarp> group 0 
    2016:11:11-16:48:45 utm [daemon:info] irqd[6154]:  ifb0: down
    2016:11:11-16:48:45 utm [daemon:info] irqd[6154]:  ifb1 ether be:xx:61:xx:bb:xx <broadcast,noarp> group 0 
    2016:11:11-16:48:45 utm [daemon:info] irqd[6154]:  ifb1: down
    2016:11:11-16:48:45 utm [daemon:info] irqd[6154]:  ifb2 ether xx:e9:xx:89:xx:8b <broadcast,noarp> group 0 
    2016:11:11-16:48:45 utm [daemon:info] irqd[6154]:  ifb2: down
    2016:11:11-16:48:45 utm [daemon:info] irqd[6154]:  ifb2 ether xx:e9:xx:89:xx:8b <broadcast,noarp> group 0 
    2016:11:11-16:48:45 utm [daemon:info] irqd[6154]:  ifb1 ether be:xx:61:xx:bb:xx <broadcast,noarp> group 0 
    2016:11:11-16:48:45 utm [daemon:info] irqd[6154]:  ifb0 ether 9e:xx:0a:xx:59:xx <broadcast,noarp> group 0 
    2016:11:11-16:48:45 utm [daemon:info] irqd[6154]:  ifb0 ether ba:xx:9b:xx:3d:xx <broadcast,noarp> group 0 
    2016:11:11-16:48:45 utm [daemon:info] irqd[6154]:  ifb1 ether c2:xx:76:xx:e7:xx <broadcast,noarp> group 0 
    2016:11:11-16:48:45 utm [daemon:info] irqd[6154]:  ifb0 ether ba:xx:9b:xx:3d:xx <broadcast,noarp,up,running,lowerup> group 0 
    2016:11:11-16:48:45 utm [daemon:info] irqd[6154]:  ifb0: detected 1 queue(s), 'network' cpuset
    2016:11:11-16:48:45 utm [daemon:info] irqd[6154]:  ifb0:0: affinity irq=0x3 rps/xps=0x3
    2016:11:11-16:48:45 utm [daemon:info] irqd[6154]:  ifb0: up
    2016:11:11-16:48:45 utm [daemon:info] irqd[6154]:  ifb1 ether c2:xx:76:xx:e7:xx <broadcast,noarp,up,running,lowerup> group 0 
    2016:11:11-16:48:45 utm [daemon:info] irqd[6154]:  ifb1: detected 1 queue(s), 'network' cpuset
    2016:11:11-16:48:45 utm [daemon:info] irqd[6154]:  ifb1:0: affinity irq=0x3 rps/xps=0x3
    2016:11:11-16:48:45 utm [daemon:info] irqd[6154]:  ifb1: up
    2016:11:11-16:48:48 utm [daemon:info] cssd[10904]:  [     (nil)] saviscanner_init (saviscanner.c:66) SAVI threat data successfully loaded, engine 3.66.3, threat data 5.32 from 4/10/2016 (12108327 detected threats)
    2016:11:11-16:48:48 utm [daemon:info] cssd[10904]:  [     (nil)] main (cssd.c:368) virus scanner initialization finished
    2016:11:11-16:48:51 utm [local0:info] [ctipd] [10908]: CEnginesContainer::UpdateSettings() - Updating
    2016:11:11-16:48:51 utm [local0:info] [ctipd] [10908]: CEnginesContainer::UpdateSettings() - Updating
    2016:11:11-16:48:51 utm [local0:info] [ctipd] [10908]: CIpRepCache::Load() - Loading cache from file /tmp/ctipd.cache...
    2016:11:11-16:48:51 utm [local0:info] [ctipd] [10908]: CIpRepCache::Load() - Loading cache from file /tmp/ctipd.cache_v6...
    2016:11:11-16:48:51 utm [local0:info] [ctipd] [10908]: LRU::Load() - Loading cache from file /tmp/ctipd.DM_counters
    2016:11:11-16:48:51 utm [local0:info] [ctipd] [10908]: LRU::Load() - Loading cache from file /tmp/ctipd.DM_counters_v6
    2016:11:11-16:48:51 utm [local0:info] [ctipd] [10908]: Stats server listening on port /tmp/ctipd.stats
    2016:11:11-16:48:51 utm [local0:info] [ctipd] [10908]: RBL server listening on port 54
    2016:11:11-16:48:51 utm [local0:info] [ctipd] [10908]: Ready
    2016:11:11-16:49:01 utm [daemon:info] nwd[32118]:  Interface ifb0 is up and link is back up  
    2016:11:11-16:49:01 utm [daemon:info] nwd[32118]:  Interface ifb1 is up and link is back up  
    2016:11:11-16:49:01 utm [daemon:info] nwd[32118]:  Interface ifb2 is up and link is back up  
    2016:11:11-16:49:01 utm [daemon:info] nwd[32118]:  Interface ifb0 is up and link is back up  
    2016:11:11-16:49:01 utm [daemon:info] nwd[32118]:  Interface ifb1 is up and link is back up  

    Any ideas from this?

    P.S.: This is the kernel log (the same as before):

    2016:11:11-16:48:34 utm kernel: [68573.003893] Loading kernel module for a network device with CAP_SYS_MODULE (deprecated).  Use CAP_NET_ADMIN and alias netdev-ppp0 instead.
    2016:11:11-16:48:44 utm kernel: [68583.022973] Loading kernel module for a network device with CAP_SYS_MODULE (deprecated).  Use CAP_NET_ADMIN and alias netdev-ppp1 instead.

     

  • Hi,

    What is the firmware version that resides on UTM?

    Thanks

    Sachin Gurung
    Team Lead | Sophos Technical Support

  • Hello again!

    Firmware version 9.408-4

     

  • Hi,

    I forgot to ask for mdw.log. Can you please also post the mdw logs?

    Also, please check #7 in the amazing Rulz by Bob here. Any catch with that?

    Thanks

    Sachin Gurung
    Team Lead | Sophos Technical Support

  • Is the new.log the middleware log?

    If yes I will post it later.

    Regarding Bob's rules, yes I have seen them.

    I do have a Realtek NIC on the WAN interface, plus an Intel 82574L for the LAN. Switching to an Intel NIC is not a viable solution at the moment; I think this machine has only one expansion slot...

    I may be able to swap them, though.
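Before swapping ports, it is worth confirming which driver sits behind each interface. This is a sketch assuming a generic Linux sysfs layout, where `/sys/class/net/<iface>/device/driver` is a symlink to the bound driver. Realtek chips usually show up as `r8169` and the Intel 82574L as `e1000e`. The interface name and the `nic_driver` helper are hypothetical; `ethtool -i <iface>` reports the same information from the shell.

```python
import os
from pathlib import Path


def nic_driver(iface, sysfs_root="/sys/class/net"):
    """Return the kernel driver bound to a network interface (e.g. 'e1000e').

    Reads the device/driver symlink and keeps only its last path component.
    """
    link = Path(sysfs_root) / iface / "device" / "driver"
    return os.path.basename(os.readlink(link))
```

Running it for each port before and after a swap makes it easy to check that the WAN ended up on the Intel NIC.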

    I did follow the rule about setting a fixed speed on the interface, though, and I also replaced the modem.

    I did this about 12 hours ago and things have been stable so far.

    But I am not 100% sure yet, because things also stabilized two days ago after I reset the old modem, and after half a day it started acting up again...

     

  • I think I am going to close this thread, because it seems that it was a line issue...

    It had momentary drops, and Sophos was evidently detecting the line drops and trying to bring up the backup connection (which I have in standby mode).

    It looks like the upgrade to the latest firmware coincidentally happened at the same time the problem started, so I thought it was a problem with the UTM...

    I am not 100% certain about all the above, but I think it is for the best to close this issue...

     
