
Ethernet Port Issue - State-Up, Link-Error

Hi:

I have a two-month-old SG330 running 9.315-2.

This has happened four times. On the dashboard, the external interface shows as State - Up, Link - Error. It first happened maybe a month ago. 

Since then it has happened three more times: last Thursday (Aug 27), last Saturday (Aug 29), and today (Aug 31).

What happens then is that our external users lose their VPN connections and we lose internet connectivity until our backup internet connection kicks in. (We have uplink balancing with both ISP connections active, plus multipath rules to steer users out the primary connection and fail over to the backup if needed, and to steer some of our BYOD wifi hotspots out the backup connection as their primary.)

Anyway, when this happened last Thursday, I changed the ISP port on the Sophos from Eth4 (where it was) to Eth5, and also changed the Ethernet cable (all of 3 feet long) between the ISP's box and the Sophos.

We're on an SM fiber connection, with a vendor-provided Cisco 3000 series switch with three active ports: one for the fiber SFP, one Ethernet for the internet connection, and one Ethernet that carries our phone PRI.

When the internet connection goes down, there is no issue with the phone.

As I said, last Thursday, I switched the ISP primary connection on the Sophos from Eth4 to Eth5. 

On Saturday it went down again, and I got a call at home (we are a 24x7 operation). I could ping the ISP's default gateway, but could not connect to the firewall through the primary ISP. I logged into the SG remotely through the backup internet connection. Same as before: the dashboard page showed Eth5 as State - Up, Link - Error. I disabled it and re-enabled it, and it came back up.

Today, same situation. I was on site, saw it happen (before people started calling), logged into the SG through the internal interface, and Eth5 showed the same. I went to the Interfaces tab under Interfaces & Routing, disabled and re-enabled the interface, and it came right back up.

There are several additional addresses on that port also - we have a /28 from the carrier and have some of the other additional addresses NAT-ing to servers, etc. 

Now I'm getting the ISP saying that it's the firewall, etc. 

I had Sophos support on the line today, and they connected in and, in their words, saw line fluctuations and the port not responding, which in their minds points to an issue with the vendor's switch.

The ISP says they show no issues, other than when they see me disable/enable the port.

I've changed the port on the SG from Eth4 to Eth5, and changed the Ethernet cable. There is nothing other than the Ethernet cable between the two boxes.

I've attached some snippets of log files that Sophos pulled up. 

Ideas? 

Thanks,

John S.


This thread was automatically locked due to age.
  • John, I had a client with the same problem with a UTM 120 on a fiber optic router.  We went round and round with Support at tier-1 and tier-2.  We swapped out the 120 - no change.  The ISP spent a lot of money chasing the problem on their end.  We put in an SG - no change.

    He'd been having the problem for over a year, ever since the initial installation in January 2014, but hadn't told me about it; I got Support involved as soon as he did.    Finally, the light went on...

    This is #7 in Rulz!

    A fleeting disconnection was followed by a failure of the UTM and the Cisco to renegotiate speed/duplex.  I had them configure the UTM with a fixed setting of 100Mbps-Full and they had the vendor do likewise on their switch - problem solved.  You might be able to do 1Gbps-Full if your ISP's switch can be fixed at that; my client's ISP was unable to configure that as a fixed setting.
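
A minimal sketch of how one might confirm that a port really is pinned to a fixed speed/duplex, assuming shell access and that the Linux `ethtool` utility is available on the box (not confirmed in this thread). It only parses the text that `ethtool <iface>` prints; the helper names `parse_ethtool` and `matches_fixed` and the `SAMPLE` text are hypothetical and for illustration only.

```python
def parse_ethtool(output: str) -> dict:
    """Extract speed, duplex, autoneg and link state from `ethtool <iface>` text."""
    fields = {}
    for line in output.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip().lower()] = value.strip()
    return {
        "speed": fields.get("speed"),
        "duplex": fields.get("duplex"),
        "autoneg": fields.get("auto-negotiation"),
        "link": fields.get("link detected"),
    }


def matches_fixed(settings: dict, speed: str = "100Mb/s", duplex: str = "Full") -> bool:
    """True when the port reports the expected speed/duplex with autoneg off."""
    return (
        settings["speed"] == speed
        and settings["duplex"] == duplex
        and settings["autoneg"] == "off"
    )


# Illustrative sample only, shaped like typical ethtool output (not a real capture).
SAMPLE = """Settings for eth5:
\tSpeed: 100Mb/s
\tDuplex: Full
\tAuto-negotiation: off
\tLink detected: yes
"""

if __name__ == "__main__":
    print(matches_fixed(parse_ethtool(SAMPLE)))
```

Both ends need to agree: a port fixed at 100/Full facing a port left on auto typically ends up with a duplex mismatch, so the same check is worth asking the ISP to make on their switch.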

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
    • I'll set the UTM to 1000, fixed, and contact the ISP to do the same. Do you have a link to all of your Rulz?

      I'll report back with results.

      John s
      • That was a link in my post above, John. [:)]

        Let us know if that solves it for you.

        Cheers - Bob
         
        • Thanks for the Rulz.

          Last Tuesday (9/1) it went down again. I called the ISP while it was still down. They logged into their switch and the port (on their end) showed up. The Sophos showed State - Up, Link - Error.

          Based on BAlfson's suggestions I asked them to set the port to 1000/full rather than auto/auto. They would only set it to 100/full (since we're subscribed to a 20/20 with surge to 25/25). Don't understand why that would make a difference, but oh well - not in the mood to argue.

          So they set their port to 100/full, and I did likewise on the SG330, set the port to 100/full. 

          It's been running fine since. Will keep an eye on it though. There are three other ports on the SG330 all running fine and set at auto/auto. No issues. The carrier's switch has been running almost 22 months, since we got a dedicated fiber installed, without a burp. And if the fiber had gone out, we would have known it, as our phone PRI rides over the same fiber and comes out on a different port on the carrier's switch.

          And we had an ASG320 running fine (but that might have been using a 100M port on the appliance).

          Not sure yet, but I have seen some strange auto negotiate things with gigabit ports. Not sure why the new SG330 ran for several months without a burp. But...

          One of those mysteries of life.
          Thanks for the hints, and keep those best practices coming!!

          Oh, and Yogesh from Sophos tech support bent over backwards: called back several times to see how things were going, ran diagnostics, etc. Shout out to him.
           
          John s.
          • Hi

            I have the same issue: the dashboard shows the External (WAN) link as Error and Uplink Monitoring shows it as OFFLINE, but we are able to connect to the internet.

            Does anybody have a solution, or an update on why this is happening?

            My device is SG 330

            Regards,

            Jason

            • Hi

              I have the same problem with an SG210 running 9.405-5.

              Can anybody help us?

              Thanks in advance

              Benoit

              • I'm experiencing the same problem. Sophos UTM home, Intel NIC.

                Firmware 9.407-3, but it happened on previous versions also.

                eth1 | External (WAN) | Ethernet | Up Error

                No problems connecting to the internet, though.

                • Are you saying that you tried the fix I posted above and the problem remains?

                  Cheers - Bob

                   
                  • I also have the same problem with Sophos UTM home, Intel NIC. 

                    The link fluctuates between Error and Up all the time, which makes the Internet inaccessible for short moments.

                    I tried a lot of things like

                    - Setting 100Fdx at both ends

                    - Setting MTU to 1300

                    - Changing the power settings for the nic in BIOS

                    - Probably some more that I don't remember. 

                    - Put a switch between the UTM and the fiber box

                    - Moved the WAN connection to another NIC

                    but nothing seems to help

                    Had the same problem running sophos XG on the same hardware. 

                    • Hi Christer,

                      Could you please report it to the support team with all your findings and testing?

                      Thanks

                      Sachin Gurung
                      Team Lead | Sophos Technical Support

                      • I think there is confusion about what the Link Error in the dashboard actually indicates. I now have the impression that it can indicate an Ethernet link problem but also an uplink monitoring problem. This is rather confusing and was not clear to me.

                         

                        Regards

                         

                        Christer

                      • Hi Arie,

                        Go to Interfaces & Routing > Uplink Monitoring > Advanced, create a new monitoring host (for example Google DNS: create a host with IP address 8.8.8.8), then save and apply.

                        Note:

                        Uncheck Automatic monitoring so that the UTM uses the Google DNS host for uplink monitoring.
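
To make the idea of a monitoring host concrete, here is a rough sketch of the kind of check uplink monitoring performs. This is not the UTM's actual implementation: the rule "the uplink is Up if any monitoring host answers" and the function names `ping_once` and `uplink_status` are assumptions made for illustration, and the `ping` flags are those of Linux iputils.

```python
import subprocess
from typing import Callable, Iterable


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send a single ICMP echo using the system ping (Linux iputils flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def uplink_status(hosts: Iterable[str], probe: Callable[[str], bool] = ping_once) -> str:
    """Return "Up" if at least one monitoring host answers, otherwise "Error".

    Assumption for this sketch: one reachable host is enough to call the uplink healthy.
    """
    return "Up" if any(probe(host) for host in hosts) else "Error"
```

For example, `uplink_status(["8.8.8.8"])` probes the Google DNS address mentioned above. If the only monitoring host is one that ignores pings, a check like this reports Error even though the line itself works.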

                        Regards,

                         

                        Jason

                        • Hi and thanks for your answer. I now understand that I have had two different problems.

                          1. Some kind of negotiation problem between my UTM hardware and the fiber box. This caused the link to go down, and a restart or a manual disable/enable of the interface restored operation when it happened.
                            1. I solved this by simply creating a VLAN with two ports on my ProCurve switch and putting it in between. Not a beautiful solution, but it works.
                          2. Disturbances in the uplink monitoring, which were also indicated by the Link being in Error.
                            1. I have had uplink monitoring set up this way for a long time. I actually found out later yesterday that my ISP had disturbances that caused my problems.

                           

                          I did not realize that the Link being in Error on the Dashboard could indicate a high level problem from the uplink monitoring. I always thought it indicated a low level ethernet problem. This is very confusing and I think uplink monitoring problems could be indicated in some other way.
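
One way to tell the two cases apart from a shell, sketched under the assumption of a Linux system that exposes `/sys/class/net/<iface>/carrier` (standard Linux sysfs, but not verified on the UTM here). The function names `carrier_up` and `diagnose` are hypothetical, and `monitor_ok` stands for whatever result the uplink monitoring check gave.

```python
from pathlib import Path


def carrier_up(iface: str, sysfs_root: str = "/sys/class/net") -> bool:
    """Read the sysfs carrier flag: "1" means the NIC sees a physical link."""
    try:
        return (Path(sysfs_root) / iface / "carrier").read_text().strip() == "1"
    except OSError:
        # Missing interface, or interface administratively down (the read fails).
        return False


def diagnose(carrier: bool, monitor_ok: bool) -> str:
    """Separate a low-level Ethernet fault from a monitoring-level fault."""
    if not carrier:
        return "ethernet link problem (cable, port, speed/duplex negotiation)"
    if not monitor_ok:
        return "link is physically up; monitoring hosts unreachable"
    return "ok"
```

If `carrier_up("eth1")` is true while the dashboard still shows Error, the cause is more likely on the monitoring side than in the cabling or negotiation.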

                          Regards 

                           

                          Christer

                          • Apologies for the delay - been busy moving...

                             

                            Your solutions worked great! Just three items for my wish list:

                            1. Better documentation of this setting (or even a link from the dashboard).
                            2. Use of DNS groups.
                            3. Overview of which hosts are up/down.

                            Also, I noticed somewhere in the documentation that the UTM is able to make different types of requests to hosts (e.g. ICMP, HTTP). Any idea how/where that's configured?

                            • We are experiencing the same issue.

                              Two ISP lines, both working like a charm if I force the traffic to either interface.
                              On the Dashboard, Line A is shown as On/Up, Line B as On/Error.

                              BGP shows no errors and I can reach any host on the Internet over both lines from the shell as loginuser.
                              I turned automatic monitoring off in Uplink Monitoring and entered only Google DNS (8.8.8.8) as a test; the error still remains.

                              Is there a log file that records the reason for the Error state?
                              I could still manually force parts of the traffic to Line B, but we would like automatic fallback if one or the other Line fails.

                              Kind regards
                              Dietmar

                              • Dear all,

                                 

                                This issue is due to a wrong MTU configuration.

                                You can check your correct MTU at the site below:

                                http://www.letmecheck.it/mtu-test.php
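
As an alternative to a web-based test, the usable MTU can be estimated locally by sending pings with the Don't Fragment bit set and searching for the largest payload that gets through. The sketch below assumes Linux iputils `ping` (`-M do` sets Don't Fragment); `df_ping` and `find_path_mtu` are hypothetical names, and the search bounds are illustrative defaults.

```python
import subprocess
from typing import Callable

IP_ICMP_OVERHEAD = 28  # 20-byte IPv4 header + 8-byte ICMP header


def df_ping(host: str, payload: int) -> bool:
    """Send one ping with Don't Fragment set and the given payload size."""
    result = subprocess.run(
        ["ping", "-M", "do", "-c", "1", "-W", "2", "-s", str(payload), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_path_mtu(fits: Callable[[int], bool], low: int = 548, high: int = 1472) -> int:
    """Binary-search the largest payload that passes and return it as an MTU.

    Returns 0 if even the smallest payload fails (host unreachable or ICMP blocked).
    """
    if not fits(low):
        return 0
    while low < high:
        mid = (low + high + 1) // 2
        if fits(mid):
            low = mid
        else:
            high = mid - 1
    return low + IP_ICMP_OVERHEAD
```

A call such as `find_path_mtu(lambda size: df_ping("8.8.8.8", size))` would then give a value to compare against the MTU configured on the WAN interface. A single lost ping can skew the result, so it is worth repeating the measurement.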