This discussion has been locked.

UPS gets lost after a few days

Hi to all!

I am facing an annoying issue.

I am using an Eaton 5S1000 UPS connected via USB to my Sophos UTM.

It works great, alerts and all, but the UTM loses the connection after a few days. It happens at random, not at any regular interval.

Below is the normal appearance:

When the connection is lost, I only see the UPS icon with no charge percentage (the bar is completely empty).

The only fix is to unplug the UPS and plug it back in; then everything is back to normal.

Any ideas about what may be wrong?

Is there a shell command I can use to avoid physically unplugging and replugging the UPS (e.g. when I am away)?



  • Hi ChriZ,

    I'm curious if running, as root, the following brings it back.

    /etc/init.d/upsd -c reload

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • Hello again, Bob!

    It has happened again today.

    Running the above command returns an error, and restarting the service fails:

    utm:/root # /etc/init.d/upsd -c reload
    Usage: /etc/init.d/upsd {start|stop|status|try-restart|restart|force-reload|reload|probe|powerdown|try-powerdown}
    utm:/root # /etc/init.d/upsd -c force-reload
    Usage: /etc/init.d/upsd {start|stop|status|try-restart|restart|force-reload|reload|probe|powerdown|try-powerdown}
    utm:/root # /etc/init.d/upsd force-reload
    Reload service NUT UPS                                               failed
    utm:/root # /etc/init.d/upsd restart
    Shutting down NUT UPS monitor                                        done
    Shutting down NUT UPS server                                         done
    Shutting down NUT UPS drivers                                        done
    Starting NUT UPS drivers                                             failed
    utm:/root #
    

    Any other ideas, please? I am starting to think that something is wrong with the connection; perhaps I should change the USB cable. It seems upsd does not start because it sees no UPS connected. Can anyone confirm that this is the standard behavior?

    Thanks a lot again!

     
    Sophos XG Home Licence.

    Machine: Checkpoint 3100 appliance (Intel Atom C2558 CPU, 6GB Ram, 250GB sata SSD)

  • I'm just guessing, too, but I'd definitely change the cable after seeing that.

     
  • Well, I was using a 2 m cable from an old Powerware UPS. I found the cable that shipped with the Eaton UPS (a 1 m cable, BTW) and replaced the 2 m one with it. I'll see how it goes.

    Perhaps the problem was the cable length...

     

  • Well, I have an update on this...

    Seems that the problem remains, even after replacing the cable with the one that came with the Eaton UPS.

    It does not happen very often, maybe once a week, but it still happens...

     

  • Well, a few months have passed...

    The problem is still there...

    In the meantime I have acquired another UPS (the same model as this one, an Eaton 5S1000).

    This second UPS is connected to a Linux server running Ubuntu 14.04.

    I see the same thing happen on Ubuntu, too, a few times: the connection is lost.

    But on Ubuntu the connection is re-established after a few seconds.

    This is not the case with Sophos, however: the only way to re-establish the connection is to physically remove and reinsert the USB cable.

    Any ideas on how to reset the USB connection the way Ubuntu does? (I'm not even sure what it does, TBH.)
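
One possible way to approximate a physical re-plug from the shell is the kernel's per-device `authorized` attribute in sysfs. The sketch below is untested on a UTM and rests on assumptions: that the running kernel exposes this attribute, and that the port name is the `3-1` seen in the `usb 3-1:` kernel messages quoted later in this thread. The function name `usb_replug` and its arguments are made up for illustration. It may well not help if the UPS's own USB interface is what has hung, since the thread reports that even a reboot does not bring the UPS back.

```shell
#!/bin/sh
# Sketch: software "re-plug" of one USB device through sysfs.
# Assumptions: the kernel exposes the "authorized" attribute for the device,
# and the port name (e.g. 3-1, as in "usb 3-1:" kernel messages) is known.

usb_replug() {
    port=$1                              # e.g. 3-1
    root=${2:-/sys/bus/usb/devices}      # overridable for dry runs
    pause=${3:-5}                        # seconds to stay "unplugged"
    attr="$root/$port/authorized"

    if [ ! -w "$attr" ]; then
        echo "usb_replug: $attr is not writable" >&2
        return 1
    fi
    echo 0 > "$attr"    # logically disconnect the device
    sleep "$pause"
    echo 1 > "$attr"    # let the kernel enumerate it again
}
```

As root, `usb_replug 3-1` followed by `/etc/init.d/upsd restart` would be the experiment to try the next time the UPS disappears.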

     

    Later edit: Not sure if it is worth mentioning, but although I have enabled the notification for UPS connection/disconnection, I get no emails when this happens.

    The notification email itself does work: if the UTM is connected normally to the UPS and I unplug the USB cable, I do get an email.

    But when the miscommunication with the UPS happens "automatically", I get nothing.

     

     
  • "But when the miscommunication with the ups happens "automatically" I get nothing."

    So it sounds like the Linux system has some sort of keep-alive mechanism that should be added to the UTM for that brand of UPS.  Can you identify anything in the System messages, Fallback, Middleware and Self monitoring logs?

    Cheers - Bob

     
  • Hello, Bob!

    Unfortunately I am not sure what timeframe to use for the searches: I get no email on disconnect and I don't log in to the web GUI frequently, so I don't know when communication was lost.

    But now I have this post, and I know I physically disconnected and reconnected the USB cable on the 20th of April.

    Today is the 22nd; I just logged in and the UPS is connected.

    I will keep an eye on it (I will try to log in daily and check), so next time I will have a timeframe to search through.
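
Instead of logging in daily, a small check run from cron could record the moment the UPS stops answering. This is only a sketch under assumptions: that the NUT client `upsc` is available on the UTM's PATH, and that the UPS is named `ups` (the name that appears as `UPS [ups]` in the upsd log lines further down). The function name `ups_check`, the `UPSC` override and the log path are hypothetical.

```shell
#!/bin/sh
# Sketch: note the time at which the UPS stops answering NUT queries.
# Assumptions: "upsc" is on PATH and the UPS is called "ups";
# UPSC and the log file location are hypothetical knobs.

ups_check() {
    target=${1:-ups@localhost}
    logfile=${2:-/tmp/ups-watch.log}
    upsc_cmd=${UPSC:-upsc}

    # upsc exits non-zero when the driver is not connected or the data is stale.
    if status=$("$upsc_cmd" "$target" ups.status 2>/dev/null) && [ -n "$status" ]; then
        return 0                  # driver answered, nothing to record
    fi
    # No answer: append a timestamped line so the outage window is known.
    echo "$(date '+%Y-%m-%d %H:%M:%S') no data from $target" >> "$logfile"
    return 1
}
```

Called every few minutes, the first line written to the log file gives the start of the timeframe to search in the system and fallback logs.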

    Thanks!

     

  • OK, here I am again!

    It has been working perfectly with no disconnects recently.

    Saturday was a maintenance day, when I shut down all my machines, blew out the dust, etc.

    Well, yesterday I noticed that communication was lost again...

    I searched the logs over a timeframe from the 29th of April until the 2nd of May.

    Here we go:

     

     
    System Messages
    /var/log/system/2017/04/system-2017-04-29.log.gz:2017:04:29-15:27:21 utm upsmon[22821]: Signal 15: exiting
    /var/log/system/2017/04/system-2017-04-29.log.gz:2017:04:29-15:27:21 utm upsd[22817]: User upsmon@127.0.0.1 logged out from UPS [ups]
    /var/log/system/2017/04/system-2017-04-29.log.gz:2017:04:29-15:27:21 utm upsmon[22820]: upsmon parent: read
    /var/log/system/2017/04/system-2017-04-29.log.gz:2017:04:29-15:27:21 utm upsd[22817]: mainloop: Interrupted system call
    /var/log/system/2017/04/system-2017-04-29.log.gz:2017:04:29-15:27:21 utm upsd[22817]: Signal 15: exiting
    /var/log/system/2017/04/system-2017-04-29.log.gz:2017:04:29-21:38:00 utm upsmon[3766]: UPS running on battery
    <29>Apr 29 21:39:05 upsmon[3766]: UPS running on line
    /var/log/system/2017/04/system-2017-04-29.log.gz:2017:04:29-21:40:50 utm upsmon[3766]: UPS running on battery
    /var/log/system/2017/04/system-2017-04-29.log.gz:2017:04:29-21:41:25 utm upsmon[3766]: UPS running on line
    /var/log/system/2017/04/system-2017-04-29.log.gz:2017:04:29-21:44:20 utm upsmon[3766]: UPS running on battery
    /var/log/system/2017/04/system-2017-04-29.log.gz:2017:04:29-21:44:35 utm upsmon[3766]: UPS running on line
    /var/log/system/2017/04/system-2017-04-29.log.gz:2017:04:29-21:45:25 utm upsmon[3766]: UPS running on battery
    /var/log/system/2017/04/system-2017-04-29.log.gz:2017:04:29-21:46:00 utm upsmon[3766]: UPS running on line
    /var/log/system/2017/04/system-2017-04-30.log.gz:2017:04:30-20:18:40 utm upsmon[3766]: Signal 15: exiting
    /var/log/system/2017/04/system-2017-04-30.log.gz:2017:04:30-20:18:40 utm upsmon[3765]: upsmon parent: read
    /var/log/system/2017/04/system-2017-04-30.log.gz:2017:04:30-20:18:40 utm upsd[3762]: User upsmon@127.0.0.1 logged out from UPS [ups]
    <27>Apr 30 20:18:40 upsd[3762]: mainloop: Interrupted system call
    /var/log/system/2017/04/system-2017-04-30.log.gz:2017:04:30-20:18:40 utm upsd[3762]: Signal 15: exiting
     
    After I unplug the USB cable and plug it back in, this is what I get (additionally)
    /var/log/system.log:2017:05:03-21:42:39 utm upsd[5265]: listening on 127.0.0.1 port 3493
    /var/log/system.log:2017:05:03-21:42:39 utm upsd[5265]: Connected to UPS [ups]: usbhid-ups-ups
    /var/log/system.log:2017:05:03-21:42:39 utm upsd[5266]: Startup successful
    /var/log/system.log:2017:05:03-21:42:39 utm upsmon[5269]: Startup successful
    /var/log/system.log:2017:05:03-21:42:39 utm upsd[5266]: User upsmon@127.0.0.1 logged into UPS [ups]
     

    Fallback 

    /var/log/fallback/2017/04/fallback-2017-04-29.log.gz:2017:04:29-15:27:21 utm [daemon:info] usbhid-ups[22813]: Signal 15: exiting
    /var/log/fallback/2017/04/fallback-2017-04-30.log.gz:2017:04:30-20:18:39 utm [daemon:debug] usbhid-ups[3758]: libusb_get_interrupt: error submitting URB: No such device
    <31>Apr 30 20:18:39 usbhid-ups[3758]: libusb_get_report: error sending control message: No such device
    /var/log/fallback/2017/04/fallback-2017-04-30.log.gz:2017:04:30-20:18:40 utm [daemon:info] usbhid-ups[3758]: Signal 15: exiting
     
    After I unplug the USB cable and plug it back in, this is what I get (additionally)
    /var/log/fallback.log:2017:05:03-21:42:39 utm [daemon:info] usbhid-ups[5262]: Startup successful
     
    Middleware & Selfmonitoring logs return nothing... :(
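
The telling entries above are the libusb "No such device" errors from usbhid-ups in the fallback log. A helper along these lines could list when they occurred across several log files. It is a sketch that assumes the log format quoted above (timestamp as the first field) and GNU gzip, whose `-cdf` options pass uncompressed files through unchanged; the function name `usb_drop_times` is made up.

```shell
#!/bin/sh
# Sketch: list the timestamps at which usbhid-ups lost the USB device.
# Assumptions: fallback-log format as quoted in this post, GNU gzip available.

usb_drop_times() {
    # gzip -cdf decompresses .gz archives and copies plain files as-is
    for f in "$@"; do
        gzip -cdf "$f"
    done |
    grep 'usbhid-ups\[[0-9]*]: libusb_.*No such device' |
    awk '{ print $1 }' |      # first field is the UTM timestamp
    sort -u
}
```

For example: `usb_drop_times /var/log/fallback/2017/04/fallback-2017-04-*.log.gz /var/log/fallback.log`.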
     
    If it helps, this is what I get in the live log (kernel messages) after plugging the cable in again:
     
    2017:05:03-21:42:26 utm kernel: [360117.135508] usb 3-1: USB disconnect, device number 4
    2017:05:03-21:42:30 utm kernel: [360120.655024] usb 3-1: new low-speed USB device number 5 using xhci_hcd
    2017:05:03-21:42:31 utm kernel: [360121.269399] usb 3-1: New USB device found, idVendor=0463, idProduct=ffff
    2017:05:03-21:42:31 utm kernel: [360121.269402] usb 3-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
    2017:05:03-21:42:31 utm kernel: [360121.269403] usb 3-1: Product: 5S
    2017:05:03-21:42:31 utm kernel: [360121.269405] usb 3-1: Manufacturer: EATON
    2017:05:03-21:42:31 utm kernel: [360121.269445] usb 3-1: ep 0x81 - rounding interval to 128 microframes, ep desc says 160 microframes
    2017:05:03-21:42:32 utm kernel: [360123.134718] hid-generic 0003:0463:FFFF.0003: hiddev0,hidraw0: USB HID v1.10 Device [EATON 5S] on usb-0000:00:14.0-1/input0
    2017:05:03-21:42:36 utm kernel: [360126.416096] usb 3-1: ep 0x81 - rounding interval to 128 microframes, ep desc says 160 microframes
    2017:05:03-21:46:33 utm kernel: [360363.953185] net_ratelimit: 7 callbacks suppressed
     
    Any ideas welcome!
     
     
    EDIT: Actually, now that I read my own post and saw this:
    new low-speed USB device number 5 using xhci_hcd
     
    I thought it might be worth checking the BIOS for a setting to disable xHCI (not sure it has one, but it might be worth trying).
     
    EDIT2: No such setting. I did see, though, that I had connected the USB cable to a USB 3 port.
    Connecting it to a USB 2 port yielded the same xhci_hcd message in the log, but let's see how it goes...
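
To check which host-controller driver each port ends up on without reading the log by eye, the enumeration lines can be filtered as below. This is a sketch that assumes the kernel message wording shown above; the function name `usb_hcd_from_log` is hypothetical. Seeing xhci_hcd on a "USB 2" socket is not necessarily a fault: on some boards every external port is wired to the xHCI controller, and a low-speed device such as this UPS uses the USB 2 signalling pair either way.

```shell
#!/bin/sh
# Sketch: print "<port> <host-controller driver>" for each USB enumeration
# message read from standard input (kernel log wording as quoted above).

usb_hcd_from_log() {
    sed -n 's/.*usb \([0-9.-]*\): new [a-z-]* USB device number [0-9]* using \([a-z0-9_-]*\).*/\1 \2/p'
}
```

Usage: `dmesg | usb_hcd_from_log`.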
     
     
     

  • Hello again..

    As an update, the issue still occurs.

    And the other UPS, connected to an Ubuntu machine, still stays connected. (It may get a "connection lost" error once in a while, but something kicks in when that happens and re-establishes the connection within moments.)

    At this point I think it is worth mentioning that, since I last mentioned it, the Ubuntu machine has become a guest on an ESXi host, with the UPS passed through from the host. It still manages to maintain or re-establish the connection to the UPS.

    So there is definitely something in Ubuntu that mitigates this issue and that the UTM lacks...

    Any idea from the gurus as to what this might be and whether it can be added to the UTM?

     

  • Hi guys,

     

    Most likely your problem is related to this issue: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=810449

    If you have root access to the OS, try this PPA: https://launchpad.net/~clepple/+archive/ubuntu/nut

    The instability could come from the virtual VMware USB hub, from the host's USB 2.0 (or 3.0) port, or even from an additional hardware USB hub.

    Again, if you have root access to the system, you can use the syslog messages to trigger automated recovery as a dirty bandage :)

    For example, with a monitoring process respawned by SysV init via inittab. Let's name the executable (mode 755) file nut-usb-shake and put it in the /etc/nut folder.

    Two lines in the /etc/inittab file (the id, the first field, is limited to four characters):

    # Monitor and revive the NUT managed UPS devices
    nush:12345:respawn:/etc/nut/nut-usb-shake

    The /etc/nut/nut-usb-shake file:

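    #!/bin/sh
    # Watch syslog and restart the NUT services whenever upsmon
    # reports that the driver has lost the UPS.

    tail -Fn0 /var/log/syslog | \
    while IFS= read -r line ; do

        # upsmon logs this message when polling the UPS fails
        if echo "$line" | grep -q "upsmon\[.*Poll UPS.*failed - Driver not connected"
        then
            service nut-client stop
            service nut-server stop
            sleep 10
            service nut-server start
            service nut-client start
        fi

    done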

    Finally, reload inittab with "init q"; from then on, every time the monitored message appears in syslog, the remediation commands are executed.

    You can adjust what is monitored and what is executed to suit your needs; the above is just an example.

    Good luck!

  • Hello and thanks for your extensive answer!

    However, I think you misunderstood my problem: it does not occur on my Ubuntu server machine (which is indeed a VMware guest).

    A while back that Ubuntu guest was a physical machine.

    Whether physical or virtual, when its connection to the UPS is lost (this may happen once a week), it is re-established immediately.

    My problem is that the same UPS model is connected to my Sophos machine, and there a lost connection is never re-established.

    The only way to see the UPS again is to remove the USB cable, wait a few seconds, and reinsert it; even rebooting the UTM does not help.

    So something must be done on the Sophos machine (or better yet, in the Sophos firmware) to make it re-establish the connection automatically.

     

  • I probably didn't explain the idea well enough :)

    The Sophos UTM runs on top of Linux, and that Linux uses the libusb library, just like the Ubuntu box. That is one of the contributors to the overall instability of the USB connection.

    You can verify that relatively easily: when the connection to the UTM is broken, check from the CLI with "lsusb" whether the connected UPS still appears in the list.
    Example from a Linux box (I don't have a Sophos):

    root@debian:/# lsusb
    Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
    Bus 001 Device 004: ID 051d:0002 American Power Conversion Uninterruptible Power Supply
    Bus 001 Device 003: ID 0e0f:0002 VMware, Inc. Virtual USB Hub
    Bus 001 Device 002: ID 09ae:4004 Tripp Lite
    Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
    root@debian:/#

    If the Sophos UTM does not allow you to add a PPA, update, and adjust (hack) the underlying Linux OS, you will most likely not be able to fix the issue yourself and will have to ask product support for assistance.

    Another workaround I can think of is to use the Ubuntu box for the USB connection of both UPS devices. The NUT package (used in the Sophos appliance) covers this scenario: one device provides the connectivity to the UPS (the nut-server component) and another device monitors the UPS state over a network connection to the first one (the nut-client component).

    You can check whether Sophos supports the nut-client monitoring mode, and if your Ubuntu box is on the same LAN, you can test the stability that way...

  • Ok , I see what you mean..

    Just for the record, this is what I see with lsusb on UTM:

    utm:/root # lsusb
    Bus 001 Device 002: ID 8087:8008 Intel Corp.
    Bus 002 Device 002: ID 8087:8000 Intel Corp.
    Bus 003 Device 018: ID 0463:ffff MGE UPS Systems UPS
    Bus 003 Device 006: ID 12d1:1003 Huawei Technologies Co., Ltd. E220 HSDPA Modem / E230/E270/E870 HSDPA/HSUPA Modem
    Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
    Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
    Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
    Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub

    (Right now the UPS works normally and lsusb sees it. I will try again when it is lost.)

    Even if I fix the issue myself, the fix will most certainly be overwritten by the next UTM update, so there is little point in doing it.

    The Sophos team must update the code to mitigate the issue.

    Nevertheless, I will look into the above and find out for myself whether your suggestion indeed solves the problem.

    If it does, then at least I will know what needs to be done and will be able to file this as a bug report (and hope that it gets fixed in a later UTM version).

    Thanks a lot for your suggestion!

     

  • Hello again, guys!

    The UPS is missing again from the webui... 

     


    Running lsusb returns this:


    utm:/root # lsusb
    Bus 001 Device 002: ID 8087:8008 Intel Corp.
    Bus 002 Device 002: ID 8087:8000 Intel Corp.
    Bus 003 Device 021: ID 0463:ffff MGE UPS Systems UPS
    Bus 003 Device 006: ID 12d1:1003 Huawei Technologies Co., Ltd. E220 HSDPA Modem / E230/E270/E870 HSDPA/HSUPA Modem
    Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
    Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
    Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
    Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub

    So it looks like the UPS is still there, connected, but the UTM does not see anything.

    Now, regarding your previous post, where you proposed a monitoring workaround with a script to restart the UPS daemon:

    There is no nut-server/nut-client service on Sophos. The way I can do it is what Bob suggested in a previous post:

    /etc/init.d/upsd <action>. The available actions are:

    Usage: /etc/init.d/upsd {start|stop|status|try-restart|restart|force-reload|reload|probe|powerdown|try-powerdown}

    So this is what I get:

    utm:/root # /etc/init.d/upsd status
    Checking for service NUT UPS server unused
    Checking for service NUT UPS monitor unused

    utm:/root # /etc/init.d/upsd force-reload
    Reload service NUT UPS failed

    utm:/root # /etc/init.d/upsd restart
    Shutting down NUT UPS monitor done
    Shutting down NUT UPS server done
    Shutting down NUT UPS drivers done
    Starting NUT UPS drivers failed

    So, as you can see, the problem is somewhere else...

    The UPS still shows as connected in lsusb, but the NUT server has somehow crashed (the drivers fail to start).

    I will now go and pull the USB cable for 5 seconds and plug it back in. I will write a follow-up post with the results of the above commands after doing this.
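
Since the UTM has no `service nut-server` commands, a watchdog there would have to drive `/etc/init.d/upsd` and judge the result from its output. The sketch below does only that. It assumes the output wording shown above (a step line ending in "failed") and makes no assumption about the init script's exit code; the names `upsd_restart_failed`, `ups_recover` and the `UPSD_INIT` override are hypothetical.

```shell
#!/bin/sh
# Sketch: restart the NUT stack through the UTM init script and report
# whether any step printed "failed" (wording taken from the output above).

upsd_restart_failed() {
    # Reads init-script output on stdin; returns 0 if a line ends in "failed".
    grep -q 'failed[[:space:]]*$'
}

ups_recover() {
    init=${UPSD_INIT:-/etc/init.d/upsd}
    # Capture the whole output first so the restart is never cut short.
    out=$("$init" restart 2>&1)
    if printf '%s\n' "$out" | upsd_restart_failed; then
        echo "upsd restart reported a failure" >&2
        return 1
    fi
    return 0
}
```

A non-zero return from `ups_recover` is the state described in this post, where so far only re-plugging the cable has helped.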

     

  • OK, follow-up:

    lsusb returns the same as before; the only difference is that the line for the UPS previously read:

    Bus 003 Device 021: ID 0463:ffff MGE UPS Systems UPS

    And now it reads:

    Bus 003 Device 022: ID 0463:ffff MGE UPS Systems UPS

    (I don't know what the difference means.)

    Now everything works again and the UPS is back in the GUI.

    And upsd restart works as it should:

    utm:/root # /etc/init.d/upsd restart
    Shutting down NUT UPS monitor done
    Shutting down NUT UPS server done
    Shutting down NUT UPS drivers done
    Starting NUT UPS drivers done
    Starting NUT UPS server done
    Starting NUT UPS monitor done
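
On the 021 versus 022 question: the kernel assigns a new device number each time a device is enumerated on a bus, so the number goes up with every (re)connection. The earlier lsusb output in this thread showed Device 018 while the UPS was working and 021 once it had disappeared from the GUI; if the cable was not touched in between, that would suggest the UPS dropped off and rejoined the bus by itself a few times. Logging the number periodically would confirm or rule this out. A sketch, with a made-up function name:

```shell
#!/bin/sh
# Sketch: print "bus/device" for a given vendor:product ID,
# reading lsusb output from standard input.

usb_devnum() {
    # $1 = vendor:product ID, e.g. 0463:ffff
    awk -v id="$1" '$5 == "ID" && $6 == id { sub(/:$/, "", $4); print $2 "/" $4 }'
}
```

Usage: `lsusb | usb_devnum 0463:ffff`.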

     

    Any more ideas welcome... :)

     