Sophos XG Virtual Appliance Random Crashes

Hi

I have a virtual appliance with random crashes. It's running on an HP ProLiant G9 with the VMware HPE Custom Image 7.0 Update 2 (updated just a few weeks before U3 was released; I'm waiting for the next update window to install U3).

Does anyone have an idea what else I can check? It's the only VM behaving like that; Sophos UTM on nearly the same hardware had none of these issues.

CPU 16 CPUs x Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz

Adapter 2 - HP Ethernet 1Gb 4-port 331i Adapter

Adapter 3 - HP Ethernet 10Gb 2-port 560FLR-SFP+ Adapter

The network adapters are not passed through directly; the VM only uses vmxnet3 adapters.
I just got a capture of it failing, with serial output:

GNU GRUB  version 2.02

      SFLoader
      18_5_1_318
     *18_5_1_326

      Use the ^ and v keys to select which entry is highlighted.
      Press enter to boot the selected OS, `e' to edit the commands
      before booting or `c' for a command-line.

   The highlighted entry will be executed automatically in 5s.

  Booting `18_5_1_326'


[    4.001409] sd 0:0:0:0: [sda] Assuming drive cache: write through
[    4.006629] sd 0:0:1:0: [sdb] Assuming drive cache: write through
Loading configuration
Performing automated file system integrity checks. It will take some time before your system is available.
Examining Config partition.....
Examining Signature partition.....
Examining Report partition.....

### System Detail ###

Number of cores:                4
Total RAM:                      6144 MB
Total Number of interfaces:     3
Total Primary Disk:             4 GB
Total Auxiliary Disk:           80 GB

#####################

Password: [1663599.304140] BUG: unable to handle kernel NULL pointer dereference at 0000000000000042
[1663599.307942] IP: 0xffffffffc0507729
[1663599.309569] PGD 800000010b683067 P4D 800000010b683067 PUD 10b684067 PMD 0 
[1663599.312649] Oops: 0002 [#1] SMP PTI
[1663599.314301] Modules linked in: nfnetmap_queue(O) nf_conntrack_ipslb xt_svp xt_xfrmpolicy ah4 ppp_synctty ppp_async crc_ccitt pppoe pppox ppp_generic slhc vfp_firewall(O) xt_addrtype debug_cntrs(O) nf_nat_ftp nf_conntrack_ftp xt_CT ebtable_filter ebtable_nat ebtables ip6t_MASQUERADE xt_muser xt_conntrack xt_LBS ip6table_filter iptable_filter xt_DNAT xt_SNAT nf_nat_masquerade_ipv6 xt_nat_lookup xt_UST xt_ust xt_firewall nat_rules sfos_rules_framework firewall ip_set_hash_mlmwsticky ip_set_hash_sslvpn iptable_mangle ip_set_hash_mac ip_set_hash_bw nf_conntrack_dns nf_nat_sip nf_conntrack_sip nf_nat_irc nf_conntrack_irc nf_nat_tftp nf_conntrack_tftp nf_nat_h323 nf_conntrack_h323 nf_nat_pptp nf_conntrack_pptp usbhid hid_generic hid ohci_pci ohci_hcd xhci_pci xhci_hcd uhci_hcd ehci_pci ehci_hcd fw_handle_ngfw_notification
[1663599.345113]  fp2sp_api fp_notifier bonding cifs red red2 appdev nf_conntrack_netlink nf_nat_proto_gre nf_conntrack_proto_gre set_sessiontbl sessiontbl ip_gre gre ipcomp xfrm_ipcomp esp4 xfrm4_mode_transport xfrm4_mode_tunnel xfrm4_tunnel xfrm_user af_key xfrm_algo cls_u32 act_mirred sch_ingress ifb sch_hfsc sch_leafprio sch_headprio sch_sfq sch_htb xt_MULTISET xt_MLM xt_SRCNETMAP xt_MARKROUTE xt_CONTINUE xt_LOGDROP xt_ULOG xt_TCPMSS xt_REDIRECT nf_nat_redirect ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_OUT_OUTDEV ip6t_rpfilter ipt_rpfilter ebt_nflog ebt_pkttype xt_serviceset xt_appset xt_hostset xt_pkttype xt_recent xt_state xt_status xt_cet xt_OUTDEV xt_iprange xt_limit xt_hashlimit xt_tcpudp xt_multiport nf_conntrack_relate xt_IPMACFILTER xt_RANGENAT xt_VHDNAT ip_set_bitmap_vhost xt_FWSET xt_set
[1663599.375033]  ip_set_bitmap_hotspotuser ip_set_hash_hotspotmac ip_set_bitmap_tlsrule ip_set_bitmap_appset ip_set_bitmap_fwrule ip_set_bitmap_ctrxss ip_set_bitmap_user sp2fp_api ip_set_bitmap_userpolicy ip_set_hash_ipuser ip_set_bitmap_service ip_set_bitmap_host ip_set_hash_ipmaciface ip_set_hash_l2mac ip_set_hash_ipmac ip_set_hash_ip ip_set arptable_filter arp_tables pcnet32 e100 e1000_nm(O) e1000e_nm(O) igb_nm(O) i2c_algo_bit i2c_core hwmon ptp pps_core vmxnet3_nm(O) netmap(O) ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw iptable_nat iptable_raw nf_nat_ipv4 xt_dscp nf_nat ip6_tables ip_tables tun af_packet 8021q nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 ip6_tunnel tunnel6 sit ip_tunnel tunnel4 ppdev parport_pc parport nf_conntrack lineartable bitmap_api br_netfilter bridge nf_defrag_ipv4
[1663599.405597]  ipv6 stp llc x_tables nfnetlink button evdev [last unloaded: nfnetmap_queue]
[1663599.409293] CPU: 2 PID: 3680 Comm: snort Tainted: G           O    4.14.38 #2
[1663599.412513] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[1663599.417241] task: ffff8ec7f125e000 task.stack: ffff8ec7d6344000
[1663599.419981] RIP: 0010:0xffffffffc0507729
[1663599.421794] RSP: 0018:ffff8ec7d63478f8 EFLAGS: 00010286
[1663599.424140] RAX: 0000000000000001 RBX: ffff8ec727caea00 RCX: 0000000000000012
[1663599.427273] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000006
[1663599.430402] RBP: 0000000000000002 R08: ffffffffc04b70a0 R09: ffffffffc049e501
[1663599.433555] R10: ffff8ec7d63478f8 R11: ffff8ec7a34db902 R12: 0000000000000002
[1663599.436685] R13: ffff8ec7a34db900 R14: ffff8ec7a34db900 R15: 0000000000000001
[1663599.439818] FS:  00007f820c8d3000(0000) GS:ffff8ec83ff00000(0000) knlGS:0000000000000000
[1663599.443402] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1663599.445953] CR2: 0000000000000042 CR3: 0000000156342001 CR4: 00000000001606e0
[1663599.449128] Call Trace:
[1663599.450316]  nf_reinject+0xe5/0x150
[1663599.451994]  0xffffffffc1e2b501
[1663599.453503]  ? __getnstimeofday64+0x36/0xc0
[1663599.455408]  ? do_gettimeofday+0x10/0x50
[1663599.457201]  ? netmap_ioctl+0x23d/0x11f0 [netmap]
[1663599.459360]  ? 0xffffffffc1e2ba74
[1663599.460963]  0xffffffffc1e2ba74
[1663599.462456]  netmap_pipe_txsync+0xca/0x670 [netmap]
[1663599.464716]  netmap_ioctl+0x298/0x11f0 [netmap]
[1663599.466783]  ? netmap_poll+0x48e/0x580 [netmap]
[1663599.468853]  ? __kmalloc_track_caller+0x1e/0x100
[1663599.470936]  linux_netmap_change_mtu+0x506/0x610 [netmap]
[1663599.473370]  ? core_sys_select+0x15f/0x250
[1663599.475236]  ? core_sys_select+0x194/0x250
[1663599.477138]  ? SyS_sendto+0xae/0x130
[1663599.478789]  do_vfs_ioctl+0x88/0x5c0
[1663599.480452]  SyS_ioctl+0x36/0x70
[1663599.481957]  do_syscall_64+0x63/0x120
[1663599.483684]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[1663599.485980] RIP: 0033:0x7f820d9a4037
[1663599.487626] RSP: 002b:00007ffc7824be78 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[1663599.490932] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f820d9a4037
[1663599.494127] RDX: 0000000000000000 RSI: 0000000000006994 RDI: 0000000000000034
[1663599.497308] RBP: 00007f81b1d4d000 R08: 0000000000000001 R09: 0000000000000000
[1663599.500449] R10: 0000000000000000 R11: 0000000000000202 R12: 000000000b72eef0
[1663599.503751] R13: 00000000000001ff R14: 00000000000001fe R15: 0000000000000000
[1663599.507633] Code: 2e 48 8b 8b 88 00 00 00 49 8b b5 b0 00 00 00 48 85 c9 0f 84 a6 01 00 00 48 8b 49 10 48 85 c9 0f 84 99 01 00 00 8b 89 08 01 00 00 <66> 89 4e 42 83 f8 01 0f 85 19 fc ff ff e9 07 fc ff ff 48 0f ba 
[1663599.515820] RIP: 0xffffffffc0507729 RSP: ffff8ec7d63478f8
[1663599.518226] CR2: 0000000000000042
[1663599.519989] ---[ end trace 07a716d7507a91e3 ]---
[1663599.522074] Kernel panic - not syncing: Fatal exception
[1663599.524544] Kernel Offset: 0x3b000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[1663599.532836] Rebooting in 3 seconds..
[1663602.558252] ACPI MEMORY or I/O RESET_REG.
GNU GRUB  version 2.02

      SFLoader
      18_5_1_318
     *18_5_1_326

      Use the ^ and v keys to select which entry is highlighted.
      Press enter to boot the selected OS, `e' to edit the commands
      before booting or `c' for a command-line.

   The highlighted entry will be executed automatically in 5s.

  Booting `18_5_1_326'


[    4.020727] sd 0:0:0:0: [sda] Assuming drive cache: write through
[    4.024217] sd 0:0:1:0: [sdb] Assuming drive cache: write through
Loading configuration
Performing automated file system integrity checks. It will take some time before your system is available.
Examining Config partition.....
Examining Signature partition.....
Examining Report partition.....

### System Detail ###

Number of cores:                4
Total RAM:                      6144 MB
Total Number of interfaces:     3
Total Primary Disk:             4 GB
Total Auxiliary Disk:           80 GB

#####################

Password: 
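
For comparing several captures like the one above, the key facts (faulting address, process, kernel version, and the reliable call-trace frames) can be pulled out with a short script. This is only a rough sketch: `summarize_oops` and `SAMPLE` are names made up for illustration, the regular expressions are fitted to this capture rather than to every kernel log format, and frames prefixed with `?` are skipped because the kernel marks those as unreliable stack leftovers.

```python
import re

# A few lines copied from the serial capture above.
SAMPLE = """\
Password: [1663599.304140] BUG: unable to handle kernel NULL pointer dereference at 0000000000000042
[1663599.312649] Oops: 0002 [#1] SMP PTI
[1663599.409293] CPU: 2 PID: 3680 Comm: snort Tainted: G           O    4.14.38 #2
[1663599.449128] Call Trace:
[1663599.450316]  nf_reinject+0xe5/0x150
[1663599.451994]  0xffffffffc1e2b501
[1663599.453503]  ? __getnstimeofday64+0x36/0xc0
[1663599.462456]  netmap_pipe_txsync+0xca/0x670 [netmap]
[1663599.464716]  netmap_ioctl+0x298/0x11f0 [netmap]
[1663599.478789]  do_vfs_ioctl+0x88/0x5c0
[1663599.485980] RIP: 0033:0x7f820d9a4037
"""

FAULT_RE = re.compile(r"NULL pointer dereference at ([0-9a-f]+)")
TASK_RE = re.compile(r"PID: (\d+) Comm: (\S+)\s.*?(\d+\.\d+\.\d+) #")
# symbol+0xoff/0xlen, optionally preceded by "? " and followed by "[module]"
FRAME_RE = re.compile(
    r"\]\s+(\?\s+)?([A-Za-z_]\w*)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def summarize_oops(log):
    """Return the fault address, task info and reliable frames of one oops."""
    summary = {"fault_addr": None, "pid": None, "comm": None,
               "kernel": None, "trace": []}
    in_trace = False
    for line in log.splitlines():
        m = FAULT_RE.search(line)
        if m:
            summary["fault_addr"] = int(m.group(1), 16)
        m = TASK_RE.search(line)
        if m:
            summary["pid"] = int(m.group(1))
            summary["comm"] = m.group(2)
            summary["kernel"] = m.group(3)
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace and "RIP:" in line:
            in_trace = False  # the user-space register dump follows the trace
        if in_trace:
            m = FRAME_RE.search(line)
            if m and m.group(1) is None:  # skip "?" (unreliable) frames
                summary["trace"].append((m.group(2), m.group(3)))
    return summary


if __name__ == "__main__":
    info = summarize_oops(SAMPLE)
    print(hex(info["fault_addr"]), info["comm"], info["kernel"])
    for symbol, module in info["trace"]:
        print(" ", symbol, f"[{module}]" if module else "")
```

Run against the full capture instead of `SAMPLE`, it would list every named frame; raw addresses without a symbol (such as `0xffffffffc1e2b501`) are left out because there is no name to report.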



  • Hi,

    Have you locked the CPU and disk space to the XG?

    Ian

    XG115W - v20 GA - Home

    XG on VM 8 - v20 GA

    If a post solves your question please use the 'Verify Answer' button.

  • The disk is thick provisioned.

    The CPU has no reservations and isn't locked to specific cores. HT is also activated. CPU ready is lower than 10, CPU load is around 5-6%, and memory is 55 GB used of 100 GB provisioned.

    The system is not overprovisioned: CPU ready is well within specs all the time, and of the 16 cores, 16 virtual cores are assigned, which should be well within specs, I think.
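
CPU ready can be quoted either as a summation in milliseconds or as a percentage, so it helps to convert before comparing against a threshold. A minimal sketch, assuming the "lower than 10" figure is the summation value in milliseconds from the vSphere real-time chart with its 20-second sample interval (the post does not say which unit it is). `cpu_ready_percent` is a name made up for illustration; the conversion is the commonly documented one, summation / (interval in seconds * 1000) * 100.

```python
def cpu_ready_percent(summation_ms, interval_s=20.0):
    """Convert a CPU ready summation (ms) into a percentage of the interval.

    interval_s is the chart's sample interval in seconds; 20 s is assumed
    here for the real-time chart.
    """
    return summation_ms * 100.0 / (interval_s * 1000.0)


if __name__ == "__main__":
    # Hypothetical reading of 10 ms in a 20 s real-time sample.
    print(cpu_ready_percent(10))  # 0.05
```

Under that assumption, 10 ms of ready time in a 20-second sample is 0.05%, which would be negligible. If the figure was already a percentage, no conversion applies.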

  • Hi,

    If this is a home system then, in theory, you will only be using 4 CPUs. You still need to lock the CPUs in a virtual machine, though. HT is not recommended for the XG, as it provides minimal processing improvement under load because an HT thread shares resources with its parent CPU core.

    Ian

  • Yes, it's a home system, so I only added 4 cores and 6 GB RAM.

    With this setup it should not jump to HT. I have had the same issue for nearly a year, and back then only 10 cores were in use by all VMs, so there was no need for the hypervisor to utilize HT. But I can also disable HT and pin the CPU cores for this VM.

    This is just one of the home lab servers that I also use for the firewall, so I can remove all other workloads.

    From the trace it looks like something is happening within snort, so I'll keep it enabled and unchanged until the next crash.

    These happen randomly for me; I don't see any pattern at the moment.
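
One way to read the capture a little more precisely is to decode two pieces of it: the error code after `Oops:` and the marked bytes `<66> 89 4e 42` in the `Code:` line. The sketch below is only an illustration. `decode_oops_code` and `decode_store16_disp8` are made-up names, the instruction decoder handles just this one encoding (a 16-bit `mov` from a register to `[base + disp8]`), and the RSI value is copied from the register dump.

```python
# x86 page-fault error code bits: bit index -> (meaning if 0, meaning if 1).
PF_BITS = {
    0: ("page not present", "protection violation"),
    1: ("read", "write"),
    2: ("kernel mode", "user mode"),
}

REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_oops_code(code_hex):
    """Explain the low three bits of the value printed after 'Oops:'."""
    code = int(code_hex, 16)
    return [names[(code >> bit) & 1] for bit, names in sorted(PF_BITS.items())]


def decode_store16_disp8(insn):
    """Decode only `66 89 /r disp8`: a 16-bit store to [base + disp8]."""
    prefix, opcode, modrm, disp = insn
    if prefix != 0x66 or opcode != 0x89:
        raise ValueError("not a 16-bit 'mov r/m16, r16'")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only [base + disp8] without a SIB byte is handled")
    if disp >= 0x80:  # disp8 is a signed byte
        disp -= 0x100
    return REG16[reg], REG64[rm], disp


if __name__ == "__main__":
    print(decode_oops_code("0002"))
    src, base, disp = decode_store16_disp8(bytes.fromhex("66894e42"))
    regs = {"rsi": 0x0}  # RSI from the register dump in the capture
    target = regs[base] + disp
    print(f"mov %{src},{disp:#x}(%{base}) -> writes to {target:#x}")
```

Decoded this way, the fault is a kernel-mode write to a page that is not present, and the faulting store goes to RSI + 0x42 with RSI = 0, which matches the CR2 value of 0x42. `Comm: snort` says which process was running when it happened; the faulting instruction itself sits in kernel-side code reached through the netmap ioctl path in the trace.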

  • Hi,

    We start with the obvious things, then work up to the more difficult items. HT is only an issue if you assign a virtual CPU that is an HT thread, not a real CPU. Previous posters have advised that their VM uses more than the 4 CPUs.

    Look at your disk stability and reliability.

    Ian

  • Disk Controller is OK (Smart Array P440ar Controller)

    Cache Battery is OK

    Raid is OK

    All Disks in this Array are OK, this Array was also never expanded

    Storage is local: 4x matching HPE EG0600JEHMA disks

    No encryption on the storage

    Measured latency for storage requests is lower than 1 ms

    As I said, this is mainly a lab system with no big loads on it. I have also removed some unused test VMs and will disable HT the next time I can reboot this server (this weekend).

  • Hello Florian,

    Adding to what rfcat mentioned, what Firmware Version are you running?

    Anything under /var/cores?

    Are you able to disable firewall-acceleration to see if this improves the situation? (You would need to SSH in and be in the console (5>4).)

    console> system firewall-acceleration disable

    Regards,


     
    Emmanuel (EmmoSophos)
    Technical Team Lead, Global Community Support
    If a post solves your question use the 'Verify Answer' link.