Issue description
During production load we've observed repeated crashes of VPP in ip4_lookup_node (in this case, ip4_lookup_node_fn_hsw). Based on the logs, the crash happened when bird started to add and replace several thousand routes in the Linux FIB.
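To approximate the route churn outside of production, something like the following sketch might help. It is not taken from the affected system: the prefix range, the next hop 192.0.2.1, and the route count are placeholders, and it is an assumption that replaying routes through iproute2 triggers the same code path as bird does.

```python
import ipaddress
from itertools import islice


def route_batch(count, supernet="10.0.0.0/8", new_prefix=24, nexthop="192.0.2.1"):
    """Return `count` lines in `ip -batch` syntax, one `route replace` per prefix.

    supernet and nexthop are hypothetical placeholders; adjust them to
    something that is reachable in the target network namespace.
    """
    net = ipaddress.ip_network(supernet)
    subnets = islice(net.subnets(new_prefix=new_prefix), count)
    return [f"route replace {subnet} via {nexthop}" for subnet in subnets]


if __name__ == "__main__":
    # Print the batch so it can be piped into iproute2.
    print("\n".join(route_batch(5000)))
```

The output can be piped into `ip -n dataplane -batch -` (dataplane being the netns from the linux-cp config below). Running it repeatedly with different next hops would exercise the replace path.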
Environment
The system is used as a linux-cp router with bird2 as the routing daemon.
System:
CPU: AMD EPYC 7303P 16-Core Processor
Memory: 2x32GB DDR4 ECC RDIMM 2400 MT/s HMA84GR7MFR4N-UH
Distribution: Ubuntu 24.04.03 LTS
Kernel: 6.8.0-87-generic
NIC types:
pci@0000:01:00.0 Ethernet Controller XL710 for 40GbE QSFP+
pci@0000:01:00.1 Ethernet Controller XL710 for 40GbE QSFP+
pci@0000:81:00.0 MT28800 Family [ConnectX-5 Ex]
pci@0000:81:00.1 MT28800 Family [ConnectX-5 Ex]
pci@0000:82:00.0 Ethernet Controller X710 for 10GbE SFP+
pci@0000:82:00.1 Ethernet Controller X710 for 10GbE SFP+
pci@0000:82:00.2 Ethernet Controller X710 for 10GbE SFP+
pci@0000:82:00.3 Ethernet Controller X710 for 10GbE SFP+
VPP:
Version: v25.10-release
Compiled by: root
Compile host: a55774381098
Compile date: 2025-10-29T10:59:07
Compile location: /w/workspace/vpp-merge-2510-ubuntu2404-x86_64
Compiler: Clang/LLVM 18.1.3 (1ubuntu1)
Current PID: 294929
Command line arguments:
/usr/bin/vpp
unix {
  log /var/log/vpp/vpp.log
  nodaemon
  full-coredump
  cli-listen /run/vpp/cli.sock
  gid vpp
  exec /etc/vpp/rendered_config.vpp
}
api-trace {
  on
}
api-segment {
  gid vpp
}
socksvr {
  default
}
memory {
  main-heap-size 1536M
  main-heap-page-size default-hugepage
}
cpu {
  main-core 0
  workers 3
}
buffers {
  buffers-per-numa 128000
  default data-size 2048
  page-size default-hugepage
}
dpdk {
  dev 0000:01:00.0 { }
  dev 0000:01:00.1 { }
  dev 0000:81:00.0 { }
  dev 0000:81:00.1 { }
  dev 0000:82:00.0 { }
  dev 0000:82:00.1 { }
  dev 0000:82:00.2 { }
  dev 0000:82:00.3 { }
}
statseg {
  size 1G
  page-size default-hugepage
  per-node-counters off
}
plugins {
  plugin linux_cp_plugin.so { enable }
  plugin linux_nl_plugin.so { enable }
}
logging {
  default-log-level info
  default-syslog-log-level notice
}
linux-cp {
  default netns dataplane
  lcp-sync
}
vpp-dump.log
Coredump
The coredump is, unfortunately, too big for GitHub, even when compressed. The file has been uploaded to an S3 bucket of mine: https://nbg1.your-objectstorage.com/wp-login.php/293066.dump.gz
If you have a better idea where to upload coredumps, please let me know.
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]: received signal SIGSEGV, PC 0x73352e67a4b3, faulting address 0x75d20c0b8978
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: received signal SIGSEGV, PC 0x73352e67a4b3, faulting address 0x75d20c0b8978
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]: Code: 46 8b 24 80 41 f6 c4 01 75 21 41 d1 ec 49 69 c4 40 05 00 00
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: Code: 46 8b 24 80 41 f6 c4 01 75 21 41 d1 ec 49 69 c4 40 05 00 00
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]: #0 0x000073352e67a4b3 ip4_lookup_node_fn_hsw + 0x1443
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]: from /lib/x86_64-linux-gnu/libvnet.so.25.10
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: #0 0x000073352e67a4b3 ip4_lookup_node_fn_hsw + 0x1443
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: from /lib/x86_64-linux-gnu/libvnet.so.25.10
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]: #1 0x000073352d63969f vlib_exit_with_status + 0x80f
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]: from /lib/x86_64-linux-gnu/libvlib.so.25.10
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: #1 0x000073352d63969f vlib_exit_with_status + 0x80f
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: from /lib/x86_64-linux-gnu/libvlib.so.25.10
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]: #2 0x000073352d63c65e vlib_exit_with_status + 0x37ce
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]: from /lib/x86_64-linux-gnu/libvlib.so.25.10
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: #2 0x000073352d63c65e vlib_exit_with_status + 0x37ce
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: from /lib/x86_64-linux-gnu/libvlib.so.25.10
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]: #3 0x000073352d67f88e vlib_worker_thread_bootstrap_fn + 0x4e
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]: from /lib/x86_64-linux-gnu/libvlib.so.25.10
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: #3 0x000073352d67f88e vlib_worker_thread_bootstrap_fn + 0x4e
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: from /lib/x86_64-linux-gnu/libvlib.so.25.10
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]: #4 0x000073352d29caa4 pthread_condattr_setpshared + 0x684
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]: from /lib/x86_64-linux-gnu/libc.so.6
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: #4 0x000073352d29caa4 pthread_condattr_setpshared + 0x684
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: from /lib/x86_64-linux-gnu/libc.so.6
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]: #5 0x000073352d329c6c __clone + 0x24c
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]: from /lib/x86_64-linux-gnu/libc.so.6
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: #5 0x000073352d329c6c __clone + 0x24c
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: from /lib/x86_64-linux-gnu/libc.so.6
Message: Process 293066 (vpp_main) of user 0 dumped core.
Module libsystemd.so.0 from deb systemd-255.4-1ubuntu8.12.amd64
Module libzstd.so.1 from deb libzstd-1.5.5+dfsg2-2build1.1.amd64
Stack trace of thread 293069:
#0 0x000073352d29eb2c pthread_kill (libc.so.6 + 0x9eb2c)
#1 0x000073352d24527e raise (libc.so.6 + 0x4527e)
#2 0x000073352d2288ff abort (libc.so.6 + 0x288ff)
#3 0x00005d6b0a53aa75 os_exit (vpp + 0x6a75)
#4 0x000073352d6a04f8 n/a (libvlib.so.25.10 + 0xa04f8)
#5 0x000073352d245330 n/a (libc.so.6 + 0x45330)
#6 0x000073352e67a4b3 ip4_lookup_node_fn_hsw (libvnet.so.25.10 + 0xc7a4b3)
#7 0x000073352d63969f n/a (libvlib.so.25.10 + 0x3969f)
#8 0x000073352d63c65e n/a (libvlib.so.25.10 + 0x3c65e)
#9 0x000073352d67f88e vlib_worker_thread_bootstrap_fn (libvlib.so.25.10 + 0x7f88e)
#10 0x000073352d29caa4 n/a (libc.so.6 + 0x9caa4)
#11 0x000073352d329c6c n/a (libc.so.6 + 0x129c6c)
Stack trace of thread 293070:
#0 0x000073352d63af18 n/a (libvlib.so.25.10 + 0x3af18)
#1 0x000073352d67f88e vlib_worker_thread_bootstrap_fn (libvlib.so.25.10 + 0x7f88e)
#2 0x000073352d29caa4 n/a (libc.so.6 + 0x9caa4)
#3 0x000073352d329c6c n/a (libc.so.6 + 0x129c6c)
Stack trace of thread 293071:
#0 0x00007334cbe34af4 dpdk_input_node_fn_hsw (dpdk_plugin.so + 0xe34af4)
#1 0x000073352d63afda n/a (libvlib.so.25.10 + 0x3afda)
#2 0x000073352d67f88e vlib_worker_thread_bootstrap_fn (libvlib.so.25.10 + 0x7f88e)
#3 0x000073352d29caa4 n/a (libc.so.6 + 0x9caa4)
#4 0x000073352d329c6c n/a (libc.so.6 + 0x129c6c)
Stack trace of thread 293066:
#0 0x000073352d32be6b recvmsg (libc.so.6 + 0x12be6b)
#1 0x00007334ccd603bf nl_recv (libnl-3.so.200 + 0x113bf)
#2 0x00007334ccd60cf5 nl_recvmsgs_report (libnl-3.so.200 + 0x11cf5)
#3 0x00007334ccd6131d nl_recvmsgs (libnl-3.so.200 + 0x1231d)
#4 0x00007334ca892b75 n/a (linux_nl_plugin.so + 0x8b75)
#5 0x00007334ca89addc n/a (linux_nl_plugin.so + 0x10ddc)
#6 0x000073352d62a0ee vlib_file_poll (libvlib.so.25.10 + 0x2a0ee)
#7 0x000073352d636354 vlib_main (libvlib.so.25.10 + 0x36354)
#8 0x000073352d69fc1d n/a (libvlib.so.25.10 + 0x9fc1d)
#9 0x000073352f729fa0 clib_calljmp (libvppinfra.so.25.10 + 0x7ffa0)
ELF object binary architecture: AMD x86-64
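For anyone who wants to symbolize the n/a frames, the systemd-coredump trace above gives both the absolute PC and the module-relative offset, so the load base of each library is simply PC minus offset. A small sketch of that arithmetic follows; the regular expression and the function name are my own, only the sample frames come from the trace.

```python
import re

# Matches frames such as:
#   #6 0x000073352e67a4b3 ip4_lookup_node_fn_hsw (libvnet.so.25.10 + 0xc7a4b3)
FRAME_RE = re.compile(
    r"#(?P<idx>\d+)\s+0x(?P<pc>[0-9a-f]+)\s+(?P<sym>\S+)\s+"
    r"\((?P<mod>\S+) \+ 0x(?P<off>[0-9a-f]+)\)"
)


def module_bases(trace: str) -> dict[str, int]:
    """Map each module name to its load base (absolute PC minus module offset)."""
    bases: dict[str, int] = {}
    for m in FRAME_RE.finditer(trace):
        base = int(m["pc"], 16) - int(m["off"], 16)
        bases.setdefault(m["mod"], base)  # keep the first base seen per module
    return bases


# Three frames copied from the crashing thread (293069) above.
SAMPLE = """\
#0 0x000073352d29eb2c pthread_kill (libc.so.6 + 0x9eb2c)
#6 0x000073352e67a4b3 ip4_lookup_node_fn_hsw (libvnet.so.25.10 + 0xc7a4b3)
#7 0x000073352d63969f n/a (libvlib.so.25.10 + 0x3969f)
"""

if __name__ == "__main__":
    for mod, base in module_bases(SAMPLE).items():
        print(f"{mod}: base 0x{base:x}")
```

With matching debug symbols installed, the module-relative offsets (for example 0xc7a4b3 in libvnet.so.25.10) can typically be handed to a tool such as addr2line to get the source line of the faulting instruction.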