stable/202510 - crash on ip4_lookup_node #3692

@sinuscosinustan

Issue description

Under production load, we have observed repeated VPP crashes in ip4_lookup_node (in this case, the ip4_lookup_node_fn_hsw variant). According to the logs, the crash occurred when bird started adding and replacing several thousand routes in the Linux FIB.
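To make the trigger easier to approximate without bird, here is a rough sketch that generates input for `ip -batch`, installing a few thousand prefixes and then replacing each one with a different next hop. It has not been confirmed to reproduce the crash. The prefix range, the next hops, the route count, and the function name `churn_batch` are all placeholders chosen for illustration, and bird programs routes through netlink directly, so this only imitates the add/replace pattern.

```python
import ipaddress
from itertools import islice


def churn_batch(base="198.18.0.0/15", new_prefix=28, count=4000,
                nexthops=("192.0.2.1", "192.0.2.2")):
    """Build `ip -batch` lines: one `route replace` per prefix and next hop.

    All defaults are hypothetical; substitute prefixes and next hops that are
    actually reachable in the namespace under test.
    """
    net = ipaddress.ip_network(base)
    # Take the first `count` subnets of the requested length from the base range.
    prefixes = list(islice(net.subnets(new_prefix=new_prefix), count))
    lines = []
    # The first pass installs every prefix; each later pass replaces it
    # with the next next hop, which produces the add-then-replace churn.
    for nexthop in nexthops:
        for prefix in prefixes:
            lines.append(f"route replace {prefix} via {nexthop}")
    return lines


if __name__ == "__main__":
    print("\n".join(churn_batch()))
```

Assuming the output is saved as `routes.batch` (a hypothetical file name), it could be fed to something like `ip -n dataplane -batch routes.batch`, where `dataplane` is the linux-cp network namespace from the configuration in this report.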

Environment

The system is used as a linux-cp router with bird2 as the routing daemon.

System:
  CPU: AMD EPYC 7303P 16-Core Processor
  Memory: 2x32GB DDR4 ECC RDIMM 2400 MT/s HMA84GR7MFR4N-UH
  Distribution: Ubuntu 24.04.03 LTS
  Kernel: 6.8.0-87-generic
  NIC types:
    pci@0000:01:00.0 Ethernet Controller XL710 for 40GbE QSFP+
    pci@0000:01:00.1 Ethernet Controller XL710 for 40GbE QSFP+
    pci@0000:81:00.0 MT28800 Family [ConnectX-5 Ex]
    pci@0000:81:00.1 MT28800 Family [ConnectX-5 Ex]
    pci@0000:82:00.0 Ethernet Controller X710 for 10GbE SFP+
    pci@0000:82:00.1 Ethernet Controller X710 for 10GbE SFP+
    pci@0000:82:00.2 Ethernet Controller X710 for 10GbE SFP+
    pci@0000:82:00.3 Ethernet Controller X710 for 10GbE SFP+

VPP:
Version:                  v25.10-release
Compiled by:              root
Compile host:             a55774381098
Compile date:             2025-10-29T10:59:07
Compile location:         /w/workspace/vpp-merge-2510-ubuntu2404-x86_64
Compiler:                 Clang/LLVM 18.1.3 (1ubuntu1)
Current PID:              294929
Command line arguments:
  /usr/bin/vpp
  unix
    {
    log
    /var/log/vpp/vpp.log
    nodaemon
    full-coredump
    cli-listen
    /run/vpp/cli.sock
    gid
    vpp
    exec
    /etc/vpp/rendered_config.vpp
    }
  api-trace
    {
    on
    }
  api-segment
    {
    gid
    vpp
    }
  socksvr
    {
    default
    }
  memory
    {
    main-heap-size
    1536M
    main-heap-page-size
    default-hugepage
    }
  cpu
    {
    main-core
    0
    workers
    3
    }
  buffers
    {
    buffers-per-numa
    128000
    default
    data-size
    2048
    page-size
    default-hugepage
    }
  dpdk
    {
    dev
    0000:01:00.0
      {
      }
    dev
    0000:01:00.1
      {
      }
    dev
    0000:81:00.0
      {
      }
    dev
    0000:81:00.1
      {
      }
    dev
    0000:82:00.0
      {
      }
    dev
    0000:82:00.1
      {
      }
    dev
    0000:82:00.2
      {
      }
    dev
    0000:82:00.3
      {
      }
    }
  statseg
    {
    size
    1G
    page-size
    default-hugepage
    per-node-counters
    off
    }
  plugins
    {
    plugin
    linux_cp_plugin.so
      {
      enable
      }
    plugin
    linux_nl_plugin.so
      {
      enable
      }
    }
  logging
    {
    default-log-level
    info
    default-syslog-log-level
    notice
    }
  linux-cp
    {
    default
    netns
    dataplane
    lcp-sync
    }

vpp-dump.log

Coredump

Unfortunately, the coredump is too big for GitHub, even when compressed. I have uploaded the file to an S3 bucket of mine: https://nbg1.your-objectstorage.com/wp-login.php/293066.dump.gz

If you know of a better place to upload coredumps, please let me know.

Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]: received signal SIGSEGV, PC 0x73352e67a4b3, faulting address 0x75d20c0b8978
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: received signal SIGSEGV, PC 0x73352e67a4b3, faulting address 0x75d20c0b8978
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]: Code:  46 8b 24 80 41 f6 c4 01 75 21 41 d1 ec 49 69 c4 40 05 00 00
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: Code:  46 8b 24 80 41 f6 c4 01 75 21 41 d1 ec 49 69 c4 40 05 00 00
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]: #0  0x000073352e67a4b3 ip4_lookup_node_fn_hsw + 0x1443
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]:      from /lib/x86_64-linux-gnu/libvnet.so.25.10
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: #0  0x000073352e67a4b3 ip4_lookup_node_fn_hsw + 0x1443
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]:      from /lib/x86_64-linux-gnu/libvnet.so.25.10
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]: #1  0x000073352d63969f vlib_exit_with_status + 0x80f
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]:      from /lib/x86_64-linux-gnu/libvlib.so.25.10
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: #1  0x000073352d63969f vlib_exit_with_status + 0x80f
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]:      from /lib/x86_64-linux-gnu/libvlib.so.25.10
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]: #2  0x000073352d63c65e vlib_exit_with_status + 0x37ce
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]:      from /lib/x86_64-linux-gnu/libvlib.so.25.10
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: #2  0x000073352d63c65e vlib_exit_with_status + 0x37ce
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]:      from /lib/x86_64-linux-gnu/libvlib.so.25.10
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]: #3  0x000073352d67f88e vlib_worker_thread_bootstrap_fn + 0x4e
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]:      from /lib/x86_64-linux-gnu/libvlib.so.25.10
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: #3  0x000073352d67f88e vlib_worker_thread_bootstrap_fn + 0x4e
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]:      from /lib/x86_64-linux-gnu/libvlib.so.25.10
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]: #4  0x000073352d29caa4 pthread_condattr_setpshared + 0x684
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]:      from /lib/x86_64-linux-gnu/libc.so.6
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: #4  0x000073352d29caa4 pthread_condattr_setpshared + 0x684
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]:      from /lib/x86_64-linux-gnu/libc.so.6
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]: #5  0x000073352d329c6c __clone + 0x24c
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: vpp[293066]:      from /lib/x86_64-linux-gnu/libc.so.6
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]: #5  0x000073352d329c6c __clone + 0x24c
Feb 09 01:10:23 er1.cgn1.asX.net vpp[293066]:      from /lib/x86_64-linux-gnu/libc.so.6

       Message: Process 293066 (vpp_main) of user 0 dumped core.

                Module libsystemd.so.0 from deb systemd-255.4-1ubuntu8.12.amd64
                Module libzstd.so.1 from deb libzstd-1.5.5+dfsg2-2build1.1.amd64
                Stack trace of thread 293069:
                #0  0x000073352d29eb2c pthread_kill (libc.so.6 + 0x9eb2c)
                #1  0x000073352d24527e raise (libc.so.6 + 0x4527e)
                #2  0x000073352d2288ff abort (libc.so.6 + 0x288ff)
                #3  0x00005d6b0a53aa75 os_exit (vpp + 0x6a75)
                #4  0x000073352d6a04f8 n/a (libvlib.so.25.10 + 0xa04f8)
                #5  0x000073352d245330 n/a (libc.so.6 + 0x45330)
                #6  0x000073352e67a4b3 ip4_lookup_node_fn_hsw (libvnet.so.25.10 + 0xc7a4b3)
                #7  0x000073352d63969f n/a (libvlib.so.25.10 + 0x3969f)
                #8  0x000073352d63c65e n/a (libvlib.so.25.10 + 0x3c65e)
                #9  0x000073352d67f88e vlib_worker_thread_bootstrap_fn (libvlib.so.25.10 + 0x7f88e)
                #10 0x000073352d29caa4 n/a (libc.so.6 + 0x9caa4)
                #11 0x000073352d329c6c n/a (libc.so.6 + 0x129c6c)

                Stack trace of thread 293070:
                #0  0x000073352d63af18 n/a (libvlib.so.25.10 + 0x3af18)
                #1  0x000073352d67f88e vlib_worker_thread_bootstrap_fn (libvlib.so.25.10 + 0x7f88e)
                #2  0x000073352d29caa4 n/a (libc.so.6 + 0x9caa4)
                #3  0x000073352d329c6c n/a (libc.so.6 + 0x129c6c)

                Stack trace of thread 293071:
                #0  0x00007334cbe34af4 dpdk_input_node_fn_hsw (dpdk_plugin.so + 0xe34af4)
                #1  0x000073352d63afda n/a (libvlib.so.25.10 + 0x3afda)
                #2  0x000073352d67f88e vlib_worker_thread_bootstrap_fn (libvlib.so.25.10 + 0x7f88e)
                #3  0x000073352d29caa4 n/a (libc.so.6 + 0x9caa4)
                #4  0x000073352d329c6c n/a (libc.so.6 + 0x129c6c)

                Stack trace of thread 293066:
                #0  0x000073352d32be6b recvmsg (libc.so.6 + 0x12be6b)
                #1  0x00007334ccd603bf nl_recv (libnl-3.so.200 + 0x113bf)
                #2  0x00007334ccd60cf5 nl_recvmsgs_report (libnl-3.so.200 + 0x11cf5)
                #3  0x00007334ccd6131d nl_recvmsgs (libnl-3.so.200 + 0x1231d)
                #4  0x00007334ca892b75 n/a (linux_nl_plugin.so + 0x8b75)
                #5  0x00007334ca89addc n/a (linux_nl_plugin.so + 0x10ddc)
                #6  0x000073352d62a0ee vlib_file_poll (libvlib.so.25.10 + 0x2a0ee)
                #7  0x000073352d636354 vlib_main (libvlib.so.25.10 + 0x36354)
                #8  0x000073352d69fc1d n/a (libvlib.so.25.10 + 0x9fc1d)
                #9  0x000073352f729fa0 clib_calljmp (libvppinfra.so.25.10 + 0x7ffa0)
                ELF object binary architecture: AMD x86-64
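For anyone symbolizing the faulting frame without the coredump, the addresses above can be converted to module-relative offsets. The sketch below uses only values from the traces in this report (the faulting PC, the `libvnet.so.25.10 + 0xc7a4b3` offset reported for thread 293069, and the `+ 0x1443` symbol offset from the journal); the helper name `module_offsets` is made up for illustration.

```python
def module_offsets(pc, module_offset, symbol_offset):
    """Return (load base of the module, module-relative start of the function)."""
    base = pc - module_offset                        # where the library was mapped
    function_start = module_offset - symbol_offset   # function start inside the library
    return base, function_start


# Values copied from the journal line and the coredump stack trace of thread 293069.
pc = 0x73352E67A4B3
base, function_start = module_offsets(pc, 0xC7A4B3, 0x1443)

print(hex(base))            # 0x73352da00000
print(hex(function_start))  # 0xc79070
```

The module-relative offset `0xc7a4b3` is what tools operating on the library file itself usually expect, for example `addr2line -e` pointed at a copy of `libvnet.so.25.10` with matching debug symbols, assuming such symbols are available for this build.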
