A comprehensive toolset for performance analysis and troubleshooting of virtualized network systems using eBPF technologies.
This repository provides a collection of eBPF-based tools for monitoring, tracing, and analyzing network performance issues in virtualized environments. The tools are designed to help identify packet drops, measure latency, trace data paths, and analyze system performance bottlenecks.
Main directory containing all eBPF-based monitoring and troubleshooting tools organized by system component:
CPU and scheduler monitoring tools:
- Off-CPU time analysis
- Scheduler latency monitoring
- Futex and pthread lock tracing
KVM virtualization network stack monitoring:
- kvm/ - KVM IRQ injection and interrupt statistics
- tun/ - TUN/TAP device monitoring (ring buffer, GSO, TX stats)
- vhost-net/ - vhost eventfd, queue correlation, buffer peek stats
- virtio-net/ - virtio-net polling, IRQ monitoring, RX path tracing
Linux kernel network stack tools:
- Connection tracking (conntrack) monitoring
- IP fragmentation/defragmentation tracing
- packet-drop/ - Comprehensive packet drop detection and analysis
Open vSwitch specific monitoring:
- Userspace megaflow analysis
- Kernel module drop monitoring
- Upcall latency measurement
Network performance measurement tools:
- system-network/ - System-level network latency and metrics (ICMP RTT, TCP latency)
- vm-network/ - VM-specific network performance analysis
  - VM network latency decomposition
  - vm_pair_latency/ - Inter-VM latency monitoring
  - vm_pair_latency_gap/ - Latency gap and jitter analysis
Additional tracing tools:
- Abnormal ARP detection
- OVS connection tracking invalid states
- Network offloading and segmentation tracing
- Qdisc and TX queue monitoring
Documentation and guides:
- publish/ - User manuals and deployment guides
- Design documents for various monitoring approaches
Test configurations and specifications for all tool categories
- Linux kernel with eBPF support (4.1+ recommended)
- Root privileges for BPF operations
- BCC (BPF Compiler Collection) - Required for Python BPF tools
- bpftrace - Required for .bt scripts
- Python 2/3 - Most tools support both versions
- Python 2 for el7 systems with bcc package
- Python 3 for oe1 systems with bpfcc package
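The requirements above can be checked before running any tool. The following is a minimal sketch, not part of the repository: the function names are hypothetical, it is written for Python 3, and it takes the 4.1 minimum from the list above. Distribution kernels may carry backported eBPF support despite a lower version number, so treat a failed kernel check as a hint rather than a verdict.

```python
import importlib.util
import os
import platform
import re
import shutil

MIN_KERNEL = (4, 1)  # minimum version taken from the requirements list


def parse_kernel_release(release):
    """Return (major, minor) from a string such as '3.10.0-1160.el7.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def check_prerequisites(release=None):
    """Return a dict mapping each prerequisite to True or False."""
    release = release or platform.release()
    return {
        "kernel >= %d.%d" % MIN_KERNEL: parse_kernel_release(release) >= MIN_KERNEL,
        "root privileges": os.geteuid() == 0,
        # Both the bcc and bpfcc packages install a Python module named "bcc".
        "bcc Python module": importlib.util.find_spec("bcc") is not None,
        "bpftrace binary": shutil.which("bpftrace") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print("%-20s %s" % (name, "ok" if ok else "not found"))
```

Running the script with sudo reports the root check the same way the tools themselves would see it.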
- Install BCC:
# For RHEL/CentOS 7
sudo yum install bcc-tools python2-bcc
# For Ubuntu/Debian (apt-based) distributions
sudo apt-get install bpfcc-tools python3-bpfcc
- Install bpftrace:
# For RHEL/CentOS
sudo yum install bpftrace
# For Ubuntu/Debian
sudo apt-get install bpftrace
- Clone the repository:
git clone <repository-url>
cd troubleshooting-tools
Monitor packet drops for specific connections:
# Monitor TCP drops for specific connection
sudo ./bcc-tools/packet-drop/multi-protocol-drop-monitor.py \
--src 192.168.1.10 --dst 192.168.1.20 --protocol tcp --dst-port 443
# Monitor all protocol drops
sudo ./bcc-tools/packet-drop/drop.py
Measure network latency at various layers:
# System network ICMP latency
sudo ./bcc-tools/performance/system-network/icmp_rtt_latency.py \
--src-ip 192.168.1.10 --dst-ip 192.168.1.20 \
--phy-iface1 eth0 --phy-iface2 eth1
# VM network latency
sudo ./bcc-tools/performance/vm-network/vm_latency.py \
--src-ip 192.168.1.10 --dst-ip 192.168.1.20 \
--vm-interface vnet0 --phy-interface eth0 --direction tx
Monitor OVS operations:
# OVS upcall monitoring
sudo ./bcc-tools/ovs-measurement/ovs-upcall-execute.py
# OVS megaflow analysis
sudo ./bcc-tools/ovs-measurement/ovs_userspace_megaflow.py --debug
Monitor CPU-related performance issues:
# Off-CPU time analysis
sudo ./bcc-tools/cpu-measurement/offcputime-ts.py
# Scheduler latency monitoring
sudo ./bcc-tools/cpu-measurement/sched_latency_monitor.sh
Use bpftrace for quick analysis:
# Trace abnormal ARP events
sudo bpftrace bpftrace-tools/trace-abnormal-arp.bt
# Monitor OVS connection tracking issues
sudo bpftrace bpftrace-tools/trace-ovs-ct-invalid.bt
This toolset is designed for virtualized network environments with the following architecture:
- System Network: Physical host network interfaces and kernel network stack
- VM Network: Virtual machine network interfaces in a KVM virtualization environment (TUN/TAP devices, vnet interfaces)
- OVS Integration: Open vSwitch datapath monitoring (OVS as the virtual switch in the virtualization environment)
- Virtio Network: Virtio network device performance monitoring (vhost-net and virtio-net)
For detailed architecture information, see the documentation in the md/ directory.
Most tools output to stdout by default. To capture logs:
# Redirect output to log file
sudo ./tool-name --options > output.log 2>&1
# Real-time monitoring with logging
sudo ./tool-name --options | tee output.log
- Tools may introduce overhead in production environments
- Use filtering options to reduce noise and performance impact
- Monitor system resources when running intensive tracing
- Consider sampling or time-limited tracing on high-traffic systems (for example, a tool's summary version)
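Time-limited tracing can be arranged with a small wrapper around any of the tools. The sketch below is an assumption-laden illustration, not something shipped in the repository: `run_time_limited` and its parameters are hypothetical names. It logs the tool's output to a file, sends SIGINT (the equivalent of Ctrl-C) when the time limit expires so the tool can exit cleanly, and kills the tool only if it does not exit within a grace period.

```python
import signal
import subprocess
import sys


def run_time_limited(cmd, seconds, log_path, grace=5.0):
    """Run cmd for at most `seconds`, logging stdout and stderr to log_path.

    On timeout the child receives SIGINT; if it has not exited after `grace`
    seconds it is killed. Returns the child's exit status.
    """
    with open(log_path, "wb") as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        try:
            return proc.wait(timeout=seconds)
        except subprocess.TimeoutExpired:
            proc.send_signal(signal.SIGINT)
        try:
            return proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()
            return proc.wait()


if __name__ == "__main__" and len(sys.argv) > 3:
    # Hypothetical invocation (run the wrapper itself as root so that it is
    # allowed to signal the tool):
    #   sudo python3 limit.py 30 output.log ./tool-name --options
    limit, log_file, command = float(sys.argv[1]), sys.argv[2], sys.argv[3:]
    sys.exit(run_time_limited(command, limit, log_file))
```

Starting the tool directly rather than through a nested sudo keeps the signal delivery simple, which is why the usage comment puts sudo in front of the wrapper.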
- Permission Denied: Run the tool with root privileges
- BPF Program Load Failed: Check that the kernel supports BPF and that the probed functions exist on the running kernel
- Symbol Resolution: Install the kernel debug symbols
- Interface Not Found: Verify interface names and indices
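Two of the checks above are easy to script. The following sketch uses hypothetical helper names and is written for Python 3. It looks up an interface index with the standard `socket.if_nametoindex` call and scans `/proc/kallsyms` lines for the kernel functions a tool wants to probe. The interface and function names in the example are placeholders.

```python
import os
import socket


def interface_index(name):
    """Return the kernel ifindex for an interface name, or None if absent."""
    try:
        return socket.if_nametoindex(name)
    except OSError:
        return None


def missing_kernel_symbols(wanted, kallsyms_lines):
    """Return the names in `wanted` that do not appear in kallsyms lines.

    Each line of /proc/kallsyms has the form 'address type name [module]'.
    """
    present = set()
    for line in kallsyms_lines:
        fields = line.split()
        if len(fields) >= 3:
            present.add(fields[2])
    return [name for name in wanted if name not in present]


if __name__ == "__main__":
    # eth0, vnet0 and kfree_skb are examples only; substitute the interfaces
    # and probe targets that the failing tool actually uses.
    for ifname in ("eth0", "vnet0"):
        print(ifname, "->", interface_index(ifname))
    if os.path.exists("/proc/kallsyms"):
        with open("/proc/kallsyms") as f:
            print("missing symbols:", missing_kernel_symbols(["kfree_skb"], f))
```

A function that is missing from `/proc/kallsyms` cannot be attached with a kprobe, which is one common reason for a BPF program load failure.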
Many tools support debug mode for detailed output:
sudo ./tool-name --debug
When adding new tools:
- Follow the existing directory structure
- Include comprehensive help text and examples
- Add appropriate error handling
- Document tool functionality in comments
- Test on target kernel versions
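For example, when packets are not handled as expected somewhere along the virtualized data path, the root cause has to be located. Such problems are often very complex: the immediate cause is that the processing logic at a specific point on the execution path misbehaves, and behind that, the value of some data structure or piece of metadata has changed unexpectedly. The fundamental task is therefore to find the unexpected changes in the data structure and metadata values that drive the control logic. This tool is built to make that kind of tracing easier. At each key point of the data path it can trace the core fields of the most common network-processing data structures, so the values at different points can be compared directly, the reason the control logic took a different execution path in the corresponding code can be analyzed, and the problem can be located.
Similarly, the packet drop tracing tools only support an initial screening: whether a particular type of drop occurs on the host segment of the virtualized data path, and where it occurs (the call stack). Deeper analysis needs the concrete changes in data structure and metadata contents at other data path points before and after the drop location, which is what this tool provides.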
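The comparison step described above can be illustrated offline. The sketch below assumes the per-point field values have already been collected from the tracing output. The record layout, the function name `find_field_changes`, and the sample values are all hypothetical and do not reflect the tool's actual output format.

```python
def find_field_changes(snapshots):
    """Return (field, prev_point, point, old, new) for every field whose value
    differs between consecutive probe points.

    `snapshots` is a list of (probe_point, fields_dict) in data-path order.
    Only fields present at both points are compared.
    """
    changes = []
    for (prev_point, prev), (point, cur) in zip(snapshots, snapshots[1:]):
        for field in sorted(set(prev) & set(cur)):
            if prev[field] != cur[field]:
                changes.append((field, prev_point, point, prev[field], cur[field]))
    return changes


if __name__ == "__main__":
    # Illustrative values for one packet at three points on the host path.
    trace = [
        ("tun_get_user", {"len": 1514, "gso_size": 0, "mark": 0}),
        ("netif_receive_skb", {"len": 1514, "gso_size": 0, "mark": 0}),
        ("ovs_vport_receive", {"len": 1514, "gso_size": 0, "mark": 1}),
    ]
    for field, src, dst, old, new in find_field_changes(trace):
        print("%s changed between %s and %s: %r -> %r" % (field, src, dst, old, new))
```

The first reported change narrows the search to the code that runs between the two probe points.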
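Use the tools in the performance directory for system-level and VM network performance analysis, including latency measurement and CPU binding optimization.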
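- Non-developers: collect logs according to the instructions and hand them to developers for further analysis
- Developers: determine the tracing parameters for the specific problem, then provide the complete command to after-sales support or run it yourself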
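- All tools require root privileges
- Output goes to stdout by default; use redirection to record logs
- The tools are designed for virtualized environments and monitor TUN interfaces, OVS internal ports, physical interfaces, and similar devices
- Consider the performance impact when using the tools in production
- Tracing kernel or userspace call stacks requires correct symbol resolution: install the matching kernel-debuginfo package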
docs/user-manual.md