[BOLT][AArch64] unsupported CFI opcode on Ubuntu 24.04 and AL2023 #120992

Open
salvatoredipietro opened this issue Dec 23, 2024 · 5 comments
Labels
BOLT crash Prefer [crash-on-valid] or [crash-on-invalid]

Comments

@salvatoredipietro
Contributor

salvatoredipietro commented Dec 23, 2024

Description

When attempting to use BOLT on Amazon Linux 2023 (AL2023) and Ubuntu 24.04 on an AArch64 instance (AWS m7g.4xlarge), the llvm-bolt command fails with an "unsupported CFI opcode" error. This happens especially when I use a long perf profile (> 120 seconds).

Environment

  • Operating System: Amazon Linux 2023 (also reproduced on Ubuntu 24.04)
  • Hardware: AWS m7g.4xlarge instance (ARM-based)
  • BOLT version: d33a2c5
  • Application used: PostgreSQL (built locally)

Steps to Reproduce

  1. Compile Postgres 16.4:
export CFLAGS="-g -O2 -fstack-protector-strong -fno-omit-frame-pointer -moutline-atomics -fno-reorder-blocks-and-partition"
export CPPFLAGS="-D_FORTIFY_SOURCE=2"
export LDFLAGS="-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,--emit-relocs"

## Compile it
./configure --disable-rpath --enable-debug --enable-dtrace --enable-tap-tests --enable-thread-safety --with-gnu-ld --with-icu --with-tcl --with-perl --with-python --with-openssl --with-libxml --with-libxslt --prefix=${HOME}/usr/
make -j$(nproc)
sudo make install
  2. Build BOLT on Amazon Linux 2023:
# BOLT installation on AL2023
sudo yum install -y perf cmake ninja-build
git clone https://github.com/llvm/llvm-project.git
mkdir build && cd build
cmake -G Ninja ../llvm-project/llvm -DLLVM_TARGETS_TO_BUILD="X86;AArch64" -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_ENABLE_PROJECTS="bolt"
ninja bolt
  3. Run the following commands:
# Get perf information
sudo perf record -e cycles:u  -u postgres -o perf.data -a -- sleep 600      

# Run perf2bolt cmd
sudo ~/build/bin/perf2bolt -p perf.data -o perf.boltdata --nl ${HOME}/usr/bin/postgres

# Run BOLT
sudo ~/build/bin/llvm-bolt ${HOME}/usr/bin/postgres -o ${HOME}/usr/bin/postgres.bolt --data perf.boltdata --reorder-blocks ext-tsp --reorder-functions hfsort --split-functions --split-all-cold --split-eh --update-debug-sections --dyno-stats --print-profile-stats  
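The commands in step 3 can be bundled into one script so that the run is repeatable and extra llvm-bolt options (for example a --skip-funcs list) can be appended. This is a minimal sketch, not part of the original report: the BOLT_BUILD, BINARY, DURATION and DRY_RUN variables and the run/bolt_pipeline functions are hypothetical names; the perf, perf2bolt and llvm-bolt flags are exactly the ones used above.

```sh
#!/bin/sh
# Hypothetical wrapper around the perf -> perf2bolt -> llvm-bolt steps above.
# BOLT_BUILD, BINARY, DURATION and DRY_RUN are assumed names, not BOLT settings.
BOLT_BUILD=${BOLT_BUILD:-$HOME/build/bin}
BINARY=${BINARY:-$HOME/usr/bin/postgres}
DURATION=${DURATION:-600}

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Any arguments are appended to the llvm-bolt command line.
# Each step returns early if it fails, so later steps never see stale data.
bolt_pipeline() {
  # Sample user-space cycles of all processes owned by the postgres user.
  run sudo perf record -e cycles:u -u postgres -o perf.data -a -- sleep "$DURATION" || return
  # --nl: the samples carry no LBR-style branch records.
  run sudo "$BOLT_BUILD/perf2bolt" -p perf.data -o perf.boltdata --nl "$BINARY" || return
  # Same layout and reporting options as in this report.
  run sudo "$BOLT_BUILD/llvm-bolt" "$BINARY" -o "$BINARY.bolt" --data perf.boltdata \
    --reorder-blocks ext-tsp --reorder-functions hfsort \
    --split-functions --split-all-cold --split-eh \
    --update-debug-sections --dyno-stats --print-profile-stats "$@"
}
```

After sourcing the script, `DRY_RUN=1 bolt_pipeline --skip-funcs=ExecInterpExpr/1` only prints the three commands; without DRY_RUN they run in order and the function stops at the first step that fails.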

Error Message

BOLT-INFO: Starting stub-insertion pass
BOLT-INFO: Inserted 470 stubs in the hot area and 216 stubs in the cold area. Shared 0 times, iterated 3 times.
unsupported CFI opcode
UNREACHABLE executed at /home/ec2-user/llvm-project/bolt/lib/Core/BinaryFunction.cpp:2591!
 #0 0x0000000000e3d810 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/ec2-user/build/bin/llvm-bolt+0xe3d810)
 #1 0x0000000000e3b77c llvm::sys::RunSignalHandlers() (/home/ec2-user/build/bin/llvm-bolt+0xe3b77c)
 #2 0x0000000000e3b8a8 SignalHandler(int) Signals.cpp:0:0
 #3 0x0000ffffbe05d830 (linux-vdso.so.1+0x830)
 #4 0x0000ffffbda83554 __pthread_kill_implementation (/lib64/libc.so.6+0x92554)
 #5 0x0000ffffbda3a3e0 gsignal (/lib64/libc.so.6+0x493e0)
 #6 0x0000ffffbda21224 abort (/lib64/libc.so.6+0x30224)
 #7 0x0000000000dda54c (/home/ec2-user/build/bin/llvm-bolt+0xdda54c)
 #8 0x00000000015d5a44 llvm::bolt::(anonymous namespace)::CFISnapshot::advanceTo(int) BinaryFunction.cpp:0:0
 #9 0x00000000015d80b0 llvm::bolt::BinaryFunction::unwindCFIState(int, int, llvm::bolt::BinaryBasicBlock*, __gnu_cxx::__normal_iterator<llvm::MCInst*, std::vector<llvm::MCInst, std::allocator<llvm::MCInst>>>&) (/home/ec2-user/build/bin/llvm-bolt+0x15d80b0)
#10 0x00000000015da5e0 llvm::bolt::BinaryFunction::finalizeCFIState() (/home/ec2-user/build/bin/llvm-bolt+0x15da5e0)
#11 0x000000000147a15c std::_Function_handler<void (llvm::bolt::BinaryFunction&), llvm::bolt::FinalizeFunctions::runOnFunctions(llvm::bolt::BinaryContext&)::'lambda'(llvm::bolt::BinaryFunction&)>::_M_invoke(std::_Any_data const&, llvm::bolt::BinaryFunction&) BinaryPasses.cpp:0:0
#12 0x00000000016345c8 std::_Function_handler<void (), std::_Bind<llvm::bolt::ParallelUtilities::runOnEachFunction(llvm::bolt::BinaryContext&, llvm::bolt::ParallelUtilities::SchedulingPolicy, std::function<void (llvm::bolt::BinaryFunction&)>, std::function<bool (llvm::bolt::BinaryFunction const&)>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, bool, unsigned int)::'lambda'(std::_Rb_tree_iterator<std::pair<unsigned long const, llvm::bolt::BinaryFunction>>, std::_Rb_tree_iterator<std::pair<unsigned long const, llvm::bolt::BinaryFunction>>) (std::_Rb_tree_iterator<std::pair<unsigned long const, llvm::bolt::BinaryFunction>>, std::_Rb_tree_iterator<std::pair<unsigned long const, llvm::bolt::BinaryFunction>>)>>::_M_invoke(std::_Any_data const&) ParallelUtilities.cpp:0:0
#13 0x0000000000f1ff68 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<std::function<void ()>>>, void>>::_M_invoke(std::_Any_data const&) (/home/ec2-user/build/bin/llvm-bolt+0xf1ff68)
#14 0x0000000000f2074c std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) (/home/ec2-user/build/bin/llvm-bolt+0xf2074c)
#15 0x0000ffffbda86980 __pthread_once_slow (/lib64/libc.so.6+0x95980)
#16 0x0000000000f205bc std::__future_base::_Deferred_state<std::thread::_Invoker<std::tuple<std::function<void ()>>>, void>::_M_complete_async() (/home/ec2-user/build/bin/llvm-bolt+0xf205bc)
#17 0x0000000000f24eec std::_Function_handler<void (), std::shared_future<void> llvm::ThreadPoolInterface::asyncImpl<void>(std::function<void ()>, llvm::ThreadPoolTaskGroup*)::'lambda'()>::_M_invoke(std::_Any_data const&) (/home/ec2-user/build/bin/llvm-bolt+0xf24eec)
#18 0x00000000023f057c llvm::StdThreadPool::processTasks(llvm::ThreadPoolTaskGroup*) (/home/ec2-user/build/bin/llvm-bolt+0x23f057c)
#19 0x00000000023f1104 void* llvm::thread::ThreadProxy<std::tuple<llvm::StdThreadPool::grow(int)::'lambda'()>>(void*) ThreadPool.cpp:0:0
#20 0x0000ffffbda81934 start_thread (/lib64/libc.so.6+0x90934)
#21 0x0000ffffbda25e5c thread_start (/lib64/libc.so.6+0x34e5c)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Aborted

Binary and perf profile: link

@github-actions github-actions bot added the BOLT label Dec 23, 2024
@salvatoredipietro
Contributor Author

Added the postgres binary and perf profile to the ticket.

@EugeneZelenko EugeneZelenko added the crash Prefer [crash-on-valid] or [crash-on-invalid] label Jan 7, 2025
@salvatoredipietro
Contributor Author

Running it in debug mode, I see a long list of "Trying to fix CFI states for each BB after reordering." messages for multiple functions in the finalize-functions pass.

The full log can be found here.

BOLT-INFO: Inserted 1689 stubs in the hot area and 755 stubs in the cold area. Shared 0 times, iterated 3 times.
BOLT-INFO: Finished pass: long-jmp
BOLT-INFO: Starting pass: finalize-functions
Trying to fix CFI states for each BB after reordering.
Trying to fix CFI states for each BB after reordering.
This is the list of CFI states for each BB of Trying to fix CFI states for each BB after reordering.
This is the list of CFI states for each BB of Trying to fix CFI states for each BB after reordering.
_init/1(*2)Trying to fix CFI states for each BB after reordering.
This is the list of CFI states for each BB of RmgrNotFoundThis is the list of CFI states for each BB of : This is the list of CFI states for each BB of Trying to fix CFI states for each BB after reordering.
Trying to fix CFI states for each BB after reordering.
FetchPreparedStatement.part.0/1(*2): ExecHash/1(*2)0pg_dependencies_recvThis is the list of CFI states for each BB of This is the list of CFI states for each BB of Trying to fix CFI states for each BB after reordering.
: 4: 
Trying to fix CFI states for each BB after reordering.
: anytime_typmodin.part.0/1(*2)anyenum_inTrying to fix CFI states for each BB after reordering.
This is the list of CFI states for each BB of Trying to fix CFI states for each BB after reordering.
, 4Trying to fix CFI states for each BB after reordering.
3Trying to fix CFI states for each BB after reordering.
This is the list of CFI states for each BB of 3gtsvectorin: Trying to fix CFI states for each BB after reordering.
: This is the list of CFI states for each BB of Trying to fix CFI states for each BB after reordering.
fdw_handler_outTrying to fix CFI states for each BB after reordering.
This is the list of CFI states for each BB of 4, Trying to fix CFI states for each BB after reordering.
Trying to fix CFI states for each BB after reordering.
This is the list of CFI states for each BB of , brin_inclusion_unionThis is the list of CFI states for each BB of , Trying to fix CFI states for each BB after reordering.
: Trying to fix CFI states for each BB after reordering.
3This is the list of CFI states for each BB of 3Trying to fix CFI states for each BB after reordering.
init_have_lse_atomics/1(*2)Trying to fix CFI states for each BB after reordering.
This is the list of CFI states for each BB of : brinRevmapDesummarizeRangeTrying to fix CFI states for each BB after reordering.
Trying to fix CFI states for each BB after reordering.
This is the list of CFI states for each BB of brin_build_desc, 4Trying to fix CFI states for each BB after reordering.
This is the list of CFI states for each BB of Trying to fix CFI states for each BB after reordering.
This is the list of CFI states for each BB of 3: brin_minmax_multi_serialize/1(*2)3Trying to fix CFI states for each BB after reordering.
Trying to fix CFI states for each BB after reordering.
This is the list of CFI states for each BB of 3Trying to fix CFI states for each BB after reordering.
This is the list of CFI states for each BB of , brin_minmax_multi_optionsTrying to fix CFI states for each BB after reordering.
, This is the list of CFI states for each BB of : Trying to fix CFI states for each BB after reordering.
This is the list of CFI states for each BB of : 3Trying to fix CFI states for each BB after reordering.
This is the list of CFI states for each BB of This is the list of CFI states for each BB of mask_page_hint_bits: 4, This is the list of CFI states for each BB of heap_copytupleThis is the list of CFI states for each BB of brin_bloom_summary_in, : 9, 9This is the list of CFI states for each BB of This is the list of CFI states for each BB of allocateReloptStruct/1(*2), This is the list of CFI states for each BB of index_truncate_tupleginNewScanKey3: This is the list of CFI states for each BB of 3ginVacuumItemPointers8This is the list of CFI states for each BB of heap_reloptions, 7This is the list of CFI states for each BB of , lz4_decompress_datumCreateTupleDesc: 
114ginTraverseLock: dataLocateItem/1(*2): 3, 33, ginEntryInsertentrySplitPage.isra.0/1(*2): 3: : , 8gistplacetopage, : 
gistPopItupFromNodeBuffer: 3gist_bbox_zorder_abbrev_convert/1(*2)7: : 0, Trying to fix CFI states for each BB after reordering.

: 5: 
9, 
9: : , 1347, , 313
: 38: , Trying to fix CFI states for each BB after reordering.
, 0: , 5
511This is the list of CFI states for each BB of Trying to fix CFI states for each BB after reordering.
, 63, Trying to fix CFI states for each BB after reordering.
3, Trying to fix CFI states for each BB after reordering.
314, 67
, , Trying to fix CFI states for each BB after reordering.

13135This is the list of CFI states for each BB of , 3, 70, Trying to fix CFI states for each BB after reordering.
, , error_multiple_recovery_targets/1(*2)This is the list of CFI states for each BB of 5, , 93This is the list of CFI states for each BB of , 13This is the list of CFI states for each BB of 
, 13, 713Trying to fix CFI states for each BB after reordering.
This is the list of CFI states for each BB of , Trying to fix CFI states for each BB after reordering.
, _start11
0, 
5This is the list of CFI states for each BB of 511: storage_name/1(*2), 6, , IndexOnlyRecheck/1(*2)3, pg_mcv_list_in14Trying to fix CFI states for each BB after reordering.
, 6, , This is the list of CFI states for each BB of brin_minmax_multi_summary_out13This is the list of CFI states for each BB of 13: , , Trying to fix CFI states for each BB after reordering.
7This is the list of CFI states for each BB of Trying to fix CFI states for each BB after reordering.
, mask_unused_space, , 3: 5, 93: 
13: , This is the list of CFI states for each BB of 13, 713EncodeSpecialDate.part.0/1(*2): , anyrange_in, 1110, table_am_handler_inThis is the list of CFI states for each BB of 5: 511, , 0, 6, , 93Trying to fix CFI states for each BB after reordering.
, 314findoprnd.part.0/1(*2), 6, , : 913: 13, 
, 7: gist_box_consistent, 5, 311, 5, 3, , This is the list of CFI states for each BB of 13, , : 13, 7133, , 3, 11Trying to fix CFI states for each BB after reordering.
, 0, , 33: 5, 5, , 06, 93brin_bloom_summary_recv, , 3143, 6, , , 1313, , 13This is the list of CFI states for each BB of 117, , 0, 55, 311, , 3, , : 513, , , 13, 7133, 133, call_weak_fn/1(*2), , 33, , 
5
0Trying to fix CFI states for each BB after reordering.
, 6, , Trying to fix CFI states for each BB after reordering.
133, 6This is the list of CFI states for each BB of , , 5This is the list of CFI states for each BB of , 133lz4_decompress_datum_slice3gistunionsubkey/1(*2)
6, , : , : , 13, 3Trying to fix CFI states for each BB after reordering.
3503, 6
7, This is the list of CFI states for each BB of 
, 1413, , , 53, CreateTupleDescCopyTrying to fix CFI states for each BB after reordering.
0, 11Trying to fix CFI states for each BB after reordering.
5, , 3, 136, 7: 3This is the list of CFI states for each BB of , 5, This is the list of CFI states for each BB of 61314
5, , , 3, 7pg_mcv_list_recv0, 
11, heap_copytuple_with_tuple, 33, , , Trying to fix CFI states for each BB after reordering.
6Trying to fix CFI states for each BB after reordering.
1313, 7, , : , 
5: Trying to fix CFI states for each BB after reordering.
, 6
1314, , 5This is the list of CFI states for each BB of , This is the list of CFI states for each BB of , , 3, 
717, 3, 03513: This is the list of CFI states for each BB of Trying to fix CFI states for each BB after reordering.
7, , , , Trying to fix CFI states for each BB after reordering.
, 3, , 136bbsink_zstd_new59brin_minmax_multi_summary_in133, , Trying to fix CFI states for each BB after reordering.
11, 
5, 21, , , This is the list of CFI states for each BB of table_tuple_fetch_row_version.part.0/2(*2)0671313This is the list of CFI states for each BB of 14
6, , : , , 3: 
3, 7This is the list of CFI states for each BB of 17, 
Trying to fix CFI states for each BB after reordering.
0, 3513, brininsert: , , , , , table_am_handler_out, Trying to fix CFI states for each BB after reordering.
, 13659, 3, Trying to fix CFI states for each BB after reordering.
33, gistSplitByKey, 11This is the list of CFI states for each BB of 
date2timestamp_opt_overflow.part.0/1(*2)Trying to fix CFI states for each BB after reordering.
21, , 5: 30871713, : 14This is the list of CFI states for each BB of 6, , , , 3, 3This is the list of CFI states for each BB of , 7: 17, : Trying to fix CFI states for each BB after reordering.
This is the list of CFI states for each BB of , 313, 13, , , , , 393, check_collation_set.part.0/1(*2)
13659, , 3, anycompatiblerange_in41, 13, 113This is the list of CFI states for each BB of mask_lp_flags21
, 17, 3018713, , 14: , , Trying to fix CFI states for each BB after reordering.

39, 3: 
7, , 17, , findFkeyCast.part.0/1(*2): , 13Trying to fix CFI states for each BB after reordering.
...

@ilinpv
Contributor

ilinpv commented Jan 21, 2025

There are three functions in this binary that BOLT has trouble with: ExecInterpExpr/1, __do_global_dtors_aux/1 and init_have_lse_atomics/1.

Skipping the first two with

llvm-bolt --skip-funcs=ExecInterpExpr/1,__do_global_dtors_aux/1

creates the bolted binary successfully.

There is ongoing work that may remove the need for --skip-funcs.
When llvm-bolt processes ExecInterpExpr/1, it reports:

BOLT-ERROR: unable to get new address corresponding to input address 0x2aa158 in function ExecInterpExpr/1(*2). Consider adding this function to --skip-funcs=...

which should be fixed by creating entry points for dynamic relocation addresses (#120267).
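When llvm-bolt names functions it cannot handle, those names can be turned into a --skip-funcs option mechanically. The sketch below is a hypothetical helper (skip_funcs_from_log is not part of BOLT); it assumes the error text has exactly the shape shown above, and that dropping the trailing "(*N)" marker yields the name --skip-funcs expects, as in the ExecInterpExpr/1 example.

```sh
#!/bin/sh
# Hypothetical helper, not part of BOLT: read an llvm-bolt log on stdin and
# print one --skip-funcs=... option built from lines such as
#   BOLT-ERROR: ... in function NAME(*N). Consider adding this function to --skip-funcs=...
# Pipeline: extract NAME(*N), drop the "(*N)" suffix, de-duplicate,
# then join the names with commas.
skip_funcs_from_log() {
  names=$(
    sed -n 's/.* in function \([^ ]*\)\. Consider adding this function to --skip-funcs.*/\1/p' |
      sed 's/(\*[0-9]*)$//' |
      sort -u |
      paste -s -d , -
  )
  # Print nothing if the log contains no skip suggestion.
  if [ -n "$names" ]; then
    printf '%s\n' "--skip-funcs=$names"
  fi
}
```

It only picks up functions that BOLT explicitly suggests skipping; __do_global_dtors_aux/1 in this report fails differently (the CFI crash) and still has to be added by hand.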

The functions __do_global_dtors_aux/1 and init_have_lse_atomics/1 contain pointer authentication instructions (paciasp/autiasp), presumably because the runtime objects they come from (lse-init.o in libgcc.a and crtstuff.c in crtbegin.o) were built with -mbranch-protection=pac-ret enabled.

Binary Function "init_have_lse_atomics/1(*2)" after disassembly {
  All names   : init_have_lse_atomics/1
                init_have_lse_atomics/lse-init.o/1
  Number      : 220
  State       : disassembled
  Address     : 0xda780
  Size        : 0x2c
  MaxSize     : 0x40
  Offset      : 0xda780
  Section     : .text
  Orc Section : .local.text.init_have_lse_atomics/1
  LSDA        : 0x0
  IsSimple    : 1
  IsMultiEntry: 0
  IsSplit     : 0
  BB Count    : 0
}
.LBB0219:
    00000000:   paciasp
    00000004:   stp     x29, x30, [sp, #-0x10]!
    00000008:   mov     x0, #0x10
    0000000c:   mov     x29, sp
    00000010:   bl      __getauxval@PLT # Offset: 16
    00000014:   ubfx    w0, w0, #8, #1
    00000018:   adrp    x1, "__aarch64_have_lse_atomics/1"
    0000001c:   ldp     x29, x30, [sp], #0x10
    00000020:   autiasp
    00000024:   strb    w0, [x1, :lo12:"__aarch64_have_lse_atomics/1"]
    00000028:   ret # Offset: 40
End of Function "init_have_lse_atomics/1(*2)"

Binary Function "__do_global_dtors_aux/1(*2)" after disassembly {
  All names   : __do_global_dtors_aux/1
                __do_global_dtors_aux/crtstuff.c/1
  Number      : 225
  State       : disassembled
  Address     : 0xda880
  Size        : 0x50
  MaxSize     : 0x50
  Offset      : 0xda880
  Section     : .text
  Orc Section : .local.text.__do_global_dtors_aux/1
  LSDA        : 0x0
  IsSimple    : 1
  IsMultiEntry: 0
  IsSplit     : 0
  BB Count    : 0
}
.LBB0224:
    00000000:   paciasp
    00000004:   stp     x29, x30, [sp, #-0x20]!
    00000008:   mov     x29, sp
    0000000c:   str     x19, [sp, #0x10]
    00000010:   adrp    x19, "completed.0/1"
    00000014:   ldrb    w0, [x19, :lo12:"completed.0/1"]
    00000018:   tbnz    w0, #0x0, .Ltmp180 # Offset: 24
    0000001c:   adrp    x0, __BOLT_got_zero+9105408
    00000020:   ldr     x0, [x0, :lo12:__BOLT_got_zero+1280]
    00000024:   cbz     x0, .Ltmp181 # Offset: 36
    00000028:   adrp    x0, "__dso_handle/1"
    0000002c:   ldr     x0, [x0, :lo12:"__dso_handle/1"]
    00000030:   bl      __cxa_finalize@PLT # Offset: 48
.Ltmp181:
    00000034:   bl      "deregister_tm_clones/1" # Offset: 52
    00000038:   mov     w0, #0x1
    0000003c:   strb    w0, [x19, :lo12:"completed.0/1"]
.Ltmp180:
    00000040:   ldr     x19, [sp, #0x10]
    00000044:   ldp     x29, x30, [sp], #0x20
    00000048:   autiasp
    0000004c:   ret # Offset: 76
End of Function "__do_global_dtors_aux/1(*2)"

This can be fixed by the work on pointer authentication support in BOLT (#120064). As an alternative, you can use libgcc.a and crtbegin.o built with pointer authentication disabled (it is off by default in GCC and LLVM). To check with GCC:

gcc -print-libgcc-file-name
# get something like "/usr/lib/gcc/aarch64-linux-gnu/13/libgcc.a"
objdump -d /usr/lib/gcc/aarch64-linux-gnu/13/libgcc.a > libgcc.s
# check for paciasp/autiasp in <init_have_lse_atomics> code in libgcc.s
gcc -print-file-name=crtbegin.o
# get something like "/usr/lib/gcc/aarch64-linux-gnu/13/crtbegin.o"
objdump -d /usr/lib/gcc/aarch64-linux-gnu/13/crtbegin.o > crtbegin.s
# check for paciasp/autiasp in <__do_global_dtors_aux> code in crtbegin.s
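The manual objdump check above can be scripted. The helper below is a hypothetical sketch (has_pac_in_function is not an existing tool): it reads objdump -d output on stdin and succeeds if the named function contains paciasp or autiasp. As an assumption, it also matches the generic forms hint #0x19 and hint #0x1d, since paciasp and autiasp are encoded as HINT #25 and HINT #29 and a disassembler that does not know the pointer authentication mnemonics may print them that way.

```sh
#!/bin/sh
# Hypothetical helper: exit 0 if function $1 in `objdump -d` output (stdin)
# contains a pac-ret instruction, 1 otherwise.
has_pac_in_function() {
  awk -v fn="$1" '
    # Function headers look like "0000000000000000 <__do_global_dtors_aux>:".
    NF == 2 && $2 == "<" fn ">:" { inside = 1; next }
    # objdump separates functions with a blank line.
    NF == 0 { inside = 0 }
    # paciasp/autiasp, or the same instructions printed as HINT #25 / #29.
    inside && /paciasp|autiasp|hint[ \t]+#0x1[9d]/ { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

For example, `objdump -d "$(gcc -print-libgcc-file-name)" | has_pac_in_function init_have_lse_atomics && echo "libgcc.a uses pac-ret"` performs the first check in one step.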

@bgergely0

Just a side note, @salvatoredipietro: I suggest running BOLT with --no-threads when trying to create a readable debug log. The log you posted is broken up because BOLT's threads don't coordinate when writing to stdout.
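A readable debug log can be produced by re-running the failing command single-threaded. This is a hedged sketch: bolt_debug_run, BOLT and LOG are hypothetical names, and it assumes an llvm-bolt built with assertions enabled (as in the cmake line in the report), since LLVM's generic -debug option only exists in such builds. Both stdout and stderr go to the log because LLVM debug output is written to stderr.

```sh
#!/bin/sh
# Hypothetical helper: run llvm-bolt with the given arguments on a single
# thread with LLVM debug output enabled, writing everything to $LOG.
BOLT=${BOLT:-$HOME/build/bin/llvm-bolt}
LOG=${LOG:-bolt-debug.log}

bolt_debug_run() {
  # --no-threads: disable BOLT's multithreading so debug lines from
  # different functions no longer interleave.
  # -debug: LLVM's generic debug switch (assertion-enabled builds only).
  "$BOLT" "$@" --no-threads -debug >"$LOG" 2>&1
}
```

The arguments are the same input binary, -o output and optimization options as the failing run.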

@salvatoredipietro
Contributor Author

@bgergely0 I will. Thanks for the suggestion.
