Skip to content

Conversation

bulbazord
Copy link
Member

This change is motivated by performance. Creating ConstStrings can be expensive and is a non-trivial amount of ObjectFileMachO::ParseSymtab. This patch reduces the number of ConstStrings created.

LLDB relies on duplicate symbol information being merged into one Symbol object. It does this by merging duplicate symbols together by name as they're created. However, before it performs the merge, it will create a ConstString for the symbol name. If this step is delayed until after symbols are merged, the number of ConstStrings created can be reduced considerably.

I measured the performance difference by profiling a Release build of LLDB as it attaches to a Debug build of LLDB. This reduced the total time of ObjectFileMachO::ParseSymtab by 100-150ms.

This change is motivated by performance. Creating ConstStrings can be
expensive and is a non-trivial amount of ObjectFileMachO::ParseSymtab.
This patch reduces the number of ConstStrings created.

LLDB relies on duplicate symbol information being merged into one Symbol
object. It does this by merging duplicate symbols together by name as
they're created. However, before it performs the merge, it will create a
ConstString for the symbol name. If this step is delayed until after
symbols are merged, the number of ConstStrings created can be reduced
considerably.

I measured the performance difference by profiling a Release build of
LLDB as it attaches to a Debug build of LLDB. This reduced the total
time of ObjectFileMachO::ParseSymtab by 100-150ms.
@llvmbot
Copy link
Member

llvmbot commented Aug 28, 2025

@llvm/pr-subscribers-lldb

Author: Alex Langford (bulbazord)

Changes

This change is motivated by performance. Creating ConstStrings can be expensive and is a non-trivial amount of ObjectFileMachO::ParseSymtab. This patch reduces the number of ConstStrings created.

LLDB relies on duplicate symbol information being merged into one Symbol object. It does this by merging duplicate symbols together by name as they're created. However, before it performs the merge, it will create a ConstString for the symbol name. If this step is delayed until after symbols are merged, the number of ConstStrings created can be reduced considerably.

I measured the performance difference by profiling a Release build of LLDB as it attaches to a Debug build of LLDB. This reduced the total time of ObjectFileMachO::ParseSymtab by 100-150ms.


Full diff: https://github.com/llvm/llvm-project/pull/155931.diff

1 Files Affected:

  • (modified) lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp (+31-34)
diff --git a/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp b/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
index 924e34053d411..40b796193e70f 100644
--- a/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
+++ b/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
@@ -2584,7 +2584,7 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
   std::vector<uint32_t> N_COMM_indexes;
   typedef std::multimap<uint64_t, uint32_t> ValueToSymbolIndexMap;
   typedef llvm::DenseMap<uint32_t, uint32_t> NListIndexToSymbolIndexMap;
-  typedef llvm::DenseMap<const char *, uint32_t> ConstNameToSymbolIndexMap;
+  typedef llvm::StringMap<uint32_t> ConstNameToSymbolIndexMap;
   ValueToSymbolIndexMap N_FUN_addr_to_sym_idx;
   ValueToSymbolIndexMap N_STSYM_addr_to_sym_idx;
   ConstNameToSymbolIndexMap N_GSYM_name_to_sym_idx;
@@ -4210,37 +4210,20 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
 
       uint64_t symbol_value = nlist.n_value;
 
-      if (symbol_name_non_abi_mangled) {
-        sym[sym_idx].GetMangled().SetMangledName(
-            ConstString(symbol_name_non_abi_mangled));
-        sym[sym_idx].GetMangled().SetDemangledName(ConstString(symbol_name));
-      } else {
-
-        if (symbol_name && symbol_name[0] == '_') {
-          symbol_name++; // Skip the leading underscore
-        }
-
-        if (symbol_name) {
-          ConstString const_symbol_name(symbol_name);
-          sym[sym_idx].GetMangled().SetValue(const_symbol_name);
-        }
-      }
-
-      if (is_gsym) {
-        const char *gsym_name = sym[sym_idx]
-                                    .GetMangled()
-                                    .GetName(Mangled::ePreferMangled)
-                                    .GetCString();
-        if (gsym_name)
-          N_GSYM_name_to_sym_idx[gsym_name] = sym_idx;
-      }
-
       if (symbol_section) {
         const addr_t section_file_addr = symbol_section->GetFileAddress();
         symbol_value -= section_file_addr;
       }
 
+      if (!symbol_name_non_abi_mangled && symbol_name &&
+          symbol_name[0] == '_') {
+        symbol_name++; // Skip the leading underscore
+      }
+
       if (!is_debug) {
+        const char *name_to_compare = symbol_name_non_abi_mangled
+                                          ? symbol_name_non_abi_mangled
+                                          : symbol_name;
         if (type == eSymbolTypeCode) {
           // See if we can find a N_FUN entry for any code symbols. If we do
           // find a match, and the name matches, then we can merge the two into
@@ -4253,7 +4236,7 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
           if (range.first != range.second) {
             for (ValueToSymbolIndexMap::const_iterator pos = range.first;
                  pos != range.second; ++pos) {
-              if (sym[sym_idx].GetMangled().GetName(Mangled::ePreferMangled) ==
+              if (llvm::StringRef(name_to_compare) ==
                   sym[pos->second].GetMangled().GetName(
                       Mangled::ePreferMangled)) {
                 m_nlist_idx_to_sym_idx[nlist_idx] = pos->second;
@@ -4288,7 +4271,7 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
           if (range.first != range.second) {
             for (ValueToSymbolIndexMap::const_iterator pos = range.first;
                  pos != range.second; ++pos) {
-              if (sym[sym_idx].GetMangled().GetName(Mangled::ePreferMangled) ==
+              if (llvm::StringRef(name_to_compare) ==
                   sym[pos->second].GetMangled().GetName(
                       Mangled::ePreferMangled)) {
                 m_nlist_idx_to_sym_idx[nlist_idx] = pos->second;
@@ -4303,13 +4286,9 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
             }
           } else {
             // Combine N_GSYM stab entries with the non stab symbol.
-            const char *gsym_name = sym[sym_idx]
-                                        .GetMangled()
-                                        .GetName(Mangled::ePreferMangled)
-                                        .GetCString();
-            if (gsym_name) {
+            if (name_to_compare) {
               ConstNameToSymbolIndexMap::const_iterator pos =
-                  N_GSYM_name_to_sym_idx.find(gsym_name);
+                  N_GSYM_name_to_sym_idx.find(name_to_compare);
               if (pos != N_GSYM_name_to_sym_idx.end()) {
                 const uint32_t GSYM_sym_idx = pos->second;
                 m_nlist_idx_to_sym_idx[nlist_idx] = GSYM_sym_idx;
@@ -4331,6 +4310,24 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
         }
       }
 
+      if (symbol_name_non_abi_mangled) {
+        sym[sym_idx].GetMangled().SetMangledName(
+            ConstString(symbol_name_non_abi_mangled));
+        sym[sym_idx].GetMangled().SetDemangledName(ConstString(symbol_name));
+      } else if (symbol_name) {
+        ConstString const_symbol_name(symbol_name);
+        sym[sym_idx].GetMangled().SetValue(const_symbol_name);
+      }
+
+      if (is_gsym) {
+        const char *gsym_name = sym[sym_idx]
+                                    .GetMangled()
+                                    .GetName(Mangled::ePreferMangled)
+                                    .GetCString();
+        if (gsym_name)
+          N_GSYM_name_to_sym_idx[gsym_name] = sym_idx;
+      }
+
       sym[sym_idx].SetID(nlist_idx);
       sym[sym_idx].SetType(type);
       if (set_value) {

Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConstStrings are fast when comparing existing ones. But creating them is expensive because the string pool is global. Here we're parsing strings, so we always have to do a create + lookup. By using a StringMap, we can do a cheaper (non global lookup) and only do the more expensive creation after we've eliminated the duplicates.

LGTM!

@jasonmolenda
Copy link
Collaborator

This change looks good to me. I might not have read it sufficiently closely, but the duplicate symbols we're looking at here are only when there are debug map entries for .o file debugging, maybe?

Could you please update the code that's duplicate circa line 2800 too. Otherwise Future Us will need to puzzle out why the code is different between these two copies.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants