-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[lldb] Delay creation of ConstStrings when parsing MachO symtab #155931
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This change is motivated by performance. Creating ConstStrings can be expensive and is a non-trivial amount of ObjectFileMachO::ParseSymtab. This patch reduces the number of ConstStrings created. LLDB relies on duplicate symbol information being merged into one Symbol object. It does this by merging duplicate symbols together by name as they're created. However, before it performs the merge, it will create a ConstString for the symbol name. If this step is delayed until after symbols are merged, the number of ConstStrings created can be reduced considerably. I measured the performance difference by profiling a Release build of LLDB as it attaches to a Debug build of LLDB. This reduced the total time of ObjectFileMachO::ParseSymtab by 100-150ms.
@llvm/pr-subscribers-lldb Author: Alex Langford (bulbazord) ChangesThis change is motivated by performance. Creating ConstStrings can be expensive and is a non-trivial amount of ObjectFileMachO::ParseSymtab. This patch reduces the number of ConstStrings created. LLDB relies on duplicate symbol information being merged into one Symbol object. It does this by merging duplicate symbols together by name as they're created. However, before it performs the merge, it will create a ConstString for the symbol name. If this step is delayed until after symbols are merged, the number of ConstStrings created can be reduced considerably. I measured the performance difference by profiling a Release build of LLDB as it attaches to a Debug build of LLDB. This reduced the total time of ObjectFileMachO::ParseSymtab by 100-150ms. Full diff: https://github.com/llvm/llvm-project/pull/155931.diff 1 Files Affected:
diff --git a/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp b/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
index 924e34053d411..40b796193e70f 100644
--- a/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
+++ b/lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
@@ -2584,7 +2584,7 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
std::vector<uint32_t> N_COMM_indexes;
typedef std::multimap<uint64_t, uint32_t> ValueToSymbolIndexMap;
typedef llvm::DenseMap<uint32_t, uint32_t> NListIndexToSymbolIndexMap;
- typedef llvm::DenseMap<const char *, uint32_t> ConstNameToSymbolIndexMap;
+ typedef llvm::StringMap<uint32_t> ConstNameToSymbolIndexMap;
ValueToSymbolIndexMap N_FUN_addr_to_sym_idx;
ValueToSymbolIndexMap N_STSYM_addr_to_sym_idx;
ConstNameToSymbolIndexMap N_GSYM_name_to_sym_idx;
@@ -4210,37 +4210,20 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
uint64_t symbol_value = nlist.n_value;
- if (symbol_name_non_abi_mangled) {
- sym[sym_idx].GetMangled().SetMangledName(
- ConstString(symbol_name_non_abi_mangled));
- sym[sym_idx].GetMangled().SetDemangledName(ConstString(symbol_name));
- } else {
-
- if (symbol_name && symbol_name[0] == '_') {
- symbol_name++; // Skip the leading underscore
- }
-
- if (symbol_name) {
- ConstString const_symbol_name(symbol_name);
- sym[sym_idx].GetMangled().SetValue(const_symbol_name);
- }
- }
-
- if (is_gsym) {
- const char *gsym_name = sym[sym_idx]
- .GetMangled()
- .GetName(Mangled::ePreferMangled)
- .GetCString();
- if (gsym_name)
- N_GSYM_name_to_sym_idx[gsym_name] = sym_idx;
- }
-
if (symbol_section) {
const addr_t section_file_addr = symbol_section->GetFileAddress();
symbol_value -= section_file_addr;
}
+ if (!symbol_name_non_abi_mangled && symbol_name &&
+ symbol_name[0] == '_') {
+ symbol_name++; // Skip the leading underscore
+ }
+
if (!is_debug) {
+ const char *name_to_compare = symbol_name_non_abi_mangled
+ ? symbol_name_non_abi_mangled
+ : symbol_name;
if (type == eSymbolTypeCode) {
// See if we can find a N_FUN entry for any code symbols. If we do
// find a match, and the name matches, then we can merge the two into
@@ -4253,7 +4236,7 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
if (range.first != range.second) {
for (ValueToSymbolIndexMap::const_iterator pos = range.first;
pos != range.second; ++pos) {
- if (sym[sym_idx].GetMangled().GetName(Mangled::ePreferMangled) ==
+ if (llvm::StringRef(name_to_compare) ==
sym[pos->second].GetMangled().GetName(
Mangled::ePreferMangled)) {
m_nlist_idx_to_sym_idx[nlist_idx] = pos->second;
@@ -4288,7 +4271,7 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
if (range.first != range.second) {
for (ValueToSymbolIndexMap::const_iterator pos = range.first;
pos != range.second; ++pos) {
- if (sym[sym_idx].GetMangled().GetName(Mangled::ePreferMangled) ==
+ if (llvm::StringRef(name_to_compare) ==
sym[pos->second].GetMangled().GetName(
Mangled::ePreferMangled)) {
m_nlist_idx_to_sym_idx[nlist_idx] = pos->second;
@@ -4303,13 +4286,9 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
}
} else {
// Combine N_GSYM stab entries with the non stab symbol.
- const char *gsym_name = sym[sym_idx]
- .GetMangled()
- .GetName(Mangled::ePreferMangled)
- .GetCString();
- if (gsym_name) {
+ if (name_to_compare) {
ConstNameToSymbolIndexMap::const_iterator pos =
- N_GSYM_name_to_sym_idx.find(gsym_name);
+ N_GSYM_name_to_sym_idx.find(name_to_compare);
if (pos != N_GSYM_name_to_sym_idx.end()) {
const uint32_t GSYM_sym_idx = pos->second;
m_nlist_idx_to_sym_idx[nlist_idx] = GSYM_sym_idx;
@@ -4331,6 +4310,24 @@ void ObjectFileMachO::ParseSymtab(Symtab &symtab) {
}
}
+ if (symbol_name_non_abi_mangled) {
+ sym[sym_idx].GetMangled().SetMangledName(
+ ConstString(symbol_name_non_abi_mangled));
+ sym[sym_idx].GetMangled().SetDemangledName(ConstString(symbol_name));
+ } else if (symbol_name) {
+ ConstString const_symbol_name(symbol_name);
+ sym[sym_idx].GetMangled().SetValue(const_symbol_name);
+ }
+
+ if (is_gsym) {
+ const char *gsym_name = sym[sym_idx]
+ .GetMangled()
+ .GetName(Mangled::ePreferMangled)
+ .GetCString();
+ if (gsym_name)
+ N_GSYM_name_to_sym_idx[gsym_name] = sym_idx;
+ }
+
sym[sym_idx].SetID(nlist_idx);
sym[sym_idx].SetType(type);
if (set_value) {
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ConstStrings are fast when comparing existing ones. But creating them is expensive because the string pool is global. Here we're parsing strings, so we always have to do a create + lookup. By using a StringMap, we can do a cheaper (non global lookup) and only do the more expensive creation after we've eliminated the duplicates.
LGTM!
This change looks good to me. I might not have read it sufficiently closely, but the duplicate symbols we're looking at here are only when there are debug map entries for .o file debugging, maybe? Could you please update the code that's duplicate circa line 2800 too. Otherwise Future Us will need to puzzle out why the code is different between these two copies. |
This change is motivated by performance. Creating ConstStrings can be expensive and is a non-trivial amount of ObjectFileMachO::ParseSymtab. This patch reduces the number of ConstStrings created.
LLDB relies on duplicate symbol information being merged into one Symbol object. It does this by merging duplicate symbols together by name as they're created. However, before it performs the merge, it will create a ConstString for the symbol name. If this step is delayed until after symbols are merged, the number of ConstStrings created can be reduced considerably.
I measured the performance difference by profiling a Release build of LLDB as it attaches to a Debug build of LLDB. This reduced the total time of ObjectFileMachO::ParseSymtab by 100-150ms.