characterizations = {
"interpreter": [
"2to3",
"aiohttp",
"chameleon",
"chaos",
"comprehensions",
"coroutines",
"coverage",
"crypto_pyaes",
"dask",
"deepcopy",
"deltablue",
"django_template",
"djangocms",
"docutils",
"dulwich_log",
"fannkuch",
"float",
"generators",
"genshi",
"go",
"gunicorn",
"hexiom",
"html5lib",
"logging",
"mako",
"mypy2",
"nbody",
"nqueens",
"pickle_pure_python",
"pprint",
"pycparser",
"pyflate",
"pylint",
"raytrace",
"regex_compile",
"richards",
"richards_super",
"scimark",
"spectral_norm",
"sqlglot",
"sqlglot_optimize",
"sqlglot_parse",
"sqlglot_transpile",
"sympy",
"thrift",
"tomli_loads",
"tornado_http",
"typing_runtime_protocols",
"unpack_sequence",
"unpickle_pure_python",
"xml_etree",
],
"memory": [
"async_generators",
"json_dumps",
"python_startup",
"python_startup_no_site",
"unpickle_list",
],
"gc": [
"async_tree",
"async_tree_cpu_io_mixed",
"async_tree_cpu_io_mixed_tg",
"async_tree_io",
"async_tree_io_tg",
"async_tree_memoization",
"async_tree_memoization_tg",
"async_tree_tg",
"gc_collect",
"gc_traversal",
],
"kernel": ["asyncio_tcp", "concurrent_imap", "pathlib"],
"libc": ["asyncio_tcp_ssl"],
"library": [
"asyncio_websockets",
"json",
"json_loads",
"pickle",
"pickle_dict",
"pickle_list",
"regex_dna",
"regex_effbot",
"regex_v8",
"sqlite_synth",
"telco",
],
"tuple": ["mdp"],
"miscobj": ["meteor_contest"],
"int": ["pidigits"],
"str": ["unpickle"],
}
By "characterization", I mean what category of functions dominate the runtime of each benchmark.
If we organize them by the top category in each benchmark, we get the following:
Benchmark by top profiling category
If you refine this to only include a benchmark in a category if that category represents more than 50% of the runtime:
Benchmarks that are heavily (over 50%) in a particular category
Interestingly, this doesn't seem to reveal too much related to profiling. (Admittedly, the only category where we would expect significant change is "interpreter"). The following results are for JIT (main) vs. Tier 1 (same commit), HPT at the 99th percentile:
Using only benchmarks where 50% of time is in a single category: