Mypyc Trace Logging

Overview

Mypyc can optionally include logging in the generated code that produces a sampled trace of various events or operations that happen during execution. For example, you can use this to find code locations where instances of a specific class are constructed, or identify the most commonly called method of a class.

The trace logs are line-based text files that use : as the field separator. They are are easy to analyze using ad-hoc Python scripts, Unix command-line tools or an SQL database, for example.

Collecting a trace log

You will first need to compile the target code using trace logging enabled. When the code is executed, it will write the trace log into mypyc_trace.txt. If you use the mypyc command-line tool to compile your code, define the MYPYC_TRACE_LOG environment variable with value 1:

MYPYC_LOG_TRACE=1 mypyc <...>

If you use mypycify(), pass log_trace=True as an extra argument.

If you are analyzing mypy, there is a helper script misc/log_trace_heck.py in the mypy GitHub repository that can compile mypy with trace log enabled and either perform a self check or type check a code fragment (e.g. -c "import some_module").

Format of the log

Here is an example log output (details may change over time and this may not 100% reflect what you will see):

mypy.semanal.SemanticAnalyzer.flatten_lvalues::primitive_op:list_get_item_unsafe
mypy.binder.ConditionalTypeBinder._get::primitive_op:int_gt
mypy.semanal.SemanticAnalyzer.is_func_scope:7094:call_c:CPyList_GetItemShort
mypy.typetraverser.TypeTraverserVisitor.traverse_type_tuple::primitive_op:var_object_size
mypy.typeops.try_contracting_literals_in_union:1080:call_c:PyIter_Next
mypy.renaming.LimitedVariableRenameVisitor.visit_name_expr:544:call_c:PySequence_Contains
mypy.util.split_module_names:80:call_c:CPyList_GetItemShort
mypy.expandtype.ExpandTypeVisitor.visit_instance::primitive_op:int_eq
mypy.semanal.SemanticAnalyzer.process_type_annotation:3823:call_c:CPyList_GetItemShort

Let's look at a few log items in detail. First, this is a call to a native method:

mypy.types.Type.__init__:251:call:mypy.nodes.Context.__init__

The event was logged in mypy.types.Type._init__ on line 251 (based on the first two fields). The type of the event is call (a native call), and the called function was mypy.nodes.Context.__init__.

This event is a list get item operation:

mypy.semanal.SemanticAnalyzer.is_func_scope:7094:call_c:CPyList_GetItemShort

The event type call_c indicates a call of a primitive C function or a Python C API function. Python C API functions have the prefix Py while mypyc primitive functions have the prefix CPy.

This event indicates reading the object size of a variable-length object:

mypy.typetraverser.TypeTraverserVisitor.traverse_type_tuple::primitive_op:var_object_size

It's a low-level primitive operation (primitive_op) and usually isn't very interesting, since these operations are few fast.

This is another very fast operation (integer equality):

mypy.binder.ConditionalTypeBinder._get::primitive_op:int_gt

Notable C primitives and their semantics

In section we will go through some common C primitives or C API functions that are useful to recognize.

These are optimized primitives specialized based on the static types of expressions:

CPyList_GetItemShort: Get list item (a[i])
CPyList_GetItem: Get list item (another variant)
PyList_New: Create a list
PyList_Append: Append to a list (a.append(x))
PyList_Check: Equivalent to isinstance(x, list)
PyDict_New: Create a dict
PyDict_Contains: Dict contains (x in d)
CPyDict_GetItem: Get dict item (d[x])
CPyDict_GetWithNone: Get dict item (another variant)
CPyDict_SetItem: Set dict item (d[x] = y)
CPyStr_Equal: String equality
CPyTagged_Add: Add two integers
PySequence_Contains: x in <sequence>
PySequence_Tuple: Equivalent to tuple(x)
PyUnicode_Contains: Substring check
PyUnicode_Concat: Concatenate strings
PySet_New: Create a set
PySet_Add: Add an item to a set
PySet_Contains: Set contains (x in s)

These are slower, generic operations that work for arbitrary objects (including with Any types):

PyObject_GetIter: Equivalent to iter(x)
PyIter_Next: Equivalent to next(it)
PyObject_IsTrue: Is an object considered to be true
PyObject_RichCompare: Compare two objects
PyObject_Vectorcall: Call an object (much slower than optimized native calls)
PyObject_IsInstance: Equivalent to isinstance(x, t)
CPyObject_Hash: Equivalent to hash(x)
CPyObject_Size: Equivalent to len(x)
CPyObject_GetAttr3: Look up an attribute of an object the slow way
CPyObject_GetAttr: Another way to look up an attribute
PyObject_Not: Boolean not operation (not x)
CPyGen_SetStopIterationValue: Raise StopIteration with a value

These are internal operations that don't directly map to Python code:

CPy_NoErrOccurred: Check if an exception was raised by an operation or function call

Analyzing data using Unix command line tools

You can analyze data easily using standard Unix command line tools, including cut, sort, shuf, uniq and wc. We also use ripgrep (rg) to search for substrings and regular expressions, but grep also works.

Example: Commonly called primitives

Display the most commonly called C primitives through the call_c event:

$ rg :call_c: mypyc_trace.txt | cut -d':' -f4 | sort | uniq -c | sort -n
...
   4970 PyIter_Next
   4987 PyDict_Contains
   5074 PyList_Append
   7470 CPyStr_Equal
   7623 PyList_New
   7987 PySet_Contains
   8489 CPyList_GetItemShort

We are doing lots of PyIter_Next operations, which are generic and thus somewhat slow. Maybe we can do better? The second example digs deeper into this.

Example: Where does `PyIter_Next` get called?

Display a random sample of 15 events that call PyIter_Next and sort by function:

$ rg PyIter_Next mypyc_trace.txt | shuf | tail -15 | sort
mypy.binder.ConditionalTypeBinder.update_from_options:233:call_c:PyIter_Next
mypy.binder.ConditionalTypeBinder.update_from_options:233:call_c:PyIter_Next
mypy.graph_utils.prepare_sccs:69:call_c:PyIter_Next
mypy.indirection.TypeIndirectionVisitor.find_modules:35:call_c:PyIter_Next
mypy.indirection.TypeIndirectionVisitor.find_modules:35:call_c:PyIter_Next
mypy.semanal_main.semantic_analyze_target:409:call_c:PyIter_Next
mypy.solve.solve_constraints:61:call_c:PyIter_Next
mypy.solve.solve_one:266:call_c:PyIter_Next
mypy.types.CallableType.__init__:1887:call_c:PyIter_Next
mypy.types.CallableType.__init__:1887:call_c:PyIter_Next
mypy.types.CallableType.__init__:1887:call_c:PyIter_Next
mypy.types.flatten_nested_tuples:3752:call_c:PyIter_Next
mypy.type_visitor.TypeQuery.query_types:452:call_c:PyIter_Next
mypy.type_visitor.TypeTranslator.translate_types:312:call_c:PyIter_Next
mypy.type_visitor.TypeTranslator.translate_types:312:call_c:PyIter_Next

Group by function where call happens:

$ rg PyIter_Next mypyc_trace.txt | cut -d':' -f1 | sort | uniq -c | sort -n
...
    108 mypy.type_visitor.TypeQuery.query_types
    112 mypy.fastparse.ASTConverter.translate_stmt_list
    115 mypy.typeops.FreezeTypeVarsVisitor.visit_callable_type
    120 mypy.binder.ConditionalTypeBinder.update_from_options
    190 mypy.messages.MessageBuilder.incompatible_argument
    198 mypy.fastparse.ASTConverter.translate_opt_expr_list
    279 mypy.expandtype.ExpandTypeVisitor.expand_types
    351 mypy.indirection.TypeIndirectionVisitor.find_modules
    396 mypy.type_visitor.TypeTranslator.translate_types
   1252 mypy.types.CallableType.__init__

It looks like CallableType.__init__ does a lot of potentially slow generic iteration. On which source lines does this happen?

$ rg PyIter_Next mypyc_trace.txt | rg CallableType.__init__ | cut -d':' -f2 | sort | uniq -c | sort -n
   1252 1887

Everything happens on line 1887.

Uh oh!

Mypyc Trace Logging

Overview

Collecting a trace log

Format of the log

Notable C primitives and their semantics

Analyzing data using Unix command line tools

Example: Commonly called primitives

Example: Where does PyIter_Next get called?

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally

Example: Where does `PyIter_Next` get called?