equip.rewriter package

Submodules

equip.rewriter.merger

Responsible for merging two bytecodes at the specified places, as well as making sure the resulting bytecode (and code_object) is properly created.

copyright: 2014 by Romain Gaucher (@rgaucher)
license: Apache 2, see LICENSE for more details.

class equip.rewriter.merger.CodeObject(co_origin)[source]

Bases: object

Class responsible for merging two code objects, and generating a new one. This effectively creates the new bytecode that will be executed.

JUMP_OP = [93, 110, 120, 121, 122, 143, 111, 112, 113, 114, 115, 119]
MERGE_BACKLIST = ('co_code', 'co_firstlineno', 'co_name', 'co_filename', 'co_lnotab', 'co_flags', 'co_argcount')

List of fields in the code_object not to merge. We only keep the ones from the original code_object.

add_get_cellvars_freevars(varname)[source]
add_get_constant(const)[source]
add_get_names(name)[source]
add_get_tuple(value, field_name)[source]
add_get_varnames(const)[source]
add_global_name(global_name)[source]

Adds the global_name as a known imported name. The instrument bytecode is modified so that any LOAD_* that refers to this name becomes a LOAD_GLOBAL.

Parameters: global_name – The imported global name.
append(op, arg, bc_index=-1, lineno=-1)[source]
emit(op, oparg, arg=None, lineno=-1)[source]

Writes the bytecode and lnotab.

get_instruction_size(op, arg=None, bc_index=0)[source]
get_op_oparg(op, arg, bc_index=0)[source]

Retrieve the opcode (op) and its argument (oparg) from the supplied opcode and argument.

Parameters:
  • op – The current opcode.
  • arg – The current dereferenced argument.
  • bc_index – The current bytecode index.
insert(index, op, arg, bc_index=-1, lineno=-1)[source]
static is_jump_op(op)[source]
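
A minimal sketch of what such a check could look like on a current CPython, assuming nothing about the actual implementation. Opcode numbers differ between interpreter versions, so instead of a hard-coded list like JUMP_OP this version derives the jump set from the standard dis module. The name JUMP_OPCODES is illustrative and not part of equip.

```python
import dis

# Opcodes with a relative or an absolute jump target, as reported by the
# running interpreter (the numeric values are version-specific).
JUMP_OPCODES = frozenset(dis.hasjrel) | frozenset(dis.hasjabs)


def is_jump_op(op):
    """Return True if the numeric opcode `op` takes a jump target."""
    return op in JUMP_OPCODES


jump_forward = dis.opmap["JUMP_FORWARD"]
nop = dis.opmap["NOP"]
```
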
merge_fields(co_other)[source]

Merges fields from the code_object. The only fields that aren’t merged are those listed in MERGE_BACKLIST.

Parameters: co_other – The other code_object to merge the co_origin with.
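
The merge rule can be sketched as follows. This is an illustration under assumptions, not the library's code: the helper name merge_tuple_fields and the dict it returns are hypothetical, and only co_names and co_filename are exercised. Fields listed in MERGE_BACKLIST keep the original value; other fields get the instrument's entries appended at the end.

```python
MERGE_BACKLIST = ('co_code', 'co_firstlineno', 'co_name', 'co_filename',
                  'co_lnotab', 'co_flags', 'co_argcount')


def merge_tuple_fields(co_origin, co_other, fields):
    """Hypothetical helper: merge the given fields of two code objects."""
    merged = {}
    for field in fields:
        if field in MERGE_BACKLIST:
            # Blacklisted fields always come from the original code object.
            merged[field] = getattr(co_origin, field)
            continue
        values = list(getattr(co_origin, field))
        # Append only entries the original lacks, so that existing indices
        # into the tuple stay valid.
        for item in getattr(co_other, field):
            if item not in values:
                values.append(item)
        merged[field] = tuple(values)
    return merged


co_origin = compile("total = len(items)", "<origin>", "exec")
co_other = compile("print(total)", "<instrument>", "exec")
merged = merge_tuple_fields(co_origin, co_other, ["co_names", "co_filename"])
```
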
prepend(op, arg, bc_index=-1, lineno=-1)[source]
reset_code()[source]
to_code()[source]
class equip.rewriter.merger.Merger[source]

Bases: object

AFTER = 2

Only valid for MethodDeclaration. This specifies that the instrument code should be injected before each return of the method (i.e., before each encountered RETURN_VALUE in the bytecode).

AFTER_IMPORTS = 6

Valid for ModuleDeclaration or MethodDeclaration. This specifies that the instrument code should be injected after the encountered imports.

BEFORE = 1

Only valid for MethodDeclaration. This specifies that the instrument code should be injected before the body.

BEFORE_IMPORTS = 5

Valid for ModuleDeclaration or MethodDeclaration. This specifies that the instrument code should be injected before the encountered imports.

INSTRUCTION = 4

Valid for all Declaration. This specifies that the instrument code should be injected after each instruction.

LINENO = 3

Valid for all Declaration. This specifies that the instrument code should be injected each time the current line number changes.

MODULE_ENTER = 8

Valid for ModuleDeclaration. This specifies that the code should be injected at the beginning of the module.

MODULE_EXIT = 9

Valid for ModuleDeclaration. This specifies that the code should be injected at the end of the module.

RETURN_VALUES = 7

Unused.

UNKNOWN = 0

Error case for the kind of location for the merge.

static already_instrumented(bc_source, bc_input)[source]

Checks if the instrumentation in bc_input is already in bc_source.
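
One plausible way to perform such a check is a contiguous-subsequence search. The sketch below assumes bytecode is modeled as a list of (opcode name, argument) tuples; the real method may compare different data or ignore some fields, and contains_sequence is a hypothetical name.

```python
def contains_sequence(bc_source, bc_input):
    """Return True if `bc_input` occurs as a contiguous run in `bc_source`."""
    size = len(bc_input)
    if size == 0:
        return True
    last_start = len(bc_source) - size
    # When bc_input is longer than bc_source the range is empty.
    return any(bc_source[start:start + size] == bc_input
               for start in range(last_start + 1))


source = [("LOAD_GLOBAL", "log"), ("CALL_FUNCTION", 0), ("POP_TOP", None),
          ("LOAD_CONST", None), ("RETURN_VALUE", None)]
instrument = [("LOAD_GLOBAL", "log"), ("CALL_FUNCTION", 0), ("POP_TOP", None)]
```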

static build_bytecode_offsets(new_co, bytecode)[source]
static get_final_bytecode(bc_source, bc_input, co_source, co_input, location, ins_lineno, ins_offset=-1)[source]

Computes the final sequences of opcodes and keeps old values. It also tracks which sequences come from the instrument code and which from the original code, so we can resolve jumps.

Parameters:
  • bc_source – The bytecode of the original code.
  • bc_input – The instrument bytecode to inject.
  • co_source – The original code object.
  • co_input – The instrument code object.
  • location – The location of the instrumentation. It should be either: BEFORE, AFTER, LINENO, etc.
  • ins_lineno – The line number to inject the instrument at. Only valid when the injection location is LINENO.
  • ins_offset – Not used.
static inline_instrument(dst_bytecode, src_bytecode, original_lineno, instr_counter=-1, template=None, location=0)[source]

Inline the instrument bytecode in place of the current state of dst_bytecode.

Parameters:
  • dst_bytecode – The list that contains the final bytecode.
  • src_bytecode – The bytecode of the instrument.
  • original_lineno – The line number from the original bytecode, so we always map the instrument code line numbers to the code being instrumented.
  • instr_counter – A counter to track the frames of the different instrumentation code being inlined. This is used to resolve jump targets.
  • template – An instrumentation can follow a template; if so, the actual template is supplied here. An example is the AFTER instrumentation, which requires capturing the return value. Defaults to None.
static merge(co_source, co_input, location=0, ins_lineno=-1, ins_offset=-1, ins_import_names=None)[source]

The merger makes sure that the bytecode is properly inserted where it should be, but also that the consts/names/locals/etc. are re-indexed. We will always append at the end of the current tuples.

We first need to compute the new bytecode, resolve the jumps, and only then dump it. If we emitted the bytecode right away, we could not know where an absolute or relative jump would land, since instrument code can be inserted in between.
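
The re-indexing of consts, names and locals can be illustrated with a small sketch. It assumes instructions are modeled as (opcode name, index) pairs and uses a hypothetical add_get helper that returns the index of an item, appending it to the end of the tuple when missing. It is not the library's implementation.

```python
def add_get(values, item):
    """Return (values, index), appending `item` at the end if it is missing."""
    for index, existing in enumerate(values):
        # Compare types as well, so that 1, 1.0 and True stay distinct.
        if type(existing) is type(item) and existing == item:
            return values, index
    return values + (item,), len(values)


origin_consts = (None, "hello")
instrument_consts = ("hello", 42)
# Instrument instructions index into the instrument's own constant table.
instrument_code = [("LOAD_CONST", 1), ("LOAD_CONST", 0)]

merged_consts = origin_consts
remapped = []
for op, arg in instrument_code:
    merged_consts, new_index = add_get(merged_consts, instrument_consts[arg])
    remapped.append((op, new_index))
```

Because entries are only ever appended, indices used by the original bytecode remain valid and only the instrument's operands need rewriting.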

static merge_exit(new_co, bc_source, bc_input, ins_import_names=None)[source]

Special handler for inserting code at the very end of a module.

static resolve_jump_targets(bytecode, new_co)[source]

Resolves targets of jumps. Since we add new bytecode, the addresses of absolute jumps and the offsets of relative jumps can change, and we need to track the changes to find the new targets.

The resolver works in two phases:

  1. Create the list of bytecode indices based on the size of the opcode and its argument.
  2. For each jump opcode, take its argument and resolve it in the same part of the bytecode (e.g., instrument bytecode or original bytecode).
Parameters:
  • bytecode – The structure computed by get_final_bytecode which overlays the final bytecode sequences and its origin.
  • new_co – The currently created CodeObject.
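
The two phases can be sketched on a simplified model. Everything here is an assumption made for illustration: instructions are dicts that remember their origin ("original" or "instrument") and their offset in that origin, only absolute jumps are modeled, and the helper names are hypothetical.

```python
def make_ins(origin, old_offset, op, arg=None, size=1, is_jump=False):
    """Hypothetical record for one instruction of the merged sequence."""
    return {"origin": origin, "old_offset": old_offset, "op": op,
            "arg": arg, "size": size, "is_jump": is_jump}


def resolve_absolute_jumps(instructions):
    # Phase 1: compute the new offset of every instruction from the sizes.
    new_offsets = []
    offset = 0
    for item in instructions:
        new_offsets.append(offset)
        offset += item["size"]
    relocation = {(item["origin"], item["old_offset"]): new_offset
                  for item, new_offset in zip(instructions, new_offsets)}
    # Phase 2: resolve each jump target within the part it came from.
    resolved = []
    for item in instructions:
        arg = item["arg"]
        if item["is_jump"]:
            arg = relocation[(item["origin"], arg)]
        resolved.append((item["op"], arg))
    return resolved


# Instrument code injected before the original body.
merged = [
    make_ins("instrument", 0, "LOAD_GLOBAL", 0, size=3),
    make_ins("instrument", 3, "POP_TOP"),
    make_ins("original", 0, "JUMP_ABSOLUTE", 6, size=3, is_jump=True),
    make_ins("original", 3, "LOAD_CONST", 0, size=3),
    make_ins("original", 6, "RETURN_VALUE"),
]
resolved = resolve_absolute_jumps(merged)
```

The jump that pointed at offset 6 of the original code now points at offset 10, where RETURN_VALUE ends up after the four bytes of instrument code are placed in front.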
equip.rewriter.merger.RETURN_CANARY_NAME = '_______0x42024_retvalue'

This global name is always injected as a new variable in co_varnames, and used to carry the return values. We essentially add:

STORE_FAST '_______0x42024_retvalue'
... instrument code that can use `{return_value}`
LOAD_FAST  '_______0x42024_retvalue'
RETURN_VALUE

as specified by the RETURN_INSTR_TEMPLATE.

equip.rewriter.merger.RETURN_INSTR_TEMPLATE = ((125, '_______0x42024_retvalue'), (-2, None), (124, '_______0x42024_retvalue'))

The template that dictates how return values are being captured.
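
A sketch of how such a template could be expanded around each RETURN_VALUE. It uses symbolic opcode names instead of the numeric values above, and the placeholder marker and function name are hypothetical (the template itself marks the instrument slot with -2).

```python
RETURN_CANARY_NAME = '_______0x42024_retvalue'
INSTRUMENT_SLOT = "<instrument>"  # hypothetical stand-in for the -2 marker

RETURN_TEMPLATE = (
    ("STORE_FAST", RETURN_CANARY_NAME),
    (INSTRUMENT_SLOT, None),
    ("LOAD_FAST", RETURN_CANARY_NAME),
)


def expand_returns(original, instrument):
    """Insert the expanded template before every RETURN_VALUE."""
    result = []
    for op, arg in original:
        if op == "RETURN_VALUE":
            for t_op, t_arg in RETURN_TEMPLATE:
                if t_op == INSTRUMENT_SLOT:
                    result.extend(instrument)
                else:
                    result.append((t_op, t_arg))
        result.append((op, arg))
    return result


original = [("LOAD_CONST", 1), ("RETURN_VALUE", None)]
instrument = [("LOAD_GLOBAL", "log"), ("POP_TOP", None)]
expanded = expand_returns(original, instrument)
```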

equip.rewriter.simple

A simplified interface (yet the main one) to handle the injection of instrumentation code.

copyright: 2014 by Romain Gaucher (@rgaucher)
license: Apache 2, see LICENSE for more details.

class equip.rewriter.simple.SimpleRewriter(decl)[source]

Bases: object

The current main rewriter that works for one Declaration object. Using this rewriter will modify the given declaration object by possibly replacing all of its associated code objects.

KNOWN_FIELDS = ('method_name', 'lineno', 'file_name', 'class_name', 'arg0', 'arg1', 'arg2', 'arg3', 'arg4', 'arg5', 'arg6', 'arg7', 'arg8', 'arg9', 'arg10', 'arg11', 'arg12', 'arg13', 'arg14', 'arguments', 'return_value')

List of the parameters that can be used for formatting the code to inject. The values are:

  • method_name: The name of the method that is being called.

  • lineno: The start line number of the declaration object being instrumented.

  • file_name: The file name of the current module.

  • class_name: The name of the class a method belongs to.

static format_code(decl, python_code, location)[source]

Formats the supplied python_code as a format string, using the values listed in KNOWN_FIELDS.

Parameters:
  • decl – The declaration object (e.g., MethodDeclaration, TypeDeclaration, etc.).
  • python_code – The python code to format.
  • location – The kind of insertion to perform (e.g., Merger.BEFORE).
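
The formatting step relies on Python's standard str.format. The sketch below shows the mechanism with made-up values for four of the KNOWN_FIELDS; how equip actually computes these values is not shown here.

```python
# Illustrative values only; the real ones come from the declaration object.
values = {
    "method_name": "handler",
    "lineno": 12,
    "file_name": "app.py",
    "class_name": "Server",
}

template = "print('enter {class_name}.{method_name} ({file_name}:{lineno})')"
formatted = template.format(**values)

# Literal braces in the injected code must be doubled to survive str.format.
dict_template = "info = {{'line': {lineno}}}"
dict_formatted = dict_template.format(**values)
```
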
static get_code_object(python_code)[source]

Compiles the supplied code and returns the code_object to be merged with the source code_object.

Parameters: python_code – The python code to compile.
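
Turning source text into a code object can be done with the built-in compile. A minimal sketch, assuming "exec" mode and an arbitrary file name (both are assumptions, as is the helper name):

```python
import types


def compile_instrument(python_code, filename="<instrument>"):
    """Hypothetical helper: compile a snippet into a module-level code object."""
    return compile(python_code, filename, "exec")


co_input = compile_instrument("counter = counter + 1")

# The code object can be executed on its own to check that it is valid.
namespace = {"counter": 1}
exec(co_input, namespace)
```
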
static get_formatting_values(decl, location)[source]

Retrieves the dynamic values to be added in the format string. All values are statically computed, but formal parameters (of methods) are passed by name so it is possible to dereference them in the inserted code (same for the return value).

Parameters:
  • decl – The declaration object.
  • location – The kind of insertion to perform (e.g., Merger.BEFORE).
static indent(original_code, indent_level=0)[source]

Lousy helper that indents the supplied python code, so that it will fit under an if statement.
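
A sketch of such a helper built on textwrap.indent from the standard library, together with the kind of wrapping it enables. The four-space unit, the helper names and the guard text are assumptions for illustration.

```python
import textwrap


def indent_code(original_code, indent_level=0, unit="    "):
    """Prefix every non-blank line with `indent_level` indentation units."""
    return textwrap.indent(original_code, unit * indent_level)


def wrap_in_main_guard(python_code):
    """Place the code under an `if __name__ == '__main__':` statement."""
    return "if __name__ == '__main__':\n" + indent_code(python_code, 1)


wrapped = wrap_in_main_guard("x = 1\ny = 2\n")
# The wrapped snippet must still be valid Python.
compile(wrapped, "<wrapped>", "exec")
```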

insert_after(python_code)[source]

Insert code at each RETURN_VALUE opcode. See insert_before.

insert_before(python_code)[source]

Insert code at the beginning of the method’s body.

The submitted code can be formatted using fields declared in KNOWN_FIELDS. Since string.format is used once the values are dumped, the injected code should be properly structured (for example, literal braces must be doubled).

Parameters: python_code – The python code to be formatted, compiled, and inserted at the beginning of the method body.
insert_enter_code(python_code, import_code=None)[source]

Insert generic code at the beginning of the module. The code is wrapped in an if __name__ == '__main__' statement.

Parameters:
  • python_code – The python code to compile and inject.
  • import_code – The import statements, if any, to add before the insertion of python_code. Defaults to None.
insert_enter_exit_code(python_code, import_code=None, location=9)[source]
insert_exit_code(python_code, import_code=None)[source]

Insert generic code at the end of the module. The code is wrapped in an if __name__ == '__main__' statement.

Parameters:
  • python_code – The python code to compile and inject.
  • import_code – The import statements, if any, to add before the insertion of python_code. Defaults to None.
insert_generic(python_code, location=0, ins_lineno=-1, ins_offset=-1, ins_module=False, ins_import=False)[source]

Generic code injection utility. It first formats the supplied python_code, compiles it to get the code_object, and merges this new code_object with the one of the current declaration object (decl). The insertion is done by the Merger.

When the injection is done, this method recursively updates all references to the old code_object in the parents (when a parent changes, it is updated as well and its new code_object is propagated upwards). This process is required because Python’s code objects are nested in their parents’ code objects, and they are all read-only. It breaks any references that were held on previously used code objects (e.g., don’t do this while the instrumented code is running).

Parameters:
  • python_code – The code to be formatted and inserted.
  • location – The kind of insertion to perform.
  • ins_lineno – When an insertion should occur at one given line of code, use this parameter. Defaults to -1.
  • ins_offset – When an insertion should occur at one given bytecode offset, use this parameter. Defaults to -1.
  • ins_module – Specify the code insertion should happen in the module itself and not the current declaration.
  • ins_import – True if the method is called for inserting an import statement.
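
The upward propagation exists because a function's code object lives in its parent's co_consts and cannot be modified in place. The sketch below shows one level of that replacement using code.replace, which is available in Python 3.8 and later. It illustrates the idea only and is not taken from equip; the helper names are hypothetical.

```python
import types


def find_function_code(parent, name):
    """Return the nested code object called `name` from parent.co_consts."""
    for const in parent.co_consts:
        if isinstance(const, types.CodeType) and const.co_name == name:
            return const
    raise LookupError(name)


def replace_nested_code(parent, old, new):
    """Build a new parent code object that references `new` instead of `old`."""
    consts = tuple(new if const is old else const
                   for const in parent.co_consts)
    return parent.replace(co_consts=consts)


module_co = compile("def f():\n    return 1\n", "<demo>", "exec")
other_co = compile("def f():\n    return 2\n", "<demo>", "exec")

old_f = find_function_code(module_co, "f")
new_f = find_function_code(other_co, "f")
patched_module = replace_nested_code(module_co, old_f, new_f)

namespace = {}
exec(patched_module, namespace)
result = namespace["f"]()
```

The original module_co is left untouched, which is why anything still holding the old code object keeps running the uninstrumented version.
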
insert_import(import_code, module_import=True)[source]

Insert an import statement in the current bytecode. The import is added before all other imports.

inspect_all_globals()[source]

Module contents

equip.rewriter

Utilities to merge and rewrite the bytecode.

copyright: 2014 by Romain Gaucher (@rgaucher)
license: Apache 2, see LICENSE for more details.