Pin is a tool for the instrumentation of programs. It supports Linux, Windows, and Mac OS executables for Intel(R) IA-32, Intel 64, and Itanium(R) processors.
Pin allows a tool to insert arbitrary code (written in C or C++) in arbitrary places in the executable. The code is added dynamically while the executable is running, which also makes it possible to attach Pin to an already running process.
Pin provides a rich API that abstracts away the idiosyncrasies of the underlying instruction set and allows context information, such as register contents, to be passed to the injected code as parameters. Pin automatically saves and restores the registers that are overwritten by the injected code, so the application continues to work. Limited access to symbol and debug information is available as well.
Pin includes the source code for a large number of example instrumentation tools like basic block profilers, cache simulators, instruction trace generators, etc. It is easy to derive new tools using the examples as a template.
The best way to think about Pin is as a "just in time" (JIT) compiler. The input to this compiler is not bytecode, however, but a regular executable. Pin intercepts the execution of the first instruction of the executable and generates ("compiles") new code for the straight-line code sequence starting at this instruction. It then transfers control to the generated sequence. The generated code sequence is almost identical to the original one, but Pin ensures that it regains control when a branch exits the sequence. After regaining control, Pin generates more code for the branch target and continues execution. Pin makes this efficient in two ways: it keeps all of the generated code in memory so that the code can be reused, and it branches directly from one generated sequence to another.
The only code ever executed is the generated code. The original code is only used for reference. When generating code, Pin gives the user an opportunity to inject their own code (instrumentation).
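The "compile once, reuse many times" behaviour can be made concrete with a toy model in standalone C++. This is not Pin's code cache. The names CodeCache, Translation and ExecuteTraces are hypothetical. A trace is identified only by its start address, and "compiling" merely records that a translation now exists.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for one block of generated code.
struct Translation {
    std::uint64_t start;   // original address the trace begins at
    unsigned executions;   // how often the generated code has run
};

// Toy code cache: a trace is "compiled" the first time control reaches its
// start address and is reused from the cache on every later visit.
class CodeCache {
public:
    Translation& Lookup(std::uint64_t start) {
        auto it = cache_.find(start);
        if (it == cache_.end()) {
            ++compilations_;  // only the first visit pays for translation
            it = cache_.emplace(start, Translation{start, 0}).first;
        }
        return it->second;
    }
    unsigned Compilations() const { return compilations_; }

private:
    std::unordered_map<std::uint64_t, Translation> cache_;
    unsigned compilations_ = 0;
};

// Follow a path of trace start addresses (the order in which control leaves
// one trace and enters the next); returns the total number of translations.
unsigned ExecuteTraces(CodeCache& cache, const std::vector<std::uint64_t>& path) {
    for (std::uint64_t start : path) {
        Translation& t = cache.Lookup(start);
        assert(t.start == start);
        ++t.executions;  // run the cached code
    }
    return cache.Compilations();
}
```

Consider a loop that alternates between two traces three times. It causes only two translations, and every later visit is served from the cache.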
Conceptually, instrumentation consists of two components: code that decides where to insert calls and what to pass to them (instrumentation code), and code that is executed at those insertion points (analysis code). Both components live in a single executable, a Pintool. Pintools can be thought of as plugins that can modify the code generation process inside Pin.
The Pintool registers instrumentation callback routines with Pin, which Pin calls whenever new code needs to be generated. These routines represent the instrumentation component. They inspect the code to be generated, investigate its static properties, and decide if and where to inject calls to analysis code. Those calls can target arbitrary functions inside the Pintool. Pin makes sure that register state is saved and restored as necessary, and it allows arguments to be passed to the functions.
Since a Pintool works like a plugin, it must run in the same address space as Pin and the executable to be instrumented. Hence the Pintool has access to all of the executable's data. It also shares file descriptors and other process information with the executable.
Pin and the Pintool control a program starting with the very first instruction. For dynamically linked executables, this means that the execution of the dynamic loader and of all shared libraries is visible to the Pintool.
When writing tools, it is more important to tune the analysis code than the instrumentation code. Instrumentation code runs once, when Pin generates code for an instruction. Analysis code runs every time the instrumented instruction executes.
As described above, Pin's instrumentation is "just in time" (JIT). Instrumentation occurs immediately before a code sequence is executed for the first time. We call this mode of operation trace instrumentation.
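The asymmetry shows up clearly in a small simulation. Everything here is hypothetical: Insn, Stats, Simulate and CountMemoryAccess are not Pin API names. For simplicity, the model instruments every static instruction up front rather than on first execution, as Pin does. The instrumentation step looks only at a static property and decides whether to insert an analysis call. The analysis call then runs on every dynamic execution.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical static view of one instruction.
struct Insn {
    bool accessesMemory;
};

// Signature of the (hypothetical) analysis routines in this model.
using AnalysisFn = void (*)(unsigned long* counter);

// Analysis code: a trivial routine that bumps a counter.
void CountMemoryAccess(unsigned long* counter) { ++*counter; }

struct Stats {
    unsigned long instrumentationCalls = 0;
    unsigned long analysisCalls = 0;
};

// Instrument every static instruction once, then run a dynamic instruction
// stream given as indices into `program`.
Stats Simulate(const std::vector<Insn>& program,
               const std::vector<std::size_t>& executed) {
    Stats stats;
    std::vector<AnalysisFn> inserted(program.size(), nullptr);

    // Instrumentation: runs once per static instruction and inspects only
    // static properties to decide whether to insert an analysis call.
    for (std::size_t i = 0; i < program.size(); ++i) {
        ++stats.instrumentationCalls;
        if (program[i].accessesMemory) inserted[i] = CountMemoryAccess;
    }

    // Execution: an inserted analysis call runs every time its instruction does.
    for (std::size_t i : executed) {
        assert(i < program.size());
        if (inserted[i] != nullptr) inserted[i](&stats.analysisCalls);
    }
    return stats;
}
```

Take a three-instruction loop body with two memory instructions, executed 100 times. The instrumentation step runs 3 times and the analysis routine runs 200 times, which is why the analysis code is the part worth tuning.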
Trace instrumentation lets the Pintool inspect and instrument an executable one trace at a time. Traces usually begin at the target of a taken branch and end with an unconditional branch, including calls and returns. Pin guarantees that a trace is only entered at the top, but it may contain multiple exits. If a branch joins the middle of a trace, Pin constructs a new trace that begins with the branch target. Pin breaks the trace into basic blocks, BBLs. A BBL is a single-entrance, single-exit sequence of instructions. Branches to the middle of a BBL begin a new trace and hence a new BBL. It is often possible to insert a single analysis call for a BBL instead of one analysis call for every instruction. Reducing the number of analysis calls makes instrumentation more efficient. Trace instrumentation uses the TRACE_AddInstrumentFunction API call.
As a convenience for Pintool writers, Pin also offers an instruction instrumentation mode, which lets the tool inspect and instrument an executable a single instruction at a time. This is essentially identical to trace instrumentation, except that the Pintool writer is freed from the responsibility of iterating over the instructions inside a trace. As with trace instrumentation, a branch into the middle of an existing trace starts a new trace. Certain BBLs, and the instructions inside them, may therefore be generated (and hence instrumented) multiple times. Instruction instrumentation uses the INS_AddInstrumentFunction API call.
Sometimes, however, it is useful to look at a granularity other than a trace. For this purpose Pin offers two additional modes: image and routine instrumentation. These modes are implemented by "caching" instrumentation requests and hence incur a space overhead.
Image instrumentation lets the Pintool inspect and instrument an entire image, IMG, when it is first loaded. A Pintool can walk the sections, SEC, of the image, the routines, RTN, of a section, and the instructions, INS, of a routine. Instrumentation can be inserted so that it is executed before or after a routine is executed, or before or after an instruction is executed. Image instrumentation uses the IMG_AddInstrumentFunction API call. Image instrumentation depends on symbol information to determine routine boundaries, so PIN_InitSymbols must be called before PIN_Init.
Routine instrumentation lets the Pintool inspect and instrument an entire routine before the first time it is called. A Pintool can walk the instructions of a routine, but there is not enough information available to break the instructions into BBLs. Instrumentation can be inserted so that it is executed before or after a routine is executed, or before or after an instruction is executed. Routine instrumentation can be more efficient than image instrumentation in space and time when only a small number of the routines in an image are executed. Routine instrumentation uses the RTN_AddInstrumentFunction API call. Instrumentation of routine exits does not work reliably in the presence of tail calls, or when return instructions cannot be detected reliably.
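The BBL splitting rule can be sketched as follows. The sketch assumes that a trace arrives as a flat list of instructions and that every branch, conditional or not, closes a BBL. TraceInsn and SplitIntoBbls are hypothetical names. The sketch ignores the other conditions under which Pin ends a trace or a BBL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical view of an instruction inside a trace.
struct TraceInsn {
    bool isBranch;  // any control transfer: conditional, unconditional, call, return
};

// Split a trace into BBLs. Each BBL has a single exit, so it ends at the first
// branch; the fall-through path of a conditional branch starts the next BBL.
// Returns the number of instructions in each BBL.
std::vector<std::size_t> SplitIntoBbls(const std::vector<TraceInsn>& trace) {
    assert(!trace.empty());
    std::vector<std::size_t> sizes;
    std::size_t current = 0;
    for (const TraceInsn& insn : trace) {
        ++current;
        if (insn.isBranch) {
            sizes.push_back(current);
            current = 0;
        }
    }
    if (current != 0) sizes.push_back(current);  // trace ended without a branch
    return sizes;
}
```

Suppose a trace consists of plain, plain, conditional branch, plain, unconditional branch. It yields two BBLs of 3 and 2 instructions, so a BBL-granularity tool needs two analysis calls instead of five.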
To illustrate how to write Pintools, we present some simple examples. In the web-based version of the manual, you can click on a function in the Pin API to see its documentation.
The example below instruments a program to count the total number of instructions executed. It inserts a call to docount before every instruction. When the program exits, it prints the count to stderr.
Here is how to run it and the output:
$ pin -t inscount0 -- /bin/ls
Makefile          atrace.o     imageload.out  itrace      proccount
Makefile.example  imageload    inscount0      itrace.o    proccount.o
atrace            imageload.o  inscount0.o    itrace.out
Count 422838
$
The example can be found in ManualExamples/inscount0.cpp
In the previous example, we did not pass any arguments to docount, the analysis procedure. In this example, we show how to pass arguments. When calling an analysis procedure, Pin allows you to pass the instruction pointer, current value of registers, effective address of memory operations, constants, etc. For a complete list, see IARG_TYPE.
With a small change, we can turn the instruction counting example into a Pintool that prints the address of every instruction that is executed. This tool is useful for understanding the control flow of a program for debugging, or in processor design when simulating an instruction cache.
We change the arguments to INS_InsertCall to pass the address of the instruction about to be executed. We replace docount with printip, which prints the instruction address. It writes its output to the file itrace.out.
This is how to run it and look at the output:
$ pin -t itrace -- /bin/ls
Makefile          atrace.o     imageload.out  itrace      proccount
Makefile.example  imageload    inscount0      itrace.o    proccount.o
atrace            imageload.o  inscount0.o    itrace.out
$ head itrace.out
0x40001e90
0x40001e91
0x40001ee4
0x40001ee5
0x40001ee7
0x40001ee8
0x40001ee9
0x40001eea
0x40001ef0
0x40001ee0
$
The example can be found in ManualExamples/itrace.cpp
The previous example instruments all instructions. Sometimes a tool may only want to instrument a class of instructions, like memory operations or branch instructions. A tool can do this by using the Pin API, which includes functions that classify and examine instructions. The basic API is common to all instruction sets and is described here. In addition, there is an instruction-set-specific API for IA-32 and IPF.
In this example, we show how to do more selective instrumentation by examining the instructions. This tool generates a trace of all memory addresses referenced by a program. This is also useful for debugging and for simulating a data cache in a processor.
We only instrument instructions that read or write memory. We also use INS_InsertPredicatedCall instead of INS_InsertCall. This avoids generating references for predicated instructions whose predicate is false (predication is only relevant for Itanium).
Since the instrumentation functions are only called once and the analysis functions are called every time an instruction is executed, it is much faster to only instrument the memory operations, as compared to the previous instruction trace example that instruments every instruction.
Here is how to run it and the sample output:
$ pin -t pinatrace -- /bin/ls
Makefile          atrace.o    imageload.o    inscount0.o  itrace.out
Makefile.example  atrace.out  imageload.out  itrace       proccount
atrace            imageload   inscount0      itrace.o     proccount.o
$ head pinatrace.out
0x40001ee0: R 0xbfffe798
0x40001efd: W 0xbfffe7d4
0x40001f09: W 0xbfffe7d8
0x40001f20: W 0xbfffe864
0x40001f20: W 0xbfffe868
0x40001f20: W 0xbfffe86c
0x40001f20: W 0xbfffe870
0x40001f20: W 0xbfffe874
0x40001f20: W 0xbfffe878
0x40001f20: W 0xbfffe87c
$
The example can be found in ManualExamples/pinatrace.cpp
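The record layout of pinatrace.out, and the effect of a predicated call, can be modelled without Pin. FormatRecord and RecordMemRead are hypothetical helpers, and the output format is inferred from the sample above. The if statement in RecordMemRead stands in for the check that INS_InsertPredicatedCall performs on the tool's behalf.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Format one reference in the "ip: R|W address" layout seen in pinatrace.out.
std::string FormatRecord(std::uint64_t ip, bool isWrite, std::uint64_t addr) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "0x%llx: %c 0x%llx",
                  static_cast<unsigned long long>(ip), isWrite ? 'W' : 'R',
                  static_cast<unsigned long long>(addr));
    return std::string(buf);
}

// Analysis routine for a memory read. `predicate` models the instruction's
// predicate: INS_InsertPredicatedCall makes Pin skip the call when it is
// false, so that check is written out explicitly here.
void RecordMemRead(std::string& out, bool predicate, std::uint64_t ip, std::uint64_t addr) {
    if (!predicate) return;
    out += FormatRecord(ip, /*isWrite=*/false, addr);
    out += '\n';
}
```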
The example below prints a message to a trace file every time an image is loaded or unloaded. It really abuses the image instrumentation mode, as the Pintool neither inspects the image nor adds instrumentation code.
If you invoke it on ls, you would see this output:
$ pin -t imageload -- /bin/ls
Makefile          atrace.o    imageload.o    inscount0.o  proccount
Makefile.example  atrace.out  imageload.out  itrace       proccount.o
atrace            imageload   inscount0      itrace.o     trace.out
$ cat imageload.out
Loading /bin/ls
Loading /lib/ld-linux.so.2
Loading /lib/libtermcap.so.2
Loading /lib/i686/libc.so.6
Unloading /bin/ls
Unloading /lib/ld-linux.so.2
Unloading /lib/libtermcap.so.2
Unloading /lib/i686/libc.so.6
$
The example can be found in ManualExamples/imageload.cpp
The example Simple Instruction Count (Instruction Instrumentation) computed the number of executed instructions by inserting a call before every instruction. In this example, we make it more efficient by counting the number of instructions in a BBL at instrumentation time, and incrementing the counter once per BBL, instead of once per instruction.
The example can be found in ManualExamples/inscount1.cpp
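A small model can check the difference between the two counting strategies. The program is abstracted as a list of BBL sizes plus a list of executed BBL indices. CountResult, CountPerInstruction and CountPerBbl are hypothetical names. In the real inscount1 tool, the BBL size is obtained at instrumentation time (for example with BBL_NumIns()) and passed to the analysis routine as a constant argument.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct CountResult {
    std::uint64_t instructions;   // final value of the instruction counter
    std::uint64_t analysisCalls;  // how many times an analysis routine ran
};

// inscount0 style: one docount() call before every executed instruction.
CountResult CountPerInstruction(const std::vector<std::size_t>& bblSizes,
                                const std::vector<std::size_t>& executedBbls) {
    CountResult r{0, 0};
    for (std::size_t b : executedBbls) {
        assert(b < bblSizes.size());
        for (std::size_t i = 0; i < bblSizes[b]; ++i) {
            ++r.instructions;   // docount() adds 1 ...
            ++r.analysisCalls;  // ... at the cost of one call per instruction
        }
    }
    return r;
}

// inscount1 style: the BBL size is known at instrumentation time and passed as
// a constant, so a single call per executed BBL adds the whole size at once.
CountResult CountPerBbl(const std::vector<std::size_t>& bblSizes,
                        const std::vector<std::size_t>& executedBbls) {
    CountResult r{0, 0};
    for (std::size_t b : executedBbls) {
        assert(b < bblSizes.size());
        r.instructions += bblSizes[b];
        ++r.analysisCalls;
    }
    return r;
}
```

Both functions report the same instruction count. Only the number of analysis calls differs: one per instruction versus one per executed BBL.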
The example below instruments a program to count the number of times a procedure is called, and the total number of instructions executed in each procedure. When it finishes, it prints a profile to proccount.out.
Executing the tool and sample output:
$ pin -t proccount -- /bin/grep proccount.cpp Makefile
proccount_SOURCES = proccount.cpp
$ head proccount.out
Procedure                Image      Address     Calls  Instructions
_fini                    libc.so.6  0x40144d00  1      21
__deregister_frame_info  libc.so.6  0x40143f60  2      70
__register_frame_info    libc.so.6  0x40143df0  2      62
fde_merge                libc.so.6  0x40143870  0      8
__init_misc              libc.so.6  0x40115824  1      85
__getclktck              libc.so.6  0x401157f4  0      2
munmap                   libc.so.6  0x40112ca0  1      9
mmap                     libc.so.6  0x40112bb0  1      23
getpagesize              libc.so.6  0x4010f934  2      26
$
The example can be found in ManualExamples/proccount.cpp
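The bookkeeping behind this profile can be sketched in standalone C++. The record layout mirrors the columns of proccount.out. RoutineCount, Profile, AddRoutine, CountCall and CountInstruction are hypothetical names. The real tool does the equivalent work from a routine instrumentation callback registered with RTN_AddInstrumentFunction. Each record is allocated once at instrumentation time, and the analysis routines only receive a pointer to it.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// One row of the profile, mirroring the columns of proccount.out.
struct RoutineCount {
    std::string name;
    std::string image;
    std::uint64_t address;
    std::uint64_t calls;
    std::uint64_t instructions;
};

// Records keyed by routine start address; std::map keeps element addresses
// stable, so pointers handed to analysis routines stay valid.
using Profile = std::map<std::uint64_t, RoutineCount>;

// Instrumentation time: create the record once and return a pointer that
// the inserted analysis calls receive as a constant argument.
RoutineCount* AddRoutine(Profile& profile, const std::string& name,
                         const std::string& image, std::uint64_t address) {
    RoutineCount& rc = profile[address];
    rc = RoutineCount{name, image, address, 0, 0};
    return &rc;
}

// Analysis time: one call at each routine entry, one before each instruction.
void CountCall(RoutineCount* rc) { ++rc->calls; }
void CountInstruction(RoutineCount* rc) { ++rc->instructions; }
```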
It is also possible to use Pin to examine binaries without instrumenting them. This is useful when you need to know static properties of an image. The sample tool below counts the number of instructions in an image, but does not insert any instrumentation.
The example can be found in ManualExamples/staticcount.cpp
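Because nothing is inserted, a static count reduces to walking the image hierarchy. The structs below are hypothetical stand-ins that mirror the IMG, SEC, RTN, INS nesting described under image instrumentation. A real tool would walk Pin's own objects instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical static view of an image: IMG -> SEC -> RTN -> (number of) INS.
struct Routine { std::string name; std::size_t numInstructions; };
struct Section { std::string name; std::vector<Routine> routines; };
struct Image   { std::string name; std::vector<Section> sections; };

// Count instructions by walking the image; nothing is inserted or executed.
std::size_t StaticCount(const Image& img) {
    std::size_t total = 0;
    for (const Section& sec : img.sections)
        for (const Routine& rtn : sec.routines)
            total += rtn.numInstructions;
    return total;
}
```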
A Pintool can make Pin relinquish control of the application at any time by calling PIN_Detach. Control is returned to the original, uninstrumented code and the application runs at native speed. Thereafter, no instrumented code is ever executed.
The example can be found in ManualExamples/detach.cpp
Probe mode is a method of using Pin to insert probes at the start of specified routines. A probe is a jump instruction placed at the start of the specified routine, and it redirects the flow of control to the replacement function. Before the probe is inserted, the first few instructions of the specified routine are relocated. It is not uncommon for the replacement function to call the replaced routine, and Pin provides the relocated address to facilitate this. See the example below.
In probe mode, the application and the replacement routine are run natively. This improves performance, but it puts more responsibility on the tool writer.
The tool writer must guarantee that there is no jump target where the probe is placed. A probe is six bytes long on IA-32 platforms, seven bytes long on Intel 64 platforms, and one bundle long on Itanium platforms.
Also, it is the tool writer's responsibility to ensure that no thread is currently executing the code where a probe is inserted or removed. Tool writers are encouraged to insert probes when an image is loaded to avoid this problem.
When using probes, the "-probe" option must be used on the command line, and Pin must be started with the PIN_StartProgramProbed() API.
The example can be found in ManualExamples/replacesigprobed.cpp
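The byte-size constraints described above can be expressed as a small planning function. This is a hypothetical model, not Pin's relocation logic. It takes the lengths of a routine's first instructions and the probe size (6 bytes on IA-32, 7 on Intel 64; Itanium bundles are not modelled). It counts how many whole instructions must be relocated, and flags the probe as unsafe if some branch targets an offset inside the overwritten bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ProbePlan {
    bool safe;                      // false if a branch targets overwritten bytes
    std::size_t instructionsMoved;  // whole instructions that must be relocated
    std::size_t bytesMoved;         // their total length, at least the probe size
};

// insnLengths: byte lengths of the routine's first instructions, in order.
// probeBytes:  size of the jump written at the routine entry.
// jumpTargets: offsets from the routine entry that other code branches to.
ProbePlan PlanProbe(const std::vector<std::size_t>& insnLengths,
                    std::size_t probeBytes,
                    const std::vector<std::size_t>& jumpTargets) {
    ProbePlan plan{true, 0, 0};
    // An instruction cannot be split, so relocate whole instructions until
    // the probe fits into the bytes they occupied.
    while (plan.bytesMoved < probeBytes) {
        assert(plan.instructionsMoved < insnLengths.size() && "routine too short");
        plan.bytesMoved += insnLengths[plan.instructionsMoved++];
    }
    // Branching to the entry is fine (it hits the probe itself), but a branch
    // into bytes 1..probeBytes-1 would land in the middle of the probe.
    for (std::size_t target : jumpTargets) {
        if (target > 0 && target < probeBytes) plan.safe = false;
    }
    return plan;
}
```

Take instructions of 1, 2, 5 and 3 bytes and a 6-byte IA-32 probe. The first three instructions (8 bytes) must be relocated, and a branch to offset 3 would make the probe unsafe.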
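The examples in the previous section have introduced a number of ways to register callback functions via the Pin API, such as INS_AddInstrumentFunction, TRACE_AddInstrumentFunction, RTN_AddInstrumentFunction, IMG_AddInstrumentFunction, and PIN_AddFiniFunction. Each of them takes a callback function, fun, and an extra VOID * parameter, val.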
The extra parameter val (shared by all the registration functions) will be passed to fun as its second argument whenever it is "called back". This is a standard mechanism used in GUI programming with callbacks.
If this feature is not needed, it is safe to pass 0 for val when registering a callback. The expected use of val is to pass a pointer to an instance of a class. Since val is a generic pointer, fun must cast it back to a pointer to that class before dereferencing it.
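A small stand-in for the registration API shows the pattern of passing an object through val and casting it back. Registry and Report are hypothetical. Only the callback shape follows the description above: a function plus a void* that is handed back as the second argument. In a real Pintool, the registration line would be something like PIN_AddFiniFunction(Report::OnFini, &report).

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Pin registration call: a callback plus an
// opaque pointer that is handed back unchanged as the second argument.
using FiniCallback = void (*)(int code, void* val);

class Registry {
public:
    void Add(FiniCallback fun, void* val) { callbacks_.emplace_back(fun, val); }
    void Fire(int code) {
        for (const auto& cb : callbacks_) cb.first(code, cb.second);
    }
private:
    std::vector<std::pair<FiniCallback, void*>> callbacks_;
};

// A tool object whose state the callback needs.
class Report {
public:
    explicit Report(std::string name) : name_(std::move(name)) {}

    // Static trampoline: val is the Report* passed at registration time,
    // so cast it back before dereferencing.
    static void OnFini(int code, void* val) {
        assert(val != nullptr);
        static_cast<Report*>(val)->Write(code);
    }

    std::string Text() const { return out_.str(); }

private:
    void Write(int code) { out_ << name_ << " exited with " << code << '\n'; }
    std::string name_;
    std::ostringstream out_;
};
```

Because each registration carries its own val, the same static function can serve several objects without global state.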
An application and a tool are invoked as follows:
pin [pin-option]... -t [toolname] [tool-options]... -- [application] [application-option]...
The tool-options follow immediately after the tool specification and depend on the tool used.
Everything following the -- is the command line for the application.
For example, to apply the itrace example (Instruction Address Trace (Instruction Instrumentation)) to a run of the "ls" program:
pin -t itrace -- /bin/ls
To get a listing of the available command line options for Pin:
pin -h
To get a listing of the available command line options for the itrace example:
pin -t itrace -h -- /bin/ls
Note that in the last case /bin/ls is necessary on the command line but will not be executed.
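The division of the command line into groups can be summarised by a simplified splitter. SplitCommandLine is a hypothetical illustration, not Pin's option parser. It receives the arguments that follow "pin" and assumes that no Pin option value is literally -t. The real parser knows which options take values.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

struct PinCommandLine {
    std::vector<std::string> pinOptions;
    std::string tool;
    std::vector<std::string> toolOptions;
    std::vector<std::string> application;  // program path followed by its arguments
};

// Split the arguments that follow "pin" into the groups
// [pin-option]... -t [toolname] [tool-options]... -- [application] [application-option]...
PinCommandLine SplitCommandLine(const std::vector<std::string>& args) {
    PinCommandLine cl;
    std::size_t i = 0;
    while (i < args.size() && args[i] != "-t") cl.pinOptions.push_back(args[i++]);
    if (i + 1 >= args.size()) throw std::invalid_argument("expected -t toolname");
    cl.tool = args[i + 1];
    i += 2;
    while (i < args.size() && args[i] != "--") cl.toolOptions.push_back(args[i++]);
    if (i == args.size()) throw std::invalid_argument("expected -- before the application");
    cl.application.assign(args.begin() + static_cast<std::ptrdiff_t>(i + 1), args.end());
    return cl;
}
```

Consider the arguments -pause_tool 5 -t itrace -h -- /bin/ls -l. The Pin options are -pause_tool 5, the tool is itrace with tool option -h, and the application command is /bin/ls -l.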
The -injection switch is Unix-only and controls the way Pin is injected into the application process. The default, dynamic, is recommended for all users. It uses parent injection unless parent injection is unsupported (Mac OS and Linux 2.4 kernels), in which case it uses child injection. Child injection creates the application process as a child of the pin process, so you will see both a pin process and the application process running. In parent injection, the pin process exits after injecting the application and is less likely to cause a problem. Using parent injection on an unsupported platform may lead to nondeterministic errors.
IMPORTANT: The description about invoking assumes that the application is a program binary (and not a shell script). If your application is invoked indirectly (from a shell script or using 'exec') then you need to change the actual invocation of the program binary by prefixing it with pin/pintool options. Here's one way of doing that:
# Track down the actual application binary, say it is 'application_binary'.
% mv application_binary application_binary.real

# Write a shell script named 'application_binary' with the following contents.
# (change 'itrace' to your desired tool)
#!/bin/sh
pin -t itrace -- application_binary.real "$@"

# Make the wrapper script executable.
% chmod +x application_binary
After you do this, whenever 'application_binary' is invoked indirectly (from some shell script or using 'exec'), the real binary will be invoked with the right pin/pintool options.
Three programs reside in the same address space: the application, the Pin instrumentation engine, and your Pintool. This section describes how to use gdb to find bugs in a Pintool. You cannot run Pin directly from gdb, since Pin uses the debugging API to start the application. Instead, you must invoke Pin from the command line with the -pause_tool switch, and use gdb to attach to the Pin process from another window. The -pause_tool n switch makes Pin print out the process identifier (pid) and pause for n seconds.
If your tool is called opcodemix and the application is /bin/ls, you can use gdb as follows. Start gdb with your tool, but do not use the run command:
$ gdb opcodemix
GNU gdb 5.2.1
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu".
(gdb)
In another window, start your application with the -pause_tool switch:
$ pin -pause_tool 5 -t opcodemix -- /bin/ls
Pausing to attach to pid 28769
Then go back to gdb and attach to the process:
(gdb) attach 28769
Attaching to program: .../build-ia32/SimpleExamples/opcodemix, process 28769
0x011ef361 in ?? ()
(gdb)
Now, instead of using the gdb run command, you use the cont command to continue execution. You can also set breakpoints as normal:
(gdb) break main
Breakpoint 1 at 0x5048d30: file .../PinTools/SimpleExamples/opcodemix.cpp, line 232.
(gdb) cont
Continuing.
Breakpoint 1, main (argc=6, argv=0x4fef534) at .../PinTools/SimpleExamples/opcode.cpp:232
(gdb)
If the program does not exit, then you should detach so gdb will release control:
(gdb) detach
Detaching from program: .../build-ia32/SimpleExamples/opcodemix, process 28769
(gdb)
If you recompile your program and then use the run command, gdb will notice that the binary has been changed and reread the debug information from the file. This does not happen automatically when using attach. You must use the file command to make gdb reread the debug information:
(gdb) file opcodemix
Load new symbol table from "opcodemix"? (y or n) y
Reading symbols from opcodemix...done.
(gdb)
The way a Pintool is written can have a great impact on the performance of the tool, i.e. on how much it slows down the applications it is instrumenting. This section demonstrates some techniques that can be used to improve tool performance. Let's start with an example. The following piece of code is derived from Examples/edgcnt.cpp.
The instrumentation component of the tool is shown below:
VOID Instruction(INS ins, VOID *v)
{
    // ...
    if (INS_IsBranchOrCall(ins))
    {
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount2,
                       IARG_INST_PTR, IARG_BRANCH_TARGET_ADDR,
                       IARG_BRANCH_TAKEN, IARG_END);
    }
    // ...
}
The analysis component looks like this:
VOID docount2(ADDRINT src, ADDRINT dst, INT32 taken)
{
    if (!taken) return;                 // not taken: control flow did not change
    COUNTER *pedg = Lookup(src, dst);   // find or create the counter for this edge
    pedg->_count++;
}
The purpose of the tool is to count how often each control-flow edge in the control-flow graph is traversed. The tool considers both calls and branches, but for brevity our description only mentions branches. The tool works as follows. The instrumentation component instruments each branch with a call to docount2. As parameters, we pass in the origin and the target of the branch and whether the branch was taken or not. Branch origin and target represent the source and destination of the control-flow edge. If a branch is not taken, the control flow does not change, and hence the analysis routine returns right away. If the branch is taken, we use the src and dst parameters to look up the counter associated with this edge, and increment the counter. Lookup will create a new counter if this edge has not been seen before. Note that the tool could have been simplified somewhat by using the IPOINT_TAKEN_BRANCH option with INS_InsertCall().
About every fifth instruction executed in a typical application is a branch. Lookup will be called whenever these instructions are executed, causing significant application slowdown. To improve the situation, we note that the instrumentation code is typically called only once for every instruction, while the analysis code is called every time the instruction is executed. If we can shift computation from the analysis code to the instrumentation code, we will improve the overall performance. Our example tool offers multiple such opportunities, which we will explore in turn. The first observation is that for most branches, namely direct branches and calls, we can find out inside of Instruction() what the branch target will be. For those branches, we can call Lookup inside of Instruction() rather than in docount2(). For indirect branches, which are relatively rare, we still have to use our original approach. All this is reflected in the following code. We add a second, "lighter" analysis function, docount, while the original docount2() remains unchanged:
VOID docount(COUNTER *pedg, INT32 taken)
{
    if (!taken) return;
    pedg->_count++;
}
And the instrumentation will be somewhat more complex:
VOID Instruction(INS ins, VOID *v)
{
    // ...
    if (INS_IsDirectBranchOrCall(ins))
    {
        // The target is known statically: look up the edge counter once, here.
        COUNTER *pedg = Lookup(INS_Address(ins), INS_DirectBranchOrCallTargetAddress(ins));
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount,
                       IARG_PTR, pedg, IARG_BRANCH_TAKEN, IARG_END);
    }
    else if (INS_IsBranchOrCall(ins))
    {
        // Indirect branch or call: the target is only known at run time.
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount2,
                       IARG_INST_PTR, IARG_BRANCH_TARGET_ADDR,
                       IARG_BRANCH_TAKEN, IARG_END);
    }
    // ...
}
The code for docount() is very compact. Besides the obvious performance advantages, this may also cause Pin to inline it, avoiding the overhead of a call. The heuristics for when Pin inlines an analysis routine are subject to change, but small routines without any control flow (a single basic block) are almost guaranteed to be inlined. Unfortunately, docount() does have (albeit limited) control flow. Since the parameter taken is always zero or one, we can eliminate the remaining control flow as follows:
VOID docount(COUNTER *pedg, INT32 taken)
{
    pedg->_count += taken;   // taken is 0 or 1, so no branch is needed
}
There is now no question whether docount() will be inlined or not.
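The two analysis routines can be exercised outside Pin to confirm that they count the same edges. COUNTER, _count and Lookup are the names used above. This Lookup, however, is a hypothetical implementation (a std::map keyed by source and destination address). The calls in the test stand in for Pin invoking the routines.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

struct COUNTER {
    std::uint64_t _count;
};

// Hypothetical Lookup(): one counter per (source, destination) edge,
// created (zero-initialized) the first time the edge is seen.
std::map<std::pair<std::uint64_t, std::uint64_t>, COUNTER> edgeCounters;

COUNTER* Lookup(std::uint64_t src, std::uint64_t dst) {
    return &edgeCounters[std::make_pair(src, dst)];
}

// Original analysis routine: pays for a map lookup on every taken branch.
void docount2(std::uint64_t src, std::uint64_t dst, std::int32_t taken) {
    if (!taken) return;
    COUNTER* pedg = Lookup(src, dst);
    pedg->_count++;
}

// Improved analysis routine for direct branches: the counter was looked up
// at instrumentation time, and adding `taken` (0 or 1) removes the branch.
void docount(COUNTER* pedg, std::int32_t taken) {
    pedg->_count += static_cast<std::uint64_t>(taken);
}
```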
At times we do not care about the exact point where calls to analysis code are inserted, as long as it is within a given basic block. In this case, we can let Pin decide where to insert them. This has the advantage that Pin can select an insertion point that requires minimal register saving and restoring. The following code from ManualExamples/inscount2.cpp shows how this is done for the instruction count example using IPOINT_ANYWHERE with BBL_InsertCall().
Pin improves instrumentation performance by automatically inlining analysis routines that have no control-flow changes. Of course, many analysis routines do have control-flow changes. One particularly common case is an analysis routine with a single "if-then" test: a small amount of analysis code plus the test is always executed, but the "then" part is executed only once in a while. To inline this common case, Pin provides a set of conditional instrumentation APIs. With them, tool writers can rewrite their analysis routines into a form that has no control-flow changes. The following example from ManualExamples/isampling.cpp illustrates how such rewriting can be done:
In the above example, the original analysis routine IpSample() has a conditional control-flow change. It is rewritten into two analysis routines, CountDown() and PrintIp(). CountDown() is the simpler of the two and has no control-flow change. It performs the original conditional test and returns the test result. We use the conditional instrumentation APIs INS_InsertIfCall() and INS_InsertThenCall() to tell Pin about the pairing. The analysis routine specified by INS_InsertThenCall() (PrintIp() in this example) is executed only if the routine specified by the preceding INS_InsertIfCall() (CountDown() in this example) returns non-zero. Now CountDown(), the common case, can be inlined by Pin, and only once in a while does Pin need to execute PrintIp(), the non-inlined case.
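The if/then split can be modelled in standalone C++ to show the control flow that Pin arranges. CountDown and PrintIp keep the names used above. Their bodies, the fixed sampling interval, and the BeforeInstruction and StartSampling helpers are hypothetical simplifications. BeforeInstruction plays the role of the INS_InsertIfCall/INS_InsertThenCall pair.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sampling state: take one sample every `interval` instructions.
static std::int64_t interval = 0;
static std::int64_t icount = 0;
static std::vector<std::uint64_t> samples;

// "If" routine: straight-line code, so it is a good inlining candidate.
// Returns non-zero when the "then" routine should run.
std::uint64_t CountDown() {
    --icount;
    return icount == 0;
}

// "Then" routine: runs rarely; records the instruction address and re-arms
// the counter.
void PrintIp(std::uint64_t ip) {
    samples.push_back(ip);
    icount = interval;
}

// What the INS_InsertIfCall/INS_InsertThenCall pair arranges before each
// instruction: the "then" routine runs only if the "if" routine returned non-zero.
void BeforeInstruction(std::uint64_t ip) {
    if (CountDown() != 0) PrintIp(ip);
}

void StartSampling(std::int64_t n) {
    assert(n > 0);
    interval = icount = n;
    samples.clear();
}
```

With an interval of 3 and instruction addresses 1 through 10, the samples are taken at addresses 3, 6 and 9. CountDown runs ten times; PrintIp runs only three times.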
To install a kit, unpack it and change to its directory:
$ tar zxf /proj/vssad/proj/pin/Kits/pin-2.0-776-ia32.tar.gz
$ cd pin-2.0-776-ia32/
Build and test the examples from the manual:
$ cd ManualExamples/
$ make test
g++ -c -Wall -Werror -Wno-unknown-pragmas -I../Include -DTARGET_IA32 -g1 -o pinatrace.o pinatrace.cpp
g++ -static -Wl,-wrap,mmap,-wrap,__mmap,-wrap,brk,-wrap,__brk,--section-start,.interp=0x05048000 -g1 -o pinatrace pinatrace.o -L../Lib/ -lpin -ldwarf -lelf -lencp68 -ldecp68
../Bin/pin -t pinatrace -- /bin/cp makefile makefile.copy; cmp makefile makefile.copy
g++ -c -Wall -Werror -Wno-unknown-pragmas -I../Include -DTARGET_IA32 -g1 -o inscount0.o inscount0.cpp
g++ -static -Wl,-wrap,mmap,-wrap,__mmap,-wrap,brk,-wrap,__brk,--section-start,.interp=0x05048000 -g1 -o inscount0 inscount0.o -L../Lib/ -lpin -ldwarf -lelf -lencp68 -ldecp68
../Bin/pin -t inscount0 -- /bin/cp makefile makefile.copy; cmp makefile makefile.copy
Count 277395
g++ -c -Wall -Werror -Wno-unknown-pragmas -I../Include -DTARGET_IA32 -g1 -o itrace.o itrace.cpp
g++ -static -Wl,-wrap,mmap,-wrap,__mmap,-wrap,brk,-wrap,__brk,--section-start,.interp=0x05048000 -g1 -o itrace itrace.o -L../Lib/ -lpin -ldwarf -lelf -lencp68 -ldecp68
../Bin/pin -t itrace -- /bin/cp makefile makefile.copy; cmp makefile makefile.copy
g++ -c -Wall -Werror -Wno-unknown-pragmas -I../Include -DTARGET_IA32 -g1 -o proccount.o proccount.cpp
g++ -static -Wl,-wrap,mmap,-wrap,__mmap,-wrap,brk,-wrap,__brk,--section-start,.interp=0x05048000 -g1 -o proccount proccount.o -L../Lib/ -lpin -ldwarf -lelf -lencp68 -ldecp68
../Bin/pin -t proccount -- /bin/cp makefile makefile.copy; cmp makefile makefile.copy
$
Run one of the sample tools from the installed directory:
$ ../Bin/pin -t pinatrace -- /bin/ls
_insprofiler.cpp  atrace.out     inscount0.o      itrace.cpp  proccount
atrace            imageload.cpp  inscount1.cpp    itrace.o    proccount.cpp
atrace.cpp        inscount0      insprofiler.cpp  itrace.out  proccount.o
atrace.o          inscount0.cpp  itrace           makefile    proccount.out
$ head pinatrace.out
0x40001ee0: R 0xbfffe1e8
0x40001efd: W 0xbfffe224
0x40001f09: W 0xbfffe228
0x40001f20: W 0xbfffe2b4
0x40001f20: W 0xbfffe2b8
0x40001f20: W 0xbfffe2bc
0x40001f20: W 0xbfffe2c0
0x40001f20: W 0xbfffe2c4
0x40001f20: W 0xbfffe2c8
0x40001f20: W 0xbfffe2cc
$
To write your own tool, copy one of the example directories and edit the makefile to add your tool.
Each kit contains Pin and libraries for a specific architecture. Make sure the kit you download is for the right architecture. The Pin libraries use C++, and the compiler you use to build the tool must be compatible with the Pin library. This restriction only applies to building tools; you can instrument applications built by any compiler.
See the README file in the kit for specific information about compiler version and other limitations. If your compiler is not compatible with the kit, send mail to Pin.Project@intel.com.
Send bugs and questions to Pin.Project@intel.com. Complete bug reports that are easy to reproduce are fixed faster, so try to provide as much information as possible. Include the kit number, your OS version, and your compiler version. Try to reproduce the problem in a simple example that you can send us.
Generated on Tue Jan 16 00:09:07 2007 for Pin by 1.4.6