Whether you are a Java developer or architect, a CxO or simply the user of a modern smartphone, Java bytecode is in your face, quietly supporting the foundation of the Java Virtual Machine (JVM).
无论是Java开发人员还是架构师,CxO,还是智能手机的用户, 都涉及到了Java字节码(
bytecode
),他们默默地支撑着 JVM。
Directors, executives and non-technical folks can take a breather here: All they need to know is that while their development teams are building and preparing to deploy the next amazing version of their software, Java bytecode is silently pumping through the JVM platform.
董事,高管和非技术人员看到这里可以安下心来: 他们需要知道,虽然开发团队正在构建和准备下一个伟大的版本,但JVM平台此时正平稳地执行着Java字节码。
Put simply, Java bytecode is the intermediate representation of Java code (i.e. class files) and it is executed inside the JVM – so why should you care about it? Well, because you cannot run your entire development ecosystem without Java bytecode telling it all what to do, especially how to treat and define the code that Java developers are writing.
简而言之, 字节码是Java代码编译后的中间表示方式,JVM必须依靠它才能执行。 那为什么我们要掌握它呢? Well,如果没有Java字节码来表述所有的操作,整个Java生态系统就跑不起来,程序员编写的代码也就没有了用武之地。 特别是需要加深基本功的Java开发人员更需要理解字节码。
From a technical point of view, Java bytecode is the instruction set used by the Java Virtual Machine, which interprets it or JIT-compiles it into native code at runtime. Without Java bytecode behind the scenes, the JVM would have nothing to execute: the Java source code that developers write to add new features, fix bugs and produce beautiful apps must first be compiled into bytecode.
从技术的角度看, Java字节码是Java虚拟机使用的代码集,在运行时被JIT编译为本机机器代码。 如果没有Java字节码的支撑,则JVM将无法编译和映射开发人员编写的Java代码, 也就不能添加新功能,修复bug, 也无法构建精美的应用系统。
Many IT professionals might not have had the time to goof around with assembler or machine code, so Java bytecode can seem like an obscure piece of low-level magic. But, as you know, sometimes things go really wrong, and understanding what is happening at the very foundation of the JVM may be what stands between you and solving the problem at hand.
随着软件行业的生态越来越庞大,很多技术专家也没有精力去研究汇编或者机器代码,因此Java字节码就充当了一种底层语言的角色。 但有经验的开发者都知道,业务系统总不可能没有BUG,了解JVM的基础,会在排查问题和分析错误时非常有用。
In this RebelLabs report you will learn how to read and write JVM bytecode directly, so as to better understand how the runtime works, and be able to disassemble key libraries that you depend on.
本文带你学习如何直接读写JVM字节码,以更好地了解运行时的工作机制,并实践如何反编译第三方类库。
In addition to getting the skinny on Java bytecode, we interviewed bytecode specialists Cédric Champeau and Jochen Theodorou working on the Groovy ecosystem at SpringSource, and tech lead Andrey Breslav working on Kotlin, a newcomer to the JVM language party, from JetBrains.
除了简单介绍Java字节码,我们还采访了SpringSource 中 Groovy 生态系统的字节码专家Cédric Champeau and Jochen Theodorou,以及 Kotlin 技术负责人 Andrey Breslav, 他是来自JetBrains的,刚加入JVM语言party。
We will cover the following topics:
- How to obtain the bytecode listings
- How to read the bytecode
- How the language constructs are mirrored by the compiler: local variables, method calls, conditional logic
- Introduction to ASM
- How bytecode works in other JVM languages like Groovy and Kotlin
So, get ready for your journey to the center of the JVM, and don’t forget your compiler ;-)
本文将介绍以下内容:
- 怎样查看字节码清单
- 如何阅读字节码
- 编译器如何映射语言结构: 局部变量,方法调用,条件逻辑
- ASM简介
- 在其他JVM语言(如Groovy和Kotlin)中字节码如何工作
请做好准备, 我们马上进入JVM的核心,不要忘记了编译器哦 ;-)
- [第一部分:Java字节码简介]()
- [第二部分:ASM入门]()
Java bytecode is the form of instructions that the JVM executes. A Java programmer normally does not need to be aware of how Java bytecode works. However, understanding the low-level details of the platform is what makes you a better programmer, after all (and we all want that, right?).
Java字节码是JVM执行的指令格式。 通常,Java程序员不需要知道字节码的工作原理。 但是,了解平台的底层原理和细节是职业进阶的阶梯。 我们都希望成为更好的程序员, 对吧?
Understanding bytecode and what bytecode is likely to be generated by a Java compiler helps Java programmers in the same way that knowledge of assembly helps the C or C++ programmer [http://www.ibm.com/developerworks/ibm/library/it-haggar_bytecode].
了解字节码,以及Java编译器可能会生成什么样的字节码,就如同C和C++程序员需要理解汇编知识一样。 简介文章可以参考: [http://www.ibm.com/developerworks/ibm/library/it-haggar_bytecode]。
Understanding the bytecode, however, is essential to the areas of tooling and program analysis, where the applications can modify the bytecode to adjust the behavior according to the application’s domain. Profilers, mocking frameworks, AOP – to create these tools, developers must understand Java bytecode thoroughly.
而对于工具和程序分析领域来说, 字节码就是至关重要的基础,可以通过修改字节码来调整程序的某些行为。 分析器(Profiler),mocking框架,AOP – 要创建这一类工具,则必须完全了解Java字节码。
Let's start with a very basic example in order to understand how Java bytecode is executed. Consider a trivial expression, 1 + 2, which can be written down in reverse Polish notation as 1 2 +. Why is reverse Polish notation any good here? It is easy to evaluate such an expression by using a stack: push 1, push 2, then let 'add' pop both operands and push their sum.
让我们从一个简单的例子开始,来了解Java字节码是如何执行的。
一个基本的数学表达式 1 + 2
, 可以用反向波兰语符号(reverse Polish notation)将其记为 1 2 +
。 为什么在这里使用反向波兰语符号呢? 因为可以用栈结构很容易计算这样的表达式:
The result, 3, is on the top of the stack after the ‘add’ instruction executes.
结果是 3
, 在执行了加法操作之后, 结果只恰好位于栈顶位置。
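The stack evaluation described above can be sketched in a few lines of Java. This is only an illustration, not code from the article: the class name `RpnDemo` and its `eval` helper are hypothetical, and the sketch supports only integer operands with `+` and `*`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative helper (hypothetical, not part of the article's example app):
// evaluates a space-separated reverse Polish expression of integers.
final class RpnDemo {
    static int eval(String expr) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String token : expr.trim().split("\\s+")) {
            switch (token) {
                case "+": stack.push(stack.pop() + stack.pop()); break;
                case "*": stack.push(stack.pop() * stack.pop()); break;
                default:  stack.push(Integer.parseInt(token)); // an operand
            }
        }
        return stack.pop(); // the result is the value left on top
    }

    public static void main(String[] args) {
        System.out.println(eval("1 2 +")); // prints 3
    }
}
```

Each operator pops its operands and pushes its result, which is the same discipline the JVM's operand stack follows.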
The model of computation of Java bytecode is that of a stack-oriented programming language. Expressed with Java bytecode instructions (iconst_1, iconst_2, iadd), the example above is identical; the only difference is that the opcodes have some specific semantics attached:
Java字节码的计算模型, 是一种面向栈的编程语言模型。 上面的示例, 可以转换为等价的Java字节码指令表示, 唯一的区别是操作码(opcode) 附加了一些特定的语义:
The opcodes iconst_1 and iconst_2 push the constants 1 and 2 onto the stack. The instruction iadd performs the addition on the two integers and leaves the result on top of the stack.
操作码 iconst_1
and iconst_2
将常量 1
和2
压进栈。
iadd
指令则对两个整数执行加法运算,并将结果放到栈顶位置。
As the name implies, Java bytecode consists of one-byte opcodes (an instruction may carry additional operand bytes), hence there are 256 possible opcodes. Slightly fewer instructions exist than the encoding permits: approximately 200 opcodes are in use, and some opcodes are reserved for debugger operation.
有一件有趣的事情, Java bytecode
就如名称所示, 由单个字节(byte)的指令组成,所以最多只能有 256
个操作码。
实际上Java中只有200个左右的操作码, 还有一些操作码则保留了用于调试操作。
Instruction names are composed of a type prefix and the operation name. For instance, the 'i' prefix stands for 'integer', and therefore the iadd instruction indicates that the addition is performed on integers.
操作码, 即 指令
, 由类型前缀
和操作名称
组成。
例如,'i
' 前缀代表 ‘integer
’,所以,'iadd
' 应该很容易理解, 表示的是对整数执行加法运算。
Depending on the nature of the instructions, we can group these into several broader groups:
- Stack manipulation instructions, including interaction with local variables.
- Control flow instructions
- Object manipulation, incl. methods invocation
- Arithmetics and type conversion
根据指令的性质,可以分为四类:
- 栈操作指令,包括与局部变量交互的指令。
- 控制程序流转的指令
- 对象操作,包括方法调用指令
- 算术运算和类型转换
There are also a number of instructions of more specialized tasks such as synchronization and exception throwing.
此外还有一些执行专门任务的指令,比如同步(synchronization), 以及抛出异常相关的指令。
To obtain the instruction listings of a compiled class file, we can apply the javap utility, the standard Java class file disassembler distributed with the JDK.
We will start with a class that will serve as an entry point for our example application, the moving average calculator.
可以使用 javap
工具来获取 class 文件的指令清单, 这个工具是标准Java JDK 的一部分, 专门用于反编译class文件。
让我们从头开始, 先创建一个类,作为应用程序的入口点。
    public class Main {
        public static void main(String[] args) {
            MovingAverage app = new MovingAverage();
        }
    }
After the class file is compiled, to obtain the bytecode listing for the example above one needs to execute the following command: javap -c Main
The result is as follows:
然后编译这个类,生成 .class
文件,
生成class文件之后, 执行命令获取上述对应的字节码清单:
javap -c Main
结果如下:
    Compiled from "Main.java"
    public class algo.Main {
      public algo.Main();
        Code:
           0: aload_0
           1: invokespecial #1    // Method java/lang/Object."<init>":()V
           4: return

      public static void main(java.lang.String[]);
        Code:
           0: new           #2    // class algo/MovingAverage
           3: dup
           4: invokespecial #3    // Method algo/MovingAverage."<init>":()V
           7: astore_1
           8: return
    }
As you can see, there is a default constructor and a main method. You probably always knew that if you don't specify any constructor for a class there is still a default one, but maybe you didn't realize where it actually is. Well, here it is! We just proved that the default constructor actually exists in the compiled class, so it is the Java compiler that generates it.
The body of the constructor should be empty, but a few instructions are still generated. Why is that? Every constructor makes a call to super(), right? It doesn't happen automagically, and this is why some bytecode instructions are generated into the default constructor. Basically, this is the super() call.
The main method creates an instance of the MovingAverage class and returns. We will review the class instantiation code in chapter 6.
You might have noticed that some of the instructions refer to numbered parameters: #1, #2, #3. These are references to the constant pool. How can we find out what the constants are, and how can we see the constant pool in the listing? We can apply the -verbose argument to javap when disassembling the class:
可以看到,反编译后的代码清单中, 有一个默认的构造函数, 以及 main 方法。 刚学Java时我们就知道, 如果不定义任何构造函数,那么仍然会有一个默认的构造函数,这里再次验证了这个知识点。 好吧,这比较容易理解!我们证实了编译后的class文件中存在默认构造函数,所以这是Java编译器生成的, 而不是运行时由JVM字段生成。
构造函数应该是空的方法体,但这里看到里面依然有一些指令。这是为什么呢?
再次回顾Java知识, 每个构造函数都会调用 super()
对吧? 这不会自动执行, 而是由指令控制的,这就是为什么默认构造函数中会有字节码指令的原因。
基本上,这几条指令就是执行 super()
调用;
main 方法创建了 MovingAverage
类的一个实例, 然后就return了。
有些同学应该注意到了, 某些指令后面使用了 #1, #2, #3
这样的编号引用。
这就是对常量池的引用。 那常量池里面有些什么呢? 怎么在代码清单中怎样查看常量池呢?
我们可以在反编译 class 时,指定 -verbose
参数:
$ javap -c -verbose Main
Here’s some interesting parts that it prints:
结果如下所示:
Classfile /Users/anton/work-src/demox/out/production/demox/algo/Main.class
Last modified Nov 20, 2012; size 446 bytes
MD5 checksum ae15693cf1a16a702075e468b8aaba74
Compiled from "Main.java"
public class algo.Main
SourceFile: "Main.java"
minor version: 0
major version: 51
flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
#1 = Methodref #5.#21 // java/lang/Object."<init>":()V
#2 = Class #22 // algo/MovingAverage
#3 = Methodref #2.#21 // algo/MovingAverage."<init>":()V
#4 = Class #23 // algo/Main
#5 = Class #24 // java/lang/Object
There's a bunch of technical information about the class file: when it was compiled, the MD5 checksum, which *.java file it was compiled from, which Java version it conforms to (major version 51 corresponds to Java 7), etc.
We can also see the access flags there: ACC_PUBLIC and ACC_SUPER. The ACC_PUBLIC flag is intuitive: our class is public, hence there is a flag saying so. But what is ACC_SUPER for? ACC_SUPER was introduced to correct a problem with the invocation of super methods with the invokespecial instruction. You can think of it as a bugfix to Java 1.0 so that it could discover superclass methods correctly. Starting from Java 1.1 the compiler always generates the ACC_SUPER access flag in the bytecode.
You can also find the denoted constant definitions in the constant pool:
其中显示了很多关于class文件信息: 编译时间, MD5校验和, 从哪个*.java
源文件编译得来,符合哪个版本的Java语言规范等等。
我们还可以看到 ACC_PUBLIC
和 ACC_SUPER
访问标志符。
ACC_PUBLIC
标志很容易理解:这个类是public类,因此这个标志来说明。
但 ACC_SUPER
标志是怎么回事呢? 这就是历史原因了, 引入 ACC_SUPER
的目的是为了修正 invokespecial
指令调用 super 类方法的问题。
这算是 Java 1.0版本的BUG修正, 以便可以正确查找到超类方法。 从 Java 1.1 开始, 编译器都会在字节码中强制生成ACC_SUPER
访问器标志。
终于看到常量池中的常量定义了:
#1 = Methodref #5.#21 // java/lang/Object."<init>":()V
The constant definitions are composable, meaning a constant might be composed from other constants referenced from the same table.
A few other things reveal themselves when using the -verbose argument with javap. For instance, more information is printed about the methods:
常量定义支持组合, 也就是说一个常量的定义中可以引用其他常量。
在 javap 命令中使用 -verbose
选项时, 还显示了其他的一些信息。 例如, main 方法的更多信息被打印出来:
public static void main(java.lang.String[]);
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=2, locals=2, args_size=1
The accessor flags are also generated for methods, but we can also see how deep a stack is required for execution of the method, how many parameters it takes in, and how many local variable slots need to be reserved in the local variables table.
方法的访问标志也生成了, 而且可以看到执行该方法时需要的stack深度是多少,需要在局部变量表中保留多少个槽位, 还有方法的参数个数。
To understand the details of the bytecode, we need to have an idea of the model of execution of the bytecode. A JVM is a stack-based machine. Each thread has a JVM stack which stores frames. Every time a method is invoked a frame is created. A frame consists of an operand stack, an array of local variables, and a reference to the runtime constant pool of the class of the current method. We have seen all this in our initial example, the disassembled Main class.
要了解字节码的细节,我们需要对字节码的执行模型有所了解。 JVM是一款基于栈的计算机。
每个线程都有一个独属于自己的线程栈(JVM stack),用于存储 栈帧(Frame)。
每调用一个方法,就会创建一个栈帧。
栈帧
由 操作数栈
, 局部变量数组
以及一个class引用
组成(指向运行时常量池中当前方法对应的class)。
我们在前面反编译的那个示例中已经看到这些内容。
The array of local variables, also called the local variable table, contains the parameters of the method and is also used to hold the values of the local variables. The size of the array of local variables is determined at compile time and is dependent on the number and size of local variables and formal method parameters.
局部变量数组
也称为局部变量表
, 包含了方法的参数,并用来保存局部变量的值。 局部变量数组的大小在编译时就会确定: 和局部变量/形参的个数,以及每个变量占用的空间大小有关。
The operand stack is a LIFO stack used to push and pop values. Its size is also determined at compile time. Certain opcode instructions push values onto the operand stack;
others take operands from the stack, manipulate them, and push the result. The operand stack is also used to receive return values from methods.
操作数栈是一个LIFO结构的栈, 用于压入和弹出值。 它的大小也在编译时确定。 有一些操作码指令用于将值压入“操作数栈”; 还有一些操作码指令则是从栈中获取操作数,并进行处理后将结果压入stack。 操作数栈还用于接收调用其他方法的返回值。
In the debugger, we can drop frames one by one, however the state of the fields will not be rolled back.
在调试器中,我们可以一帧一帧地删除 Frame, 但是字段的状态并不能回滚。
When looking at the bytecode listing of the Main example, you might start to wonder: what are those numbers in front of every instruction? And why are the intervals between the numbers not equal?
从前面的示例中,细心的同学可能会猜测字节码列表前面的数字是什么意思,但他们之间的间隔又不相等。
    0: new           #2    // class algo/MovingAverage
    3: dup
    4: invokespecial #3    // Method algo/MovingAverage."<init>":()V
    7: astore_1
    8: return
The reason: some of the opcodes have operands that take up space in the bytecode array. For instance, new occupies three slots in the array: one for the opcode itself and two for its operand, the constant pool index. Therefore, the next instruction, dup, is located at index 3.
原因分析: 一部分操作码的参数也会占用字节码数组中的空间。 例如, new
就会占用三个槽位: 一个用于表示操作码本身,两个用于存放输入参数。
因此,下一条指令 dup
的索引是 3
。
Here’s what it looks like if we visualize the method body as an array:
如果将方法体变为可视化数组,则看起来如下所示:
Every instruction has its own HEX representation, and if we use that we'll get the HEX string that represents the method body: BB 00 02 59 B7 00 03 4C B1.
每条指令都有自己的十六进制(HEX)表示形式, 如果使用十六进制编辑器,将获得表示方法体的HEX字符串:
By opening the class file in HEX editor we can find this string:
通过在HEX编辑器中打开类文件,我们可以找到以下字符串:
It is even possible to change the bytecode via HEX editor even though it is a bit fragile to do so. Besides there are some better ways of doing this, like using bytecode manipulation tools such as ASM or Javassist.
甚至可以通过HEX编辑器更改字节码,尽管这样做容易出错。此外,还有一些更好的方法可以做到这一点,例如使用字节码操作工具(例如ASM或Javassist)。
Not much to do with this knowledge at the moment, but now you know where these numbers come from.
目前与这些知识关系不大,但是现在您知道这些数字从何而来。
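To make the offsets concrete, here is a minimal sketch that walks the bytes of the main method body and prints each instruction with its index. The byte values are derived from the opcode table in the JVM specification (new = 0xBB, dup = 0x59, invokespecial = 0xB7, astore_1 = 0x4C, return = 0xB1). The class `OffsetDemo` is hypothetical and understands only these five opcodes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini-disassembler for the five opcodes used in Main.main.
final class OffsetDemo {
    static final int ASTORE_1 = 0x4C, DUP = 0x59, RETURN = 0xB1,
                     INVOKESPECIAL = 0xB7, NEW = 0xBB;

    static List<String> decode(int[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int op = code[pc];
            if (op == NEW || op == INVOKESPECIAL) {
                // Two operand bytes form a big-endian constant pool index.
                int index = (code[pc + 1] << 8) | code[pc + 2];
                out.add(pc + ": " + (op == NEW ? "new" : "invokespecial") + " #" + index);
                pc += 3;
            } else {
                String name = op == DUP ? "dup"
                            : op == ASTORE_1 ? "astore_1"
                            : op == RETURN ? "return" : "?";
                out.add(pc + ": " + name);
                pc += 1; // no operands
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] body = {0xBB, 0x00, 0x02, 0x59, 0xB7, 0x00, 0x03, 0x4C, 0xB1};
        decode(body).forEach(System.out::println);
    }
}
```

The printed indices (0, 3, 4, 7, 8) match the numbers in the javap listing because new and invokespecial each consume two extra bytes.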
There are a number of instructions that manipulate the stack in one way or another. We have already mentioned some basic instructions that work with the stack: they push values to the stack or take values from it. But there's more; the swap instruction, for example, can swap two values on the top of the stack.
Here are some example instructions that juggle the values around the stack. Some basic instructions first: dup and pop. The dup instruction duplicates the value on top of the stack. The pop instruction removes the top value from the stack.
There are some more complex instructions: swap, dup_x1 and dup2_x1, for instance. The swap instruction, as the name implies, swaps the two values on the top of the stack, e.g. A and B exchange positions; dup_x1 inserts a copy of the top value two values down from the top; dup2_x1 duplicates the two top values and inserts the copies beneath the third.
有很多指令可以操纵方法栈。 前面提到了一些基本的栈操作指令: 将值压入栈,或从栈中获取值。 但除了这些基础操作之外还有更多指令; 比如 swap
指令可以交换栈顶两个元素的值。
下面是一些示例指令,这些指令操作栈里的值。
首先,最基础的:dup
和 pop
。 dup
指令复制栈顶元素的值。 pop
指令则从栈中删除最顶部的值。
还有一些复杂的指令:例如,swap
, dup_x1
和 dup2_x1
。
顾名思义,swap
指令可交换栈顶两个元素的值,例如A和B交换位置(下图中的示例4);
dup_x1
将复制栈顶元素的值,并在栈顶插入两次(下图中的示例5);
dup2_x1
则复制栈顶两个元素的值,并插入第三个值(下图中的示例6)。
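The effect of these instructions can be modelled with an ordinary Java deque. The sketch below is a toy model under the assumption that every value occupies a single stack slot; `StackOpsDemo` and its helper names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of the operand stack; every value here occupies one slot.
final class StackOpsDemo {
    static Deque<String> stackOf(String... bottomToTop) {
        Deque<String> s = new ArrayDeque<>();
        for (String v : bottomToTop) s.push(v);
        return s;
    }

    // Snapshot of the stack with the top element first.
    static List<String> topFirst(Deque<String> s) { return new ArrayList<>(s); }

    static void dup(Deque<String> s) { s.push(s.peek()); }
    static void pop(Deque<String> s) { s.pop(); }

    static void swap(Deque<String> s) {    // ..., below, top -> ..., top, below
        String top = s.pop(), below = s.pop();
        s.push(top); s.push(below);
    }

    static void dupX1(Deque<String> s) {   // ..., below, top -> ..., top, below, top
        String top = s.pop(), below = s.pop();
        s.push(top); s.push(below); s.push(top);
    }

    static void dup2X1(Deque<String> s) {  // ..., v3, v2, v1 -> ..., v2, v1, v3, v2, v1
        String v1 = s.pop(), v2 = s.pop(), v3 = s.pop();
        s.push(v2); s.push(v1); s.push(v3); s.push(v2); s.push(v1);
    }

    public static void main(String[] args) {
        Deque<String> s = stackOf("B", "A"); // A is on top
        dupX1(s);
        System.out.println(topFirst(s)); // prints [A, B, A]
    }
}
```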
The dup_x1 and dup2_x1 instructions seem a bit esoteric: why would anyone need to duplicate top values beneath existing values in the stack? Here's a more practical example: how do you swap two values of type double? The caveat is that a double takes two slots in the stack, which means that two double values on the stack occupy four slots. To swap the two double values we would like to use the swap instruction, but the problem is that it works only with one-word values, meaning it will not work with doubles, and a swap2 instruction does not exist. The workaround is to use the dup2_x2 instruction to duplicate the top double value below the bottom one, and then pop the top value using the pop2 instruction. As a result, the two doubles are swapped.
dup_x1
和 dup2_x1
指令似乎有点深奥。
为什么需要设置这种指令呢? 在栈中复制最顶部的值?
请看一个实际的示例:如何交换2个double类型的值?
需要注意的是, 一个double值占两个槽位,也就是说如果栈中有两个double值,它们将占用四个槽位。
要交换两个double值,你可能想到了 swap
指令,但问题是 swap 只适用于单字指令(one-word instructions),所以它不能处理double类型, 而又没有 swap2 指令。
怎么办呢? 解决方法是使用 dup2_x2
指令, 将操作数栈顶部的double值, 复制到底部的double值下方, 然后再使用 pop2
指令弹出栈顶的double值。结果就是交换了两个 double 值。 如下图所示:
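The dup2_x2 plus pop2 trick can be traced slot by slot. In this sketch each double is modelled as two named slots (for example `x0`, `x1`); the class `DoubleSwapDemo` is hypothetical and models only the form of dup2_x2 that applies here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Slot-level toy model: a double occupies two consecutive stack slots.
final class DoubleSwapDemo {
    // ..., v4, v3, v2, v1 -> ..., v2, v1, v4, v3, v2, v1
    static void dup2X2(Deque<String> s) {
        String v1 = s.pop(), v2 = s.pop(), v3 = s.pop(), v4 = s.pop();
        s.push(v2); s.push(v1); s.push(v4); s.push(v3); s.push(v2); s.push(v1);
    }

    // Removes the two top slots, i.e. one double.
    static void pop2(Deque<String> s) { s.pop(); s.pop(); }

    // Pushes double x (slots x0, x1), then double y (slots y0, y1), swaps
    // them, and returns the resulting slots from bottom to top.
    static List<String> swapDoubles() {
        Deque<String> s = new ArrayDeque<>();
        s.push("x0"); s.push("x1"); s.push("y0"); s.push("y1");
        dup2X2(s);
        pop2(s);
        List<String> bottomToTop = new ArrayList<>();
        s.descendingIterator().forEachRemaining(bottomToTop::add);
        return bottomToTop;
    }

    public static void main(String[] args) {
        System.out.println(swapDoubles()); // prints [y0, y1, x0, x1]
    }
}
```

After dup2_x2 the stack holds y, x, y from bottom to top; pop2 drops the extra copy of y, leaving the two doubles in swapped order.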
While the stack is used for execution, local variables are used to save the intermediate results and are in direct interaction with the stack.
Let’s now add some more code into our initial example:
栈主要用于执行指令,而局部变量则用来保存中间结果,可以和栈直接交互。
接着我们增加更多代码:
    public static void main(String[] args) {
        MovingAverage ma = new MovingAverage();
        int num1 = 1;
        int num2 = 2;
        ma.submit(num1);
        ma.submit(num2);
        double avg = ma.getAvg();
    }
We submit two numbers to the MovingAverage class and ask it to calculate the average of the current values. The bytecode obtained from this code is as follows:
我们向 MovingAverage
类的实例提交了两个数值, 并要求其计算当前的平均值。 main 方法对应的字节码如下:
    Code:
       0: new           #2    // class algo/MovingAverage
       3: dup
       4: invokespecial #3    // Method algo/MovingAverage."<init>":()V
       7: astore_1
       8: iconst_1
       9: istore_2
      10: iconst_2
      11: istore_3
      12: aload_1
      13: iload_2
      14: i2d
      15: invokevirtual #4    // Method algo/MovingAverage.submit:(D)V
      18: aload_1
      19: iload_3
      20: i2d
      21: invokevirtual #4    // Method algo/MovingAverage.submit:(D)V
      24: aload_1
      25: invokevirtual #5    // Method algo/MovingAverage.getAvg:()D
      28: dstore        4
      30: return
    LocalVariableTable:
      Start  Length  Slot  Name   Signature
          0      31     0  args   [Ljava/lang/String;
          8      23     1  ma     Lalgo/MovingAverage;
         10      21     2  num1   I
         12      19     3  num2   I
         30       1     4  avg    D
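Note: javac does not emit the LocalVariableTable by default. Compile with javac -g (IDEs such as IntelliJ IDEA typically do this for you) and run javap with -l or -verbose to see it.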
After creating the MovingAverage instance, the code stores the reference in the local variable ma with the astore_1 instruction: 1 is the slot number of ma in the LocalVariableTable.
Next, the instructions iconst_1 and iconst_2 load the constants 1 and 2 onto the stack, and the instructions istore_2 and istore_3 store them in LocalVariableTable slots 2 and 3 respectively.
Note that a store-like instruction actually removes the value from the top of the stack. This is why, in order to use the variable value again, we have to load it back onto the stack. For instance, in the listing above, before calling the submit method, we have to load the value of the parameter onto the stack again:
创建了 MovingAverage
类的局部变量后, 使用 astore_1
指令将引用地址值(addr.)存储(store)到编号为1的局部变量中(此处为 ma
):
astore_1
中的 1
指代 LocalVariableTable 中ma对应的槽位编号。
后面的指令 iconst_1
和 iconst_2
用来将常量值1
和2
加载到栈里面, 并分别由指令 istore_2
和 istore_3
将它们存储到在 LocalVariableTable 的槽位2和槽位3中。
请注意,store之类的指令调用实际上从栈顶删除了该值。 这就是为什么再次使用相同的值时,必须再加载一次的原因。
例如在上面的清单中,调用 submit
方法之前, 必须再次将参数值加载到栈中:
12: aload_1
13: iload_2
14: i2d
15: invokevirtual #4 // Method algo/MovingAverage.submit:(D)V
After the call to the getAvg() method, the result sits on top of the stack. To store it in a local variable, the dstore instruction is used, since the target variable is of type double.
调用 getAvg()
方法后,返回的结果位于栈顶,然后使用 dstore
将 double
值保存到本地变量4号槽位,这里的d表示目标变量的类型为double。
24: aload_1
25: invokevirtual #5 // Method algo/MovingAverage.getAvg:()D
28: dstore 4
One more interesting thing to notice about the LocalVariableTable is that the first slots are occupied by the parameter(s) of the method. In our current example main is a static method, so there is no this reference assigned to slot 0 in the table; slot 0 holds args instead. For non-static methods, however, this will be assigned to slot 0.
关于 LocalVariableTable
有个有意思的事情, 最前面的槽位会被方法参数占用。
在这个示例中,因为是静态方法,所以槽位0中并没有设置为 this
引用的地址。 但是对于非静态方法来说, this
会将分配到第0号槽位中。
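Readers with reflection experience may find this familiar from Method#invoke(Object obj, Object... args), where the receiver is passed ahead of the arguments; JavaScript programmers can compare it with fn.apply(obj, args) and fn.call(obj, arg1, arg2).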
The takeaway from this part is that whenever you want to assign something to a local variable, you store it by using the respective instruction, e.g. astore_1. A store instruction always removes the value from the top of the stack. The corresponding load instruction pushes the value from the local variable table onto the stack; the value is not removed from the local variable.
这部分的要点是,给局部变量赋值时,需要使用相应的指令来进行 store
,如 astore_1
。
store
一类的指令都会从栈顶删除该值。
相应的 load
指令会则会将值从局部变量表压入操作数栈,但并不会删除局部变量中的值。
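The interplay between the operand stack and the local variable table can be sketched with a tiny frame model. `FrameDemo` is a hypothetical illustration that supports only int values and four local slots; it is not how a real JVM is implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal model of a frame: an operand stack plus a local variable array.
final class FrameDemo {
    final int[] locals = new int[4];
    final Deque<Integer> stack = new ArrayDeque<>();

    void iconst(int value) { stack.push(value); }              // push a constant
    void istore(int slot)  { locals[slot] = stack.pop(); }     // pops the stack
    void iload(int slot)   { stack.push(locals[slot]); }       // local stays intact
    void iadd()            { stack.push(stack.pop() + stack.pop()); }

    public static void main(String[] args) {
        FrameDemo f = new FrameDemo();
        f.iconst(1);
        f.istore(2);                              // stack is empty again
        System.out.println(f.stack.isEmpty());    // prints true
        f.iload(2);
        f.iload(2);                               // loading twice is fine
        f.iadd();
        System.out.println(f.stack.peek());       // prints 2
        System.out.println(f.locals[2]);          // prints 1
    }
}
```

The store empties the stack while the loads leave slot 2 untouched, which is why the compiler emits a fresh load every time a variable is used.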
The flow control instructions are used to organize the flow of execution depending on conditions. If-Then-Else, the ternary operator, various kinds of loops and even exception handling opcodes belong to the control flow group of Java bytecode. This is all about jumps and gotos now :)
流程控制指令, 用于根据判断条件来控制程序的执行流程。
一般是 If-Then-Else
这种三元运算符(ternary operator),
Java中的各种循环,甚至异常处的理操作码都可归属于 程序流程控制字节码。 这就是现在关于跳转(jump)和goto的全部 :)
We will now change our example so that it will handle an arbitrary number of numbers that can be submitted to the MovingAverage class:
然后,我们修改示例代码,可以提交给 MovingAverage 类任意数量的数值:
    MovingAverage ma = new MovingAverage();
    for (int number : numbers) {
        ma.submit(number);
    }
Assume that the numbers variable is a static field in the same class. The bytecode that corresponds to the loop that iterates over the numbers is as follows
如果 numbers 是本类中的 static 属性, 则循环对应的字节码如下所示。
0: new #2 // class algo/MovingAverage
3: dup
4: invokespecial #3 // Method algo/MovingAverage."<init>":()V
7: astore_1
8: getstatic #4 // Field numbers:[I
11: astore_2
12: aload_2
13: arraylength
14: istore_3
15: iconst_0
16: istore 4
18: iload 4
20: iload_3
21: if_icmpge 43
24: aload_2
25: iload 4
27: iaload
28: istore 5
30: aload_1
31: iload 5
33: i2d
34: invokevirtual #5 // Method algo/MovingAverage.submit:(D)V
37: iinc 4, 1
40: goto 18
43: return
LocalVariableTable:
Start Length Slot Name Signature
30 7 5 number I
12 31 2 arr$ [I
15 28 3 len$ I
18 25 4 i$ I
0 49 0 args [Ljava/lang/String;
8 41 1 ma Lalgo/MovingAverage;
48 1 2 avg D
The instructions at positions 8 through 16 set up the loop control. You can see that there are three variables in the LocalVariableTable that aren't mentioned in the source code: arr$, len$ and i$. Those are the loop variables. The variable arr$ stores the reference value of the numbers field, from which the length of the loop, len$, is derived using the arraylength instruction. The loop counter, i$, is incremented after each iteration using the iinc instruction.
The first instructions of the loop body compare the loop counter to the array length:
位置 [8~16] 的指令用于循环控制。
可以看到, 在LocalVariableTable 中有三个在源码中没有真出现的变量: arr$
, len$
, i$
, 这就是循环变量。
arr$
变量保存了 numbers 的引用值,
len$
由 arraylength
指令使用, 得出循环的长度。
i$
则是循环计数器, 每次迭代后使用 iinc
指令来递增。
循环体的第一条指令用于执行 循环计数器与数组长度 的比较:
18: iload 4
20: iload_3
21: if_icmpge 43
We load the values of i$ and len$ onto the stack and call if_icmpge to compare them. The if_icmpge instruction means that if the first value is greater than or equal to the second, in our case if i$ is greater than or equal to len$, then execution should proceed from the instruction marked with 43. If the condition does not hold, the loop proceeds with the next iteration.
At the end of the loop body the loop counter is incremented by 1 and execution jumps back to the beginning to check the loop condition again:
这段指令将 i$
和 len$
的值加载到栈中,并调用 if_icmpge
指令来比较他们的值。
【if_icmpge
解读: if, integer, compare, greate equal】, 如果一个数的值大于或等于另一个值,则程序执行流程应该跳转到pc=43
的地方继续执行。
在这个例子中就是, 如果 i$
大于或等于 len$
, 循环结束,方法也就返回了(43对应的是return).
如果条件不成立,则循环继续进行下一次迭代。
在循环结束时,它的循环计数器加1,然后循环跳回到起点以再次验证循环条件:
37: iinc 4, 1 // increment i$
40: goto 18 // jump back to the beginning of the loop
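The compiled loop corresponds to a hand-written indexed loop. The sketch below shows both forms side by side; the names `arr`, `len` and `i` stand in for the synthetic variables arr$, len$ and i$, and `LoopDemo` with its summing methods is a hypothetical stand-in for the MovingAverage example.

```java
// Hypothetical illustration: a for-each loop and its indexed equivalent.
final class LoopDemo {
    static double sumForEach(int[] numbers) {
        double total = 0;
        for (int number : numbers) {
            total += number;
        }
        return total;
    }

    static double sumDesugared(int[] numbers) {
        int[] arr = numbers;         // getstatic/astore: copy of the array reference
        int len = arr.length;        // arraylength
        double total = 0;
        for (int i = 0; i < len; i++) {  // exit test compiles to if_icmpge, i++ to iinc
            int number = arr[i];     // iaload
            total += number;
        }                            // the back edge is the goto
        return total;
    }

    public static void main(String[] args) {
        int[] numbers = {1, 2, 3};
        System.out.println(sumForEach(numbers) == sumDesugared(numbers)); // prints true
    }
}
```

Note that the source condition `i < len` appears inverted in the bytecode: the compiler jumps out of the loop when `i >= len`.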
As you have seen already, there are a number of instructions that perform all kinds of arithmetic in Java bytecode. In fact, a large portion of the instruction set is devoted to arithmetic. There are instructions for addition, subtraction, multiplication, division and negation for all kinds of types: integers, longs, doubles, floats. Plus there are a lot of instructions that are used to convert between the types.
如您所见,Java字节码中有许多指令可以执行算术运算。
实际上,指令集中有很大一部分表示都是关于数学运算的。
对于所有数值类型(int
, long
, double
, float
),都有加,减,乘,除,取反的指令。
What about byte, char and boolean? The JVM treats them as int for arithmetic.
另外,还有部分指令用于数据类型之间的转换。
Arithmetical opcodes and types
算术操作码和类型
Type conversion happens, for instance, when we want to assign an integer value to a variable whose type is long.
Type conversion opcodes
例如,当我们想将 integer 类型的值赋值给 long 类型的变量时,就会发生类型转换。
类型转换操作码
In our example, where an integer value is passed as a parameter to the submit() method, which actually takes a double, we can see that the type conversion opcode is applied before the method is called:
在前面的示例中, 将 int 值作为参数传递给实际上接收 double 的 submit()
方法时,可以看到, 在实际调用该方法之前,使用了类型转换的操作码:
31: iload 5
33: i2d
34: invokevirtual #5 // Method algo/MovingAverage.submit:(D)V
It means we load the value of a local variable onto the stack as an integer, and then apply the i2d instruction to convert it into a double in order to be able to pass it as a parameter.
The only instruction that doesn't require the value on the stack is the increment instruction, iinc, which operates directly on the value sitting in the LocalVariableTable. All other operations are performed using the stack.
也就是说, 将一个 int 类型局部变量的值, 作为整数加载到栈中,然后用 i2d
指令将其转换为 double 值,以便将其作为参数传递给方法。
唯一不需要将数值load到操作数栈的指令是 iinc
,它直接对 LocalVariableTable
中的值进行运算。 其他的所有操作均使用栈来执行。
There’s a keyword new in Java but there’s also a bytecode instruction called new. When we created an instance of MovingAverage class:
我们都知道Java中有一个关键字是 new
, 但其实在字节码中,也有一个指令叫做 new
。 当我们创建 MovingAverage
类的实例时:
MovingAverage ma = new MovingAverage();
the compiler generated a sequence of opcodes that you can recognize as a pattern:
编译器会生成类似下面这样的操作码:
0: new #2 // class algo/MovingAverage
3: dup
4: invokespecial #3 // Method algo/MovingAverage."<init>":()V
When you see the new, dup and invokespecial instructions together, it must ring a bell: this is class instance creation!
Why three instructions instead of one, you ask? The new instruction creates the object but doesn't call the constructor; for that, the invokespecial instruction is used: it invokes the mysterious method called `<init>`, which is actually the constructor. The dup instruction duplicates the value on top of the stack. As the constructor call doesn't return a value, after calling the method the object will be initialized but the stack will be empty, so we wouldn't be able to do anything with the object after it was initialized. This is why we need to duplicate the reference in advance, so that after the constructor returns we can assign the object instance to a local variable or a field. Hence, the next instruction is usually one of the following:
当你同时看到 new, dup
和 invokespecial
指令在一起时,那么一定是在创建类的实例对象!
为什么是三条指令而不是一条呢?
这是因为 new
指令只是创建了对象,但没有调用构造函数。
invokespecial
指令就是用来调用某个特殊方法的, 当然这里调用的是构造函数。
dup
指令用于复制栈顶的值。 由于构造函数调用不会返回值,所以如果没有dup指令, 在对象上调用方法并初始化之后,操作数栈就会是空的,那么在初始化之后我们就无法对其进行任何处理。
这就是为什么要事先复制引用的原因,以便在构造函数返回之后,可以将对象实例赋值给局部变量或某个字段。
因此,接下来的那条指令一般是以下几种:
- astore {N} or astore_{N} – to assign to a local variable, where {N} is the position of the variable in the local variable table
- putfield – to assign the value to an instance field
- putstatic – to assign the value to a static field

While a call to `<init>` is a constructor invocation, there's another similar method, `<clinit>`, which is invoked even earlier. This is the name of the static initializer of the class. The static initializer isn't called directly, but is triggered by one of the following instructions: new, getstatic, putstatic or invokestatic. That said, if you create a new instance of the class, access a static field or call a static method, the static initializer is triggered (if the class has not been initialized yet).
In fact, there are even more ways to trigger the static initializer, as described in Chapter 5.5 of the JVM specification [http://docs.oracle.com/javase/specs/jvms/se7/html/]
astore {N}
orastore_{N}
– 赋值给局部变量,其中{N}
是局部变量表中的位置。putfield
– 将值赋给实例字段putstatic
– 将值赋给静态字段
在调用构造函数的时候,其实还有另一个类似的方法,甚至会在构造函数之前调用。
那就是该类的静态初始化器名称。 类的静态初始化方法并不会被直接调用的,而是由以下指令之一触发的: new
, getstatic
, putstatic
or invokestatic
。
也就是说,如果创建某个类的新实例, 访问静态字段或者调用静态方法,则会触发静态初始化方法【如果尚未初始化】。
实际上,还有一些情况会触发静态初始化, 详情请参考JVM规范中的 Chapter 5.5: [http://docs.oracle.com/javase/specs/jvms/se7/html/]
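The ordering of the static initializer and the constructor is easy to observe from plain Java. In this sketch the names `InitOrderDemo`, `Widget` and `LOG` are hypothetical; the static block ends up in the class's `<clinit>` method and the constructor body in `<init>`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: records when the static initializer and constructor run.
final class InitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    static final class Widget {
        static { LOG.add("<clinit>"); }  // compiled into Widget's <clinit>
        Widget() { LOG.add("<init>"); }  // compiled into an <init> method
    }

    static void createTwo() {
        new Widget(); // first use: triggers <clinit>, then runs <init>
        new Widget(); // class already initialized: only <init> runs
    }

    public static void main(String[] args) {
        createTwo();
        System.out.println(LOG); // prints [<clinit>, <init>, <init>]
    }
}
```

The static initializer runs exactly once, on the first instruction that needs the initialized class, however many instances are created afterwards.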
We touched on the method invocation topic briefly in the class instantiation part: the method was invoked via the invokespecial instruction, which resulted in the constructor call. However, there are a few more instructions that are used for method invocation:
- invokestatic, as the name implies, is a call to a static method of the class. This is the fastest method invocation instruction there is.
- invokespecial is used to call the constructor, as we know. But it is also used to call private methods of the same class and accessible methods of the superclass.
- invokevirtual is used to call public, protected and package-private methods if the target object is of a concrete type.
- invokeinterface is used when the method to be called belongs to an interface.
前面我们在类实例化部分稍微提了一下方法调用: 构造函数是通过 invokespecial
指令调用的。
此外,还有一些用于方法调用的指令:
invokestatic
, 顾名思义,这个指令用于调用 某个类的静态方法。 这是方法调用指令中最快的一个。invokespecial
, 我们已经学过了,invokespecial
指令用来调用构造函数。 但也可以用于调用同一个类中的 private 方法, 以及可看见的超类方法。invokevirtual
-如果具体类型的目标对象,invokevirtual
用于调用公共,受保护和打包私有方法。invokeinterface
- 当要调用的方法属于某个接口时,将使用invokeinterface
指令。
So what is the difference between invokevirtual and invokeinterface?
Indeed a very good question. Why do we need both invokevirtual and invokeinterface? Why not use invokevirtual for everything? The interface methods are public methods after all! Well, this is all due to the optimization of method invocations. First the method has to be resolved, and then we can call it. For instance, with invokestatic we know exactly which method to call: it is static, and it belongs to only one class. With invokespecial we have a limited list of options, so it is easier to choose the resolution strategy, meaning the runtime will find the required method faster.
With invokevirtual and invokeinterface the difference is not that obvious, however. Let me offer a very simplistic explanation of the difference between these two instructions. Imagine that the class definition contains a table of method definitions and all the methods are numbered by position. Here's an example: class A, with methods method1 and method2, and a subclass B, which inherits method1, overrides method2, and declares a new method3. Note that method1 and method2 are at the same indexed position in both class A and class B.
那么
invokevirtual
和invokeinterface
有什么区别呢?
这确实是个好问题。 为什么需要 invokevirtual
和 invokeinterface
这两种指令呢? 毕竟所有的接口方法都是公共方法, 直接使用 invokevirtual
不就可以了吗?
这么做是源于对方法调用的优化。 JVM必须先解析该方法,然后才能调用它。
例如,使用 invokestatic
指令, JVM就确切地知道要调用的是哪个方法:因为调用的是静态方法,只能属于一个类。
使用 invokespecial
时, 查找的数量也很少, 解析也更加容易, 那么运行时就能更快地找到所需的方法。
使用 invokevirtual
和 invokeinterface
的区别不是那么明显。
想象一下,类定义中包含一个方法定义表, 所有方法都有位置编号。
下面的示例中:A类包含 method1和method2方法; 子类B继承A,继承了method1,覆写了method2,并声明了方法method3。
请注意,method1和method2方法在类A和类B中处于相同的索引位置。
class A
1: method1
2: method2
class B extends A
1: method1
2: method2
3: method3
This means that if the runtime wants to call method2, it will always find it at position 2. Now, to explain the essential difference between invokevirtual and invokeinterface, let’s make class B implement an interface X, which declares a new method, methodX:
class B extends A implements X
1: method1
2: method2
3: method3
4: methodX
The new method is at index 4, and in this situation it looks no different from method3. However, what if there’s another class, C, which also implements the interface but does not belong to the same hierarchy as A and B:
class C implements X
1: methodC
2: methodX
The interface method is no longer at the same position as in class B. This is why the runtime is more restricted with respect to invokeinterface: it can make fewer assumptions during method resolution than with invokevirtual, so it cannot rely on a fixed position and has to look the method up.
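The simplistic table model above can be written down as a small sketch. Everything here (ClassModel, virtualCall, positionOf) is a made-up illustration of the explanation in the text, not how a real JVM stores or dispatches methods. Real runtimes use more sophisticated structures and caching.

```java
import java.util.Arrays;
import java.util.List;

public class DispatchSketch {

    // Hypothetical model of a class: a name plus a method table ordered by position.
    public static final class ClassModel {
        final String name;
        final List<String> methods;

        ClassModel(String name, String... methods) {
            this.name = name;
            this.methods = Arrays.asList(methods);
        }
    }

    public static final ClassModel A = new ClassModel("A", "method1", "method2");
    public static final ClassModel B =
            new ClassModel("B", "method1", "method2", "method3", "methodX");
    public static final ClassModel C = new ClassModel("C", "methodC", "methodX");

    // invokevirtual-style call: the position is the same across the hierarchy,
    // so the table can be indexed directly (positions are 1-based as in the text).
    public static String virtualCall(ClassModel receiver, int position) {
        return receiver.name + "." + receiver.methods.get(position - 1);
    }

    // invokeinterface-style call: the position differs between unrelated classes,
    // so this model has to search the table of the actual receiver.
    public static int positionOf(ClassModel receiver, String method) {
        return receiver.methods.indexOf(method) + 1;
    }

    public static void main(String[] args) {
        System.out.println(virtualCall(A, 2));          // A.method2
        System.out.println(virtualCall(B, 2));          // B.method2
        System.out.println(positionOf(B, "methodX"));   // 4
        System.out.println(positionOf(C, "methodX"));   // 2
    }
}
```

In this model method2 is always at position 2 for A and its subclasses, while methodX sits at position 4 in B and position 2 in C, which is exactly the situation where a fixed index no longer works.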
ObjectWeb ASM is the de facto standard for Java bytecode analysis and manipulation. ASM exposes the internal aggregate components of a given Java class through its visitor-oriented API. The API itself is not very broad: with a limited set of classes you can achieve pretty much all you need. ASM can be used for modifying existing bytecode as well as for generating new bytecode. For instance, ASM can be used to implement the semantics of a new programming language (Groovy, Kotlin, Scala), compiling high-level programming idioms into bytecode that can be executed in the JVM.
“We didn’t even consider using anything else instead of ASM, because other projects at JetBrains use ASM successfully for a long time.” – ANDREY BRESLAV, KOTLIN
My first touch with bytecode first hand was when I started helping in the Groovy project and by then we settled to ASM. ASM can do what is needed, is small and doesn’t try to be too smart to get into your way. ASM tries to be memory and performance effective. For example you don’t have to create huge piles of objects to create your bytecode. It was one of the first with support for invokedynamic btw. Of course it has its pro and con sides, but all in all I am happy with it, simply because I can get the job done using it. – JOCHEN THEODOROU, GROOVY
I mostly know about ASM, just because it’s the one used by Groovy :) However, knowing that it’s backed by people like Rémi Forax, who is a major contributor in the JVM world is very important and guarantees that it follows the latest improvements. – CÉDRIC CHAMPEAU, GROOVY
To give you a very gentle introduction we will generate a “Hello World” example using the ASM library and add a loop to print the phrase an arbitrary number of times.
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, World!");
    }
}
The most common way to generate bytecode that corresponds to the example source is to create a ClassWriter, visit the structure (fields, methods, etc.) and, after the job is done, write out the final bytes.
First, let’s construct the ClassWriter instance:
ClassWriter cw = new ClassWriter(
ClassWriter.COMPUTE_MAXS |
ClassWriter.COMPUTE_FRAMES);
The ClassWriter can be instantiated with constants that indicate the behavior the instance should have. COMPUTE_MAXS tells ASM to automatically compute the maximum stack size and the maximum number of local variables of methods. The COMPUTE_FRAMES flag makes ASM automatically compute the stack map frames of methods from scratch.
To define a class we must invoke the visit() method of ClassWriter:
cw.visit(
Opcodes.V1_6,
Opcodes.ACC_PUBLIC,
"HelloWorld",
null,
"java/lang/Object",
null);
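The first argument, Opcodes.V1_6, selects the class file version that ends up in the header of the generated class. As a rough illustration that needs no ASM at all, the sketch below reads that header from a class shipped with the running JDK. The class name ClassHeaderSketch and the readHeader helper are invented for this example. Major version 50 corresponds to Java 6, and the value printed for String depends on the JDK you run it on.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ClassHeaderSketch {

    // Reads the magic number, minor version and major version from the start
    // of a class file. Works for top-level classes only, because the resource
    // name is derived from the simple class name.
    public static int[] readHeader(Class<?> type) {
        String resource = type.getSimpleName() + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("class file not found: " + resource);
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();            // 0xCAFEBABE for every class file
            int minor = data.readUnsignedShort();
            int major = data.readUnsignedShort();  // 50 would mean Java 6
            return new int[] { magic, minor, major };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = readHeader(String.class);
        System.out.printf("magic=%08X minor=%d major=%d%n",
                header[0], header[1], header[2]);
    }
}
```

A class generated with Opcodes.V1_6 would report major version 50 when inspected this way.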
Next, we have to generate the default constructor and the main method. If you skip generating the default constructor nothing bad will happen, but it is still polite to generate one.
MethodVisitor constructor =
    cw.visitMethod(
        Opcodes.ACC_PUBLIC,
        "<init>", // constructors are always named <init> in bytecode
        "()V",
        null,
        null);
constructor.visitCode();
// super()
constructor.visitVarInsn(Opcodes.ALOAD, 0);
// ASM 5 and later add a trailing boolean isInterface argument (false here)
constructor.visitMethodInsn(Opcodes.INVOKESPECIAL,
    "java/lang/Object", "<init>", "()V");
constructor.visitInsn(Opcodes.RETURN);
constructor.visitMaxs(0, 0);
constructor.visitEnd();
We first created the constructor using the visitMethod() method. Next, we indicate that we’re about to start generating the body of the constructor by calling the visitCode() method. At the end we call visitMaxs() to ask ASM to recompute the maximum stack size. Since we told ASM to do that for us automatically by passing the COMPUTE_MAXS flag to ClassWriter’s constructor, the arguments passed to visitMaxs() are ignored and can be arbitrary. Finally, we indicate that generating the bytecode for the method is complete by calling the visitEnd() method.
Here’s what ASM code for main method looks like:
MethodVisitor mv = cw.visitMethod(
    Opcodes.ACC_PUBLIC + Opcodes.ACC_STATIC,
    "main", "([Ljava/lang/String;)V", null, null);
mv.visitCode();
// System.out
mv.visitFieldInsn(Opcodes.GETSTATIC, "java/lang/System",
    "out", "Ljava/io/PrintStream;");
// push the string constant
mv.visitLdcInsn("Hello, World!");
// System.out.println(String)
mv.visitMethodInsn(Opcodes.INVOKEVIRTUAL, "java/io/PrintStream",
    "println", "(Ljava/lang/String;)V");
mv.visitInsn(Opcodes.RETURN);
mv.visitMaxs(0, 0); // values are ignored because of COMPUTE_MAXS
mv.visitEnd();
By calling visitMethod() again, we generated a new method definition with its name, modifiers and signature. Again, the visitCode(), visitMaxs() and visitEnd() methods are used the same way as for the constructor.
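The descriptor strings passed to visitMethod() and visitMethodInsn(), such as "([Ljava/lang/String;)V", use the JVM’s internal type notation and are easy to get wrong by hand. One way to double-check them without ASM is the JDK’s java.lang.invoke.MethodType, as in this small sketch. The class name DescriptorSketch and the descriptor helper are made up for illustration.

```java
import java.lang.invoke.MethodType;

public class DescriptorSketch {

    // Builds the JVM method descriptor for a return type and parameter types.
    public static String descriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes)
                .toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Default constructor: no arguments, returns void
        System.out.println(descriptor(void.class));                          // ()V
        // public static void main(String[] args)
        System.out.println(descriptor(void.class, String[].class));          // ([Ljava/lang/String;)V
        // PrintStream.println(String)
        System.out.println(descriptor(void.class, String.class));            // (Ljava/lang/String;)V
        // Primitive types use single letters: long is J, double is D, int is I
        System.out.println(descriptor(int.class, long.class, double.class)); // (JD)I
    }
}
```

Object types are written as L followed by the slash-separated class name and a semicolon, and each array dimension adds a leading [.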
As you can see, the code is full of constants, “flags” and “indicators”, and the final code is not very readable to human eyes. At the same time, to write such code one needs to keep the bytecode execution model in mind in order to produce correct bytecode. This is what makes writing such code a rather complicated task, and this is where everyone has their own approach to writing code with ASM.
Our approach is using Kotlin’s ability to enhance existing Java APIs: we created some helper functions (many of them extension functions) that make ASM APIs look very much like a bytecode manipulation DSL. – ANDREY BRESLAV, KOTLIN
I built some meta api into the compiler. For example it let’s you do a swap, regardless of the involved types. It was not in the links above, but I assume you know, that double and long consume two slots, while anything else does only one. The swap instruction handles only the 1-slot version. So if you have to swap an int and a long, a long and an int or a long and a long, you get a different set of instructions. I also added a helper API for local variables, to avoid to have to manage the index. If you want more nice looking code… Cedric wrote a Groovy DSL to generate bytecode. It is still the bytecode more or less, but less method around to make it less clear. – JOCHEN THEODOROU, GROOVY
ASM is a nice low-level API, but I think we miss an up-to-date higher level API, for example for generating proxies and so on. In Groovy we want to limit the number of dependencies we add to the project, so it would be cool if ASM provided this out-of-the-box, but the general idea behind ASM is more to stick with a low level API. – CÉDRIC CHAMPEAU, GROOVY
The tools can be a great help for studying and working with bytecode. The best way to learn to use ASM is to write a Java source file that is equivalent to what you want to generate and then use the ASMifier mode of the Bytecode Outline plugin for Eclipse (or the ASMifier tool) to see the equivalent ASM code. If you want to implement a class transformer, write two Java source files (before and after transformation) and use the compare view of the plugin in ASMifier mode to compare the equivalent ASM code.
Bytecode outline plugin view in Eclipse
For IntelliJ IDEA users there’s the ASM bytecode outline plugin available in the plugins repository and it is quite easy to use too. Right click in the source and select Show Bytecode outline – this will open a view with the code generated by the ASMifier tool.
ASM outline plugin in IntelliJ IDEA
You can also apply the ASMifier directly, without the IDE plugin, as it is a part of ASM library:
$ java -classpath "asm.jar:asm-util.jar" \
    org.objectweb.asm.util.ASMifier \
    HelloWorld.class
We use ASM bytecode outline for IntelliJ IDEA and our own similar plugin that displays bytecodes generated by our compiler. – ANDREY BRESLAV, KOTLIN
Actually, I wrote the “bytecode viewer” plugin for IntelliJ IDEA, and I’m using it quite often :) On the Groovy side, I also use the AST browser view, which provides a bytecode view too, although it seriously needs improvements. – CÉDRIC CHAMPEAU, GROOVY
My tools are mostly org.objectweb.asm.util.Textifier and org.objectweb.asm.util.CheckClassAdapter. Some time ago I also wrote a tool helping me to visualize the bytecode and the stack information. It allows me to go through the bytecode and see what happens on the stack. And while bytecode used to be a pita to read for me in the beginning, I have seen so much of it, that I don’t even use that tool anymore, because I am usually faster just looking at the text produced by Textifier. That is not supposed to tell you I am good at generating bytecode… no no.. I wouldn’t be able to read it so good if I had not the questionable pleasure of looking at it countless times, because there again was a pop of an empty stack or something like that. It is more that the problems I have to look for tend to repeat themselves and I have a whit of what to look for even before I fire up Textifier. – JOCHEN THEODOROU, GROOVY
We asked Andrey, Jochen and Cédric to share some fun facts from their experiences with Java bytecode. While the words “bytecode” and “fun” might not go together very well, there are still cases to learn from, and the guys kindly shared their experiences:
Hmm… bytecode and fun? What a strange combination of words in the same sentence ;)
Well.. one time maybe a little… I told you about the API I use to do a swap. In the beginning it was not working properly of course. That was partially due to me misunderstanding one of those DUP instructions, but mainly it was because I had a simple bug in my code in which I execute the 1-2 swap instead of the 2-1 swap (meaning swapping 1 and 2 slot operands). So I was looking at the code, totally confused, thinking this should work, looking at my code… then thinking I made it wrong with those dups and replacing the code with my new understanding…
All the while the code was not really all that wrong, only the swap cases were swapped. Anyway… after about a full day of getting a headache from too much looking at the bytecode I finally found my mistake and looked at the code to find it looks almost the same as before… and then it dawned on me, that it was only that simple mistake, that could have been corrected in a minute and which took me a full day. Not really funny, but there I laughed a bit at myself actually. – JOCHEN THEODOROU, GROOVY
Actually, the funniest thing was when I wrote the “bytecode DSL” for Groovy, which allows you to write bytecode directly in the body of a method, using a DSL which is very close to what the ASM outline provides, and a nicer “groovy flavoured” DSL too. Although I started this project as a proof-of-concept and a personal experiment, I received a lot of feedback and interest about it.
Today I think it’s a very simple way to have people test bytecode directly, for example for students. It makes writing bytecode a lot easier than using ASM directly. However, I also received a lot of complaints, people saying I opened the Pandora box and that it would produce unreadable code in production :D (and I would definitely not recommend using it in production). Yet, it’s been more than one year the project is out, and I haven’t heard of anyone using it, so probably bytecode is really not that fun! – CÉDRIC CHAMPEAU, GROOVY
Many fun things come in connection with Android: Dalvik is very picky about your bytecode conformance to the JVM spec. And HotSpot doesn’t care a bit about many of these things. We were running smoothly on HotSpot for a long time, without knowing that we had so many things done wrong. Now we use Dalvik’s verifier to check every class file we generate, to make sure nobody forgot to put ACC_SUPER on a class, proper offsets to a local variable table, and things like that. We also came across a few interesting things in HotSpot, for example, if you call an absent method on an array object (like array.set()), you don’t get a NoSuchMethodError, or anything like that. What you get (what we got on a HotSpot we had a year ago, anyway) is… a native crash. Segmentation fault, if I am not mistaken. Our theory is that the vtable for arrays is so optimized that it is not even there, and lookup crashes because of that. – ANDREY BRESLAV, KOTLIN
The JVM is a wonderful piece of engineering, and like any beautiful machine it is important to be able to understand and appreciate the technology powering the underlying layers. Java bytecode is the machine code that enables the JVM to interpret and compile language code such as Java, Scala, Groovy, Kotlin and a dozen more in order to deliver applications to hungry consumers.
Java bytecode runs quietly inside the JVM in the background most of the time, so the average developer rarely needs to consider it. But it is the form of the instructions that the JVM executes, so it is essential to the areas of tooling and program analysis, where applications can modify the bytecode to adjust behavior according to the application’s domain. Any developer looking to create profilers, mocking frameworks, AOP and other tools should understand Java bytecode thoroughly.
Thanks for tuning in to this RebelLabs report. For the latest blog posts and reports, you can follow @ZeroTurnaround on Twitter.