Data-flow analysis
Encyclopedia : D : DA : DAT : Data-flow analysis
Data-flow analysis is a technique of automatically gathering information about the values calculated in a procedural computer program by applying rules to its control flow graph. The information gathered is often useful in optimizing the program, so most compilers for procedural languages include one or more data-flow analysis stages.
A simple way to perform dataflow analysis of programs is to set up dataflow equations for each node of the control flow graph and solve them by repeatedly calculating the output from the input locally at each node until the whole system stabilizes, i.e., it reaches a fixpoint. To guarantee termination it is required that the data flow equations form a fixed-point data-flow analysis, i.e., the local updates by the equations are monotone.
A canonical example of a data-flow analysis is reaching definitions.
An iterative algorithm
The fixpoint of the dataflow equations can be calculated using the round-robin iterative algorithm:
[\texttt\ i\ \leftarrow\ 1\ \texttt\ N] [\mathit\ i]
[\texttt\ (\mathit)] [\texttt\ i\ \leftarrow\ 1\ \texttt\ N] [\mathit\ i]
The order matters
The efficiency of iteratively solving data-flow equations is influenced by the order at which local nodes are visited. For the above round-robin iterative algorithm it depends in which order nodes are selected from the set of nodes. Furthermore, it depends, whether the data-flow equations are used for forward or backward data-flow analysis over the CFG. In the following, a few iteration orders for solving data-flow equations are discussed (a related concept to iteration order of a CFG is tree traversal of a tree).
- random order This iteration order is not aware whether the DFA solves a forward or backward data-flow problem. Therefore, the performance is relatively poor compared to specialized iteration orders.
- postorder This is a typical iteration order for backward data-flow problems. In postorder iteration a node is visited after all its successor nodes have been visited. Typically, the postorder iteration is implemented with the depth-first strategy.
- reverse-postorder This is a typical iteration order for forward data-flow problems. In reverse-postorder iteration a node is visited before all its successor nodes have been visited, except when the successor is reached by a back edge. A more natural name for reverse-postorder iteration would be "preorder iteration", but this would be confusing with the mathematical definition of preorder.
Examples
The following are examples of properties of computer programs that can be calculated by data-flow analysis. Note that the properties calculated by data-flow analysis are typically only approximations of the real properties. This is because data-flow analysis operates on the syntactical structure of the CFG without simulating the exact control flow of the program. However, to be still useful in practice, a data-flow analysis algorithm is typically designed to calculate an upper respectively lower approximation of the real program properties.
Forward Analysis
- reaching definitions
1: if b==4 then 2: a = 5; 3: else 4: a = 3; 5: endif 6: 7: if a < 4 then 8: ... |
The reaching definition of variable "a" at line 7 is the set . |
Backward Analysis
- live variables
1: a = 3; 2: c = 5; 3: a = b + c; |
The set of live variables at line 2 is , but the set of live variables at line 1 is only since variable "c" is updated in line 2. |
Sensitivities
Data-flow analysis is inherently flow-sensitive. Data-flow analysis is typically path-insensitive, though it is possible to define data-flow equations that yield a path-sensitive analysis.
These terms aren't specific to data-flow analysis. The following defintions should be moved somewhere else and fleshed out with more verbose examples. Wikipedia's inter-document structure for static analysis could use some rework.
- A flow-sensitive analysis takes into account the order of statements in a program. For example, a flow-insensitive pointer alias analysis may determine "variables x and y may refer to the same location", while a flow-sensitive analysis may determine "after statement 20, variables x and y may refer to the same location".
- A path-sensitive analysis only considers valid paths through the program. For example, if two operations at different parts of a function are guarded by equivalent predicates, the analysis must only consider paths where both operations execute or neither operation executes. Path-sensitive analyses are necessarily flow-sensitive.
- A context-sensitive analysis is an interprocedural analysis that takes the calling context into account when analyzing the target of a function call. For example, consider a function that accepts a file handle and a boolean parameter that determines whether the file handle should be closed before the function returns. A context-sensitive analysis of any callers of the function should take into account the value of the boolean parameter to determine whether the file handle will be closed when the function returns.
Further reading
Aho, Alfred V. Sethi, Ravi. Ullman, Jeffrey D. Compilers: Principles, Techniques and Tools. Addison Wesley. 1986.
Appel, Andrew W. Modern Compiler Implementation in ML. Cambridge University Press. 1999.
Cooper, Keith D. and Torczon, Linda. Engineering a Compiler. Morgan Kaufmann. 2005.
Muchnick, Steven S. Advanced Compiler Design and Implementation. Morgan Kaufmann. 1997.
Hecht, Matthew S. Flow Analysis of Computer Programs. Elsevier North-Holland Inc. 1977.
From Wikipedia, the Free Encyclopedia. Original article here. Support Wikipedia by contributing or donating.
All text is available under the terms of the GNU Free Documentation License See Wikipedia Copyrights for details.
