Overview
The purpose of the PCG Analysis is to provide clients with the following:
- The PCG data structure representing the state of ownership and borrowing of Rust places at arbitrary program points within a Rust function
- For any pair of consecutive program points, an ordered list of actions that describe the transformation of the PCG of to the PCG of .
PCG Analysis Algorithm
Key Concepts:
- The PCG Analysis Algorithm operates on the MIR Body of a Rust function and returns a PCG Analysis of the function.
- A PCG Analysis contains PCG Data for every reachable1 MIR location in the Body2.
- The PCG Data is a tuple of PCG States and PCG Action Data.
The PCG States of a MIR location is a list of the PCGs computed at that location, concretely:
- An Initial PCG, followed by
- One PCG for every PCG evaluation phase
The PCG Action Data of a MIR location contains a list of PCG Actions for each evaluation phase in the location (i.e. the actions performed at that phase).
The PCG Dataflow Analysis
The PCG Analysis Algorithm is implemented as a MIR dataflow
analysis using PcgDomainData
as
the domain. PcgDomainData
contains a PCGData
value and other relevant
metadata (e.g the associated basic block). Notably, the analysis only analyzes
each basic block one time. Conceptually, this property is ensured because the
final state at a loop head is computed upon entering it in the analysis (without
having first seen the body).
We note that the behaviour of the join operation on PCGDomainData
requires
careful tracking of what blocks have been previously joined (this is basically a
consequence of the interface of the MIR dataflow analysis). We define
(joining the state computed at into ) as follows:
- Let be the associated blocks of and respectively
- If no block has ever been joined into , then \mathit{join}(s', s)
In the implementation the join operation
associated with the PcgDomainData
, we have if
is the state of a loop head and is the state of a back edge; this ensures
that loop heads are only considered once3.
We note that to correspond
PCG Data Structure
The PCG data structure represents the state of ownership and borrowing of Rust places at an arbitrary program point.
In our implementation, it consists of three components:
- The Owned PCG, which describes the state of owned places
- The Borrow State, which describes borrowed memory (borrowed places and lifetime projections) and borrow relations, and also some auxillary data structures
- The Place Capabilities which is a map from places to capabilities
Owned PCG
The Owned PCG is a forest, where the root of each tree in the forest is a MIR
Local
.
Each tree in the forest is defined by a set of place expansions, which describe how unpacked all of the owned places are. For each expansion in the set, the base is a node in the forest and are its children. We note that each expansion can be similarly interpreted as a hyperedge
Each local in the body of a MIR function is either unallocated or allocated in the Owned PCG. A local is allocated iff there exists a corresponding root node in the Owned PCG, otherwise it is unallocated.
The operation allocate in the Owned PCG requires that is not allocated, and adds a new root for . The deallocate operation removes the forest rooted at .
Borrow State
The Borrow State is a 3-tuple containing a Latest Map describing the latest locations of each place; a set of Validity Conditions that describes the set of paths leading to the current block; and a Borrows Graph, a directed acyclic hypergraph which describes borrowed places, sets of borrows, and their relation to one another. The Borrows Graph is represented as a set of PCG hyperedges.
Because a borrow created within a block exists only for executions that visit that block, we label new borrows using the validity conditions of the block in which they were created.
-
A MIR location reachable iff its basic block is reachable from the start block in the CFG without traversing unwind or fake edges (the latter kind do not correspond to the actual control flow of the function). The original reason for only considering reachable edges was to improve performance; removing this constraint (and instead considering all locations) would be simple to change in the implementation. ↩
-
The PCG analysis algorithm is implemented as a MIR dataflow analysis defined by the rust compiler crate. In the implementation, the PCG Data is computed for every reachable MIR location during the algorithm execution itself, but only the PCG Data for the entry state of each basic block is stored. The PCG Data for an arbitrary location within a block is re-computed by applying the dataflow analysis transfer function from the entry state. ↩
-
We should confirm that this also holds in the implementation as this previously was not the case. However, visiting the same stmt multiple times should only affect performance, not correctness. ↩