Overview

The purpose of the PCG Analysis is to provide clients with the following:

The PCG data structure representing the state of ownership and borrowing of Rust places at arbitrary program points within a Rust function
For any pair $p p_{i}, p p_{j}$ of consecutive program points, an ordered list of actions that describe the transformation of the PCG of $p p_{i}$ to the PCG of $p p_{j}$ .

PCG Analysis Algorithm

Key Concepts:

The PCG Analysis Algorithm operates on the MIR Body of a Rust function and returns a PCG Analysis of the function.
A PCG Analysis contains PCG Data for every reachable¹ MIR location in the Body².
The PCG Data is a tuple of PCG States and PCG Action Data.

The PCG States of a MIR location is a list of the PCGs computed at that location, concretely:

An Initial PCG, followed by
One PCG for every PCG evaluation phase

The PCG Action Data of a MIR location contains a list of PCG Actions for each evaluation phase in the location (i.e. the actions performed at that phase).

The PCG Analysis Algorithm is implemented as a MIR dataflow analysis using PcgDomainData as the domain. PcgDomainData contains a PCGData value and other relevant metadata (e.g the associated basic block). Notably, the analysis only analyzes each basic block one time. Conceptually, this property is ensured because the final state at a loop head is computed upon entering it in the analysis (without having first seen the body).

We note that the behaviour of the join operation on PCGDomainData requires careful tracking of what blocks have been previously joined (this is basically a consequence of the interface of the MIR dataflow analysis). We define $join (s^{'}, s)$ (joining the state computed at $s$ into $s^{'}$ ) as follows:

Let $b^{'}, b$ be the associated blocks of $s^{'}$ and $s$ respectively
If no block has ever been joined into $b^{'}$ , then \mathit{join}(s', s)

In the implementation the join operation associated with the PcgDomainData, we have $join (s_{h}, s_{b}) = s_{h}$ if $s_{h}$ is the state of a loop head and $s_{b}$ is the state of a back edge; this ensures that loop heads are only considered once³.

Our implementation should also be checking that the PCG generated at the loop head is valid w.r.t the state at the back edge here, but this is not happening yet.

We note that to correspond

PCG Data Structure

The PCG data structure represents the state of ownership and borrowing of Rust places at an arbitrary program point.

Perhaps this describes too much implementation-specific details?

In our implementation, it consists of three components:

The Owned PCG, which describes the state of owned places
The Borrow State, which describes borrowed memory (borrowed places and lifetime projections) and borrow relations, and also some auxillary data structures
The Place Capabilities which is a map from places to capabilities

Owned PCG

The Owned PCG is a forest, where the root of each tree in the forest is a MIR Local.

Each tree in the forest is defined by a set of place expansions, which describe how unpacked all of the owned places are. For each expansion $\overline{p}$ in the set, the base $p$ is a node in the forest and $\overline{p}$ are its children. We note that each expansion can be similarly interpreted as a hyperedge ${p} \to \overline{p}$

Each local in the body of a MIR function is either unallocated or allocated in the Owned PCG. A local is allocated iff there exists a corresponding root node in the Owned PCG, otherwise it is unallocated.

The operation allocate $v$ in the Owned PCG requires that $v$ is not allocated, and adds a new root for $v$ . The deallocate $v$ operation removes the forest rooted at $v$ .

Borrow State

The Borrow State is a 3-tuple containing a Latest Map describing the latest locations of each place; a set of Validity Conditions that describes the set of paths leading to the current block; and a Borrows Graph, a directed acyclic hypergraph which describes borrowed places, sets of borrows, and their relation to one another. The Borrows Graph is represented as a set of PCG hyperedges.

Because a borrow created within a block exists only for executions that visit that block, we label new borrows using the validity conditions of the block in which they were created.

A MIR location $l$ reachable iff its basic block $b_{l}$ is reachable from the start block in the CFG without traversing unwind or fake edges (the latter kind do not correspond to the actual control flow of the function). The original reason for only considering reachable edges was to improve performance; removing this constraint (and instead considering all locations) would be simple to change in the implementation. ↩
The PCG analysis algorithm is implemented as a MIR dataflow analysis defined by the rust compiler crate. In the implementation, the PCG Data is computed for every reachable MIR location during the algorithm execution itself, but only the PCG Data for the entry state of each basic block is stored. The PCG Data for an arbitrary location within a block is re-computed by applying the dataflow analysis transfer function from the entry state. ↩
We should confirm that this also holds in the implementation as this previously was not the case. However, visiting the same stmt multiple times should only affect performance, not correctness. ↩

PCG Documentation

Overview

PCG Analysis Algorithm

The PCG Dataflow Analysis

PCG Data Structure

Owned PCG

Borrow State

Keyboard shortcuts

PCG Documentation

Overview

PCG Analysis Algorithm

The PCG Dataflow Analysis

PCG Data Structure

Owned PCG

Borrow State