92 changes: 82 additions & 10 deletions nullability/README.md
Original file line number Diff line number Diff line change
@@ -1,18 +1,90 @@
# C++ nullability analysis
# C++ Nullability Analysis

Annotating C++ API boundaries with nullability information can improve their
Rust bindings (e.g. binding non-null pointers as `T&` rather than `Option<T&>`).
The `nullability` directory contains a comprehensive static analysis system for
bringing robust null-safety to C++. Its primary purpose is to eliminate the
ambiguity inherent in C++ pointers by providing tools to **infer**, **verify**,
and **enforce** nullability contracts (such as `_Nullable` and `_Nonnull`).

This directory has tools for C++ codebases that use such annotations:
This project is a component of **Crubit**, where it enables the generation of
safer and more ergonomic Rust bindings. When nullability is documented
explicitly, Crubit can bind non-null C++ pointers directly to Rust references
(`&T`) or smart pointers, rather than wrapping them in `Option<&T>`.

- **Nullability inference** suggests annotations to add to APIs, by analyzing
the code that implements and uses them.
The directory provides two main toolsets:

- **Nullability verification** verifies that annotated APIs are used and
implemented safely, e.g. checking nullable pointers before dereferencing them.
This is a local analysis suitable for use in a clang-tidy check.
- **Nullability inference** (`inference/`) suggests annotations to add to
existing APIs by analyzing how they are implemented and used across the
codebase.

They use Clang, its [dataflow framework][], and its [nullability annotations][].
- **Nullability verification** (this directory) ensures that annotated APIs
are used and implemented safely (e.g., checking nullable pointers before
dereferencing). This is a local, flow-sensitive analysis suitable for use in
`clang-tidy`.

These tools are built on Clang, its [dataflow framework][], and its
[nullability annotations][].
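
As a rough illustration of the kind of contract these tools work with, the
sketch below annotates a small API boundary. The `EXAMPLE_NULLABLE` and
`EXAMPLE_NONNULL` macros, the `User` type, and all function names are
hypothetical and exist only for this example; the macros expand to Clang's
`_Nullable` and `_Nonnull` when Clang is the compiler and to nothing otherwise.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical wrapper macros so the sketch also compiles without Clang.
#if defined(__clang__)
#define EXAMPLE_NONNULL _Nonnull
#define EXAMPLE_NULLABLE _Nullable
#else
#define EXAMPLE_NONNULL
#define EXAMPLE_NULLABLE
#endif

struct User {
  int id;
  std::string name;
};

// The return type is annotated nullable: callers are expected to check it.
inline const User* EXAMPLE_NULLABLE FindUser(const std::vector<User>& users,
                                             int id) {
  for (const User& u : users) {
    if (u.id == id) return &u;
  }
  return nullptr;
}

// The parameter is annotated non-null: the body may dereference it directly.
inline std::size_t NameLength(const User* EXAMPLE_NONNULL user) {
  return user->name.size();
}

// A caller that respects both contracts: it checks the nullable result
// before passing it to a function that requires a non-null pointer.
inline std::size_t NameLengthOrZero(const std::vector<User>& users, int id) {
  const User* EXAMPLE_NULLABLE user = FindUser(users, id);
  if (user == nullptr) return 0;
  return NameLength(user);
}
```

Dropping the `nullptr` check in `NameLengthOrZero` is the sort of mistake the
verification toolset is described as catching; the exact diagnostic text is not
shown here because the source does not specify it.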

## File Overview

### Core Analysis

- **pointer_nullability_analysis.h / .cc**: Implements the dataflow analysis
for tracking pointer nullability.
- **pointer_nullability_diagnosis.h / .cc**: Diagnoses nullability safety
violations (e.g., dereferencing nullable pointers) based on the analysis
results.
- **pointer_nullability_lattice.h / .cc**: Defines the lattice (program state)
used in the dataflow analysis.
- **type_transferer.h / .cc**: Handles the propagation of **static,
type-based** nullability information. It computes the nullability of each
C++ type in the AST (e.g., the nested pointer types in `vector<int*>`) in a
non-flow-sensitive manner, providing a baseline for the analysis.
- **value_transferer.h / .cc**: Handles the propagation of **flow-sensitive,
value-based** nullability properties. It models how the nullability state of
specific pointer values changes at different program points due to control
flow, such as becoming "known non-null" after a successful null check or
dereference.
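
The distinction between type-based and value-based nullability can be seen in
a small example. This is a sketch for illustration only: `EXAMPLE_NULLABLE`,
`MaybeInts`, and `SumNonNull` are hypothetical names, and the comments describe
the intent of the two transferers as summarized above rather than their actual
implementation.

```cpp
#include <vector>

// Hypothetical wrapper macro so the sketch also compiles without Clang.
#if defined(__clang__)
#define EXAMPLE_NULLABLE _Nullable
#else
#define EXAMPLE_NULLABLE
#endif

// Type-based view: the declared element type says every element may be null.
// This holds regardless of which program point is being considered.
using MaybeInts = std::vector<int* EXAMPLE_NULLABLE>;

inline int SumNonNull(const MaybeInts& values) {
  int sum = 0;
  for (int* EXAMPLE_NULLABLE p : values) {
    // At this point the only information about p comes from its type:
    // it may be null.
    if (p == nullptr) continue;
    // Value-based view: on this path the null check has succeeded, so p is
    // known to be non-null here and the dereference is safe.
    sum += *p;
  }
  return sum;
}
```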

### Data Model

- **type_nullability.h / .cc**: Defines the `TypeNullability` model,
representing nullability for all pointer "slots" within a complex C++ type.
- **pointer_nullability.h / .cc**: Extends the dataflow framework's `Value`
model to track properties like `is_null` and `from_nullable` for pointer
values.
- **pragma.h / .cc**: Handles `#pragma nullability` directives for setting
per-file nullability defaults.
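
To make the idea of pointer "slots" concrete, the following toy trait counts
the pointer positions inside a type. It is a simplified sketch, not the
`TypeNullability` implementation: `SlotCount`, `NullabilityKind`, and
`AllUnknown` are hypothetical names, and treating `std::unique_ptr` as one slot
and looking through `std::vector` are assumptions made for the example.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical trait: number of pointer "slots" in a type.
template <typename T>
struct SlotCount {
  static constexpr std::size_t value = 0;
};

// A raw pointer contributes one slot, plus any slots in its pointee.
template <typename T>
struct SlotCount<T*> {
  static constexpr std::size_t value = 1 + SlotCount<T>::value;
};

// Top-level const does not change the number of slots.
template <typename T>
struct SlotCount<const T> : SlotCount<T> {};

// Assumption: a vector has the slots of its element type.
template <typename T, typename A>
struct SlotCount<std::vector<T, A>> : SlotCount<T> {};

// Assumption: a unique_ptr counts as one slot, plus slots in its pointee.
template <typename T, typename D>
struct SlotCount<std::unique_ptr<T, D>> {
  static constexpr std::size_t value = 1 + SlotCount<T>::value;
};

enum class NullabilityKind { kNonnull, kNullable, kUnknown };

// One nullability entry per slot, all initially unknown.
template <typename T>
std::vector<NullabilityKind> AllUnknown() {
  return std::vector<NullabilityKind>(SlotCount<T>::value,
                                      NullabilityKind::kUnknown);
}

static_assert(SlotCount<int>::value == 0, "no pointers, no slots");
static_assert(SlotCount<int**>::value == 2, "two nested pointer slots");
static_assert(SlotCount<std::vector<int*>>::value == 1,
              "the element pointer is a slot");
```

Under these assumptions, `AllUnknown<int**>()` yields a two-element vector, one
entry for the outer pointer and one for the inner pointer.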

### Utilities and Helpers

- **annotations.h**: Defines string constants containing the literal text of
supported nullability attributes (e.g., `_Nullable`, `_Nonnull`) and Abseil
macros (e.g., `absl_nullable`).
- **ast_helpers.h**: Provides helper classes for simplifying access to the
Clang AST (e.g., matching parameters and arguments).
- **forwarding_functions.h / .cc**: Detects and analyzes forwarding functions
like `std::make_unique` to improve analysis precision.
- **loc_filter.h / .cc**: Interface for filtering source locations (e.g.,
restricting analysis to specific files).
- **macro_arg_capture.h**: Constants for capturing arguments passed to
internal macros during inference.
- **pointer_nullability_matchers.h / .cc**: AST matchers for identifying
nullability-relevant constructs (pointers, dereferences, smart pointers,
etc.).
- **proto_matchers.h / .cc**: GoogleMock matchers for comparing protocol
buffer messages in tests.
- **type_and_maybe_loc_visitor.h**: A specialized visitor for simultaneously
traversing a `Type` and its corresponding `TypeLoc`.

### Subdirectories

- **formal_methods/**: Contains formal specifications or models related to
nullability.
- **google/**: Google-specific regression and crash tests using real-world
code snippets.
- **inference/**: Implementation of the whole-codebase nullability inference
system.
- **test/**: Additional shared testing infrastructure and data.

## Style

Expand Down

80 changes: 80 additions & 0 deletions nullability/inference/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,80 @@
# Nullability Inference

## Purpose

This directory contains the implementation of the Crubit nullability inference
system. The system's goal is to automatically deduce the nullability of C++
pointer-typed symbols (functions, parameters, fields, and global variables) by
analyzing their usage patterns across a codebase.

The system follows a distributed "map-reduce" architecture:

1. **Collection (Map Phase):** Local static analysis (using the Clang dataflow
framework) examines individual translation units to gather "evidence" of
nullability, such as unchecked dereferences (suggesting `Nonnull`) or assignments
from `nullptr` (suggesting `Nullable`).

2. **Merging (Reduce Phase):** Evidence from across the entire codebase is
aggregated by symbol and slot to form a final "conclusion" (e.g., this parameter
is likely `Nonnull`).

3. **Application:** These conclusions can then be propagated back into the
source code as nullability annotations.
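
The collection and merging phases above can be sketched as a small
self-contained model. Everything here is hypothetical: the `Evidence` struct,
the two evidence kinds, and in particular the merge policy (any
assigned-from-`nullptr` evidence makes the slot `Nullable`, otherwise unchecked
dereferences make it `Nonnull`) are assumptions chosen for illustration, not
the rules used by `merge.cc`.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class EvidenceKind { kUncheckedDereference, kAssignedFromNullptr };
enum class Conclusion { kUnknown, kNonnull, kNullable };

// One local observation, as a translation unit might emit it (map phase).
struct Evidence {
  std::string symbol;  // hypothetical stand-in for a symbol identifier
  int slot;            // which pointer slot of the symbol
  EvidenceKind kind;
};

using SlotKey = std::pair<std::string, int>;

// Reduce phase: group evidence by (symbol, slot) and fold each group into a
// single conclusion. Assumed policy: Nullable evidence takes precedence.
inline std::map<SlotKey, Conclusion> Merge(const std::vector<Evidence>& all) {
  std::map<SlotKey, Conclusion> result;
  for (const Evidence& e : all) {
    SlotKey key(e.symbol, e.slot);
    Conclusion& c = result[key];  // value-initialized to kUnknown
    if (e.kind == EvidenceKind::kAssignedFromNullptr) {
      c = Conclusion::kNullable;
    } else if (c == Conclusion::kUnknown) {
      c = Conclusion::kNonnull;
    }
  }
  return result;
}
```

Because the fold only depends on the set of evidence per key, evidence from
different translation units can be concatenated in any order before merging,
which is what makes the map-reduce split workable in this toy model.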

## File Descriptions

### Core Logic

* **collect_evidence.h / .cc**: Implements the "map" phase. It analyzes ASTs
and CFGs to gather local observations (Evidence) about how symbols are used.
* **merge.h / .cc**: Implements the "reduce" phase. It consolidates Evidence
from multiple sources into final nullability conclusions.
* **inferable.h / .cc**: Defines predicates that determine which C++ symbols
and types are eligible targets for inference.
* **eligible_ranges.h / .cc**: Identifies source code ranges (e.g., the exact
location of a `*` in a declaration) where nullability annotations can be
inserted.
* **infer_tu.h / .cc**: Provides a high-level entry point for running the
entire inference pipeline on a single translation unit, primarily used for
testing and debugging.
* **infer_tu_main.cc**: A standalone tool that runs single-translation-unit
inference on a specified source file.

### Data Models and Utilities

* **inference.proto**: Protocol buffer definitions for core data structures:
`Symbol`, `Evidence`, `SlotInference`, and `CFGSummary`.
* **slot_fingerprint.h / .cc**: Computes stable 64-bit hashes (fingerprints)
for individual nullability "slots" (e.g., a specific parameter's pointer
type) to identify them across translation units.
* **usr_cache.h / .cc**: Provides performance-optimizing caching for Clang
Unified Symbol Resolution (USR) strings.
* **replace_macros.h / .cc**: Implements a preprocessor-based mechanism to
intercept and wrap common assertion macros (like `CHECK`, `DCHECK`, and
`CHECK_NE`) with internal "argument-capture" functions. This allows the
inference engine to reliably detect these patterns in the AST and collect
evidence (e.g., that a checked pointer is `Nonnull`).
* **clang_tidy_nullability_replacement_macros.h**: Contains the alternative
macro definitions and capture function templates used by `replace_macros` to
expose macro-hidden nullability signals to the analysis.
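
The argument-capture idea can be illustrated with a minimal sketch. The names
`example_capture_check_arg` and `EXAMPLE_CHECK` are hypothetical and are not
the definitions found in `clang_tidy_nullability_replacement_macros.h`; the
point is only that routing the macro argument through a named function leaves
a recognizable call in the AST for an analysis to match on.

```cpp
#include <cstdlib>

// Hypothetical capture function. It returns its argument unchanged, so the
// check behaves as before, but the call itself is an easy-to-find AST node.
template <typename T>
inline const T& example_capture_check_arg(const T& arg) {
  return arg;
}

// Hypothetical replacement definition for an assertion macro: the condition
// is wrapped in the capture function before being tested.
#define EXAMPLE_CHECK(cond)                                   \
  do {                                                        \
    if (!example_capture_check_arg(cond)) std::abort();       \
  } while (0)

inline int DerefChecked(int* p) {
  // An analysis that recognizes example_capture_check_arg could treat this
  // as evidence that callers are expected to pass a non-null pointer.
  EXAMPLE_CHECK(p != nullptr);
  return *p;
}
```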

### Build and Infrastructure

* **BUILD**: The Bazel build configuration for the inference library and
its associated tools and tests.

### Testing Utilities

* **augmented_test_inputs.h / .cc**: Helpers for creating synthetic C++ code
snippets and AST structures for testing the inference engine.
* **collect_evidence_test_utilities.h / .cc**: Shared infrastructure
specifically for unit testing the evidence collection logic.
* **eligible_ranges_for_test.h**: Simple data structures for identifying
pointer ranges in test code.

### Tests

* **collect_evidence_test.cc**: Comprehensive tests for the evidence
collection "map" phase.
* **eligible_ranges_test.cc**: Tests for identifying annotatable source
ranges.
* **infer_tu_test.cc**: Tests for the single-TU inference orchestration.
* **inferable_test.cc**: Tests for the logic determining what is inferable.
* **merge_test.cc**: Tests for the evidence aggregation and conclusion logic.
* **replace_macros_test.cc**: Tests for macro handling during analysis.
* **slot_fingerprint_test.cc**: Tests for the stable fingerprinting of
nullability slots.