From c4c0cd044c20c2297ef5226d55b43c104c04b827 Mon Sep 17 00:00:00 2001
From: Googler
Date: Fri, 20 Mar 2026 11:06:11 -0700
Subject: [PATCH] [crubit] Add README.md files and expand existing one for
 nullability project.

This CL adds new README.md files to the `inference` and `google` directories to
provide a clear overview of their purposes and file contents. It also expands
the top-level `README.md` in `third_party/crubit/nullability/` with a detailed
enumeration and description of each file in the directory, categorized by their
function (Core Analysis, Data Model, and Utilities).

This improves the project's documentation and helps new contributors understand
the purpose of various components in the nullability analysis system.

PiperOrigin-RevId: 886881529
---
 nullability/README.md           | 92 +++++++++++++++++++++++++++++----
 nullability/inference/README.md | 80 ++++++++++++++++++++++++++++
 2 files changed, 162 insertions(+), 10 deletions(-)
 create mode 100644 nullability/inference/README.md

diff --git a/nullability/README.md b/nullability/README.md
index b0a122110..643beaef1 100644
--- a/nullability/README.md
+++ b/nullability/README.md
@@ -1,18 +1,90 @@
-# C++ nullability analysis
+# C++ Nullability Analysis
 
-Annotating C++ API boundaries with nullability information can improve their
-Rust bindings (e.g. binding non-null pointers as `T&` rather than `Option`).
+The `nullability` directory contains a comprehensive static analysis system for
+bringing robust null-safety to C++. Its primary purpose is to eliminate the
+ambiguity inherent in C++ pointers by providing tools to **infer**, **verify**,
+and **enforce** nullability contracts (such as `_Nullable` and `_Nonnull`).
 
-This directory has tools for C++ codebases that use such annotations:
+This project is a component of **Crubit**, where it enables the generation of
+safer and more ergonomic Rust bindings. By explicitly documenting nullability,
+Crubit can bind non-null C++ pointers directly to Rust references (`&T`) or
+smart pointers, rather than wrapping them in `Option`.
 
--   **Nullability inference** suggests annotations to add to APIs, by analyzing
-    the code that implements and uses them.
+The directory provides two main toolsets:
 
--   **Nullability verification** verifies that annotated APIs are used and
-    implemented safely, e.g. checking nullable pointers before dereferencing them.
-    This is a local analysis suitable for use in a clang-tidy check.
+-   **Nullability inference** (`inference/`) suggests annotations to add to
+    existing APIs by analyzing how they are implemented and used across the
+    codebase.
 
-They use Clang, its [dataflow framework][], and its [nullability annotations][].
+-   **Nullability verification** (this directory) ensures that annotated APIs
+    are used and implemented safely (e.g., checking nullable pointers before
+    dereferencing). This is a local, flow-sensitive analysis suitable for use in
+    `clang-tidy`.
+
+These tools are built on Clang, its [dataflow framework][], and its
+[nullability annotations][].
+
+## File Overview
+
+### Core Analysis
+
+-   **pointer_nullability_analysis.h / .cc**: Implements the dataflow analysis
+    for tracking pointer nullability.
+-   **pointer_nullability_diagnosis.h / .cc**: Diagnoses nullability safety
+    violations (e.g., dereferencing nullable pointers) based on the analysis
+    results.
+-   **pointer_nullability_lattice.h / .cc**: Defines the lattice (program state)
+    used in the dataflow analysis.
+-   **type_transferer.h / .cc**: Handles the propagation of **static,
+    type-based** nullability information. It computes the nullability of each
+    C++ type in the AST (e.g., the nested pointer types in `vector`) in a
+    non-flow-sensitive manner, providing a baseline for the analysis.
+-   **value_transferer.h / .cc**: Handles the propagation of **flow-sensitive,
+    value-based** nullability properties. It models how the nullability state of
+    specific pointer values changes at different program points due to control
+    flow, such as becoming "known non-null" after a successful null check or
+    dereference.
+
+### Data Model
+
+-   **type_nullability.h / .cc**: Defines the `TypeNullability` model,
+    representing nullability for all pointer "slots" within a complex C++ type.
+-   **pointer_nullability.h / .cc**: Extends the dataflow framework's `Value`
+    model to track properties like `is_null` and `from_nullable` for pointer
+    values.
+-   **pragma.h / .cc**: Handles `#pragma nullability` directives for setting
+    per-file nullability defaults.
+
+### Utilities and Helpers
+
+-   **annotations.h**: Defines string constants containing the literal text of
+    supported nullability attributes (e.g., `_Nullable`, `_Nonnull`) and Abseil
+    macros (e.g., `absl_nullable`).
+-   **ast_helpers.h**: Provides helper classes for simplifying access to the
+    Clang AST (e.g., matching parameters and arguments).
+-   **forwarding_functions.h / .cc**: Detects and analyzes forwarding functions
+    like `std::make_unique` to improve analysis precision.
+-   **loc_filter.h / .cc**: Interface for filtering source locations (e.g.,
+    restricting analysis to specific files).
+-   **macro_arg_capture.h**: Constants for capturing arguments passed to
+    internal macros during inference.
+-   **pointer_nullability_matchers.h / .cc**: AST matchers for identifying
+    nullability-relevant constructs (pointers, dereferences, smart pointers,
+    etc.).
+-   **proto_matchers.h / .cc**: GoogleMock matchers for comparing protocol
+    buffer messages in tests.
+-   **type_and_maybe_loc_visitor.h**: A specialized visitor for simultaneously
+    traversing a `Type` and its corresponding `TypeLoc`.
+
+### Subdirectories
+
+-   **formal_methods/**: Contains formal specifications or models related to
+    nullability.
+-   **google/**: Google-specific regression and crash tests using real-world
+    code snippets.
+-   **inference/**: Implementation of the whole-codebase nullability inference
+    system.
+-   **test/**: Additional shared testing infrastructure and data.
 
 ## Style
 
diff --git a/nullability/inference/README.md b/nullability/inference/README.md
new file mode 100644
index 000000000..972dc50a0
--- /dev/null
+++ b/nullability/inference/README.md
@@ -0,0 +1,80 @@
+# Nullability Inference
+
+## Purpose
+
+This directory contains the implementation of the Crubit nullability inference
+system. The system's goal is to automatically deduce the nullability of C++
+pointer-typed symbols (functions, parameters, fields, and global variables) by
+analyzing their usage patterns across a codebase.
+
+The system follows a distributed "map-reduce" architecture:
+
+1. **Collection (Map Phase):** Local static analysis (using the Clang dataflow
+framework) examines individual translation units to gather "evidence" of
+nullability—such as unchecked dereferences (suggesting `Nonnull`) or assignments
+from `nullptr` (suggesting `Nullable`).
+
+2. **Merging (Reduce Phase):** Evidence from across the entire codebase is
+aggregated by symbol and slot to form a final "conclusion" (e.g., this parameter
+is likely `Nonnull`).
+
+3. **Application:** These conclusions can then be propagated back into the
+source code as nullability annotations.
+
+## File Descriptions
+
+### Core Logic
+
+*   **collect_evidence.h / .cc**: Implements the "map" phase. It analyzes ASTs
+    and CFGs to gather local observations (Evidence) about how symbols are used.
+*   **merge.h / .cc**: Implements the "reduce" phase. It consolidates Evidence
+    from multiple sources into final nullability conclusions.
+*   **inferable.h / .cc**: Defines predicates that determine which C++ symbols
+    and types are eligible targets for inference.
+*   **eligible_ranges.h / .cc**: Identifies source code ranges (e.g., the exact
+    location of a `*` in a declaration) where nullability annotations can be
+    inserted.
+*   **infer_tu.h / .cc**: Provides a high-level entry point for running the
+    entire inference pipeline on a single translation unit, primarily used for
+    testing and debugging.
+*   **infer_tu_main.cc**: A standalone tool that runs single-translation-unit
+    inference on a specified source file.
+
+### Data Models and Utilities
+
+*   **inference.proto**: Protocol buffer definitions for core data structures:
+    `Symbol`, `Evidence`, `SlotInference`, and `CFGSummary`.
+*   **slot_fingerprint.h / .cc**: Computes stable 64-bit hashes (fingerprints)
+    for individual nullability "slots" (e.g., a specific parameter's pointer
+    type) to identify them across translation units.
+*   **usr_cache.h / .cc**: Provides performance-optimizing caching for Clang
+    Unified Symbol Resolution (USR) strings.
+*   **replace_macros.h / .cc**: Implements a preprocessor-based mechanism to intercept and wrap common assertion macros (like `CHECK`, `DCHECK`, and `CHECK_NE`) with internal "argument-capture" functions. This allows the inference engine to reliably detect these patterns in the AST and collect evidence (e.g., that a checked pointer is `Nonnull`).
+*   **clang_tidy_nullability_replacement_macros.h**: Contains the alternative macro definitions and capture function templates used by `replace_macros` to expose macro-hidden nullability signals to the analysis.
+
+### Build and Infrastructure
+
+*   **BUILD**: The Bazel build configuration for the inference library and
+    its associated tools and tests.
+
+### Testing Utilities
+
+*   **augmented_test_inputs.h / .cc**: Helpers for creating synthetic C++ code
+    snippets and AST structures for testing the inference engine.
+*   **collect_evidence_test_utilities.h / .cc**: Shared infrastructure
+    specifically for unit testing the evidence collection logic.
+*   **eligible_ranges_for_test.h**: Simple data structures for identifying
+    pointer ranges in test code.
+
+### Tests
+
+*   **collect_evidence_test.cc**: Comprehensive tests for the evidence
+    collection "map" phase.
+*   **eligible_ranges_test.cc**: Tests for identifying annotatable source
+    ranges.
+*   **infer_tu_test.cc**: Tests for the single-TU inference orchestration.
+*   **inferable_test.cc**: Tests for the logic determining what is inferable.
+*   **merge_test.cc**: Tests for the evidence aggregation and conclusion logic.
+*   **replace_macros_test.cc**: Tests for macro handling during analysis.
+*   **slot_fingerprint_test.cc**: Tests for the stable fingerprinting of
+    nullability slots.
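
Reviewer note: the "map-reduce" flow described in the inference README's Purpose section can be illustrated with a small, self-contained sketch of the merge step. This is not Crubit's code. `Evidence`, `EvidenceKind`, `Conclusion`, `SlotKey`, and `MergeEvidence` are hypothetical names made up for illustration (the real data structures are defined in `inference.proto`), and the precedence rule used here (any nullable evidence outweighs non-null evidence) is an assumption, not the documented behavior of `merge.cc`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical evidence kinds, loosely following the two examples in the
// README: an unchecked dereference and an assignment from nullptr.
enum class EvidenceKind { kUncheckedDereference, kAssignedFromNullptr };

// kUnknown is listed first so that a value-initialized Conclusion is kUnknown.
enum class Conclusion { kUnknown, kNonnull, kNullable };

// One observation about one pointer "slot" of one symbol, as a single
// translation unit might report it during the collection ("map") phase.
struct Evidence {
  std::string symbol;  // stand-in for a symbol identifier such as a USR
  int slot;            // slot index within the symbol (numbering is assumed)
  EvidenceKind kind;
};

using SlotKey = std::pair<std::string, int>;

// "Reduce" phase sketch: group evidence by (symbol, slot) and fold each group
// into one conclusion. Assumed rule: nullable evidence always wins; otherwise
// an unchecked dereference suggests non-null.
inline std::map<SlotKey, Conclusion> MergeEvidence(
    const std::vector<Evidence>& all_evidence) {
  std::map<SlotKey, Conclusion> conclusions;
  for (const Evidence& e : all_evidence) {
    assert(e.slot >= 0);  // slots are non-negative indices in this sketch
    SlotKey key(e.symbol, e.slot);
    Conclusion& current = conclusions[key];  // starts as kUnknown
    if (e.kind == EvidenceKind::kAssignedFromNullptr) {
      current = Conclusion::kNullable;
    } else if (current == Conclusion::kUnknown) {
      current = Conclusion::kNonnull;
    }
  }
  return conclusions;
}
```

Because the fold depends only on which kinds of evidence were seen for a slot, not on their order, evidence from different translation units can be collected independently and merged afterwards, which is the property that makes a map-reduce split workable.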