diff --git a/nullability/README.md b/nullability/README.md
index b0a122110..643beaef1 100644
--- a/nullability/README.md
+++ b/nullability/README.md
@@ -1,18 +1,90 @@
-# C++ nullability analysis
+# C++ Nullability Analysis

-Annotating C++ API boundaries with nullability information can improve their
-Rust bindings (e.g. binding non-null pointers as `T&` rather than `Option`).
+The `nullability` directory contains static analysis tools that bring
+null-safety to C++. A plain C++ pointer type does not say whether null is a
+valid value; these tools **infer**, **verify**, and **enforce** nullability
+contracts (such as `_Nullable` and `_Nonnull`) that make this explicit.

-This directory has tools for C++ codebases that use such annotations:
+This project is a component of **Crubit**, where it enables the generation of
+safer and more ergonomic Rust bindings. When nullability is explicit, Crubit
+can bind non-null C++ pointers directly to Rust references (`&T`) or smart
+pointers, rather than wrapping them in `Option`.

-- **Nullability inference** suggests annotations to add to APIs, by analyzing
-  the code that implements and uses them.
+The directory provides two main toolsets:

-- **Nullability verification** verifies that annotated APIs are used and
-  implemented safely, e.g. checking nullable pointers before dereferencing them.
-  This is a local analysis suitable for use in a clang-tidy check.
+- **Nullability inference** (`inference/`) suggests annotations to add to
+  existing APIs by analyzing how they are implemented and used across the
+  codebase.

-They use Clang, its [dataflow framework][], and its [nullability annotations][].
+- **Nullability verification** (this directory) checks that annotated APIs
+  are used and implemented safely (e.g., that nullable pointers are checked
+  before they are dereferenced). This is a local, flow-sensitive analysis
+  suitable for use in a `clang-tidy` check.
+
+These tools are built on Clang, its [dataflow framework][], and its
+[nullability annotations][].
+
+## File Overview
+
+### Core Analysis
+
+- **pointer_nullability_analysis.h / .cc**: Implements the dataflow analysis
+  for tracking pointer nullability.
+- **pointer_nullability_diagnosis.h / .cc**: Diagnoses nullability safety
+  violations (e.g., dereferencing nullable pointers) based on the analysis
+  results.
+- **pointer_nullability_lattice.h / .cc**: Defines the lattice (program state)
+  used in the dataflow analysis.
+- **type_transferer.h / .cc**: Handles the propagation of **static,
+  type-based** nullability information. It computes the nullability of each
+  C++ type in the AST (e.g., the nested pointer types in `vector`)
+  flow-insensitively, providing a baseline for the analysis.
+- **value_transferer.h / .cc**: Handles the propagation of **flow-sensitive,
+  value-based** nullability properties. It models how the nullability state of
+  specific pointer values changes at different program points due to control
+  flow, such as becoming "known non-null" after a successful null check or
+  dereference.
+
+### Data Model
+
+- **type_nullability.h / .cc**: Defines the `TypeNullability` model,
+  representing nullability for all pointer "slots" within a complex C++ type.
+- **pointer_nullability.h / .cc**: Extends the dataflow framework's `Value`
+  model to track properties like `is_null` and `from_nullable` for pointer
+  values.
+- **pragma.h / .cc**: Handles `#pragma nullability` directives for setting
+  per-file nullability defaults.
+
+### Utilities and Helpers
+
+- **annotations.h**: Defines string constants containing the literal text of
+  supported nullability attributes (e.g., `_Nullable`, `_Nonnull`) and Abseil
+  macros (e.g., `absl_nullable`).
+- **ast_helpers.h**: Provides helper classes for simplifying access to the
+  Clang AST (e.g., matching parameters and arguments).
+- **forwarding_functions.h / .cc**: Detects and analyzes forwarding functions
+  like `std::make_unique` to improve analysis precision.
+- **loc_filter.h / .cc**: Interface for filtering source locations (e.g.,
+  restricting analysis to specific files).
+- **macro_arg_capture.h**: Constants for capturing arguments passed to
+  internal macros during inference.
+- **pointer_nullability_matchers.h / .cc**: AST matchers for identifying
+  nullability-relevant constructs (pointers, dereferences, smart pointers,
+  etc.).
+- **proto_matchers.h / .cc**: GoogleMock matchers for comparing protocol
+  buffer messages in tests.
+- **type_and_maybe_loc_visitor.h**: A specialized visitor for simultaneously
+  traversing a `Type` and its corresponding `TypeLoc`.
+
+### Subdirectories
+
+- **formal_methods/**: Contains formal specifications or models related to
+  nullability.
+- **google/**: Google-specific regression and crash tests using real-world
+  code snippets.
+- **inference/**: Implementation of the whole-codebase nullability inference
+  system.
+- **test/**: Additional shared testing infrastructure and data.

 ## Style

diff --git a/nullability/inference/README.md b/nullability/inference/README.md
new file mode 100644
index 000000000..972dc50a0
--- /dev/null
+++ b/nullability/inference/README.md
@@ -0,0 +1,86 @@
+# Nullability Inference
+
+## Purpose
+
+This directory contains the implementation of the Crubit nullability inference
+system. The system's goal is to automatically deduce the nullability of
+pointers in C++ declarations (function return types and parameters, fields,
+and global variables) by analyzing their usage patterns across a codebase.
+
+The system follows a distributed "map-reduce" architecture:
+
+1. **Collection (Map Phase):** Local static analysis (using the Clang dataflow
+   framework) examines individual translation units to gather "evidence" of
+   nullability, such as unchecked dereferences (suggesting `Nonnull`) or
+   assignments from `nullptr` (suggesting `Nullable`).
+
+2. **Merging (Reduce Phase):** Evidence from across the entire codebase is
+   aggregated by symbol and slot to form a final "conclusion" (e.g., this
+   parameter is likely `Nonnull`).
+
+3. **Application:** These conclusions can then be propagated back into the
+   source code as nullability annotations.
+
+## File Descriptions
+
+### Core Logic
+
+* **collect_evidence.h / .cc**: Implements the "map" phase. It analyzes ASTs
+  and CFGs to gather local observations (Evidence) about how symbols are used.
+* **merge.h / .cc**: Implements the "reduce" phase. It consolidates Evidence
+  from multiple sources into final nullability conclusions.
+* **inferable.h / .cc**: Defines predicates that determine which C++ symbols
+  and types are eligible targets for inference.
+* **eligible_ranges.h / .cc**: Identifies source code ranges (e.g., the exact
+  location of a `*` in a declaration) where nullability annotations can be
+  inserted.
+* **infer_tu.h / .cc**: Provides a high-level entry point for running the
+  entire inference pipeline on a single translation unit, primarily used for
+  testing and debugging.
+* **infer_tu_main.cc**: A standalone tool that runs single-translation-unit
+  inference on a specified source file.
+
+### Data Models and Utilities
+
+* **inference.proto**: Protocol buffer definitions for core data structures:
+  `Symbol`, `Evidence`, `SlotInference`, and `CFGSummary`.
+* **slot_fingerprint.h / .cc**: Computes stable 64-bit hashes (fingerprints)
+  for individual nullability "slots" (e.g., a specific parameter's pointer
+  type) to identify them across translation units.
+* **usr_cache.h / .cc**: Caches Clang Unified Symbol Resolution (USR) strings
+  to avoid recomputing them.
+* **replace_macros.h / .cc**: Implements a preprocessor-based mechanism to
+  intercept common assertion macros (like `CHECK`, `DCHECK`, and `CHECK_NE`)
+  and wrap them with internal "argument-capture" functions. This allows the
+  inference engine to reliably detect these patterns in the AST and collect
+  evidence (e.g., that a checked pointer is `Nonnull`).
+* **clang_tidy_nullability_replacement_macros.h**: Contains the alternative
+  macro definitions and capture function templates used by `replace_macros`
+  to expose macro-hidden nullability signals to the analysis.
+
+### Build and Infrastructure
+
+* **BUILD**: The Bazel build configuration for the inference library and
+  its associated tools and tests.
+
+### Testing Utilities
+
+* **augmented_test_inputs.h / .cc**: Helpers for creating synthetic C++ code
+  snippets and AST structures for testing the inference engine.
+* **collect_evidence_test_utilities.h / .cc**: Shared infrastructure
+  specifically for unit testing the evidence collection logic.
+* **eligible_ranges_for_test.h**: Simple data structures for identifying
+  pointer ranges in test code.
+
+### Tests
+
+* **collect_evidence_test.cc**: Comprehensive tests for the evidence
+  collection "map" phase.
+* **eligible_ranges_test.cc**: Tests for identifying annotatable source
+  ranges.
+* **infer_tu_test.cc**: Tests for the single-TU inference orchestration.
+* **inferable_test.cc**: Tests for the logic determining what is inferable.
+* **merge_test.cc**: Tests for the evidence aggregation and conclusion logic.
+* **replace_macros_test.cc**: Tests for macro handling during analysis.
+* **slot_fingerprint_test.cc**: Tests for the stable fingerprinting of
+  nullability slots.