From 3312667ec913a598339a2774d66bca667879e699 Mon Sep 17 00:00:00 2001 From: Aditya Zutshi Date: Thu, 24 Apr 2025 16:08:34 -0700 Subject: [PATCH 1/2] edited for clarity and flow. --- docs/getting-started/tutorials/pointers.md | 167 +++++++++++++-------- 1 file changed, 107 insertions(+), 60 deletions(-) diff --git a/docs/getting-started/tutorials/pointers.md b/docs/getting-started/tutorials/pointers.md index 16584a87..188cfa01 100644 --- a/docs/getting-started/tutorials/pointers.md +++ b/docs/getting-started/tutorials/pointers.md @@ -1,16 +1,27 @@ # Working with pointers +{{hidden("AZ: Added a summary paragraph.")}} +This chapter introduces how to write and test specifications for +functions that manipulate memory through pointers. Using CN's +resource-based model, you'll learn the concepts of memory ownership +and how to express it in CN specifications. You'll learn how to use +these concepts to address common pointer related bugs that arise due +to incorrect pointer dereferences `*p`, memory leaks and more. + So far we’ve only considered functions manipulating numeric values. Specifications become more interesting when _pointers_ are involved, because the safety of memory accesses via pointers has to be -taken into account. +taken into account in addition to the values being read or written. -CN uses _separation logic resources_ and the concept of _ownership_ to -reason about memory accesses. A _resource_, intuitively, represents -the permission to access a region of memory. +CN helps here by using the idea of _resources_ and _ownership_, drawn +from separation logic. A _resource_, intuitively, represents the +permission to access a region of memory. We now show how CN uses these + ideas to prevent bugs like reading from uninitialized memory, writing + to overlapping regions, and memory leaks. -Let’s look at a simple example. The function `read` takes an integer -pointer `p` and returns the pointee value. +Let’s look at a simple example. The function `read` takes a pointer +`p` to an unsigned integer and returns the pointee value `*p` - the +value stored at the location pointed to by a pointer. ```c title="exercises/read.c" --8<-- @@ -33,13 +44,15 @@ unsigned int* p = (unsigned int*)(12294ca70); read(p); ``` -For the read `*p` to be safe, we need to know that the function has permission -to access the memory pointed to by `p`. We next explain how to represent this -permission. - -## RW resources +The problem? CN doesn’t know about the ownership of the memory +pointed to by `p`. It assumes nothing is safe unless proven otherwise +— a good assumption when working with pointers in C. In other words, +for the read `*p` to be safe, we need to know that the function has +permission to access the memory pointed to by `p`. We next explain +how to represent this permission. -Given a C-type `T` and pointer `p`, the resource `RW(p)` asserts +## Read-Write(RW) Resources {{todo("AZ: RW is never expanded. I am assuming it is read-write.")}} +Given a pointer `p` to a C-type `T`, the resource `RW(p)` asserts _ownership_ of a memory region at location `p` of the size of the C type `T`. @@ -50,9 +63,9 @@ _ownership_ of a memory region at location `p` of the size of the C type size CN can determine automatically, and I don't have a clear enough picture of what those are...") }} -In this example, we can ensure the safe execution of `read` by adding +We can ensure the safe execution of `read` by adding a precondition that requires ownership of `RW(p)`, as -shown below. (The `take ... =` part will be explained shortly.) Since +shown in the below example. (The `take ... =` part will be explained shortly.) Since reading the pointer does not disturb its value, we can also add a corresponding postcondition, whereby `read` returns ownership of `p` after it is finished executing, in the form of @@ -66,23 +79,36 @@ solutions/read.c This specification can be read as follows: -- any function calling `read` has to be able to provide a resource - `RW(p)` to pass into `read`, and +- any function calling `read` must provide a resource + `RW(p)` to pass into `read` (thus giving permission to + read `*p`), and - the caller will receive back a resource `RW(p)` when - `read` returns. + `read` returns. In other words, the function will return the + permission unchanged when finished. -## Pointee values +## Reasoning About Pointee Values -In addition to reasoning about memory accesed by pointers, we likely also want -to reason about the actual values that the pointers point to. The `take P =` in -the precondition assigns the name `P` to the pointee value of `p`. +In addition to reasoning about memory accessed by pointers, we often +want to reason about the actual values that the pointers point to. +For example, we may want to specify that a function returns the value +pointed to by its input, or that it leaves the value unchanged. -That means we can use the pointee values from the pre- and postcondition to -strengthen the specification of `read`. We add two new postconditions specifying +CN let's us express this by naming the pointee value in both the +precondition and the postcondition. In the above specification of +`read`, the `take P = ...` clause in the precondition assigns the name +`P` to the pointee value of `p`. Once we’ve named the initial value, +we can refer back to it in the postcondition. For instance, in the +read function, we expect it to return the value stored at `*p`, and we +also expect that this value hasn’t changed. We can express both of +these expectations directly: -1. that `read` returns `P`, the initial pointee value of `p`, and -1. that the pointee values `P` and `P_post` before and after execution of `read` (respectively) are the same. +1. `return == P` specifies that `read` returns `P`, the initial + pointee value of `p`, and +2. `P_post == P` specifies that the pointee values `P` and `P_post` + before and after execution of `read` (respectively) are the same. + +We can now strengthen the specification of `read` as shown below. ```c title="exercises/read2.c" --8<-- @@ -111,9 +137,10 @@ exercises/read2.c ### Exercises -_Exercise:_ Write a specification for `double_it`, which takes a pointer `p` and -returns double the pointee value. Running `cn test` on this correct -implementation should succeed, +_Exercise:_ Pointers, Values, and a Bit of Arithmetic +Write a specification for `double_it`, a function which takes a +pointer `p` and returns twice the pointee value. Running `cn test` +on this correct implementation should succeed, ```c title="exercises/double_it.c" --8<-- exercises/double_it.c @@ -136,8 +163,9 @@ should fail. ## Writing through pointers -We next have an example where data is written to a pointer. The function -`incr` takes a pointer `p` and increments the value in the memory cell that it +Now let’s look at a function that modifies memory. Here, we have an +example where data is written to a pointer. The function `incr` takes +a pointer `p` and increments the value in the memory cell that it points to. ```c title="exercises/slf0_basic_incr.c" @@ -146,13 +174,21 @@ exercises/slf0_basic_incr.c --8<-- ``` -The precondition binds the initial pointee value to `P`. The postcondition binds -the value _after_ function execution to `P_post`, and uses this to express that -the value `p` points to is incremented by `incr`: `P_post == P + 1i32`. +In the specification: + +- `take P = RW(p)` in the precondition binds the + initial pointee value to `P`, + +- `take P_post = RW(p);` in the postcondition binds the + pointee value _after_ function execution to `P_post`, and finally, + +- `P_post == P + 1u32` express that the value `p` points to is + incremented by `1`. ### Exercises -_Exercise:_ Write a specification for `inplace_double`, which takes a pointer +_Exercise:_ Specifying a Function that Modifies Memory In-Place +Write a specification for `inplace_double`, which takes a pointer `p` and doubles the pointee value. Make sure your postcondition captures the function's intended behavior. @@ -162,7 +198,7 @@ exercises/slf3_basic_inplace_double.c --8<-- ``` -## No memory leaks +## What Happens If Ownership Isn’t Returned? Understanding Memory Leaks In the specifications we have written so far, functions that receive resources as part of their precondition also return this ownership in their postcondition. @@ -191,18 +227,19 @@ unsigned int* p = (unsigned int*)(p0); read(p); ``` -The error from `cn test` report -tells us (1) in which function the error occurred, (2) what happened -("ownership leaked"), and (3) a failing input — i.e., a snippet of C -code that will construct a heap state on which the test fails in this -way. +The `cn test` error report tells us that the `function read` failed +the postcondition leak check and `ownership leaked` for a pointer. It +also provides a failing input — i.e., a snippet of C code that will +construct a heap state on which the test fails. -What went wrong here is that, given the above specification, `read` -leaks memory: it takes ownership, does not return it, but also does -not deallocate the RW memory or otherwise dispose of it. -CN requires that every resource passed into a function has to be either -_returned_ to the caller or else _destroyed_ by deallocating the RW area of -memory (as we shall see later). +So, what went wrong? CN found that the function `read` leaks memory: +it takes ownership of `p`, but does not return it or deallocate the +memory. + +In CN, **every resource must be accounted for**: every resource passed +into a function has to be either _returned_ to the caller or else +_destroyed_ by explicitly deallocating the RW area of memory (as we +shall see later). CN’s motivation for this choice is its focus on low-level systems software in which memory is managed manually; in this context, memory leaks are typically @@ -217,7 +254,7 @@ one, just like before. But there is an additional twist: simultaneously owning resources for two pointers implies that these pointers refer to _disjoint_ regions of memory. -Consider this example to see when disjointness matters: +Consider this example to see when disjointedness matters: ```c title="exercises/five_six.c" --8<-- @@ -240,16 +277,23 @@ exercises/slf8_basic_transfer.c --8<-- ``` -## Ownership of structured objects +## Ownership of Structured Objects + +So far, we've focused on functions that operate on simple types like +integers and pointers. But real-world C programs often manipulate +compound data types, such as _structs_. CN supports reasoning about +C _structs_ in a way that extends naturally from what we’ve already +seen. -So far, our examples have worked with just integers and pointers, but -larger programs typically also manipulate compound values, often -represented using C `struct`s. Specifying functions manipulating -structs works in much the same way as with basic types. +Specifying functions manipulating _structs_ works in much the +same way as with basic types. Just as we can take ownership of a +single unsigned int value via `RW(p)`, we can also take +ownership of an entire struct using its type - `RW(p)`. -For instance, the following example uses a `struct point` to -represent a point in two-dimensional space. The function `transpose` -swaps a point’s `x` and `y` coordinates. +In the following example, we show how to write CN specifications about +a `struct point` that is used to represent a point in two-dimensional +space. Consider the function `transpose`, which swaps a point’s +`x` and `y` coordinates. ```c title="exercises/transpose.c" --8<-- @@ -257,11 +301,14 @@ exercises/transpose.c --8<-- ``` -Here we have `RW(p)` resources, with the appropriate type of `p` -filled in. Accordingly, `P` and `P_post` are values with type `struct point`, -i.e., they are records with members `x` and `y`. The postcondition asserts the -coordinates have been swapped by referring to the members of `P` and `P_post` -individually. +In the precondition, we use `take P = RW(p)` to require +ownership of the entire memory region pointed to by `p`, and bind the +contents of the region to `P`. Accordingly in the postcondition, `P` +and `P_post` are values with type `struct point`, i.e., they are +records with members `x` and `y`. The postcondition specifies the +return of ownership using `take P_post = RW(p)`. +Furthermore, it specifies the intended effect of the function: that +the `x` and `y` fields have been swapped. {{ later("It would be nice to add an exercise that involves using the error messages to find a bug.") }} From 1b86dbffb78ab3608b75ae357e66b91f07d26aaa Mon Sep 17 00:00:00 2001 From: Aditya Zutshi Date: Thu, 24 Apr 2025 16:14:30 -0700 Subject: [PATCH 2/2] capitalized sub-headings. --- docs/getting-started/tutorials/pointers.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/getting-started/tutorials/pointers.md b/docs/getting-started/tutorials/pointers.md index 188cfa01..056e41cd 100644 --- a/docs/getting-started/tutorials/pointers.md +++ b/docs/getting-started/tutorials/pointers.md @@ -161,7 +161,7 @@ and this incorrect implementation ``` should fail. -## Writing through pointers +## Writing Through Pointers Now let’s look at a function that modifies memory. Here, we have an example where data is written to a pointer. The function `incr` takes @@ -247,7 +247,7 @@ very undesirable. As a consequence, function specifications have to do precise bookkeeping of their resource footprint and, in particular, return any unused resources back to the caller. -## Disjoint memory regions +## Disjoint Memory Regions When functions manipulate multiple pointers, we can assert ownership of each one, just like before. But there is an additional twist: simultaneously owning