@lionel- commented Nov 17, 2025

This PR:

  • Documents what I've learned about source references while working on Ark in a single place.
  • Implements a tree viewer for source reference objects.

The tree viewer is helpful to get a grasp of the complex web of objects created by the parser:

  • "srcref" and "wholeSrcref" attributes
  • "srcref" objects wrapped in a list or unwrapped
  • Multiple classes of "srcfile"
  • Multiple kinds of srcref-bearing objects
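To see these objects without the tree viewer, here is a minimal base R sketch (not code from this PR; the variable names are illustrative). It parses a small function with `keep.source = TRUE` and checks where each attribute ends up.

```r
# Parse a small function definition, keeping source references.
exprs <- parse(text = "f <- function(x) {\n  x + 1\n}", keep.source = TRUE)

# On the parsed expression vector, "srcref" is a list holding one srcref per
# top-level expression; "srcfile" and "wholeSrcref" sit next to it.
names(attributes(exprs))
class(attr(exprs, "srcref"))
class(attr(exprs, "srcfile"))
class(attr(exprs, "wholeSrcref"))

# Evaluating the definition creates a closure whose "srcref" attribute is a
# single, unwrapped srcref object ...
env <- new.env()
eval(exprs[[1]], env)
f <- get("f", envir = env)
class(attr(f, "srcref"))

# ... while the `{` call in its body carries a list of srcrefs again.
class(attr(body(f), "srcref"))
```

The wrapped versus unwrapped distinction shows up directly: the closure holds one srcref, the `{` node holds a list of them.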

Principles of the tree view:

  • To avoid repetition of "srcfile" objects (attached to every "srcref" object), these are given an identifier on first visit and referred to by id on subsequent visits.
  • The node labels indicate how to extract the object (via body(), attr(), or [[).
  • We recurse in a mostly generic way, looking for srcref-like things even where we don't expect them (e.g. srcref or wholeSrcref attributes on lists).
  • parsed is only shown if it differs from location (due to a #line directive).
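The first principle relies on "srcfile" objects being environments, so every srcref from one parse points at the same object. The sketch below illustrates that with base R. `registry` and `srcfile_id()` are hypothetical names used only here and are not the helpers implemented in this PR.

```r
exprs <- parse(text = c("a <- 1", "b <- 2"), keep.source = TRUE)
refs <- attr(exprs, "srcref")

file1 <- attr(refs[[1]], "srcfile")
file2 <- attr(refs[[2]], "srcfile")

# srcfile objects are environments, so they have reference semantics and
# can be compared by identity.
is.environment(file1)
identical(file1, file2)

# Hypothetical registry: hand out an id on first visit, reuse it afterwards.
registry <- new.env()
registry$seen <- list()

srcfile_id <- function(srcfile) {
  for (i in seq_along(registry$seen)) {
    if (identical(registry$seen[[i]], srcfile)) {
      return(sprintf("@%03d", i))
    }
  }
  registry$seen[[length(registry$seen) + 1L]] <- srcfile
  sprintf("@%03d", length(registry$seen))
}

srcfile_id(file1)  # first visit: a new id is assigned
srcfile_id(file2)  # same environment: the same id comes back
```

A linear scan is enough for a sketch; a real viewer would probably want a cheaper lookup when many distinct srcfiles are involved.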

Here is an example of src() showing its own srcrefs tree:

> lobstr::src(lobstr::src)
<closure>
├─attr("srcref"): <srcref>
│ ├─location: 202:8 - 228:1
│ ├─parsed: 788:8 - 814:1
│ └─attr("srcfile"): <srcfilealias> @001
│   ├─filename: "/private/tmp/RtmpPNQes3/R.INSTAL..."
│   └─original: <srcfilecopy> @002
│     ├─Enc: "unknown"
│     ├─filename: "/Users/lionel/R/Library/4.5-aarc..."
│     ├─fixedNewlines: TRUE
│     ├─isFile: TRUE
│     ├─lines<chr [2,049]>: ".packageName...", "#line 1 "/pr...", "#' Find memo...", ...
│     ├─timestamp: "2025-11-17 16:08:54"
│     └─wd: "/private/tmp/RtmpPNQes3/R.INSTAL..."
└─body(): <{>
  ├─attr("srcref"): <list>
  │ ├─[[1]]: <srcref>
  │ │ ├─location: 207:3 - 207:3
  │ │ ├─parsed: 793:3 - 793:3
  │ │ └─attr("srcfile"): @001
  │ ├─[[2]]: <srcref>
  │ │ ├─location: 208:3 - 208:47
  │ │ ├─parsed: 794:3 - 794:47
  │ │ └─attr("srcfile"): @001
  │ ├─[[3]]: <srcref>
  │ │ ├─location: 209:3 - 209:30
  │ │ ├─parsed: 795:3 - 795:30
  │ │ └─attr("srcfile"): @001
  │ ├─[[4]]: <srcref>
  │ │ ├─location: 211:3 - 211:41
  │ │ ├─parsed: 797:3 - 797:41
  │ │ └─attr("srcfile"): @001
  │ ├─[[5]]: <srcref>
  │ │ ├─location: 212:3 - 214:3
  │ │ ├─parsed: 798:3 - 800:3
  │ │ └─attr("srcfile"): @001
  │ ├─[[6]]: <srcref>
  │ │ ├─location: 217:3 - 219:3
  │ │ ├─parsed: 803:3 - 805:3
  │ │ └─attr("srcfile"): @001
  │ └─[[7]]: <srcref>
  │   ├─location: 221:3 - 227:3
  │   ├─parsed: 807:3 - 813:3
  │   └─attr("srcfile"): @001
  ├─attr("srcfile"): @001
  ├─attr("wholeSrcref"): <srcref>
  │ ├─location: 1:0 - 228:1
  │ ├─parsed: 1:0 - 814:1
  │ └─attr("srcfile"): @001
  ├─[[5]][[3]]: <{>
  │ ├─attr("srcref"): <list>
  │ │ ├─[[1]]: <srcref>
  │ │ │ ├─location: 212:24 - 212:24
  │ │ │ ├─parsed: 798:24 - 798:24
  │ │ │ └─attr("srcfile"): @001
  │ │ └─[[2]]: <srcref>
  │ │   ├─location: 213:5 - 213:27
  │ │   ├─parsed: 799:5 - 799:27
  │ │   └─attr("srcfile"): @001
  │ ├─attr("srcfile"): @001
  │ └─attr("wholeSrcref"): <srcref>
  │   ├─location: 1:0 - 214:3
  │   ├─parsed: 1:0 - 800:3
  │   └─attr("srcfile"): @001
  └─[[6]][[3]]: <{>
    ├─attr("srcref"): <list>
    │ ├─[[1]]: <srcref>
    │ │ ├─location: 217:45 - 217:45
    │ │ ├─parsed: 803:45 - 803:45
    │ │ └─attr("srcfile"): @001
    │ └─[[2]]: <srcref>
    │   ├─location: 218:5 - 218:46
    │   ├─parsed: 804:5 - 804:46
    │   └─attr("srcfile"): @001
    ├─attr("srcfile"): @001
    └─attr("wholeSrcref"): <srcref>
      ├─location: 1:0 - 219:3
      ├─parsed: 1:0 - 805:3
      └─attr("srcfile"): @001
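
The location and parsed values differ in the tree above because the installed file contains a `#line` directive. The base R sketch below is one way to reproduce that situation on a small input. The file name "original.R" and the line number 100 are made up for illustration, and the exact values printed may vary by R version.

```r
# A `#line` directive tells the parser to renumber the lines that follow
# and, optionally, to attribute them to another file.
code <- c(
  '#line 100 "original.R"',
  "x <- 1"
)
exprs <- parse(text = code, keep.source = TRUE)
ref <- attr(exprs, "srcref")[[1]]

fields <- c(
  "first_line", "first_byte", "last_line", "last_byte",
  "first_column", "last_column", "first_parsed", "last_parsed"
)

# first_line/last_line reflect the directive, while first_parsed/last_parsed
# keep the physical line numbers of the text that was actually parsed.
setNames(as.integer(ref), fields)

# The srcfile attached to the srcref; with a file name in the directive
# this is expected to be an alias wrapping the original srcfile.
class(attr(ref, "srcfile"))
```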

@lionel- lionel- requested a review from DavisVaughan November 17, 2025 15:48
@DavisVaughan DavisVaughan left a comment

I didn't read the code too closely but I did read the docs closely and I really like having this all in one place!

@lionel- force-pushed the feature/srcref branch 2 times, most recently from be316eb to d3e07a8, November 18, 2025 09:10
@lionel- lionel- requested a review from hadley December 5, 2025 11:03
@hadley hadley left a comment

I think this looks great and you should feel free to merge whenever. But I'm also happy to discuss more 😄

#' @param file Optional file path (default: creates temp file)
#' @return The result of sourcing the code with keep.source = TRUE
#' @noRd
with_srcref <- function(code, env = parent.frame(), file = NULL) {

Member:

Why do you need to save to a file? Doesn't (eval(parse_with_srcref(code))) give you an object with srcref?

Member Author (@lionel-, Jan 29, 2026):

I think I just copied that util from rlang as is. But there are differences between the two approaches: parse() doesn't produce the same kind of srcref as source(). I get expectation failures when I change with_srcref to use parse().
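
For readers who want to compare the two approaches themselves, here is a minimal sketch. `with_srcref_sketch()` is a hypothetical stand-in written for this illustration; it follows the documented signature but is not the rlang or lobstr helper. The comparison prints a few srcfile fields and makes no claim about which difference causes the test failures mentioned above.

```r
# Hypothetical helper: write `code` to a file and source() it, so that the
# objects it defines carry source references pointing at that file.
with_srcref_sketch <- function(code, env = parent.frame(), file = NULL) {
  if (is.null(file)) {
    file <- tempfile(fileext = ".R")
    on.exit(unlink(file), add = TRUE)
  }
  writeLines(code, file)
  source(file, local = env, keep.source = TRUE)
}

# Route 1: source() from a file.
env <- new.env()
with_srcref_sketch("f <- function() 1", env = env)
from_source <- attr(attr(env$f, "srcref"), "srcfile")

# Route 2: parse() from text, then evaluate.
g <- eval(parse(text = "function() 1", keep.source = TRUE)[[1]])
from_parse <- attr(attr(g, "srcref"), "srcfile")

# Compare a few fields of the two srcfile objects.
class(from_source)
class(from_parse)
from_source$filename
from_parse$filename
from_source$isFile
from_parse$isFile
```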


test_that("src() shows quoted function with nested body", {
  expect_snapshot({
    with_srcref("x <- quote(function() {})")

Member:

Do you need the assignment here?

Member Author:

Yes, this gets assigned in the current environment and is inspected just below.

test_that("src() shows closure with srcref and wholeSrcref", {
  expect_snapshot({
    f <- simple_function_with_srcref()
    scrub_src(src(f))

Member:

Can you explain why this works interactively:

f <- function() {
  x + 1 # comment
  {}
}
scrub_src(src(f))

But not

expect_snapshot({
f <- function() {
  x + 1 # comment
  {}
}
scrub_src(src(f))
})

Member Author:

hmm both work for me

lionel- commented Dec 8, 2025

TODO:

  • Add a note that the wholeSrcref attribute on the body of evaluated closures has unreliable start positions, just like for { nodes.

  • Individual srcref columns are right-boundary positions, i.e. for an expression starting at the start of the file, the column will be 1. wholeSrcref, on the other hand, starts at 0, before the first character. It might also end 1 character after the last srcref column.
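
The second item can be checked with a minimal base R sketch (illustrative only, not part of the PR). It compares the start positions of an individual srcref with those of the wholeSrcref for an expression that begins at the very start of the input.

```r
exprs <- parse(text = "x + 1", keep.source = TRUE)

ref <- as.integer(attr(exprs, "srcref")[[1]])
whole <- as.integer(attr(exprs, "wholeSrcref"))

# Element 2 is first_byte and element 5 is first_column.
# The individual srcref starts at 1, on the first character.
ref[c(2, 5)]

# The wholeSrcref starts before the first character.
whole[c(2, 5)]

# End positions (last_line, last_byte, last_column) for comparison.
ref[c(3, 4, 6)]
whole[c(3, 4, 6)]
```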

  • Link to https://journal.r-project.org/articles/RJ-2010-010

@lionel- lionel- requested a review from gaborcsardi January 29, 2026 09:03
@gaborcsardi gaborcsardi left a comment

This is awesome! I didn't look at the code much, only the docs and ran some examples, but the docs are just as valuable as the code, anyway. :)

Left some minor comments about the docs.

#' optionally, the parsed-line numbers if `#line` directives were used.
#'
#' Lengths of 4, 6, or 8 are allowed:
#' - 4: basic (first_line, first_byte, last_line, last_byte)

Member:

This is not too hard to deduce, but maybe worth noting that first_byte and last_byte are within the line. (Right?)
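
A small base R check, offered as an illustration and not as part of the PR, supports that reading: for an expression on the second line, the byte and column values restart from the beginning of that line. The field names are the conventional labels for the eight integers.

```r
# The second expression is indented by two spaces on line 2.
exprs <- parse(text = c("x <- 1", "  y <- 22"), keep.source = TRUE)
ref <- attr(exprs, "srcref")[[2]]

fields <- c(
  "first_line", "first_byte", "last_line", "last_byte",
  "first_column", "last_column", "first_parsed", "last_parsed"
)

# `y` sits at byte 3 of line 2 and the final `2` at byte 9 of line 2,
# so the integer values are 2 3 2 9 3 9 2 2: offsets are within the line,
# not from the start of the text.
setNames(as.integer(ref), fields)
```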

#' there is no support for encodings other than UTF-8.
#'
#' The srcref columns are right-boundary positions, meaning that for an
#' expression starting at the start of the file, the column will be 1. Note that

Member:

You mean at the start of a line? I.e. bytes and columns are counted from the beginning of the line, right?

#' location, for example from a temporary file or generated file to the original
#' location on disk.
#'
#' Called by `install.packages()` when installing a _source_ package with `keep.source.pkgs` set to `TRUE` (see

Member:

"Created" by install.packages()... ?

#' - `Enc`: The encoding of output lines. Used by `getSrcLines()`, which
#' calls `iconv()` when `Enc` does not match `encoding`.
#'
#' - `parseData` (optional): Parser information saved when `keep.source.data` is #' set to `TRUE`.

Member:

Newline needed before the #'

@gaborcsardi

Btw. do you know when the source refs on { are used? E.g. I would think that for printing a function only attr(src, "srcref") is used and not attr(body(src), "srcref"). Why do we need the latter?
