diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index a2a9469..803e6fb 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -8,7 +8,7 @@ Thank you for your interest in contributing to `ClassiPyR`! This document provid
 
 - R (>= 4.0.0)
 - devtools package for development
-- Python with `scipy` (required for .mat file operations)
+- Python with `scipy` (required for saving .mat annotation files)
 
 ### Setting Up the Development Environment
 
@@ -26,7 +26,7 @@ Thank you for your interest in contributing to `ClassiPyR`! This document provid
    devtools::load_all()
    ```
 
-4. Set up Python environment (required for .mat file support):
+4. Set up Python environment (required for saving .mat annotation files):
    ```r
    library(iRfcb)
    ifcb_py_install(envname = "./venv")
diff --git a/DESCRIPTION b/DESCRIPTION
index dfd36e7..1d749c0 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -18,6 +18,7 @@ LazyData: true
 Imports:
     shiny,
     shinyjs,
+    shinyFiles,
     bslib,
     iRfcb,
     dplyr,
diff --git a/NAMESPACE b/NAMESPACE
index 73b4e6e..32f8a83 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -5,17 +5,21 @@ export(create_empty_changes_log)
 export(create_new_classifications)
 export(filter_to_extracted)
 export(get_config_dir)
+export(get_file_index_path)
 export(get_sample_paths)
 export(get_settings_path)
 export(init_python_env)
 export(is_valid_sample_name)
 export(load_class_list)
+export(load_file_index)
 export(load_from_classifier_mat)
 export(load_from_csv)
 export(load_from_mat)
 export(read_roi_dimensions)
+export(rescan_file_index)
 export(run_app)
 export(sanitize_string)
+export(save_file_index)
 export(save_sample_annotations)
 export(save_validation_statistics)
 importFrom(DT,renderDT)
@@ -26,4 +30,5 @@ importFrom(iRfcb,ifcb_get_mat_variable)
 importFrom(jsonlite,fromJSON)
 importFrom(reticulate,py_available)
 importFrom(shiny,shinyApp)
+importFrom(shinyFiles,shinyDirButton)
 importFrom(shinyjs,useShinyjs)
diff --git a/NEWS.md b/NEWS.md
index 74151fe..46b4a5c 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -20,6 +20,15 @@
   - ✎✓ = Has both (can switch between modes)
   - * = Unannotated
 
+### File Index Cache
+- Disk-based file index cache for faster app startup on subsequent launches
+- Avoids expensive recursive directory scans when folder contents haven't changed
+- Sync button in sidebar to manually refresh the file index
+- Cache age indicator shows when folders were last scanned
+- `rescan_file_index()` function for headless use (e.g. cron jobs)
+- Cache stored in platform-appropriate config directory alongside settings
+- Auto-sync option (enabled by default) to control whether the app scans folders on startup
+
 ### Image Gallery
 - Paginated image display (50/100/200/500 images per page)
 - Images grouped by class on consecutive pages for efficient review
@@ -49,12 +58,16 @@
 - Save validation statistics as CSV (in `validation_statistics/` subfolder)
 - Organize output PNGs by class folder (for CNN training)
 - Auto-save when navigating between samples
+- Support for non-standard folder structures via direct ADC path resolution
+- Graceful handling of empty (0-byte) ADC files
 
 ### Settings & Persistence
 - Configurable folder paths via settings modal
+- Cross-platform web-based folder browser (shinyFiles)
 - Settings persisted between sessions (`.classipyr_settings.json`)
 - Class list file path remembered and auto-loaded on startup
 - Annotator name tracking for statistics
+- Cache invalidation when folder paths change in settings
 
 ### User Interface
 - Clean, modern interface using bslib (Flatly theme)
@@ -66,6 +79,7 @@
 - Requires Python with scipy for MAT file writing (optional - only for ifcb-analysis compatibility)
 - Uses iRfcb package for IFCB data handling
 - Session cache preserves work when switching samples
+- File index cache reduces startup time by avoiding redundant folder scans
 - Security: Input validation, XSS prevention, path traversal protection
 
 ## Development
diff --git a/R/run_app.R b/R/run_app.R
index 2a5eb61..7e01aa9 100644
--- a/R/run_app.R
+++ b/R/run_app.R
@@ -4,12 +4,13 @@
 #'
 #' Launches the ClassiPyR Shiny app for manual image classification and validation of IFCB data.
 #' This app relies on the iRfcb package for reading IFCB data files and requires
-#' Python (via reticulate) for reading and writing MATLAB .mat files.
+#' Python (via reticulate) for saving MATLAB .mat files.
 #'
-#' @param venv_path Optional path to a Python virtual environment. If NULL (default),
-#'   the app will use any saved venv path from settings, or fall back to a 'venv'
-#'   folder in the current working directory. Set this to specify a custom location
-#'   for the Python virtual environment used by iRfcb.
+#' @param venv_path Optional path to a Python virtual environment. When specified,
+#'   this path takes priority over any saved venv path in settings, both for Python
+#'   initialization at startup and in the Settings UI. If NULL (default), the app
+#'   uses any saved venv path from settings, or falls back to a 'venv' folder in
+#'   the current working directory.
 #' @param reset_settings If TRUE, deletes saved settings before starting the app.
 #'   Useful for troubleshooting or starting fresh. Default is FALSE.
 #' @param launch.browser If TRUE (default), opens the app in the system's default
diff --git a/R/sample_loading.R b/R/sample_loading.R
index d873b1a..968c9e6 100644
--- a/R/sample_loading.R
+++ b/R/sample_loading.R
@@ -8,23 +8,38 @@ NULL
 #' Reads a classification CSV file and returns a data frame with classifications.
 #' Class names are processed to truncate trailing numbers (matching iRfcb behavior).
 #'
+#' The CSV file must contain the following columns:
+#' \describe{
+#'   \item{file_name}{Image filename including the `.png` extension
+#'     (e.g., `D20230101T120000_IFCB134_00001.png`).}
+#'   \item{class_name}{Predicted class name (e.g., `Diatom`).}
+#' }
+#'
+#' An optional column may also be included:
+#' \describe{
+#'   \item{score}{Classification confidence value between 0 and 1.}
+#' }
+#'
+#' The CSV file must be named after the sample it describes
+#' (e.g., `D20230101T120000_IFCB134.csv`) and placed inside the Classification
+#' Folder configured in the app (subfolders are searched recursively).
+#'
 #' @param csv_path Path to classification CSV file
-#' @return Data frame with classifications (columns depend on CSV content)
+#' @return Data frame with classifications. Expected columns: `file_name`,
+#'   `class_name`, and optionally `score`.
 #' @export
 #' @examples
 #' \dontrun{
 #' # Load classifications from a CSV file
-#' classifications <- load_from_csv("/path/to/classifications.csv")
+#' classifications <- load_from_csv("/path/to/D20230101T120000_IFCB134.csv")
 #' head(classifications)
 #' }
 load_from_csv <- function(csv_path) {
   classifications <- utils::read.csv(csv_path, stringsAsFactors = FALSE)
 
-  # Truncate trailing numbers from class names
-  classifications$class_name <- sapply(
-    classifications$class_name,
-    iRfcb:::truncate_folder_name
-  )
+  # Strip trailing 3-digit suffix from class names (e.g., "Diatom_001" -> "Diatom")
+  # This matches iRfcb behavior where class folders may include numeric suffixes
+  classifications$class_name <- sub("_\\d{3}$", "", classifications$class_name)
 
   classifications
 }
diff --git a/R/sample_saving.R b/R/sample_saving.R
index 48444d1..ee212b6 100644
--- a/R/sample_saving.R
+++ b/R/sample_saving.R
@@ -17,9 +17,12 @@ NULL
 #' @param temp_png_folder Path to temporary folder with extracted PNG images
 #' @param output_folder Output folder path for MAT files
 #' @param png_output_folder PNG output folder path (organized by class)
-#' @param roi_folder ROI folder path (for ADC file location)
+#' @param roi_folder ROI folder path (for ADC file location, used as fallback)
 #' @param class2use_path Path to class2use file
 #' @param annotator Annotator name for statistics
+#' @param adc_folder Direct path to the ADC folder. When provided, this is used
+#'   instead of constructing the path via \code{\link{get_sample_paths}}.
+#'   This supports non-standard folder structures.
 #' @return TRUE on success, FALSE on failure
 #' @export
 #' @examples
@@ -47,7 +50,8 @@ save_sample_annotations <- function(sample_name,
                                      png_output_folder,
                                      roi_folder,
                                      class2use_path,
-                                     annotator = "Unknown") {
+                                     annotator = "Unknown",
+                                     adc_folder = NULL) {
 
   if (is.null(sample_name) || is.null(classifications) || is.null(class2use_path)) {
     return(FALSE)
@@ -84,13 +88,16 @@ save_sample_annotations <- function(sample_name,
       output_folder = png_output_folder
     )
 
-    # Find ADC folder
-    paths <- get_sample_paths(sample_name, roi_folder)
+    # Find ADC folder: use provided path, or fall back to get_sample_paths()
+    if (is.null(adc_folder)) {
+      paths <- get_sample_paths(sample_name, roi_folder)
+      adc_folder <- paths$adc_folder
+    }
 
     # Run annotation - save MAT to output folder directly
     ifcb_annotate_samples(
       png_folder = temp_annotate_folder,
-      adc_folder = paths$adc_folder,
+      adc_folder = adc_folder,
       class2use_file = class2use_path,
       output_folder = output_folder,
       sample_names = sample_name,
diff --git a/R/utils.R b/R/utils.R
index dd55799..315b297 100644
--- a/R/utils.R
+++ b/R/utils.R
@@ -5,6 +5,7 @@
 #' @importFrom iRfcb ifcb_get_mat_variable
 #' @importFrom shiny shinyApp
 #' @importFrom shinyjs useShinyjs
+#' @importFrom shinyFiles shinyDirButton
 #' @importFrom bslib bs_theme
 #' @importFrom DT renderDT
 #' @importFrom jsonlite fromJSON
@@ -51,10 +52,206 @@ get_settings_path <- function() {
   file.path(config_dir, "settings.json")
 }
 
+#' Get path to file index cache
+#'
+#' Returns the path to the file index JSON cache file. The file index
+#' stores scanned folder results to avoid expensive recursive directory
+#' scans on startup.
+#'
+#' @return Path to the file index JSON file
+#' @export
+get_file_index_path <- function() {
+  file.path(get_config_dir(), "file_index.json")
+}
+
+#' Save file index to disk cache
+#'
+#' Writes the file index data to a JSON cache file for fast startup.
+#'
+#' @param data List containing scan results (sample names, path maps, etc.)
+#' @return NULL (called for side effects)
+#' @export
+save_file_index <- function(data) {
+  tryCatch({
+    path <- get_file_index_path()
+    dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
+    jsonlite::write_json(data, path, auto_unbox = TRUE, pretty = TRUE)
+  }, error = function(e) {
+    message("Could not save file index: ", e$message)
+  })
+}
+
+#' Load file index from disk cache
+#'
+#' Reads the cached file index if it exists and is valid JSON.
+#'
+#' @return List with cached data, or NULL if no cache exists or it is invalid
+#' @export
+load_file_index <- function() {
+  path <- get_file_index_path()
+  if (file.exists(path)) {
+    tryCatch(
+      jsonlite::fromJSON(path, simplifyVector = FALSE),
+      error = function(e) {
+        message("Could not load file index: ", e$message)
+        NULL
+      }
+    )
+  } else {
+    NULL
+  }
+}
+
+#' Rescan folders and rebuild the file index cache
+#'
+#' Scans the configured (or specified) ROI, classification, and output folders
+#' for IFCB sample files and saves the results to the file index cache.
+#' This can be called outside the Shiny app, e.g. from a cron job, to keep
+#' the cache up to date without manually clicking the sync button.
+#'
+#' If folder paths are not provided, they are read from saved settings.
+#'
+#' @param roi_folder Path to ROI data folder. If NULL, read from saved settings.
+#' @param csv_folder Path to classification folder (CSV/MAT). If NULL, read from saved settings.
+#' @param output_folder Path to output folder for annotations. If NULL, read from saved settings.
+#' @param verbose If TRUE, print progress messages. Default TRUE.
+#' @return Invisibly returns the file index list, or NULL if roi_folder is invalid or no samples are found.
+#' @export
+#' @examples
+#' \dontrun{
+#' # Rescan using saved settings
+#' rescan_file_index()
+#'
+#' # Rescan with explicit paths
+#' rescan_file_index(
+#'   roi_folder = "/data/ifcb/raw",
+#'   csv_folder = "/data/ifcb/classified",
+#'   output_folder = "/data/ifcb/manual"
+#' )
+#'
+#' # Use in a cron job:
+#' # Rscript -e 'ClassiPyR::rescan_file_index()'
+#' }
+rescan_file_index <- function(roi_folder = NULL, csv_folder = NULL,
+                              output_folder = NULL, verbose = TRUE) {
+  # Read from saved settings if not provided
+  if (is.null(roi_folder) || is.null(csv_folder) || is.null(output_folder)) {
+    settings_path <- get_settings_path()
+    if (file.exists(settings_path)) {
+      saved <- tryCatch(
+        jsonlite::fromJSON(settings_path),
+        error = function(e) list()
+      )
+      if (is.null(roi_folder)) roi_folder <- saved$roi_folder
+      if (is.null(csv_folder)) csv_folder <- saved$csv_folder
+      if (is.null(output_folder)) output_folder <- saved$output_folder
+    }
+  }
+
+  # Validate folders (only the ROI folder is required; the others are optional)
+  roi_valid <- !is.null(roi_folder) && length(roi_folder) == 1 &&
+    !isTRUE(is.na(roi_folder)) && nzchar(roi_folder) && dir.exists(roi_folder)
+  csv_valid <- !is.null(csv_folder) && length(csv_folder) == 1 &&
+    !isTRUE(is.na(csv_folder)) && nzchar(csv_folder) && dir.exists(csv_folder)
+  output_valid <- !is.null(output_folder) && length(output_folder) == 1 &&
+    !isTRUE(is.na(output_folder)) && nzchar(output_folder) && dir.exists(output_folder)
+
+  if (!roi_valid) {
+    if (verbose) message("ROI folder not set or does not exist: ", roi_folder)
+    return(invisible(NULL))
+  }
+
+  # Scan ROI files
+  if (verbose) message("Scanning ROI files in: ", roi_folder)
+  roi_files <- list.files(roi_folder, pattern = "\\.roi$",
+                          recursive = TRUE, full.names = TRUE)
+  sample_names_raw <- tools::file_path_sans_ext(basename(roi_files))
+
+  # Build ROI path map (handle duplicates: keep first occurrence)
+  roi_map <- list()
+  for (i in seq_along(roi_files)) {
+    sn <- sample_names_raw[i]
+    if (is.null(roi_map[[sn]])) {
+      roi_map[[sn]] <- roi_files[i]
+    }
+  }
+  sample_names <- unique(sample_names_raw)
+  if (verbose) message("  Found ", length(sample_names), " samples")
+
+  if (length(sample_names) == 0) {
+    if (verbose) message("No samples found.")
+    return(invisible(NULL))
+  }
+
+  # Scan classification files
+  classified <- character()
+  mat_file_map <- list()
+  csv_map <- list()
+
+  if (csv_valid) {
+    if (verbose) message("Scanning classification files in: ", csv_folder)
+
+    csv_files <- list.files(csv_folder, pattern = "\\.csv$",
+                            recursive = TRUE, full.names = TRUE)
+    csv_sample_names <- tools::file_path_sans_ext(basename(csv_files))
+
+    for (i in seq_along(csv_files)) {
+      sn <- csv_sample_names[i]
+      if (sn %in% sample_names && is.null(csv_map[[sn]])) {
+        csv_map[[sn]] <- csv_files[i]
+      }
+    }
+
+    mat_files <- list.files(csv_folder, pattern = "_class.*\\.mat$",
+                            recursive = TRUE, full.names = TRUE)
+
+    for (mat_file in mat_files) {
+      mat_basename <- basename(mat_file)
+      sample_from_mat <- sub("_class.*\\.mat$", "", mat_basename)
+      if (sample_from_mat %in% sample_names) {
+        mat_file_map[[sample_from_mat]] <- mat_file
+      }
+    }
+
+    mat_samples <- names(mat_file_map)
+    csv_matched <- csv_sample_names[csv_sample_names %in% sample_names]
+    classified <- unique(c(csv_matched, mat_samples))
+    if (verbose) message("  Found ", length(classified), " classified samples")
+  }
+
+  # Scan output folder for manual annotations
+  annotated <- character()
+  if (output_valid) {
+    if (verbose) message("Scanning output folder: ", output_folder)
+    output_mat_files <- list.files(output_folder, pattern = "\\.mat$",
+                                   full.names = FALSE)
+    manual_mat_files <- output_mat_files[!grepl("_class", output_mat_files)]
+    annotated <- tools::file_path_sans_ext(manual_mat_files)
+    annotated <- annotated[annotated %in% sample_names]
+    if (verbose) message("  Found ", length(annotated), " annotated samples")
+  }
+
+  # Save to cache
+  index_data <- list(
+    roi_folder = roi_folder,
+    csv_folder = csv_folder,
+    output_folder = output_folder,
+    sample_names = sample_names,
+    classified_samples = classified,
+    annotated_samples = annotated,
+    roi_path_map = roi_map,
+    csv_path_map = csv_map,
+    classifier_mat_files = mat_file_map,
+    timestamp = as.character(Sys.time())
+  )
+
+  save_file_index(index_data)
+  if (verbose) message("File index saved to: ", get_file_index_path())
+
+  invisible(index_data)
+}
+
 # Constants
-# Cache stores classification metadata (~1.5 MB per sample with 5000 ROIs)
-# 20 samples ≈ 30 MB memory usage - reasonable for most workflows
-MAX_CACHED_SAMPLES <- 20
 VALID_SAMPLE_NAME_PATTERN <- "^D\\d{8}T\\d{6}_IFCB\\d+$"
 
 # Characters unsafe for class names (used in folder names and HTML display):
@@ -127,7 +324,7 @@ sanitize_string <- function(x) {
 #' @export
 #' @examples
 #' \dontrun{
-#' # Load from MATLAB file (requires Python)
+#' # Load from MATLAB file
 #' classes <- load_class_list("/path/to/class2use.mat")
 #'
 #' # Load from text file
@@ -230,10 +427,8 @@ read_roi_dimensions <- function(adc_path) {
     if (!file.exists(adc_path)) {
       stop("ADC file not found: ", adc_path)
     }
-
-    adc_data <- utils::read.csv(adc_path, header = FALSE)
-
-    if (nrow(adc_data) == 0) {
+
+    if (file.info(adc_path)$size == 0) {
       return(data.frame(
         roi_number = integer(),
         width = numeric(),
@@ -242,6 +437,8 @@ read_roi_dimensions <- function(adc_path) {
       ))
     }
 
+    adc_data <- utils::read.csv(adc_path, header = FALSE)
+
     n_rois <- nrow(adc_data)
 
     # IFCB ADC columns: V14=ROIx, V15=ROIy, V16=ROIwidth, V17=ROIheight
@@ -292,8 +489,18 @@ create_empty_changes_log <- function() {
 #' use or create a virtual environment. Required for reading and writing
 #' MATLAB .mat files.
 #'
+#' The resolution order is:
+#' 1. If Python is already configured via reticulate, use it directly
+#'    (installs scipy if missing).
+#' 2. If \code{venv_path} is provided and the virtual environment exists,
+#'    activate it.
+#' 3. If \code{venv_path} is provided but does not exist, create it via
+#'    \code{\link[iRfcb]{ifcb_py_install}}.
+#' 4. If \code{venv_path} is NULL, default to \code{./venv} in the current
+#'    working directory for steps 2--3.
+#'
 #' @param venv_path Optional path to virtual environment. If NULL (default),
-#'   uses a 'venv' folder in the current working directory.
+#'   uses a \code{venv} folder in the current working directory.
 #' @return TRUE if Python is available, FALSE otherwise
 #' @export
 #' @examples
diff --git a/README.md b/README.md
index 29490b7..1624124 100644
--- a/README.md
+++ b/README.md
@@ -22,6 +22,7 @@ A Shiny application for manual (human) image classification and validation of Im
 - **MATLAB Compatible**: Export for [ifcb-analysis](https://github.com/hsosik/ifcb-analysis) toolbox
 - **CNN Training Ready**: Organized PNG output by class
 - **Measure Tool**: Built-in ruler for image measurements
+- **Cross-Platform**: Web-based folder browser works on all platforms
 
 ## Installation
 
@@ -34,11 +35,12 @@ remotes::install_github("EuropeanIFCBGroup/ClassiPyR")
 
 ### Python Setup
 
-Python is required for reading and writing MATLAB .mat files. If you only work with CSV files, this step is optional.
+Python is required for saving annotations as MATLAB .mat files. If you only need to read existing .mat files or work with CSV files, this step is optional.
 
 ```r
 library(iRfcb)
-ifcb_py_install()
+venv_path <- "/path/to/your/venv"
+ifcb_py_install(envname = venv_path)
 ```
 
 ## Quick Start
@@ -46,6 +48,9 @@ ifcb_py_install()
 ```r
 library(ClassiPyR)
 run_app()
+
+# Or specify a Python virtual environment (takes priority over saved settings)
+run_app(venv_path = venv_path)
 ```
 
 See the [Getting Started](https://europeanifcbgroup.github.io/ClassiPyR/articles/getting-started.html) guide for detailed setup instructions.
diff --git a/_pkgdown.yml b/_pkgdown.yml
index 39c9e71..d36c266 100644
--- a/_pkgdown.yml
+++ b/_pkgdown.yml
@@ -45,14 +45,25 @@ reference:
 - title: Sample Loading
   desc: Functions for loading classifications and samples
   contents:
-  - starts_with("load_")
+  - load_class_list
+  - load_from_classifier_mat
+  - load_from_csv
+  - load_from_mat
   - create_new_classifications
   - filter_to_extracted
 - title: Sample Saving
   desc: Functions for saving annotations and exporting images
   contents:
-  - starts_with("save_")
+  - save_sample_annotations
+  - save_validation_statistics
   - copy_images_to_class_folders
+- title: File Index Cache
+  desc: Functions for managing the file index cache for faster startup
+  contents:
+  - get_file_index_path
+  - load_file_index
+  - save_file_index
+  - rescan_file_index
 - title: Utilities
   desc: Helper functions for IFCB data processing
   contents:
diff --git a/data-raw/make_hex_sticker.R b/data-raw/make_hex_sticker.R
index 4adec47..faaf6cf 100644
--- a/data-raw/make_hex_sticker.R
+++ b/data-raw/make_hex_sticker.R
@@ -3,6 +3,7 @@
 
 library(hexSticker)
 library(ggplot2)
+library(magick)
 
 # Create a more realistic centric diatom (frustule-like)
 # Based on Coscinodiscus/Thalassiosira appearance
@@ -78,6 +79,60 @@ create_diatom <- function() {
   return(p)
 }
 
+create_diatom_icon <- function() {
+  
+  outer_circle <- function(n = 64) {
+    theta <- seq(0, 2 * pi, length.out = n)
+    r <- 0.85
+    data.frame(x = r * cos(theta), y = r * sin(theta))
+  }
+  
+  ring <- function(r, n = 64) {
+    theta <- seq(0, 2 * pi, length.out = n)
+    data.frame(x = r * cos(theta), y = r * sin(theta))
+  }
+  
+  ribs <- function(n = 12) {
+    angles <- seq(0, 2 * pi, length.out = n + 1)[-1]
+    data.frame(
+      x = 0, y = 0,
+      xend = 0.82 * cos(angles),
+      yend = 0.82 * sin(angles)
+    )
+  }
+  
+  ggplot() +
+    geom_segment(
+      data = ribs(),
+      aes(x = x, y = y, xend = xend, yend = yend),
+      linewidth = 0.9,
+      color = "#D6ECF8"
+    ) +
+    geom_path(
+      data = ring(0.5),
+      aes(x = x, y = y),
+      linewidth = 0.9,
+      color = "#D6ECF8"
+    ) +
+    geom_point(
+      aes(0, 0),
+      size = 4,
+      color = "#EAF4FB"
+    ) +
+    geom_path(
+      data = outer_circle(),
+      aes(x = x, y = y),
+      linewidth = 1.6,
+      color = "#EAF4FB"
+    ) +
+    coord_fixed(xlim = c(-1.05, 1.05), ylim = c(-1.05, 1.05)) +
+    theme_void() +
+    theme(
+      panel.background = element_rect(fill = "transparent", color = NA),
+      plot.background = element_rect(fill = "transparent", color = NA)
+    )
+}
+
 # Generate the diatom subplot
 diatom_plot <- create_diatom()
 
@@ -121,3 +176,37 @@ svg_sticker <- sticker(
 )
 
 message("Hex sticker saved to man/figures/logo.svg")
+
+icon_diatom <- create_diatom_icon()
+
+sticker(
+  subplot = icon_diatom,
+  package = NULL,
+  s_x = 1,
+  s_y = 1,
+  s_width = 1.5,
+  s_height = 1.5,
+  h_fill = "#1A3A5C",
+  h_color = "#3D8EC9",
+  h_size = 1.4,
+  filename = "man/figures/logo_icon.png",
+  dpi = 1200
+)
+
+message("Icon PNG saved to man/figures/logo_icon.png")
+
+img <- image_read("man/figures/logo_icon.png")
+
+sizes <- c(16, 24, 32, 48, 64, 128, 256)
+
+ico_imgs <- lapply(
+  sizes,
+  function(s) image_resize(img, paste0(s, "x", s))
+)
+
+ico <- image_join(ico_imgs) |>
+  image_convert(format = "ico")
+
+image_write(ico, "man/figures/logo.ico")
+
+message("ICO file saved to man/figures/logo.ico")
\ No newline at end of file
diff --git a/inst/app/global.R b/inst/app/global.R
index 82f0a26..32d00af 100644
--- a/inst/app/global.R
+++ b/inst/app/global.R
@@ -4,22 +4,25 @@
 # Helper functions are loaded from the ClassiPyR package.
 
 # Load required libraries (ClassiPyR imports these)
-library(ClassiPyR)
-library(shiny)
-library(shinyjs)
-library(bslib)
-library(iRfcb)
-library(dplyr)
-library(DT)
-library(jsonlite)
-library(reticulate)
+suppressPackageStartupMessages({
+  library(ClassiPyR)
+  library(shiny)
+  library(shinyjs)
+  library(shinyFiles)
+  library(bslib)
+  library(iRfcb)
+  library(dplyr)
+  library(DT)
+  library(jsonlite)
+  library(reticulate)
+})
 
 # Get version from package
 app_version <- as.character(utils::packageVersion("ClassiPyR"))
 
-# Constants for session cache
-# Cache stores classification metadata (~1.5 MB per sample with 5000 ROIs)
-# 20 samples ≈ 30 MB memory usage - reasonable for most workflows
+# Session cache limit (used in server.R to evict oldest samples)
+# Each cached sample stores classification metadata (~1.5 MB with 5000 ROIs)
+# 20 samples ≈ 30 MB memory usage
 MAX_CACHED_SAMPLES <- 20
 
 # Get Python venv path from: 1) run_app() argument, 2) saved settings, 3) NULL (use default)
@@ -46,5 +49,10 @@ MAX_CACHED_SAMPLES <- 20
 # Initialize Python on app startup with configured venv path
 python_available <- init_python_env(venv_path = .get_venv_path())
 
+# S3 method for dynamic_roots: allows shinyFiles to subscript a function-based
+# roots object. shinyFiles 0.9.3 internally does roots[selectedRoot] without
+# checking if roots is a function, so this class bridges the gap.
+`[.dynamic_roots` <- function(x, i) x()[i]
+
 # App settings
 options(shiny.launch.browser = TRUE)
diff --git a/inst/app/server.R b/inst/app/server.R
index a4f4adc..457964e 100644
--- a/inst/app/server.R
+++ b/inst/app/server.R
@@ -69,17 +69,38 @@ server <- function(input, output, session) {
   # Get user's working directory (captured by run_app() before Shiny changes it)
   startup_wd <- getOption("ClassiPyR.startup_wd", default = getwd())
 
+  # Volumes for shinyFiles directory browser
+  base_volumes <- c("Working Dir" = startup_wd, shinyFiles::getVolumes()())
+
+  # Build volumes with optional "Current" root from a text input path
+  get_browse_volumes <- function(current_path = NULL) {
+    if (!is.null(current_path) && nzchar(current_path) && dir.exists(current_path)) {
+      c("Current" = normalizePath(current_path), base_volumes)
+    } else {
+      base_volumes
+    }
+  }
+
+  # Create a dynamic roots object for shinyDirChoose that reads the current
+
+  # text input value each time the dialog opens or navigates
+  make_dynamic_roots <- function(input_id) {
+    f <- function() get_browse_volumes(input[[input_id]])
+    structure(f, class = c("dynamic_roots", "function"))
+  }
+
   # Load saved settings or use defaults
   load_settings <- function() {
     defaults <- list(
       csv_folder = startup_wd,
       roi_folder = startup_wd,
-      output_folder = file.path(startup_wd, "output"),
-      png_output_folder = file.path(startup_wd, "output", "png"),
+      output_folder = startup_wd,
+      png_output_folder = startup_wd,
       use_threshold = TRUE,
       pixels_per_micron = 3.4,  # IFCB default resolution
+      auto_sync = TRUE,  # Automatically sync folders on startup
       class2use_path = NULL,  # Path to class2use file for auto-loading
-      python_venv_path = NULL  # NULL = use iRfcb default (~/.virtualenvs/iRfcb)
+      python_venv_path = NULL  # NULL = use ./venv in working directory
     )
 
     if (file.exists(settings_file)) {
@@ -117,6 +138,13 @@ server <- function(input, output, session) {
 
   # Initialize config from saved settings
   saved_settings <- load_settings()
+
+  # run_app(venv_path=) takes precedence over saved settings
+  run_app_venv <- getOption("ClassiPyR.venv_path", default = NULL)
+  if (!is.null(run_app_venv) && nzchar(run_app_venv)) {
+    saved_settings$python_venv_path <- run_app_venv
+  }
+
   config <- reactiveValues(
     csv_folder = saved_settings$csv_folder,
     roi_folder = saved_settings$roi_folder,
@@ -124,6 +152,7 @@ server <- function(input, output, session) {
     png_output_folder = saved_settings$png_output_folder,
     use_threshold = saved_settings$use_threshold,
     pixels_per_micron = saved_settings$pixels_per_micron,
+    auto_sync = saved_settings$auto_sync,
     python_venv_path = saved_settings$python_venv_path
   )
 
@@ -141,6 +170,13 @@ server <- function(input, output, session) {
   annotated_samples <- reactiveVal(character())   # Manually annotated (has .mat in output folder)
   # Store mapping of sample names to classifier MAT file paths
   classifier_mat_files <- reactiveVal(list())
+  # Path maps: sample_name -> full file path (discovered during scan)
+  roi_path_map <- reactiveVal(list())
+  csv_path_map <- reactiveVal(list())
+  # Trigger for forcing a folder rescan
+  rescan_trigger <- reactiveVal(0)
+  # Timestamp of last sync (updated after scan completes)
+  last_sync_time <- reactiveVal(NULL)
 
   # Get classes in current classifications that are NOT in class2use
   unmatched_classes <- reactive({
@@ -187,8 +223,8 @@ server <- function(input, output, session) {
         div(style = "flex: 1;",
             textInput("cfg_csv_folder", "Classification Folder (CSV/MAT)",
                       value = config$csv_folder, width = "100%")),
-        actionButton("browse_csv_folder", "Browse", class = "btn-outline-secondary",
-                     style = "margin-bottom: 15px;")
+        shinyDirButton("browse_csv_folder", "Browse", "Select Classification Folder",
+                       class = "btn-outline-secondary", style = "margin-bottom: 15px;")
       ),
 
       # ROI Folder
@@ -197,8 +233,8 @@ server <- function(input, output, session) {
         div(style = "flex: 1;",
             textInput("cfg_roi_folder", "ROI Data Folder",
                       value = config$roi_folder, width = "100%")),
-        actionButton("browse_roi_folder", "Browse", class = "btn-outline-secondary",
-                     style = "margin-bottom: 15px;")
+        shinyDirButton("browse_roi_folder", "Browse", "Select ROI Data Folder",
+                       class = "btn-outline-secondary", style = "margin-bottom: 15px;")
       ),
 
       # Output Folder
@@ -207,8 +243,8 @@ server <- function(input, output, session) {
         div(style = "flex: 1;",
             textInput("cfg_output_folder", "Output Folder (MAT & CSV)",
                       value = config$output_folder, width = "100%")),
-        actionButton("browse_output_folder", "Browse", class = "btn-outline-secondary",
-                     style = "margin-bottom: 15px;")
+        shinyDirButton("browse_output_folder", "Browse", "Select Output Folder",
+                       class = "btn-outline-secondary", style = "margin-bottom: 15px;")
       ),
 
       # PNG Output Folder
@@ -217,12 +253,20 @@ server <- function(input, output, session) {
         div(style = "flex: 1;",
             textInput("cfg_png_output_folder", "PNG Output Folder",
                       value = config$png_output_folder, width = "100%")),
-        actionButton("browse_png_folder", "Browse", class = "btn-outline-secondary",
-                     style = "margin-bottom: 15px;")
+        shinyDirButton("browse_png_folder", "Browse", "Select PNG Output Folder",
+                       class = "btn-outline-secondary", style = "margin-bottom: 15px;")
       ),
 
       hr(),
 
+      # Sync options
+      checkboxInput("cfg_auto_sync", "Sync folders automatically on startup",
+                    value = config$auto_sync),
+      tags$small(class = "text-muted",
+                 "When disabled, the app loads from cache on startup. Use the sync button to update manually."),
+
+      hr(),
+
       # Classifier options
       h5("Classifier Options"),
       checkboxInput("cfg_use_threshold", "Apply classification threshold",
@@ -244,24 +288,6 @@ server <- function(input, output, session) {
 
       hr(),
 
-      # Python environment setting
-      h5("Python Environment"),
-      div(
-        style = "display: flex; gap: 5px; align-items: flex-end; margin-bottom: 10px;",
-        div(style = "flex: 1;",
-            textInput("cfg_python_venv_path", "Virtual Environment Path",
-                      value = ifelse(is.null(config$python_venv_path) || config$python_venv_path == "",
-                                     "", config$python_venv_path),
-                      width = "100%",
-                      placeholder = "Leave empty for default (./venv)")),
-        actionButton("browse_venv_folder", "Browse", class = "btn-outline-secondary",
-                     style = "margin-bottom: 15px;")
-      ),
-      tags$small(class = "text-muted",
-                 "Required for .mat file export. Leave empty to use ./venv in working directory. Changes require app restart."),
-
-      hr(),
-
       # Class list editor button
       div(
         style = "display: flex; align-items: center; gap: 10px;",
@@ -278,51 +304,54 @@ server <- function(input, output, session) {
     ))
   })
 
-  # Browse button handlers using system folder picker
+  # shinyFiles directory browser setup - dynamic roots so the dialog
+  # opens at the path currently typed in the corresponding text field
+  shinyDirChoose(input, "browse_csv_folder",
+    roots = make_dynamic_roots("cfg_csv_folder"), session = session)
+  shinyDirChoose(input, "browse_roi_folder",
+    roots = make_dynamic_roots("cfg_roi_folder"), session = session)
+  shinyDirChoose(input, "browse_output_folder",
+    roots = make_dynamic_roots("cfg_output_folder"), session = session)
+  shinyDirChoose(input, "browse_png_folder",
+    roots = make_dynamic_roots("cfg_png_output_folder"), session = session)
+
+  # Browse button observers - parse selection and update text input
   observeEvent(input$browse_csv_folder, {
-    folder <- tcltk::tk_choose.dir(default = config$csv_folder,
-                                   caption = "Select Classification Folder")
-    if (!is.na(folder) && nzchar(folder)) {
-      updateTextInput(session, "cfg_csv_folder", value = folder)
+    if (!is.integer(input$browse_csv_folder)) {
+      folder <- parseDirPath(get_browse_volumes(input$cfg_csv_folder), input$browse_csv_folder)
+      if (length(folder) > 0) {
+        updateTextInput(session, "cfg_csv_folder", value = as.character(folder))
+      }
     }
   })
 
   observeEvent(input$browse_roi_folder, {
-    folder <- tcltk::tk_choose.dir(default = config$roi_folder,
-                                   caption = "Select ROI Data Folder")
-    if (!is.na(folder) && nzchar(folder)) {
-      updateTextInput(session, "cfg_roi_folder", value = folder)
+    if (!is.integer(input$browse_roi_folder)) {
+      folder <- parseDirPath(get_browse_volumes(input$cfg_roi_folder), input$browse_roi_folder)
+      if (length(folder) > 0) {
+        updateTextInput(session, "cfg_roi_folder", value = as.character(folder))
+      }
     }
   })
 
   observeEvent(input$browse_output_folder, {
-    folder <- tcltk::tk_choose.dir(default = config$output_folder,
-                                   caption = "Select Output Folder")
-    if (!is.na(folder) && nzchar(folder)) {
-      updateTextInput(session, "cfg_output_folder", value = folder)
+    if (!is.integer(input$browse_output_folder)) {
+      folder <- parseDirPath(get_browse_volumes(input$cfg_output_folder), input$browse_output_folder)
+      if (length(folder) > 0) {
+        updateTextInput(session, "cfg_output_folder", value = as.character(folder))
+      }
     }
   })
 
   observeEvent(input$browse_png_folder, {
-    folder <- tcltk::tk_choose.dir(default = config$png_output_folder,
-                                   caption = "Select PNG Output Folder")
-    if (!is.na(folder) && nzchar(folder)) {
-      updateTextInput(session, "cfg_png_output_folder", value = folder)
+    if (!is.integer(input$browse_png_folder)) {
+      folder <- parseDirPath(get_browse_volumes(input$cfg_png_output_folder), input$browse_png_folder)
+      if (length(folder) > 0) {
+        updateTextInput(session, "cfg_png_output_folder", value = as.character(folder))
+      }
     }
   })
 
-  observeEvent(input$browse_venv_folder, {
-    default_path <- if (is.null(config$python_venv_path) || config$python_venv_path == "") {
-      startup_wd
-    } else {
-      config$python_venv_path
-    }
-    folder <- tcltk::tk_choose.dir(default = default_path,
-                                   caption = "Select Python Virtual Environment Folder")
-    if (!is.na(folder) && nzchar(folder)) {
-      updateTextInput(session, "cfg_python_venv_path", value = folder)
-    }
-  })
 
   # Class count display
 
@@ -586,11 +615,10 @@ server <- function(input, output, session) {
     config$png_output_folder <- input$cfg_png_output_folder
     config$use_threshold <- input$cfg_use_threshold
     config$pixels_per_micron <- input$cfg_pixels_per_micron
-    # Store empty string as NULL for venv path
-    venv_path <- input$cfg_python_venv_path
-    config$python_venv_path <- if (is.null(venv_path) || venv_path == "") NULL else venv_path
+    config$auto_sync <- input$cfg_auto_sync
 
     # Persist settings to file for next session
+    # python_venv_path is kept from config (set via run_app() or previous save)
     persist_settings(list(
       csv_folder = input$cfg_csv_folder,
       roi_folder = input$cfg_roi_folder,
@@ -598,16 +626,21 @@ server <- function(input, output, session) {
       png_output_folder = input$cfg_png_output_folder,
       use_threshold = input$cfg_use_threshold,
       pixels_per_micron = input$cfg_pixels_per_micron,
-      class2use_path = rv$class2use_path,  # Persist the class list path for auto-loading
+      auto_sync = input$cfg_auto_sync,
+      class2use_path = rv$class2use_path,
       python_venv_path = config$python_venv_path
     ))
 
     removeModal()
-    showNotification("Settings saved. Restart app for Python environment changes to take effect.", type = "message")
+    showNotification("Settings saved.", type = "message")
 
     # Only trigger sample rescan if folder paths actually changed
     if (paths_changed) {
-      all_samples(character())
+      cache_path <- get_file_index_path()
+      if (file.exists(cache_path)) {
+        file.remove(cache_path)
+      }
+      rescan_trigger(rescan_trigger() + 1)
     }
   })
 
@@ -615,6 +648,29 @@ server <- function(input, output, session) {
   # UI Outputs - Warnings and Indicators
   # ============================================================================
 
+  output$cache_age_text <- renderUI({
+    invalidateLater(60000)
+    ts <- last_sync_time()
+    if (!is.null(ts)) {
+      cache_time <- as.POSIXct(ts)
+      age_secs <- as.numeric(difftime(Sys.time(), cache_time, units = "secs"))
+      age_text <- if (age_secs < 60) {
+        "just now"
+      } else if (age_secs < 3600) {
+        paste0(round(age_secs / 60), " min ago")
+      } else if (age_secs < 86400) {
+        paste0(round(age_secs / 3600), " h ago")
+      } else {
+        paste0(round(age_secs / 86400), " d ago")
+      }
+      div(
+        style = "font-size: 11px; color: #999; margin-bottom: 5px;",
+        icon("clock", style = "margin-right: 3px;"),
+        paste0("Last sync: ", age_text)
+      )
+    }
+  })
+
   output$python_warning <- renderUI({
     if (!python_available) {
       div(
@@ -749,7 +805,12 @@ server <- function(input, output, session) {
     req(rv$current_sample, rv$has_both_modes)
 
     sample_name <- rv$current_sample
-    paths <- get_sample_paths(sample_name, config$roi_folder)
+    roi_path <- roi_path_map()[[sample_name]]
+    if (is.null(roi_path)) {
+      showNotification("ROI file not found for this sample", type = "error")
+      return()
+    }
+    adc_path <- sub("\\.roi$", ".adc", roi_path)
 
     # Find classification source (CSV or classifier MAT)
     csv_path <- find_csv_file(sample_name)
@@ -759,7 +820,7 @@ server <- function(input, output, session) {
       classifications <- load_from_csv(csv_path)
       showNotification("Switched to Validation mode (CSV)", type = "message")
     } else if (!is.null(classifier_mat_path)) {
-      roi_dims <- read_roi_dimensions(paths$adc_path)
+      roi_dims <- read_roi_dimensions(adc_path)
       classifications <- load_from_classifier_mat(
         classifier_mat_path, sample_name, rv$class2use, roi_dims,
         use_threshold = config$use_threshold
@@ -794,11 +855,16 @@ server <- function(input, output, session) {
     req(rv$current_sample, rv$has_both_modes)
 
     sample_name <- rv$current_sample
-    paths <- get_sample_paths(sample_name, config$roi_folder)
+    roi_path <- roi_path_map()[[sample_name]]
+    if (is.null(roi_path)) {
+      showNotification("ROI file not found for this sample", type = "error")
+      return()
+    }
+    adc_path <- sub("\\.roi$", ".adc", roi_path)
     annotation_mat_path <- file.path(config$output_folder, paste0(sample_name, ".mat"))
 
     if (file.exists(annotation_mat_path)) {
-      roi_dims <- read_roi_dimensions(paths$adc_path)
+      roi_dims <- read_roi_dimensions(adc_path)
       classifications <- load_from_mat(annotation_mat_path, sample_name, rv$class2use, roi_dims)
 
       rv$original_classifications <- classifications
@@ -882,10 +948,9 @@ server <- function(input, output, session) {
 
       rv$class2use <- classes
 
-      # Copy to a persistent location in the working directory
-      # This ensures the file survives between sessions
+      # Copy to user config directory so it survives package reinstalls
       ext <- tools::file_ext(input$class2use_file$name)
-      persistent_path <- file.path(getwd(), paste0("class2use_saved.", ext))
+      persistent_path <- file.path(get_config_dir(), paste0("class2use_saved.", ext))
       file.copy(input$class2use_file$datapath, persistent_path, overwrite = TRUE)
       rv$class2use_path <- persistent_path
 
@@ -918,79 +983,97 @@ server <- function(input, output, session) {
   # Sample Discovery and Selection
   # ============================================================================
 
+  # Helper: populate reactive values from file index data
+  populate_from_index <- function(index_data) {
+    sample_names <- as.character(index_data$sample_names)
+    if (length(sample_names) == 0) return(FALSE)
+
+    safe_char <- function(x) as.character(if (is.null(x)) character() else x)
+    safe_list <- function(x) as.list(if (is.null(x)) list() else x)
+
+    all_samples(sample_names)
+    classified_samples(safe_char(index_data$classified_samples))
+    annotated_samples(safe_char(index_data$annotated_samples))
+    roi_path_map(safe_list(index_data$roi_path_map))
+    csv_path_map(safe_list(index_data$csv_path_map))
+    classifier_mat_files(safe_list(index_data$classifier_mat_files))
+
+    years <- unique(substr(sample_names, 2, 5))
+    years <- sort(years)
+    first_year <- if (length(years) > 0) years[1] else "all"
+    updateSelectInput(session, "year_select",
+                      choices = c("All" = "all", setNames(years, years)),
+                      selected = first_year)
+
+    last_sync_time(index_data$timestamp)
+    TRUE
+  }
+
   # Scan for available ROI files and classification files (CSV and MAT)
+  # Uses disk cache for fast startup on subsequent launches
   observe({
+    rescan_trigger()  # Force dependency on rescan trigger
     roi_folder <- config$roi_folder
     csv_folder <- config$csv_folder
+    output_folder <- config$output_folder
 
     # Validate folder paths before using them
     roi_valid <- !is.null(roi_folder) && length(roi_folder) == 1 && !isTRUE(is.na(roi_folder)) && nzchar(roi_folder) && dir.exists(roi_folder)
-    csv_valid <- !is.null(csv_folder) && length(csv_folder) == 1 && !isTRUE(is.na(csv_folder)) && nzchar(csv_folder) && dir.exists(csv_folder)
-
-    if (roi_valid) {
-      roi_files <- list.files(roi_folder, pattern = "\\.roi$",
-                              recursive = TRUE, full.names = FALSE)
-      sample_names <- tools::file_path_sans_ext(basename(roi_files))
-      sample_names <- unique(sample_names)
-
-      if (length(sample_names) > 0) {
-        all_samples(sample_names)
-
-        classified <- character()
-        mat_file_map <- list()
-
-        if (csv_valid) {
-          # Scan for CSV files RECURSIVELY
-          csv_files <- list.files(csv_folder, pattern = "\\.csv$",
-                                  recursive = TRUE, full.names = TRUE)
-          csv_samples <- tools::file_path_sans_ext(basename(csv_files))
-
-          # Scan for classifier MAT files (pattern: samplename_class*.mat)
-          mat_files <- list.files(csv_folder, pattern = "_class.*\\.mat$",
-                                  recursive = TRUE, full.names = TRUE)
-
-          # Extract sample names from MAT files (remove _class* suffix)
-          for (mat_file in mat_files) {
-            mat_basename <- basename(mat_file)
-            # Extract sample name by removing _class*.mat suffix
-            sample_from_mat <- sub("_class.*\\.mat$", "", mat_basename)
-            if (sample_from_mat %in% sample_names) {
-              mat_file_map[[sample_from_mat]] <- mat_file
-            }
-          }
 
-          # Combine CSV samples and MAT samples
-          mat_samples <- names(mat_file_map)
-          classified <- unique(c(csv_samples[csv_samples %in% sample_names], mat_samples))
-        }
+    if (!roi_valid) return()
 
-        # Scan for manually annotated samples (existing .mat files in output folder)
-        # Note: exclude classifier MAT files (which have _class* suffix)
-        annotated <- character()
-        output_folder <- config$output_folder
-        output_valid <- !is.null(output_folder) && length(output_folder) == 1 && !isTRUE(is.na(output_folder)) && nzchar(output_folder) && dir.exists(output_folder)
-        if (output_valid) {
-          output_mat_files <- list.files(output_folder, pattern = "\\.mat$", full.names = FALSE)
-          # Filter out classifier files (have _class in name)
-          manual_mat_files <- output_mat_files[!grepl("_class", output_mat_files)]
-          annotated <- tools::file_path_sans_ext(manual_mat_files)
-          annotated <- annotated[annotated %in% sample_names]
-        }
+    # Try loading from cache first
+    cached <- load_file_index()
+    cache_valid <- !is.null(cached) &&
+      identical(cached$roi_folder, roi_folder) &&
+      identical(cached$csv_folder, csv_folder) &&
+      identical(cached$output_folder, output_folder)
 
-        classified_samples(classified)
-        annotated_samples(annotated)
-        classifier_mat_files(mat_file_map)
+    if (cache_valid) {
+      populate_from_index(cached)
+      return()
+    }
 
-        years <- unique(substr(sample_names, 2, 5))
-        years <- sort(years)
+    # When auto-sync is disabled, load stale cache if available
+    auto_sync <- config$auto_sync
+    if (!isTRUE(auto_sync) && !is.null(cached)) {
+      populate_from_index(cached)
+      return()
+    }
 
-        # Auto-select first year for better UX with large sample lists
-        first_year <- if (length(years) > 0) years[1] else "all"
-        updateSelectInput(session, "year_select",
-                          choices = c("All" = "all", setNames(years, years)),
-                          selected = first_year)
-      }
+    # Full scan with progress indicator (delegates to rescan_file_index)
+    withProgress(message = "Syncing folders...", value = 0, {
+      result <- rescan_file_index(
+        roi_folder = roi_folder,
+        csv_folder = csv_folder,
+        output_folder = output_folder,
+        verbose = FALSE
+      )
+    })
+
+    if (!is.null(result)) {
+      populate_from_index(result)
+    }
+  })
+
+  # Update cache when annotations are saved (so status is correct after restart)
+  observe({
+    annotated <- annotated_samples()
+    cached <- load_file_index()
+    if (!is.null(cached) && !identical(as.character(cached$annotated_samples), annotated)) {
+      cached$annotated_samples <- annotated
+      cached$timestamp <- as.character(Sys.time())
+      save_file_index(cached)
+    }
+  })
+
+  # Rescan button: invalidate cache and trigger fresh scan
+  observeEvent(input$rescan_folders, {
+    cache_path <- get_file_index_path()
+    if (file.exists(cache_path)) {
+      file.remove(cache_path)
     }
+    rescan_trigger(rescan_trigger() + 1)
   })
 
   # Helper function to update month choices based on year selection
@@ -1089,9 +1172,10 @@ server <- function(input, output, session) {
       choices <- character(0)
     }
 
-    # Update sample dropdown
+    # Update sample dropdown with server-side processing for large datasets
     updateSelectizeInput(session, "sample_select", choices = choices,
-                         options = list(placeholder = "Select sample..."))
+                         options = list(placeholder = "Select sample..."),
+                         server = TRUE)
   }
 
   # Simple observeEvent handlers that call the helper functions
@@ -1161,15 +1245,10 @@ server <- function(input, output, session) {
   # ============================================================================
 
   find_csv_file <- function(sample_name) {
-    csv_folder <- config$csv_folder
-    if (!dir.exists(csv_folder)) return(NULL)
-
-    # Search recursively for the CSV file
-    csv_files <- list.files(csv_folder, pattern = paste0("^", sample_name, "\\.csv$"),
-                            recursive = TRUE, full.names = TRUE)
-
-    if (length(csv_files) > 0) {
-      return(csv_files[1])  # Return first match
+    csv_map <- csv_path_map()
+    path <- csv_map[[sample_name]]
+    if (!is.null(path) && file.exists(path)) {
+      return(path)
     }
     return(NULL)
   }
@@ -1207,6 +1286,8 @@ server <- function(input, output, session) {
 
       # Auto-save annotations
       tryCatch({
+        roi_path_for_save <- roi_path_map()[[rv$current_sample]]
+        adc_folder_for_save <- if (!is.null(roi_path_for_save)) dirname(roi_path_for_save) else NULL
         save_sample_annotations(
           sample_name = rv$current_sample,
           classifications = rv$classifications,
@@ -1217,7 +1298,8 @@ server <- function(input, output, session) {
           png_output_folder = config$png_output_folder,
           roi_folder = config$roi_folder,
           class2use_path = rv$class2use_path,
-          annotator = input$annotator_name
+          annotator = input$annotator_name,
+          adc_folder = adc_folder_for_save
         )
       }, error = function(e) {
         showNotification(paste("Auto-save failed:", e$message), type = "error")
@@ -1235,16 +1317,17 @@ server <- function(input, output, session) {
     has_csv <- !is.null(csv_path)
     has_classifier_mat <- !is.null(classifier_mat_path)
 
-    paths <- get_sample_paths(sample_name, config$roi_folder)
-
-    if (!file.exists(paths$roi_path)) {
-      showNotification(paste("ROI file not found:", paths$roi_path), type = "error")
+    # Use discovered paths from scan (supports any folder structure)
+    roi_path <- roi_path_map()[[sample_name]]
+    if (is.null(roi_path) || !file.exists(roi_path)) {
+      showNotification(paste("ROI file not found for:", sample_name), type = "error")
       return(FALSE)
     }
+    adc_path <- sub("\\.roi$", ".adc", roi_path)
 
     # Check session cache first
     if (sample_name %in% names(rv$session_cache)) {
-      return(load_from_cache(sample_name, paths$roi_path))
+      return(load_from_cache(sample_name, roi_path))
     }
 
     tryCatch({
@@ -1259,12 +1342,12 @@ server <- function(input, output, session) {
       # Priority: Manual annotation > Classification > New annotation
       if (has_existing_annotation) {
         # ANNOTATION MODE - from existing manual annotation (priority when both exist)
-        if (!file.exists(paths$adc_path)) {
-          showNotification(paste("ADC file not found:", paths$adc_path), type = "error")
+        if (!file.exists(adc_path)) {
+          showNotification(paste("ADC file not found:", adc_path), type = "error")
           return(FALSE)
         }
 
-        roi_dims <- read_roi_dimensions(paths$adc_path)
+        roi_dims <- read_roi_dimensions(adc_path)
         classifications <- load_from_mat(annotation_mat_path, sample_name, rv$class2use, roi_dims)
         rv$is_annotation_mode <- TRUE
 
@@ -1284,12 +1367,12 @@ server <- function(input, output, session) {
 
       } else if (has_classifier_mat) {
         # VALIDATION MODE - from classifier MAT file
-        if (!file.exists(paths$adc_path)) {
-          showNotification(paste("ADC file not found:", paths$adc_path), type = "error")
+        if (!file.exists(adc_path)) {
+          showNotification(paste("ADC file not found:", adc_path), type = "error")
           return(FALSE)
         }
 
-        roi_dims <- read_roi_dimensions(paths$adc_path)
+        roi_dims <- read_roi_dimensions(adc_path)
         classifications <- load_from_classifier_mat(
           classifier_mat_path, sample_name, rv$class2use, roi_dims,
           use_threshold = config$use_threshold
@@ -1304,12 +1387,12 @@ server <- function(input, output, session) {
 
       } else {
         # NEW ANNOTATION
-        if (!file.exists(paths$adc_path)) {
-          showNotification(paste("ADC file not found:", paths$adc_path), type = "error")
+        if (!file.exists(adc_path)) {
+          showNotification(paste("ADC file not found:", adc_path), type = "error")
           return(FALSE)
         }
 
-        roi_dims <- read_roi_dimensions(paths$adc_path)
+        roi_dims <- read_roi_dimensions(adc_path)
         classifications <- create_new_classifications(sample_name, roi_dims)
         rv$is_annotation_mode <- TRUE
 
@@ -1335,7 +1418,7 @@ server <- function(input, output, session) {
                         choices = c("All" = "all", setNames(available_classes, display_names)))
 
       # Extract images
-      extract_sample_images(sample_name, paths$roi_path, classifications)
+      extract_sample_images(sample_name, roi_path, classifications)
 
       return(TRUE)
 
@@ -1874,12 +1957,17 @@ server <- function(input, output, session) {
         )
       })
 
-      paths <- get_sample_paths(rv$current_sample, config$roi_folder)
+      roi_path <- roi_path_map()[[rv$current_sample]]
+      adc_folder <- if (!is.null(roi_path)) dirname(roi_path) else NULL
+      if (is.null(adc_folder)) {
+        showNotification("Cannot find ROI data folder for this sample", type = "error")
+        return()
+      }
 
       withProgress(message = "Saving MAT file...", {
         result <- ifcb_annotate_samples(
           png_folder = temp_annotate_folder,
-          adc_folder = paths$adc_folder,
+          adc_folder = adc_folder,
           class2use_file = rv$class2use_path,
           output_folder = output_folder,
           sample_names = rv$current_sample,
@@ -2052,7 +2140,12 @@ server <- function(input, output, session) {
     req(rv$current_sample, rv$has_both_modes)
 
     sample_name <- rv$current_sample
-    paths <- get_sample_paths(sample_name, config$roi_folder)
+    roi_path <- roi_path_map()[[sample_name]]
+    if (is.null(roi_path)) {
+      showNotification("ROI file not found for this sample", type = "error")
+      return()
+    }
+    adc_path <- sub("\\.roi$", ".adc", roi_path)
 
     # Find classification source (CSV or classifier MAT)
     csv_path <- find_csv_file(sample_name)
@@ -2062,7 +2155,7 @@ server <- function(input, output, session) {
       classifications <- load_from_csv(csv_path)
       showNotification("Switched to Validation mode (CSV)", type = "message")
     } else if (!is.null(classifier_mat_path)) {
-      roi_dims <- read_roi_dimensions(paths$adc_path)
+      roi_dims <- read_roi_dimensions(adc_path)
       classifications <- load_from_classifier_mat(
         classifier_mat_path, sample_name, rv$class2use, roi_dims,
         use_threshold = config$use_threshold
diff --git a/inst/app/ui.R b/inst/app/ui.R
index 1e7db0d..2f1cd4a 100644
--- a/inst/app/ui.R
+++ b/inst/app/ui.R
@@ -27,6 +27,8 @@ gallery_js <- function() {
       window.wasDragging = false;
       return;
     }
+    // Don't toggle selection when measuring
+    if (measureMode) return;
     var img = $(this).data('img');
     $(this).toggleClass('selected');
     updateCardStyle($(this));
@@ -180,50 +182,58 @@ gallery_js <- function() {
 
     removeMeasureLine();
 
-    var img = $(this);
-    var imgOffset = img.offset();
+    var container = $('.gallery-drag-area');
+    if (container.css('position') === 'static') {
+      container.css('position', 'relative');
+    }
+    var containerOffset = container.offset();
+
+    // Store start position relative to the gallery container
+    var relX = e.pageX - containerOffset.left;
+    var relY = e.pageY - containerOffset.top;
 
     measureStart = {
       x: e.pageX,
       y: e.pageY,
-      imgX: e.pageX - imgOffset.left,
-      imgY: e.pageY - imgOffset.top,
-      img: img
+      relX: relX,
+      relY: relY,
+      containerOffset: containerOffset
     };
 
-    // Create measure line SVG overlay
-    measureLine = $('<svg class=\"measure-line-svg\" style=\"position:fixed;top:0;left:0;width:100%;height:100%;pointer-events:none;z-index:9999;\"><line class=\"measure-line\" stroke=\"#ff0000\" stroke-width=\"2\" stroke-dasharray=\"4,2\"/><circle class=\"measure-start\" r=\"4\" fill=\"#ff0000\"/><circle class=\"measure-end\" r=\"4\" fill=\"#ff0000\"/></svg>');
-    $('body').append(measureLine);
+    // Create SVG overlay inside the gallery container (persists across re-renders)
+    measureLine = $('<svg class=\"measure-line-svg\" style=\"position:absolute;top:0;left:0;width:100%;height:100%;pointer-events:none;z-index:99;overflow:visible;\"><line class=\"measure-line\" stroke=\"#ff0000\" stroke-width=\"2\" stroke-dasharray=\"4,2\"/><circle class=\"measure-start\" r=\"4\" fill=\"#ff0000\"/><circle class=\"measure-end\" r=\"4\" fill=\"#ff0000\"/></svg>');
+    container.append(measureLine);
 
-    measureLine.find('.measure-start').attr('cx', measureStart.x).attr('cy', measureStart.y);
-    measureLine.find('.measure-line').attr('x1', measureStart.x).attr('y1', measureStart.y)
-                                     .attr('x2', measureStart.x).attr('y2', measureStart.y);
-    measureLine.find('.measure-end').attr('cx', measureStart.x).attr('cy', measureStart.y);
+    measureLine.find('.measure-start').attr('cx', relX).attr('cy', relY);
+    measureLine.find('.measure-line').attr('x1', relX).attr('y1', relY)
+                                     .attr('x2', relX).attr('y2', relY);
+    measureLine.find('.measure-end').attr('cx', relX).attr('cy', relY);
   });
 
   // Measure on image - mousemove
   $(document).on('mousemove.measure', function(e) {
     if (!measureMode || !measureStart) return;
 
-    var endX = e.pageX;
-    var endY = e.pageY;
+    // Convert to coordinates relative to the gallery container
+    var endRelX = e.pageX - measureStart.containerOffset.left;
+    var endRelY = e.pageY - measureStart.containerOffset.top;
 
-    measureLine.find('.measure-line').attr('x2', endX).attr('y2', endY);
-    measureLine.find('.measure-end').attr('cx', endX).attr('cy', endY);
+    measureLine.find('.measure-line').attr('x2', endRelX).attr('y2', endRelY);
+    measureLine.find('.measure-end').attr('cx', endRelX).attr('cy', endRelY);
 
-    // Calculate distance and show label
-    var dx = endX - measureStart.x;
-    var dy = endY - measureStart.y;
+    // Calculate distance from page-coordinate deltas (independent of the container offset)
+    var dx = e.pageX - measureStart.x;
+    var dy = e.pageY - measureStart.y;
     var pixelDist = Math.sqrt(dx*dx + dy*dy);
     var microns = pixelDist / pixelsPerMicron;
 
     if (!measureLabel) {
-      measureLabel = $('<div class=\"measure-label\" style=\"position:fixed;background:rgba(0,0,0,0.8);color:white;padding:4px 8px;border-radius:4px;font-size:12px;z-index:10000;pointer-events:none;\"></div>');
-      $('body').append(measureLabel);
+      measureLabel = $('<div class=\"measure-label\" style=\"position:absolute;background:rgba(0,0,0,0.8);color:white;padding:4px 8px;border-radius:4px;font-size:12px;z-index:99;pointer-events:none;\"></div>');
+      $('.gallery-drag-area').append(measureLabel);
     }
 
     measureLabel.text(microns.toFixed(1) + ' µm (' + Math.round(pixelDist) + ' px)');
-    measureLabel.css({left: (endX + 15) + 'px', top: (endY - 10) + 'px'});
+    measureLabel.css({left: (endRelX + 15) + 'px', top: (endRelY - 10) + 'px'});
   });
 
   // Measure on image - mouseup
@@ -237,33 +247,16 @@ gallery_js <- function() {
   // Click anywhere else to clear measurement
   $(document).on('click.measure', function(e) {
     if (!measureMode) return;
-    if (!$(e.target).closest('.image-card img').length && !$(e.target).closest('.measure-label').length) {
+    if (!$(e.target).closest('.image-card').length && !$(e.target).closest('.measure-label').length) {
       // Don't remove if clicking on measure toggle button
       if (!$(e.target).closest('#measure_toggle').length) {
         removeMeasureLine();
       }
     }
   });
-  "
-}
 
-# Custom CSS for warning styling in dropdowns
-warning_css <- "
-/* Style selectize options containing warning symbol (⚠) */
-.selectize-dropdown-content .option:has-text('⚠'),
-.selectize-dropdown .option[data-value*='⚠'] {
-  color: #856404 !important;
-  background-color: #fff3cd !important;
-}
-/* Fallback: style any option starting with warning symbol using attribute selector */
-.selectize-dropdown-content .option {
-  /* Default styling */
-}
-/* Target the selected item in the dropdown that contains warning */
-.selectize-input .item {
-  /* Check content dynamically via JS */
+  "
 }
-"
 
 # UI object
 ui <- page_sidebar(
@@ -423,8 +416,7 @@ ui <- page_sidebar(
     div(class = "sample-dropdown",
         selectizeInput("sample_select", "Sample", choices = NULL, width = "100%",
                        options = list(
-                         placeholder = "Select sample...",
-                         maxOptions = 500
+                         placeholder = "Select sample..."
                        ))),
 
     # Legend for sample status symbols (compact, single line)
@@ -437,7 +429,7 @@ ui <- page_sidebar(
 
     # Navigation buttons
     div(
-      style = "display: flex; gap: 5px; margin-bottom: 15px;",
+      style = "display: flex; gap: 5px; margin-bottom: 5px;",
       actionButton("load_sample", "Load",
                    class = "btn-primary", style = "flex: 1;"),
       actionButton("prev_sample", label = icon("arrow-left"),
@@ -448,9 +440,15 @@ ui <- page_sidebar(
                    title = "Next sample"),
       actionButton("random_sample", label = icon("random"),
                    class = "btn-outline-secondary", style = "flex: 0;",
-                   title = "Random sample")
+                   title = "Random sample"),
+      actionButton("rescan_folders", label = icon("sync"),
+                   class = "btn-outline-secondary", style = "flex: 0;",
+                   title = "Sync folders (refresh file index)")
     ),
 
+    # Cache age indicator
+    uiOutput("cache_age_text"),
+
     hr(),
 
     # Save button (prominent)
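
Review note on the measure-tool changes in `inst/app/ui.R`: the new handlers draw the overlay in coordinates relative to `.gallery-drag-area`, while the distance is still computed from page-coordinate deltas. The sketch below isolates that arithmetic as pure functions so it can be checked without a browser. The helper names (`toContainerCoords`, `measureDistance`, `formatMeasureLabel`) are hypothetical and do not appear in the patch; only the formulas are taken from the `mousedown` and `mousemove` handlers.

```javascript
// Hypothetical helpers mirroring the arithmetic in the patched handlers.
// They are illustrative only and are not part of the patch.

// Convert a page position to coordinates relative to the gallery container,
// as done for relX/relY and endRelX/endRelY.
function toContainerCoords(pageX, pageY, containerOffset) {
  return {
    x: pageX - containerOffset.left,
    y: pageY - containerOffset.top
  };
}

// Distance between two page positions in pixels and microns.
// Both points share the same container offset, so the offset cancels out
// and page-coordinate deltas give the same result as relative ones.
function measureDistance(start, end, pixelsPerMicron) {
  var dx = end.pageX - start.pageX;
  var dy = end.pageY - start.pageY;
  var pixelDist = Math.sqrt(dx * dx + dy * dy);
  return { pixelDist: pixelDist, microns: pixelDist / pixelsPerMicron };
}

// Label text in the same format the patch writes into .measure-label.
function formatMeasureLabel(d) {
  return d.microns.toFixed(1) + ' µm (' + Math.round(d.pixelDist) + ' px)';
}
```

One caveat worth checking in review: the handlers cache `containerOffset` at `mousedown`, so this equivalence assumes the container does not move relative to the page during a drag.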
diff --git a/man/figures/interface-overview.png b/man/figures/interface-overview.png
index d245efe..ca6f82d 100644
Binary files a/man/figures/interface-overview.png and b/man/figures/interface-overview.png differ
diff --git a/man/figures/logo.ico b/man/figures/logo.ico
new file mode 100644
index 0000000..79e004a
Binary files /dev/null and b/man/figures/logo.ico differ
diff --git a/man/figures/logo_icon.png b/man/figures/logo_icon.png
new file mode 100644
index 0000000..e034f88
Binary files /dev/null and b/man/figures/logo_icon.png differ
diff --git a/man/figures/settings-dialog.png b/man/figures/settings-dialog.png
index 73030c6..ca617f5 100644
Binary files a/man/figures/settings-dialog.png and b/man/figures/settings-dialog.png differ
diff --git a/man/get_file_index_path.Rd b/man/get_file_index_path.Rd
new file mode 100644
index 0000000..9eb7f0c
--- /dev/null
+++ b/man/get_file_index_path.Rd
@@ -0,0 +1,16 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/utils.R
+\name{get_file_index_path}
+\alias{get_file_index_path}
+\title{Get path to file index cache}
+\usage{
+get_file_index_path()
+}
+\value{
+Path to the file index JSON file
+}
+\description{
+Returns the path to the file index JSON cache file. The file index
+stores scanned folder results to avoid expensive recursive directory
+scans on startup.
+}
diff --git a/man/init_python_env.Rd b/man/init_python_env.Rd
index d0001ca..25033b1 100644
--- a/man/init_python_env.Rd
+++ b/man/init_python_env.Rd
@@ -8,7 +8,7 @@ init_python_env(venv_path = NULL)
 }
 \arguments{
 \item{venv_path}{Optional path to virtual environment. If NULL (default),
-uses a 'venv' folder in the current working directory.}
+uses a \code{venv} folder in the current working directory.}
 }
 \value{
 TRUE if Python is available, FALSE otherwise
@@ -18,6 +18,17 @@ Checks if Python is already available via reticulate, otherwise tries to
 use or create a virtual environment. Required for reading and writing
 MATLAB .mat files.
 }
+\details{
+The resolution order is:
+1. If Python is already configured via reticulate, use it directly
+   (installs scipy if missing).
+2. If \code{venv_path} is provided and the virtual environment exists,
+   activate it.
+3. If \code{venv_path} is provided but does not exist, create it via
+   \code{\link[iRfcb]{ifcb_py_install}}.
+4. If \code{venv_path} is NULL, default to \code{./venv} in the current
+   working directory for steps 2--3.
+}
 \examples{
 \dontrun{
 # Initialize with default venv path (./venv)
diff --git a/man/load_class_list.Rd b/man/load_class_list.Rd
index 8cbea54..dca371e 100644
--- a/man/load_class_list.Rd
+++ b/man/load_class_list.Rd
@@ -19,7 +19,7 @@ for safe use in file paths and HTML.
 }
 \examples{
 \dontrun{
-# Load from MATLAB file (requires Python)
+# Load from MATLAB file
 classes <- load_class_list("/path/to/class2use.mat")
 
 # Load from text file
diff --git a/man/load_file_index.Rd b/man/load_file_index.Rd
new file mode 100644
index 0000000..47aed1b
--- /dev/null
+++ b/man/load_file_index.Rd
@@ -0,0 +1,14 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/utils.R
+\name{load_file_index}
+\alias{load_file_index}
+\title{Load file index from disk cache}
+\usage{
+load_file_index()
+}
+\value{
+List with cached data, or NULL if no cache exists or it is invalid
+}
+\description{
+Reads the cached file index if it exists and is valid JSON.
+}
diff --git a/man/load_from_csv.Rd b/man/load_from_csv.Rd
index 076774b..774dadd 100644
--- a/man/load_from_csv.Rd
+++ b/man/load_from_csv.Rd
@@ -10,16 +10,34 @@ load_from_csv(csv_path)
 \item{csv_path}{Path to classification CSV file}
 }
 \value{
-Data frame with classifications (columns depend on CSV content)
+Data frame with classifications. Expected columns: \code{file_name},
+  \code{class_name}, and optionally \code{score}.
 }
 \description{
 Reads a classification CSV file and returns a data frame with classifications.
 Class names are processed to truncate trailing numbers (matching iRfcb behavior).
 }
+\details{
+The CSV file must contain the following columns:
+\describe{
+  \item{file_name}{Image filename including the \code{.png} extension
+    (e.g., \code{D20230101T120000_IFCB134_00001.png}).}
+  \item{class_name}{Predicted class name (e.g., \code{Diatom}).}
+}
+
+An optional column may also be included:
+\describe{
+  \item{score}{Classification confidence value between 0 and 1.}
+}
+
+The CSV file must be named after the sample it describes
+(e.g., \code{D20230101T120000_IFCB134.csv}) and placed inside the Classification
+Folder configured in the app (subfolders are searched recursively).
+}
 \examples{
 \dontrun{
 # Load classifications from a CSV file
-classifications <- load_from_csv("/path/to/classifications.csv")
+classifications <- load_from_csv("/path/to/D20230101T120000_IFCB134.csv")
 head(classifications)
 }
 }
diff --git a/man/rescan_file_index.Rd b/man/rescan_file_index.Rd
new file mode 100644
index 0000000..8b2018d
--- /dev/null
+++ b/man/rescan_file_index.Rd
@@ -0,0 +1,50 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/utils.R
+\name{rescan_file_index}
+\alias{rescan_file_index}
+\title{Rescan folders and rebuild the file index cache}
+\usage{
+rescan_file_index(
+  roi_folder = NULL,
+  csv_folder = NULL,
+  output_folder = NULL,
+  verbose = TRUE
+)
+}
+\arguments{
+\item{roi_folder}{Path to ROI data folder. If NULL, read from saved settings.}
+
+\item{csv_folder}{Path to classification folder (CSV/MAT). If NULL, read from saved settings.}
+
+\item{output_folder}{Path to output folder for annotations. If NULL, read from saved settings.}
+
+\item{verbose}{If TRUE, print progress messages. Default TRUE.}
+}
+\value{
+Invisibly returns the file index list, or NULL if roi_folder is invalid.
+}
+\description{
+Scans the configured (or specified) ROI, classification, and output folders
+for IFCB sample files and saves the results to the file index cache.
+This can be called outside the Shiny app, e.g. from a cron job, to keep
+the cache up to date without manually clicking the rescan button.
+}
+\details{
+If folder paths are not provided, they are read from saved settings.
+}
+\examples{
+\dontrun{
+# Rescan using saved settings
+rescan_file_index()
+
+# Rescan with explicit paths
+rescan_file_index(
+  roi_folder = "/data/ifcb/raw",
+  csv_folder = "/data/ifcb/classified",
+  output_folder = "/data/ifcb/manual"
+)
+
+# Use in a cron job:
+# Rscript -e 'ClassiPyR::rescan_file_index()'
+}
+}
diff --git a/man/run_app.Rd b/man/run_app.Rd
index 5e7e70d..50c9f2e 100644
--- a/man/run_app.Rd
+++ b/man/run_app.Rd
@@ -7,10 +7,11 @@
 run_app(venv_path = NULL, reset_settings = FALSE, launch.browser = TRUE, ...)
 }
 \arguments{
-\item{venv_path}{Optional path to a Python virtual environment. If NULL (default),
-the app will use any saved venv path from settings, or fall back to a 'venv'
-folder in the current working directory. Set this to specify a custom location
-for the Python virtual environment used by iRfcb.}
+\item{venv_path}{Optional path to a Python virtual environment. When specified,
+this path takes priority over any saved venv path in settings, both for Python
+initialization at startup and in the Settings UI. If NULL (default), the app
+uses any saved venv path from settings, or falls back to a 'venv' folder in
+the current working directory.}
 
 \item{reset_settings}{If TRUE, deletes saved settings before starting the app.
 Useful for troubleshooting or starting fresh. Default is FALSE.}
@@ -27,7 +28,7 @@ This function does not return; it runs the Shiny app
 \description{
 Launches the ClassiPyR Shiny app for manual image classification and validation of IFCB data.
 This app relies on the iRfcb package for reading IFCB data files and requires
-Python (via reticulate) for reading and writing MATLAB .mat files.
+Python (via reticulate) for saving MATLAB .mat files.
 }
 \examples{
 \dontrun{
diff --git a/man/save_file_index.Rd b/man/save_file_index.Rd
new file mode 100644
index 0000000..4b6570c
--- /dev/null
+++ b/man/save_file_index.Rd
@@ -0,0 +1,17 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/utils.R
+\name{save_file_index}
+\alias{save_file_index}
+\title{Save file index to disk cache}
+\usage{
+save_file_index(data)
+}
+\arguments{
+\item{data}{List containing scan results (sample names, path maps, etc.)}
+}
+\value{
+NULL (called for side effects)
+}
+\description{
+Writes the file index data to a JSON cache file for fast startup.
+}
diff --git a/man/save_sample_annotations.Rd b/man/save_sample_annotations.Rd
index ac68c0e..ac00201 100644
--- a/man/save_sample_annotations.Rd
+++ b/man/save_sample_annotations.Rd
@@ -14,7 +14,8 @@ save_sample_annotations(
   png_output_folder,
   roi_folder,
   class2use_path,
-  annotator = "Unknown"
+  annotator = "Unknown",
+  adc_folder = NULL
 )
 }
 \arguments{
@@ -32,11 +33,15 @@ save_sample_annotations(
 
 \item{png_output_folder}{PNG output folder path (organized by class)}
 
-\item{roi_folder}{ROI folder path (for ADC file location)}
+\item{roi_folder}{ROI folder path (for ADC file location, used as fallback)}
 
 \item{class2use_path}{Path to class2use file}
 
 \item{annotator}{Annotator name for statistics}
+
+\item{adc_folder}{Direct path to the ADC folder. When provided, this is used
+instead of constructing the path via \code{\link{get_sample_paths}}.
+This supports non-standard folder structures.}
 }
 \value{
 TRUE on success, FALSE on failure
diff --git a/tests/testthat/test-app.R b/tests/testthat/test-app.R
index 13734fa..ea13464 100644
--- a/tests/testthat/test-app.R
+++ b/tests/testthat/test-app.R
@@ -37,6 +37,7 @@ test_that("required packages are listed in DESCRIPTION", {
 
   expect_true(grepl("shiny", imports))
   expect_true(grepl("shinyjs", imports))
+  expect_true(grepl("shinyFiles", imports))
   expect_true(grepl("bslib", imports))
   expect_true(grepl("iRfcb", imports))
   expect_true(grepl("dplyr", imports))
@@ -80,3 +81,8 @@ test_that("app server function can be created without errors", {
   # Verify server is a function
   expect_true(is.function(app_env$server))
 })
+
+test_that("run_app errors for non-existent app directory", {
+  expect_error(run_app(appDir = "not_an_app_dir"),
+               "No Shiny application exists at the path")
+})
diff --git a/tests/testthat/test-sample_saving.R b/tests/testthat/test-sample_saving.R
index 2f7512b..391252e 100644
--- a/tests/testthat/test-sample_saving.R
+++ b/tests/testthat/test-sample_saving.R
@@ -1,315 +1,341 @@
-# Tests for sample saving functions
-
-library(testthat)
-
-test_that("copy_images_to_class_folders creates correct folder structure", {
-  # Create temp source folder with images
-  src_folder <- tempfile("src_")
-  dir.create(src_folder)
-  file.create(file.path(src_folder, "sample_00001.png"))
-  file.create(file.path(src_folder, "sample_00002.png"))
-  file.create(file.path(src_folder, "sample_00003.png"))
-
-  # Create temp and output folders
-  temp_folder <- tempfile("temp_")
-  output_folder <- tempfile("output_")
-
-  # Create classifications
-  classifications <- data.frame(
-    file_name = c("sample_00001.png", "sample_00002.png", "sample_00003.png"),
-    class_name = c("Diatom", "Ciliate", "Diatom"),
-    stringsAsFactors = FALSE
-  )
-
-  # Run copy
-  copy_images_to_class_folders(classifications, src_folder, temp_folder, output_folder)
-
-  # Check temp folder structure (for ifcb_annotate_samples)
-  expect_true(dir.exists(file.path(temp_folder, "Diatom")))
-  expect_true(dir.exists(file.path(temp_folder, "Ciliate")))
-  expect_true(file.exists(file.path(temp_folder, "Diatom", "sample_00001.png")))
-  expect_true(file.exists(file.path(temp_folder, "Diatom", "sample_00003.png")))
-  expect_true(file.exists(file.path(temp_folder, "Ciliate", "sample_00002.png")))
-
-  # Check output folder structure (permanent storage)
-  expect_true(dir.exists(file.path(output_folder, "Diatom")))
-  expect_true(dir.exists(file.path(output_folder, "Ciliate")))
-  expect_true(file.exists(file.path(output_folder, "Diatom", "sample_00001.png")))
-
-  # Cleanup
-  unlink(src_folder, recursive = TRUE)
-  unlink(temp_folder, recursive = TRUE)
-  unlink(output_folder, recursive = TRUE)
-})
-
-test_that("copy_images_to_class_folders handles missing source files gracefully", {
-  # Create temp folders (empty src)
-  src_folder <- tempfile("src_")
-  temp_folder <- tempfile("temp_")
-  output_folder <- tempfile("output_")
-  dir.create(src_folder)
-
-  classifications <- data.frame(
-    file_name = c("nonexistent.png"),
-    class_name = c("Diatom"),
-    stringsAsFactors = FALSE
-  )
-
-  # Should not error, just skip missing files
-  expect_no_error(
-    copy_images_to_class_folders(classifications, src_folder, temp_folder, output_folder)
-  )
-
-  # Folder may or may not be created depending on implementation
-  # But definitely no files should exist
-  if (dir.exists(file.path(temp_folder, "Diatom"))) {
-    expect_equal(
-      length(list.files(file.path(temp_folder, "Diatom"))),
-      0
-    )
-  }
-
-  # Cleanup
-  unlink(src_folder, recursive = TRUE)
-  unlink(temp_folder, recursive = TRUE)
-  unlink(output_folder, recursive = TRUE)
-})
-
-test_that("save_validation_statistics creates correct CSV files", {
-  skip_if_not_installed("dplyr")
-
-  sample_name <- "D20230314T001205_IFCB134"
-  stats_folder <- tempfile("stats_")
-  dir.create(stats_folder)
-
-  original_classifications <- data.frame(
-    file_name = c("sample_00001.png", "sample_00002.png", "sample_00003.png"),
-    class_name = c("Diatom", "Ciliate", "Diatom"),
-    score = c(0.95, 0.87, 0.92),
-    stringsAsFactors = FALSE
-  )
-
-  # Current classifications with one change
-  current_classifications <- data.frame(
-    file_name = c("sample_00001.png", "sample_00002.png", "sample_00003.png"),
-    class_name = c("Diatom", "Dinoflagellate", "Diatom"),  # Ciliate -> Dinoflagellate
-    stringsAsFactors = FALSE
-  )
-
-  save_validation_statistics(
-    sample_name = sample_name,
-    classifications = current_classifications,
-    original_classifications = original_classifications,
-    stats_folder = stats_folder,
-    annotator = "TestUser"
-  )
-
-  # Check summary stats file
-  stats_file <- file.path(stats_folder, paste0(sample_name, "_validation_stats.csv"))
-  expect_true(file.exists(stats_file))
-
-  stats <- read.csv(stats_file)
-  expect_equal(stats$sample, sample_name)
-  expect_equal(stats$annotator, "TestUser")
-  expect_equal(stats$total_images, 3)
-  expect_equal(stats$correct_classifications, 2)
-  expect_equal(stats$incorrect_classifications, 1)
-  expect_equal(stats$accuracy, 2/3, tolerance = 0.001)
-
-  # Check detailed stats file
-  detailed_file <- file.path(stats_folder, paste0(sample_name, "_validation_detailed.csv"))
-  expect_true(file.exists(detailed_file))
-
-  detailed <- read.csv(detailed_file)
-  expect_equal(nrow(detailed), 3)
-  expect_true("correct" %in% names(detailed))
-  expect_true("annotator" %in% names(detailed))
-
-  # Cleanup
-  unlink(stats_folder, recursive = TRUE)
-})
-
-test_that("save_validation_statistics handles all correct classifications", {
-  skip_if_not_installed("dplyr")
-
-  sample_name <- "D20230314T001205_IFCB134"
-  stats_folder <- tempfile("stats_")
-  dir.create(stats_folder)
-
-  # All classifications are correct (no changes)
-  classifications <- data.frame(
-    file_name = c("sample_00001.png", "sample_00002.png"),
-    class_name = c("Diatom", "Ciliate"),
-    score = c(0.95, 0.87),
-    stringsAsFactors = FALSE
-  )
-
-  save_validation_statistics(
-    sample_name = sample_name,
-    classifications = classifications,
-    original_classifications = classifications,
-    stats_folder = stats_folder,
-    annotator = "TestUser"
-  )
-
-  # Check 100% accuracy
-  stats_file <- file.path(stats_folder, paste0(sample_name, "_validation_stats.csv"))
-  stats <- read.csv(stats_file)
-  expect_equal(stats$accuracy, 1.0)
-  expect_equal(stats$correct_classifications, 2)
-  expect_equal(stats$incorrect_classifications, 0)
-
-  # Cleanup
-  unlink(stats_folder, recursive = TRUE)
-})
-
-test_that("save_sample_annotations returns FALSE for NULL inputs", {
-  expect_false(save_sample_annotations(
-    sample_name = NULL,
-    classifications = data.frame(),
-    original_classifications = data.frame(),
-    changes_log = data.frame(image = "x", original_class = "a", new_class = "b"),
-    temp_png_folder = tempdir(),
-    output_folder = tempdir(),
-    png_output_folder = tempdir(),
-    roi_folder = tempdir(),
-    class2use_path = "/tmp/class2use.txt"
-  ))
-
-  expect_false(save_sample_annotations(
-    sample_name = "D20230314T001205_IFCB134",
-    classifications = NULL,
-    original_classifications = data.frame(),
-    changes_log = data.frame(image = "x", original_class = "a", new_class = "b"),
-    temp_png_folder = tempdir(),
-    output_folder = tempdir(),
-    png_output_folder = tempdir(),
-    roi_folder = tempdir(),
-    class2use_path = "/tmp/class2use.txt"
-  ))
-
-  expect_false(save_sample_annotations(
-    sample_name = "D20230314T001205_IFCB134",
-    classifications = data.frame(),
-    original_classifications = data.frame(),
-    changes_log = data.frame(image = "x", original_class = "a", new_class = "b"),
-    temp_png_folder = tempdir(),
-    output_folder = tempdir(),
-    png_output_folder = tempdir(),
-    roi_folder = tempdir(),
-    class2use_path = NULL
-  ))
-})
-
-test_that("save_sample_annotations returns FALSE for empty changes log", {
-  empty_log <- data.frame(
-    image = character(0),
-    original_class = character(0),
-    new_class = character(0),
-    stringsAsFactors = FALSE
-  )
-
-  expect_false(save_sample_annotations(
-    sample_name = "D20230314T001205_IFCB134",
-    classifications = data.frame(
-      file_name = "test.png",
-      class_name = "Diatom",
-      stringsAsFactors = FALSE
-    ),
-    original_classifications = data.frame(),
-    changes_log = empty_log,
-    temp_png_folder = tempdir(),
-    output_folder = tempdir(),
-    png_output_folder = tempdir(),
-    roi_folder = tempdir(),
-    class2use_path = "/tmp/class2use.txt"
-  ))
-})
-
-# Integration test using real test data files
-
-test_that("save_sample_annotations creates MAT file with real data", {
-  skip_if_not_installed("iRfcb")
-  skip_if_not_installed("dplyr")
-  skip_if_not(reticulate::py_available(), "Python not available")
-  skip_if_not(reticulate::py_module_available("scipy"), "scipy not available")
-
-  sample_name <- "D20220522T000439_IFCB134"
-
-  # Check test data files exist
-  png_folder <- testthat::test_path("test_data", "png")
-  roi_folder <- testthat::test_path("test_data", "raw")
-  class2use_path <- testthat::test_path("test_data", "class2use.mat")
-
-  skip_if_not(dir.exists(file.path(png_folder, sample_name)), "Test PNG folder not found")
-  skip_if_not(file.exists(class2use_path), "Test class2use file not found")
-  skip_if_not(
-    file.exists(file.path(roi_folder, "2022", "D20220522", paste0(sample_name, ".adc"))),
-    "Test ADC file not found"
-  )
-
-  # List available PNG files
-  png_files <- list.files(file.path(png_folder, sample_name), pattern = "\\.png$")
-  skip_if(length(png_files) < 2, "Not enough test PNG files")
-
-  # Create classifications matching the PNG files
-  original_classifications <- data.frame(
-    file_name = png_files,
-    class_name = rep("unclassified", length(png_files)),
-    score = rep(NA_real_, length(png_files)),
-    stringsAsFactors = FALSE
-  )
-
-  # Updated classifications with some changes
-  current_classifications <- data.frame(
-    file_name = png_files,
-    class_name = c("Mesodinium_rubrum", rep("Ciliophora", length(png_files) - 1)),
-    stringsAsFactors = FALSE
-  )
-
-  # Changes log (at least one change required)
-  changes_log <- data.frame(
-    image = png_files[1],
-    original_class = "unclassified",
-    new_class = "Mesodinium_rubrum",
-    stringsAsFactors = FALSE
-  )
-
-  # Create temp output folders
-  output_folder <- tempfile("output_")
-  png_output_folder <- tempfile("png_output_")
-
-  result <- save_sample_annotations(
-    sample_name = sample_name,
-    classifications = current_classifications,
-    original_classifications = original_classifications,
-    changes_log = changes_log,
-    temp_png_folder = png_folder,
-    output_folder = output_folder,
-    png_output_folder = png_output_folder,
-    roi_folder = roi_folder,
-    class2use_path = class2use_path,
-    annotator = "TestUser"
-  )
-
-  expect_true(result)
-
-  # Check MAT file was created (directly in output folder, not in manual/ subfolder)
-  mat_file <- file.path(output_folder, paste0(sample_name, ".mat"))
-  expect_true(file.exists(mat_file))
-
-  # Check statistics files were created (in validation_statistics subfolder)
-  stats_file <- file.path(output_folder, "validation_statistics", paste0(sample_name, "_validation_stats.csv"))
-  expect_true(file.exists(stats_file))
-
-  detailed_file <- file.path(output_folder, "validation_statistics", paste0(sample_name, "_validation_detailed.csv"))
-  expect_true(file.exists(detailed_file))
-
-  # Check PNG output folders were created
-  expect_true(dir.exists(file.path(png_output_folder, "Mesodinium_rubrum")))
-  expect_true(dir.exists(file.path(png_output_folder, "Ciliophora")))
-
-  # Cleanup
-  unlink(output_folder, recursive = TRUE)
-  unlink(png_output_folder, recursive = TRUE)
-})
+# Tests for sample saving functions
+
+library(testthat)
+
+test_that("copy_images_to_class_folders creates correct folder structure", {
+  # Create temp source folder with images
+  src_folder <- tempfile("src_")
+  dir.create(src_folder)
+  file.create(file.path(src_folder, "sample_00001.png"))
+  file.create(file.path(src_folder, "sample_00002.png"))
+  file.create(file.path(src_folder, "sample_00003.png"))
+
+  # Create temp and output folders
+  temp_folder <- tempfile("temp_")
+  output_folder <- tempfile("output_")
+
+  # Create classifications
+  classifications <- data.frame(
+    file_name = c("sample_00001.png", "sample_00002.png", "sample_00003.png"),
+    class_name = c("Diatom", "Ciliate", "Diatom"),
+    stringsAsFactors = FALSE
+  )
+
+  # Run copy
+  copy_images_to_class_folders(classifications, src_folder, temp_folder, output_folder)
+
+  # Check temp folder structure (for ifcb_annotate_samples)
+  expect_true(dir.exists(file.path(temp_folder, "Diatom")))
+  expect_true(dir.exists(file.path(temp_folder, "Ciliate")))
+  expect_true(file.exists(file.path(temp_folder, "Diatom", "sample_00001.png")))
+  expect_true(file.exists(file.path(temp_folder, "Diatom", "sample_00003.png")))
+  expect_true(file.exists(file.path(temp_folder, "Ciliate", "sample_00002.png")))
+
+  # Check output folder structure (permanent storage)
+  expect_true(dir.exists(file.path(output_folder, "Diatom")))
+  expect_true(dir.exists(file.path(output_folder, "Ciliate")))
+  expect_true(file.exists(file.path(output_folder, "Diatom", "sample_00001.png")))
+
+  # Cleanup
+  unlink(src_folder, recursive = TRUE)
+  unlink(temp_folder, recursive = TRUE)
+  unlink(output_folder, recursive = TRUE)
+})
+
+test_that("copy_images_to_class_folders handles missing source files gracefully", {
+  # Create temp folders (empty src)
+  src_folder <- tempfile("src_")
+  temp_folder <- tempfile("temp_")
+  output_folder <- tempfile("output_")
+  dir.create(src_folder)
+
+  classifications <- data.frame(
+    file_name = c("nonexistent.png"),
+    class_name = c("Diatom"),
+    stringsAsFactors = FALSE
+  )
+
+  # Should not error, just skip missing files
+  expect_no_error(
+    copy_images_to_class_folders(classifications, src_folder, temp_folder, output_folder)
+  )
+
+  # Folder may or may not be created depending on implementation
+  # But definitely no files should exist
+  if (dir.exists(file.path(temp_folder, "Diatom"))) {
+    expect_equal(
+      length(list.files(file.path(temp_folder, "Diatom"))),
+      0
+    )
+  }
+
+  # Cleanup
+  unlink(src_folder, recursive = TRUE)
+  unlink(temp_folder, recursive = TRUE)
+  unlink(output_folder, recursive = TRUE)
+})
+
+test_that("save_validation_statistics creates correct CSV files", {
+  skip_if_not_installed("dplyr")
+
+  sample_name <- "D20230314T001205_IFCB134"
+  stats_folder <- tempfile("stats_")
+  dir.create(stats_folder)
+
+  original_classifications <- data.frame(
+    file_name = c("sample_00001.png", "sample_00002.png", "sample_00003.png"),
+    class_name = c("Diatom", "Ciliate", "Diatom"),
+    score = c(0.95, 0.87, 0.92),
+    stringsAsFactors = FALSE
+  )
+
+  # Current classifications with one change
+  current_classifications <- data.frame(
+    file_name = c("sample_00001.png", "sample_00002.png", "sample_00003.png"),
+    class_name = c("Diatom", "Dinoflagellate", "Diatom"),  # Ciliate -> Dinoflagellate
+    stringsAsFactors = FALSE
+  )
+
+  save_validation_statistics(
+    sample_name = sample_name,
+    classifications = current_classifications,
+    original_classifications = original_classifications,
+    stats_folder = stats_folder,
+    annotator = "TestUser"
+  )
+
+  # Check summary stats file
+  stats_file <- file.path(stats_folder, paste0(sample_name, "_validation_stats.csv"))
+  expect_true(file.exists(stats_file))
+
+  stats <- read.csv(stats_file)
+  expect_equal(stats$sample, sample_name)
+  expect_equal(stats$annotator, "TestUser")
+  expect_equal(stats$total_images, 3)
+  expect_equal(stats$correct_classifications, 2)
+  expect_equal(stats$incorrect_classifications, 1)
+  expect_equal(stats$accuracy, 2/3, tolerance = 0.001)
+
+  # Check detailed stats file
+  detailed_file <- file.path(stats_folder, paste0(sample_name, "_validation_detailed.csv"))
+  expect_true(file.exists(detailed_file))
+
+  detailed <- read.csv(detailed_file)
+  expect_equal(nrow(detailed), 3)
+  expect_true("correct" %in% names(detailed))
+  expect_true("annotator" %in% names(detailed))
+
+  # Cleanup
+  unlink(stats_folder, recursive = TRUE)
+})
+
+test_that("save_validation_statistics handles all correct classifications", {
+  skip_if_not_installed("dplyr")
+
+  sample_name <- "D20230314T001205_IFCB134"
+  stats_folder <- tempfile("stats_")
+  dir.create(stats_folder)
+
+  # All classifications are correct (no changes)
+  classifications <- data.frame(
+    file_name = c("sample_00001.png", "sample_00002.png"),
+    class_name = c("Diatom", "Ciliate"),
+    score = c(0.95, 0.87),
+    stringsAsFactors = FALSE
+  )
+
+  save_validation_statistics(
+    sample_name = sample_name,
+    classifications = classifications,
+    original_classifications = classifications,
+    stats_folder = stats_folder,
+    annotator = "TestUser"
+  )
+
+  # Check 100% accuracy
+  stats_file <- file.path(stats_folder, paste0(sample_name, "_validation_stats.csv"))
+  stats <- read.csv(stats_file)
+  expect_equal(stats$accuracy, 1.0)
+  expect_equal(stats$correct_classifications, 2)
+  expect_equal(stats$incorrect_classifications, 0)
+
+  # Cleanup
+  unlink(stats_folder, recursive = TRUE)
+})
+
+test_that("save_sample_annotations returns FALSE for NULL inputs", {
+  expect_false(save_sample_annotations(
+    sample_name = NULL,
+    classifications = data.frame(),
+    original_classifications = data.frame(),
+    changes_log = data.frame(image = "x", original_class = "a", new_class = "b"),
+    temp_png_folder = tempdir(),
+    output_folder = tempdir(),
+    png_output_folder = tempdir(),
+    roi_folder = tempdir(),
+    class2use_path = "/tmp/class2use.txt"
+  ))
+
+  expect_false(save_sample_annotations(
+    sample_name = "D20230314T001205_IFCB134",
+    classifications = NULL,
+    original_classifications = data.frame(),
+    changes_log = data.frame(image = "x", original_class = "a", new_class = "b"),
+    temp_png_folder = tempdir(),
+    output_folder = tempdir(),
+    png_output_folder = tempdir(),
+    roi_folder = tempdir(),
+    class2use_path = "/tmp/class2use.txt"
+  ))
+
+  expect_false(save_sample_annotations(
+    sample_name = "D20230314T001205_IFCB134",
+    classifications = data.frame(),
+    original_classifications = data.frame(),
+    changes_log = data.frame(image = "x", original_class = "a", new_class = "b"),
+    temp_png_folder = tempdir(),
+    output_folder = tempdir(),
+    png_output_folder = tempdir(),
+    roi_folder = tempdir(),
+    class2use_path = NULL
+  ))
+})
+
+test_that("save_sample_annotations returns FALSE for empty changes log", {
+  empty_log <- data.frame(
+    image = character(0),
+    original_class = character(0),
+    new_class = character(0),
+    stringsAsFactors = FALSE
+  )
+
+  expect_false(save_sample_annotations(
+    sample_name = "D20230314T001205_IFCB134",
+    classifications = data.frame(
+      file_name = "test.png",
+      class_name = "Diatom",
+      stringsAsFactors = FALSE
+    ),
+    original_classifications = data.frame(),
+    changes_log = empty_log,
+    temp_png_folder = tempdir(),
+    output_folder = tempdir(),
+    png_output_folder = tempdir(),
+    roi_folder = tempdir(),
+    class2use_path = "/tmp/class2use.txt"
+  ))
+})
+
+test_that("save_sample_annotations accepts adc_folder parameter", {
+  # When adc_folder is NULL (default), the ADC path falls back to get_sample_paths().
+  # When adc_folder is provided, that path is used directly.
+  # Smoke test for the new parameter: sample_name = NULL triggers the early
+  # FALSE return, so this only verifies that the argument is accepted.
+
+  expect_false(
+    save_sample_annotations(
+      sample_name = NULL,
+      classifications = data.frame(),
+      original_classifications = data.frame(),
+      changes_log = data.frame(
+        image = "x",
+        original_class = "a",
+        new_class = "b"
+      ),
+      temp_png_folder = tempdir(),
+      output_folder = tempdir(),
+      png_output_folder = tempdir(),
+      roi_folder = tempdir(),
+      class2use_path = "/tmp/class2use.txt",
+      adc_folder = "/some/custom/path"
+    )
+  )
+})
+
+# Integration test using real test data files
+
+test_that("save_sample_annotations creates MAT file with real data", {
+  skip_if_not_installed("iRfcb")
+  skip_if_not_installed("dplyr")
+  skip_if_not(reticulate::py_available(), "Python not available")
+  skip_if_not(reticulate::py_module_available("scipy"), "scipy not available")
+
+  sample_name <- "D20220522T000439_IFCB134"
+
+  # Check test data files exist
+  png_folder <- testthat::test_path("test_data", "png")
+  roi_folder <- testthat::test_path("test_data", "raw")
+  class2use_path <- testthat::test_path("test_data", "class2use.mat")
+
+  skip_if_not(dir.exists(file.path(png_folder, sample_name)), "Test PNG folder not found")
+  skip_if_not(file.exists(class2use_path), "Test class2use file not found")
+  skip_if_not(
+    file.exists(file.path(roi_folder, "2022", "D20220522", paste0(sample_name, ".adc"))),
+    "Test ADC file not found"
+  )
+
+  # List available PNG files
+  png_files <- list.files(file.path(png_folder, sample_name), pattern = "\\.png$")
+  skip_if(length(png_files) < 2, "Not enough test PNG files")
+
+  # Create classifications matching the PNG files
+  original_classifications <- data.frame(
+    file_name = png_files,
+    class_name = rep("unclassified", length(png_files)),
+    score = rep(NA_real_, length(png_files)),
+    stringsAsFactors = FALSE
+  )
+
+  # Updated classifications with some changes
+  current_classifications <- data.frame(
+    file_name = png_files,
+    class_name = c("Mesodinium_rubrum", rep("Ciliophora", length(png_files) - 1)),
+    stringsAsFactors = FALSE
+  )
+
+  # Changes log (at least one change required)
+  changes_log <- data.frame(
+    image = png_files[1],
+    original_class = "unclassified",
+    new_class = "Mesodinium_rubrum",
+    stringsAsFactors = FALSE
+  )
+
+  # Create temp output folders
+  output_folder <- tempfile("output_")
+  png_output_folder <- tempfile("png_output_")
+
+  result <- save_sample_annotations(
+    sample_name = sample_name,
+    classifications = current_classifications,
+    original_classifications = original_classifications,
+    changes_log = changes_log,
+    temp_png_folder = png_folder,
+    output_folder = output_folder,
+    png_output_folder = png_output_folder,
+    roi_folder = roi_folder,
+    class2use_path = class2use_path,
+    annotator = "TestUser"
+  )
+
+  expect_true(result)
+
+  # Check MAT file was created (directly in output folder, not in manual/ subfolder)
+  mat_file <- file.path(output_folder, paste0(sample_name, ".mat"))
+  expect_true(file.exists(mat_file))
+
+  # Check statistics files were created (in validation_statistics subfolder)
+  stats_file <- file.path(output_folder, "validation_statistics", paste0(sample_name, "_validation_stats.csv"))
+  expect_true(file.exists(stats_file))
+
+  detailed_file <- file.path(output_folder, "validation_statistics", paste0(sample_name, "_validation_detailed.csv"))
+  expect_true(file.exists(detailed_file))
+
+  # Check PNG output folders were created
+  expect_true(dir.exists(file.path(png_output_folder, "Mesodinium_rubrum")))
+  expect_true(dir.exists(file.path(png_output_folder, "Ciliophora")))
+
+  # Cleanup
+  unlink(output_folder, recursive = TRUE)
+  unlink(png_output_folder, recursive = TRUE)
+})
diff --git a/tests/testthat/test-utils.R b/tests/testthat/test-utils.R
index 9c6014a..9cdc24b 100644
--- a/tests/testthat/test-utils.R
+++ b/tests/testthat/test-utils.R
@@ -113,6 +113,22 @@ test_that("read_roi_dimensions returns correct structure", {
  unlink(temp_adc)
 })
 
+test_that("read_roi_dimensions handles 0-byte ADC file", {
+  # Create a truly empty temp file
+  tmp_file <- tempfile(fileext = ".adc")
+  file.create(tmp_file)  # creates 0-byte file
+
+  res <- read_roi_dimensions(tmp_file)
+
+  # Expect a data frame with zero rows and correct columns
+  expect_s3_class(res, "data.frame")
+  expect_equal(nrow(res), 0)
+  expect_equal(colnames(res), c("roi_number", "width", "height", "area"))
+  
+  # Clean up
+  unlink(tmp_file)
+})
+
 test_that("create_empty_changes_log returns correct structure", {
  log <- create_empty_changes_log()
 
@@ -330,3 +346,269 @@ test_that("get_config_dir uses tempdir during R CMD check", {
     Sys.setenv("_R_CHECK_PACKAGE_NAME_" = old_val)
   }
 })
+
+# =============================================================================
+# File index cache functions
+# =============================================================================
+
+test_that("get_file_index_path returns a valid path ending in .json", {
+  index_path <- get_file_index_path()
+
+  expect_type(index_path, "character")
+  expect_true(grepl("\\.json$", index_path))
+  expect_true(grepl("ClassiPyR", index_path))
+  # Should be in the same directory as settings
+  expect_equal(dirname(index_path), dirname(get_settings_path()))
+})
+
+test_that("save_file_index and load_file_index round-trip data correctly", {
+  # Clean up any existing cache first
+  cache_path <- get_file_index_path()
+  if (file.exists(cache_path)) file.remove(cache_path)
+
+  test_data <- list(
+    roi_folder = "/data/roi",
+    csv_folder = "/data/csv",
+    output_folder = "/data/output",
+    sample_names = c("D20230101T120000_IFCB134", "D20230102T130000_IFCB134"),
+    classified_samples = c("D20230101T120000_IFCB134"),
+    annotated_samples = character(),
+    roi_path_map = list(
+      "D20230101T120000_IFCB134" = "/data/roi/2023/D20230101/D20230101T120000_IFCB134.roi",
+      "D20230102T130000_IFCB134" = "/data/roi/2023/D20230102/D20230102T130000_IFCB134.roi"
+    ),
+    csv_path_map = list(
+      "D20230101T120000_IFCB134" = "/data/csv/2023/D20230101T120000_IFCB134.csv"
+    ),
+    classifier_mat_files = list(
+      "D20230101T120000_IFCB134" = "/data/csv/2023/D20230101T120000_IFCB134_class_v1.mat"
+    ),
+    timestamp = "2024-01-01 12:00:00"
+  )
+
+  # Write using actual exported function
+  save_file_index(test_data)
+  expect_true(file.exists(cache_path))
+
+  # Read back using actual exported function
+  loaded <- load_file_index()
+
+  expect_type(loaded, "list")
+  expect_equal(loaded$roi_folder, "/data/roi")
+  expect_equal(loaded$csv_folder, "/data/csv")
+  expect_equal(loaded$output_folder, "/data/output")
+  expect_length(loaded$sample_names, 2)
+  expect_equal(loaded$sample_names[[1]], "D20230101T120000_IFCB134")
+  expect_length(loaded$classified_samples, 1)
+  expect_length(loaded$annotated_samples, 0)
+
+  # Path maps survive JSON round-trip as named lists
+  roi_map <- as.list(loaded$roi_path_map)
+  expect_equal(roi_map[["D20230101T120000_IFCB134"]],
+               "/data/roi/2023/D20230101/D20230101T120000_IFCB134.roi")
+  expect_equal(roi_map[["D20230102T130000_IFCB134"]],
+               "/data/roi/2023/D20230102/D20230102T130000_IFCB134.roi")
+
+  csv_map <- as.list(loaded$csv_path_map)
+  expect_equal(csv_map[["D20230101T120000_IFCB134"]],
+               "/data/csv/2023/D20230101T120000_IFCB134.csv")
+
+  mat_map <- as.list(loaded$classifier_mat_files)
+  expect_equal(mat_map[["D20230101T120000_IFCB134"]],
+               "/data/csv/2023/D20230101T120000_IFCB134_class_v1.mat")
+
+  expect_equal(loaded$timestamp, "2024-01-01 12:00:00")
+
+  # Clean up
+  if (file.exists(cache_path)) file.remove(cache_path)
+})
+
+test_that("save_file_index and load_file_index handle empty lists correctly", {
+  cache_path <- get_file_index_path()
+  if (file.exists(cache_path)) file.remove(cache_path)
+
+  test_data <- list(
+    roi_folder = "/test/roi",
+    csv_folder = "/test/csv",
+    output_folder = "/test/output",
+    sample_names = c("D20220101T000000_IFCB1"),
+    classified_samples = character(),
+    annotated_samples = character(),
+    roi_path_map = list("D20220101T000000_IFCB1" = "/test/roi/sample.roi"),
+    csv_path_map = list(),
+    classifier_mat_files = list(),
+    timestamp = as.character(Sys.time())
+  )
+
+  save_file_index(test_data)
+  loaded <- load_file_index()
+
+  expect_equal(loaded$roi_folder, "/test/roi")
+  expect_equal(as.character(loaded$sample_names), "D20220101T000000_IFCB1")
+
+  # Path map round-trips correctly
+  roi_map <- as.list(loaded$roi_path_map)
+  expect_equal(roi_map[["D20220101T000000_IFCB1"]], "/test/roi/sample.roi")
+
+  # Empty lists round-trip correctly
+  expect_length(loaded$csv_path_map, 0)
+  expect_length(loaded$classifier_mat_files, 0)
+  expect_length(loaded$classified_samples, 0)
+  expect_length(loaded$annotated_samples, 0)
+
+  if (file.exists(cache_path)) file.remove(cache_path)
+})
+
+test_that("load_file_index returns NULL when no cache exists", {
+  cache_path <- get_file_index_path()
+  if (file.exists(cache_path)) file.remove(cache_path)
+
+  result <- load_file_index()
+  expect_null(result)
+})
+
+test_that("load_file_index returns NULL for invalid JSON", {
+  cache_path <- get_file_index_path()
+
+  # Write invalid JSON to the actual cache path
+  dir.create(dirname(cache_path), recursive = TRUE, showWarnings = FALSE)
+  writeLines("this is not valid json {{{", cache_path)
+
+  result <- load_file_index()
+  expect_null(result)
+
+  if (file.exists(cache_path)) file.remove(cache_path)
+})
+
+test_that("save_file_index handles write errors gracefully", {
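+  # save_file_index() always writes to the fixed cache path; it should never error
+  expect_no_error(save_file_index(list(test = TRUE)))
+  # Remove the cache file written above so later tests start clean
+  unlink(get_file_index_path())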
+})
+
+# =============================================================================
+# rescan_file_index
+# =============================================================================
+
+test_that("rescan_file_index returns NULL for invalid roi_folder", {
+  result <- rescan_file_index(
+    roi_folder = "/nonexistent/path",
+    csv_folder = "/nonexistent/path",
+    output_folder = "/nonexistent/path",
+    verbose = FALSE
+  )
+  expect_null(result)
+})
+
+test_that("rescan_file_index scans folders and builds cache", {
+  # Create a temp directory structure with mock ROI, CSV, and MAT files
+  temp_root <- tempfile("rescan_test_")
+  roi_folder <- file.path(temp_root, "raw", "2023", "D20230101")
+  csv_folder <- file.path(temp_root, "classified", "2023")
+  output_folder <- file.path(temp_root, "manual")
+  dir.create(roi_folder, recursive = TRUE)
+  dir.create(csv_folder, recursive = TRUE)
+  dir.create(output_folder, recursive = TRUE)
+
+  # Create mock ROI/ADC files
+  file.create(file.path(roi_folder, "D20230101T120000_IFCB134.roi"))
+  file.create(file.path(roi_folder, "D20230101T120000_IFCB134.adc"))
+  file.create(file.path(roi_folder, "D20230101T130000_IFCB134.roi"))
+  file.create(file.path(roi_folder, "D20230101T130000_IFCB134.adc"))
+
+  # Create a mock CSV classification
+  writeLines("file_name,class_name", file.path(csv_folder, "D20230101T120000_IFCB134.csv"))
+
+  # Create a mock manual annotation MAT
+  file.create(file.path(output_folder, "D20230101T130000_IFCB134.mat"))
+
+  result <- rescan_file_index(
+    roi_folder = file.path(temp_root, "raw"),
+    csv_folder = file.path(temp_root, "classified"),
+    output_folder = output_folder,
+    verbose = FALSE
+  )
+
+  expect_type(result, "list")
+  expect_length(result$sample_names, 2)
+  expect_true("D20230101T120000_IFCB134" %in% result$sample_names)
+  expect_true("D20230101T130000_IFCB134" %in% result$sample_names)
+
+  # Check classified samples (CSV match)
+  expect_true("D20230101T120000_IFCB134" %in% result$classified_samples)
+
+  # Check annotated samples (MAT in output folder)
+  expect_true("D20230101T130000_IFCB134" %in% result$annotated_samples)
+
+  # Check ROI path map
+  expect_false(is.null(result$roi_path_map[["D20230101T120000_IFCB134"]]))
+  expect_true(grepl("\\.roi$", result$roi_path_map[["D20230101T120000_IFCB134"]]))
+
+  # Check CSV path map
+  expect_false(is.null(result$csv_path_map[["D20230101T120000_IFCB134"]]))
+
+  # Check timestamp exists
+  expect_false(is.null(result$timestamp))
+
+  # Verify the cache file was written
+  cache_path <- get_file_index_path()
+  expect_true(file.exists(cache_path))
+
+  # Verify round-trip: load cache and compare
+  loaded <- load_file_index()
+  expect_equal(length(loaded$sample_names), 2)
+
+  unlink(c(temp_root, cache_path), recursive = TRUE)
+})
+
+test_that("rescan_file_index works with non-standard folder structure", {
+  # Create a flat folder structure (no YYYY/DYYYYMMDD hierarchy)
+  temp_root <- tempfile("flat_test_")
+  roi_folder <- file.path(temp_root, "all_roi_files")
+  dir.create(roi_folder, recursive = TRUE)
+
+  # ROI files directly in the folder, no subdirectories
+  file.create(file.path(roi_folder, "D20220601T100000_IFCB1.roi"))
+  file.create(file.path(roi_folder, "D20220601T100000_IFCB1.adc"))
+  file.create(file.path(roi_folder, "D20230715T200000_IFCB999.roi"))
+  file.create(file.path(roi_folder, "D20230715T200000_IFCB999.adc"))
+
+  result <- rescan_file_index(
+    roi_folder = roi_folder,
+    csv_folder = tempdir(),
+    output_folder = tempdir(),
+    verbose = FALSE
+  )
+
+  expect_type(result, "list")
+  expect_length(result$sample_names, 2)
+  expect_true("D20220601T100000_IFCB1" %in% result$sample_names)
+  expect_true("D20230715T200000_IFCB999" %in% result$sample_names)
+
+  # Path map should contain the flat paths (no year subdirectory)
+  roi_path <- result$roi_path_map[["D20220601T100000_IFCB1"]]
+  expect_true(grepl("all_roi_files", roi_path))
+  # The path should go directly from roi_folder to the file, no YYYY/DYYYYMMDD layer
+  expect_equal(normalizePath(dirname(roi_path), winslash = "/"), 
+               normalizePath(roi_folder, winslash = "/"))
+
+  unlink(temp_root, recursive = TRUE)
+})
+
+test_that("rescan_file_index reads folder paths from saved settings", {
+  # rescan_file_index() falls back to saved settings when all folder args are NULL.
+  # get_settings_path() is hard to mock here, so this test only checks that the
+  # fallback does not error: with no usable saved paths it should return NULL,
+  # otherwise it returns the rebuilt index as a list
+  result <- rescan_file_index(
+    roi_folder = NULL,
+    csv_folder = NULL,
+    output_folder = NULL,
+    verbose = FALSE
+  )
+  # If no settings exist with valid paths, result is NULL
+  # (the actual behavior depends on whether settings are saved,
+  # but the function should not error)
+  expect_true(is.null(result) || is.list(result))
+})
diff --git a/vignettes/class-management.Rmd b/vignettes/class-management.Rmd
index 54c2f68..3cba17f 100644
--- a/vignettes/class-management.Rmd
+++ b/vignettes/class-management.Rmd
@@ -82,7 +82,6 @@ Standard MATLAB class2use format:
 2. Click Browse next to "Class List File"
 3. Select your `.mat` file
 
-> **Note**: Reading .mat files requires Python (via iRfcb).
 
 ### From Text File
 
diff --git a/vignettes/faq.Rmd b/vignettes/faq.Rmd
index b9ea2be..f080c00 100644
--- a/vignettes/faq.Rmd
+++ b/vignettes/faq.Rmd
@@ -42,13 +42,11 @@ A: No. The app only reads your original files. All output is written to separate
 
 **Q: I see "Python not available" warning**
 
-A: This warning affects reading and writing .mat files. Python is required for:
+A: This warning affects saving .mat files. Python is required for:
 
-- Loading existing manual annotations (.mat files)
-- Loading MATLAB classifier output (.mat files)
-- Saving annotations as .mat files
+- Saving annotations as .mat files for [ifcb-analysis](https://github.com/hsosik/ifcb-analysis)
 
-If you only work with CSV files, you can ignore this warning.
+Reading .mat files (annotations, classifier output, class lists) does not require Python. If you do not need to save .mat files, you can ignore this warning.
 
 To enable .mat support:
 
@@ -61,18 +59,28 @@ Then restart the app.
 
 **Q: Where is the Python virtual environment created?**
 
-A: By default, `ifcb_py_install()` creates a `venv` folder in your current working directory. You can specify a different location:
+A: By default, `ifcb_py_install()` creates a `venv` folder in your home directory. You can specify a different location:
 
 ```{r, eval = FALSE}
 ifcb_py_install("/path/to/your/venv")
 ```
 
-You can also configure the venv path in Settings or when launching the app:
+You can also specify the venv path when launching the app:
 
 ```{r, eval = FALSE}
 run_app(venv_path = "/path/to/your/venv")
 ```
 
+**Q: How is the Python virtual environment path resolved?**
+
+A: The app uses the following priority order:
+
+1. **`venv_path` argument** passed to `run_app()` (highest priority)
+2. **Saved settings** from a previous session (stored in `settings.json`)
+3. **Default** `./venv` in the working directory
+
+When you specify `run_app(venv_path = "/path/to/venv")`, that path is used for Python initialization and pre-filled in the Settings dialog, overriding any previously saved path.
+
 **Q: Package installation fails**
 
 A: Make sure you have remotes installed and try:
@@ -87,7 +95,7 @@ remotes::install_github("EuropeanIFCBGroup/ClassiPyR")
 A: Try reinstalling the package:
 
 ```{r, eval = FALSE}
-remotes::install_github("EuropeanIFCBGroup/ClassiPyR", force = TRUE)
+install.packages("iRfcb")
 ```
 
 **Q: iRfcb won't install**
@@ -95,7 +103,7 @@ remotes::install_github("EuropeanIFCBGroup/ClassiPyR", force = TRUE)
 A: [iRfcb](https://github.com/EuropeanIFCBGroup/iRfcb) is the core dependency for `ClassiPyR` and is installed automatically. If you encounter issues:
 
 ```{r, eval = FALSE}
-remotes::install_github("EuropeanIFCBGroup/iRfcb")
+install.packages("iRfcb")
 ```
 
 ---
@@ -107,34 +115,32 @@ remotes::install_github("EuropeanIFCBGroup/iRfcb")
 A: Check that:
 
 1. ROI Data Folder points to your data
-2. Data is organized as: `folder/YYYY/DYYYYMMDD/files`
-3. ROI files exist and are readable
+2. ROI files exist and are readable
+3. Click the **Sync** button (circular arrow icon) to rescan folders if you recently added new data
 
 **Q: "ROI file not found" error**
 
-A: The app expects this structure:
+A: The app scans the ROI Data Folder recursively, so any subfolder layout works (including flat). Check that:
 
-```
-roi_folder/
-  2023/
-    D20230101/
-      D20230101T120000_IFCB134.roi
-      D20230101T120000_IFCB134.adc
-```
+1. The ROI Data Folder path is correct
+2. Each `.roi` file has a matching `.adc` file in the same directory
+3. Filenames follow the IFCB naming convention (`DYYYYMMDDTHHMMSS_IFCBNNN`)
+4. Click the **Sync** button to rescan if you recently moved or added files
 
 **Q: Classifications not loading**
 
 A: For CSV files:
 
-- Must have columns containing "file" and "class" in their names
-- Recommended column names: `file_name` and `class_name`
-- File should be in the Classification Folder (searched recursively)
+- Must have columns named `file_name` and `class_name` (exact names required)
+- Optionally include a `score` column (confidence value between 0 and 1)
+- The CSV file must be named after the sample (e.g., `D20230101T120000_IFCB134.csv`)
+- File should be in the Classification Folder (indexed via file cache; click Sync to refresh)
 
 For MAT files:
 
 - Must match pattern `*_class*.mat`
 - Must contain `roinum` and `TBclass` variables
-- Requires Python to be available
+- Does not require Python (reading .mat files works without it)
 
 ---
 
@@ -142,7 +148,9 @@ For MAT files:
 
 **Q: What should my classification CSV look like?**
 
-A: At minimum, your CSV needs:
+A: The CSV must have columns named `file_name` and `class_name`. The file must be named after the sample (e.g., `D20230101T120000_IFCB134.csv`).
+
+Minimal example:
 
 ```
 file_name,class_name
@@ -150,16 +158,25 @@ D20230101T120000_IFCB134_00001.png,Diatom
 D20230101T120000_IFCB134_00002.png,Ciliate
 ```
 
-Optional columns include `score` for confidence values (0-1).
+With optional `score` column (confidence values between 0 and 1):
 
-**Q: My CNN classifier outputs different column names**
+```
+file_name,class_name,score
+D20230101T120000_IFCB134_00001.png,Diatom,0.95
+D20230101T120000_IFCB134_00002.png,Ciliate,0.87
+D20230101T120000_IFCB134_00003.png,Dinoflagellate,0.72
+```
 
-A: The app uses flexible column matching and looks for columns containing "file" and "class". These variants work:
+**Q: My CNN classifier outputs different column names**
 
-- `filename`, `image_file`, `file_path` → matched as file column
-- `class`, `predicted_class`, `classification` → matched as class column
+A: The column names must be exactly `file_name` and `class_name`. If your classifier uses different names, rename the columns before loading. For example in R:
 
-If your format is different, rename the columns to `file_name` and `class_name`.
+```{r, eval = FALSE}
+df <- read.csv("my_classifications.csv")
+names(df)[names(df) == "predicted_class"] <- "class_name"
+names(df)[names(df) == "filename"] <- "file_name"
+write.csv(df, "D20230101T120000_IFCB134.csv", row.names = FALSE)
+```
 
 ---
 
@@ -182,7 +199,7 @@ A: The ROI might be empty (no actual image data). These are filtered out automat
 A: Check that:
 
 1. Output folder is writable
-2. Python is available (required for .mat files)
+2. Python is available (required for saving .mat files)
 3. Click "Save Annotations" before closing
 
 ---
@@ -210,7 +227,7 @@ A:
 
 **Q: Can I import a class list from MATLAB?**
 
-A: Yes, load your existing `class2use.mat` file via Settings. Note: this requires Python.
+A: Yes, load your existing `class2use.mat` file via Settings.
 
 **Q: My class names look different after loading**
 
@@ -245,14 +262,14 @@ A: In the Output Folder you configured:
 
 **Q: Can I import annotations back to MATLAB?**
 
-A: Yes, the MAT files are compatible with the [ifcb-analysis](https://github.com/hsosik/ifcb-analysis) toolbox (Sosik & Olson, 2007). Use:
+A: Yes, the MAT files are compatible with the [ifcb-analysis](https://github.com/hsosik/ifcb-analysis) toolbox (Sosik & Olson, 2007). Use the list in `startMC`, or load the list in MATLAB using:
 
 ```matlab
 load('sample_name.mat');
 % classlist contains [roi_number, class_index]
 ```
 
-Note: Python with scipy must be installed to save .mat files.
+Note: Python with `scipy` must be installed to save .mat files.
 
 **Q: What's in the statistics CSV?**
 
@@ -281,12 +298,33 @@ A: Open Settings and find "Pixels per Micrometer". The default is 3.4 (standard
 
 A: Yes! Settings are stored in a configuration file:
 
-- **Linux**: `~/.local/share/ClassiPyR/settings.json`
-- **macOS**: `~/Library/Application Support/ClassiPyR/settings.json`
-- **Windows**: `%LOCALAPPDATA%/ClassiPyR/settings.json`
+- **Linux**: `~/.config/R/ClassiPyR/settings.json`
+- **macOS**: `~/Library/Preferences/org.R-project.R/R/ClassiPyR/settings.json`
+- **Windows**: `%APPDATA%/R/config/R/ClassiPyR/settings.json`
 
 Folder paths, class list location, and Python venv path are automatically restored when you restart the app.
 
+**Q: How do I reset all settings to defaults?**
+
+A: Use the `reset_settings` argument when launching the app:
+
+```{r, eval = FALSE}
+run_app(reset_settings = TRUE)
+```
+
+This deletes the saved `settings.json` file and starts the app with default values. All folder paths, the class list reference, and the Python venv path are cleared, so you will need to reconfigure them. The class list file itself (`class2use_saved.*`) is not deleted from the config directory but will not be loaded until you re-upload it. This is useful if:
+
+- The app fails to start due to invalid saved paths
+- Folder paths point to locations that no longer exist
+- You want a clean slate after changing your data layout
+
+You can also combine it with other arguments:
+
+```{r, eval = FALSE}
+# Reset settings and specify a new Python environment
+run_app(reset_settings = TRUE, venv_path = "/path/to/your/venv")
+```
+
 **Q: What's the yellow warning on some classes?**
 
 A: Classes marked with a warning are in your classification data but not in your class list. This can happen when:
@@ -306,27 +344,74 @@ A: This sample has both manual annotations AND auto-classifications available. W
 
 ---
 
+## File Index Cache
+
+**Q: What is the file index cache?**
+
+A: The file index cache stores the locations of all IFCB files (ROI, classification, annotation) found in your configured folders. It's saved to disk so the app doesn't need to re-scan your entire folder hierarchy every time it starts. This significantly speeds up startup for large datasets.
+
+**Q: How do I refresh the file cache?**
+
+A: Click the **Sync** button (circular arrow icon) in the sidebar, next to the sample navigation buttons. The cache age indicator below shows when the last scan occurred.
+
+**Q: New samples I added aren't showing up**
+
+A: The app loads from the cached file index. Click the **Sync** button to rescan your folders and pick up new files.
+
+**Q: Can I update the cache without opening the app?**
+
+A: Yes. Use `rescan_file_index()` from the R console or a scheduled script:
+
+```{r, eval = FALSE}
+ClassiPyR::rescan_file_index()
+```
+
+This reads folder paths from your saved settings and rebuilds the cache. You can also pass paths explicitly:
+
+```{r, eval = FALSE}
+ClassiPyR::rescan_file_index(
+  roi_folder = "/data/ifcb/raw",
+  csv_folder = "/data/ifcb/classified",
+  output_folder = "/data/ifcb/manual"
+)
+```
+
+**Q: Where is the cache file stored?**
+
+A: In the same config directory as your settings:
+
+- **Linux**: `~/.config/R/ClassiPyR/file_index.json`
+- **macOS**: `~/Library/Preferences/org.R-project.R/R/ClassiPyR/file_index.json`
+- **Windows**: `%APPDATA%/R/config/R/ClassiPyR/file_index.json`
+
+---
+
 ## Error Messages
 
 | Error | Solution |
 |-------|----------|
-| "ROI file not found" | Check ROI Data Folder path and file structure |
+| "ROI file not found" | Check ROI Data Folder path; ensure `.roi` files use IFCB naming and click Sync |
 | "ADC file not found" | ADC file must be alongside ROI file |
-| "Python not available" | Affects .mat files. Run `iRfcb::ifcb_py_install()` |
+| "Python not available" | Affects saving .mat files. Run `iRfcb::ifcb_py_install()` |
 | "Error loading class list" | Check file format (.mat or .txt) |
 | "No samples found" | Check ROI Data Folder configuration |
+| App fails to start | Try `run_app(reset_settings = TRUE)` to clear saved settings |
 
 ---
 
 ## Performance Tips
 
-1. **Use pagination** - Lower images per page for faster loading
+1. **File index cache** - The app caches folder scan results for fast startup. Click Sync only when you've added new data.
+
+2. **Use pagination** - Lower images per page for faster loading
+
+3. **Filter by class** - Reduces rendering load
 
-2. **Filter by class** - Reduces rendering load
+4. **Close other apps** - Image extraction uses memory
 
-3. **Close other apps** - Image extraction uses memory
+5. **SSD storage** - Faster file access
 
-4. **SSD storage** - Faster file access
+6. **Scheduled rescans** - On servers with regularly arriving data, use `ClassiPyR::rescan_file_index()` in a cron job to keep the cache current without manual intervention
 
 ---
 
diff --git a/vignettes/getting-started.Rmd b/vignettes/getting-started.Rmd
index d39b918..c4eaee5 100644
--- a/vignettes/getting-started.Rmd
+++ b/vignettes/getting-started.Rmd
@@ -23,17 +23,27 @@ Make sure you have:
 1. The package installed (see [Installation](https://europeanifcbgroup.github.io/ClassiPyR/))
 2. Your IFCB data files (ROI, ADC, HDR)
 3. Optionally: a class list file (.mat or .txt) - you can also create one from scratch in the app
-4. Optionally: existing classifications (CSV or classifier MAT files)
+4. Optionally: existing classifications (CSV or classifier MAT files, see below)
 
 ### Python Requirements
 
-Python is required if you work with MATLAB .mat files:
+Python is required for saving annotations as MATLAB .mat files for use with [ifcb-analysis](https://github.com/hsosik/ifcb-analysis). Reading existing .mat files (annotations, classifier output, class lists) does not require Python.
 
-- **Loading existing annotations** (.mat files from previous sessions)
-- **Loading MATLAB classifier output** (.mat files)
-- **Saving annotations** as .mat files for [ifcb-analysis](https://github.com/hsosik/ifcb-analysis)
+If you only read .mat files or work with CSV classification files, you can skip the Python setup below.
 
-If you only work with CSV classification files, Python is not required.
+### CSV Classification Format
+
+If you have existing classifications in CSV format, each file must be named after its sample (e.g., `D20230101T120000_IFCB134.csv`) and contain at least these columns:
+
+```
+file_name,class_name
+D20230101T120000_IFCB134_00001.png,Diatom
+D20230101T120000_IFCB134_00002.png,Ciliate
+```
+
+An optional `score` column (confidence values between 0 and 1) can also be included. See the [User Guide](user-guide.html) for more details.
+
+### Python Setup
 
 To set up Python:
 
@@ -52,8 +62,8 @@ Launch the app:
 library(ClassiPyR)
 run_app()
 
-# Or specify a custom Python virtual environment path
-run_app(venv_path = "/path/to/your/venv")
+# Or specify a custom Python virtual environment path (takes priority over saved settings)
+run_app(venv_path = "./venv")
 ```
 
 Click the **gear icon** next to your username in the sidebar.
@@ -63,7 +73,7 @@ Click the **gear icon** next to your username in the sidebar.
 </a>
 <p><em>Settings dialog showing folder configuration options. Click to enlarge.</em></p>
 
-Configure your folders:
+Configure your folders using the built-in folder browser:
 
 | Setting | Description | Example |
 |---------|-------------|---------|
@@ -72,9 +82,9 @@ Configure your folders:
 | Output Folder | Where annotations will be saved | `/ifcb/manual/` |
 | PNG Output Folder | Where images will be organized | `/ifcb/png/` |
 
-Click **Save Settings**.
+Click **Save Settings**. The app will scan your folders and build a file index cache for fast loading.
 
-> **Note**: You can also configure the Python virtual environment path in Settings if you didn't specify it when launching the app.
+> **Note**: The Python virtual environment path is configured via `run_app(venv_path = ...)` and remembered for future sessions. See the [FAQ](faq.html) for details on how the path is resolved.
 
 ---
 
@@ -126,12 +136,14 @@ Choose a sample from the dropdown:
 - \* = Unannotated (new sample)
 
 <a href="https://europeanifcbgroup.github.io/ClassiPyR/reference/figures/sample-browser.png">
-<img src="https://europeanifcbgroup.github.io/ClassiPyR/reference/figures/sample-browser.png" alt="Sample browser with year/month filters and status indicators." style="max-width:60%;">
+<img src="https://europeanifcbgroup.github.io/ClassiPyR/reference/figures/sample-browser.png" alt="Sample browser with year/month filters." style="max-width:60%;">
 </a>
-<p><em>Sample browser with year/month filters and status indicators. Click to enlarge.</em></p>
+<p><em>Sample browser with year/month filters. Click to enlarge.</em></p>
 
 Click **Load**.
 
+> **Tip**: If your sample list seems out of date, click the **Sync** button (circular arrow icon) next to the navigation buttons to rescan your folders.
+
 > **Tip**: Samples with ✎✓ let you switch between viewing your manual annotations and the auto-classifications using a button in the header.
 
 ---
diff --git a/vignettes/user-guide.Rmd b/vignettes/user-guide.Rmd
index cdb0f4c..c3676b4 100644
--- a/vignettes/user-guide.Rmd
+++ b/vignettes/user-guide.Rmd
@@ -28,7 +28,8 @@ Complete documentation for all `ClassiPyR` features.
 ### Title Bar
 
 - **App name and version**
-- **Mode indicator**: Shows current sample and mode (Validation/Annotation)
+- **Mode indicator**: Shows current state and mode
+  - No sample loaded: Initial state before selecting a sample
   - Validation mode: Shows accuracy percentage
   - Annotation mode: Shows progress (X/Y classified)
 
@@ -37,7 +38,8 @@ Complete documentation for all `ClassiPyR` features.
 - **Annotator name**: Your name for statistics tracking
 - **Settings**: Configure folders and options
 - **Sample selection**: Year, month, status filters
-- **Navigation**: Load, previous, next, random
+- **Navigation**: Load, previous, next, random, sync
+- **Cache age**: Shows when folders were last scanned
 - **Save button**: Manual save trigger
 
 ### Main Area (Tabs)
@@ -154,16 +156,26 @@ The default scale is 3.4 pixels per micrometer (standard for IFCB). To adjust:
 
 ### CSV Files
 
-Standard classification CSV output. Required columns:
+Standard classification CSV output. The CSV file must be named after the sample it describes (e.g., `D20230101T120000_IFCB134.csv`).
 
-- `file_name`: Image filename (e.g., `D20230101T120000_IFCB134_00001.png`)
+Required columns (exact names):
+
+- `file_name`: Image filename including `.png` extension (e.g., `D20230101T120000_IFCB134_00001.png`)
 - `class_name`: Predicted class name
 
 Optional columns:
 
 - `score`: Classification confidence (0-1)
 
-**Example CSV:**
+**Minimal example:**
+
+```
+file_name,class_name
+D20230101T120000_IFCB134_00001.png,Diatom
+D20230101T120000_IFCB134_00002.png,Ciliate
+```
+
+**Example with confidence scores:**
 
 ```
 file_name,class_name,score
@@ -172,11 +184,9 @@ D20230101T120000_IFCB134_00002.png,Ciliate,0.87
 D20230101T120000_IFCB134_00003.png,Dinoflagellate,0.72
 ```
 
-**Flexible column matching**: The app searches for columns containing "file" and "class" in their names, so variants like `filename`, `image_file`, `predicted_class`, or `class` will also work.
-
-**Different CNN pipelines**: If your classifier produces different column names, rename them to `file_name` and `class_name`, or contact us to add support for your format.
+**Different CNN pipelines**: If your classifier produces different column names, rename them to `file_name` and `class_name` before placing the CSV in the Classification Folder.
 
-Files are searched recursively in the Classification Folder.
+Files are looked up from the file index cache (see [File Index Cache](#file-index-cache) below).
 
 ### MATLAB Classifier Output
 
@@ -188,13 +198,51 @@ Files matching `*_class*.mat` pattern containing:
 
 **Threshold option**: Enable in Settings to include unclassified predictions below confidence threshold.
 
-> **Note**: Reading MATLAB classifier output requires Python (via iRfcb).
-
 ### Existing Annotations
 
 Previously saved annotations (in output folder) are automatically detected and can be resumed.
 
-> **Note**: Reading existing .mat annotations requires Python (via iRfcb).
+---
+
+## File Index Cache {#file-index-cache}
+
+To avoid slow startup from scanning large folder hierarchies, `ClassiPyR` maintains a file index cache on disk. The cache stores the locations of all ROI, classification, and annotation files found in your configured folders.
+
+### How it Works
+
+- On first launch (or after changing folder paths in Settings), the app scans all configured folders and saves the results to a JSON cache file
+- On subsequent launches, the app loads the cached index instantly instead of re-scanning
+- The cache is stored alongside your settings in the platform config directory (see [Settings Persistence](#settings-persistence))
+
+### Sync Button
+
+The **Sync** button (circular arrow icon) in the sidebar navigation row triggers a manual rescan of all folders. Use this when:
+
+- You've added new IFCB data files to your folders
+- The sample dropdown seems out of date
+- You want to force a fresh scan
+
+The **cache age indicator** below the navigation buttons shows when the folders were last scanned (e.g., "synced just now", "synced 2 hours ago").
+
+### Auto-Sync
+
+By default, the app checks on startup whether the cache matches your current folder settings and rescans automatically if needed. You can disable auto-sync in Settings to always load from the existing cache, which skips the scan and gives the fastest startup.
+
+### Headless Rescan
+
+You can update the file index cache without launching the app by calling `rescan_file_index()`. This is useful for scheduled updates (e.g., cron jobs) on servers where new data arrives regularly:
+
+```{r, eval = FALSE}
+# Rescan using saved settings
+ClassiPyR::rescan_file_index()
+
+# Or specify folder paths explicitly
+ClassiPyR::rescan_file_index(
+  roi_folder = "/data/ifcb/raw",
+  csv_folder = "/data/ifcb/classified",
+  output_folder = "/data/ifcb/manual"
+)
+```
 
 ---
 
@@ -215,17 +263,17 @@ MATLAB-compatible format with:
 
 ### Statistics Files
 
-`output/validation_statistics/[sample_name]_validation_stats.csv`
+`output_folder/validation_statistics/[sample_name]_validation_stats.csv`
 
 - Summary: total, correct, incorrect, accuracy
 
-`output/validation_statistics/[sample_name]_validation_detailed.csv`
+`output_folder/validation_statistics/[sample_name]_validation_detailed.csv`
 
 - Per-image: original class, validated class, correct flag
 
 ### Organized PNGs
 
-`png_output/[class_name]/[image_files]`
+`png_output_folder/[class_name]/[image_files]`
 
 Images organized into class folders for training CNN models or other classifiers.
 
@@ -242,18 +290,24 @@ Images organized into class folders for training CNN models or other classifiers
 | Output Folder | Where MAT and CSV output goes |
 | PNG Output Folder | Where organized images go |
 
-### Python Configuration
+Folder paths are configured using a web-based folder browser that works on all platforms (Linux, macOS, Windows). Changing folder paths in Settings automatically invalidates the file index cache, triggering a fresh scan.
+
+### Auto-Sync
 
 | Setting | Description |
 |---------|-------------|
-| Python Virtual Environment Path | Path to venv with scipy installed |
+| Auto-sync folders on startup | When enabled (default), the app checks and refreshes the file index on launch. Disable for instant startup using the existing cache. |
+
+### Python Configuration
 
-The venv path can also be specified when launching the app:
+The Python virtual environment path is configured when launching the app:
 
 ```{r, eval = FALSE}
 run_app(venv_path = "/path/to/your/venv")
 ```
 
+The path is remembered for future sessions. **Priority order**: `run_app(venv_path=)` argument > saved settings > default (`./venv`).
+
 ### Classifier Options
 
 **Apply classification threshold**: When loading MATLAB classifier output, use `TBclass_above_threshold` (checked) or `TBclass` (unchecked).
@@ -288,7 +342,9 @@ Shows class distribution:
 
 ## Session Cache
 
-The app maintains a session cache:
+The app maintains two caches:
+
+**In-memory session cache** (per session):
 
 - Switching samples saves work automatically
 - Returning to a sample restores your changes
@@ -296,15 +352,21 @@ The app maintains a session cache:
 
 **Note**: Always click Save before closing for permanent storage.
 
+**File index cache** (persistent on disk):
+
+- Stores the locations of all IFCB files across your configured folders
+- Persists between sessions for fast startup
+- See [File Index Cache](#file-index-cache) for details
+
 ---
 
 ## Settings Persistence
 
 `ClassiPyR` stores your settings in a configuration file that follows R standards:
 
-- **Linux**: `~/.local/share/ClassiPyR/settings.json`
-- **macOS**: `~/Library/Application Support/ClassiPyR/settings.json`
-- **Windows**: `%LOCALAPPDATA%/ClassiPyR/settings.json`
+- **Linux**: `~/.config/R/ClassiPyR/settings.json`
+- **macOS**: `~/Library/Preferences/org.R-project.R/R/ClassiPyR/settings.json`
+- **Windows**: `%APPDATA%/R/config/R/ClassiPyR/settings.json`
 
 Settings are loaded automatically when you start the app, so your folder paths, class list location, and Python venv path are remembered between sessions. Settings can be reset by specifying `run_app(reset_settings = TRUE)`.
 
@@ -312,11 +374,11 @@ Settings are loaded automatically when you start the app, so your folder paths,
 
 ## Dependencies
 
-`ClassiPyR` relies on **[iRfcb](https://github.com/EuropeanIFCBGroup/iRfcb)** for all IFCB data operations:
+`ClassiPyR` relies on **[`iRfcb`](https://github.com/EuropeanIFCBGroup/iRfcb)** for all IFCB data operations:
 
 - Extracting images from ROI files
 - Reading ADC metadata (dimensions, timestamps)
 - Reading and writing MATLAB .mat files
 - Class list handling
 
-iRfcb is installed automatically as a dependency when you install `ClassiPyR`.
+`iRfcb` is installed automatically as a dependency when you install `ClassiPyR`.