Skip to content

More general integration of selfing generations#6

Open
cheuerde wants to merge 29 commits intoRpedigree:masterfrom
cheuerde:master
Open

More general integration of selfing generations#6
cheuerde wants to merge 29 commits intoRpedigree:masterfrom
cheuerde:master

Conversation

@cheuerde
Copy link
Copy Markdown

@cheuerde cheuerde commented Jul 1, 2024

Some general updates:

  • pedigree() function will now run editPed itself if necessary
  • pedigree S4 class has 3 more slots: generation, selfing_generation, expanded
  • function added: getGeneration which returns the generation number. It calls the new C function get_generation
  • function added: expandPedigreeSelfing - based on the selfing_generation, will add appropriate generations to pedigree.
    • this is based on getASelfing logic.
    • Calls C function expand_pedgiree_selfing
    • while all expanded entries will have the suffix "-Fx" (if sepChar = "-F"), but the original individuals wont!
  • getA with labs argument now uses Matrix::crossprod(relfactor(ped))[, labs] to compute A subset
  • getASubset now same as getA
  • getASelfing function still there but now uses the generic pedigree generation

#' stopifnot(Matrix::isSymmetric(A))
getA <- function(ped, labs = NULL) {

L = relfactor(ped)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cheuerde we don't want this for a large pedigree - the getASubset() uses the indirect Colleau method that gives a subset of A without ever forming the full L matrix,

The tryCatch error handler in pedigree() was calling editPed() with
sire/dam that had already been converted to integer indices by the
try body. editPed then treated these integers as character IDs,
found no matches in the label set, and added them all as phantom
founder entries. This broke ancestor connections and caused zero
coancestry in the NRM.

Fix: save original character sire/dam/label before integer conversion
and use those in the error handler.
sire and dam are character label vectors, not integer positions.
The old docs were inherited from pedigreemm and did not reflect
how the code actually works.
…stors

Replace the O(n^2) pure R recursive loop in editPed with the existing
C get_generation function. On a 434k record pedigree this reduces
editPed from hours to seconds.
Three optimizations that reduce total processing time from hours to seconds:

1. inbreeding() C code: Add selfing shortcut — when sire==dam,
   F = (1 + F_parent)/2 in O(1) instead of tracing all ancestors.
   This skips ~80% of individuals in selfing-expanded pedigrees.
   (1M pedigree: 8+ hours → 0.03 seconds)

2. expandPedigreeSelfing() R wrapper: Replace ped2DF() + factor()
   with direct slot access + match() for O(n) label-to-index mapping.
   Avoids creating expensive factor columns on million-row data.
   (1M pedigree: minutes → 3.5 seconds)

3. pedigree() constructor + editPed(): Replace all factor() calls
   with match() for O(n) label mapping instead of O(n*m).

Full pipeline benchmark (250k input → 1.05M expanded):
  pedigree():                0.28s (was hours)
  expandPedigreeSelfing():   3.6s  (was minutes)
  inbreeding():              0.03s (was 8+ hours)
  TOTAL:                     3.9s
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants