More general integration of selfing generations#6
Open
cheuerde wants to merge 29 commits intoRpedigree:masterfrom
Open
More general integration of selfing generations#6cheuerde wants to merge 29 commits intoRpedigree:masterfrom
cheuerde wants to merge 29 commits intoRpedigree:masterfrom
Conversation
Update NAMESPACE
minors on namespace
| #' stopifnot(Matrix::isSymmetric(A)) | ||
| getA <- function(ped, labs = NULL) { | ||
|
|
||
| L = relfactor(ped) |
Contributor
There was a problem hiding this comment.
@cheuerde we don't want this for a large pedigree - the getASubset() uses the indirect Colleau method that gives a subset of A without ever forming the full L matrix,
The tryCatch error handler in pedigree() was calling editPed() with sire/dam that had already been converted to integer indices by the try body. editPed then treated these integers as character IDs, found no matches in the label set, and added them all as phantom founder entries. This broke ancestor connections and caused zero coancestry in the NRM. Fix: save original character sire/dam/label before integer conversion and use those in the error handler.
sire and dam are character label vectors, not integer positions. The old docs were inherited from pedigreemm and did not reflect how the code actually works.
…stors Replace the O(n^2) pure R recursive loop in editPed with the existing C get_generation function. On a 434k record pedigree this reduces editPed from hours to seconds.
Three optimizations that reduce total processing time from hours to seconds: 1. inbreeding() C code: Add selfing shortcut — when sire==dam, F = (1 + F_parent)/2 in O(1) instead of tracing all ancestors. This skips ~80% of individuals in selfing-expanded pedigrees. (1M pedigree: 8+ hours → 0.03 seconds) 2. expandPedigreeSelfing() R wrapper: Replace ped2DF() + factor() with direct slot access + match() for O(n) label-to-index mapping. Avoids creating expensive factor columns on million-row data. (1M pedigree: minutes → 3.5 seconds) 3. pedigree() constructor + editPed(): Replace all factor() calls with match() for O(n) label mapping instead of O(n*m). Full pipeline benchmark (250k input → 1.05M expanded): pedigree(): 0.28s (was hours) expandPedigreeSelfing(): 3.6s (was minutes) inbreeding(): 0.03s (was 8+ hours) TOTAL: 3.9s
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Some general updates:
getGenerationwhich returns the generation number. It calls the new C functionget_generationexpandPedigreeSelfing- based on the selfing_generation, will add appropriate generations to pedigree.getASelfinglogic.expand_pedgiree_selfinggetAwithlabsargument now usesMatrix::crossprod(relfactor(ped))[, labs]to compute A subsetgetASubsetnow same as getAgetASelfingfunction still there but now uses the generic pedigree generation