Code for deriving pedigree/parentage history, given data about individuals and parents.

soybase/parentage


Overview

The two scripts in this repository work from tab-separated data of the form "individual parent1 parent2". They can be used to recursively generate a pedigree, and also to determine which other lines in the data have a specified line in their pedigree, along with any aliases and comments regarding the specified line.

Main program for recursively calculating pedigrees from parentage data:

  Usage:  parentage.pl -parents FILE [-options]

  Examples:
  To generate a file of the parents throughout the pedigree for each individual:
    parentage.pl -parents data/parentage.tsv -outfile data/parentage-list.tsv -format list 
  To report the parentage for a given query, built up from the immediate parents through successive generations:
    parentage.pl -parents data/parentage.tsv -q Hardin
  To report the parentage for a given query, in a tabular format suitable for viewing with https://helium.hutton.ac.uk:
    parentage.pl -parents data/parentage.tsv -q Hardin -format table0 -outfile QUERY

  Given a file of individuals and parents, recursively determine the pedigree
  (the parentage going back as many generations as possible) for an individual.
  Can be calculated for a single indicated individual or for all in the parents file.

  The parents file should be structured like this.
  The individual (progeny) is on the left, and the parents are on the right
    indID  parent1  parent2
    A      B        C
    D      B        C
    E      B        F

  Required:
    -parents  File listing the individuals and parents

  Options:
    -query    ID of an individual for which to calculate parentage.
              If not provided, report parentage for all individuals.
    -outfile  Print to indicated filename; otherwise to STDOUT. 
              If -outfile "QUERY" is indicated, the query name will be used (with spaces replaced by underscores).
    -outdir   If outfile is specified, write files to this directory. Default "."
    -format   Output format. Options: string, table0, table1, list
                string: pedigree string, in parenthetical tree format. Printed at increasing depth unless -last_only
                table0: query parent1 parent2   With header line but no other information
                table1: query parent1 parent2   With termination information and final pedigree string
                list:   individual and all parents throughout the pedigree, comma-separated (no parentheses)
    -last_only  For string format, print only the last pedigree string; otherwise, print one for each data line.
    -max_count  The maximum number of individuals in the pedigree to report.
                When this number is reached, the pedigree of that size will be reported,
                even if other parents may be found in the input data.
    -verbose  Report some intermediate information.
    -help     This message.
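
The recursion described in the usage text above can be illustrated with a short sketch. This is Python written for illustration only, not the logic of parentage.pl itself: the function names parse_parents and ancestors are hypothetical, the input is assumed to have no header line, and the row "B G H" is an invented addition to the usage example so that there is a second generation to follow.

```python
def parse_parents(text):
    """Parse tab-separated 'individual parent1 parent2' lines into a dict."""
    parents = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) < 3 or not fields[0]:
            continue  # skip blank or incomplete lines
        parents[fields[0]] = (fields[1], fields[2])
    return parents


def ancestors(individual, parents, seen=None):
    """Collect every ancestor of an individual, depth first, without repeats.

    The 'seen' list doubles as a guard against cycles in the input data.
    """
    if seen is None:
        seen = []
    for parent in parents.get(individual, ()):
        if parent not in seen:
            seen.append(parent)
            ancestors(parent, parents, seen)
    return seen


# Rows A, D, E come from the usage example; row B is a hypothetical addition.
SAMPLE = "A\tB\tC\nD\tB\tC\nE\tB\tF\nB\tG\tH\n"

if __name__ == "__main__":
    table = parse_parents(SAMPLE)
    for individual in table:
        # One line per individual: the individual, then all of its ancestors.
        print(", ".join([individual] + ancestors(individual, table)))
```

The flat, parenthesis-free output resembles the idea behind the list format, although the exact column layout written by parentage.pl may differ.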

Wrapper program that generates a report including synonyms, comments, and lines with the query individual in their pedigree:

  Usage:  
  parentage_report.pl -query ID [-options]
  
  Three data files are required, and an optional fourth ensures a much faster run.
  If the data files are named as follows and are available in a directory "data" at
  the same location as the script, the script will use these file names and locations
  by default, so they don't need to be specified explicitly:
       -parentage data/parentage.tsv 
       -synonyms data/parentage-synonyms.tsv 
       -comments data/parentage-comments.tsv 
       -plist data/parentage-list.tsv          # Optional but recommended
  
  Given the required input data, generate a report about an individual, including the pedigree, 
  any aliases/synonyms for the line, the lines which have the individual in their pedigree, 
  and any available comments about the individual.
  
  In the invocation without -plist, the parentage.tsv file is taken in as data for calculating
  pedigrees for all lines, and then the query is checked against those pedigrees to find which lines
  contain the query individual in the pedigree. This option is space-efficient (parentage.tsv is small)
  but relatively time-consuming to run (it takes several seconds to recalculate all pedigrees).
  
  In the invocation WITH -plist, the parentage-list.tsv file has, for each individual, the lines in the
  pedigrees of that individual. The query is checked against each of those lists to find which lines
  contain the query individual in the pedigree. This option is relatively space-inefficient
  (the parentage-list.tsv file may be several megabytes) but fast to run.
  
  The parentage-list.tsv can be generated by the script parentage.pl:
    ./parentage.pl -par data/parentage.tsv -f list -outfile data/parentage-list.tsv

  To generate a standard report in JSON format:
    ./parentage_report.pl -plist data/parentage-list.tsv -q Hardin

  To generate just the three-column pedigree table suitable for submitting to the Helium viewer:
    ./parentage_report.pl -plist data/parentage-list.tsv -table -q Hardin

  Some other lines to try, to check various characteristics of the data:
    Hardin, Hayes, Hamlin, Gnome, Franklin, Flyer, Flambeau, Williams, "Williams 82", Lee

  Required:
    -query      ID of an individual for which to generate a report

  Required, with defaults indicated above:
    -parents    File with three columns: individual and the two parents;
    -synonyms   File with two columns: individual and synonym (if multiple synonyms, one line for each);
    -comments   File with two columns: individual and comments

  Options:
    -plist      Tab-separated file with individual (first column) and all progenitors for that individual
    -text_out   Print a plain-text report to STDOUT; otherwise to JSON (default)
    -pretty     For JSON output, format for human viewing (with line returns, indentation)
    -table_out  Print pedigree table, suitable for submitting to the Helium viewer, to STDOUT.
                With this option, the other output is not reported (neither JSON nor text_out)
    -max_count  The maximum number of individuals in the pedigree to report.
    -verbose    Report some intermediate information.
    -help       This message.
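
The lookup that -plist makes fast can be sketched as a single pass over the precomputed list file. The Python below is an illustration under stated assumptions, not the code of parentage_report.pl: the function name find_matches is hypothetical, each row is assumed to hold the individual in the first tab-separated column followed by comma-separated progenitors, and the sample rows are abbreviated, illustrative rows built from names that appear in this README.

```python
def find_matches(plist_lines, query):
    """Return the individuals whose progenitor list contains the query."""
    matches = []
    for line in plist_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        individual, _, rest = line.partition("\t")
        # Accept commas or further tabs between progenitors (layout is assumed).
        progenitors = {name.strip() for name in rest.replace("\t", ",").split(",")}
        if query in progenitors:
            matches.append(individual.strip())
    return sorted(matches)


# Abbreviated, illustrative rows; a real parentage-list.tsv would be much longer.
ROWS = [
    "Hardin\tCorsoy 3,Cutler 71,Cutler 4,SL5",
    "Asgrow A2242\tHardin,Corsoy 3,Cutler 71",
    "Essex\tLee,S5-7075",
]

if __name__ == "__main__":
    print(find_matches(ROWS, "Hardin"))
```

Because every pedigree is already flattened in the list file, each query costs only a scan of that file, which is the trade of disk space for speed described in the usage text.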

Example: Generate a full report for a specified query (genotype)

First use parentage.pl to generate (one time only) a file data/parentage-list.tsv:

  ./parentage.pl -parents data/parentage.tsv -outfile data/parentage-list.tsv -format list 

Then call parentage_report.pl, taking advantage of the data files with default names and locations. This generates a report in JSON format. The -pretty flag formats the JSON for human viewing; the output shown here has been further reformatted and abbreviated:

./parentage_report.pl -query Hardin -max 10 -pretty

{
   "table" : [
      [ "Genotype", "FemaleParent", "MaleParent" ],
      [ "Hardin", "Corsoy 3", "Cutler 71" ],
      ...
   ],
   "query" : [ "Hardin" ],
   "comments" : [ "PVP 8100052" ],
   "synonyms" : [ "PI 548526", "A76-102009" ],
   "matches" : [ "05KL119276", ... ],
   "construction" : [
      "( Corsoy 3 , Cutler 71 )",
      "( Corsoy 3 , ( Cutler 4 , SL5 ) )",
      "( Corsoy 3 , ( Cutler 4 , ( ( Kent 7 , L49-4196 ) , ( Kent 8 , Mukden ) ) ) )"
   ]
}
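
A downstream program can consume this JSON directly. The Python sketch below is illustrative: the key names ("table", "query", "synonyms", and so on) are taken from the example above, the elided "..." entries are dropped so that the text parses, and the helper name direct_parents is hypothetical.

```python
import json

# The example report from above, with the "..." elisions removed so it parses.
REPORT_TEXT = """
{
   "table" : [
      [ "Genotype", "FemaleParent", "MaleParent" ],
      [ "Hardin", "Corsoy 3", "Cutler 71" ]
   ],
   "query" : [ "Hardin" ],
   "comments" : [ "PVP 8100052" ],
   "synonyms" : [ "PI 548526", "A76-102009" ],
   "matches" : [ "05KL119276" ]
}
"""


def direct_parents(report):
    """Return (female, male) parents of the query from the report's table."""
    query = report["query"][0]
    _header, *rows = report["table"]
    for genotype, female, male in rows:
        if genotype == query:
            return female, male
    return None  # the query has no row in the table


report = json.loads(REPORT_TEXT)

if __name__ == "__main__":
    print(direct_parents(report))
    print(report["synonyms"])
```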

Alternatively, add the -text flag for plain-text output:

./parentage_report.pl -query Hardin -max 10 -text

comments:
  PVP 8100052

construction:
  ( Corsoy 3 , Cutler 71 )
  ( Corsoy 3 , ( Cutler 4 , SL5 ) )
  ( Corsoy 3 , ( Cutler 4 , ( ( Kent 7 , L49-4196 ) , ( Kent 8 , Mukden ) ) ) )

matches:
  05KL119276
  05KL135608
  Asgrow A2242
  MT002989
  OW1012750
  PI 669396
  Syngenta S16-Y6
  XP1928

query:
  Hardin

synonyms:
  PI 548526
  A76-102009

table:
  Genotype	FemaleParent	MaleParent
  Hardin	Corsoy 3	Cutler 71
  Cutler 71	Cutler 4	SL5
  SL5	( Kent 7 X L49-4196 )	( Kent 8 X Mukden )

Example: Calculate pedigree strings for all genotypes in the input parent file

Report the output as a table of genotype-parent-parent triples and a tree-like pedigree string (the default output option), and limit the reported pedigree size with -max_count 10 (abbreviated to -max in the command below, and reported as max_ped_size in the output). Note: the example data lists 14740 individuals with their parents, and a pedigree is generated for each; only the first 10 lines of the output are shown below.

  ./parentage.pl -parents data/parentage.tsv -max 10 | head    
  
  ## 1 ##
  00CY622138 ==	( ( B152 , B231 ) , 11415 ) 
  ## 2 ##
  02JR310007 ==	( CM4035N , Pioneer P93B82 ) 
  02JR310007 ==	( CM4035N , ( Pioneer P9273 , ( ( MO304 , Asgrow A3127 ) , ( Asgrow 3733 , Resnik ) ) ) ) 
  02JR310007 ==	( CM4035N , ( ( Pioneer P2981 , Asgrow A3127 ) , ( ( MO304 , Asgrow A3127 ) , ( Asgrow 3733 , Resnik ) ) ) ) 
  02JR310007 ==	( CM4035N , ( ( ( Hark , ( Corsoy , Calland ) ) , Asgrow A3127 ) , ( ( MO304 , Asgrow A3127 ) , ( Asgrow 3733 , Resnik ) ) ) ) 
  !! Terminating search because number of individuals is greater than max_ped_size 10
  ## 3 ##
  02JR310007BC1 ==	( 02JR310007 , ( 02JR310007 , 3607F9-AOYN ) ) 

Example: Report input data and pedigree string for an indicated genotype

Report the output as a table of genotype-parent-parent triples, with header line. This can be written to a file and used as input to the Helium pedigree viewer.

  ./parentage.pl -p data/parentage.tsv -f table0 -q Essex 

  Genotype	FemaleParent	MaleParent
  Essex	Lee	S5-7075
  Lee	S-100	C.N.S.
  S5-7075	N48-1248	Perry
  N48-1248	Roanoke	N45-745
  N45-745	Ogden	C.N.S.
  Ogden	Tokyo	PI 54610
  Perry	Patoka	L37-1355

Here is the pedigree image generated by the Helium pedigree viewer from the input above (for -query Essex):

[Image: Essex pedigree rendered by the Helium pedigree viewer]

Here is the corresponding pedigree string, generated with
./parentage.pl -parents data/parentage.tsv -q Essex -f string

Essex ==	( Lee , S5-7075 ) 
Essex ==	( ( S-100 , C.N.S. ) , S5-7075 ) 
Essex ==	( ( S-100 , C.N.S. ) , ( N48-1248 , Perry ) ) 
Essex ==	( ( S-100 , C.N.S. ) , ( ( Roanoke , N45-745 ) , Perry ) ) 
Essex ==	( ( S-100 , C.N.S. ) , ( ( Roanoke , ( Ogden , C.N.S. ) ) , Perry ) ) 
Essex ==	( ( S-100 , C.N.S. ) , ( ( Roanoke , ( ( Tokyo , PI 54610 ) , C.N.S. ) ) , Perry ) ) 
Essex ==	( ( S-100 , C.N.S. ) , ( ( Roanoke , ( ( Tokyo , PI 54610 ) , C.N.S. ) ) , ( Patoka , L37-1355 ) ) ) 
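
The step-by-step growth of the strings above can be reproduced by repeatedly expanding the leftmost individual that has known parents. The Python below is a sketch of that idea, not the implementation in parentage.pl: the function names are hypothetical, the parent table is copied from the Essex table0 output shown earlier, and the max_steps guard is an assumption added to stop on cyclic input.

```python
# Parent table copied from the Essex table0 example in this README.
PARENTS = {
    "Essex": ("Lee", "S5-7075"),
    "Lee": ("S-100", "C.N.S."),
    "S5-7075": ("N48-1248", "Perry"),
    "N48-1248": ("Roanoke", "N45-745"),
    "N45-745": ("Ogden", "C.N.S."),
    "Ogden": ("Tokyo", "PI 54610"),
    "Perry": ("Patoka", "L37-1355"),
}


def render(node):
    """Format a nested (left, right) tree as a parenthesized pedigree string."""
    if isinstance(node, str):
        return node
    left, right = node
    return f"( {render(left)} , {render(right)} )"


def expand_leftmost(node, parents):
    """Replace the leftmost leaf that has known parents; report if any changed."""
    if isinstance(node, str):
        if node in parents:
            return parents[node], True
        return node, False
    left, right = node
    new_left, changed = expand_leftmost(left, parents)
    if changed:
        return (new_left, right), True
    new_right, changed = expand_leftmost(right, parents)
    return (left, new_right), changed


def pedigree_strings(query, parents, max_steps=1000):
    """Return one pedigree string per expansion step, shallowest first."""
    tree, strings = query, []
    for _ in range(max_steps):  # guard against cycles in the data
        tree, changed = expand_leftmost(tree, parents)
        if not changed:
            break
        strings.append(render(tree))
    return strings


if __name__ == "__main__":
    for string in pedigree_strings("Essex", PARENTS):
        print(f"Essex ==\t{string}")
```

With this parent table the sketch yields seven strings, ending in the fully expanded pedigree; the last element corresponds to what -last_only would keep.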

REST API (optional)

The api.pl script uses the Mojolicious library to provide a simple REST API for parentage_report.pl.

An HTTP GET /genotypes request will return a JSON array of all genotypes that can be used as a query:

perl api.pl get /genotypes

An HTTP GET /?q=<query> request will return a JSON response for the query:

perl api.pl get '/?q=Essex'

Appending '/pedigree.helium.zip' will produce a zip file in a format compatible with the Helium pedigree viewer:

perl api.pl get '/pedigree.helium.zip?q=Essex' > pedigree.helium.zip

If this API is served from a public web server, the zip file can be imported directly by the Helium public web server (entering the URL in the Helium "Import > Load Pedigree > Germinate Link" menu).

To start a Mojo::Server::Daemon listening on default port 3000:

perl api.pl daemon # development mode; for production: daemon -m production

Then queries can be tested, e.g., with curl:

curl -f 'http://localhost:3000/?q=Essex'
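
The same query can be made from a program. The Python sketch below uses only the standard library; the helper names report_url and fetch_report are hypothetical, the base URL assumes the daemon is running locally on port 3000 as above, and the response is assumed to be the JSON report. Percent-encoding matters for genotype names containing spaces, such as "Williams 82".

```python
import json
import urllib.parse
import urllib.request


def report_url(base_url, genotype):
    """Build the /?q=<query> URL, percent-encoding the genotype name."""
    query = urllib.parse.urlencode({"q": genotype}, quote_via=urllib.parse.quote)
    return f"{base_url.rstrip('/')}/?{query}"


def fetch_report(base_url, genotype, timeout=10):
    """Request the report for a genotype and decode the JSON response."""
    url = report_url(base_url, genotype)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires a running server, e.g. started with: perl api.pl daemon
    print(report_url("http://localhost:3000", "Williams 82"))
    print(fetch_report("http://localhost:3000", "Essex"))
```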
