jbloom22 edited this page Jan 11, 2017 · 2 revisions

getStats

Input Parameters

  • passback : String. Optional. Default: None. Returned as is in the response JSON to facilitate debug.
  • md_version : String. Required. Indicates metadata version, including dataset.
  • api_version : Float. Required. Indicates API version to synchronize clients.
  • phenotype : String. Optional. Default: None. Name of the phenotype to be used as the response in regression. The list of available phenotypes is published through the metadata API.
  • sample_covariates : Array of String. Optional. Default: None. phenotype must be present. Names of sample covariates to be projected out of phenotype. List of available sample covariates is published through the metadata API.
  • variant_covariates : Array of Variant Object. Optional. Default: None. phenotype must be present. Variants to be projected out of the phenotype. Each variant object is comprised of a list of four key-value pairs:
    1. chrom : String. Chromosome.
    2. pos : Integer. Position.
    3. ref : String. Reference allele.
    4. alt : String. Alternate allele.
  • variant_filters : Array of Filter Objects. Required. Current max width is 600k. Filters are combined with logical AND. Each filter object is a set of key-value pairs that defines one filter operation, with the following keys:
    1. operand : String. Property used in the formula (chrom, pos, mac).
    2. operator : String. Formula operator (eq, gte, gt, lte, lt). eq is not allowed for mac.
    3. value : String. Formula value.
    4. operand_type : String. Property type (String, Integer).
  • variant_list : Array of Variant Objects. Optional. Default: None. Each variant object has the same four keys as in variant_covariates (chrom, pos, ref, alt).
  • variant_ld : Variant Object. Optional. Default: None. Variant must be active. Marked variant for LD computation.
  • compute_linreg : Boolean. Optional. Default: false. phenotype must be present. If true, linear regression statistics are computed.
  • compute_ld_r : Boolean. Optional. Default: false. variant_ld must be present. If true, r is computed against variant_ld.
  • compute_ld_d : Boolean. Optional. Default: false. variant_ld must be present. If true, D' is computed against variant_ld.
  • compute_scores : Boolean. Optional. Default: false. phenotype must be present. If true, the vector u of scores is computed.
  • compute_covariance : Boolean. Optional. Default: false. If true, the unscaled covariance matrix C is computed.
  • compute_sigma_sq : Boolean. Optional. Default: false. phenotype must be present. If true, sigma_sq is computed.
  • limit : Integer. Optional. Default: the current hard limit, 100k. Maximum number of variants returned.
  • count : Boolean. Optional. Default: false. If true, only the number of active_variants is returned, with no statistics.

Active variants are those variants in the dataset that satisfy all variant filters and, if a variant list is present, are in the variant list.
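
The definition of active variants above can be sketched in Python. This is only an illustration, not the service's implementation: the in-memory `dataset` records, the `mac` field on each record, and the helper names are assumptions. It uses the operand names listed in the parameter description (chrom, pos, mac) and sorts the result by pos, ref, alt.

```python
import operator

# Operator names taken from the variant_filters description.
OPS = {
    "eq": operator.eq,
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
}


def variant_key(v):
    """Identify a variant by its four defining fields."""
    return (v["chrom"], v["pos"], v["ref"], v["alt"])


def passes(variant, flt):
    """Evaluate one filter statement against one variant record."""
    # Coerce both sides according to operand_type (case-insensitive here).
    cast = int if flt["operand_type"].lower() == "integer" else str
    left = cast(variant[flt["operand"]])
    right = cast(flt["value"])
    return OPS[flt["operator"]](left, right)


def active_variants(dataset, variant_filters, variant_list=None):
    """Variants that satisfy every filter and, if given, are in variant_list."""
    allowed = None
    if variant_list is not None:
        allowed = {variant_key(v) for v in variant_list}
    selected = [
        v for v in dataset
        if all(passes(v, f) for f in variant_filters)
        and (allowed is None or variant_key(v) in allowed)
    ]
    return sorted(selected, key=lambda v: (v["pos"], v["ref"], v["alt"]))
```

Passing `variant_list=None` leaves the filters as the only restriction, which matches the "if a variant list is present" wording.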

When compute_covariance is true, we may impose harder limits (TBD) on the width of the window and size of variant_list.

Example input:

{
  "passback"        : "example",
  "md_version"      : "mdv1",
  "api_version"     : 1,
  "phenotype"       : "t2d",
  "sample_covariates"  : [ "BMI", "PC1" ],
  "variant_covariates" : [
                           {"chrom": "20", "pos": 2000, "ref": "T", "alt": "G"}
                         ],
  "variant_filters" : [ 
                        {"operand": "chrom", "operator": "eq", "value": "20", "operand_type": "string"},
                        {"operand": "position", "operator": "gte", "value": 1000, "operand_type": "integer"},
                        {"operand": "position", "operator": "lte", "value": 4000, "operand_type": "integer"},
                        {"operand": "mac", "operator": "gte", "value": 4, "operand_type": "integer"}
                      ],
  "variant_list"    : [
                        {"chrom": "20", "pos": 1234, "ref": "G", "alt": "A"},
                        {"chrom": "20", "pos": 2900, "ref": "C", "alt": "T"}
                      ],
  "variant_ld"      : {"chrom": "20", "pos": 1234, "ref": "G", "alt": "A"},
  "compute_linreg"  : true,
  "compute_ld_r"    : true,
  "compute_ld_d"    : true,
  "compute_scores"  : true,
  "compute_covariance" : true,
  "compute_sigma_sq" : true,
  "limit"           : 50,
  "count"           : false
}
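
The dependencies between input parameters (for example, compute_linreg needs phenotype, and compute_ld_r needs variant_ld) can be checked on the client before a request is sent. The following is a minimal sketch under that reading of the parameter list. The function name `check_request` and the wording of the messages are hypothetical, and the server may apply further checks that are not modelled here.

```python
def check_request(req):
    """Return a list of problems found in a getStats request dict."""
    problems = []

    # Parameters marked Required in the parameter list.
    for key in ("md_version", "api_version", "variant_filters"):
        if key not in req:
            problems.append(f"missing required parameter: {key}")

    # Parameters documented with "phenotype must be present".
    needs_phenotype = (
        "sample_covariates",
        "variant_covariates",
        "compute_linreg",
        "compute_scores",
        "compute_sigma_sq",
    )
    for key in needs_phenotype:
        if req.get(key) and req.get("phenotype") is None:
            problems.append(f"{key} requires phenotype")

    # Parameters documented with "variant_ld must be present".
    for key in ("compute_ld_r", "compute_ld_d"):
        if req.get(key) and req.get("variant_ld") is None:
            problems.append(f"{key} requires variant_ld")

    # "eq is not allowed for mac."
    for flt in req.get("variant_filters", []):
        if flt.get("operand") == "mac" and flt.get("operator") == "eq":
            problems.append("eq is not allowed for mac")

    return problems
```

An empty list means none of the documented dependency rules is violated; it does not guarantee that the server will accept the request.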

Output Parameters

Statistical Parameters

  • count : Integer. Number of active variants.
  • active_variants : Array of Variant Objects. Active variants sorted by pos, ref, alt.
  • betas : Array of Float. Betas indexed by active_variants. Present if compute_linreg is true.
  • stderrs : Array of Float. Standard errors indexed by active_variants. Present if compute_linreg is true.
  • zstats : Array of Float. Test statistics indexed by active_variants. Present if compute_linreg is true.
  • pvals : Array of Float. p-values indexed by active_variants. Present if compute_linreg is true.
  • ld_r : Array of Float. r-values with variant_ld indexed by active_variants. Present if compute_ld_r is true.
  • ld_d : Array of Float. D' values with variant_ld indexed by active_variants. Present if compute_ld_d is true.
  • scores : Array of Float. Scores u = (X - Xbar)^T y indexed by active_variants. y is the residual phenotype. Present if compute_scores is true.
  • covariance : Array of Float. Unscaled covariance matrix C = (X - Xbar)^T (X - Xbar) as an array of length n * (n + 1) / 2 where n = count. C is indexed by active_variants and encoded to an array via the upper triangle (row-major) or, equivalently, the lower triangle (column-major). For n = 3, the array index of each matrix entry is:
  [ 0 1 2 ]
  [ 1 3 4 ]
  [ 2 4 5 ]

  Upper triangle, row-major:
  [ (0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2) ]

  Lower triangle, column-major:
  [ (0, 0), (1, 0), (2, 0), (1, 1), (2, 1), (2, 2) ]

The variance-covariance matrix V is given by sigma_sq * C. Present if compute_covariance is true.

  • sigma_sq : Float. Variance of the residual phenotype y. Present if compute_sigma_sq is true.
  • nsamples : Integer. Number of samples used (e.g., with phenotype and all covariates present). Present if phenotype is present.
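
A client will usually want the packed covariance array as a full symmetric matrix. The sketch below unpacks it following the upper-triangle, row-major layout described for covariance. It assumes the packed length is n * (n + 1) / 2, which is what the 3 x 3 index layout and the example output imply. The function names are illustrative only.

```python
def unpack_covariance(packed, n):
    """Expand a packed upper triangle (row-major) into an n x n symmetric matrix."""
    expected = n * (n + 1) // 2
    if len(packed) != expected:
        raise ValueError(f"expected {expected} entries, got {len(packed)}")
    full = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i, n):
            # Entry (i, j) of the upper triangle mirrors to (j, i).
            full[i][j] = packed[k]
            full[j][i] = packed[k]
            k += 1
    return full


def variance_covariance(packed, n, sigma_sq):
    """V = sigma_sq * C, using the unpacked matrix C."""
    return [[sigma_sq * x for x in row] for row in unpack_covariance(packed, n)]


# Packed array for n = 2, as in a response with count = 2.
print(unpack_covariance([1.4, -1.2, 0.9], 2))
# prints [[1.4, -1.2], [-1.2, 0.9]]
```

The length check catches a mismatch between count and the covariance array before any indexing happens.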

Debug Parameters

  • passback : String. Contains the passback value given in the original request.
  • is_error : Boolean. True if the operation errored out due to bad input or an internal issue.
  • error_message : String. Indicates the cause of failure. Present if is_error is true.

See the Methods section of Meta-Analysis of Gene Level Tests for Rare Variant Association for details on u, V, and sigma_sq.

Example output:

{
    "is_error"        : false,
    "passback"        : "example",
    "count"           : 2,
    "active_variants" : [
                          {"chrom": "20", "pos": 1234, "ref": "G", "alt": "A"},
                          {"chrom": "20", "pos": 2900, "ref": "C", "alt": "T"}
                        ],
    "betas"           : [ 0.1, 2.0],
    "stderrs"         : [ 0.2, 1.0],
    "zstats"          : [ 0.5, 2.0],
    "pvals"           : [ 0.6171, 0.0455],
    "ld_r"            : [ 1.0, -0.1],
    "ld_d"            : [ 1.0, -0.2],                    
    "scores"          : [ 1.2, 0.4 ],
    "covariance"      : [ 1.4, -1.2, 0.9 ],          
    "sigma_sq"        : 12.2,
    "nsamples"        : 2104
}
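
This page does not state how zstats and pvals are derived from betas and stderrs. The numbers in the example output are, however, consistent with z = beta / stderr and a two-sided p-value under a standard normal distribution. The sketch below checks that relationship as an assumption, not as a description of the service, which may for instance use a t-distribution.

```python
import math


def two_sided_normal_p(z):
    """Two-sided p-value for a z statistic under a standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


# Values copied from the example output.
betas = [0.1, 2.0]
stderrs = [0.2, 1.0]

zstats = [b / s for b, s in zip(betas, stderrs)]
pvals = [round(two_sided_normal_p(z), 4) for z in zstats]

print(zstats)  # prints [0.5, 2.0]
print(pvals)   # prints [0.6171, 0.0455]
```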
