Statistical Analysis [one factor] fails when using Read.TextData with format="colp" due to orientation mismatch #372

@roxiefoxx

After a year of avoiding the R package because of dependency issues and the inability to run the R code produced by the MetaboAnalyst online modules, I'm slowly piecing together the differences between the online version and the R package. The online module works with the same file and lets me complete my analysis, but it is time-consuming.

There appears to be an inconsistency between data loading and statistical analysis when using paired columns (colp). While Read.TextData(..., format="colp") correctly identifies metabolites in rows and samples in paired columns, downstream functions such as Match.Pattern and FC.Anal (specifically group_paired) expect the row format (samples as rows, metabolites as columns). In the end, the fix could be as simple as "just transpose your data in the beginning", but then why offer the colp option at all?

The package does not automatically transpose the mSet$dataSet$norm matrix after normalization when importing in colp mode, leading to errors and mSet corruption. When a function fails, we have to clean the slate and start the workflow over again.

Expected Behavior: The functions should either automatically detect the colp format and transpose the data internally, or the Normalization function should ensure the matrix is oriented with metabolites in columns before exiting.
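
To make the expected behavior concrete, here is a minimal base-R sketch of the kind of orientation check I have in mind. The helper name orient_samples_in_rows and the toy matrix are hypothetical and are not part of MetaboAnalystR; the sketch assumes the sample IDs are known up front and appear on exactly one axis of the matrix.

```r
# Hypothetical helper (NOT a MetaboAnalystR function): return `mat` with
# samples in rows and features in columns, transposing only when needed.
orient_samples_in_rows <- function(mat, sample_ids) {
  in_rows <- all(sample_ids %in% rownames(mat))
  in_cols <- all(sample_ids %in% colnames(mat))
  if (in_rows && !in_cols) return(mat)      # already samples x features
  if (in_cols && !in_rows) return(t(mat))   # features x samples: transpose
  stop("Cannot determine orientation: sample IDs must match exactly one axis")
}

# Toy data: 3 metabolites in rows, 4 samples in columns.
toy <- matrix(seq_len(12), nrow = 3,
              dimnames = list(c("m1", "m2", "m3"),
                              c("s1", "s2", "s3", "s4")))
sample_ids <- c("s1", "s2", "s3", "s4")

fixed <- orient_samples_in_rows(toy, sample_ids)
dim(fixed)  # 4 3: samples are now in rows
```

Calling the helper again on the result leaves it unchanged, so a check like this could run before every analysis step without side effects.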

Actual Behavior:

  • Match.Pattern throws: Error in cor.test.default(...) : not enough finite observations
  • If assigned back to the object (mSet <- Match.Pattern(...)), the object is corrupted into an atomic vector: Error in mSetObj$dataSet : $ operator is invalid for atomic vectors.
  • FC.Anal(..., "group_paired") returns NaN for all features.
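
Until this is resolved, one way to avoid losing the whole object when a step fails is to guard the assignment. The sketch below is base R only; guarded_step is a hypothetical wrapper, and it assumes that mSet is a list and that a failed step either throws an error or returns something that is not a list (which is what the atomic-vector error above suggests).

```r
# Hypothetical wrapper (NOT part of MetaboAnalystR): run `step(obj, ...)` and
# keep the previous object if the step errors or returns a non-list value.
guarded_step <- function(obj, step, ...) {
  result <- tryCatch(
    step(obj, ...),
    error = function(e) {
      message("Step failed, keeping previous object: ", conditionMessage(e))
      NULL
    }
  )
  if (is.list(result)) result else obj
}

# Stand-in object and steps for illustration only.
obj <- list(dataSet = list(norm = matrix(1:4, nrow = 2)))
returns_scalar <- function(x) 0                      # mimics a silent failure
throws_error   <- function(x) stop("not enough finite observations")
works          <- function(x) { x$done <- TRUE; x }  # mimics a successful step

obj <- guarded_step(obj, returns_scalar)  # obj is unchanged
obj <- guarded_step(obj, throws_error)    # obj is unchanged, message printed
obj <- guarded_step(obj, works)           # obj now carries the update
```

In the workflow below, the fold-change call would become mSet <- guarded_step(mSet, FC.Anal, 2.0, 0, TRUE), so a failure leaves the existing mSet intact instead of forcing a restart.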

Environment:

  • R version: 4.5.2
  • RStudio: 2026.01.1 Build 392
  • MetaboAnalystR version: 4.2.0
  • OS: Windows 11 Pro

The Data

  • 12 samples (6 treated, 6 control)
  • Labels: treated as "1" and non-treated as "-1"
  • Analysis: Statistics [one factor]
  • Measure: Peak Intensities
  • Note: Using the example lcms_table.csv provided under Data Formats fails SanityCheckData(). Reusing the first two rows of lcms_table.csv with otherwise viable data also fails SanityCheckData():

mSet<-SanityCheckData(mSet)

Warning in SanityCheckData(mSet) : NAs introduced by coercion
Warning in min(table(cls.lbl)) :
no non-missing arguments to min; returning Inf
Error in orig.data[ord.inx, , drop = FALSE] :
incorrect number of dimensions

Steps to Reproduce:

  1. Load a CSV with metabolites in rows and 12 samples in paired columns using Read.TextData(..., format="colp", ...).
mSet<-InitDataObjects("pktable", "stat", TRUE, 150)
mSet<-Read.TextData(mSet, "data/some_data_name.csv", "colp", "disc")

print(mSet[["msgSet"]][["read.msg"]])

Starting Rserve...
"C:...\renv\library\windows\R-4.5\X86_64~1\Rserve\libs\x64\Rserve.exe" --no-save
Warning in Cairo::CairoFonts(regular = "Arial:style=Regular", bold = "Arial:style=Bold", :
CairoFonts() has no effect on Windows. Please use par(family="...") to specify the desired font - see ?par.
[1] "MetaboAnalyst R objects initialized ..."
[1] "Samples are in columns and features in rows."
[2] "The uploaded file is in comma separated values (.csv) format."
[3] "The uploaded data file contains 12 (samples) by 155 (peaks(mz/rt)) data matrix."

  2. Run SanityCheckData() and Normalization().
# Normalizing 
mSet<-SanityCheckData(mSet)
mSet<-PreparePrenormData(mSet)
mSet<-Normalization(mSet, "NULL", "LogNorm", "AutoNorm", ratio=FALSE, ratioNum=20)
mSet<-PlotNormSummary(mSet, "norm_0_", "png", 150, width=NA)
mSet<-PlotSampleNormSummary(mSet, "snorm_0_", "png", 150, width=NA)

[1] "Successfully passed sanity check!"
[2] "Samples are paired."
[3] "2 groups were detected in samples."
[4] "Only English letters, numbers, underscore, hyphen and forward slash (/) are allowed."
[5] "<font color="orange">Other special characters or punctuations (if any) will be stripped off."
[6] "All data values are numeric."
[7] ""
[8] "No missing values were detected. Click the Proceed button to the next step."

  3. Observe that colnames(mSet$dataSet$norm) still lists metabolites rather than sample IDs. (This check is not part of the Rhistory.R downloaded from the web module.)
colnames(mSet$dataSet$norm)

[1] "335.01094/0.771" "268.15463/0.811" "348.31086/0.813" "324.27524/0.815" "228.1498/0.817" "336.31093/0.824" "263.1491/0.827" "159.06538/0.831"
[9] "236.11314/0.832" "191.16433/0.855" "113.0963/0.863" "228.19613/0.864" "273.25408/0.864" "145.12248/0.869" "348.08691/0.872" "216.01936/0.88"

  4. Then run FC.Anal(mSet, 2.0, 0, TRUE). Every fold change is returned as NaN.
mSet<-FC.Anal(mSet,2.0,0,TRUE)
mSet<-PlotFC(mSet,"fc_0_","png",72, width = NA)
mSet$analSet$fc$fc.log

Warning in min(x) : no non-missing arguments to min; returning Inf
Warning in max(x) : no non-missing arguments to max; returning -Inf
335.01094/0.771 268.15463/0.811 348.31086/0.813 324.27524/0.815  228.1498/0.817 336.31093/0.824  263.1491/0.827 159.06538/0.831
            NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN
236.11314/0.832 191.16433/0.855  113.0963/0.863 228.19613/0.864 273.25408/0.864 145.12248/0.869 348.08691/0.872  216.01936/0.88
            NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN
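
For comparison, this is roughly what I would expect a paired fold-change calculation to produce, written in base R on toy numbers. It is only a sketch under my own assumptions: samples in rows, treated samples first, row i of the treated block paired with row i of the control block, and fold change defined as the mean of the per-pair log2 ratios. MetaboAnalystR's own pairing rules and fold-change definition may differ.

```r
# Toy intensities: 4 samples in rows (2 treated, 2 control), 2 metabolites.
intens <- matrix(c(8, 16, 4, 4,    # metabolite m1 across t1, t2, c1, c2
                   2,  4, 2, 4),   # metabolite m2 across t1, t2, c1, c2
                 nrow = 4,
                 dimnames = list(c("t1", "t2", "c1", "c2"), c("m1", "m2")))

treated <- intens[c("t1", "t2"), , drop = FALSE]
control <- intens[c("c1", "c2"), , drop = FALSE]  # c1 pairs with t1, c2 with t2

# Per-pair log2 ratio for every metabolite, then the mean over pairs.
log2_ratio   <- log2(treated / control)
mean_log2_fc <- colMeans(log2_ratio)
mean_log2_fc  # m1 = 1.5, m2 = 0
```

With finite, positive intensities and correctly matched pairs, none of these values can be NaN, which is why the all-NaN result above looks like a data-layout problem rather than a property of the data.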
