After a year of avoiding the R package because of dependency issues and the inability to run the R code produced by the MetaboAnalyst online modules, I'm slowly piecing together the differences between the online version and the R package. The online module works with the same file and lets me complete my analysis, but it is time-consuming.
There appears to be a consistency issue between data-loading and the statistical analysis when using paired columns (colp). While Read.TextData(..., format="colp") correctly identifies metabolites in rows and samples in paired columns, subsequent functions like Match.Pattern and FC.Anal (specifically group_paired) expect the row format (Samples as Rows, Metabolites as Columns). In the end, it could be as simple as "just transpose your data in the beginning" but why even give us the option?
The package does not automatically transpose the mSet$dataSet$norm matrix after normalization when importing in colp mode, leading to errors and mSet corruption. When a function fails, we have to clean the slate and start the workflow over again.
Expected Behavior: The functions should either automatically detect the colp format and transpose the data internally, or the Normalization function should ensure the matrix is oriented with metabolites in columns before exiting.
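To make the orientation question concrete, here is a minimal base-R sketch of the kind of check I have in mind. The helpers check_orientation and samples_to_rows are hypothetical names of my own, not MetaboAnalystR functions, and the toy matrix stands in for mSet$dataSet$norm. It assumes the sample IDs are known up front (for example from the CSV header).

```r
# Hypothetical helper (not part of MetaboAnalystR): report how a matrix is
# oriented by comparing its dimnames against a vector of known sample IDs.
check_orientation <- function(mat, sample_ids) {
  in_rows <- !is.null(rownames(mat)) && all(sample_ids %in% rownames(mat))
  in_cols <- !is.null(colnames(mat)) && all(sample_ids %in% colnames(mat))
  if (in_rows && !in_cols) return("samples_in_rows")
  if (in_cols && !in_rows) return("samples_in_columns")
  "unknown"
}

# Return the matrix with samples in rows, transposing only when needed.
samples_to_rows <- function(mat, sample_ids) {
  switch(check_orientation(mat, sample_ids),
         samples_in_rows = mat,
         samples_in_columns = t(mat),
         stop("could not determine orientation from dimnames"))
}

# Toy data (made up): 3 features x 4 samples, i.e. samples in columns.
toy <- matrix(1:12, nrow = 3,
              dimnames = list(c("m1", "m2", "m3"), c("s1", "s2", "s3", "s4")))
ids <- c("s1", "s2", "s3", "s4")
fixed <- samples_to_rows(toy, ids)
```

On the real object, calling check_orientation(mSet$dataSet$norm, ids) right after Normalization() would show which orientation the downstream functions actually receive, without modifying anything.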
Actual Behavior:
- Match.Pattern throws: Error in cor.test.default(...) : not enough finite observations
- If assigned back to the object (mSet <- Match.Pattern(...)), the object is corrupted into an atomic vector: Error in mSetObj$dataSet : $ operator is invalid for atomic vectors.
- FC.Anal(..., "group_paired") returns NaN for all features.
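As a stopgap against having to restart the workflow, a guard along these lines might help. This is only a sketch: safe_step is a hypothetical wrapper of my own, and it assumes a failed step either raises an error or returns a non-list value, which is what the atomic-vector error above suggests. It cannot undo any state a function may have changed outside the object it was given.

```r
# Hypothetical wrapper (not part of MetaboAnalystR): run one workflow step and
# keep the previous object if the step errors or returns a non-list value.
safe_step <- function(obj, step, ...) {
  result <- tryCatch(step(obj, ...), error = function(e) {
    message("step failed: ", conditionMessage(e))
    NULL
  })
  if (is.list(result)) result else obj
}

# Toy stand-ins for an mSet-like object and two misbehaving steps.
toy <- list(dataSet = list(norm = matrix(1:4, 2)))
bad_return <- function(x) 0  # returns an atomic vector instead of the object
bad_error  <- function(x) stop("not enough finite observations")

after_bad_return <- safe_step(toy, bad_return)
after_bad_error  <- safe_step(toy, bad_error)
```

In both toy cases the original list comes back unchanged, so a later call to $dataSet still works. With the package, the same pattern would wrap the failing call, passing Match.Pattern and its arguments through safe_step instead of assigning the result directly.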
Environment:
- R version: 4.5.2
- RStudio: 2026.01.1 Build 392
- MetaboAnalystR version: 4.2.0
- OS: Windows 11 Pro
The Data
- 12 samples (6 treated, 6 control)
- Labels: treated as "1" and non-treated as "-1"
- Analysis: Statistics [one factor]
- Measure: Peak Intensities
- Note: Using the example lcms_table.csv provided under Data Formats fails SanityCheckData(). Using the first two rows of lcms_table.csv on viable data also fails SanityCheckData():
mSet<-SanityCheckData(mSet)
Warning in SanityCheckData(mSet) : NAs introduced by coercion
Warning in min(table(cls.lbl)) :
no non-missing arguments to min; returning Inf
Error in orig.data[ord.inx, , drop = FALSE] :
incorrect number of dimensions
Steps to Reproduce:
- Load a CSV with metabolites in rows and 12 samples in paired columns using Read.TextData(..., format="colp", ...).
mSet<-InitDataObjects("pktable", "stat", TRUE, 150)
mSet<-Read.TextData(mSet, "data/some_data_name.csv", "colp", "disc")
print(mSet[["msgSet"]][["read.msg"]])
Starting Rserve...
"C:...\renv\library\windows\R-4.5\X86_64~1\Rserve\libs\x64\Rserve.exe" --no-save
Warning in Cairo::CairoFonts(regular = "Arial:style=Regular", bold = "Arial:style=Bold", :
CairoFonts() has no effect on Windows. Please use par(family="...") to specify the desired font - see ?par.
[1] "MetaboAnalyst R objects initialized ..."
[1] "Samples are in columns and features in rows."
[2] "The uploaded file is in comma separated values (.csv) format."
[3] "The uploaded data file contains 12 (samples) by 155 (peaks(mz/rt)) data matrix."
- Run SanityCheckData() and Normalization().
# Normalizing
mSet<-SanityCheckData(mSet)
mSet<-PreparePrenormData(mSet)
mSet<-Normalization(mSet, "NULL", "LogNorm", "AutoNorm", ratio=FALSE, ratioNum=20)
mSet<-PlotNormSummary(mSet, "norm_0_", "png", 150, width=NA)
mSet<-PlotSampleNormSummary(mSet, "snorm_0_", "png", 150, width=NA)
[1] "Successfully passed sanity check!"
[2] "Samples are paired."
[3] "2 groups were detected in samples."
[4] "Only English letters, numbers, underscore, hyphen and forward slash (/) are allowed."
[5] "<font color="orange">Other special characters or punctuations (if any) will be stripped off."
[6] "All data values are numeric."
[7] ""
[8] "No missing values were detected. Click the Proceed button to the next step."
- Observe that colnames(mSet$dataSet$norm) still lists metabolites rather than sample IDs. (This line is not included in the Rhistory.R downloaded from the web module.)
colnames(mSet$dataSet$norm)
[1] "335.01094/0.771" "268.15463/0.811" "348.31086/0.813" "324.27524/0.815" "228.1498/0.817" "336.31093/0.824" "263.1491/0.827" "159.06538/0.831"
[9] "236.11314/0.832" "191.16433/0.855" "113.0963/0.863" "228.19613/0.864" "273.25408/0.864" "145.12248/0.869" "348.08691/0.872" "216.01936/0.88"
- As a result, FC.Anal(mSet,2.0,0,TRUE) returns NaN for every feature:
mSet<-FC.Anal(mSet,2.0,0,TRUE)
mSet<-PlotFC(mSet,"fc_0_","png",72, width = NA)
mSet$analSet$fc$fc.log
Warning in min(x) : no non-missing arguments to min; returning Inf
Warning in max(x) : no non-missing arguments to max; returning -Inf
335.01094/0.771 268.15463/0.811 348.31086/0.813 324.27524/0.815 228.1498/0.817 336.31093/0.824 263.1491/0.827 159.06538/0.831
NaN NaN NaN NaN NaN NaN NaN NaN
236.11314/0.832 191.16433/0.855 113.0963/0.863 228.19613/0.864 273.25408/0.864 145.12248/0.869 348.08691/0.872 216.01936/0.88
NaN NaN NaN NaN NaN NaN NaN NaN
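For comparison, this is roughly what I would expect a paired fold change to look like when computed by hand in base R. It is a sketch under stated assumptions, not a claim about how FC.Anal works internally: paired_log2_fc is a hypothetical helper, the intensities are made up, the matrix has samples in rows, and the i-th treated sample is assumed to pair with the i-th control sample. It uses one common definition, the mean of the per-pair log2 ratios.

```r
# Hypothetical helper: mean of per-pair log2(treated / control) for each feature.
# Assumes samples in rows, features in columns, and strictly positive values.
paired_log2_fc <- function(x, treated, control) {
  stopifnot(length(treated) == length(control))
  colMeans(log2(x[treated, , drop = FALSE]) - log2(x[control, , drop = FALSE]))
}

# Toy intensities (made up): 6 samples x 2 features.
# Rows s1-s3 are treated, s4-s6 are their paired controls.
x <- matrix(c(8, 4, 2, 2, 2, 2,
              1, 1, 1, 4, 4, 4), nrow = 6,
            dimnames = list(paste0("s", 1:6), c("f1", "f2")))

fc <- paired_log2_fc(x, treated = 1:3, control = 4:6)
# f1: mean(log2(8/2), log2(4/2), log2(2/2)) = 1
# f2: mean(log2(1/4), log2(1/4), log2(1/4)) = -2
```

Finite values like these are what I expected from the real data, since the sanity check reports no missing values and all values as numeric.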