⚠️ CONFIDENTIAL - UNPUBLISHED PACKAGE
This package is currently under development and has not been published yet. All content, code, and documentation are confidential and proprietary. Please do not distribute or share without explicit permission.
MultiConnector is an R package for functional clustering analysis of multi-dimensional time series data. It implements the James & Sugar (2003) functional clustering model to identify distinct patterns in longitudinal data, making it particularly useful for biomedical research, growth studies, and longitudinal data analysis where curves need to be grouped based on their shape and temporal behavior.
- Functional Clustering: Advanced clustering based on curve shapes rather than individual time points
- Multi-dimensional Support: Handle multiple measurements per subject simultaneously
- Spline-based Modeling: Natural cubic splines capture complex curve patterns
- Comprehensive Validation: Built-in quality metrics including silhouette analysis and entropy measures
- Rich Visualizations: Specialized plots for cluster exploration, validation, and interpretation
- Parallel Processing: Speed up analysis with multi-core support
- Flexible Data Input: Support for Excel, CSV files, and R tibbles
# Install devtools if you haven't already
install.packages("devtools")
# Install MultiConnector from GitHub
devtools::install_github("qBioTurin/MultiConnector")
MultiConnector requires R ≥ 4.0.0 and several packages that will be automatically installed:
# Core dependencies ("splines" and "parallel" ship with R and need no installation)
packages <- c("dplyr", "ggplot2", "Matrix",
              "readxl", "readr", "tibble", "magrittr", "patchwork")
install.packages(packages)
if (!require(devtools)) {
  install.packages('devtools')
}
devtools::install_github('erocoar/gghalves')
Your time series data should contain:
- subjID: Subject/sample identifier
- time: Time points
- measureID: Measurement type identifier (for multi-dimensional data)
- value: Observed values
Your annotations should contain:
- subjID: Subject identifier (matching time series)
- Additional feature columns (e.g., treatment, gender, outcome)
- Excel files (.xlsx, .xls)
- CSV/text files (.csv, .txt)
- R tibbles (for programmatic use)
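As a rough illustration of the layout described above, the following sketch builds a small long-format time series table and a matching annotation table in base R. The subject names, measure names, values, and the `treatment` column are invented for the example; only the column names `subjID`, `measureID`, `time`, and `value` come from the format described here.

```r
# Toy long-format time series: one row per (subject, measure, time point).
# All identifiers and values below are invented for illustration.
time_series <- data.frame(
  subjID    = rep(c("S1", "S2"), each = 6),
  measureID = rep(rep(c("measure1", "measure2"), each = 3), times = 2),
  time      = rep(c(0, 1, 2), times = 4),
  value     = c(1.0, 1.4, 2.1,  5.0, 4.8, 4.1,
                0.9, 1.1, 1.2,  5.2, 5.9, 6.4),
  stringsAsFactors = FALSE
)

# Toy annotations: one row per subject, keyed by the same subjID.
annotations <- data.frame(
  subjID    = c("S1", "S2"),
  treatment = c("Control", "Treated"),
  stringsAsFactors = FALSE
)

# Basic consistency checks: every subject with curves has an annotation
# row, and no (subject, measure, time) combination appears twice.
all_annotated <- all(unique(time_series$subjID) %in% annotations$subjID)
no_duplicates <- !any(duplicated(time_series[, c("subjID", "measureID", "time")]))
```

Both data frames can be converted with `tibble::as_tibble()` if tibbles are preferred.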
Understanding the key terms and notation used throughout MultiConnector:
| Term | Definition |
|---|---|
| Measure | A given dimension of the functional data (identified by measureID) |
| Observation | The discrete sampled curve of a specific measure |
| Subject | The connection of the samples among the different measures (identified by subjID) |
| Time | The time point of a single observation characterizing the set of points in the sample |
| Functional Data | The complete set of multiple measures for the same subject (subjID) |
| Feature | A characteristic or attribute associated with each subject (subjID) |
| Annotations | The complete set of features associated with each subject |
| p | Spline dimension - determines the complexity of curve fitting |
| G | Number of clusters to identify in the data |
| h | Latent factor dimension - reduces dimensionality in clustering space |
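To give a feel for what the spline dimension p controls, the sketch below fits natural cubic spline bases of increasing size to one simulated curve using `splines::ns()`, which ships with R. It only illustrates the idea: the simulated data are invented, and treating the `df` argument of `ns()` as the counterpart of p is an assumption made for this example, not a statement about how MultiConnector builds its basis internally.

```r
library(splines)

set.seed(1)
time  <- seq(0, 10, length.out = 40)
value <- sin(time / 2) + rnorm(length(time), sd = 0.1)  # simulated curve

# Fit the same curve with bases of different dimension and record
# the residual sum of squares of each least-squares fit.
p_values <- 2:6
rss <- sapply(p_values, function(p) {
  basis <- ns(time, df = p)   # natural cubic spline basis with p columns
  fit   <- lm(value ~ basis)
  sum(residuals(fit)^2)
})

# Number of basis functions actually produced for each p.
basis_columns <- sapply(p_values, function(p) ncol(ns(time, df = p)))

data.frame(p = p_values, basis_columns = basis_columns, rss = rss)
```

A larger basis is more flexible and typically lowers the residual error, at the risk of fitting noise; choosing p is a matter of balancing the two.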
Subject (subjID) ──┬── Measure 1 (measureID) ──── Observations (time, value)
                   ├── Measure 2 (measureID) ──── Observations (time, value)
                   ├── ...
                   └── Features (annotations)
Example:
- Subject: Patient_001
- Measures: Blood pressure, Heart rate, Temperature
- Observations: Time series of values for each measure
- Features: Age, Gender, Treatment group
- Functional Data: All measures combined for Patient_001
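The hierarchy in this example can be mirrored with a few lines of base R. The sketch below uses invented records for two hypothetical patients and shows how the functional data of one subject is the collection of its measures, and how observations can be counted per subject and measure.

```r
# Invented long-format records for two subjects and two measures.
records <- data.frame(
  subjID    = c("Patient_001", "Patient_001", "Patient_001",
                "Patient_002", "Patient_002"),
  measureID = c("HeartRate", "HeartRate", "Temperature",
                "HeartRate", "Temperature"),
  time      = c(0, 1, 0, 0, 0),
  value     = c(72, 75, 36.6, 80, 37.1),
  stringsAsFactors = FALSE
)

# Functional data for one subject: all of its measures, split by measureID.
patient_001 <- subset(records, subjID == "Patient_001")
by_measure  <- split(patient_001[, c("time", "value")], patient_001$measureID)

# Number of observations per subject and measure.
n_obs <- table(records$subjID, records$measureID)
```

Here `by_measure` is a named list with one data frame of observations per measure, and `n_obs` is a subject-by-measure count table.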
| Function | Purpose |
|---|---|
| ConnectorData() | Import and prepare data for analysis |
| estimatepDimension() | Determine optimal spline dimensions |
| estimateCluster() | Perform clustering analysis |
| selectCluster() | Choose optimal clustering configuration |
| validateCluster() | Assess clustering quality |
| plot() | Intelligent plotting dispatch |
| DiscriminantPlot() | Discriminant analysis visualization |
| splinePlot() | Spline-based curve visualization |
| SubjectInfo() | Detailed subject analysis with cluster highlighting |
| clusterDistribution() | Cross-tabulation of single or multiple features vs clusters |
| generateReport() | Comprehensive analysis report generation |
| setClusterNames() | Assign custom names to clusters |
| getClusterNames() | Retrieve current cluster names |
MultiConnector is built around two main S4 classes that provide a structured object-oriented approach to functional clustering:
The CONNECTORData class represents preprocessed time series data ready for clustering analysis.
Slots:
- @curves: Tibble containing time series data (subjID, measureID, time, value)
- @dimensions: Tibble with observation counts per sample
- @annotations: Tibble with subject annotations and features
- @TimeGrids: List of time grids for each measurement type
Key Methods:
- ConnectorData(): Constructor method to create objects from files or tibbles
- plot(): Visualize time series data, dispatches to PlotTimeSeries()
- show(): Display basic object summary
- getAnnotations(): Extract available feature names
# Create CONNECTORData object
data <- ConnectorData("timeseries.xlsx", "annotations.csv")
# Access slots
head(data@curves) # Time series data
data@annotations # Feature annotations
names(data@TimeGrids) # Available measurements
# Use methods
plot(data, feature = "treatment")
show(data)
getAnnotations(data) # Lists available features
The CONNECTORDataClustered class represents the results of clustering analysis with cluster assignments and parameters.
Slots:
- @TTandfDBandSil: Tibble with quality metrics (TT, fDB, Silhouette, G)
- @CfitandParameters: List containing clustering fit and estimated parameters
- @h: Latent factor dimension used in clustering
- @freq: Frequency of the clustering configuration
- @cluster.names: Character vector of cluster labels (e.g., "A", "B", "C")
- @KData: List containing original data and preprocessing results
Key Methods:
- plot(): Visualize clustering results, dispatches to ClusterPlot()
- DiscriminantPlot(): Create discriminant analysis plots for cluster interpretation
- validateCluster(): Compute and plot clustering quality metrics (returns plot, entropy_silhouette_table, assignmentProbs)
- splinePlot(): Visualize cluster-specific spline representations
- MaximumDiscriminationFunction(): Show optimal discrimination weights
- getAnnotations(): Extract features with cluster assignments
- getClusters(): Extract subjID with cluster assignments
- setClusterNames(): Assign custom names to clusters (used in plots and tables)
- getClusterNames(): Retrieve current cluster names
- SubjectInfo(): Get detailed information about specific subjects
- clusterDistribution(): Analyze feature distribution across clusters
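validateCluster() is listed above as returning an entropy and silhouette table together with assignment probabilities. For intuition on the entropy part, here is a minimal sketch that assumes the measure is the Shannon entropy of each subject's cluster membership probabilities; the package's exact definition may differ, and the probability matrix below is invented.

```r
# Hypothetical membership probabilities: rows = subjects, columns = clusters.
probs <- rbind(
  s1 = c(0.98, 0.01, 0.01),  # confidently assigned
  s2 = c(0.40, 0.35, 0.25),  # ambiguous
  s3 = c(1.00, 0.00, 0.00)   # certain
)

# Shannon entropy per row, treating 0 * log(0) as 0.
row_entropy <- function(p) {
  terms <- ifelse(p > 0, p * log(p), 0)
  -rowSums(terms)
}

entropy <- row_entropy(probs)

# Hard assignment: the cluster with the highest probability in each row.
hard_assignment <- max.col(probs, ties.method = "first")
```

Under this assumption, an entropy near zero marks a subject that sits clearly in one cluster, while values approaching `log(G)` mark subjects spread almost evenly across the G clusters.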
# Create clustered object (from estimateCluster results)
clustered_data <- selectCluster(cluster_results, G = 3, best = "MinfDB")
# Access slots
clustered_data@cluster.names # Cluster labels
clustered_data@TTandfDBandSil # Quality metrics
clustered_data@CfitandParameters$pred$class.pred # Cluster assignments
# Set custom cluster names (will be used in all plots and outputs)
clustered_data <- setClusterNames(clustered_data, c("Low", "Medium", "High"))
getClusterNames(clustered_data) # Returns: "Low", "Medium", "High"
# Use specialized methods
plot(clustered_data, feature = "treatment") # Cluster visualization
DiscriminantPlot(clustered_data) # Discriminant analysis
validateCluster(clustered_data) # Quality assessment
splinePlot(clustered_data) # Spline-based plots
getAnnotations(clustered_data) # Features + clusters
# NEW: Advanced analysis methods
SubjectInfo(clustered_data, "subject_123") # Single subject analysis
SubjectInfo(clustered_data, c("s1", "s2")) # Multiple subjects
clusterDistribution(clustered_data, "treatment") # Feature distribution table
report <- generateReport(clustered_data = clustered_data) # Comprehensive report
The package uses S4 method dispatch to provide intelligent function behavior based on object class:
| Method | CONNECTORData | CONNECTORDataClustered |
|---|---|---|
| plot() | Time series plots via PlotTimeSeries() | Cluster plots via ClusterPlot() |
| getAnnotations() | Lists available feature names | Shows features with cluster assignments |
| summary() | Data summary statistics | (inherited from base) |
This design ensures that the same function name (plot(), getAnnotations()) automatically does the right thing based on whether you're working with raw data or clustering results.
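The dispatch mechanism itself can be sketched with two toy S4 classes. The class names `ToyData` and `ToyDataClustered` and the generic `describe()` are hypothetical stand-ins chosen for this example; they are not part of MultiConnector and say nothing about how its own classes are defined.

```r
library(methods)

# Toy classes standing in for "raw" and "clustered" data objects.
setClass("ToyData", representation(values = "numeric"))
setClass("ToyDataClustered",
         representation(values = "numeric", cluster = "integer"))

# One generic, two methods: S4 selects the method from the object's class.
setGeneric("describe", function(object) standardGeneric("describe"))

setMethod("describe", "ToyData", function(object) {
  sprintf("raw data with %d values", length(object@values))
})

setMethod("describe", "ToyDataClustered", function(object) {
  sprintf("clustered data with %d clusters", length(unique(object@cluster)))
})

raw       <- new("ToyData", values = c(1.2, 3.4, 5.6))
clustered <- new("ToyDataClustered", values = c(1.2, 3.4, 5.6),
                 cluster = c(1L, 1L, 2L))

describe(raw)        # uses the ToyData method
describe(clustered)  # uses the ToyDataClustered method
```

The caller writes the same function name in both cases, and the class of the argument decides which implementation runs.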
library(MultiConnector)
library(dplyr)
# 1. Load your data
data <- ConnectorData("timeseries.xlsx", "annotations.csv")
# 2. Explore the data
plot(data)
plot(data, feature = "treatment_group")
# 3. Estimate optimal spline dimensions
dimension_results <- estimatepDimension(data, p = 2:8, cores = 2)
# 4. Perform clustering analysis
cluster_results <- estimateCluster(
  data,
  G = 2:5, # Test 2-5 clusters
  p = 4, # Spline dimension from step 3
  runs = 50, # Multiple runs for stability
  cores = 4 # Parallel processing
)
# 5. Select optimal configuration
plot(cluster_results) # choose G based on the plot
final_clusters <- selectCluster(cluster_results, G = 3, best = "MinfDB")
# 6. Visualize results
plot(final_clusters)
plot(final_clusters, feature = "treatment_group")
# 7. Validate clustering quality
validation <- validateCluster(final_clusters)
print(validation$plot)
print(validation$entropy_silhouette_table) # Per-curve quality metrics
print(validation$assignmentProbs) # Cluster membership probabilities
# Load example ovarian cancer data
time_series_path <- system.file("Data/OvarianCancer/Ovarian_TimeSeries.xlsx", package = "MultiConnector")
annotations_path <- system.file("Data/OvarianCancer/Ovarian_Annotations.txt", package = "MultiConnector")
# Create data object
ovarian_data <- ConnectorData(time_series_path, annotations_path)
# Quick visualization
plot(ovarian_data, feature = "Progeny")
# Full analysis pipeline
results <- estimateCluster(ovarian_data, G = 2:4, p = 3, runs = 20)
best_model <- selectCluster(results, G = 3, best = "MinfDB")
validation <- validateCluster(best_model)
The SubjectInfo() function provides detailed analysis of specific subjects within their clustering context:
# Single subject analysis
subject_analysis <- SubjectInfo(clustered_data, subjIDs = "patient_001")
# Multiple subjects comparison
multi_analysis <- SubjectInfo(clustered_data,
                              subjIDs = c("patient_001", "patient_045", "patient_123"))
# With feature-based coloring
feature_analysis <- SubjectInfo(clustered_data,
                                subjIDs = "patient_001",
                                feature = "treatment",
                                feature_type = "discrete")
# Access results
subject_analysis$cluster_assignments # "Subject patient_001 belongs to Cluster 2"
subject_analysis$highlighted_plot # Plot with subject highlighted
subject_analysis$quality_metrics # Silhouette/entropy table
subject_analysis$subjects_data # Subject's time series data
Analyze how different features are distributed across clusters:
# Single feature distribution
dist_table <- clusterDistribution(clustered_data, "treatment")
# Multiple features distribution (multi-dimensional analysis)
multi_dist <- clusterDistribution(clustered_data, c("treatment", "age_group"))
# With totals
detailed_table <- clusterDistribution(clustered_data, "age_group",
                                      include_totals = TRUE)
# View results
print(dist_table)
# treatment cluster1 cluster2 cluster3 Total
# Control 25 15 10 50
# Treatment 20 30 25 75
# TOTAL 45 45 35 125
# Multi-feature example output
print(multi_dist)
# treatment age_group cluster1 cluster2 cluster3 Total
# Control Young 10 5 3 18
# Control Old 15 10 7 32
# Treatment Young 8 15 10 33
# Treatment Old 12 15 15 42
# TOTAL TOTAL 45 45 35 125
# Check table metadata
attr(dist_table, "total_subjects") # Total number of subjects
attr(dist_table, "missing_values") # Count of missing values per feature
attr(dist_table, "n_complete_cases") # Subjects with complete data
Generate complete analysis reports including all plots and tables:
# Basic report
report <- generateReport(clustered_data = clustered_data)
# Advanced report with features
comprehensive_report <- generateReport(
  data = original_data, # Include dimension analysis
  clustered_data = clustered_data, # Include clustering results
  report_title = "Clinical Trial Analysis",
  features = c("treatment", "age", "gender"),
  include_dimension_analysis = TRUE,
  include_cluster_analysis = TRUE
)
# View report summary
printReportSummary(comprehensive_report)
# Access specific elements
comprehensive_report$plots$cluster_plot_basic # Basic cluster plot
comprehensive_report$plots$dimension_analysis # Dimension selection plot
comprehensive_report$tables$cluster_assignments # Cluster size table
comprehensive_report$tables$quality_metrics # Silhouette/entropy metrics
# Feature-specific plots
comprehensive_report$plots$cluster_plots_by_feature$treatment
comprehensive_report$plots$cluster_plots_by_feature$age
# Advanced clustering with custom parameters
advanced_results <- estimateCluster(
  data = my_data,
  G = 2:6, # Cluster range
  p = c("measure1" = 4, # Different spline dimensions per measure
        "measure2" = 3),
  h = 2, # Latent factor dimension
  runs = 100, # More runs for stability
  cores = 8, # Parallel processing
  seed = 2024 # Reproducibility
)
# Classify new data using existing model
new_classifications <- ClassificationCurves(
  newdata = new_connector_data,
  CONNECTORDataClustered = final_clusters,
  cores = 4
)
# Multiple visualization options
plot(clustered_data) # Basic cluster plot
DiscriminantPlot(clustered_data, feature = "treatment") # Discriminant analysis
splinePlot(clustered_data) # Spline visualization
MaximumDiscriminationFunction(clustered_data) # Discrimination weights
- demo/DemoOvarianCancer.R: Complete one-dimensional clustering analysis
- demo/DemoMCL.R: Two-dimensional clustering example
- demo/DemoEmoCovid.R: COVID-19 emotional data analysis
Access detailed tutorials with:
browseVignettes("MultiConnector")
# Function-specific help
?ConnectorData
?estimateCluster
?selectCluster
# Package overview
help(package = "MultiConnector")
If you use MultiConnector in your research, please cite:
Citation details coming soon.
- Pernice Simone - Developer
- Sirovich Roberta - Developer
- Frattarola Marco - Developer
This project is licensed under the GPL-3 License - see the LICENSE file for details.
Explore comprehensive HTML tutorials with interactive examples and detailed explanations:
Complete Guide to One-Dimensional Functional Clustering
- Title: MultiConnector: Complete Guide to One-Dimensional Functional Clustering
- Subtitle: Step-by-Step Analysis of Longitudinal Data
- Dataset: Ovarian cancer cell growth data analysis
Complete Guide to Two-Dimensional Functional Clustering
- Title: MultiConnector: Complete Guide to Two-Dimensional Functional Clustering
- Subtitle: Step-by-Step Analysis of Longitudinal Data
- Dataset: MCL MRD data
Method 1: Direct File Access (Recommended)
Navigate to the vignettes/ folder in the GitHub repository and open the HTML files in your browser:
# Clone or download the repository, then open in browser:
# MultiConnector/vignettes/OneD_Clustering_Guide.html
or
# Open in browser:
browseVignettes("MultiConnector")
Method 2: R Script Demos
Run the comprehensive demo scripts available in the demo/ folder:
# After installing the package, run example demos
demo("DemoOvarianCancer", package = "MultiConnector") # Complete 1D analysis
Method 3: GitHub Pages (Future)
Interactive tutorials will be available online once the package is fully published.
- cluster: Classical clustering methods
- mixtools: Mixture model clustering
- fda: Functional data analysis
- funclust: Alternative functional clustering approaches