PLS-DA Analysis for MS/MS IDA Data

Complete implementation of Partial Least Squares Discriminant Analysis (PLS-DA) for tandem mass spectrometry data analysis.

Overview

This repository provides a comprehensive, production-ready implementation of PLS-DA designed specifically for MS/MS (tandem mass spectrometry) data acquired by IDA (information-dependent acquisition). It is intended for metabolomics, proteomics, and biomarker discovery workflows.

Key Features

Complete PLS-DA implementation using NIPALS algorithm
Zero dependencies - uses only base R
Cross-validation for optimal model selection
Variable Importance (VIP) scores for biomarker discovery
7 publication-ready visualizations
4 detailed CSV exports for further analysis
Comprehensive documentation with examples
Example dataset included for immediate testing

Quick Start

Installation

# Clone the repository
git clone https://github.com/AyehBlk/PLSDA-MSMS-Analysis.git
cd PLSDA-MSMS-Analysis

Run the Example

# In R console
source("plsda_analysis.R")

# Or from command line
Rscript plsda_analysis.R

That's it! The script will:

Generate example MS/MS data (150 samples, 50 features, 3 classes)
Perform PLS-DA analysis with cross-validation
Create 7 plots and 4 data files
Display comprehensive results summary

Runtime: ~5-10 seconds for example data
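
The script generates its own example data, and its generator is not reproduced here. As a rough illustration of a dataset with the same shape (150 samples, 50 features, 3 classes), the base-R sketch below simulates log-normal intensities. The class names, the intensity distribution, the `Batch` values, and the choice of five discriminative features are assumptions made for illustration only.

```r
# Hypothetical simulation of MS/MS-like intensities; not the script's own generator.
set.seed(42)
n_per_class <- 50
n_features  <- 50
classes     <- c("Control", "TreatmentA", "TreatmentB")   # assumed class names

labels <- factor(rep(classes, each = n_per_class), levels = classes)

# Log-normal baseline intensities for every sample and feature
features <- matrix(rlnorm(length(labels) * n_features, meanlog = 7, sdlog = 0.5),
                   nrow = length(labels), ncol = n_features)
colnames(features) <- sprintf("mz_%.2f", seq(200, by = 50, length.out = n_features))

# Make the first 5 features discriminative by scaling them up in the treatment classes
features[labels == "TreatmentA", 1:5] <- features[labels == "TreatmentA", 1:5] * 1.5
features[labels == "TreatmentB", 1:5] <- features[labels == "TreatmentB", 1:5] * 2.5

# Assemble a table with the SampleID, Class, Batch, mz_* column layout
example_data <- data.frame(SampleID = sprintf("S%03d", seq_along(labels)),
                           Class    = labels,
                           Batch    = rep(1:3, length.out = length(labels)),
                           features,
                           check.names = FALSE)
dim(example_data)
```

Writing `example_data` out with `write.csv(example_data, "example.csv", row.names = FALSE)` would give a file in the layout described under "Using Your Own Data".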

Example Output

The analysis automatically generates:

Visualizations (PNG)

Scores Plot - Sample clustering and group separation
Test Predictions - Model validation on held-out data
VIP Scores - Top 20 most important features
Loadings Plot - Feature contribution to separation
CV Results - Model optimization curve
Confusion Matrix - Classification performance heatmap
Variance Explained - Component importance
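
The numbers behind a confusion-matrix heatmap can be computed with base R's `table()`. The sketch below uses toy labels rather than output from the script, and the variable names are illustrative.

```r
# Toy labels for illustration only; in practice these come from the held-out test set.
actual    <- factor(c("A", "A", "B", "B", "C", "C"), levels = c("A", "B", "C"))
predicted <- factor(c("A", "B", "B", "B", "C", "A"), levels = c("A", "B", "C"))

# Rows are true classes, columns are predicted classes
conf_mat <- table(Actual = actual, Predicted = predicted)

# Overall accuracy: correctly classified samples over all samples
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# Per-class recall: diagonal counts over row totals
recall <- diag(conf_mat) / rowSums(conf_mat)

print(conf_mat)
print(round(accuracy, 3))
print(recall)
```

Fixing the factor levels on both vectors keeps the table square even when a class is never predicted.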

Data Files (CSV)

Predictions - Sample classifications with scores
VIP Scores - All features ranked by importance
Model Summary - Performance metrics (accuracy, R²X, R²Y)
Loadings - Feature contributions to each component

Documentation

For Beginners

QUICK_START.md - Get running in 5 minutes
Sections 1-5 of README_FULL.md - Basic concepts and usage

For Advanced Users

README_FULL.md - Complete 2000+ line documentation including:
- Statistical theory and mathematical background
- Using your own data (step-by-step guide)
- Parameter tuning and optimization
- Troubleshooting common issues
- Best practices for publication
- Advanced topics (permutation tests, bootstrap, etc.)

Use Cases

Perfect for:

Metabolomics - Identify metabolite biomarkers between groups
Proteomics - Discover differentially expressed proteins
Lipidomics - Classify samples based on lipid profiles
Biomarker Discovery - Rank features by discriminative power
Quality Control - Validate analytical methods
Method Development - Optimize sample preparation protocols

Why This Implementation?

Advantages of PLS-DA for MS/MS Data

Feature	Benefit
Handles high dimensionality	Works with 1000s of m/z features
Manages multicollinearity	Tolerates correlated features, which are common in MS data
Supervised classification	Uses group labels for maximum separation
Interpretable results	VIP scores, loadings, and scores plots
Robust to noise	Dimensionality reduction filters noise
No external dependencies	Pure R implementation

Why Not Just Use PCA?

PCA is unsupervised (ignores group labels)
PLS-DA maximizes separation between known groups
PLS-DA provides biomarker rankings (VIP scores)
PLS-DA is designed for classification tasks

Requirements

R version ≥ 4.0.0
No additional packages required!
Memory: Minimum 4GB RAM (8GB recommended)
Storage: ~10MB for outputs

Methodology

This implementation uses:

NIPALS algorithm for PLS component extraction
Stratified train-test split (70-30 default)
K-fold cross-validation for component optimization
VIP scores for feature importance ranking
Dummy matrix encoding for multi-class problems

See README_FULL.md Section 13 for complete statistical background.
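
As a rough illustration of how the NIPALS, dummy-encoding, and VIP steps listed above fit together, here is a minimal base-R sketch. It is not the code in `plsda_analysis.R`: the function name `nipals_plsda`, its arguments, and the convergence settings are hypothetical, and it assumes every feature has non-zero variance.

```r
# Sketch of NIPALS PLS2 on a dummy-coded response (hypothetical helper, not the repo's code).
nipals_plsda <- function(X, y, ncomp = 2, tol = 1e-10, max_iter = 500) {
  X <- scale(X)                                          # autoscale features
  Y <- sapply(levels(y), function(l) as.numeric(y == l)) # one 0/1 column per class
  Y <- scale(Y, scale = FALSE)                           # center the dummy matrix
  p <- ncol(X)
  W <- P <- matrix(0, p, ncomp)                          # X weights and X loadings
  Q <- matrix(0, ncol(Y), ncomp)                         # Y loadings
  Tmat <- matrix(0, nrow(X), ncomp)                      # X scores

  for (a in seq_len(ncomp)) {
    u <- Y[, which.max(colSums(Y^2)), drop = FALSE]      # start from a Y column
    t_old <- matrix(0, nrow(X), 1)
    for (iter in seq_len(max_iter)) {
      w <- crossprod(X, u)
      w <- w / sqrt(sum(w^2))                            # unit-length weight vector
      t_new <- X %*% w                                   # X scores
      q <- crossprod(Y, t_new) / sum(t_new^2)            # Y loadings
      u <- Y %*% q / sum(q^2)                            # Y scores
      if (sum((t_new - t_old)^2) < tol * sum(t_new^2)) break
      t_old <- t_new
    }
    p_load <- crossprod(X, t_new) / sum(t_new^2)         # X loadings
    X <- X - t_new %*% t(p_load)                         # deflate X
    Y <- Y - t_new %*% t(q)                              # deflate Y
    W[, a] <- w; P[, a] <- p_load; Q[, a] <- q; Tmat[, a] <- t_new
  }

  # VIP: weights squared, weighted by the Y variance each component explains
  ssy <- colSums(Tmat^2) * colSums(Q^2)
  vip <- sqrt(p * as.vector(W^2 %*% ssy) / sum(ssy))
  list(scores = Tmat, weights = W, loadings = P, y_loadings = Q, vip = vip)
}

# Toy usage: features 1 and 2 carry the class signal
set.seed(1)
y <- factor(rep(c("A", "B", "C"), each = 20))
X <- matrix(rnorm(60 * 10), 60, 10)
X[, 1] <- X[, 1] + 2 * (y == "B")
X[, 2] <- X[, 2] + 2 * (y == "C")
fit <- nipals_plsda(X, y, ncomp = 2)
head(order(fit$vip, decreasing = TRUE), 3)
```

With unit-length weight vectors, the squared VIP scores average to 1 under this definition, which is where the common "VIP > 1" cutoff for important features comes from.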

Using Your Own Data

Your CSV should have this structure:

SampleID,Class,Batch,mz_200.00,mz_250.00,mz_300.00,...
S001,Control,1,1234.5,2345.6,3456.7,...
S002,Treatment,1,5678.9,6789.0,7890.1,...

See README_FULL.md Section 9 for detailed integration guide.

Quick integration:

# Load your data
my_data <- read.csv("your_msms_data.csv")

# Extract features and labels
feature_cols <- grep("^mz_", colnames(my_data), value = TRUE)
features <- as.matrix(my_data[, feature_cols])
labels <- factor(my_data$Class)

# Continue with script from Section 2 (Preprocessing)
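
The script's own preprocessing section is not reproduced here. As a hedged sketch of what commonly follows feature extraction, the example below makes a stratified 70/30 split and autoscales both sets using training-set statistics only, so the test set does not leak into the scaling. The toy data, the `log2(x + 1)` transform, and the helper name `stratified_train_idx` are assumptions for illustration.

```r
# Toy stand-ins for the `features` matrix and `labels` factor extracted above
set.seed(1)
labels   <- factor(rep(c("Control", "TreatmentA", "TreatmentB"), each = 20))
features <- matrix(rlnorm(60 * 8, meanlog = 7), nrow = 60,
                   dimnames = list(NULL, sprintf("mz_%d", 1:8)))

# Hypothetical helper: sample the same fraction of indices within every class
stratified_train_idx <- function(y, train_frac = 0.7) {
  per_class <- lapply(split(seq_along(y), y), function(idx) {
    idx[sample.int(length(idx), size = round(train_frac * length(idx)))]
  })
  sort(unlist(per_class, use.names = FALSE))
}

train_idx <- stratified_train_idx(labels)

# Log-transform intensities (an assumed choice), then split
x_train <- log2(features[train_idx, ] + 1)
x_test  <- log2(features[-train_idx, ] + 1)

# Autoscale with means and standard deviations from the training set only
mu  <- colMeans(x_train)
sdv <- apply(x_train, 2, sd)
x_train_s <- scale(x_train, center = mu, scale = sdv)
x_test_s  <- scale(x_test,  center = mu, scale = sdv)

table(labels[train_idx])
```

Indexing with `sample.int()` rather than calling `sample()` on the index vector avoids R's special case where a length-one numeric vector is treated as `1:n`.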

Contributing

Contributions are welcome! Areas for improvement:

Additional preprocessing methods
Support for more data formats
Integration with pathway analysis tools
Additional visualization options
Performance optimizations

Please open an issue or submit a pull request.

Citation

If you use this code in your research, please cite:

This repository:

AyehBlk. (2025). PLS-DA Analysis for MS/MS IDA Data. 
GitHub: https://github.com/AyehBlk/PLSDA-MSMS-Analysis

Original PLS-DA method:

Barker, M., & Rayens, W. (2003). Partial least squares for discrimination. 
Journal of Chemometrics, 17(3), 166-173.

NIPALS algorithm:

Wold, S., Sjöström, M., & Eriksson, L. (2001). PLS-regression: 
a basic tool of chemometrics. Chemometrics and Intelligent Laboratory 
Systems, 58(2), 109-130.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

You are free to:

Use for academic research
Use for commercial projects
Modify and distribute
Include in your own projects

Just include the license and attribution!

Troubleshooting

Common Issues

Problem: Low accuracy (<70%)

Check if classes are actually separable (biological question)
Ensure sufficient samples (minimum 20 per class recommended)
Try different preprocessing methods

Problem: Error "Singular matrix"

Features are too highly correlated
Remove features with correlation >0.95
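
One way to apply the 0.95 correlation cutoff in base R is a greedy filter that keeps a feature only if it is not too strongly correlated with any feature already kept. This is a sketch under that assumption. The function name `drop_correlated` is hypothetical, and the filter assumes no constant or missing-valued columns, since `cor()` would return `NA` for those.

```r
# Hypothetical helper: greedily keep features whose absolute correlation
# with every previously kept feature is at or below the cutoff.
drop_correlated <- function(X, cutoff = 0.95) {
  cors <- abs(cor(X))
  keep <- logical(ncol(X))            # all FALSE to start
  for (j in seq_len(ncol(X))) {
    # any() on an empty selection is FALSE, so the first feature is always kept
    keep[j] <- !any(cors[j, keep] > cutoff)
  }
  X[, keep, drop = FALSE]
}

# Toy usage: column b is nearly a copy of column a, column c is independent
set.seed(1)
a <- rnorm(100)
X <- cbind(a = a, b = a + rnorm(100, sd = 0.01), c = rnorm(100))
X_filtered <- drop_correlated(X)
colnames(X_filtered)
```

Because the filter keeps the first feature of each correlated group, column order decides which one survives. Sorting columns by a preferred criterion beforehand changes the outcome.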

Problem: All VIP scores <1

May need fewer components
Check if preprocessing is appropriate
Verify classes are actually different

See README_FULL.md Section 12 for complete troubleshooting guide.

Support

Check the comprehensive documentation
Read the FAQ
Open an issue for bugs
Start a discussion for questions

🌟 Star History

If you find this useful, please consider giving it a star! ⭐

🔗 Related Resources

MetaboAnalyst - Web-based metabolomics analysis
KEGG - Pathway database
mixOmics R package - Extended multivariate methods
ropls Bioconductor - Alternative PLS-DA implementation

👤 Author

Ayeh Bolouki

GitHub: @AyehBlk
Role: Computational Biologist / Bioinformatician

Project Status

Active Development - Maintained and open to contributions

Current Version: 1.0
Last Updated: October 2025

Made with ❤️ - Let's make science free for everybody around the world.

If this helped your research, consider citing it in your publications!
