CHANGELOG.md (+3 -3)
@@ -4,11 +4,11 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## [Unreleased]
+## [v0.14.0]
 ### Added
-- Support for parquet file format output. Search results and reporter ion quantification will be written to one file (`results.sage.parquet`) and label-free quant will be written to another (`lfq.parquet`)
+- Support for parquet file format output. Search results and reporter ion quantification will be written to one file (`results.sage.parquet`) and label-free quant will be written to another (`lfq.parquet`). Parquet files tend to be significantly smaller than TSV files, faster to parse, and are compatible with a variety of distributed SQL engines.
 ### Changed
-- Implement heapselect algorithm for faster sorting of candidate matches (#80)
+- Implement heapselect algorithm for faster sorting of candidate matches (#80). This is a backwards-incompatible change with respect to output - small changes in PSM ranks will be present between v0.13.4 and v0.14.0
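
For readers unfamiliar with the technique named in the `### Changed` entry, the following is a minimal Python sketch of heap-based top-k selection. It is not Sage's implementation, and the changelog does not describe the details; the function name `top_k_by_score` and the `(score, peptide)` tuples are hypothetical and chosen only for illustration.

```python
import heapq


def top_k_by_score(candidates, k):
    """Return the k largest (score, peptide) tuples, best first.

    Illustrative only: names and data layout are assumptions,
    not taken from the Sage code base.
    """
    if k <= 0:
        return []
    heap = []  # min-heap holding the best k candidates seen so far
    for item in candidates:
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            # New candidate beats the weakest one kept: swap it in.
            heapq.heapreplace(heap, item)
    # Only the k survivors are fully sorted.
    return sorted(heap, reverse=True)


candidates = [(0.2, "PEPA"), (0.9, "PEPB"), (0.5, "PEPC"), (0.7, "PEPD")]
best_two = top_k_by_score(candidates, 2)
```

Keeping only the best `k` candidates costs O(n log k) instead of the O(n log n) of a full sort. Selection of this kind may also order tied scores differently than a full sort would, which is one possible, unconfirmed, explanation for the small rank changes mentioned in the entry.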
DOCS.md (+2 -2)
@@ -240,7 +240,7 @@ The "results.sage.tsv" file contains the following columns (headers):
 - `spectrum_q`: Assigned spectrum-level q-value.
 - `peptide_q`: Assigned peptide-level q-value.
 - `protein_q`: Assigned protein-level q-value.
-- `ms1_intensity`: Intensity of the MS1 precursor ion
+- `ms1_intensity`: Intensity of the selected MS1 precursor ion (not label-free quant)
 - `ms2_intensity`: Total intensity of MS2 spectrum

-These columns provide comprehensive information about each candidate peptide spectrum match (PSM) identified by the Sage search engine, enabling users to assess the quality and characteristics of the results.
+These columns provide comprehensive information about each candidate peptide spectrum match (PSM) identified by the Sage search engine.
 - Retention time prediction models fit to each LC/MS run
 - PSM rescoring using built-in linear discriminant analysis (LDA)
 - PEP calculation using a non-parametric model (KDE)
@@ -33,15 +35,11 @@ Check out the [blog post introducing Sage](https://lazear.github.io/sage/) for m
 - Built-in support for reading gzipped-mzML files
 - Support for reading/writing directly from AWS S3

-### Experimental features
-
-- Label-free quantification: consider all charge states & isotopologues *a la* FlashLFQ
-
 ### Assign multiple peptides to complex spectra

 <img src="figures/chimera_27525.png" width="800">

-- When chimeric searching is turned on, 2 peptide identifications will be reported for each MS2 scan, both with `rank=1`
+- When chimeric searching is enabled, multiple peptide identifications can be reported for each MS2 scan

 ### Sage trains machine learning models for FDR refinement and posterior error probability calculation

@@ -113,6 +111,8 @@ Options:
 Path where search and quant results will be written. Overrides the directory specified in the configuration file.
 --batch-size <batch-size>
 Number of files to search in parallel (default = number of CPUs/2)
+--parquet
+Write parquet files instead of tab-separated files
 --write-pin
 Write percolator-compatible `.pin` output files
 -h, --help
@@ -127,7 +127,7 @@ Example usage: `sage config.json`

 Some options in the parameters file can be over-written using the command line interface. These are:

-1. The paths to the raw mzML data
+1. The paths to the mzML data
 2. The path to the database (fasta file)
 3. The output directory

@@ -149,12 +149,14 @@ Running Sage will produce several output files (located in either the current di
 - MS2 search results will be stored as a tab-separated file (`results.sage.tsv`) file - this is a tab-separated file, which can be opened in Excel/Pandas/etc
 - MS2 and MS3 quantitation results will be stored as a tab-separated file (`tmt.tsv`, `lfq.tsv`) if `quant.tmt` or `quant.lfq` options are used in the parameter file

+If `--parquet` is passed as a command line argument, `results.sage.parquet` (and optionally, `lfq.parquet`) will be written. These have a similar set of columns, but TMT values are stored as a nested array alongside PSM features
+
 ## Configuration file schema

 ### Notes

 - The majority of parameters are optional - only "database.fasta", "precursor_tol", and "fragment_tol" are required. Sage will try and use reasonable defaults for any parameters not supplied
-- Tolerances are specified on the *experimental* m/z values. To perform a -100 to +500 Da open search (mass window applied to *precursor*), you would use `"da": [-500, 100]`
+- Tolerances are specified on the *experimental* m/z values. To perform a -100 to +500 Da open search (mass window applied to *theoretical*), you would use `"da": [-500, 100]`
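
The sign convention in the tolerance note is easy to get backwards, so here is a small Python sketch of the arithmetic. It assumes that the `da` bounds are added to the experimental value to form the window of accepted theoretical values, which is one reading of the note and not something this diff confirms. The helper names are hypothetical, and the sketch works in mass units (Da) for simplicity.

```python
def theoretical_window(experimental_mass, tol_da):
    """Window of accepted theoretical masses around an experimental mass.

    Assumption: the tolerance bounds are offsets added to the
    experimental value (hypothetical helper, not part of Sage).
    """
    lower, upper = tol_da
    return experimental_mass + lower, experimental_mass + upper


def mass_shift_range(tol_da):
    """Range of (experimental - theoretical) mass shifts the window admits."""
    lower, upper = tol_da
    # theoretical lies in [exp + lower, exp + upper], so
    # exp - theoretical lies in [-upper, -lower].
    return -upper, -lower


tol = (-500, 100)                            # as in `"da": [-500, 100]`
window = theoretical_window(2000.0, tol)     # accepted theoretical masses
shifts = mass_shift_range(tol)               # admitted mass shifts
```

Under this reading, `"da": [-500, 100]` accepts theoretical masses from 500 Da below to 100 Da above the experimental mass, which corresponds to mass shifts of -100 to +500 Da.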