Commit 1b4cbcb

update README
1 parent ffb6e6b commit 1b4cbcb

1 file changed

+87
-13
lines changed

README.md

@@ -1,12 +1,53 @@
1+
Overview
2+
========
3+
4+
FLEA is a bioinformatics pipeline for analyzing longitudinal
5+
sequencing data from the Pacific Biosciences RS-II or Sequel. It
6+
currently supports full-length HIV *env* sequences.
7+
8+
The pipeline takes a set of FASTQ files, one per time point,
9+
containing circular consensus sequence (CCS) reads, which can be
10+
obtained using the “Reads of Insert” protocol on PacBio’s SMRTportal
11+
or SMRTanalysis tools. It produces a JSON file containing the
12+
following results:
13+
14+
- a multiple sequence alignment of high-quality consensus sequences
15+
for each time point
16+
17+
- a maximum-likelihood phylogenetic tree, inferred using
18+
[FastTree](http://www.microbesonline.org/fasttree/)
19+
20+
- the most recent common ancestor (MRCA) and other inferred ancestor
21+
sequences
22+
23+
- a two-dimensional embedding that respects TN93 sequence distances
24+
25+
- per-site selection pressure, inferred using
26+
[FUBAR](https://veg.github.io/hyphy-site/methods/selection-methods/),
27+
and other per-site evolutionary metrics
28+
29+
- per-segment evolutionary and phenotypic metrics, inferred using
30+
[HyPhy](http://www.hyphy.org/)
31+
32+
The pipeline logic is implemented in
33+
[Nextflow](https://www.nextflow.io/). A full description of the
34+
pipeline has been submitted for publication. A link to the journal
35+
article will be added here when it is available.
36+
37+
Setup
38+
=====
39+
140
Dependencies
241
------------
3-
- Nextflow
4-
- Python
5-
- usearch
6-
- mafft
7-
- HyPhy
8-
- TN93
9-
- GNU parallel
42+
- [Nextflow](https://www.nextflow.io/)
43+
- [Python](https://www.python.org/)
44+
- [USEARCH](https://www.drive5.com/usearch/)
45+
- [MAFFT](https://mafft.cbrc.jp/alignment/software/)
46+
- [HyPhy](http://www.hyphy.org/)
47+
- [FastTree](http://www.microbesonline.org/fasttree/)
48+
- [TN93](https://github.com/veg/tn93)
49+
- [GNU parallel](https://www.gnu.org/software/parallel/)
50+
- Python dependencies (see below)
1051

1152
Install Python scripts
1253
----------------------
@@ -24,17 +65,50 @@ To test:
2465
python setup.py nosetests
2566

2667

68+
Configuration
69+
-------------
70+
71+
The default config file is `nextflow.config`. It is recommended that
72+
you make a separate config file that overrides any options that need
73+
to be customized. For more information on Nextflow-specific
74+
configuration, see [the Nextflow
75+
documentation](https://www.nextflow.io/docs/latest/config.html).
76+
77+
At the very least, `params.reference_dir` and the parameters that
78+
depend on it need to point to the various reference files used by the
79+
pipeline:
80+
81+
- `params.reference_db`: FASTA file of reference sequences
82+
- `params.contaminants_db`: FASTA file of contaminant sequences
83+
- `params.reference_dna`: reference DNA sequence
84+
- `params.reference_protein`: reference amino acid sequence
85+
- `params.reference_coordinates`:
86+
87+
2788
Usage
28-
-----
29-
Write a control file containing a list of fastq files, their sequence ids, and
30-
their dates, separated by spaces.
89+
=====
90+
91+
Write a control file containing a list of FASTQ files, visit codes,
92+
and dates, separated by spaces.
3193

32-
<file> <label> <date>
33-
<file> <label> <date>
94+
<file> <visit code> <date>
95+
<file> <visit code> <date>
3496
....
3597

3698
Dates must be in 'YYYYMMDD' format.
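
The control file format described above can be checked before launching the pipeline. The following is a minimal sketch, not part of FLEA itself: the function names `parse_control_line` and `parse_control_file` are hypothetical, and it assumes that each line holds exactly three whitespace-separated fields and that file paths contain no spaces.

```python
from datetime import datetime


def parse_control_line(line):
    """Split one control-file line into (path, visit_code, date).

    Hypothetical helper: assumes three whitespace-separated fields.
    """
    fields = line.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    path, visit_code, date_text = fields
    # Require exactly eight digits first, because strptime alone would
    # also accept shorter strings with single-digit months or days.
    if len(date_text) != 8 or not date_text.isdigit():
        raise ValueError(f"date must be YYYYMMDD: {date_text!r}")
    # strptime raises ValueError for impossible dates such as month 13.
    date = datetime.strptime(date_text, "%Y%m%d").date()
    return path, visit_code, date


def parse_control_file(text):
    """Parse every non-blank line of a control file's contents."""
    return [parse_control_line(row) for row in text.splitlines() if row.strip()]
```

Calling `parse_control_file(open("path/to/metadata").read())` would raise a `ValueError` on the first malformed line, which is cheaper to discover than a failure partway through a run.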
3799

38100
Run the pipeline with Nextflow:
39101

40-
nextflow path/to/flea.nf --infile path/to/metadata --results_dir path/to/results
102+
nextflow path/to/flea.nf -c path/to/custom/config/file \
103+
--infile path/to/metadata \
104+
--results_dir path/to/results
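
When the pipeline is launched from a script, the same command can be assembled programmatically. This is a hedged sketch that only mirrors the arguments shown above; the helper name `build_flea_command` is hypothetical, and the paths passed to it are placeholders.

```python
import shlex


def build_flea_command(flea_nf, config, infile, results_dir):
    """Return the argument list for the Nextflow command shown above.

    Hypothetical helper: it only reorders its inputs into the documented
    command line and does not check that the paths exist.
    """
    return [
        "nextflow", flea_nf,
        "-c", config,
        "--infile", infile,
        "--results_dir", results_dir,
    ]


if __name__ == "__main__":
    command = build_flea_command(
        "path/to/flea.nf",
        "path/to/custom/config/file",
        "path/to/metadata",
        "path/to/results",
    )
    # Print the shell-quoted command; pass `command` to subprocess.run
    # to execute it instead.
    print(shlex.join(command))
```

Keeping the command as a list avoids shell-quoting problems if any path contains spaces.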
105+
106+
The results directory will contain output from many pipeline
107+
steps. The two files that contain the final results are:
108+
109+
- `session.json`: a JSON file to be visualized with
110+
[`flea-web-app`](https://github.com/veg/flea-web-app).
111+
112+
- `session.zip`: a zip file with FASTA files for the consensus
113+
sequences, ancestors, and MRCA, and a Newick file containing the
114+
rooted phylogenetic tree.
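
A quick way to confirm that a run produced both final files is to open them with the Python standard library. The sketch below makes no assumption about the internal layout of `session.json` beyond it being valid JSON, and the function name `summarize_results` is hypothetical.

```python
import json
import zipfile
from pathlib import Path


def summarize_results(results_dir):
    """Return (top-level JSON keys, zip member names) for a results directory.

    Hypothetical helper: assumes `session.json` and `session.zip` sit
    directly inside `results_dir`.
    """
    results_dir = Path(results_dir)
    with open(results_dir / "session.json") as handle:
        session = json.load(handle)
    with zipfile.ZipFile(results_dir / "session.zip") as archive:
        members = archive.namelist()
    # Only report keys when the top-level JSON value is an object.
    keys = sorted(session) if isinstance(session, dict) else []
    return keys, members
```

The returned member names should include the FASTA and Newick files listed above; the exact file names inside the archive are not specified here.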
