Commit 6bf67d9

fasta conversion notes
Former-commit-id: 1ccd77ab3668bacfeed57474008e84b7c006a38f
1 parent f6b3a12 commit 6bf67d9

1 file changed (+18 -0 lines)

README.md

@@ -33,6 +33,24 @@ To read a burst sequence file (e.g. `fixtures/cloudburst/100k.br`) in order to c

This will write out each record (or chunk) from the sequence file to a text file on disk.

To convert a FASTA file for use with Brisera alignments, you can use the `convert_fasta.py` Spark application as follows:

    $ spark-submit --master local[*] apps/convert_fasta.py input.fa output.ser

This command transforms a single FASTA file into a chunked binary format that Spark can process efficiently, with the chunks prepared for the seed reduce step.
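
The on-disk format produced by `convert_fasta.py` isn't documented here, but the idea behind chunking can be illustrated. The following is a minimal, hypothetical sketch (not the code in `convert_fasta.py`): it parses FASTA records and splits each sequence into fixed-size chunks that extend `overlap` characters into the next chunk, so a match straddling a chunk boundary is still fully contained in one chunk. The helper names, the `chunk_size` and `overlap` parameters, and the use of `pickle` for serialization are all assumptions for illustration only.

```python
# Hypothetical sketch only: NOT the actual convert_fasta.py implementation.
import pickle
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line.upper())  # normalize soft-masked lowercase bases
    if header is not None:
        yield header, "".join(parts)


def chunk_sequence(seq: str, chunk_size: int, overlap: int) -> List[Tuple[int, str]]:
    """Split seq into (offset, chunk) pairs. Each chunk covers chunk_size
    characters plus `overlap` extra characters from the next chunk, so every
    match of length <= overlap + 1 that starts in a chunk lies entirely in it."""
    if chunk_size <= 0 or overlap < 0:
        raise ValueError("chunk_size must be positive and overlap non-negative")
    return [(offset, seq[offset:offset + chunk_size + overlap])
            for offset in range(0, len(seq), chunk_size)]


def convert(fasta_lines: Iterable[str], chunk_size: int = 1000, overlap: int = 36) -> bytes:
    """Serialize (sequence_id, offset, chunk) records; pickle is a stand-in format."""
    records = []
    for seq_id, (_header, seq) in enumerate(read_fasta(fasta_lines)):
        for offset, chunk in chunk_sequence(seq, chunk_size, overlap):
            records.append((seq_id, offset, chunk))
    return pickle.dumps(records)
```

Keeping the original offset with each chunk means a hit found inside a chunk can be mapped back to its absolute position in the reference.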

To compute alignments, use the `brisera_align.py` Spark application. It reads its configuration from `conf/brisera.yaml` (an example file is included); edit that file to change the app's settings. Run the alignment as follows:

    $ spark-submit --master local[*] apps/brisera_align.py refpath qrypath outpath

The arguments are as follows:

- `refpath` is the reference genome you wish to align to, already converted with `convert_fasta.py`
- `qrypath` is the set of queries (reads) that you would like aligned to the reference
- `outpath` is where the alignment information will be written when the job completes
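
`spark-submit` passes everything after the application file to the application as its own command-line arguments. As a hypothetical sketch of the three-argument interface above, assuming the arguments are positional in the order shown, it could be expressed with Python's standard `argparse` module; `parse_args` is an illustrative helper, not necessarily how `brisera_align.py` reads its arguments.

```python
# Hypothetical sketch of the refpath/qrypath/outpath interface;
# the real brisera_align.py may parse its arguments differently.
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Align queries to a converted reference genome (sketch)."
    )
    parser.add_argument("refpath", help="converted FASTA file of the reference genome")
    parser.add_argument("qrypath", help="queries or reads to align to the reference")
    parser.add_argument("outpath", help="where alignment information is written")
    # argv=None makes argparse fall back to sys.argv[1:]
    return parser.parse_args(argv)
```

For example, `parse_args(["ref.ser", "reads.ser", "out"]).qrypath` evaluates to `"reads.ser"`.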

Depending on the value you set for K, an alignment can take anywhere from seconds to hours, so be aware of how changing the settings affects run time.
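
K is not defined in this section. Assuming it plays the same role as in CloudBurst (the maximum number of mismatches allowed per alignment), the sensitivity to K follows from simple arithmetic: by the pigeonhole principle, a read of length m with at most K mismatches must contain an exact match of length at least floor(m / (K + 1)). Larger K therefore means shorter seeds, and for a uniformly random reference of length G each seed of length s matches by chance roughly G / 4^s times. The helper names below are hypothetical and only illustrate that estimate.

```python
# Illustrative arithmetic only; assumes K = max mismatches (CloudBurst-style)
# and a uniformly random reference, so real genomes will typically see more hits.


def seed_length(read_length: int, k: int) -> int:
    """Minimum exact-match length guaranteed by the pigeonhole principle."""
    return read_length // (k + 1)


def expected_random_hits(genome_length: int, read_length: int, k: int) -> float:
    """Approximate chance seed hits per read: (k + 1) seeds, each ~G / 4^s hits."""
    s = seed_length(read_length, k)
    return (k + 1) * genome_length / 4 ** s


for k in range(4):
    print(k, seed_length(36, k), expected_random_hits(1_000_000, 36, k))
```

With 36 bp reads, going from K = 0 to K = 3 shrinks the seed from 36 to 9 bases, and every one of the extra chance hits must then be extended and checked.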

Other Details
-------------
