Commit 77b555f

better structure
Former-commit-id: a5d9c71d6218717dd75c3ec7095de0f72f75c863
1 parent 2a554f8 commit 77b555f

19 files changed

+629084
-0
lines changed

README.md

+40
@@ -0,0 +1,40 @@
Brisera
=======

A Python implementation of a distributed seed and reduce algorithm (similar to BlastReduce and CloudBurst) that uses RDDs (resilient distributed datasets) to perform fast iterative analyses and dynamic programming without relying on chained MapReduce jobs.

Quick Start
-----------

The code is organized as follows:

- `apps/` - contains the Spark applications to be run
- `brisera/` - the Python module with the code
- `tests/` - contains a stub testing library for ensuring things work
- `fixtures/` - contains reference data for running the apps against
- `docs/` - stubs for the project documentation

To install the required dependencies:

    $ pip install -r requirements.txt

The code for Brisera is found in the `brisera` Python module. This module must be importable by the Spark applications, either by running them locally from the working directory that contains `brisera` or by installing it into a virtual environment (recommended). To install `brisera` and all of its dependencies, use `setup.py`:

    $ python setup.py install

Note that you will still need access to the Spark applications in the `apps/` directory, so don't delete them out of hand!
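
One way to confirm that `brisera` is importable before launching a Spark application is a quick check like the sketch below. It assumes Python 3.4 or later (for `importlib.util.find_spec`); the helper name `module_available` is hypothetical and not part of Brisera.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be found on sys.path."""
    # find_spec returns None when no finder can locate a top-level module.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("brisera"):
        print("brisera is importable")
    else:
        print("brisera not found: run from the repository root or install it")
```

Running this with the same interpreter (and virtual environment) that will execute the Spark application gives the most meaningful answer.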

Other Details
-------------

Brisera means to "explode" or to "burst" in Swedish. Since I'm reworking CloudBurst and BlastReduce (both of which use BLAST) to Spark (weirdly, all the same terminology), it felt right to name the project something burst/explode related. (I tried a few languages, but Swedish had the best result.)

### References

1. M\. C. Schatz, “BlastReduce: high performance short read mapping with MapReduce,” University of Maryland, [http://cgis.cs.umd.edu/Grad/scholarlypapers/papers/MichaelSchatz.pdf](http://cgis.cs.umd.edu/Grad/scholarlypapers/papers/MichaelSchatz.pdf), 2008.

1. M\. C. Schatz, “CloudBurst: highly sensitive read mapping with MapReduce,” Bioinformatics, vol. 25, no. 11, pp. 1363–1369, 2009.

1. X\. Li, W. Jiang, Y. Jiang, and Q. Zou, “Hadoop Applications in Bioinformatics,” in Open Cirrus Summit (OCS), 2012 Seventh, 2012, pp. 48–52.

1. R\. K. Menon, G. P. Bhat, and M. C. Schatz, “Rapid parallel genome indexing with MapReduce,” in Proceedings of the second international workshop on MapReduce and its applications, 2011, pp. 51–58.
File renamed without changes.

brisera/__init__.py

+6
@@ -0,0 +1,6 @@
"""
Brisera in Swedish means to spark, burst, or explode - perfect for an
implementation of BlastReduce (CloudBurst) in Python with PySpark!
"""

__version__ = "1.0"

brisera/utils.py

+46
@@ -0,0 +1,46 @@
"""
Utility functions for Brisera
"""

##########################################################################
## Imports
##########################################################################

import os

##########################################################################
## Module Constants
##########################################################################

BASE_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
FIXTURES = os.path.join(BASE_DIR, "fixtures")

##########################################################################
## Utility Functions
##########################################################################

def fixture(fname, label="reference"):
    """
    Returns a path to a fixture via the given fname and label
    """
    return os.path.join(FIXTURES, label, fname)

def fasta(path):
    """
    Reads a file in FASTA format, yielding (label, sequence) tuples.
    """
    label = None
    sequence = None
    with open(path, 'r') as data:
        for line in data:
            line = line.strip()
            if line.startswith('>'):
                # A new header closes the previous record, if there was one
                if label and sequence:
                    yield (label, sequence)
                label = line[1:]
                sequence = ""
            elif sequence is not None:
                # Skip any text that appears before the first header
                sequence += line

    if label and sequence:
        yield (label, sequence)
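
A minimal usage sketch for the `fasta` generator in `brisera/utils.py`. To stay runnable without installing `brisera`, it restates the generator locally and parses a small temporary file; the sample records, the file name `sample.fa`, and the `demo` helper are made up for illustration.

```python
import os
import tempfile


def fasta(path):
    """Yield (label, sequence) tuples from a FASTA file (local copy for the demo)."""
    label, sequence = None, None
    with open(path, 'r') as data:
        for line in data:
            line = line.strip()
            if line.startswith('>'):
                # A new header closes the previous record, if any
                if label and sequence:
                    yield (label, sequence)
                label, sequence = line[1:], ""
            elif sequence is not None:
                # Sequence lines are concatenated; text before the
                # first header is skipped
                sequence += line
    if label and sequence:
        yield (label, sequence)


def demo():
    """Write a two-record FASTA file to a temp directory and parse it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.fa")
        with open(path, 'w') as handle:
            handle.write(">seq1 example\nACGT\nTTGA\n>seq2\nGGCC\n")
        # Materialize the generator before the temp directory is removed
        return list(fasta(path))


records = demo()
for label, sequence in records:
    print(label, len(sequence))
```

Because `fasta` is a generator, records are produced lazily; wrap the call in `list(...)` only when the whole file comfortably fits in memory.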

docs/Makefile

+177
@@ -0,0 +1,177 @@
# Makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
PAPER =
BUILDDIR = _build

# User-friendly check for sphinx-build
ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
endif

# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .

.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext

help:
	@echo "Please use \`make <target>' where <target> is one of"
	@echo "  html       to make standalone HTML files"
	@echo "  dirhtml    to make HTML files named index.html in directories"
	@echo "  singlehtml to make a single large HTML file"
	@echo "  pickle     to make pickle files"
	@echo "  json       to make JSON files"
	@echo "  htmlhelp   to make HTML files and a HTML help project"
	@echo "  qthelp     to make HTML files and a qthelp project"
	@echo "  devhelp    to make HTML files and a Devhelp project"
	@echo "  epub       to make an epub"
	@echo "  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
	@echo "  latexpdf   to make LaTeX files and run them through pdflatex"
	@echo "  latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
	@echo "  text       to make text files"
	@echo "  man        to make manual pages"
	@echo "  texinfo    to make Texinfo files"
	@echo "  info       to make Texinfo files and run them through makeinfo"
	@echo "  gettext    to make PO message catalogs"
	@echo "  changes    to make an overview of all changed/added/deprecated items"
	@echo "  xml        to make Docutils-native XML files"
	@echo "  pseudoxml  to make pseudoxml-XML files for display purposes"
	@echo "  linkcheck  to check all external links for integrity"
	@echo "  doctest    to run all doctests embedded in the documentation (if enabled)"

clean:
	rm -rf $(BUILDDIR)/*

html:
	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

dirhtml:
	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."

singlehtml:
	$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
	@echo
	@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."

pickle:
	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
	@echo
	@echo "Build finished; now you can process the pickle files."

json:
	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
	@echo
	@echo "Build finished; now you can process the JSON files."

htmlhelp:
	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
	@echo
	@echo "Build finished; now you can run HTML Help Workshop with the" \
	      ".hhp project file in $(BUILDDIR)/htmlhelp."

qthelp:
	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
	@echo
	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/Brisera.qhcp"
	@echo "To view the help file:"
	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/Brisera.qhc"

devhelp:
	$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
	@echo
	@echo "Build finished."
	@echo "To view the help file:"
	@echo "# mkdir -p $$HOME/.local/share/devhelp/Brisera"
	@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/Brisera"
	@echo "# devhelp"

epub:
	$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
	@echo
	@echo "Build finished. The epub file is in $(BUILDDIR)/epub."

latex:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo
	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
	@echo "Run \`make' in that directory to run these through (pdf)latex" \
	      "(use \`make latexpdf' here to do that automatically)."

latexpdf:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo "Running LaTeX files through pdflatex..."
	$(MAKE) -C $(BUILDDIR)/latex all-pdf
	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."

latexpdfja:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo "Running LaTeX files through platex and dvipdfmx..."
	$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."

text:
	$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
	@echo
	@echo "Build finished. The text files are in $(BUILDDIR)/text."

man:
	$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
	@echo
	@echo "Build finished. The manual pages are in $(BUILDDIR)/man."

texinfo:
	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
	@echo
	@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
	@echo "Run \`make' in that directory to run these through makeinfo" \
	      "(use \`make info' here to do that automatically)."

info:
	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
	@echo "Running Texinfo files through makeinfo..."
	$(MAKE) -C $(BUILDDIR)/texinfo info
	@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."

gettext:
	$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
	@echo
	@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."

changes:
	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
	@echo
	@echo "The overview file is in $(BUILDDIR)/changes."

linkcheck:
	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
	@echo
	@echo "Link check complete; look for any errors in the above output " \
	      "or in $(BUILDDIR)/linkcheck/output.txt."

doctest:
	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
	@echo "Testing of doctests in the sources finished, look at the " \
	      "results in $(BUILDDIR)/doctest/output.txt."

xml:
	$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
	@echo
	@echo "Build finished. The XML files are in $(BUILDDIR)/xml."

pseudoxml:
	$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
	@echo
	@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."
