CircleSim is a command-line tool that simulates circular DNA and RNA sequences and generates the corresponding sequencing reads. It is aimed at researchers working with circular DNA (such as eccDNA) or circular RNA who need simulated coordinates for these molecules and sequencing files for downstream analysis.
- Simulate Coordinates: Generates genomic coordinates for circular DNA or RNA molecules with customizable parameters.
- Simulate Reads: Creates simulated sequencing reads for the generated circular sequences, including options to introduce mutations and sequencing errors.
- Merge FASTQ Files: Merges FASTQ files from different sequencing simulations into a single output.
To install CircleSim in a Conda environment, follow these steps:
git clone https://github.com/yourusername/CircleSim.git
cd CircleSim
conda create -n circlesim python=3.8
conda activate circlesim
pip install -r requirements.txt
CircleSim requires a reference genome in FASTA format for simulating circular DNA or RNA sequences. You can download a suitable genome file and place it in a folder named database. If a reference genome is not specified in the scripts, CircleSim will automatically look for the genome file in this folder. For circRNA simulations, circles can also be created from transcriptome regions using the database/Homo_sapiens.GRCh38.cdna.all.short.bed file.
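The fallback to the database folder described above can be pictured with a short sketch. This is not CircleSim's actual code: the function name `resolve_genome`, the accepted file extensions, and the choice of the alphabetically first file are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional

# Assumption: which extensions count as FASTA is not stated in this README.
FASTA_SUFFIXES = (".fa", ".fasta", ".fna")


def resolve_genome(genome_fasta: Optional[str], database_dir: str = "database") -> Path:
    """Return the explicit genome path if given, else a FASTA found in database_dir."""
    if genome_fasta:
        path = Path(genome_fasta)
        if not path.is_file():
            raise FileNotFoundError(f"Genome FASTA not found: {path}")
        return path
    # No genome given: fall back to the database folder.
    # Assumption: pick the alphabetically first FASTA file found there.
    candidates = sorted(
        p for p in Path(database_dir).glob("*") if p.suffix.lower() in FASTA_SUFFIXES
    )
    if not candidates:
        raise FileNotFoundError(f"No FASTA file found in {database_dir}/")
    return candidates[0]
```

Calling `resolve_genome(None)` mirrors running a command without `-g`, while `resolve_genome("my_genome.fa")` mirrors passing a genome explicitly.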
Simulate Coordinates:
python CircleSim.py coordinates -t DNA -T circular -n 100 -l 500 -L 1500 -o circular_coordinates.bed
python CircleSim.py coordinates -t DNA -T linear -n 100 -l 500 -L 1500 -o linear_coordinates.bed
Simulate Reads:
python CircleSim.py reads -b circular_coordinates.bed -o circular_reads_
python CircleSim.py reads -b linear_coordinates.bed -o linear_reads_
Merge FASTQ Files:
python CircleSim.py -c circular_reads_R1.fastq -l linear_reads_R1.fastq -o reads_R1.fastq
python CircleSim.py -c circular_reads_R2.fastq -l linear_reads_R2.fastq -o reads_R2.fastq
The following parameters are used to generate genomic coordinates for circular DNA or RNA molecules:
- -t, --type: Type of molecule to simulate. Options: DNA, RNA. Default is DNA.
- -T, --molecule: Specifies whether the molecule is linear or circular. Default is circular.
- -n, --number: Number of circular sequences to simulate. Default is 100.
- -d, --distribution: Distribution of the circle lengths. Options: uniform, lognormal. Default is uniform.
- -l, --length_min: Minimum length of the circular DNA/RNA. Default is 300.
- -L, --length_max: Maximum length of the circular DNA/RNA. Default is 10,000.
- -m, --mean: Mean length of the circles when using lognormal distribution. Default is 1,000.
- -sd, --sd: Standard deviation for the lognormal distribution of circle lengths. Default is 1.
- -s, --split: Specifies a sequence (e.g., AGGT) to enable splitting of circular sequences at specific points.
- -g, --genome_fasta: Path to the reference genome FASTA file.
- -r, --transcript_bed: Path to the transcript BED file for circRNA simulations.
- -o, --output_bed: Path for saving the output BED file with simulated coordinates. Default is the current directory.
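To make the interaction between -d, -l, -L, -m and -sd more concrete, here is a minimal sketch of how circle lengths could be drawn. It is not taken from the CircleSim source. It assumes that --sd is a log-scale standard deviation, that --mean is used as the median of the lognormal distribution, and that draws outside the minimum and maximum lengths are rejected and redrawn. The function name `sample_lengths` is hypothetical.

```python
import math
import random


def sample_lengths(n=100, distribution="uniform", length_min=300,
                   length_max=10_000, mean=1_000, sd=1.0, seed=None):
    """Draw n circle lengths (in bp) within [length_min, length_max]."""
    if length_min > length_max:
        raise ValueError("length_min must not exceed length_max")
    rng = random.Random(seed)
    lengths = []
    while len(lengths) < n:
        if distribution == "uniform":
            value = rng.randint(length_min, length_max)
        elif distribution == "lognormal":
            # Assumption: `mean` sets the median (mu = ln(mean)) and
            # `sd` is the standard deviation on the log scale.
            value = round(rng.lognormvariate(math.log(mean), sd))
        else:
            raise ValueError(f"Unknown distribution: {distribution}")
        # Assumption: out-of-range draws are rejected and redrawn.
        if length_min <= value <= length_max:
            lengths.append(value)
    return lengths
```

For example, `sample_lengths(n=5, distribution="lognormal", seed=1)` returns five lengths between 300 and 10,000 bp that cluster around 1,000 bp.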
These parameters are used to generate sequencing reads for the simulated circles:
- -t, --type: Type of circle. Options: DNA or RNA. Required.
- -T, --molecule: Specifies the molecule as either linear or circular. Default is circular.
- -s, --sequence: Coverage type for read simulation. Default is short.
- -c, --coverage: Sequencing coverage. Default is 30x.
- -r, --reads_length: Length of the simulated reads. Default is 150 base pairs.
- -i, --insert_length: Insert length for paired-end reads. Default is 500.
- -a, --alpha: Alpha parameter for the beta distribution, which influences mutation rates. Default is 0.5.
- -v, --beta: Beta parameter for the beta distribution. Default is 0.5.
- --mutation: Enables mutation simulation.
- --save_unmutated: Option to save unmutated reads.
- --mutation_rate: Specifies the mutation rate for the simulated reads. Default is 0.01.
- --error_rate: Specifies the sequencing error rate. Default is 0.001.
- -g, --genome_fasta: Path to the genome FASTA file.
- -b, --input_bed: Path to the input BED file with coordinates for the circular sequences. Required.
- -o, --output_fastq: Path for saving the simulated FASTQ reads. Default is the current directory.
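What distinguishes reads from a circular molecule is that a sequenced fragment can span the junction where the end of the circle joins its start. The sketch below illustrates that idea together with the coverage, read length, and insert length parameters. It is an illustration only, not CircleSim's implementation. It assumes that the insert length is the full fragment length, that the number of read pairs is coverage × circle length / (2 × read length), and that fragment starts are uniformly distributed. All function names are hypothetical, and mutations and sequencing errors are left out.

```python
import random

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def circular_fragment(circle, start, length):
    """Return `length` bases from `start`, wrapping past the end of the circle."""
    n = len(circle)
    return "".join(circle[(start + i) % n] for i in range(length))


def simulate_pairs(circle, coverage=30, read_length=150, insert_length=500, seed=None):
    """Simulate error-free paired-end reads from one circular sequence."""
    rng = random.Random(seed)
    # Assumption: each pair contributes 2 * read_length bases of coverage.
    n_pairs = max(1, round(coverage * len(circle) / (2 * read_length)))
    pairs = []
    for _ in range(n_pairs):
        start = rng.randrange(len(circle))  # uniform start anywhere on the circle
        fragment = circular_fragment(circle, start, insert_length)
        r1 = fragment[:read_length]                        # forward mate
        r2 = reverse_complement(fragment[-read_length:])   # reverse mate
        pairs.append((r1, r2))
    return pairs
```

Because indices are taken modulo the circle length, a fragment that starts near the end of the sequence continues at its beginning. Such junction-spanning reads are the signal that downstream circle-detection tools typically rely on.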
Use these parameters to merge FASTQ files from different simulations:
- -c, --circle_fastq: Path to the FASTQ file for circular sequences. Required.
- -l, --linear_fastq: Path to the FASTQ file for linear sequences. Required.
- -o, --output_fastq: Path for saving the merged FASTQ file. Default is the current directory.
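Conceptually, merging combines the records of the circular and linear FASTQ files into one output file. The sketch below shows one simple way to do that by concatenation. Whether CircleSim concatenates, interleaves, or shuffles the records is not stated here, so treat the behavior and the function name `merge_fastq` as assumptions.

```python
def merge_fastq(circle_fastq, linear_fastq, output_fastq):
    """Concatenate two FASTQ files and return the number of records written."""
    n_records = 0
    with open(output_fastq, "w") as out:
        for path in (circle_fastq, linear_fastq):
            with open(path) as src:
                # Drop blank lines so a missing or extra trailing newline
                # in one input does not corrupt the merged file.
                lines = [line.rstrip("\n") for line in src if line.strip()]
            # A FASTQ record is four lines: header, sequence, '+', qualities.
            if len(lines) % 4 != 0:
                raise ValueError(f"{path} does not contain complete FASTQ records")
            for line in lines:
                out.write(line + "\n")
            n_records += len(lines) // 4
    return n_records
```

For paired-end data, the R1 and R2 files need to be merged in the same input order, as in the two merge commands shown earlier, so that mates stay at matching positions in both outputs.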
CircleSim is freely available under the MIT license.
CircleSim is developed by Aitor Zabala, Iñigo Prada-Luengo, Alex Martínez Ascensión and David Otaegui at Biogipuzkoa Health Research Institute.