Releases: Ecogenomics/GTDBTk
2.7.1
2.7.0
Release 2.7.0+ updates GTDB-Tk to use the latest GTDB R232 taxonomy and requires a new reference package. It includes the following new features:
- Pre-sketched skani database: GTDB-Tk now uses a skani pre-sketched database of the GTDB representative genomes. This significantly reduces the database storage footprint from 198 GB (in Release 232) down to 98 GB.
- Representative genomes availability: The GTDB representative genomes are now available via the "Download" page on the GTDB website.
- Deprecated flag: Because the database is already sketched natively, the `--skani_sketch_dir` flag is now deprecated.
- Replaced `--skip_ani_screen` with `--place_species`: The `--skip_ani_screen` flag is now deprecated in v2.7.0 and has been replaced by the `--place_species` flag. The logic has been updated to reflect the new database structure:
  - Previously: Using `--skip_ani_screen`, genomes placed in a genus by pplacer were only compared to representative genomes within that specific genus.
  - Now: Because the database is a single skani sketch, user genomes are compared against all GTDB reference genomes once at the very beginning of the pipeline. When the new `--place_species` flag is selected, the genomes are still explicitly placed in the reference tree.
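For wrapper scripts that still pass the deprecated 2.6.x flags, the change above could be handled with a small argument rewriter. This is only a sketch: `migrate_args` is a hypothetical helper, not part of GTDB-Tk, and it assumes `--skani_sketch_dir` takes a single directory value. Because the comparison logic changed in 2.7.0, swapping `--skip_ani_screen` for `--place_species` updates the command line but does not reproduce the old behaviour exactly.

```python
from typing import List

# Flags named in the 2.7.0 release notes.
RENAMED_FLAGS = {"--skip_ani_screen": "--place_species"}
# Assumption of this sketch: the deprecated flag is followed by one value.
REMOVED_FLAGS_WITH_VALUE = {"--skani_sketch_dir"}


def migrate_args(args: List[str]) -> List[str]:
    """Return a copy of args with deprecated flags renamed or dropped."""
    migrated = []
    i = 0
    while i < len(args):
        arg = args[i]
        name, has_inline_value, _ = arg.partition("=")
        if name in REMOVED_FLAGS_WITH_VALUE:
            # Drop the flag; in the "--flag value" form also drop the value.
            next_is_value = i + 1 < len(args) and not args[i + 1].startswith("-")
            if not has_inline_value and next_is_value:
                i += 1
            i += 1
            continue
        migrated.append(RENAMED_FLAGS.get(arg, arg))
        i += 1
    return migrated


# Illustrative command line; the paths are placeholders.
old_cmd = [
    "gtdbtk", "classify_wf",
    "--genome_dir", "genomes/",
    "--out_dir", "out/",
    "--skip_ani_screen",
    "--skani_sketch_dir", "sketches/",
]
print(" ".join(migrate_args(old_cmd)))
```

The helper leaves every other argument untouched, so it can sit in front of an existing `subprocess` call without changing how the rest of the command is built.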
2.6.1
2.6.0
Major Changes:
- GTDB-Tk now pins skani (v0.3.1) and pplacer (v1.1.alpha19) to fixed versions to i) ensure reproducibility of results and ii) use the sketch format compatible with skani v0.3.1.
- The limit on the number of genomes compared in dense genera has been removed. This ensures that all representative genomes in a genus are compared, preventing incorrect species assignments when the closest genome by ANI is outside the previous 100-genome limit. This is especially important in dense genera like Collinsella and significantly improves classification accuracy, even if runtime increases slightly.
Bug Fixes:
- (#670, #674, #668) Fixed an issue where GTDB-Tk would crash when using pplacer v1.1.alpha20. This issue is now resolved by pinning pplacer to v1.1.alpha19.
- (#671) The limit on the number of genomes compared in dense genera has been removed.
- (#672) skani is now fixed to v0.3.1 and uses the `sketch`+`search` commands instead of `dist`.
- (#665) GTDB-Tk now uses skani v0.3.1 and has an option to save the sketch database for reference genomes for future use (`--skani_sketch_dir`).
- (#669) `BaseModel` from pydantic is now replaced by DataClass to avoid warnings with pydantic v2.x.
2.5.2
2.5.1
2.5.0
Bug Fixes:
- (#644, #641) Fixed compatibility with recent versions of NumPy (≥1.24), which removed the tostring() method from numpy.ndarray.
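The notes do not say how this was fixed in GTDB-Tk. In general, code that called `tostring()` can switch to `tobytes()`, which `numpy.ndarray` also provides and which returns the same raw bytes. A minimal sketch of a defensive wrapper follows; `to_raw_bytes` is a hypothetical name, and the demo uses the standard-library `array` type (which exposes the same `tobytes()` method) so that it runs without NumPy installed.

```python
from array import array


def to_raw_bytes(arr) -> bytes:
    """Serialize an array-like object to bytes, preferring tobytes()."""
    if hasattr(arr, "tobytes"):
        return arr.tobytes()
    # Fallback for old objects that only expose the legacy method name.
    return arr.tostring()


# Demo with an unsigned-byte array from the standard library.
values = array("B", [1, 2, 3])
print(to_raw_bytes(values))  # b'\x01\x02\x03'
```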
Minor Changes:
- (#650) Update CLI with an up-to-date taxon.
Major Changes:
- GTDB-Tk now uses Skani exclusively for genome clustering, replacing the previous Mash/Skani hybrid approach. This change simplifies the CLI and removes the dependency on Mash, streamlining installation and execution.
2.4.1
2.4.0
Bug Fixes:
- (#576) When all genomes fail the prodigal step in the `classify_wf`, the bac120 summary file is still produced with all the failed genomes listed as 'Unclassified'.
- (#573) When running the 3 classify steps independently, a genome can be filtered out in the `align` step but still be classified in the `identify` step. To avoid duplication of rows, the genome is classified with a warning.
- (#540) Empty files are skipped during the sketch step of Mash; they are then caught in the `prodigal` step and are returned as 'Unclassified'.
- (#549) `--force` has been modified to deal with #540. Prodigal wasn't returning the empty files as failed genomes; it was only skipping them. These genomes are now returned in the summary file and flagged as 'Unclassified'.
Major Changes:
- `FastANI` has been replaced by `skani` as the primary tool for computing Average Nucleotide Identity (ANI). Users may notice slight variations in the results compared to those obtained using `FastANI`.
- In the generated `summary.tsv` files, several columns have been renamed for clarity and consistency. The following columns have been affected:
  - "`fastani_reference`" column has been renamed to "`closest_genome_reference`".
  - "`fastani_reference_radius`" column has been renamed to "`closest_genome_reference_radius`".
  - "`fastani_taxonomy`" column has been renamed to "`closest_genome_taxonomy`".
  - "`fastani_ani`" column has been renamed to "`closest_genome_ani`".
  - "`fastani_af`" column has been renamed to "`closest_genome_af`".
These changes have been implemented to improve the readability and understanding of the data within the summary.tsv files. Users should update their scripts or processes accordingly to reflect these renamed column headers.
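For scripts that must read summary files produced both before and after this release, one option is to normalise the header to the new names on load. The sketch below uses only the five renames listed above; `upgrade_summary` and `rename_header` are hypothetical helper names, and the sample row is made up purely for illustration.

```python
import csv
import io
from typing import Iterable, List, TextIO

# Old -> new column names, as listed in the 2.4.0 release notes.
COLUMN_RENAMES = {
    "fastani_reference": "closest_genome_reference",
    "fastani_reference_radius": "closest_genome_reference_radius",
    "fastani_taxonomy": "closest_genome_taxonomy",
    "fastani_ani": "closest_genome_ani",
    "fastani_af": "closest_genome_af",
}


def rename_header(header: Iterable[str]) -> List[str]:
    """Map pre-2.4.0 column names to their new names; leave others as-is."""
    return [COLUMN_RENAMES.get(col, col) for col in header]


def upgrade_summary(src: TextIO, dst: TextIO) -> None:
    """Copy a tab-separated summary from src to dst with the new header."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for row_index, row in enumerate(reader):
        writer.writerow(rename_header(row) if row_index == 0 else row)


# Made-up two-line file in the old layout (values are placeholders).
old_tsv = "user_genome\tfastani_reference\tfastani_ani\ngenome_1\tREF_1\t98.5\n"
upgraded = io.StringIO()
upgrade_summary(io.StringIO(old_tsv), upgraded)
print(upgraded.getvalue())
```

Files that already use the new names pass through unchanged, so the same loader can be applied regardless of which GTDB-Tk version wrote the file.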