Releases: Ecogenomics/GTDBTk

2.7.1

17 Apr 05:02
eb69a2f

Bug Fixes:

  • (#699) Although all genomes are classified with skani, selecting --place_species still requires _bac_gids, _ar_gids, and bac_ar_diff for downstream processing.
  • (#698) Fixes an MD5 mismatch in the check_install configuration.

2.7.0

15 Apr 05:03
4432746

Release 2.7.0+ updates GTDB-Tk to use the latest GTDB R232 taxonomy and requires a new reference package. It includes the following new features:

  • Pre-sketched skani database: GTDB-Tk now uses a skani pre-sketched database of the GTDB representative genomes. This significantly reduces the database storage footprint from 198 GB (in Release 232) down to 98 GB.
  • Representative genomes availability: The GTDB representative genomes are now available via the "Download" page on the GTDB website.
  • Deprecated flag: Because the database is already sketched natively, the --skani_sketch_dir flag is now deprecated.
  • Replaced --skip_ani_screen with --place_species: The --skip_ani_screen flag is now deprecated in v2.7.0 and has been replaced by the --place_species flag. The logic has been updated to reflect the new database structure:
    • Previously: Using --skip_ani_screen, genomes placed in a genus by pplacer were only compared to representative genomes within that specific genus.
    • Now: Because the database is a single skani sketch, user genomes are compared against all GTDB reference genomes once at the very beginning of the pipeline. When the new --place_species flag is selected, the genomes are still explicitly placed in the reference tree.

⚠️ IMPORTANT MEMORY WARNING: The divide-and-conquer approach now requires more than 128 GB of RAM. Specifically, you will need at least 140 GB of RAM for R232.
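
As a rough pre-flight check for the memory requirement above, the following sketch compares a machine's total physical memory against the 140 GB figure. The function names are illustrative and not part of GTDB-Tk, the `os.sysconf` keys are assumed to be available (they are on Linux), and GB is assumed to mean 10^9 bytes, which the warning does not specify. Total memory is only an upper bound on what a run can actually use.

```python
import os

REQUIRED_GB = 140  # figure quoted in the memory warning above for R232


def total_ram_bytes():
    """Total physical memory in bytes (assumes a Linux-like os.sysconf)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def has_enough_ram(total_bytes, required_gb=REQUIRED_GB):
    """True if total_bytes covers required_gb, assuming 1 GB = 10**9 bytes."""
    return total_bytes >= required_gb * 10**9
```

Calling `has_enough_ram(total_ram_bytes())` before launching a divide-and-conquer run would flag machines that are below the stated minimum.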

2.6.1

12 Dec 06:12
4461f70

Bug Fixes:

  • (#680) Fixed a check_install error that occurred when the StageLogger path is not set.

2.6.0

10 Dec 06:28
5cdb5f8

Major Changes:

  • GTDB-Tk now pins skani (v0.3.1) and pplacer (v1.1.alpha19) to i) ensure reproducibility of results and ii) use the sketch format compatible with skani v0.3.1.
  • The limit on the number of genomes compared in dense genera has been removed. This ensures that all representative genomes in a genus are compared, preventing incorrect species assignments when the closest genome by ANI falls outside the previous 100-genome limit. This is especially important in dense genera like Collinsella and significantly improves classification accuracy, at the cost of a slight increase in runtime.

Bug Fixes:

  • (#670, #674, #668) Fixed an issue where GTDB-Tk would crash when using pplacer v1.1.alpha20. This is resolved by pinning pplacer to v1.1.alpha19.
  • (#671) The limit on the number of genomes compared in dense genera has been removed.
  • (#672) skani is now pinned to v0.3.1 and uses the sketch + search commands instead of dist.
  • (#665) GTDB-Tk now uses skani v0.3.1 and has an option to save the sketch database of the reference genomes for future use (--skani_sketch_dir).
  • (#669) BaseModel from pydantic has been replaced by DataClass to avoid warnings with pydantic v2.x.
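
To illustrate the kind of change described in #669, here is a minimal sketch of a record written as a standard-library dataclass instead of a pydantic BaseModel. It assumes "DataClass" refers to Python's `dataclasses` module. The `AniHit` class and its fields are hypothetical and are not taken from the GTDB-Tk code base.

```python
from dataclasses import dataclass, asdict


@dataclass
class AniHit:  # hypothetical record, not a class from GTDB-Tk
    reference: str
    ani: float
    af: float

    def __post_init__(self):
        # Unlike pydantic, dataclasses neither coerce nor validate field
        # values, so any checks that matter have to be written explicitly.
        self.ani = float(self.ani)
        self.af = float(self.af)
        if not 0.0 <= self.ani <= 100.0:
            raise ValueError(f"ANI out of range: {self.ani}")
```

`asdict(hit)` returns a plain dictionary of the fields for an instance `hit`, which covers the most common use of a pydantic model's dictionary export.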

2.5.2

12 Sep 03:00
2e9d4c1

Bug Fixes:

  • (#662, #663) Resolves TypeError: bool() undefined when iterable == total == None

2.5.1

09 Sep 04:55
9e97cda

Bug Fixes:

  • (#658) Change the spinner to a progress bar

2.5.0

08 Sep 05:30
3e65722

Bug Fixes:

  • (#644, #641) Fixed compatibility with recent versions of NumPy (≥1.24), which removed the tostring() method from numpy.ndarray.

Minor Changes:

  • (#650) Update CLI with an up-to-date taxon.

Major Changes:

  • GTDB-Tk now uses skani exclusively for genome clustering, replacing the previous Mash/skani hybrid approach. This change simplifies the CLI and removes the dependency on Mash, streamlining installation and execution.

2.4.1

18 Apr 04:27
655baba

Bug Fixes:

  • (#630) Fixed SyntaxWarning in Python 3.12 by using raw strings for regex in HMMResultsIO.py
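
The raw-string fix in #630 can be illustrated with a short sketch. The pattern and function below are hypothetical and are not copied from HMMResultsIO.py. They only show why the `r` prefix matters for regular expressions that contain backslashes.

```python
import re

# In an ordinary string literal, "\d" is an invalid escape sequence, which
# Python 3.12 reports as a SyntaxWarning. A raw string passes the backslash
# through to the regex engine unchanged.
HIT_COUNT = re.compile(r"(\d+) hits")  # hypothetical pattern


def count_hits(line):
    """Return the integer captured by the pattern, or None if nothing matches."""
    match = HIT_COUNT.search(line)
    return int(match.group(1)) if match else None
```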

Minor Changes:

  • (#631) gtdb_to_ncbi_majority_vote.py script has been included as part of the release

The GTDB-Tk version has been bumped to synchronise its release with GTDB R226.

2.4.0

24 Apr 01:21
59609e2

Bug Fixes:

  • (#576) When all genomes fail the prodigal step in classify_wf, the bac120 summary file is still produced, with all failed genomes listed as 'Unclassified'.
  • (#573) When running the three classify steps independently, a genome can be filtered out in the align step but still be classified in the identify step. To avoid duplicate rows, the genome is classified with a warning.
  • (#540) Empty files are skipped during the Mash sketch step; they are then caught in the prodigal step and returned as 'Unclassified'.
  • (#549) --force has been modified to deal with #540. Prodigal was not returning empty files as failed genomes; it was only skipping them. These genomes are now returned in the summary file and flagged as 'Unclassified'.

Major Changes:

  • FastANI has been replaced by skani as the primary tool for computing Average Nucleotide Identity (ANI). Users may notice slight variations in the results compared to those obtained using FastANI.

  • In the generated summary.tsv files, several columns have been renamed for clarity and consistency. The following columns have been affected:

    • "fastani_reference" column has been renamed to "closest_genome_reference".
    • "fastani_reference_radius" column has been renamed to "closest_genome_reference_radius".
    • "fastani_taxonomy" column has been renamed to "closest_genome_taxonomy".
    • "fastani_ani" column has been renamed to "closest_genome_ani".
    • "fastani_af" column has been renamed to "closest_genome_af".

These changes have been implemented to improve the readability and understanding of the data within the summary.tsv files. Users should update their scripts or processes accordingly to reflect these renamed column headers.
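
For scripts that still need to read summary files produced before 2.4.0, one option is to rewrite the old headers to the new names. The sketch below uses the mapping listed above. The helper names are illustrative, and it assumes the summary is tab-separated with the column names in the first row.

```python
import csv
import io

# Old -> new column names, as listed in the 2.4.0 release notes.
RENAMED_COLUMNS = {
    "fastani_reference": "closest_genome_reference",
    "fastani_reference_radius": "closest_genome_reference_radius",
    "fastani_taxonomy": "closest_genome_taxonomy",
    "fastani_ani": "closest_genome_ani",
    "fastani_af": "closest_genome_af",
}


def upgrade_header(header):
    """Map pre-2.4.0 column names to the new ones; other names pass through."""
    return [RENAMED_COLUMNS.get(name, name) for name in header]


def upgrade_summary(text):
    """Rewrite only the header row of a tab-separated summary; data rows are kept."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return text
    rows[0] = upgrade_header(rows[0])
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()
```

Inverting the dictionary gives the opposite direction, which may be the simpler route when an older downstream script has to consume new summary files unchanged.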

2.3.2

05 Jul 22:38
7765d60

Bug Fixes:

  • (#528) (#529) setup.py has been modified to restrict pydantic version to >=1.9.2 and < 2.0a1

Minor Changes:

  • (#526) The Mash stderr is now captured in a separate buffer (thanks @wasade for your contribution).