Problem: selecting the right samples for the abundance-profiling step of binning.
This step is computationally expensive and very storage intensive. Choosing optimal samples (those containing overlapping genomes at different abundances) will improve accuracy and speed up computation.
GATTACA is a tool that uses MinHash to try to partition representative genomes: http://www.biorxiv.org/content/biorxiv/early/2017/04/26/130997.full.pdf
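
To make the MinHash idea concrete, here is a minimal sketch of how k-mer overlap between two samples could be estimated from small sketches instead of full k-mer sets. This is not GATTACA's implementation. The function names (kmers, minhash_sketch, estimate_jaccard), the choice of BLAKE2b as the hash, and the toy sequences are all assumptions made for illustration.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash64(kmer):
    """Map a k-mer to a 64-bit integer (hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_sketch(kmer_set, size):
    """Bottom-k sketch: the `size` smallest distinct hash values."""
    return sorted({_hash64(km) for km in kmer_set})[:size]


def estimate_jaccard(sketch_a, sketch_b, size):
    """Estimate Jaccard similarity from two bottom-k sketches.

    Take the `size` smallest hashes of the merged sketches and count
    the fraction that appear in both inputs.
    """
    a, b = set(sketch_a), set(sketch_b)
    merged = sorted(a | b)[:size]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in a and h in b)
    return shared / len(merged)


if __name__ == "__main__":
    # Toy "samples"; real input would be reads or contigs per sample.
    sample_a = kmers("ACGTACGTGG", 4)
    sample_b = kmers("ACGTACGTCC", 4)
    size = 100  # sketch size (hypothetical parameter)
    est = estimate_jaccard(
        minhash_sketch(sample_a, size),
        minhash_sketch(sample_b, size),
        size,
    )
    print(f"estimated Jaccard: {est:.2f}")
```

When the sketch size exceeds the number of distinct k-mers, as in the toy run, the estimate equals the exact Jaccard similarity. With real samples the sketch is far smaller than the k-mer set, and the estimate trades some accuracy for a large saving in storage. Pairwise estimates like this could then be used to rank candidate samples by how much genomic content they share.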