
@georgeglidden (Contributor) commented Apr 29, 2025

Implements `types::{BisectOptions, BisectResult}` and the `bisect` methods in `SufrSearch`, `SufrFile`, and `SuffixArray`.

The `bisect` method finds, for each query string, the first and last positions in the suffix array of the suffixes that begin with that query. Because those suffixes occupy one contiguous range, the number of occurrences of each query (its count) is computed from the range endpoints alone, without enumerating the range.
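As a rough illustration of how a count can come from a range, here is a self-contained Rust model. It is a sketch, not libsufr's implementation: it builds a naive suffix array over a byte string and runs two binary searches (`slice::partition_point`) to find the first and last positions whose suffixes start with the query. The helpers `build_suffix_array` and `bisect` are hypothetical, and positions are 0-based indices into this toy array.

```rust
/// Hypothetical helper: a naive suffix array, i.e. the start offsets of all
/// suffixes of `text` in lexicographic order. Sorting whole suffixes is
/// O(n^2 log n) in the worst case, so this is for illustration only.
fn build_suffix_array(text: &[u8]) -> Vec<usize> {
    let mut sa: Vec<usize> = (0..text.len()).collect();
    sa.sort_by(|&a, &b| text[a..].cmp(&text[b..]));
    sa
}

/// Hypothetical bisect: returns the inclusive range [first, last] of
/// suffix-array positions whose suffixes start with `query`, or None.
fn bisect(text: &[u8], sa: &[usize], query: &[u8]) -> Option<(usize, usize)> {
    // Lower boundary: first position whose suffix is not less than `query`.
    let first = sa.partition_point(|&s| &text[s..] < query);
    // Upper boundary (exclusive): first position whose suffix, truncated to
    // |query| bytes, sorts after `query`. Truncation preserves sorted order,
    // so this predicate is true for a prefix of `sa` and false afterwards.
    let end = sa.partition_point(|&s| {
        let suffix = &text[s..];
        &suffix[..query.len().min(suffix.len())] <= query
    });
    // Matches occupy [first, end); an empty range means zero occurrences.
    if first < end {
        Some((first, end - 1))
    } else {
        None
    }
}

fn main() {
    let text: &[u8] = b"banana";
    let sa = build_suffix_array(text);
    if let Some((first, last)) = bisect(text, &sa, b"an") {
        // The count follows from the endpoints; no enumeration is needed.
        let count = last - first + 1;
        println!("an: first={first} last={last} count={count}"); // an: first=1 last=2 count=2
    }
}
```

Each lookup costs two binary searches over the suffix array, each comparing at most |query| bytes per step, regardless of how many occurrences the query has.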

If `bisect` has already been run on a prefix of a query string, the `BisectResult` of that prefix can be passed into the query's `BisectOptions` via the optional `prefix_result` field. In this case, `bisect` searches for the query only within the range of positions occupied by the prefix.
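The prefix optimization can be sketched in the same kind of toy model, again with hypothetical helpers rather than libsufr's API. Every suffix that starts with the query also starts with its prefix, so when the prefix's inclusive range is already known, both binary searches can run over that sub-slice of the suffix array alone; passing `None` falls back to the whole array. The sketch assumes the caller guarantees that the query really begins with the prefix.

```rust
/// Hypothetical helper: naive suffix array (start offsets in sorted order).
fn build_suffix_array(text: &[u8]) -> Vec<usize> {
    let mut sa: Vec<usize> = (0..text.len()).collect();
    sa.sort_by(|&a, &b| text[a..].cmp(&text[b..]));
    sa
}

/// Hypothetical bisect restricted to an optional inclusive range [lo, hi],
/// e.g. the range found earlier for a prefix of `query`.
/// Assumption: if a range is given, `query` starts with that prefix.
fn bisect_within(
    text: &[u8],
    sa: &[usize],
    query: &[u8],
    prefix_range: Option<(usize, usize)>,
) -> Option<(usize, usize)> {
    // Without a prefix result, search the full suffix array.
    let (lo, hi) = match prefix_range {
        Some(range) => range,
        None if sa.is_empty() => return None,
        None => (0, sa.len() - 1),
    };
    let window = &sa[lo..=hi];
    let first = window.partition_point(|&s| &text[s..] < query);
    let end = window.partition_point(|&s| {
        let suffix = &text[s..];
        &suffix[..query.len().min(suffix.len())] <= query
    });
    // Translate window-relative indices back to suffix-array positions.
    if first < end {
        Some((lo + first, lo + end - 1))
    } else {
        None
    }
}

fn main() {
    let text: &[u8] = b"banana";
    let sa = build_suffix_array(text);
    // First search the whole array for the prefix "a"...
    let prefix = bisect_within(text, &sa, b"a", None);
    // ...then search only inside the prefix's range for "an".
    let query = bisect_within(text, &sa, b"an", prefix);
    println!("prefix={prefix:?} query={query:?}"); // prefix=Some((0, 2)) query=Some((1, 2))
}
```

The narrowed search returns the same positions as a full search would; it just starts from a smaller window.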

Example:

```rust
use anyhow::Result;
use libsufr::{
    suffix_array::SuffixArray,
    types::{BisectOptions, BisectResult},
};

fn main() -> Result<()> {
    let mut suffix_array = SuffixArray::read("data/inputs/3.sufr", false)?;

    let queries = vec!["AACTG".to_string(), "AAAA".to_string()];
    let prefix_query = vec!["AA".to_string()];

    // Search the entire suffix array for the query "AA".
    let prefix_opts = BisectOptions {
        queries: prefix_query,
        max_query_len: None,
        low_memory: false,
        prefix_result: None,
    };
    let prefix_result = suffix_array.bisect(prefix_opts)?;
    assert_eq!(
        prefix_result,
        vec![BisectResult {
            query_num: 0,
            query: "AA".to_string(),
            count: 11,
            first_position: 1,
            last_position: 11,
        }]
    );

    // Search only within [prefix_result.first_position..=prefix_result.last_position].
    let query_opts = BisectOptions {
        queries,
        max_query_len: None,
        low_memory: false,
        prefix_result: Some(prefix_result[0].clone()),
    };
    let query_result = suffix_array.bisect(query_opts)?;
    assert_eq!(
        query_result,
        vec![
            BisectResult {
                query_num: 0,
                query: "AACTG".to_string(),
                count: 1,
                first_position: 8,
                last_position: 8,
            },
            BisectResult {
                query_num: 1,
                query: "AAAA".to_string(),
                count: 4,
                first_position: 1,
                last_position: 4,
            },
        ]
    );

    Ok(())
}
```

… `BisectResult` in `types`, `bisect` in `suffix_array`, `suffix_file`, `suffix_search`. Supports bisecting with or without a prefix result, which specifies the range of suffixes to search. If a prefix result is not passed, the full range of suffixes is used.
…r n is an exclusive upper bound (the interval is open at that end), not an inclusive one, so passing `n := high` would prevent `high` from being searched. This is avoided by passing `n := high + 1`.
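The off-by-one described in this commit message is the usual half-open-interval pitfall. A minimal sketch, using hypothetical `lower_bound` and `find` helpers (not libsufr code) whose upper bound `n` is exclusive: searching `[low, high)` never inspects index `high`, so an inclusive range `low..=high` has to be passed as `n = high + 1`.

```rust
/// Hypothetical helper: first index in the half-open interval [low, n)
/// whose value is >= `target`; returns `n` if there is none.
fn lower_bound(v: &[u32], target: u32, mut low: usize, mut n: usize) -> usize {
    while low < n {
        let mid = low + (n - low) / 2;
        if v[mid] < target {
            low = mid + 1;
        } else {
            n = mid; // `n` stays exclusive: `mid` is the new open upper end
        }
    }
    low
}

/// Hypothetical helper: exact-match lookup within [low, n).
fn find(v: &[u32], target: u32, low: usize, n: usize) -> Option<usize> {
    let i = lower_bound(v, target, low, n);
    if i < n && v[i] == target {
        Some(i)
    } else {
        None
    }
}

fn main() {
    let v = [10u32, 20, 30, 40];
    let (low, high) = (0, 3); // inclusive range of interest: indices 0..=3
    // n = high excludes index 3, so 40 is never found.
    let wrong = find(&v, 40, low, high);
    // n = high + 1 covers the whole inclusive range.
    let right = find(&v, 40, low, high + 1);
    println!("wrong={wrong:?} right={right:?}"); // wrong=None right=Some(3)
}
```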
@georgeglidden requested a review from kyclark April 29, 2025 21:14
@georgeglidden changed the title Bisect for fast counting and Bisect for fast counting Apr 29, 2025
@georgeglidden requested a review from jackroddy April 30, 2025 17:27