Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
109 changes: 83 additions & 26 deletions nitrite-fjall-adapter/src/store.rs
Original file line number Diff line number Diff line change
Expand Up @@ -244,12 +244,76 @@ impl FjallStoreInner {
}
}

/// Delay in milliseconds to wait for file system cleanup when recreating a deleted partition.
/// This allows Fjall's internal cleanup processes to complete before attempting to create
/// a new partition with the same name.
const PARTITION_CLEANUP_DELAY_MS: u64 = 50;

/// Helper function to check if an error indicates a partition was deleted
#[inline]
fn is_partition_deleted_error(err_msg: &str) -> bool {
err_msg.contains("not found") || err_msg.contains("deleted") || err_msg.contains("PartitionDeleted")
}

/// Opens a partition with retry logic for deleted partitions.
///
/// This handles the case where a partition was deleted (e.g., during index rebuild)
/// and needs to be recreated. When Fjall reports a partition is deleted, we simply
/// try to open it again, which will create a new partition.
///
Comment on lines +259 to +263
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation states "try to open it again" but the implementation actually tries to open it twice more (line 290 and line 302), for a total of 3 attempts. The documentation should clarify that up to 3 total attempts are made: initial attempt, immediate retry, and retry after delay.

Suggested change
///
/// This handles the case where a partition was deleted (e.g., during index rebuild)
/// and needs to be recreated. When Fjall reports a partition is deleted, we simply
/// try to open it again, which will create a new partition.
///
///
/// This handles the case where a partition was deleted (e.g., during index rebuild)
/// and needs to be recreated. When Fjall reports a partition is deleted, this method
/// will attempt to open the partition up to three times: the initial attempt, an immediate
/// retry if the partition was deleted, and a final retry after a brief delay if the second
/// attempt also fails. This ensures that transient file system cleanup issues are handled
/// gracefully and a new partition is created if needed.
///

Copilot uses AI. Check for mistakes.
/// Note: This method uses `std::thread::sleep` which blocks the current thread.
/// This is intentional as we need to wait for file system synchronization after
/// partition deletion. The sleep is brief (50ms) and only occurs on retry paths.
fn open_partition_with_retry(
&self,
ks: &Keyspace,
name: &str,
config: &fjall::PartitionCreateOptions,
) -> NitriteResult<fjall::PartitionHandle> {
// Clone config once to avoid multiple clones in retry paths
let config_clone = config.clone();

match ks.open_partition(name, config_clone.clone()) {
Comment on lines +273 to +276
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config parameter is cloned unnecessarily. Since config is already a reference, you can use it directly in the open_partition calls. The intermediate config_clone variable and its subsequent clones add overhead without providing any benefit. Consider removing line 274 and using config.clone() directly in the open_partition calls.

Copilot uses AI. Check for mistakes.
Ok(partition) => Ok(partition),
Err(err) => {
let err_msg = err.to_string();

// If partition was deleted, we need to recreate it
if Self::is_partition_deleted_error(&err_msg) {
log::warn!("Partition '{}' was deleted, recreating it", name);

// Clean up any stale references in the registry
self.map_registry.remove(name);

// Fjall's open_partition should create a new partition if it doesn't exist
// If it fails, it might be due to file system cleanup in progress, so retry once
match ks.open_partition(name, config_clone.clone()) {
Ok(partition) => Ok(partition),
Err(retry_err) => {
// If the retry also fails, wait a moment for file system cleanup
log::debug!("First retry failed, waiting briefly for cleanup: {}", retry_err);

// Block briefly to allow Fjall's file system cleanup to complete
std::thread::sleep(std::time::Duration::from_millis(
Self::PARTITION_CLEANUP_DELAY_MS
));

// Final attempt
ks.open_partition(name, config_clone)
.map_err(|e| {
log::error!("Failed to recreate partition '{}' after retries: {}", name, e);
to_nitrite_error(e)
})
Comment on lines +293 to +306
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The retry logic unconditionally retries after sleep regardless of the error type. If the second attempt (line 290) fails for a reason other than PartitionDeleted (e.g., permission error, disk full), the code still sleeps and retries. Consider checking if retry_err is also a PartitionDeleted error before sleeping, or at minimum logging the error type to help diagnose non-transient failures.

Suggested change
// If the retry also fails, wait a moment for file system cleanup
log::debug!("First retry failed, waiting briefly for cleanup: {}", retry_err);
// Block briefly to allow Fjall's file system cleanup to complete
std::thread::sleep(std::time::Duration::from_millis(
Self::PARTITION_CLEANUP_DELAY_MS
));
// Final attempt
ks.open_partition(name, config_clone)
.map_err(|e| {
log::error!("Failed to recreate partition '{}' after retries: {}", name, e);
to_nitrite_error(e)
})
let retry_err_msg = retry_err.to_string();
if Self::is_partition_deleted_error(&retry_err_msg) {
log::debug!("First retry failed with PartitionDeleted, waiting briefly for cleanup: {}", retry_err);
// Block briefly to allow Fjall's file system cleanup to complete
std::thread::sleep(std::time::Duration::from_millis(
Self::PARTITION_CLEANUP_DELAY_MS
));
// Final attempt
ks.open_partition(name, config_clone)
.map_err(|e| {
log::error!("Failed to recreate partition '{}' after retries: {}", name, e);
to_nitrite_error(e)
})
} else {
log::error!(
"Failed to recreate partition '{}' on retry: {} (error type: {})",
name, retry_err, retry_err_msg
);
Err(to_nitrite_error(retry_err))
}

Copilot uses AI. Check for mistakes.
Comment on lines +293 to +306
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The immediate retry attempt (line 290) doesn't check if the error is a PartitionDeleted error before proceeding to the delayed retry. This means even non-transient errors (e.g., permission denied, invalid configuration) will trigger the sleep-and-retry path. Consider checking if retry_err indicates a transient PartitionDeleted error before proceeding to the delayed retry, otherwise fail fast for unrelated errors.

Suggested change
// If the retry also fails, wait a moment for file system cleanup
log::debug!("First retry failed, waiting briefly for cleanup: {}", retry_err);
// Block briefly to allow Fjall's file system cleanup to complete
std::thread::sleep(std::time::Duration::from_millis(
Self::PARTITION_CLEANUP_DELAY_MS
));
// Final attempt
ks.open_partition(name, config_clone)
.map_err(|e| {
log::error!("Failed to recreate partition '{}' after retries: {}", name, e);
to_nitrite_error(e)
})
// Only proceed to delayed retry if this is still a PartitionDeleted error
let retry_err_msg = retry_err.to_string();
if Self::is_partition_deleted_error(&retry_err_msg) {
log::debug!("First retry failed with PartitionDeleted, waiting briefly for cleanup: {}", retry_err);
// Block briefly to allow Fjall's file system cleanup to complete
std::thread::sleep(std::time::Duration::from_millis(
Self::PARTITION_CLEANUP_DELAY_MS
));
// Final attempt
ks.open_partition(name, config_clone)
.map_err(|e| {
log::error!("Failed to recreate partition '{}' after retries: {}", name, e);
to_nitrite_error(e)
})
} else {
log::error!("Failed to recreate partition '{}' on retry: {}", name, retry_err);
Err(to_nitrite_error(retry_err))
}

Copilot uses AI. Check for mistakes.
}
}
} else {
log::error!("Failed to open partition '{}': {}", name, err);
Err(to_nitrite_error(err))
}
}
}
}

fn initialize(&self, config: NitriteConfig) -> NitriteResult<()> {
// get_or_init() always returns a reference to the initialized value (or initial value if already initialized)
// The None case in pattern matching below is unreachable after get_or_init() completes successfully
Expand Down Expand Up @@ -412,30 +476,22 @@ impl FjallStoreInner {
}

if let Some(ks) = self.keyspace.get() {
match ks.open_partition(name, self.store_config.partition_config()) {
Ok(partition) => {
let fjall_map = FjallMap::new(
name.to_string(),
partition,
fjall_store,
self.store_config.clone(),
);
fjall_map.initialize()?;

self.map_registry
.insert(name.to_string(), fjall_map.clone());
Ok(NitriteMap::new(fjall_map))
}
Err(err) => {
// If partition was deleted, remove from cache and propagate error
let err_msg = err.to_string();
if Self::is_partition_deleted_error(&err_msg) {
self.map_registry.remove(name);
}
log::error!("Failed to open partition: {}", err);
Err(to_nitrite_error(err))
}
}
let config = self.store_config.partition_config();

// Try to open the partition - with retry logic for deleted partitions
let partition = self.open_partition_with_retry(&ks, name, &config)?;

let fjall_map = FjallMap::new(
name.to_string(),
partition,
fjall_store,
self.store_config.clone(),
);
fjall_map.initialize()?;

self.map_registry
.insert(name.to_string(), fjall_map.clone());
Ok(NitriteMap::new(fjall_map))
} else {
Err(NitriteError::new(
"Keyspace is not initialized",
Expand Down Expand Up @@ -475,11 +531,12 @@ impl FjallStoreInner {
}
}
Err(err) => {
// If partition doesn't exist, it might already be deleted
// This is acceptable - just ensure it's removed from registry
let err_msg = err.to_string();

// If partition doesn't exist or was already deleted, that's OK
if Self::is_partition_deleted_error(&err_msg) {
self.map_registry.remove(name);
log::debug!("Partition '{}' was already deleted", name);
Ok(())
} else {
log::error!("Failed to open partition for removal: {}", err);
Expand Down