-
Notifications
You must be signed in to change notification settings - Fork 299
fix: BLE interface refactoring with race condition fixes and improved shutdown #831
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The previous implementation of the BLEInterface could hang on exit or when interrupted, often requiring a second Ctrl-C to terminate. This was caused by race conditions and a lack of timeouts in the shutdown process. This commit addresses the issue by: - Making `BLEInterface.close()` idempotent to prevent multiple disconnection attempts. - Simplifying the `atexit` handler to call `close()` directly. - Adding an explicit timeout to the `BLEClient.disconnect()` call to prevent indefinite hangs. - Adding focused tests to verify the new shutdown behavior, including idempotency and graceful error handling.
This change introduces an automatic reconnection feature to the BLE interface to improve stability for long-running applications. - A new `_connection_monitor` background task now manages the BLE connection lifecycle. - The monitor will automatically attempt to reconnect with an exponential backoff strategy if the connection is lost. - The `auto_reconnect` feature is enabled by default but can be disabled via a new parameter in the `BLEInterface` constructor. - The `__init__` and `close` methods have been refactored to correctly manage the new async task and its event loop, ensuring a robust and graceful shutdown. - The `BLEClient` wrapper has been removed, and its async bridging logic has been integrated directly into `BLEInterface`. - New tests have been added to validate the auto-reconnect functionality.
This commit addresses two primary issues with the BLE interface: 1. The application would hang on exit, requiring multiple interrupts. 2. There was no mechanism for long-running applications to handle disconnections without the entire interface shutting down. Changes: - The `BLEInterface.close()` method is now idempotent and includes a timeout for the disconnect operation to prevent hangs. - The `atexit` handler has been simplified to call `close()` directly. - A new `auto_reconnect` parameter has been added to `BLEInterface.__init__`. - If `True` (default), on disconnect, the interface will clean up the client but remain active, allowing the application to handle reconnection. - If `False`, the interface will call `close()` as before. - A new example file, `examples/reconnect_example.py`, demonstrates how to implement a client-side reconnection loop.
- Fix race condition in _receiveFromRadioImpl and _sendToRadioImpl by using local client variables - Update reconnect_example.py to include interface info in logs and re-raise exceptions - Move close() invocation off Bleak loop thread to prevent blocking disconnects - Fix import order and missing final newline in reconnect_example.py
…tency - Fix disconnect callback to properly handle BleakClient and close BLEClient wrapper - Add meshtastic.connection.status messages with connected boolean to align with example - Improve reconnect_example.py exception handling to continue loop instead of exiting - Prevent thread leaks by properly closing BLE client wrapper in background thread - Fix AttributeError in disconnect callback by using correct client.address attribute Critical fixes that ensure the BLE reconnection feature works correctly and prevents resource leaks during disconnection events.
- Fix race condition between BLEInterface.close() and _on_ble_disconnect() by checking self._closing flag to prevent concurrent BLEClient.close() calls - Pin protobuf to ^4.25.3 and types-protobuf to ^4.25.3 for compatibility - Prevent RuntimeError from concurrent access to BLEClient event loop - Ensure close() has exclusive control over shutdown process once initiated Critical fixes that prevent installation failures and runtime race conditions during BLE interface disconnection scenarios.
|
|
…rce leaks and eliminate code duplication - Add proper interface cleanup in BLEError handler to prevent resource leaks - Centralize retry logic to eliminate code duplication across exception handlers - Use try/except/finally pattern to ensure consistent cleanup - Add sys import for exception handling
- Add check to verify disconnect callback matches current client before cleanup - Use getattr with default None to safely access bleak_client attribute - Add debug logging when ignoring stale disconnect events - Prevents accidental shutdown of freshly reconnected sessions when old disconnect events fire
- Remove fragile sys.exc_info() usage outside except blocks - Leverage idempotent iface.close() to simplify cleanup logic - Remove complex conditional logic for retry determination - Make retry logic straightforward - always retry except on KeyboardInterrupt - Improve code readability and maintainability significantly
BLE Interface Fix: - Keep receive thread alive when auto_reconnect=True and client is None - Add 1-second sleep to avoid busy-waiting while waiting for reconnection - Allow same BLEInterface instance to be reused for reconnections - Only terminate receive thread when auto_reconnect=False or explicit close Test Setup Refactoring: - Replace manual sys.modules manipulation with pytest fixtures - Use monkeypatch for proper test isolation and scoping - Create separate fixtures for serial, pubsub, tabulate, bleak, and bleak.exc - Move imports to local scope to avoid import-time dependency issues - Follow pytest best practices for better test maintainability
- Fix AttributeError in __repr__ method by using self.address instead of self.client.address - Prevent duplicate connection status events when auto_reconnect=True - Improve device address matching using _sanitize_address for better BLE device lookup - Replace magic number retry delay with named constant in reconnect example - Upgrade exception handling to use logging.exception for better error tracking - Reuse original exception message in TimeoutError to avoid linting issues - Remove unused variables and trailing whitespace to pass pylint checks - Refactor test setup to use proper pytest fixtures with monkeypatch All tests pass and code quality score improved to 9.93/10.
- Fix critical UnboundLocalError risk in BLE receive loop by adding break statements - Replace lambdas with named stubs in tests to satisfy Ruff ARG005 - Remove f-string without placeholders to fix Ruff F541 - Make _sanitize_address a staticmethod to remove pylint disable comment - Simplify TimeoutError message to avoid TRY003 hint - Clean up trailing whitespace and unnecessary else clauses All tests continue to pass and code quality is improved.
- Fix auto-reconnect being broken by exception handlers that permanently shut down receive loop - When auto_reconnect=True, keep receive loop alive on disconnect by clearing client and continuing - Only tear down receive loop when auto_reconnect=False (preserve existing behavior) - Fix missing disconnect notification on manual close when auto_reconnect=True - Add _disconnect_notified flag to track one-shot disconnect notifications - Ensure manual close always emits connection status update regardless of auto_reconnect - Reset notification flag on new connection establishment - Clean up unnecessary else clauses for better pylint compliance These fixes ensure auto-reconnect works reliably and disconnect notifications are properly emitted.
- Add argparse support to reconnect_example.py for user-friendly device address input - Replace inefficient time.sleep(1) polling with threading.Event for better performance - Add _reconnected_event to BLEInterface for efficient reconnection signaling - Update receive loop to wait on event instead of fixed sleep intervals - Set event when connection is established, clear when disconnect occurs - Improves reconnection responsiveness and reduces unnecessary CPU wake-ups - Maintains backward compatibility while significantly improving efficiency Example now accepts device address as command line argument instead of hardcoded value.
- Convert auto_reconnect parameter to keyword-only argument with * - Reduces positional arguments from 6 to 5 to satisfy pylint R0917 - Maintains backward compatibility while improving code quality - Achieves perfect 10.00/10 pylint score
…bility - Add _handle_read_loop_disconnect() helper method to eliminate code duplication - Add timeout=1.0 to reconnection waits to prevent shutdown hangs - Simplify BleakDBusError and BleakError exception handlers using helper method - Improve code maintainability while preserving functionality - Address code review suggestions for cleaner exception handling
- Fix docstring in reconnect_example.py to avoid starting with "This" - Update description to clarify BLE-specific functionality - Apply consistent formatting improvements across BLE interface files - Clean up import ordering and line breaks for better readability - Standardize docstring formatting and code structure
…eters - Add detailed docstring to BLEInterface.__init__() method - Document all parameters including address, noProto, debugOut, noNodes - Provide detailed explanation of auto_reconnect parameter behavior - Explain when auto_reconnect=True vs False and how it affects application behavior - Follow existing documentation pattern from SerialInterface for consistency - Address code review feedback by making auto_reconnect behavior clear
- Move logging configuration from import-time to main() in reconnect_example.py - Replace print() statements with logger.info() calls for consistency - Use proper logger formatting with placeholders instead of f-strings - Rename unused address parameter to _address to satisfy Ruff linting - Add assertion to verify exactly one disconnect status is published - Harden disconnect callback's background close with exception logging - Narrow broad exception handling in write method for better diagnostics - Improve error handling and logging consistency across BLE interface
- Add comprehensive comment explaining the time.sleep(0.01) and should_read polling - Document that this is a workaround for device firmware notification inconsistencies - Explain the ideal communication flow vs current implementation - Provide context for why manual polling is needed for robustness - Address code review suggestion about clarifying the polling mechanism
…nism" This reverts commit 26de6b5.
Add methods to properly wait for disconnect notifications, drain publish queue, and ensure pubsub binding. Introduce is_connected method for BLEClient to reliably check connection state without relying on error messages.
- Fix misleading auto_reconnect docstring to clarify event emission behavior - Restore helpful error message for BLE write errors with permission/pairing hints - Create _register_notifications helper method to centralize notification setup - Ensure notifications are re-registered after reconnection to fix critical bug - Add pubsub binding before disconnect events to prevent rare races - Implement fallback disconnect notification in read loop for robustness - Enhance read error diagnostics with original exception message - Fix indentation and syntax issues for clean lint pass All changes maintain backward compatibility while addressing the critical notification re-registration bug that prevented proper operation after disconnect/reconnect cycles.
- Replace polling wait(timeout=1.0) with blocking wait() in _receiveFromRadioImpl to reduce CPU usage while waiting for reconnection - Add _reconnected_event.set() in close() method to wake up receive thread for clean shutdown when blocking wait is used These changes improve efficiency by eliminating unnecessary polling and ensure clean thread termination during shutdown.
Refactor BLE interface to use dynamic pubsub import and remove static binding. Tighten exception handling to specific types for better error management. Simplify error messages and remove unnecessary notification registrations to enhance shutdown reliability and reduce potential crashes during disconnection.
Move iface = None initialization inside the while loop to ensure proper resource management for each connection attempt. This prevents the finally block from incorrectly closing the previous interface instance when a new connection attempt fails. Fixes code review issue regarding confusing cleanup behavior.
Add _safe_close_client helper method to safely close BLEClient wrappers with proper exception handling. Update _handle_read_loop_disconnect to accept previous_client parameter and ensure deterministic cleanup of the client wrapper before clearing self.client. This prevents thread leaks when the read loop clears self.client before Bleak fires the disconnect callback, ensuring the BLEClient thread is properly closed instead of relying solely on the disconnect callback. Fixes critical thread leak issue in BLE disconnection handling.
- Add BLE_SCAN_TIMEOUT, RECEIVE_WAIT_TIMEOUT, EMPTY_READ_RETRY_DELAY - Add EMPTY_READ_MAX_RETRIES, SEND_PROPAGATION_DELAY, CONNECTION_TIMEOUT - Improve code readability and maintainability - Fix syntax error in retry logic indentation
- Replace incorrect response.items() usage with response.values() - Fix service UUID filtering to properly access AdvertisementData - The original code tried to access service_uuids on a tuple instead of AdvertisementData - The reviewer's suggestion was incorrect for bleak 1.1.1 API structure - Correct implementation unpacks (BLEDevice, AdvertisementData) tuples properly
- Fix race condition in close() method for _disconnect_notified flag by using atomic check-and-set under _client_lock - Add None guard for service_uuids in scan method using getattr to prevent AttributeError - Add documentation comment about executing queued publishes inline during close - Centralize long error text for exceptions (Ruff TRY003) by defining constants at module level - Add is_connected method to mock BleakClient in tests to match bleak 1.1.1 API - Clean up trailing whitespace and remove unnecessary else after continue (Ruff C0303, R1724) All changes maintain compatibility with existing functionality while addressing code review feedback.
Replace broad exception catching with specific exception types throughout the BLE interface to provide better diagnostics while maintaining robustness. Changes include: - Receive thread: Replace broad Exception with specific exceptions (AttributeError, TypeError, ValueError, RuntimeError, OSError, DecodeError) - Client close operations: Separate RuntimeError and OSError with detailed context-aware logging - Send operations: Distinguish BleakError from RuntimeError and OSError with specific error messages - Disconnect handling: Create specific exception categories for different failure scenarios - Notification flushing: Handle RuntimeError and ValueError separately with context - Publish queue draining: Separate callback execution failures from queue processing failures - Connection state checks: Handle different types of connection check failures separately Add comprehensive tests covering all exception handling improvements to ensure robustness and maintainability.
Implement code review suggestions to improve BLE interface robustness and compatibility: - Fix FROMNUM notification handling to prevent struct errors from malformed payloads - Update scan() method to handle different BleakScanner.discover return types across versions - Use BLE_SCAN_TIMEOUT constant in log message instead of hardcoded value - Add service fetching fallback in has_characteristic method for better robustness - Add get_services() method to support service discovery fallback These changes improve error handling, compatibility across Bleak versions, and overall robustness of the BLE interface.
- Remove programming error exceptions from BLE receive thread exception handling - Simplify pubsub import by replacing lazy loading with direct import - Update tests to use new pubsub import pattern and verify correct exception handling - Ensure AttributeError, TypeError, ValueError bubble up to expose bugs - Keep RuntimeError, OSError, DecodeError handling for expected external factors
…testing - Replace broad Exception catch with specific exceptions (TimeoutError, BleakError) in has_characteristic method - Improve test_receive_thread_specific_exceptions to test actual exception handling logic in _receiveFromRadioImpl thread - New integration test mocks client.read_gatt_char to raise exceptions and verifies close() method is called - Provides stronger guarantees that thread's error handling works as intended
- Fix variable capture bug in mock_close closure using default arguments - Replace bare except Exception: pass with proper exception logging - Add malformed notification counter to from_num_handler with warning threshold
- Move 'from pubsub import pub' from line 53 to line 50 with other imports - Follows PEP 8 guidelines for import organization - No circular import issues found between modules
- Add BLEInterface.BLEError to the outer exception handler in receive loop - Prevents silent thread termination when BleakError occurs during reads - Ensures proper cleanup and interface shutdown on fatal BLE read errors - Addresses high-priority code review feedback
- Combine separate RuntimeError and OSError handlers into single blocks - Reduce code duplication and improve readability in disconnect and client close handling - Maintain same functionality with unified error messages - Addresses medium-priority code review feedback
- Add context to BLEInterface.BLEError re-raising in connection handling - Change TimeoutError to BLEInterface.BLEError in async_await for consistency - Improves exception handling patterns and library consistency - Addresses medium-priority code review feedback
- Change TimeoutError to BLEInterface.BLEError in disconnect timeout handling - client.disconnect() via BLEClient.async_await raises BLEInterface.BLEError on timeout - Fixes high-priority bug where disconnect timeouts were unhandled - Ensures proper timeout warning logging during shutdown
- Replace client.discover() with client.get_services() to properly populate GATT services - Eliminates 5-10 second unnecessary delay and ensures reliable characteristic checks - Add ERROR_CONNECTION_FAILED and ERROR_ASYNC_TIMEOUT module-level constants - Update raise statements to use constants for consistency with codebase pattern - Addresses critical performance issue and medium-priority code review feedback
- Combine duplicate BleakError, RuntimeError, and OSError handlers in _sendToRadioImpl - Fix exception handling in has_characteristic to catch BLEInterface.BLEError instead of TimeoutError - Maintain specific debug logging for each exception type while reducing code duplication
- Move 'from pubsub import pub' from after local imports to after third-party imports - Maintains PEP 8 compliance: standard library, third-party, then local imports
- Extract magic number 10 into MALFORMED_NOTIFICATION_THRESHOLD constant - Extract disconnect and close client logic into _disconnect_and_close_client helper method - Change exception handling in _drain_publish_queue to catch generic Exception for better robustness
- Fix critical AttributeError by moving _disconnect_and_close_client method from BLEClient to BLEInterface class - Verify bleak 1.1.1 behavior: discover() with return_adv=True returns dict, so keep dict handling (comment was incorrect) - Simplify _sanitize_address method by removing redundant else and updating docstring to mention whitespace stripping
- Wrap _disconnect_notified flag reset with _client_lock to prevent race condition - Fix docstring formatting and style issues (D300, D401, D417, D400, D415) - Ensure atomic access to disconnect notification state
- Add try/except block in connect method to ensure client.close() on failure - Prevents accumulation of leaked event loop threads in long-running applications - Includes proper error logging with exception information for debugging
- Remove redundant bytes() constructors in struct.unpack and read_gatt_char calls - Change self._sanitize_address to BLEInterface._sanitize_address for static method clarity - Improve performance by avoiding unnecessary data copies in frequently called handlers
- Add BleakClient type hint to _on_ble_disconnect callback parameter - Change _sanitize_address docstring from single quotes to triple quotes for PEP 257 compliance - Improve code clarity and static analysis support
- Move _disconnect_notified flag reset to else clause for better state consistency - Flag now only resets on successful connection, not on connection failures - Prevents inconsistent state when connection setup fails
- Update test assertion from "Value error in deferred publish callback" to "Error in deferred publish callback" - Ensures test correctly validates the actual log message format
- Add EVENT_THREAD_JOIN_TIMEOUT constant (2.0s) for event thread join - Modify BLEClient.close() to use timeout when joining event thread - Add warning log if event thread doesn't exit within timeout - Prevents indefinite hangs during BLE client shutdown
- Replace specific exception handling with generic Exception in receive thread to prevent silent crashes - Replace specific exception handling with generic Exception in MeshInterface.close() to ensure cleanup always completes - Prevents resource leaks and ensures proper shutdown even with unexpected exceptions
- Add test_auto_reconnect_behavior() to verify auto_reconnect=True behavior - Test ensures BLEInterface.close() is NOT called when auto_reconnect=True - Verify connection status event is published with connected=False - Confirm internal client is cleaned up properly - Ensure receive thread remains alive for reconnection - Add mock_publishing_thread fixture to properly mock publishingThread.queueWork - Enhance DummyClient with bleak_client attribute for proper testing - Fix test isolation to prevent shared state between test runs
Author
|
This is a WIP. I wasn't aware I opened a PR upstream. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
Changes
Testing