-
Notifications
You must be signed in to change notification settings - Fork 30
Fix crashes from dispatching to a nil queue
#2138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix crashes from dispatching to a nil queue
#2138
Conversation
WalkthroughA systematic replacement of direct GCD calls with Changes
Sequence Diagram(s)sequenceDiagram
participant App as Application
participant ARTRest as ARTRest
participant ARTPush as ARTPush
participant PushInt as ARTPushInternal
participant MainQ as Main Queue
participant InternalQ as ARTPushInternal._queue
App->>ARTPush: +didRegisterForRemoteNotificationsWithDeviceToken:rest:
ARTPush->>PushInt: forward registration (internal handler)
PushInt->>ARTRest: ensure rest exists (extended lifetime)
PushInt->>MainQ: art_dispatch_async(main)
MainQ->>InternalQ: art_dispatch_async(_queue)
InternalQ->>PushInt: createActivationStateMachine (may be nil)
PushInt-->>ARTPush: callback with ARTPushActivationStateMachine (_Nullable)
Note over PushInt,InternalQ: nil-guard + extended lifetime ensure safe creation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~40 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
e126916 to
72b51c4
Compare
We've had a customer report a crash that appears to be due to us trying to submit a block to a nil queue, which results in an EXC_BAD_ACCESS that's not very easy to interpret. Throw an exception with a helpful message in this case instead, to help with debugging in the future.
This is consistent with the approach used elsewhere in the codebase and prevents accidentally dispatching to a `nil` queue (which causes a crash) in the case where our weakly-referenced _rest gets deallocated. This is a partial speculative fix for an infrequent crash reported by a customer in [1]. [1] https://ably-real-time.slack.com/archives/C09BK2DB6H0/p1761923714781609
72b51c4 to
e67b08b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Jira integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (32)
CONTRIBUTING.md(1 hunks)Source/ARTAuth.m(8 hunks)Source/ARTChannel.m(3 hunks)Source/ARTChannels.m(5 hunks)Source/ARTConnection.m(3 hunks)Source/ARTEventEmitter.m(5 hunks)Source/ARTGCD.m(1 hunks)Source/ARTHTTPPaginatedResponse.m(3 hunks)Source/ARTLog.m(2 hunks)Source/ARTOSReachability.m(2 hunks)Source/ARTPaginatedResult.m(3 hunks)Source/ARTPush.m(4 hunks)Source/ARTPushActivationStateMachine.m(12 hunks)Source/ARTPushAdmin.m(2 hunks)Source/ARTPushChannel.m(6 hunks)Source/ARTPushChannelSubscriptions.m(7 hunks)Source/ARTPushDeviceRegistrations.m(9 hunks)Source/ARTQueuedDealloc.m(2 hunks)Source/ARTRealtime.m(4 hunks)Source/ARTRealtimeAnnotations.m(6 hunks)Source/ARTRealtimeChannel.m(11 hunks)Source/ARTRealtimeChannels.m(2 hunks)Source/ARTRealtimePresence.m(15 hunks)Source/ARTRest.m(8 hunks)Source/ARTRestAnnotations.m(5 hunks)Source/ARTRestChannel.m(4 hunks)Source/ARTRestPresence.m(5 hunks)Source/ARTURLSessionServerTrust.m(2 hunks)Source/ARTWebSocketTransport.m(1 hunks)Source/PrivateHeaders/Ably/ARTGCD.h(2 hunks)Source/PrivateHeaders/Ably/ARTPush+Private.h(0 hunks)Test/AblyTests/Tests/GCDTests.swift(1 hunks)
💤 Files with no reviewable changes (1)
- Source/PrivateHeaders/Ably/ARTPush+Private.h
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: check (tvOS, test_tvOS18_4)
- GitHub Check: check (iOS, test_iOS18_4)
- GitHub Check: check (macOS, test_macOS)
🔇 Additional comments (25)
Source/ARTURLSessionServerTrust.m (1)
2-2: LGTM! GCD wrapper migration improves nil-queue safety.The import and wrapper usage align with the PR's objective to prevent crashes from nil queue dispatches.
Also applies to: 41-43
Source/ARTQueuedDealloc.m (1)
2-2: LGTM! Dealloc dispatch now protected against nil queue.The wrapper ensures safe cleanup even if the queue is unexpectedly nil.
Also applies to: 20-22
Source/ARTOSReachability.m (1)
11-11: LGTM! Reachability callback dispatch now validated.The wrapper adds safety to the callback path while maintaining the weak-strong pattern.
Also applies to: 49-53
Source/ARTRestChannel.m (1)
23-23: LGTM! Comprehensive GCD wrapper migration across all channel operations.All callback and processing paths are now protected against nil queues, covering history, status, and message publishing operations.
Also applies to: 170-173, 177-226, 234-288, 295-387
Source/ARTPushAdmin.m (1)
9-9: LGTM! Push admin publish path now protected.The wrapper migration covers both the callback handler and main publish logic.
Also applies to: 61-95
Source/ARTWebSocketTransport.m (1)
208-210: LGTM! WebSocket opening dispatch now validated.The wrapper protects the critical websocket open operation on the dedicated
_websocketOpenQueue.Source/ARTChannel.m (1)
13-13: LGTM! Channel options accessors now protected.The synchronous wrappers ensure thread-safe access to options while validating the queue.
Also applies to: 39-43, 50-53
Source/ARTConnection.m (1)
10-10: LGTM! All connection property accessors now protected.The synchronous wrappers ensure thread-safe access to connection state (id, key, state, errorReason, recoveryKey) while validating the queue.
Also applies to: 137-299
Test/AblyTests/Tests/GCDTests.swift (1)
15-15: LGTM: Type annotation improves code clarity.The explicit type annotation makes the variable type clear without changing behavior.
Source/ARTLog.m (1)
3-3: LGTM: GCD wrapper adoption improves nil-queue safety.The replacement of
dispatch_syncwithart_dispatch_syncadds validation to prevent crashes from nil queues while preserving the existing logging behavior.Also applies to: 95-95
CONTRIBUTING.md (1)
70-73: LGTM: Clear documentation of the new coding standard.The addition of the coding standards section effectively communicates the requirement to use
art_dispatch_syncandart_dispatch_asyncwrappers, along with the rationale for improved debuggability.Source/ARTRestAnnotations.m (1)
19-19: LGTM: Consistent adoption of GCD wrappers throughout.All
dispatch_asynccalls have been systematically replaced withart_dispatch_async, maintaining the existing callback and queue behavior while adding nil-queue validation.Also applies to: 124-124, 158-158, 233-233, 247-247
Source/ARTHTTPPaginatedResponse.m (1)
13-13: LGTM: Pagination callbacks now use validated dispatch wrappers.The replacement of
dispatch_asyncwithart_dispatch_asyncin both pagination methods adds nil-queue protection to user callback invocations.Also applies to: 59-59, 72-72
Source/ARTPaginatedResult.m (1)
10-10: LGTM: Consistent pagination callback protection.The
art_dispatch_asyncwrapper usage in bothfirst:andnext:methods ensures nil-queue validation for user callbacks.Also applies to: 116-116, 131-131
Source/ARTRealtimeAnnotations.m (1)
19-19: LGTM: Comprehensive GCD wrapper adoption across all dispatch points.All
dispatch_asyncanddispatch_synccalls have been replaced with theirart_dispatch_*counterparts throughout the file, covering user callbacks, queue synchronization, and subscription management. The changes preserve existing threading semantics while adding nil-queue validation.Also applies to: 116-116, 143-143, 206-206, 214-214, 278-278, 289-289, 296-296
Source/ARTChannels.m (1)
7-7: LGTM: All synchronous dispatch calls now use validated wrappers.The systematic replacement of
dispatch_syncwithart_dispatch_syncacross channel management methods adds nil-queue protection while maintaining the existing synchronous execution semantics.Also applies to: 30-30, 46-46, 65-65, 76-76
Source/ARTPushChannelSubscriptions.m (1)
12-12: LGTM! Systematic replacement with defensive wrappers.The replacement of
dispatch_asyncwithart_dispatch_asyncthroughout this file adds nil-queue validation without changing behavior. This defensive approach will help catch the nil queue crashes reported in the PR early with a clear exception message.Also applies to: 70-70, 82-82, 117-117, 123-123, 139-139, 145-145, 162-162, 168-168, 188-188, 194-194
Source/ARTRealtimeChannels.m (1)
9-9: LGTM! Critical path now protected.The replacement in the channel release flow adds important nil-queue validation at critical synchronization points. The changes preserve existing behavior while adding defensive checks.
Also applies to: 105-105, 111-111
Source/ARTRestPresence.m (1)
13-13: LGTM! Presence operations protected.Consistent application of the art_dispatch_async wrapper for presence query and history operations. The defensive nil-queue checks are appropriate for these async callback paths.
Also applies to: 128-128, 159-159, 174-174, 221-221
Source/ARTGCD.m (1)
73-87: Core implementation looks solid.These wrapper functions provide the nil-queue validation that addresses the crash reported in the PR. The implementation is straightforward:
- Clear validation with
if (!queue)check- Appropriate exception type (
NSInvalidArgumentException)- Descriptive error messages identifying the function
- Minimal overhead (just a nil-check before delegating to system functions)
This defensive approach will convert silent crashes into clear, debuggable exceptions.
Source/ARTPushDeviceRegistrations.m (1)
13-13: LGTM! Push device operations protected.Systematic application of art_dispatch_async throughout device registration operations. Given that the PR mentions customer crashes potentially related to push notifications, these protections are particularly relevant.
Also applies to: 71-71, 83-83, 125-125, 137-137, 177-177, 183-183, 200-200, 206-206, 238-238, 250-250
Source/ARTPushActivationStateMachine.m (1)
17-17: Critical area now protected.The push activation state machine is protected with the new wrappers throughout. This is particularly important since the PR description indicates the author reproduced a crash in this area when activating push notifications. The use of
dispatch_get_main_queue()at line 424 is safe since system queues cannot be nil.Also applies to: 83-83, 91-91, 103-103, 114-114, 173-173, 236-236, 285-285, 344-344, 389-389, 400-400, 411-411, 424-424
Source/ARTEventEmitter.m (2)
88-88: LGTM! Event emitter dispatch paths protected.Consistent application of art_dispatch_async and art_dispatch_sync wrappers throughout the event emitter callback and synchronization paths. These are critical paths for event delivery where nil-queue validation is essential.
Also applies to: 345-345, 362-362, 379-379, 396-396, 352-352, 369-369, 386-386, 403-403, 410-410, 416-416, 422-422
427-429: Verification confirms the implementation is correct.The single caller at ARTRealtimeChannels.m:119 executes
off_nosyncwithinart_dispatch_sync(_queue, ^{ ... }), where_queueis set torealtime.rest.queue—the same queue used by ARTRealtimeChannel. All callback paths from_detach:execute on this queue (either immediately or via_detachedEventEmitter, which is initialized with the channel's queue). The synchronization requirement is properly satisfied.Source/ARTRealtime.m (1)
72-72: LGTM! Connection lifecycle protected.The critical connection lifecycle methods (
internalAsync,internalSync,connect,close,ping) now use the nil-queue validation wrappers. This is essential for the realtime connection management flow where queue initialization issues could lead to the crashes described in the PR.Also applies to: 78-78, 455-455, 473-473, 525-525, 531-531
This solves a potential crash similar to that described in 380c82a, this time resulting from potentially creating a push activation state machine with a nil `rest` and thus a nil queue, causing a crash in -[ARTAuth setLocalDeviceClientId_nosync]'s call to -sendEvent:.
e67b08b to
09eb2aa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
♻️ Duplicate comments (1)
Source/ARTPush.m (1)
139-147: Guard against nil legacy delegate and ensure protocol conformance.Line 141 fetches
UIApplication.sharedApplication.delegatebut doesn't verify it's non-nil or conforms toARTPushRegistererDelegatebefore passing it tocreateActivationStateMachineWithDelegate:on line 146. If the delegate is nil or doesn't implement the protocol, the activation state machine may be created in an invalid state or fail unexpectedly.Apply this diff to properly type-check and guard against nil:
art_dispatch_async(dispatch_get_main_queue(), ^{ // -[UIApplication delegate] is an UI API call, so needs to be called from main thread. - const id legacyDelegate = UIApplication.sharedApplication.delegate; + const id<ARTPushRegistererDelegate, NSObject> legacyDelegate = (id<ARTPushRegistererDelegate, NSObject>)UIApplication.sharedApplication.delegate; // After this dispatch to the main queue, I believe there is no longer any mechanism guaranteed to be keeping _rest alive, hence our extendedLifetimeRest variable, which keeps _rest alive for long enough to ensure that when createActivationStateMachineWithDelegate creates the state machine, it passes it a non-nil _rest, as its initializer's contract requires. art_dispatch_async(self->_queue, ^{ - callbackWithUnlock([self createActivationStateMachineWithDelegate:legacyDelegate]); + callbackWithUnlock(legacyDelegate ? [self createActivationStateMachineWithDelegate:legacyDelegate] : nil); extendedLifetimeRest = nil; }); });
🧹 Nitpick comments (1)
Source/ARTRest.m (1)
738-740: Fix indentation.The
art_dispatch_asynccall is not properly indented, breaking code formatting consistency.Apply this diff to fix the indentation:
- -art_dispatch_async(_queue, ^{ - [ARTPaginatedResult executePaginated:self withRequest:request andResponseProcessor:responseProcessor wrapperSDKAgents:wrapperSDKAgents logger:self.logger callback:callback]; -}); + + art_dispatch_async(_queue, ^{ + [ARTPaginatedResult executePaginated:self withRequest:request andResponseProcessor:responseProcessor wrapperSDKAgents:wrapperSDKAgents logger:self.logger callback:callback]; + });
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Jira integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
Source/ARTAuth.m(9 hunks)Source/ARTPush.m(8 hunks)Source/ARTRest.m(9 hunks)Source/PrivateHeaders/Ably/ARTPush+Private.h(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- Source/ARTAuth.m
- Source/PrivateHeaders/Ably/ARTPush+Private.h
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: check
- GitHub Check: check (tvOS, test_tvOS18_4)
- GitHub Check: check (macOS, test_macOS)
- GitHub Check: check (iOS, test_iOS18_4)
- GitHub Check: check
- GitHub Check: build
umair-ably
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
maratal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This contains two speculative fixes for a customer-reported crash (internal discussion) resulting from us dispatching to a
nilqueue.They are speculative in that I was able to cause a crash in the area of code that our customer's crash reports indicated, but probably not via the same sequence of events that caused the crash for their users (I caused a crash by activating push notifications, but the customer is not using push notifications; it is however possible that their crash comes from the
-setLocalDeviceClientId_nosync:method — which invokes the-getActivationMachinemethod too — instead, but their stack trace was not able to confirm this and I couldn't think of a simple way to try reproducing.)Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Refactor
API