Skip to content

Conversation

@lawrence-forooghian
Copy link
Collaborator

@lawrence-forooghian lawrence-forooghian commented Oct 31, 2025

This contains two speculative fixes for a customer-reported crash (internal discussion) resulting from us dispatching to a nil queue.

They are speculative in that I was able to cause a crash in the area of code that our customer's crash reports indicated, but probably not via the same sequence of events that caused the crash for their users (I caused a crash by activating push notifications, but the customer is not using push notifications; it is however possible that their crash comes from the -setLocalDeviceClientId_nosync: method — which invokes the -getActivationMachine method too — instead, but their stack trace was not able to confirm this and I couldn't think of a simple way to try reproducing.)

Summary by CodeRabbit

  • New Features

    • Added public push registration lifecycle methods and a device access queue API
    • Added a non-synchronized variant for event unsubscription
  • Bug Fixes

    • Added nil-queue validation for dispatch operations to prevent crashes
  • Documentation

    • Updated contributing guidelines with dispatch wrapper standards
  • Refactor

    • Replaced direct dispatch calls across the codebase with safety-checked dispatch wrappers; adjusted related nullability and callback handling
  • API

    • Minor presence-related constructor/signature updates for presence-sync events

@coderabbitai
Copy link

coderabbitai bot commented Oct 31, 2025

Walkthrough

A systematic replacement of direct GCD calls with art_dispatch_sync/art_dispatch_async wrappers across the codebase, plus the ARTGCD wrapper implementation, small nullability and nil-guard adjustments, ARTPush API/implementation restructuring for activation state machine handling, and a new off_nosync method in ARTEventEmitter.

Changes

Cohort / File(s) Summary
ARTGCD Wrapper Implementation
Source/PrivateHeaders/Ably/ARTGCD.h, Source/ARTGCD.m
Add art_dispatch_sync and art_dispatch_async that validate queue non-nil and delegate to GCD; header gains nullability scope.
Global Dispatch Replacements
Source/ARTAuth.m, Source/ARTRealtime.m, Source/ARTRest.m, Source/ARTChannel.m, Source/ARTChannels.m, Source/ARTConnection.m, Source/ARTLog.m
Replace dispatch_sync/dispatch_async with art_dispatch_sync/art_dispatch_async; add ARTGCD.h imports where needed. No public API signature changes (except ARTRest device queue addition noted below).
Realtime & Channels
Source/ARTRealtimeChannel.m, Source/ARTRealtimeChannels.m, Source/ARTRealtimePresence.m, Source/ARTRealtimeAnnotations.m, Source/ARTRealtimeChannel.m
Replace GCD calls with art wrappers across channel, presence, annotation flows; presence event constructor changed from -initWithPresenceSyncState: to +newWithPresenceSyncState: and usages updated.
EventEmitter & Logging
Source/ARTEventEmitter.m, Source/ARTLog.m
Replace dispatch calls with art wrappers; add public - (void)off_nosync in ARTEventEmitter.
Push Subsystem Restructure & API Additions
Source/ARTPush.m, Source/PrivateHeaders/Ably/ARTPush+Private.h
ARTPush: add multiple public class methods for registration/failure (remote/location × rest/realtime), remove queue property from ARTPushInternal, add private _queue ivar initialized from rest.queue, change getActivationMachine: callback to accept nullable machine and add extended-lifetime/nil-guard logic. Removed deprecated initializer declaration.
Push Components (channels, admin, subscriptions, device regs)
Source/ARTPushAdmin.m, Source/ARTPushChannel.m, Source/ARTPushChannelSubscriptions.m, Source/ARTPushDeviceRegistrations.m, Source/ARTPushActivationStateMachine.m
Replace dispatch calls with art wrappers; adjust userQueue/main queue callback dispatches to use art wrappers; minor nullability annotations applied in activation callbacks.
REST Layer: Channels, Presence, Annotations, HTTP pagination
Source/ARTRestChannel.m, Source/ARTRestPresence.m, Source/ARTRestAnnotations.m, Source/ARTHTTPPaginatedResponse.m, Source/ARTPaginatedResult.m
Replace dispatch calls with art wrappers in history, presence, annotations, and HTTP pagination flows; added ARTGCD.h imports.
Reachability, URLSession & WebSocket
Source/ARTOSReachability.m, Source/ARTURLSessionServerTrust.m, Source/ARTWebSocketTransport.m
Swap dispatch_async for art_dispatch_async in callbacks and task completions.
Queued Dealloc & Misc
Source/ARTQueuedDealloc.m, Source/ARTPaged...*
Replace dispatch_async with art_dispatch_async where used; add ARTGCD.h imports.
ARTRest API Additions
Source/ARTRest.m
Add + (dispatch_queue_t)deviceAccessQueue public method; replace GCD calls with art wrappers; adjust activation callback nullability in paths that use ARTPush activation.
Private Header Updates
Source/PrivateHeaders/Ably/ARTGCD.h, Source/PrivateHeaders/Ably/ARTPush+Private.h
Add art dispatch declarations; remove ARTPushInternal queue property and a deprecated initializer; update getActivationMachine: signature to accept nullable parameter.
Tests
Test/AblyTests/Tests/GCDTests.swift
Added explicit local type annotation ARTScheduledBlockHandle? to scheduledBlock variable; no behavioral change.
Documentation
CONTRIBUTING.md
Add "Coding standards" section documenting use of art_dispatch_sync/art_dispatch_async instead of direct GCD to improve debuggability on nil queue submissions.

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant ARTRest as ARTRest
    participant ARTPush as ARTPush
    participant PushInt as ARTPushInternal
    participant MainQ as Main Queue
    participant InternalQ as ARTPushInternal._queue

    App->>ARTPush: +didRegisterForRemoteNotificationsWithDeviceToken:rest:
    ARTPush->>PushInt: forward registration (internal handler)
    PushInt->>ARTRest: ensure rest exists (extended lifetime)
    PushInt->>MainQ: art_dispatch_async(main)
    MainQ->>InternalQ: art_dispatch_async(_queue)
    InternalQ->>PushInt: createActivationStateMachine (may be nil)
    PushInt-->>ARTPush: callback with ARTPushActivationStateMachine (_Nullable)
    Note over PushInt,InternalQ: nil-guard + extended lifetime ensure safe creation
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~40 minutes

  • Repetitive, mechanical replacements (dispatch → art_dispatch) lower per-file reasoning cost.
  • Areas requiring deeper review:
    • ARTPush activation flow: extended lifetime handling, _queue ownership, nullable callback propagation.
    • ARTGCD behavior: exception-on-nil design (NSInvalidArgumentException) — ensure callers and tests tolerate/expect this.
    • Presence API change: initWithPresenceSyncState:newWithPresenceSyncState: — confirm all call sites updated.
    • Confirm no leftover direct dispatch_* calls remain in modified files.

Poem

🐰 I hopped through queues both near and far,

Replaced raw dispatch with an artful spar.
Nil-guards now keep crashes light,
Queues in order, debug delight —
A rabbit’s patch to make things right. 🎀

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'Fix crashes from dispatching to a nil queue' directly and clearly summarizes the main objective of the changeset. The PR is fundamentally about replacing direct GCD dispatch calls with wrapper functions (art_dispatch_sync and art_dispatch_async) that validate queue arguments and throw exceptions if a nil queue is passed, thereby preventing crashes. This is reflected throughout the 30+ modified files where dispatch_sync/dispatch_async are replaced with these safer wrappers, and is documented in the new CONTRIBUTING.md guidelines section.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch 2025-10-31-createActivationStateMachine-crash

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@lawrence-forooghian lawrence-forooghian force-pushed the 2025-10-31-createActivationStateMachine-crash branch from e126916 to 72b51c4 Compare October 31, 2025 20:04
@github-actions github-actions bot temporarily deployed to staging/pull/2138/features October 31, 2025 20:05 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/2138/jazzydoc October 31, 2025 20:09 Inactive
We've had a customer report a crash that appears to be due to us trying
to submit a block to a nil queue, which results in an EXC_BAD_ACCESS
that's not very easy to interpret. Throw an exception with a helpful
message in this case instead, to help with debugging in the future.
This is consistent with the approach used elsewhere in the codebase and
prevents accidentally dispatching to a `nil` queue (which causes a
crash) in the case where our weakly-referenced _rest gets deallocated.
This is a partial speculative fix for an infrequent crash reported by a
customer in [1].

[1] https://ably-real-time.slack.com/archives/C09BK2DB6H0/p1761923714781609
@lawrence-forooghian lawrence-forooghian force-pushed the 2025-10-31-createActivationStateMachine-crash branch from 72b51c4 to e67b08b Compare October 31, 2025 20:13
@github-actions github-actions bot temporarily deployed to staging/pull/2138/features October 31, 2025 20:14 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/2138/jazzydoc October 31, 2025 20:17 Inactive
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4e23a51 and e67b08b.

📒 Files selected for processing (32)
  • CONTRIBUTING.md (1 hunks)
  • Source/ARTAuth.m (8 hunks)
  • Source/ARTChannel.m (3 hunks)
  • Source/ARTChannels.m (5 hunks)
  • Source/ARTConnection.m (3 hunks)
  • Source/ARTEventEmitter.m (5 hunks)
  • Source/ARTGCD.m (1 hunks)
  • Source/ARTHTTPPaginatedResponse.m (3 hunks)
  • Source/ARTLog.m (2 hunks)
  • Source/ARTOSReachability.m (2 hunks)
  • Source/ARTPaginatedResult.m (3 hunks)
  • Source/ARTPush.m (4 hunks)
  • Source/ARTPushActivationStateMachine.m (12 hunks)
  • Source/ARTPushAdmin.m (2 hunks)
  • Source/ARTPushChannel.m (6 hunks)
  • Source/ARTPushChannelSubscriptions.m (7 hunks)
  • Source/ARTPushDeviceRegistrations.m (9 hunks)
  • Source/ARTQueuedDealloc.m (2 hunks)
  • Source/ARTRealtime.m (4 hunks)
  • Source/ARTRealtimeAnnotations.m (6 hunks)
  • Source/ARTRealtimeChannel.m (11 hunks)
  • Source/ARTRealtimeChannels.m (2 hunks)
  • Source/ARTRealtimePresence.m (15 hunks)
  • Source/ARTRest.m (8 hunks)
  • Source/ARTRestAnnotations.m (5 hunks)
  • Source/ARTRestChannel.m (4 hunks)
  • Source/ARTRestPresence.m (5 hunks)
  • Source/ARTURLSessionServerTrust.m (2 hunks)
  • Source/ARTWebSocketTransport.m (1 hunks)
  • Source/PrivateHeaders/Ably/ARTGCD.h (2 hunks)
  • Source/PrivateHeaders/Ably/ARTPush+Private.h (0 hunks)
  • Test/AblyTests/Tests/GCDTests.swift (1 hunks)
💤 Files with no reviewable changes (1)
  • Source/PrivateHeaders/Ably/ARTPush+Private.h
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: check (tvOS, test_tvOS18_4)
  • GitHub Check: check (iOS, test_iOS18_4)
  • GitHub Check: check (macOS, test_macOS)
🔇 Additional comments (25)
Source/ARTURLSessionServerTrust.m (1)

2-2: LGTM! GCD wrapper migration improves nil-queue safety.

The import and wrapper usage align with the PR's objective to prevent crashes from nil queue dispatches.

Also applies to: 41-43

Source/ARTQueuedDealloc.m (1)

2-2: LGTM! Dealloc dispatch now protected against nil queue.

The wrapper ensures safe cleanup even if the queue is unexpectedly nil.

Also applies to: 20-22

Source/ARTOSReachability.m (1)

11-11: LGTM! Reachability callback dispatch now validated.

The wrapper adds safety to the callback path while maintaining the weak-strong pattern.

Also applies to: 49-53

Source/ARTRestChannel.m (1)

23-23: LGTM! Comprehensive GCD wrapper migration across all channel operations.

All callback and processing paths are now protected against nil queues, covering history, status, and message publishing operations.

Also applies to: 170-173, 177-226, 234-288, 295-387

Source/ARTPushAdmin.m (1)

9-9: LGTM! Push admin publish path now protected.

The wrapper migration covers both the callback handler and main publish logic.

Also applies to: 61-95

Source/ARTWebSocketTransport.m (1)

208-210: LGTM! WebSocket opening dispatch now validated.

The wrapper protects the critical websocket open operation on the dedicated _websocketOpenQueue.

Source/ARTChannel.m (1)

13-13: LGTM! Channel options accessors now protected.

The synchronous wrappers ensure thread-safe access to options while validating the queue.

Also applies to: 39-43, 50-53

Source/ARTConnection.m (1)

10-10: LGTM! All connection property accessors now protected.

The synchronous wrappers ensure thread-safe access to connection state (id, key, state, errorReason, recoveryKey) while validating the queue.

Also applies to: 137-299

Test/AblyTests/Tests/GCDTests.swift (1)

15-15: LGTM: Type annotation improves code clarity.

The explicit type annotation makes the variable type clear without changing behavior.

Source/ARTLog.m (1)

3-3: LGTM: GCD wrapper adoption improves nil-queue safety.

The replacement of dispatch_sync with art_dispatch_sync adds validation to prevent crashes from nil queues while preserving the existing logging behavior.

Also applies to: 95-95

CONTRIBUTING.md (1)

70-73: LGTM: Clear documentation of the new coding standard.

The addition of the coding standards section effectively communicates the requirement to use art_dispatch_sync and art_dispatch_async wrappers, along with the rationale for improved debuggability.

Source/ARTRestAnnotations.m (1)

19-19: LGTM: Consistent adoption of GCD wrappers throughout.

All dispatch_async calls have been systematically replaced with art_dispatch_async, maintaining the existing callback and queue behavior while adding nil-queue validation.

Also applies to: 124-124, 158-158, 233-233, 247-247

Source/ARTHTTPPaginatedResponse.m (1)

13-13: LGTM: Pagination callbacks now use validated dispatch wrappers.

The replacement of dispatch_async with art_dispatch_async in both pagination methods adds nil-queue protection to user callback invocations.

Also applies to: 59-59, 72-72

Source/ARTPaginatedResult.m (1)

10-10: LGTM: Consistent pagination callback protection.

The art_dispatch_async wrapper usage in both first: and next: methods ensures nil-queue validation for user callbacks.

Also applies to: 116-116, 131-131

Source/ARTRealtimeAnnotations.m (1)

19-19: LGTM: Comprehensive GCD wrapper adoption across all dispatch points.

All dispatch_async and dispatch_sync calls have been replaced with their art_dispatch_* counterparts throughout the file, covering user callbacks, queue synchronization, and subscription management. The changes preserve existing threading semantics while adding nil-queue validation.

Also applies to: 116-116, 143-143, 206-206, 214-214, 278-278, 289-289, 296-296

Source/ARTChannels.m (1)

7-7: LGTM: All synchronous dispatch calls now use validated wrappers.

The systematic replacement of dispatch_sync with art_dispatch_sync across channel management methods adds nil-queue protection while maintaining the existing synchronous execution semantics.

Also applies to: 30-30, 46-46, 65-65, 76-76

Source/ARTPushChannelSubscriptions.m (1)

12-12: LGTM! Systematic replacement with defensive wrappers.

The replacement of dispatch_async with art_dispatch_async throughout this file adds nil-queue validation without changing behavior. This defensive approach will help catch the nil queue crashes reported in the PR early with a clear exception message.

Also applies to: 70-70, 82-82, 117-117, 123-123, 139-139, 145-145, 162-162, 168-168, 188-188, 194-194

Source/ARTRealtimeChannels.m (1)

9-9: LGTM! Critical path now protected.

The replacement in the channel release flow adds important nil-queue validation at critical synchronization points. The changes preserve existing behavior while adding defensive checks.

Also applies to: 105-105, 111-111

Source/ARTRestPresence.m (1)

13-13: LGTM! Presence operations protected.

Consistent application of the art_dispatch_async wrapper for presence query and history operations. The defensive nil-queue checks are appropriate for these async callback paths.

Also applies to: 128-128, 159-159, 174-174, 221-221

Source/ARTGCD.m (1)

73-87: Core implementation looks solid.

These wrapper functions provide the nil-queue validation that addresses the crash reported in the PR. The implementation is straightforward:

  • Clear validation with if (!queue) check
  • Appropriate exception type (NSInvalidArgumentException)
  • Descriptive error messages identifying the function
  • Minimal overhead (just a nil-check before delegating to system functions)

This defensive approach will convert silent crashes into clear, debuggable exceptions.

Source/ARTPushDeviceRegistrations.m (1)

13-13: LGTM! Push device operations protected.

Systematic application of art_dispatch_async throughout device registration operations. Given that the PR mentions customer crashes potentially related to push notifications, these protections are particularly relevant.

Also applies to: 71-71, 83-83, 125-125, 137-137, 177-177, 183-183, 200-200, 206-206, 238-238, 250-250

Source/ARTPushActivationStateMachine.m (1)

17-17: Critical area now protected.

The push activation state machine is protected with the new wrappers throughout. This is particularly important since the PR description indicates the author reproduced a crash in this area when activating push notifications. The use of dispatch_get_main_queue() at line 424 is safe since system queues cannot be nil.

Also applies to: 83-83, 91-91, 103-103, 114-114, 173-173, 236-236, 285-285, 344-344, 389-389, 400-400, 411-411, 424-424

Source/ARTEventEmitter.m (2)

88-88: LGTM! Event emitter dispatch paths protected.

Consistent application of art_dispatch_async and art_dispatch_sync wrappers throughout the event emitter callback and synchronization paths. These are critical paths for event delivery where nil-queue validation is essential.

Also applies to: 345-345, 362-362, 379-379, 396-396, 352-352, 369-369, 386-386, 403-403, 410-410, 416-416, 422-422


427-429: Verification confirms the implementation is correct.

The single caller at ARTRealtimeChannels.m:119 executes off_nosync within art_dispatch_sync(_queue, ^{ ... }), where _queue is set to realtime.rest.queue—the same queue used by ARTRealtimeChannel. All callback paths from _detach: execute on this queue (either immediately or via _detachedEventEmitter, which is initialized with the channel's queue). The synchronization requirement is properly satisfied.

Source/ARTRealtime.m (1)

72-72: LGTM! Connection lifecycle protected.

The critical connection lifecycle methods (internalAsync, internalSync, connect, close, ping) now use the nil-queue validation wrappers. This is essential for the realtime connection management flow where queue initialization issues could lead to the crashes described in the PR.

Also applies to: 78-78, 455-455, 473-473, 525-525, 531-531

This solves a potential crash similar to that described in 380c82a, this
time resulting from potentially creating a push activation state machine
with a nil `rest` and thus a nil queue, causing a crash in -[ARTAuth
setLocalDeviceClientId_nosync]'s call to -sendEvent:.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (1)
Source/ARTPush.m (1)

139-147: Guard against nil legacy delegate and ensure protocol conformance.

Line 141 fetches UIApplication.sharedApplication.delegate but doesn't verify it's non-nil or conforms to ARTPushRegistererDelegate before passing it to createActivationStateMachineWithDelegate: on line 146. If the delegate is nil or doesn't implement the protocol, the activation state machine may be created in an invalid state or fail unexpectedly.

Apply this diff to properly type-check and guard against nil:

             art_dispatch_async(dispatch_get_main_queue(), ^{
                 // -[UIApplication delegate] is an UI API call, so needs to be called from main thread.
-                const id legacyDelegate = UIApplication.sharedApplication.delegate;
+                const id<ARTPushRegistererDelegate, NSObject> legacyDelegate = (id<ARTPushRegistererDelegate, NSObject>)UIApplication.sharedApplication.delegate;
 
                 // After this dispatch to the main queue, I believe there is no longer any mechanism guaranteed to be keeping _rest alive, hence our extendedLifetimeRest variable, which keeps _rest alive for long enough to ensure that when createActivationStateMachineWithDelegate creates the state machine, it passes it a non-nil _rest, as its initializer's contract requires.
 
                 art_dispatch_async(self->_queue, ^{
-                    callbackWithUnlock([self createActivationStateMachineWithDelegate:legacyDelegate]);
+                    callbackWithUnlock(legacyDelegate ? [self createActivationStateMachineWithDelegate:legacyDelegate] : nil);
                     extendedLifetimeRest = nil;
                 });
             });
🧹 Nitpick comments (1)
Source/ARTRest.m (1)

738-740: Fix indentation.

The art_dispatch_async call is not properly indented, breaking code formatting consistency.

Apply this diff to fix the indentation:

-    
-art_dispatch_async(_queue, ^{
-    [ARTPaginatedResult executePaginated:self withRequest:request andResponseProcessor:responseProcessor wrapperSDKAgents:wrapperSDKAgents logger:self.logger callback:callback];
-});
+    
+    art_dispatch_async(_queue, ^{
+        [ARTPaginatedResult executePaginated:self withRequest:request andResponseProcessor:responseProcessor wrapperSDKAgents:wrapperSDKAgents logger:self.logger callback:callback];
+    });
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e67b08b and 09eb2aa.

📒 Files selected for processing (4)
  • Source/ARTAuth.m (9 hunks)
  • Source/ARTPush.m (8 hunks)
  • Source/ARTRest.m (9 hunks)
  • Source/PrivateHeaders/Ably/ARTPush+Private.h (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • Source/ARTAuth.m
  • Source/PrivateHeaders/Ably/ARTPush+Private.h
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: check
  • GitHub Check: check (tvOS, test_tvOS18_4)
  • GitHub Check: check (macOS, test_macOS)
  • GitHub Check: check (iOS, test_iOS18_4)
  • GitHub Check: check
  • GitHub Check: build

Copy link
Contributor

@umair-ably umair-ably left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@maratal maratal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lawrence-forooghian lawrence-forooghian merged commit 30490f2 into main Nov 3, 2025
10 of 12 checks passed
@lawrence-forooghian lawrence-forooghian deleted the 2025-10-31-createActivationStateMachine-crash branch November 3, 2025 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

4 participants