
Closes #508 - chores: implemented webhooks for onchain events #776

Closed
KodeSage wants to merge 5 commits into SolFoundry:main from KodeSage:feat/webhooks_onchain_events

Conversation

@KodeSage
Contributor

Description

This pull request extends the contributor outbound webhook system so subscribers can receive on-chain–sourced events (escrow, reputation, staking) in addition to existing bounty lifecycle events.

  • New webhook event types: escrow.locked, escrow.released, reputation.updated, stake.deposited, stake.withdrawn, plus webhook.test for integration checks.
  • Indexer integration: POST /api/webhooks/internal/chain-events accepts normalized payloads from Helius, Shyft, or any worker, authenticated via X-Chain-Indexer-Key / CHAIN_WEBHOOK_INDEXER_SECRET. Optional notify_user_id limits delivery to one user’s registered URLs.
  • Batching: Events are queued and flushed every 5 seconds to reduce HTTP calls; batched POSTs use X-SolFoundry-Event: batch and a documented JSON envelope (delivery_mode, batch_id, events[]) with transaction_signature, slot, timestamp, and data.accounts (and optional bounty_id / extra).
  • Single-event deliveries (bounty lifecycle) remain immediate; payloads may include optional transaction_signature / slot when relevant.
  • Dashboard & observability: webhook_delivery_attempts table logs each HTTP attempt (retries, status, errors). GET /api/webhooks/delivery-stats exposes 7-day failure rate and recent attempts. POST /api/webhooks/test sends a signed test event to the caller’s endpoints.
  • Frontend: Contributor dashboard Webhooks tab shows delivery health and recent retry history.
  • Documentation: docs/WEBHOOK_EVENT_CATALOG.md — full event catalog and payload schemas.
  • Tests: backend/tests/test_chain_webhooks.py and updates to test_contributor_webhooks.py.
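The batched envelope described above can be sketched as follows. This is an illustrative assumption, not code from the PR: only the field names `delivery_mode`, `batch_id`, `events[]`, `transaction_signature`, `slot`, `timestamp`, and `data.accounts` come from the description; `build_batch_envelope` and the exact shape are hypothetical.

```python
import uuid

def build_batch_envelope(events: list[dict]) -> dict:
    # Hypothetical helper, not the PR's code: assembles the documented
    # envelope fields (delivery_mode, batch_id, events[]) around normalized
    # chain events carrying transaction_signature / slot / timestamp.
    return {
        "delivery_mode": "batch",
        "batch_id": str(uuid.uuid4()),
        "events": [
            {
                "event": e["event"],
                "transaction_signature": e.get("transaction_signature"),
                "slot": e.get("slot"),
                "timestamp": e.get("timestamp"),
                "data": {"accounts": e.get("accounts", {}), **e.get("extra", {})},
            }
            for e in events
        ],
    }

envelope = build_batch_envelope([
    {"event": "escrow.locked", "slot": 254000001,
     "accounts": {"escrow": "EwWi...GmyS"}},
    {"event": "stake.deposited", "slot": 254000002},
])
```

A subscriber receiving `X-SolFoundry-Event: batch` would iterate `events[]` rather than expect a single top-level event.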

Closes #508

Solana Wallet for Payout

Wallet: EwWiRi5zkynTYN9pvgjqCEiWKuFwR7SLdgFox9R3GmyS

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📝 Documentation update
  • 🎨 Style/UI update
  • ♻️ Code refactoring
  • ⚡ Performance improvement
  • ✅ Test addition/update

Checklist

  • Code is clean and follows the issue spec exactly
  • One PR per bounty (no multiple bounties in one PR)
  • Tests included for new functionality
  • All existing tests pass
  • No console.log or debugging code left behind
  • No hardcoded secrets or API keys

Testing

  • Manual testing performed
  • Unit tests added/updated
  • Integration tests added/updated

Details: Ran pytest tests/test_chain_webhooks.py tests/test_contributor_webhooks.py. Verified FastAPI app imports (from app.main import app). Indexer route behavior covered with httpx AsyncClient against a minimal app; batcher grouping and batch JSON schema covered in unit tests.
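As a rough illustration of the batcher grouping those tests cover, a toy version of the flush window can be sketched like this. `TinyBatcher` is a hypothetical stand-in, not the PR's `ChainWebhookBatcher`, and the window is shortened from 5 seconds so the example runs quickly:

```python
import asyncio

class TinyBatcher:
    # Hypothetical stand-in for the PR's ChainWebhookBatcher: queues events
    # and flushes them as one batch after a fixed window (5s in the PR).
    def __init__(self, window: float, flush_cb):
        self.window = window
        self.flush_cb = flush_cb
        self._queue: list[dict] = []
        self._task: asyncio.Task | None = None

    def enqueue(self, event: dict) -> None:
        self._queue.append(event)
        if self._task is None:  # first event opens a new flush window
            self._task = asyncio.ensure_future(self._flush_later())

    async def _flush_later(self) -> None:
        await asyncio.sleep(self.window)
        batch, self._queue = self._queue, []
        self._task = None
        await self.flush_cb(batch)

async def main() -> list[list[dict]]:
    received: list[list[dict]] = []

    async def capture(batch: list[dict]) -> None:
        received.append(batch)

    batcher = TinyBatcher(window=0.05, flush_cb=capture)
    batcher.enqueue({"event": "escrow.locked"})
    batcher.enqueue({"event": "stake.deposited"})
    await asyncio.sleep(0.1)  # both events land in the same flushed batch
    return received

batches = asyncio.run(main())
```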

Additional Notes

  • Migration: Apply alembic upgrade head (revision 006_webhook_delivery_attempts) in environments that use Alembic; init_db / create_all picks up the model in dev.
  • Operations: Set CHAIN_WEBHOOK_INDEXER_SECRET before enabling indexer traffic; if unset, chain ingest returns 503 (fail closed).
  • Bounty lifecycle webhook subscribers continue to receive the same single-event JSON shape; new fields are optional when absent.

@coderabbitai

coderabbitai bot commented Mar 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: eadfe90f-0652-40ac-b4b8-1bb0bb6dfc24

📥 Commits

Reviewing files that changed from the base of the PR and between 592a5e7 and be2d7c4.

📒 Files selected for processing (2)
  • .github/workflows/anchor.yml
  • .github/workflows/ci.yml

📝 Walkthrough

Walkthrough

Adds on-chain webhook ingestion and batched delivery: new internal POST /api/webhooks/internal/chain-events with indexer authentication; a ChainWebhookBatcher that groups events on 5s windows and is started/stopped with the app; new DB migration and SQLAlchemy model for webhook delivery attempts; expanded webhook models (batch payloads, delivery-attempt public view, on-chain event set); new contributor webhook endpoints (delivery-stats, test); service changes to build/deliver batches, retry with per-attempt logging, and dashboard aggregation; tests for ingestion/batching/delivery; docs (WEBHOOK_EVENT_CATALOG); and CHAIN_WEBHOOK_INDEXER_SECRET added to config examples.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

approved, paid

Suggested reviewers

  • chronoeth-creator
  • github-actions
🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (1 warning, 2 inconclusive)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 41.86%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
  • Linked Issues check (❓ Inconclusive): the PR implements most acceptance criteria from #508: new webhook event types (escrow.locked, escrow.released, reputation.updated, stake.deposited, stake.withdrawn, webhook.test), batching (5-second windows), dashboard endpoints, test endpoint, documentation, and unit tests. However, direct Helius/Shyft indexer integration is not implemented; the PR provides a normalized ingest endpoint but lacks indexer clients/adapters. Resolution: review whether "Events triggered from the event indexer (Helius/Shyft integration)" in #508 requires explicit client implementations or whether a normalized ingest endpoint satisfies the criterion; complete end-to-end indexer integration tests if needed.
  • Out of Scope Changes check (❓ Inconclusive): all changes align with #508 objectives (webhook event types, ingest endpoint, batching, dashboard, test endpoint, documentation, and tests). The GitHub Actions workflow updates (Solana/Anchor tooling versions) are minor infrastructure changes that support the build process but are not directly related to the webhook implementation. Resolution: confirm whether the workflow updates (.github/workflows/anchor.yml, .github/workflows/ci.yml) are necessary for this feature; if unrelated, separate them into a dedicated infrastructure PR.

✅ Passed checks (2 passed)

  • Title check (✅ Passed): the title "Closes #508 - chores: implemented webhooks for onchain events" clearly summarizes the main change, implementing webhooks for on-chain events, and directly addresses the linked issue #508.
  • Description check (✅ Passed): the description provides comprehensive context (new event types, indexer integration, batching, dashboard features, frontend changes, documentation, and testing details) and is fully related to the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/api/contributor_webhooks.py`:
- Around line 146-163: Add explicit handling and documentation for errors and
the response model in the webhook_test_delivery endpoint: update the endpoint to
declare a response_model (e.g., a pydantic model with status and
endpoints_notified) and add the ValueError to the responses mapping (e.g., 400:
{"model": ErrorResponse, "description": "..."}), or wrap the call to
ContributorWebhookService.dispatch_test_event in a try/except that converts
ValueError into an HTTPException(status_code=400) so OpenAPI and runtime
behavior are explicit; reference the webhook_test_delivery function and the
dispatch_test_event method when making these edits.

In `@backend/app/main.py`:
- Around line 176-177: Wrap the await chain_webhook_batcher.stop() call in a
try/except so any exception from chain_webhook_batcher.stop() is caught and
logged (use logger.exception or process_logger.error with the exception) and
does not abort the rest of shutdown; ensure task cancellation and calls to
close_redis and close_db still run (move them into a finally block or simply run
them after the except) so resources are always released even if
chain_webhook_batcher.stop() fails.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 50dbbf99-5d4d-40b2-b898-2347fd4d19fc

📥 Commits

Reviewing files that changed from the base of the PR and between 5a3017b and 4a9efa5.

⛔ Files ignored due to path filters (1)
  • contracts/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (17)
  • .env.example
  • backend/alembic/env.py
  • backend/alembic/versions/006_webhook_delivery_attempts.py
  • backend/app/api/chain_webhook_indexer.py
  • backend/app/api/contributor_webhooks.py
  • backend/app/database.py
  • backend/app/main.py
  • backend/app/models/contributor_webhook.py
  • backend/app/models/webhook_delivery.py
  • backend/app/services/chain_webhook_batcher.py
  • backend/app/services/config_validator.py
  • backend/app/services/contributor_webhook_service.py
  • backend/tests/test_chain_webhooks.py
  • backend/tests/test_contributor_webhooks.py
  • docs/WEBHOOK_EVENT_CATALOG.md
  • frontend/src/components/ContributorDashboard.tsx
  • test.md

Comment on lines +146 to +163
```python
@router.post(
    "/test",
    summary="Send a test webhook",
    description=(
        "Dispatches a signed ``webhook.test`` payload immediately to every active "
        "webhook registered by the caller (not batched)."
    ),
    responses={
        401: {"model": ErrorResponse, "description": "Authentication required"},
    },
)
async def webhook_test_delivery(
    user_id: str = Depends(get_current_user_id),
    db: AsyncSession = Depends(get_db),
) -> dict[str, str | int]:
    service = ContributorWebhookService(db)
    n = await service.dispatch_test_event(user_id)
    return {"status": "completed", "endpoints_notified": n}
```

🧹 Nitpick | 🔵 Trivial

Consider adding explicit error handling for dispatch_test_event.

The service's dispatch_test_event method raises ValueError if "webhook.test" is not configured in WEBHOOK_EVENTS. While the global value_error_handler in main.py catches this and returns 400, consider documenting this in the responses dict or handling it explicitly for clarity.

Additionally, consider adding a response_model for the return type to ensure consistent API documentation in OpenAPI.

💡 Optional improvement
```diff
 @router.post(
     "/test",
     summary="Send a test webhook",
     description=(
         "Dispatches a signed ``webhook.test`` payload immediately to every active "
         "webhook registered by the caller (not batched)."
     ),
     responses={
+        400: {"model": ErrorResponse, "description": "webhook.test event not configured"},
     },
 )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/api/contributor_webhooks.py` around lines 146 - 163, Add explicit
handling and documentation for errors and the response model in the
webhook_test_delivery endpoint: update the endpoint to declare a response_model
(e.g., a pydantic model with status and endpoints_notified) and add the
ValueError to the responses mapping (e.g., 400: {"model": ErrorResponse,
"description": "..."}), or wrap the call to
ContributorWebhookService.dispatch_test_event in a try/except that converts
ValueError into an HTTPException(status_code=400) so OpenAPI and runtime
behavior are explicit; reference the webhook_test_delivery function and the
dispatch_test_event method when making these edits.

Comment on lines +176 to +177
```python
await chain_webhook_batcher.stop()
```

🧹 Nitpick | 🔵 Trivial

Consider wrapping stop() in try/except for graceful shutdown.

If chain_webhook_batcher.stop() raises an exception, subsequent cleanup steps (task cancellations, close_redis, close_db) will not execute, potentially leaving resources unreleased.

💡 Suggested improvement
```diff
-    await chain_webhook_batcher.stop()
+    try:
+        await chain_webhook_batcher.stop()
+    except Exception as exc:
+        logger.error("chain_webhook_batcher shutdown error: %s", exc)
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change:

```python
try:
    await chain_webhook_batcher.stop()
except Exception as exc:
    logger.error("chain_webhook_batcher shutdown error: %s", exc)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/main.py` around lines 176 - 177, Wrap the await
chain_webhook_batcher.stop() call in a try/except so any exception from
chain_webhook_batcher.stop() is caught and logged (use logger.exception or
process_logger.error with the exception) and does not abort the rest of
shutdown; ensure task cancellation and calls to close_redis and close_db still
run (move them into a finally block or simply run them after the except) so
resources are always released even if chain_webhook_batcher.stop() fails.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/api/chain_webhook_indexer.py`:
- Around line 51-57: The indexer credential is currently treated as a single
bearer secret and notify_user_id (Field notify_user_id) is optional, allowing
any holder of the indexer secret to broadcast to all active webhooks; fix this
by requiring scoping and enforcing authorization: make notify_user_id required
in the indexer payload or, if optional, validate the indexer credential against
an allowlist of user UUIDs and reject requests with no notify_user_id; update
the chain_webhook_indexer.py logic that parses the credential (the
bearer/indexer auth handling around the code that reads notify_user_id) to
enforce the requirement and add explicit authorization checks, and modify
contributor_webhook_service.py (the code path in the function that queries
active webhooks around Lines 238-245) so that it never performs an unscoped
query when notify_user_id is missing—return 400/403 instead or scope the query
to allowed user_ids tied to the indexer credential.

In `@backend/app/models/webhook_delivery.py`:
- Around line 42-52: The model's event_types Column uses generic JSON but the
Alembic migration uses postgresql.JSONB, causing schema drift; update the
WebhookDelivery model's event_types Column to use PostgreSQL JSONB (matching the
migration 006_webhook_delivery_attempts.py) and add the corresponding import
from sqlalchemy.dialects.postgresql (ensure the astext_type=sa.Text() handling
matches the migration) so Base.metadata.create_all() produces the same JSONB
column as Alembic.

In `@backend/app/services/contributor_webhook_service.py`:
- Around line 187-193: The current sequential loop that awaits
self._deliver_with_retry for each webhook (seen in the fan-out loop around the
for wh in webhooks at lines ~187-193 and the similar loop at ~247-250) causes
one slow/dead endpoint to block all subsequent deliveries; change the
implementation to fire deliveries concurrently instead of awaiting each call
inline: create asyncio Tasks (or use asyncio.gather / an asyncio.TaskGroup) for
each call to self._deliver_with_retry(wh, event, payload_bytes,
event_types=[event]) and then await them as a group; to avoid unbounded
concurrency, wrap individual delivery calls with a concurrency limiter (e.g., an
asyncio.Semaphore or bounded worker pool) so you still limit parallelism while
ensuring a single slow endpoint cannot stall the entire fan-out. Ensure error
handling/logging is preserved for each task.
- Around line 277-338: The loop currently lets DB write failures from
_log_attempt()/_record_delivery() propagate into the outer retry flow and cause
re-sends after a 2xx; to fix, ensure a successful HTTP 2xx response NEVER
triggers a retry even if DB writes fail by isolating those writes in their own
try/except and swallowing/logging DB errors without re-raising them, then
immediately return; specifically, in the send loop around the response-success
branch that references webhook.id, batch_id, attempt, resp.status, call
_log_attempt and _record_delivery inside a nested try/except (or perform the
return before propagating errors) so exceptions from
_log_attempt/_record_delivery do not set last_exc or fall through to the
retry/backoff (MAX_ATTEMPTS, BACKOFF_BASE_SECONDS) path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: cd85debc-9560-482e-ae99-73017ab19ddb

📥 Commits

Reviewing files that changed from the base of the PR and between 4a9efa5 and 592a5e7.

📒 Files selected for processing (5)
  • .github/workflows/anchor.yml
  • backend/alembic/versions/006_webhook_delivery_attempts.py
  • backend/app/api/chain_webhook_indexer.py
  • backend/app/models/webhook_delivery.py
  • backend/app/services/contributor_webhook_service.py

Comment on lines +35 to +57
```python
    block_time: Optional[str] = Field(
        None,
        description="ISO-8601 UTC timestamp from the block (optional).",
    )
    accounts: dict[str, Any] = Field(
        default_factory=dict,
        description="Relevant account pubkeys and parsed fields (indexer-specific).",
    )
    bounty_id: str = Field(
        default="",
        description="Optional bounty correlation id when applicable.",
    )
    extra: dict[str, Any] = Field(
        default_factory=dict,
        description="Additional fields merged into the webhook ``data`` object.",
    )
    notify_user_id: Optional[str] = Field(
        None,
        description=(
            "If set, only webhooks owned by this user UUID receive the event. "
            "If omitted, all active subscriber URLs receive batched deliveries."
        ),
    )
```

⚠️ Potential issue | 🟠 Major

The ingest schema is not enforced for several structured fields.

Lines 35-57 accept block_time and notify_user_id as plain strings, so malformed timestamps and non-UUID values pass validation. Lines 101-106 then forward block_time verbatim into outbound payloads, and backend/app/services/contributor_webhook_service.py Lines 241-242 only coerce notify_user_id to UUID during batch delivery, after this endpoint has already returned 202. The merge at Lines 101-102 also lets extra.accounts overwrite the reserved data.accounts object, which breaks the documented event shape.

As per coding guidelines, backend/**: Python FastAPI backend. Analyze thoroughly: Input validation and SQL injection vectors; Error handling and edge case coverage; API contract consistency with spec.

Also applies to: 101-111

Comment on lines +51 to +57
```python
    notify_user_id: Optional[str] = Field(
        None,
        description=(
            "If set, only webhooks owned by this user UUID receive the event. "
            "If omitted, all active subscriber URLs receive batched deliveries."
        ),
    )
```

⚠️ Potential issue | 🟠 Major

One shared indexer key can broadcast arbitrary events to every subscriber.

Lines 69-81 treat the indexer credential as a single bearer secret, and Lines 51-57 make recipient scoping optional. In backend/app/services/contributor_webhook_service.py Lines 238-245, a missing notify_user_id becomes an unscoped query over every active webhook. That means any caller holding the indexer secret can synthesize arbitrary on-chain events and fan them out platform-wide, rather than being constrained to a specific source or subscriber set.

As per coding guidelines, backend/**: Python FastAPI backend. Analyze thoroughly: Authentication/authorization gaps.

Also applies to: 69-81, 89-112

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/api/chain_webhook_indexer.py` around lines 51 - 57, The indexer
credential is currently treated as a single bearer secret and notify_user_id
(Field notify_user_id) is optional, allowing any holder of the indexer secret to
broadcast to all active webhooks; fix this by requiring scoping and enforcing
authorization: make notify_user_id required in the indexer payload or, if
optional, validate the indexer credential against an allowlist of user UUIDs and
reject requests with no notify_user_id; update the chain_webhook_indexer.py
logic that parses the credential (the bearer/indexer auth handling around the
code that reads notify_user_id) to enforce the requirement and add explicit
authorization checks, and modify contributor_webhook_service.py (the code path
in the function that queries active webhooks around Lines 238-245) so that it
never performs an unscoped query when notify_user_id is missing—return 400/403
instead or scope the query to allowed user_ids tied to the indexer credential.

Comment on lines +42 to +52
```python
    event_types = Column(JSON, nullable=False)
    attempt_number = Column(Integer, nullable=False)
    success = Column(Boolean, nullable=False)
    http_status = Column(Integer, nullable=True)
    error_message = Column(Text, nullable=True)
    created_at = Column(
        DateTime(timezone=True), nullable=False, default=_utcnow, index=True
    )

    __table_args__ = (
        Index("ix_webhook_delivery_webhook_created", "webhook_id", "created_at"),
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed (outputs elided):

```shell
# First, find and examine the webhook_delivery model file
find . -name "webhook_delivery.py" -type f

# Find the migration file mentioned
find . -path "*alembic/versions*" -name "*webhook*" -type f

# Also check for any Alembic version files that might create webhook_delivery_attempts
find . -path "*alembic/versions*" -name "*.py" -type f | head -20

# Read the webhook_delivery model file to see the actual column definition
wc -l backend/app/models/webhook_delivery.py
cat -n backend/app/models/webhook_delivery.py | sed -n '30,55p'

# Read the migration file to see the column definition at lines 32-34
cat -n backend/alembic/versions/006_webhook_delivery_attempts.py | sed -n '25,45p'

# Check if the model's metadata is used for runtime schema creation
rg -n "create_all|metadata.create_all" backend/app --type py -A 2 -B 2

# Check imports in the model file to understand what JSON type is being used
head -30 backend/app/models/webhook_delivery.py

# Check the database initialization code
find backend -name "*.py" -type f | xargs rg -l "create_all|metadata" | head -10

# Verify how the JSON type behaves - check if there are any compatibility layers
rg -n "JSON|JSONB" backend/app/models/webhook_delivery.py

# Check if there are other models using JSONB to see the pattern
rg -n "postgresql.JSONB" backend/app/models/ -A 1 -B 1
```

event_types column uses generic JSON type in model but postgresql.JSONB in migration.

Line 42 of backend/app/models/webhook_delivery.py declares event_types as Column(JSON, nullable=False) (generic SQLAlchemy type). However, backend/alembic/versions/006_webhook_delivery_attempts.py lines 32-34 create the same column as postgresql.JSONB(astext_type=sa.Text()).

This mismatch causes schema divergence: databases bootstrapped via Base.metadata.create_all() (called in backend/app/database.py:183) will create a generic JSON column, while Alembic-migrated databases will have JSONB. The types differ in PostgreSQL behavior—JSONB supports indexing and has distinct operator semantics. This inconsistency violates API contract consistency with the Alembic spec and will trigger schema autogeneration churn.

Align the model to use postgresql.JSONB to match the migration.
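A minimal sketch of that alignment is below. It assumes the surrounding model in backend/app/models/webhook_delivery.py and its existing imports; verify the `astext_type` against migration 006 before applying:

```python
from sqlalchemy import Column
from sqlalchemy.dialects.postgresql import JSONB

# Inside the delivery-attempt model (class name assumed from context):
event_types = Column(JSONB, nullable=False)  # now matches the Alembic migration's JSONB
```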

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/models/webhook_delivery.py` around lines 42 - 52, The model's
event_types Column uses generic JSON but the Alembic migration uses
postgresql.JSONB, causing schema drift; update the WebhookDelivery model's
event_types Column to use PostgreSQL JSONB (matching the migration
006_webhook_delivery_attempts.py) and add the corresponding import from
sqlalchemy.dialects.postgresql (ensure the astext_type=sa.Text() handling matches the
migration) so Base.metadata.create_all() produces the same JSONB column as
Alembic.

Comment on lines +187 to +193
```python
for wh in webhooks:
    await self._deliver_with_retry(
        wh,
        event,
        payload_bytes,
        event_types=[event],
    )
```

⚠️ Potential issue | 🟠 Major

Sequential fan-out defeats the 5-second batching window under partial failure.

Lines 187-193 and 247-250 await each webhook delivery one endpoint at a time. With the current 10-second timeout plus 2s/4s backoff, one dead endpoint can hold the loop for roughly 36 seconds before later subscribers are attempted. For global events or users near the 10-webhook cap, delivery latency can balloon from seconds to minutes.

Also applies to: 247-250

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/contributor_webhook_service.py` around lines 187 - 193,
The current sequential loop that awaits self._deliver_with_retry for each
webhook (seen in the fan-out loop around the for wh in webhooks at lines
~187-193 and the similar loop at ~247-250) causes one slow/dead endpoint to
block all subsequent deliveries; change the implementation to fire deliveries
concurrently instead of awaiting each call inline: create asyncio Tasks (or use
asyncio.gather / an asyncio.TaskGroup) for each call to
self._deliver_with_retry(wh, event, payload_bytes, event_types=[event]) and then
await them as a group; to avoid unbounded concurrency, wrap individual delivery
calls with a concurrency limiter (e.g., an asyncio.Semaphore or bounded worker
pool) so you still limit parallelism while ensuring a single slow endpoint
cannot stall the entire fan-out. Ensure error handling/logging is preserved for
each task.
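The bounded concurrent fan-out the prompt describes can be sketched as follows. `fan_out` and `deliver` are illustrative names, not the PR's code; the pattern (tasks gathered under an `asyncio.Semaphore`) is the one the review suggests:

```python
import asyncio

async def fan_out(webhooks, deliver, max_concurrency: int = 5) -> None:
    # Sketch of the suggested concurrent fan-out: a bounded semaphore limits
    # parallelism while preventing one slow endpoint from stalling the rest.
    sem = asyncio.Semaphore(max_concurrency)

    async def one(wh):
        async with sem:
            try:
                await deliver(wh)
            except Exception:
                pass  # per-endpoint failures stay isolated (log in real code)

    await asyncio.gather(*(one(wh) for wh in webhooks))

async def main() -> list[str]:
    completed: list[str] = []

    async def deliver(wh: str) -> None:
        # The "slow" endpoint no longer blocks the fast ones.
        await asyncio.sleep(0.2 if wh == "slow" else 0.01)
        completed.append(wh)

    await fan_out(["slow", "a", "b"], deliver)
    return completed

order = asyncio.run(main())
```

With Python 3.11+, an `asyncio.TaskGroup` inside the semaphore would give the same bounded behavior with structured cancellation.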

Comment on lines +277 to +338
```python
            if 200 <= resp.status < 300:
                await self._log_attempt(
                    webhook.id,
                    batch_id,
                    "batch",
                    event_types,
                    attempt,
                    True,
                    resp.status,
                    None,
                )
                await self._record_delivery(webhook.id, success=True)
                logger.info(
                    "Webhook batch delivered: id=%s attempt=%d status=%d",
                    webhook.id,
                    attempt,
                    resp.status,
                )
                return
            last_exc = RuntimeError(
                f"HTTP {resp.status} from {webhook.url}"
            )
            await self._log_attempt(
                webhook.id,
                batch_id,
                "batch",
                event_types,
                attempt,
                False,
                resp.status,
                str(last_exc),
            )
            logger.warning(
                "Webhook batch non-2xx: id=%s attempt=%d status=%d",
                webhook.id,
                attempt,
                resp.status,
            )
        except Exception as exc:
            last_exc = exc
            await self._log_attempt(
                webhook.id,
                batch_id,
                "batch",
                event_types,
                attempt,
                False,
                None,
                str(exc),
            )
            logger.warning(
                "Webhook batch error: id=%s attempt=%d error=%s",
                webhook.id,
                attempt,
                exc,
            )

        if attempt < MAX_ATTEMPTS:
            delay = BACKOFF_BASE_SECONDS**attempt
            await asyncio.sleep(delay)

    await self._record_delivery(webhook.id, success=False)
```

⚠️ Potential issue | 🔴 Critical

A database write failure after a 2xx response causes duplicate webhook deliveries.

Lines 277-288 and 377-387 keep _log_attempt() / _record_delivery() inside the same try as the outbound POST, while _log_attempt() commits immediately at Lines 460-471. If either DB write raises after the subscriber has already returned 2xx, control drops into the retry path at Lines 315-336 / 416-438 and the same payload is POSTed again. That turns a local persistence issue into duplicated downstream events, including replaying whole batch payloads.

As per coding guidelines, backend/**: Python FastAPI backend. Analyze thoroughly: Error handling and edge case coverage.

Also applies to: 377-440, 449-472

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/contributor_webhook_service.py` around lines 277 - 338,
The loop currently lets DB write failures from _log_attempt()/_record_delivery()
propagate into the outer retry flow and cause re-sends after a 2xx; to fix,
ensure a successful HTTP 2xx response NEVER triggers a retry even if DB writes
fail by isolating those writes in their own try/except and swallowing/logging DB
errors without re-raising them, then immediately return; specifically, in the
send loop around the response-success branch that references webhook.id,
batch_id, attempt, resp.status, call _log_attempt and _record_delivery inside a
nested try/except (or perform the return before propagating errors) so
exceptions from _log_attempt/_record_delivery do not set last_exc or fall
through to the retry/backoff (MAX_ATTEMPTS, BACKOFF_BASE_SECONDS) path.
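The invariant the prompt asks for, that a 2xx response is final even when bookkeeping writes fail, can be sketched in isolation. `deliver_once`, `post`, and `log_attempt` are illustrative names standing in for the HTTP call and the `_log_attempt`/`_record_delivery` writes:

```python
import asyncio
import logging

logger = logging.getLogger("webhooks")

async def deliver_once(post, log_attempt) -> bool:
    # Sketch of the suggested fix: bookkeeping failures after a 2xx are
    # logged and swallowed so they can never re-trigger the retry path.
    status = await post()
    if 200 <= status < 300:
        try:
            await log_attempt(status)  # DB bookkeeping only
        except Exception:
            logger.exception("attempt log failed after 2xx; not retrying")
        return True  # success stands even if the DB write failed
    try:
        await log_attempt(status)
    except Exception:
        logger.exception("attempt log failed")
    return False  # caller may retry with backoff

async def main() -> tuple[bool, bool]:
    async def ok() -> int:
        return 204

    async def server_error() -> int:
        return 500

    async def broken_db(_status: int) -> None:
        raise RuntimeError("db down")

    return (await deliver_once(ok, broken_db),
            await deliver_once(server_error, broken_db))

success, failure = asyncio.run(main())
```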

Comment on lines +351 to 355
```python
    *,
    event_types: list[str],
    delivery_mode: str = "single",
    batch_id: UUID | None = None,
) -> None:
```

⚠️ Potential issue | 🟡 Minor

Single-event attempts are being recorded with synthetic batch IDs.

Lines 353-365 generate a UUID even when delivery_mode remains "single", and Lines 573-575 expose that batch_id through the dashboard payload. The field stops meaning “correlation id for an actual batched delivery” and instead becomes non-null for every single-event attempt, which weakens the dashboard/API semantics around batch history.

As per coding guidelines, backend/**: Python FastAPI backend. Analyze thoroughly: API contract consistency with spec.

Also applies to: 365-365, 573-575

@KodeSage KodeSage deleted the feat/webhooks_onchain_events branch April 4, 2026 11:52