feat(openclaw): JSONL-backed retain queue for external API resilience#740
Conversation
cfadeb9 to
b9a3bef
Compare
nicoloboschi
left a comment
There was a problem hiding this comment.
Hey this might be a good idea, but let's use normal files, not SQLite. What we really need is to persist Json lines as payload to send once the API is active again
|
Makes total sense — swapped to a simple JSONL file. No more SQLite, no native deps. Each line is the exact JSON payload that would've gone to the API. Same queue logic (enqueue on failure, periodic flush, FIFO delivery, max age cleanup). Should be in the latest push. |
|
@nicoloboschi Updated to use JSONL files as requested. Ready for re-review! |
nicoloboschi
left a comment
There was a problem hiding this comment.
Two blockers — the O(n²) file I/O during flush and a dead-code guard that silently breaks flushing.
| const clientGlobal = (global as any).__hindsightClient; | ||
| if (!clientGlobal) break; | ||
|
|
||
| let bankClient = clientsByBankId.get(item.bankId); |
There was a problem hiding this comment.
Bug: O(n² file I/O — remove(item.id) calls readAll() + writeAll() on every iteration, so flushing 50 items means 50 full file reads and 50 full file rewrites.
Collect the successfully-flushed IDs into an array, then do a single bulk removal after the loop:
const flushedIds: string[] = [];
for (const item of items) {
// ... send ...
flushedIds.push(item.id);
}
if (flushedIds.length > 0) retainQueue.removeMany(flushedIds);Same concern applies to size() (called on every successful retain and on the timer) — it parses the entire file each time. Worth caching the count in memory and updating it on enqueue/remove.
There was a problem hiding this comment.
Fixed — flush now collects flushed IDs and calls removeMany() for a single file rewrite at the end. Also cached the item count in memory so size() is O(1) — updated on every enqueue(), writeAll(), and removeMany(). @nicoloboschi
| try { | ||
| // Cleanup expired items first | ||
| retainQueue.cleanup(); | ||
|
|
There was a problem hiding this comment.
Bug: clientGlobal is read but never used. This checks (global as any).__hindsightClient and breaks out of the loop if it is falsy, but the variable is never referenced again — the code below uses clientsByBankId / clientOptions instead.
This means the flush silently does nothing whenever __hindsightClient happens to be unset, even if valid per-bank clients exist. Either:
- Remove this check entirely (it guards nothing), or
- Replace it with a check on what is actually needed (
if (!clientOptions) break;)
There was a problem hiding this comment.
Good catch — removed the clientGlobal check entirely and replaced it with if (!clientOptions) return which guards what's actually needed. Moved it before cleanup() too since there's no point expiring items if we can't flush anyway. @nicoloboschi
d78f22d to
654ac71
Compare
When the external Hindsight API is unreachable, retain requests are buffered as JSON lines in a local file and automatically flushed once connectivity is restored. Queue survives process restarts. - Only active in external API mode (local daemon handles its own persistence) - Zero dependencies — uses only Node built-ins (fs, crypto) - Bulk removal via removeMany() for O(1) file rewrites during flush - Cached item count so size() is O(1) - Configurable: retainQueuePath, retainQueueMaxAgeMs (-1 = forever), retainQueueFlushIntervalMs (default 60s) - Flushes on successful retain and on a periodic timer - All logging routed through structured logger (api.logger) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
654ac71 to
b04b8c8
Compare
nicoloboschi
left a comment
There was a problem hiding this comment.
can you resolve conflicts?
…rvice.start() Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
@nicoloboschi Conflicts resolved — ready to merge! |
Summary
~/.openclaw/data/hindsight-retain-queue.db) instead of being silently lostbetter-sqlite3is never loadedNew config options
retainQueuePathstring~/.openclaw/data/hindsight-retain-queue.dbretainQueueMaxAgeMsnumber-1(forever)retainQueueFlushIntervalMsnumber60000(1 min)How it works
Logging
API unreachable — retain queued (3 pending, bank: max-openclaw): connect ECONNREFUSEDRetain queue: 3 items pending from previous session, will flush shortlyQueue flush: 3 queued retains delivered, queue emptyTest plan
better-sqlite3lazy-loaded — daemon mode users unaffected🤖 Generated with Claude Code
Heads up: PR includes the Claude Code attribution line — just letting you know before you see it.