3 changes: 3 additions & 0 deletions Cargo.lock


8 changes: 8 additions & 0 deletions scripts/cluster/node1.conf
@@ -0,0 +1,8 @@
# Kiwi Node 1 Configuration
port 7379
binding 127.0.0.1
data-dir ./db/node1
raft-node-id 1
raft-addr 127.0.0.1:8081
raft-resp-addr 127.0.0.1:7379
raft-data-dir ./raft_data/node1
8 changes: 8 additions & 0 deletions scripts/cluster/node2.conf
@@ -0,0 +1,8 @@
# Kiwi Node 2 Configuration
port 7380
binding 127.0.0.1
data-dir ./db/node2
raft-node-id 2
raft-addr 127.0.0.1:8082
raft-resp-addr 127.0.0.1:7380
raft-data-dir ./raft_data/node2
8 changes: 8 additions & 0 deletions scripts/cluster/node3.conf
@@ -0,0 +1,8 @@
# Kiwi Node 3 Configuration
port 7381
binding 127.0.0.1
data-dir ./db/node3
raft-node-id 3
raft-addr 127.0.0.1:8083
raft-resp-addr 127.0.0.1:7381
raft-data-dir ./raft_data/node3
193 changes: 193 additions & 0 deletions scripts/cluster/start_cluster.sh
@@ -0,0 +1,193 @@
#!/bin/bash
#
# Kiwi 3-Node Raft Cluster Startup Script
# Usage: ./start_cluster.sh [--init]
#

set -e
⚠️ Potential issue | 🟠 Major


Pipeline failures can be masked with set -e alone; use pipefail for reliability.

Line 50 pipes cargo build into tail -5; with only set -e, the pipeline's exit status depends on tail's success, not cargo's. While the subsequent file existence check (line 51) provides a safety net, relying on post-hoc validation is less reliable than catching the failure immediately.

Proposed fix:

-set -e
+set -euo pipefail

And update the build step:

-    cargo build --release --bin kiwi 2>&1 | tail -5
-    if [ ! -f "$PROJECT_ROOT/target/release/kiwi" ]; then
+    if ! cargo build --release --bin kiwi 2>&1 | tail -5; then
         log_error "Build failed!"
         exit 1
     fi
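As a standalone illustration of the masking behavior the reviewer describes (a minimal sketch, independent of the script itself):

```shell
#!/bin/bash
# Without pipefail, a pipeline's exit status is that of its LAST command,
# so a failing producer is hidden by a succeeding consumer.
false | tail -n 1
echo "without pipefail: $?"   # prints 0 — the failure of `false` is masked

# With pipefail, the pipeline fails if any stage fails.
set -o pipefail
false | tail -n 1
echo "with pipefail: $?"      # prints 1 — the failure propagates
```

This is exactly the situation with `cargo build ... | tail -5`: a broken build exits nonzero, but `tail` exits zero, so `set -e` alone never sees the failure.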


SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
CONFIG_DIR="$SCRIPT_DIR"
LOG_DIR="$PROJECT_ROOT/logs"
PID_DIR="$PROJECT_ROOT/.pids"

RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'

NODE_REDIS_PORTS=(7379 7380 7381)
NODE_RAFT_PORTS=(8081 8082 8083)

log_info() { echo -e "${BLUE}[INFO]${NC} $1"; }
log_success() { echo -e "${GREEN}[SUCCESS]${NC} $1"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
log_error() { echo -e "${RED}[ERROR]${NC} $1"; }

cleanup_old_data() {
    log_info "Cleaning up old Raft data..."
    rm -rf "$PROJECT_ROOT/raft_data"
    rm -rf "$PROJECT_ROOT/db"
    rm -rf "$LOG_DIR"
    rm -rf "$PID_DIR"
}

create_directories() {
    log_info "Creating directories..."
    mkdir -p "$LOG_DIR"
    mkdir -p "$PID_DIR"
    for i in 1 2 3; do
        mkdir -p "$PROJECT_ROOT/raft_data/node$i"
        mkdir -p "$PROJECT_ROOT/db/node$i"
    done
}

build_binary() {
    log_info "Building Kiwi binary..."
    cd "$PROJECT_ROOT"
    cargo build --release --bin kiwi 2>&1 | tail -5
    if [ ! -f "$PROJECT_ROOT/target/release/kiwi" ]; then
        log_error "Build failed!"
        exit 1
    fi
    log_success "Build complete"
}

start_node() {
    local node_id=$1
    local config_file="$CONFIG_DIR/node${node_id}.conf"
    local log_file="$LOG_DIR/node${node_id}.log"
    local pid_file="$PID_DIR/node${node_id}.pid"

    log_info "Starting Node $node_id..."

    cd "$PROJECT_ROOT"
    RUST_LOG=info ./target/release/kiwi --config "$config_file" > "$log_file" 2>&1 &
    local pid=$!
    echo $pid > "$pid_file"

    log_success "Node $node_id started (PID: $pid, Log: $log_file)"
}

wait_for_ports() {
    log_info "Waiting for nodes to be ready..."
    sleep 3

    for i in 0 1 2; do
        local redis_port=${NODE_REDIS_PORTS[$i]}
        local raft_port=${NODE_RAFT_PORTS[$i]}
        local node_num=$((i + 1))

        for j in {1..10}; do
            if nc -z 127.0.0.1 $redis_port 2>/dev/null; then
                log_success "Node $node_num Redis port $redis_port is ready"
                break
            fi
            sleep 1
        done

        for j in {1..10}; do
            if nc -z 127.0.0.1 $raft_port 2>/dev/null; then
                log_success "Node $node_num Raft port $raft_port is ready"
                break
            fi
            sleep 1
        done
Comment on lines +83 to +97

⚠️ Potential issue | 🟠 Major


wait_for_ports needs explicit failure handling for port timeout scenarios.

The function checks Redis and Raft port readiness with 10-second timeouts per port, but silently succeeds even if ports never open. Since the function returns 0 implicitly, the script continues to initialize the cluster despite missing ports, producing false "cluster started" status.

Although set -e is enabled at script startup, it cannot catch this silent failure since the function doesn't return an error code. The fix should track readiness state and explicitly return 1 on timeout so the error propagates correctly.

Proposed fix:

-        for j in {1..10}; do
+        local redis_ready=false
+        for j in {1..10}; do
             if nc -z 127.0.0.1 $redis_port 2>/dev/null; then
                 log_success "Node $node_num Redis port $redis_port is ready"
+                redis_ready=true
                 break
             fi
             sleep 1
         done
+        if [ "$redis_ready" = false ]; then
+            log_error "Node $node_num Redis port $redis_port did not become ready"
+            return 1
+        fi

-        for j in {1..10}; do
+        local raft_ready=false
+        for j in {1..10}; do
             if nc -z 127.0.0.1 $raft_port 2>/dev/null; then
                 log_success "Node $node_num Raft port $raft_port is ready"
+                raft_ready=true
                 break
             fi
             sleep 1
         done
+        if [ "$raft_ready" = false ]; then
+            log_error "Node $node_num Raft port $raft_port did not become ready"
+            return 1
+        fi
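An alternative shape for the same fix, sketched here as a hypothetical `wait_for_port` helper (not part of the PR), is to factor the two duplicated loops into one function whose exit status the caller must check:

```shell
#!/bin/bash
# Hypothetical helper: poll a TCP port once per second, succeeding as soon
# as it accepts connections and failing (exit 1) after `timeout` seconds.
wait_for_port() {
    local host=$1 port=$2 timeout=${3:-10}
    local elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if nc -z "$host" "$port" 2>/dev/null; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# The caller decides how to react; a timeout can no longer pass silently.
if wait_for_port 127.0.0.1 7379 2; then
    echo "port 7379 is ready"
else
    echo "port 7379 did not become ready" >&2
fi
```

This keeps the readiness policy in one place, so adding a new port to the cluster cannot reintroduce a loop that forgets its failure branch.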
🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 91-91: j appears unused. Verify use (or export if used externally).

(SC2034)
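The SC2034 warning concerns the loop counter `j`, which the body never reads. When only the retry count matters, `_` is the conventional throwaway variable name in bash — a small sketch:

```shell
#!/bin/bash
# `_` signals that the loop variable is intentionally unused;
# the loop still runs exactly ten times.
count=0
for _ in {1..10}; do
    count=$((count + 1))
done
echo "$count"   # prints 10
```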


    done
}

init_cluster() {
    log_info "Initializing Raft cluster..."

    local init_response=$(curl -s -X POST http://127.0.0.1:8081/raft/init \
        -H "Content-Type: application/json" \
        -d '{
            "nodes": [
                [1, {"raft_addr": "127.0.0.1:8081", "resp_addr": "127.0.0.1:7379"}],
                [2, {"raft_addr": "127.0.0.1:8082", "resp_addr": "127.0.0.1:7380"}],
                [3, {"raft_addr": "127.0.0.1:8083", "resp_addr": "127.0.0.1:7381"}]
            ]
        }')

    if echo "$init_response" | grep -q '"success":true'; then
        log_success "Cluster initialized successfully"
    else
        log_warn "Cluster initialization response: $init_response"
    fi

    log_info "Waiting for leader election (5 seconds)..."
    sleep 5
}

check_leader() {
    log_info "Checking leader status..."

    for i in 0 1 2; do
        local raft_port=${NODE_RAFT_PORTS[$i]}
        local node_num=$((i + 1))
        local response=$(curl -s http://127.0.0.1:$raft_port/raft/leader 2>/dev/null)

        if echo "$response" | grep -q '"leader_id"'; then
            local leader_id=$(echo "$response" | grep -o '"leader_id":[0-9]*' | cut -d: -f2)
            log_info "Node $node_num sees leader: $leader_id"
        fi
    done
}

print_status() {
    echo ""
    echo "=========================================="
    echo "  Kiwi 3-Node Raft Cluster"
    echo "=========================================="
    echo ""
    echo "Nodes:"
    for i in 0 1 2; do
        local redis_port=${NODE_REDIS_PORTS[$i]}
        local raft_port=${NODE_RAFT_PORTS[$i]}
        local node_num=$((i + 1))
        local pid=$(cat "$PID_DIR/node${node_num}.pid" 2>/dev/null || echo "N/A")
        echo "  Node $node_num:"
        echo "    - Redis: 127.0.0.1:$redis_port"
        echo "    - Raft:  127.0.0.1:$raft_port"
        echo "    - PID:   $pid"
        echo "    - Log:   $LOG_DIR/node${node_num}.log"
    done
    echo ""
    echo "Commands:"
    echo "  Test cluster: ./scripts/cluster/test_cluster.sh"
    echo "  Stop cluster: ./scripts/cluster/stop_cluster.sh"
    echo "  Check logs:   tail -f $LOG_DIR/node1.log"
    echo ""
}

main() {
    echo ""
    echo "=========================================="
    echo "  Starting Kiwi 3-Node Raft Cluster"
    echo "=========================================="
    echo ""

    if [ "$1" == "--clean" ]; then
        cleanup_old_data
    fi

    create_directories
    build_binary

    for i in 1 2 3; do
        start_node $i
    done

    wait_for_ports

    if [ "$1" == "--init" ] || [ "$1" == "--clean" ]; then
        init_cluster
    fi

    check_leader
    print_status
}

main "$@"
66 changes: 66 additions & 0 deletions scripts/cluster/stop_cluster.sh
@@ -0,0 +1,66 @@
#!/bin/bash
#
# Kiwi Raft Cluster Stop Script
#

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
PID_DIR="$PROJECT_ROOT/.pids"

# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'

log_info() { echo -e "${BLUE}[INFO]${NC} $1"; }
log_success() { echo -e "${GREEN}[SUCCESS]${NC} $1"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }

stop_node() {
    local node_id=$1
    local pid_file="$PID_DIR/node${node_id}.pid"

    if [ -f "$pid_file" ]; then
        local pid=$(cat "$pid_file")
        if kill -0 $pid 2>/dev/null; then
            log_info "Stopping Node $node_id (PID: $pid)..."
            kill $pid 2>/dev/null || true

            # Wait for graceful shutdown
            for i in {1..10}; do
                if ! kill -0 $pid 2>/dev/null; then
                    break
                fi
                sleep 1
            done

            # Force kill if still running
            if kill -0 $pid 2>/dev/null; then
                log_warn "Force killing Node $node_id..."
                kill -9 $pid 2>/dev/null || true
            fi
Comment on lines +26 to +43

⚠️ Potential issue | 🟠 Major


PID reuse risk: process identity is not verified before kill.

If a stale PID file exists from a previous invocation, the PID may have been recycled to an unrelated process, causing it to be terminated. The kill -0 $pid check only verifies that a process with that PID exists, not that it is the correct kiwi node process.

The script should verify process identity before sending kill signals. Additionally, all $pid variables lack proper quoting.

Proposed fix:

-        local pid=$(cat "$pid_file")
-        if kill -0 $pid 2>/dev/null; then
+        local pid
+        pid=$(cat "$pid_file")
+        local cmdline
+        cmdline=$(ps -p "$pid" -o args= 2>/dev/null || true)
+        if [[ "$cmdline" != *"kiwi"* ]]; then
+            log_warn "PID $pid is not a kiwi process; skipping"
+            rm -f "$pid_file"
+            return
+        fi
+        if kill -0 "$pid" 2>/dev/null; then
             log_info "Stopping Node $node_id (PID: $pid)..."
-            kill $pid 2>/dev/null || true
+            kill "$pid" 2>/dev/null || true
@@
-                if ! kill -0 $pid 2>/dev/null; then
+                if ! kill -0 "$pid" 2>/dev/null; then
@@
-            if kill -0 $pid 2>/dev/null; then
+            if kill -0 "$pid" 2>/dev/null; then
                 log_warn "Force killing Node $node_id..."
-                kill -9 $pid 2>/dev/null || true
+                kill -9 "$pid" 2>/dev/null || true
             fi
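The `ps -p "$pid" -o args=` probe used above is POSIX-portable (it works on both Linux and macOS, unlike reading /proc directly). A self-contained sketch of the same identity check, demonstrated against a throwaway `sleep` process rather than a kiwi node:

```shell
#!/bin/bash
# Return 0 only if the PID exists AND its command line contains `pattern`.
# This distinguishes "some process has this PID" from "our process still
# has this PID" — the gap that makes stale PID files dangerous.
pid_matches() {
    local pid=$1 pattern=$2
    local cmdline
    cmdline=$(ps -p "$pid" -o args= 2>/dev/null) || return 1
    [[ "$cmdline" == *"$pattern"* ]]
}

# Demonstration with a process whose command line we control.
sleep 300 &
demo_pid=$!
if pid_matches "$demo_pid" "sleep"; then
    echo "PID $demo_pid verified as a sleep process"
fi
pid_matches "$demo_pid" "kiwi" || echo "PID $demo_pid is not a kiwi process"
kill "$demo_pid"
```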
🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 26-26: Declare and assign separately to avoid masking return values.

(SC2155)
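The SC2155 warning applies to `local pid=$(cat "$pid_file")`: because `local` has its own (always-zero) exit status, a failure of the command substitution is invisible to `$?` and to set -e. A minimal demonstration:

```shell
#!/bin/bash
demo() {
    local combined=$(false)   # exit status of `false` is swallowed by `local`
    echo "combined: $?"

    local separate
    separate=$(false)         # plain assignment preserves the exit status
    echo "separate: $?"
}
demo
```

This prints `combined: 0` followed by `separate: 1`, which is why the proposed fix declares `local pid` on one line and assigns it on the next.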



            log_success "Node $node_id stopped"
        else
            log_warn "Node $node_id (PID: $pid) is not running"
        fi
        rm -f "$pid_file"
    else
        log_warn "No PID file for Node $node_id"
    fi
}

echo ""
echo "=========================================="
echo " Stopping Kiwi Raft Cluster"
echo "=========================================="
echo ""

for i in 1 2 3; do
    stop_node $i
done

log_success "All nodes stopped"
echo ""