| 1 | +# Provider Error Tracking |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +Conduit tracks provider API errors and automatically disables API keys that hit fatal errors, either on the first occurrence or after repeated failures within a time window. Isolating problematic credentials this way helps maintain service reliability and limits the impact on your users. |
| 6 | + |
| 7 | +## How It Works |
| 8 | + |
| 9 | +When API requests fail, Conduit: |
| 10 | +1. Classifies the error type based on the HTTP status code |
| 11 | +2. Stores the error in Redis for tracking |
| 12 | +3. Evaluates whether the key should be disabled under that error type's policy |
| 13 | +4. Publishes events for real-time dashboard updates |
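The classification step can be pictured as a lookup from HTTP status code to an error type and a severity. The following is a minimal Python sketch, not Conduit's implementation: the `Severity` enum, the error-type names, and `classify` are hypothetical, and the mapping is copied from the tables in this document. The real classifier may also inspect the response body (the 403 row, for example, excludes balance-related errors), which this sketch ignores.

```python
from enum import Enum
from typing import Optional, Tuple


class Severity(Enum):
    FATAL = "fatal"          # may auto-disable the key
    WARNING = "warning"      # tracked, never disables the key
    TRANSIENT = "transient"  # minimal tracking


# Hypothetical status-code mapping, copied from the tables in this document.
STATUS_TO_ERROR = {
    401: ("InvalidApiKey", Severity.FATAL),
    402: ("InsufficientBalance", Severity.FATAL),
    403: ("AccessForbidden", Severity.FATAL),
    404: ("ModelNotFound", Severity.WARNING),
    429: ("RateLimitExceeded", Severity.WARNING),
    503: ("ServiceUnavailable", Severity.WARNING),
}


def classify(status_code: Optional[int]) -> Tuple[str, Severity]:
    """Map an HTTP status to (error_type, severity).

    `None` means no HTTP response was received (network failure or timeout).
    """
    if status_code is None:
        return ("NetworkOrTimeout", Severity.TRANSIENT)
    # Anything not listed is treated as an unknown, transient error.
    return STATUS_TO_ERROR.get(status_code, ("Unknown", Severity.TRANSIENT))
```

For example, `classify(402)` returns `("InsufficientBalance", Severity.FATAL)`.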
| 14 | + |
| 15 | +## Error Types |
| 16 | + |
| 17 | +### Fatal Errors (Auto-Disable Keys) |
| 18 | + |
| 19 | +These errors indicate problems that require manual intervention. Conduit disables the affected key according to the policy for each error type: |
| 20 | + |
| 21 | +| Error Type | HTTP Status | Description | Disable Policy | |
| 22 | +|------------|-------------|-------------|----------------| |
| 23 | +| Invalid API Key | 401 | API key is invalid, revoked, or malformed | **Immediate** - disabled on first occurrence | |
| 24 | +| Insufficient Balance | 402 | Account has no credits or quota exhausted | 2 occurrences within 5 minutes | |
| 25 | +| Access Forbidden | 403 | Account lacks permission (not balance-related) | 3 occurrences within 10 minutes | |
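The disable policies in this table amount to a sliding-window count of occurrences per key. The sketch below is illustrative only: it assumes an in-memory `deque` of timestamps stands in for the Redis storage and that occurrences are counted separately per status code; `DisableTracker` and `DISABLE_POLICY` are hypothetical names, not Conduit's API.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple

# status -> (occurrences needed, window in seconds); None = first occurrence.
DISABLE_POLICY: Dict[int, Tuple[int, Optional[float]]] = {
    401: (1, None),      # Invalid API Key: immediate
    402: (2, 5 * 60),    # Insufficient Balance: 2 within 5 minutes
    403: (3, 10 * 60),   # Access Forbidden: 3 within 10 minutes
}


class DisableTracker:
    """In-memory stand-in for Redis-backed tracking (hypothetical)."""

    def __init__(self) -> None:
        # (key_id, status) -> timestamps of recent occurrences, oldest first
        self._events: Dict[Tuple[str, int], Deque[float]] = defaultdict(deque)

    def record(self, key_id: str, status: int, now: float) -> bool:
        """Record a failure at `now` (seconds); return True if the key should be disabled."""
        policy = DISABLE_POLICY.get(status)
        if policy is None:
            return False  # warning and transient errors never disable a key
        threshold, window = policy
        if window is None:
            return True
        events = self._events[(key_id, status)]
        events.append(now)
        # Drop occurrences that have slid out of the window.
        while events and now - events[0] > window:
            events.popleft()
        return len(events) >= threshold
```

Keeping only the timestamps inside the window means the check costs nothing extra as old failures age out.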
| 26 | + |
| 27 | +### Warning Errors (Tracked, No Auto-Disable) |
| 28 | + |
| 29 | +These errors are usually transient or specific to a single request, so they never disable a key: |
| 30 | + |
| 31 | +| Error Type | HTTP Status | Description | Alert Threshold | |
| 32 | +|------------|-------------|-------------|-----------------| |
| 33 | +| Rate Limit Exceeded | 429 | Too many requests sent to the provider | 10 occurrences within 5 minutes |
| 34 | +| Model Not Found | 404 | Requested model doesn't exist | Not tracked |
| 35 | +| Service Unavailable | 503 | Provider is experiencing issues | 5 occurrences within 10 minutes |
| 36 | + |
| 37 | +### Transient Errors (Minimal Tracking) |
| 38 | + |
| 39 | +Network errors and timeouts are usually temporary, so errors in this category receive only minimal tracking: |
| 40 | +- Network connectivity issues |
| 41 | +- Request timeouts |
| 42 | +- Unknown/unclassified errors |
| 43 | + |
| 44 | +## Managing Provider Errors |
| 45 | + |
| 46 | +### Viewing Error Dashboard |
| 47 | + |
| 48 | +Access provider errors through the Admin Panel: |
| 49 | + |
| 50 | +1. Navigate to **Providers** in the sidebar |
| 51 | +2. Look for error indicators on provider cards |
| 52 | +3. Click a provider to see detailed error information |
| 53 | + |
| 54 | +**Dashboard shows:** |
| 55 | +- Total errors in the last 24 hours |
| 56 | +- Fatal vs. warning error breakdown |
| 57 | +- Number of disabled keys |
| 58 | +- Errors grouped by provider |
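A rough sketch of how these dashboard figures could be derived from raw error records. `ErrorRecord`, `summarize`, and the result keys are hypothetical; only the `hours` window mirrors the `hours` parameter of the statistics endpoint in the API Reference, and the real response shape may differ.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, Set


@dataclass
class ErrorRecord:
    provider: str
    key_id: str
    status_code: int
    severity: str        # "fatal", "warning", or "transient"
    timestamp: datetime


def summarize(records: Iterable[ErrorRecord], disabled_key_ids: Set[str],
              now: datetime, hours: int = 24) -> Dict[str, object]:
    """Aggregate the dashboard figures over the last `hours` hours."""
    cutoff = now - timedelta(hours=hours)
    recent = [r for r in records if r.timestamp >= cutoff]
    return {
        "total_errors": len(recent),
        "by_severity": dict(Counter(r.severity for r in recent)),
        "disabled_keys": len(disabled_key_ids),
        "by_provider": dict(Counter(r.provider for r in recent)),
    }
```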
| 59 | + |
| 60 | +### Viewing Recent Errors |
| 61 | + |
| 62 | +The recent errors view shows: |
| 63 | +- Which key credential caused the error |
| 64 | +- Error type and HTTP status code |
| 65 | +- Error message from the provider |
| 66 | +- Timestamp of occurrence |
| 67 | +- Whether it was a fatal or warning error |
| 68 | + |
| 69 | +### Managing Disabled Keys |
| 70 | + |
| 71 | +When a key is disabled: |
| 72 | + |
| 73 | +1. **Identify the Issue** |
| 74 | + - Check the error type (Invalid API Key, Insufficient Balance, or Access Forbidden) |
| 75 | + - Review the error message from the provider |
| 76 | + - Verify the key in your provider's dashboard |
| 77 | + |
| 78 | +2. **Resolve the Problem** |
| 79 | + - For **Invalid API Key**: Check the stored key for typos, or generate a new key in your provider's dashboard |
| 80 | + - For **Insufficient Balance**: Add credits to your provider account |
| 81 | + - For **Access Forbidden**: Check API permissions and access level |
| 82 | + |
| 83 | +3. **Re-enable the Key** |
| 84 | + - Navigate to the disabled key |
| 85 | + - Click **Clear Errors & Re-enable** |
| 86 | + - Confirm the action |
| 87 | + |
| 88 | +### Manually Disabling Keys |
| 89 | + |
| 90 | +You can manually disable a key for maintenance: |
| 91 | + |
| 92 | +1. Select the provider key |
| 93 | +2. Click **Disable Key** |
| 94 | +3. Provide a reason (for audit purposes) |
| 95 | +4. The key stops receiving traffic immediately |
| 96 | + |
| 97 | +## Error Retention |
| 98 | + |
| 99 | +- **Fatal errors**: Persisted until manually cleared |
| 100 | +- **Warnings**: Retained for up to 30 days, capped at the 100 most recent per key |
| 101 | +- **Recent error feed**: The 1,000 most recent errors across all providers |
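Capped lists like these are commonly implemented in Redis with `LPUSH` followed by `LTRIM`; whether Conduit uses those exact commands is an assumption. The in-memory sketch below models the same retention rules with `collections.deque(maxlen=...)`; the class and method names are hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Deque, Dict, List, Tuple

WARNING_RETENTION = timedelta(days=30)
WARNINGS_PER_KEY = 100
RECENT_FEED_SIZE = 1000

Entry = Tuple[datetime, str]  # (timestamp, provider message)


class ErrorRetention:
    """Hypothetical in-memory model of the retention rules above."""

    def __init__(self) -> None:
        self.fatal: Dict[str, List[Entry]] = {}       # kept until cleared
        self.warnings: Dict[str, Deque[Entry]] = {}   # newest 100 per key
        self.recent_feed: Deque[Tuple[str, datetime, str]] = deque(maxlen=RECENT_FEED_SIZE)

    def add(self, key_id: str, is_fatal: bool, timestamp: datetime, message: str) -> None:
        entry = (timestamp, message)
        if is_fatal:
            self.fatal.setdefault(key_id, []).append(entry)
        else:
            # deque(maxlen=...) discards the oldest entry when full,
            # much like LPUSH followed by LTRIM in Redis.
            self.warnings.setdefault(key_id, deque(maxlen=WARNINGS_PER_KEY)).append(entry)
        self.recent_feed.append((key_id, timestamp, message))

    def warnings_for(self, key_id: str, now: datetime) -> List[Entry]:
        """Return the key's warnings that are still inside the 30-day window."""
        cutoff = now - WARNING_RETENTION
        return [e for e in self.warnings.get(key_id, ()) if e[0] >= cutoff]

    def clear(self, key_id: str) -> None:
        """In this sketch, clearing a key drops both its fatal and warning history."""
        self.fatal.pop(key_id, None)
        self.warnings.pop(key_id, None)
```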
| 102 | + |
| 103 | +## Provider-Level Disabling |
| 104 | + |
| 105 | +When all keys for a provider are disabled: |
| 106 | +- The entire provider is marked as unavailable |
| 107 | +- Requests will fail over to other providers (if configured) |
| 108 | +- Provider shows "Disabled" status in dashboard |
| 109 | + |
| 110 | +## Best Practices |
| 111 | + |
| 112 | +### Monitoring |
| 113 | + |
| 114 | +1. **Check the dashboard regularly** - Review error trends daily |
| 115 | +2. **Set up alerts** - Use webhook integrations for error notifications |
| 116 | +3. **Watch for patterns** - Sudden spikes may indicate provider issues |
| 117 | + |
| 118 | +### Multiple Keys |
| 119 | + |
| 120 | +1. **Use multiple API keys** - Distribute load and provide redundancy |
| 121 | +2. **Different accounts** - Use keys from separate billing accounts so that one exhausted balance (402) doesn't disable every key |
| 122 | +3. **Primary/Secondary** - Configure a primary key with backup keys as alternatives |
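A primary/secondary setup boils down to picking the highest-priority key that is still enabled. The `ProviderKey` type and its `priority` field below are assumptions for illustration, not Conduit's data model.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ProviderKey:
    key_id: str
    priority: int        # 0 = primary, higher numbers = backups
    enabled: bool = True


def select_key(keys: Iterable[ProviderKey]) -> Optional[ProviderKey]:
    """Return the highest-priority enabled key, or None if every key is disabled."""
    return min((k for k in keys if k.enabled), key=lambda k: k.priority, default=None)
```

When the primary key is auto-disabled, the next call to `select_key` returns the first backup without any extra bookkeeping.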
| 123 | + |
| 124 | +### Error Prevention |
| 125 | + |
| 126 | +1. **Monitor provider balance** - Keep accounts funded |
| 127 | +2. **Rotate keys periodically** - Update credentials before they expire |
| 128 | +3. **Test new keys** - Verify keys work before deploying to production |
| 129 | + |
| 130 | +## Troubleshooting |
| 131 | + |
| 132 | +### Key Won't Re-enable |
| 133 | + |
| 134 | +**Symptoms:** Clicking **Clear Errors & Re-enable** doesn't re-enable the key |
| 135 | + |
| 136 | +**Solutions:** |
| 137 | +- Ensure the underlying issue is actually fixed; otherwise the next failing request can disable the key again |
| 138 | +- Check that the confirmation checkbox is selected |
| 139 | +- Verify you have admin permissions |
| 140 | +- Check browser console for errors |
| 141 | + |
| 142 | +### Errors Not Appearing |
| 143 | + |
| 144 | +**Symptoms:** Errors occur but don't show in dashboard |
| 145 | + |
| 146 | +**Solutions:** |
| 147 | +- Verify Redis is connected and healthy |
| 148 | +- Check that the Admin API is running |
| 149 | +- Ensure the error tracking service is enabled |
| 150 | +- Review Admin API logs for errors |
| 151 | + |
| 152 | +### False Positive Disables |
| 153 | + |
| 154 | +**Symptoms:** Keys disabled but actually valid |
| 155 | + |
| 156 | +**Solutions:** |
| 157 | +- Check whether the provider had a temporary outage |
| 158 | +- Review error timestamps for clustering |
| 159 | +- Consider adjusting thresholds if needed |
| 160 | +- Report recurring patterns to the Conduit team |
| 161 | + |
| 162 | +### High Warning Count |
| 163 | + |
| 164 | +**Symptoms:** Many rate limit warnings, but no keys are disabled and there are no other apparent problems |
| 165 | + |
| 166 | +**Solutions:** |
| 167 | +- Rate limit warnings are informational and never disable keys on their own |
| 168 | +- Consider distributing load across more keys |
| 169 | +- Implement request rate limiting on your side |
| 170 | +- Contact provider for higher rate limits |
| 171 | + |
| 172 | +## API Reference |
| 173 | + |
| 174 | +### View Error Statistics |
| 175 | +```http |
| 176 | +GET /api/provider-errors/stats?hours=24 |
| 177 | +``` |
| 178 | + |
| 179 | +### View Recent Errors |
| 180 | +```http |
| 181 | +GET /api/provider-errors/recent?limit=100 |
| 182 | +``` |
| 183 | + |
| 184 | +### View Specific Key Errors |
| 185 | +```http |
| 186 | +GET /api/provider-errors/keys/{keyId} |
| 187 | +``` |
| 188 | + |
| 189 | +### Clear Errors and Re-enable Key |
| 190 | +```http |
| 191 | +POST /api/provider-errors/keys/{keyId}/clear |
| 192 | +Content-Type: application/json |
| 193 | + |
| 194 | +{ |
| 195 | + "reenableKey": true, |
| 196 | + "confirmReenable": true, |
| 197 | + "reason": "Credits added to account" |
| 198 | +} |
| 199 | +``` |
| 200 | + |
| 201 | +### Manually Disable Key |
| 202 | +```http |
| 203 | +POST /api/provider-errors/keys/{keyId}/disable |
| 204 | +Content-Type: application/json |
| 205 | + |
| 206 | +{ |
| 207 | + "reason": "Scheduled maintenance" |
| 208 | +} |
| 209 | +``` |
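These endpoints can be called from any HTTP client. The sketch below only builds `urllib.request.Request` objects using the paths, query parameters, and JSON bodies listed in this reference. The base URL is a placeholder, authentication is omitted (check the Admin API documentation for the required header), and tying `confirmReenable` to `reenableKey` is an assumption.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "https://conduit-admin.example.com"  # placeholder Admin API address


def build_request(method: str, path: str, params: Optional[Dict[str, Any]] = None,
                  body: Optional[Dict[str, Any]] = None,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    url = base_url.rstrip("/") + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def key_path(key_id: Any, suffix: str = "") -> str:
    return f"/api/provider-errors/keys/{urllib.parse.quote(str(key_id))}{suffix}"


def stats_request(hours: int = 24) -> urllib.request.Request:
    return build_request("GET", "/api/provider-errors/stats", {"hours": hours})


def recent_request(limit: int = 100) -> urllib.request.Request:
    return build_request("GET", "/api/provider-errors/recent", {"limit": limit})


def key_errors_request(key_id: Any) -> urllib.request.Request:
    return build_request("GET", key_path(key_id))


def clear_request(key_id: Any, reason: str, reenable: bool = True) -> urllib.request.Request:
    # Assumption: confirmReenable only needs to be true when re-enabling.
    body = {"reenableKey": reenable, "confirmReenable": reenable, "reason": reason}
    return build_request("POST", key_path(key_id, "/clear"), body=body)


def disable_request(key_id: Any, reason: str) -> urllib.request.Request:
    return build_request("POST", key_path(key_id, "/disable"), body={"reason": reason})
```

Add your authentication header with `req.add_header(...)`, then send the request with `urllib.request.urlopen(req)`.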
| 210 | + |
| 211 | +## Related Documentation |
| 212 | + |
| 213 | +- [Provider Architecture](../architecture/provider-system/provider-architecture.md) |
| 214 | +- [Error Tracking Developer Guide](../development/error-tracking-architecture.md) |
| 215 | +- [Error Tracking Runbook](../operations/error-tracking-runbook.md) |