`cerebrium/getting-started/introduction.mdx` (9 additions, 3 deletions)

````diff
@@ -50,7 +50,11 @@ We can then run this function in the cloud and pass it a prompt.
 cerebrium run main.py::run --prompt "Hello World!"
 ```
 
-Your should see logs that output the prompt you sent in - this is running in the cloud! Let us now turn this into a scalable REST endpoint.
+You should see logs that output the prompt you sent in - this is running in the cloud!
+
+Use the `run` functionality for quick code iteration and testing snippets or once-off scripts that require large CPU/GPU in the cloud.
+
+Let us now turn this into a scalable REST endpoint - something we could put in production!
 
 ### 4. Deploy your app
 
````
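For orientation: the `run` function that `cerebrium run main.py::run` invokes lives in the tutorial's `main.py`. A minimal sketch of what such a function could look like follows; this is a hypothetical reconstruction, and the actual tutorial file may differ.

```python
# main.py -- hypothetical sketch of the function the tutorial runs;
# the real getting-started example may differ.
def run(prompt: str):
    # The value passed via --prompt arrives here; printing it produces
    # the log line mentioned in the docs above.
    print(f"Received prompt: {prompt}")
    return {"result": f"You sent: {prompt}"}
```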
````diff
@@ -60,11 +64,13 @@ Run the following command:
 cerebrium deploy
 ```
 
-This will turn the function into a callable endpoint that accepts json parameters (prompt) and can scale to 1000s of requests automatically!
+This will turn the function into a callable persistent [endpoint](/cerebrium/endpoints/inference-api) that accepts json parameters (prompt) and can scale to 1000s of requests automatically!
 
 Once deployed, an app becomes callable through a POST endpoint `https://api.aws.us-east-1.cerebrium.ai/v4/{project-id}/{app-name}/{function-name}` and takes a json parameter, prompt
 
-Great! You made it! Join our Community [Discord](https://discord.gg/ATj6USmeE2) for support and updates.
+Great! You made it!
+
+Join our Community [Discord](https://discord.gg/ATj6USmeE2) for support and updates.
````
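Calling the deployed endpoint described above could look like the sketch below. The project ID placeholder is left as-is, the app name is hypothetical, and the bearer-token auth header is an assumption; check the dashboard and the inference API docs for the real values.

```python
import requests

# Placeholders -- substitute your own project ID and app name, and an
# API key from the Cerebrium dashboard; the exact auth scheme may differ.
url = "https://api.aws.us-east-1.cerebrium.ai/v4/{project-id}/my-first-app/run"
headers = {"Authorization": "Bearer <API_KEY>"}

# The endpoint takes a json parameter, prompt.
resp = requests.post(url, headers=headers, json={"prompt": "Hello World!"})
print(resp.status_code, resp.json())
```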
`cerebrium/scaling/graceful-termination.mdx` (17 additions, 14 deletions)

````diff
@@ -15,7 +15,7 @@ When Cerebrium needs to terminate a container, we do the following:
 
 1. Stop routing new requests to the container.
 2. Send a SIGTERM signal to your container.
-3. Waits for `response_grace_period` seconds to elaspse.
+3. Waits for `response_grace_period` seconds to elapse.
 4. Sends SIGKILL if the container hasn't stopped
 
 Below is a chart that shows it more eloquently:
@@ -24,30 +24,29 @@ Below is a chart that shows it more eloquently:
 flowchart TD
     A[SIGTERM sent] --> B[Cortex]
     A --> C[Custom Runtime]
-
+
     B --> D[automatically captured]
     C --> E[User needs to capture]
-
+
     D --> F[request finishes]
     D --> G[response_grace_period reached]
-
+
     E --> H[User logic]
-
+
     F --> I[Graceful termination]
     G --> J[SIGKILL]
-
+
     H --> O[Graceful termination]
     H --> G[response_grace_period reached]
-
+
     J --> K[Gateway Timeout Error]
 ```
 
 If you do not handle SIGTERM in the custom runtime, Cerebrium terminates containers immediately after sending `SIGTERM`, which can interrupt in-flight requests and cause **502 errors**.
 
-
 ## Example: FastAPI Implementation
 
-For custom runtimes using FastAPI, implement the [`lifespan` pattern](https://fastapi.tiangolo.com/advanced/events/) to respond to SIGTERM.
+For custom runtimes using FastAPI, implement the [`lifespan` pattern](https://fastapi.tiangolo.com/advanced/events/) to respond to SIGTERM.
 
 The code below tracks active requests using a counter and prevents new requests during shutdown. When SIGTERM is received, it sets a shutdown flag and waits for all active requests to complete before the application terminates.
@@ -111,5 +113,6 @@ exec fastapi run app.py --port ${PORT:-8000}
 Without exec, SIGTERM is sent to the bash script (PID 1) instead of FastAPI, so your shutdown code never runs and Cerebrium force-kills the container after the grace period.
 
 <Tip>
-Test SIGTERM handling locally before deploying: start your app, send SIGTERM with `Ctrl+C`, and verify you see graceful shutdown logs.
-</Tip>
+Test SIGTERM handling locally before deploying: start your app, send SIGTERM
+with `Ctrl+C`, and verify you see graceful shutdown logs.
+</Tip>
````
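The `lifespan` code the guide refers to is not shown in this diff. One possible shape of the pattern it describes (shutdown flag, request counter, drain loop) is sketched below; this is an illustrative reconstruction, not the guide's exact code.

```python
import asyncio
import signal
from contextlib import asynccontextmanager

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

shutting_down = False
active_requests = 0


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Chain our handler in front of whatever the server installed, so we
    # can flag shutdown without suppressing its own graceful exit.
    previous = signal.getsignal(signal.SIGTERM)

    def handle_sigterm(signum, frame):
        global shutting_down
        shutting_down = True
        if callable(previous):
            previous(signum, frame)

    signal.signal(signal.SIGTERM, handle_sigterm)
    yield
    # Runs on shutdown: wait for in-flight requests to drain before exit.
    while active_requests > 0:
        await asyncio.sleep(0.1)


app = FastAPI(lifespan=lifespan)


@app.middleware("http")
async def track_requests(request: Request, call_next):
    global active_requests
    if shutting_down:
        # Refuse new work once SIGTERM has been received.
        return JSONResponse({"detail": "shutting down"}, status_code=503)
    active_requests += 1
    try:
        return await call_next(request)
    finally:
        active_requests -= 1
```

Chaining to the previously installed handler is one way to set the flag without breaking the server's own graceful-shutdown sequence, and it only matters if the signal actually reaches the Python process, which is why the entrypoint in the hunk above uses `exec`.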
`cerebrium/scaling/scaling-apps.mdx` (7 additions, 1 deletion)

````diff
@@ -79,7 +79,13 @@ During normal replica operation, this simply corresponds to a request timeout va
 waits for the specified grace period, issues a SIGKILL command if the instance has not stopped, and kills any active requests with a GatewayTimeout error.
 
 <Note>
-When using the Cortex runtime (default), SIGTERM signals are automatically handled to allow graceful termination of requests. For custom runtimes, you'll need to implement SIGTERM handling yourself to ensure requests complete gracefully before termination. See our [Graceful Termination guide](/cerebrium/scaling/graceful-termination) for detailed implementation examples, including FastAPI patterns for tracking and completing in-flight requests during shutdown.
+When using the Cortex runtime (default), SIGTERM signals are automatically
+handled to allow graceful termination of requests. For custom runtimes, you'll
+need to implement SIGTERM handling yourself to ensure requests complete
+gracefully before termination. See our [Graceful Termination
+guide](/cerebrium/scaling/graceful-termination) for detailed implementation
+examples, including FastAPI patterns for tracking and completing in-flight
+requests during shutdown.
 </Note>
 
 Performance metrics available through the dashboard help monitor scaling behavior:
````