---
title: AssemblyAI
description: Deploy AssemblyAI speech-to-text services on Cerebrium
---

<Note>
  The AssemblyAI Partner Service is available from CLI version 1.51.0 and greater.
</Note>

Cerebrium's partnership with [AssemblyAI](https://www.assemblyai.com/) helps teams deliver speech-to-text (STT) services with efficient deployment, minimized latency, and region selection for data privacy compliance needs.

## Setup

1. Contact the AssemblyAI team to obtain a self-hosted contract. You can contact them [here]([email protected]).

2. Create a simple Cerebrium app with the CLI:

```bash
cerebrium init assembly-ai
```

3. AssemblyAI services use a simplified TOML configuration with the `[cerebrium.runtime.assemblyai]` section. Create a `cerebrium.toml` file with the following:

```toml
[cerebrium.deployment]
name = "assembly-ai"
disable_auth = true

[cerebrium.runtime.assemblyai]
port = 8080
model_name = "english"

[cerebrium.hardware]
cpu = 4
memory = 16
compute = "AMPERE_A10"
gpu_count = 1
region = "us-east-1"

[cerebrium.scaling]
min_replicas = 1
max_replicas = 3
cooldown = 120
replica_concurrency = 32
scaling_metric = "concurrency_utilization"
scaling_target = 70
```

<Note>
  The configuration above disables auth, meaning anyone can make requests to your endpoint. If you set
  `disable_auth = false`, you need to include the API key from your Cerebrium Dashboard in your requests.
</Note>
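
If you do enable auth, each request needs the API key attached. A minimal sketch of building the headers (the Bearer scheme in the `Authorization` header is an assumption — confirm the exact scheme against your Cerebrium Dashboard):

```python
import os


def auth_headers(api_key: str) -> dict:
    # Assumption: Cerebrium expects a Bearer token in the Authorization header.
    # Verify the exact scheme in your Cerebrium Dashboard before relying on it.
    return {"Authorization": f"Bearer {api_key}"}


# Read the key from the environment rather than hard-coding it.
headers = auth_headers(os.environ.get("CEREBRIUM_API_KEY", "<YOUR_API_KEY>"))
```

Pass these headers when opening the WebSocket connection, using whatever header option your WebSocket client exposes.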

4. Run `cerebrium deploy` to deploy the AssemblyAI service. The output should appear as follows:

```
App Dashboard: https://dashboard.cerebrium.ai/projects/p-xxxxxxxx/apps/p-xxxxxxxx-assembly-ai
```

5. Use the deployment URL from the output to send requests to the WebSocket (<b>WS</b>) AssemblyAI service. You can use their [example repo](https://github.com/AssemblyAI/streaming-self-hosting-stack) to test that it's working:

```bash
git clone https://github.com/AssemblyAI/streaming-self-hosting-stack.git
cd streaming_example && python example_with_prerecorded_audio_file.py --audio-file example_audio_file.wav --endpoint wss://api.aws.us-east-1.cerebrium.ai/v4/p-xxxxxx/assembly-ai --language english
```
You should then see the following output:
```
0:00:01.040000-0:00:01.200000, end-of-turn: False: it's true
0:00:01.040000-0:00:01.280000, end-of-turn: False: it's true that
0:00:01.040000-0:00:01.600000, end-of-turn: False: it's true that assem
0:00:01.040000-0:00:01.680000, end-of-turn: False: it's true that assembly
0:00:01.040000-0:00:02.080000, end-of-turn: False: it's true that assembly a
0:00:01.040000-0:00:02.160000, end-of-turn: False: it's true that assembly ai
0:00:01.040000-0:00:02.320000, end-of-turn: False: it's true that assembly ai lets
0:00:01.040000-0:00:02.400000, end-of-turn: False: it's true that assembly ai lets you
0:00:01.040000-0:00:02.560000, end-of-turn: False: it's true that assembly ai lets you build
```
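
If you want to consume this output programmatically, each line can be parsed into structured fields. A minimal sketch, assuming the exact `start-end, end-of-turn: <bool>: <text>` format shown above:

```python
def parse_line(line: str):
    # Split off the time range, then the end-of-turn flag, leaving the transcript.
    times, rest = line.split(", end-of-turn: ", 1)
    flag, text = rest.split(": ", 1)
    start, end = times.split("-", 1)
    return start, end, flag == "True", text


start, end, is_final, text = parse_line(
    "0:00:01.040000-0:00:02.560000, end-of-turn: False: it's true that assembly ai lets you build"
)
```

Lines with `end-of-turn: True` mark completed turns; the partials before them are interim hypotheses that keep growing as audio streams in.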


## Scaling and Concurrency

AssemblyAI services support independent scaling configurations:

- **min_replicas**: Minimum instances to maintain (0 for scale-to-zero). Recommended: 1.
- **max_replicas**: Maximum instances during high load.
- **replica_concurrency**: Concurrent requests per instance. Recommended: 32, as in the example configuration above.
- **cooldown**: Seconds an instance remains active after the last request. Recommended: 120, as in the example configuration above.
- **compute**: Instance type. Recommended: `AMPERE_A10`.

Adjust these parameters based on traffic patterns and latency requirements. For guidance on concurrency and scalability, consult the AssemblyAI team.
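
As a rough sanity check on capacity, you can work the arithmetic from the example configuration (`max_replicas = 3`, `replica_concurrency = 32`, `scaling_target = 70`). The exact autoscaler behaviour is Cerebrium's, so treat this as an estimate, not a guarantee:

```python
max_replicas = 3
replica_concurrency = 32  # concurrent streams per replica
scaling_target = 70       # percent concurrency utilization

# Peak number of concurrent streams the deployment can serve.
peak_streams = max_replicas * replica_concurrency  # 3 * 32 = 96

# Approximate per-replica load at which autoscaling adds a replica.
scale_up_at = replica_concurrency * scaling_target / 100  # 32 * 0.70 = 22.4
```

If your expected peak exceeds `peak_streams`, raise `max_replicas` or `replica_concurrency` accordingly.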

For further documentation on AssemblyAI, see the [AssemblyAI documentation](https://www.assemblyai.com/docs/deployment/self-hosted-streaming#getting-the-latest-instructions).