v1.1 (#66)

Picovoice · Feb 25, 2025 · 42be884
1 parent 6cacd0e
commit 42be884

Showing 117 changed files with 715 additions and 391 deletions.
diff --git a/.github/workflows/nodejs-perf.yml b/.github/workflows/nodejs-perf.yml
@@ -63,7 +63,7 @@ jobs:
           - machine: rpi3-32
             proc_performance_threshold_sec: 2.9
           - machine: rpi3-64
-            proc_performance_threshold_sec: 2.1
+            proc_performance_threshold_sec: 2.5
           - machine: rpi4-32
             proc_performance_threshold_sec: 1.3
           - machine: rpi4-64

diff --git a/.github/workflows/web-demos.yml b/.github/workflows/web-demos.yml
@@ -25,7 +25,7 @@ jobs:
 
     strategy:
       matrix:
-        node-version: [ 16.x, 18.x, 20.x ]
+        node-version: [ 18.x, 20.x, 22.x ]
 
     steps:
       - uses: actions/checkout@v3

diff --git a/.github/workflows/web.yml b/.github/workflows/web.yml
@@ -29,7 +29,7 @@ jobs:
 
     strategy:
       matrix:
-        node-version: [ 16.x, 18.x, 20.x ]
+        node-version: [ 18.x, 20.x, 22.x ]
 
     steps:
       - uses: actions/checkout@v3

diff --git a/README.md b/README.md
@@ -32,7 +32,7 @@ voice assistants. Orca is:
         - [Orca streaming text synthesis](#orca-input-and-output-streaming-synthesis)
         - [Text input](#text-input)
         - [Custom pronunciations](#custom-pronunciations)
-        - [Voices](#voices)
+        - [Language and Voice](#language-and-voice)
         - [Speech control](#speech-control)
         - [Audio output](#audio-output)
     - [AccessKey](#accesskey)
@@ -93,17 +93,11 @@ The following are examples of sentences using custom pronunciations:
 - "{read|R IY D} this as {read|R EH D}, please."
 - "I {live|L IH V} in {Sevilla|S EH V IY Y AH}. We have great {live|L AY V} sports!"
 
-### Voices
+### Language and Voice
 
-Orca can synthesize speech with various voices, each of which is characterized by a model file located
-in  [lib/common](./lib/common).
-To synthesize speech with a specific voice, provide the associated model file as an argument to the orca init function.
-The following are the voices currently available:
+Orca Streaming Text-to-Speech can synthesize speech in different languages and with a variety of voices, each of which is characterized by a model file (`.pv`) located in [lib/common](./lib/common). The language and gender of the speaker is indicated in the file name.
 
-|                        Model name                         | Sample rate (Hz) |
-|:---------------------------------------------------------:|:----------------:|
-| [orca_params_female.pv](lib/common/orca_params_female.pv) |      22050       |
-|   [orca_params_male.pv](lib/common/orca_params_male.pv)   |      22050       |
+To synthesize speech with a specific language and voice, provide the associated model file as an argument to the Orca init function.
 
 ### Speech control
 
@@ -779,7 +773,14 @@ For more details, see the [Node.js SDK](./binding/nodejs/).
 
 ## Releases
 
-### v1.0.0 - Aug 20th, 2024
+### v1.1.0 - February 24th, 2025
+
+- Added support for Spanish voices
+- Improved English voices
+- Added .NET SDK
+- Improved text normalization
+
+### v1.0.0 - August 20th, 2024
 
 - Improved voice quality
 - Significantly reduced latency in streaming synthesis
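
Reviewer note on the README.md change above: the new "Language and Voice" text says the language and gender of the speaker are indicated in the model file name, and the files renamed in this commit follow the shape `orca_params_en_female.pv` / `orca_params_en_male.pv`. A minimal sketch of reading those two fields back out of a file name is shown below. `OrcaModelName` is a hypothetical helper that is not part of the Orca SDK, and the pattern `orca_params_<language>_<gender>.pv` with a two-letter language code is an assumption inferred only from the two English file names visible in this diff.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of the Orca SDK. */
final class OrcaModelName {
    // Assumed naming scheme: orca_params_<language>_<gender>.pv
    // (inferred from orca_params_en_female.pv and orca_params_en_male.pv).
    private static final Pattern NAME =
            Pattern.compile("orca_params_([a-z]{2})_(female|male)\\.pv");

    final String language;
    final String gender;

    private OrcaModelName(String language, String gender) {
        this.language = language;
        this.gender = gender;
    }

    /** Parses a bare file name (no directory part) or throws if it does not match. */
    static OrcaModelName parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized model file name: " + fileName);
        }
        return new OrcaModelName(m.group(1), m.group(2));
    }
}
```

Under this assumed scheme, the pre-1.1 name `orca_params_female.pv` is rejected, which is one way a caller could notice that it still points at an old model file.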

diff --git a/binding/android/Orca/orca/build.gradle b/binding/android/Orca/orca/build.gradle
@@ -2,7 +2,7 @@ apply plugin: 'com.android.library'
 
 ext {
     PUBLISH_GROUP_ID = 'ai.picovoice'
-    PUBLISH_VERSION = '1.0.0'
+    PUBLISH_VERSION = '1.1.0'
     PUBLISH_ARTIFACT_ID = 'orca-android'
 }
 

diff --git a/binding/android/OrcaTestApp/orca-test-app/build.gradle b/binding/android/OrcaTestApp/orca-test-app/build.gradle
@@ -71,8 +71,8 @@ android {
 
     tasks.register('copyParams', Copy) {
         from("$projectDir/../../../../lib/common/")
-        include("orca_params_female.pv")
-        include("orca_params_male.pv")
+        include("orca_params_en_female.pv")
+        include("orca_params_en_male.pv")
         into("$projectDir/src/main/assets/models")
     }
 
@@ -113,7 +113,7 @@ dependencies {
     implementation 'androidx.constraintlayout:constraintlayout:2.1.4'
     implementation 'com.google.code.gson:gson:2.10'
     implementation 'com.google.errorprone:error_prone_annotations:2.36.0'
-    implementation 'ai.picovoice:orca-android:1.0.0'
+    implementation 'ai.picovoice:orca-android:1.1.0'
 
     // Espresso UI Testing
     androidTestImplementation 'androidx.test.ext:junit:1.1.5'

diff --git a/...TestApp/orca-test-app/src/androidTest/java/ai/picovoice/orca/testapp/PerformanceTest.java b/...TestApp/orca-test-app/src/androidTest/java/ai/picovoice/orca/testapp/PerformanceTest.java
@@ -38,6 +38,9 @@ public class PerformanceTest extends BaseTest {
     @Parameterized.Parameter(value = 0)
     public String modelFile;
 
+    @Parameterized.Parameter(value = 1)
+    public String procSentence;
+
     @Parameterized.Parameters(name = "{0}")
     public static Collection<Object[]> initParameters() throws IOException {
         String testDataJsonString = getTestDataString();
@@ -48,10 +51,12 @@ public static Collection<Object[]> initParameters() throws IOException {
         final JsonArray testCases = testDataJson.getAsJsonObject("tests").get("sentence_tests").getAsJsonArray();
         JsonObject testCase = testCases.get(0).getAsJsonObject();
 
+        String text = testCase.get("text").getAsString();
+
         List<Object[]> parameters = new ArrayList<>();
         for (JsonElement modelJson : testCase.get("models").getAsJsonArray()) {
             String model = modelJson.getAsString();
-            parameters.add(new Object[]{model});
+            parameters.add(new Object[]{model, text});
         }
         return parameters;
     }
@@ -76,13 +81,9 @@ public void testProcPerformance() throws Exception {
         Assume.assumeFalse(procThresholdString.equals(""));
 
         final double procPerformanceThresholdSec = Double.parseDouble(procThresholdString);
-        final String procSentence = testJson
-                .getAsJsonObject("test_sentences")
-                .get("text")
-                .getAsString();
         final Orca orca = new Orca.Builder()
                 .setAccessKey(accessKey)
-                .setModelPath(modelFile)
+                .setModelPath(getModelFilepath(modelFile))
                 .build(appContext);
 
         long totalNSec = 0;
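
Reviewer note on the PerformanceTest.java change above: each `Object[]` row returned by `initParameters()` now carries two values, and JUnit 4's `Parameterized` runner injects element 0 into the field annotated `@Parameterized.Parameter(value = 0)` (`modelFile`) and element 1 into the field annotated `value = 1` (`procSentence`). The row construction can be sketched without Gson or JUnit as follows. `ParameterRows` and its `build` method are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical illustration of the row layout; not code from the repository. */
final class ParameterRows {
    /**
     * Pairs every model name with the same test sentence.
     * Index 0 of each row is the model, index 1 is the sentence,
     * matching the two @Parameterized.Parameter indices in the test.
     */
    static Collection<Object[]> build(List<String> models, String text) {
        List<Object[]> rows = new ArrayList<>();
        for (String model : models) {
            rows.add(new Object[]{model, text});
        }
        return rows;
    }
}
```

Because the sentence travels with each row, the test method no longer has to look it up from the JSON itself, which is what the removed `test_sentences` lookup in the last hunk did.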

diff --git a/...droid/OrcaTestApp/orca-test-app/src/main/java/ai/picovoice/orca/testapp/MainActivity.java b/...droid/OrcaTestApp/orca-test-app/src/main/java/ai/picovoice/orca/testapp/MainActivity.java
@@ -1,5 +1,5 @@
 /*
-    Copyright 2024 Picovoice Inc.
+    Copyright 2024-2025 Picovoice Inc.
 
     You may not use this file except in compliance with the license. A copy of the license is
     located in the "LICENSE" file accompanying this source.
@@ -61,7 +61,7 @@ public void runTest() {
 
         ArrayList<TestResult> results = new ArrayList<>();
 
-        final String modelFile = "models/orca_params_female.pv";
+        final String modelFile = "models/orca_params_en_female.pv";
 
         TestResult result = new TestResult();
         result.testName = "Test Init";

diff --git a/binding/android/README.md b/binding/android/README.md
@@ -137,10 +137,11 @@ The pronunciation is expressed in [ARPAbet](https://en.wikipedia.org/wiki/ARPABE
 - "{read|R IY D} this as {read|R EH D}, please."
 - "I {live|L IH V} in {Sevilla|S EH V IY Y AH}. We have great {live|L AY V} sports!"
 
-### Voices
+### Language and Voice
 
-Orca can synthesize speech with various voices, each of which is characterized by a model file located
-in [lib/common](../../lib/common).
+Orca Streaming Text-to-Speech can synthesize speech in different languages and with a variety of voices, 
+each of which is characterized by a model file (`.pv`) located in [lib/common](../../lib/common). 
+The language and gender of the speaker is indicated in the file name.
 
 To add the Orca model file to your Android application:
 

diff --git a/binding/dotnet/Orca/Orca.cs b/binding/dotnet/Orca/Orca.cs
@@ -404,6 +404,7 @@ public short[] Synthesize(string text)
                     pv_orca_pcm_delete(cPcm);
 
                 }
+
                 return pcm;
             }
 
@@ -437,9 +438,14 @@ public short[] Flush()
                     HandlePvStatus(status, "Orca stream flush failed");
                 }
 
-                short[] pcm = new short[numSamples];
-                Marshal.Copy(cPcm, pcm, 0, numSamples);
-                pv_orca_pcm_delete(cPcm);
+                short[] pcm = null;
+                if (numSamples > 0)
+                {
+                    pcm = new short[numSamples];
+                    Marshal.Copy(cPcm, pcm, 0, numSamples);
+                    pv_orca_pcm_delete(cPcm);
+
+                }
 
                 return pcm;
             }

diff --git a/binding/dotnet/Orca/Orca.csproj b/binding/dotnet/Orca/Orca.csproj
@@ -1,7 +1,7 @@
 <Project Sdk="Microsoft.NET.Sdk">
     <PropertyGroup>
         <TargetFrameworks>net8.0;net6.0;netcoreapp3.0;netstandard2.0</TargetFrameworks>
-        <Version>1.0.0</Version>
+        <Version>1.1.0</Version>
         <Authors>Picovoice</Authors>
         <Company />
         <Product>Orca Streaming Text-to-Speech Engine</Product>
@@ -100,12 +100,12 @@
         </Content>
     </ItemGroup>
     <ItemGroup>
-        <Content Include="..\..\..\lib\common\orca_params_female.pv">
+        <Content Include="..\..\..\lib\common\orca_params_en_female.pv">
             <PackagePath>
-                buildTransitive/common/orca_params_female.pv;
+                buildTransitive/common/orca_params_en_female.pv;
             </PackagePath>
             <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
-            <Link>lib\common\orca_params_female.pv</Link>
+            <Link>lib\common\orca_params_en_female.pv</Link>
             <Visible>false</Visible>
         </Content>
     </ItemGroup>

diff --git a/binding/dotnet/Orca/Picovoice.Orca.netstandard2.0.targets b/binding/dotnet/Orca/Picovoice.Orca.netstandard2.0.targets
@@ -21,9 +21,9 @@
 			<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
 			<Visible>false</Visible>
 		</Content>
-		<Content Include="$(MSBuildThisFileDirectory)/../common/orca_params_female.pv">
-			<Link>lib/common/orca_params_female.pv</Link>
-			<PackagePath>content/picovoice/common/orca_params_female.pv</PackagePath>
+		<Content Include="$(MSBuildThisFileDirectory)/../common/orca_params_en_female.pv">
+			<Link>lib/common/orca_params_en_female.pv</Link>
+			<PackagePath>content/picovoice/common/orca_params_en_female.pv</PackagePath>
 			<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
 			<Visible>false</Visible>
 		</Content>

diff --git a/binding/dotnet/Orca/Picovoice.Orca.targets b/binding/dotnet/Orca/Picovoice.Orca.targets
@@ -6,9 +6,9 @@
 			<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
 			<Visible>false</Visible>
 		</Content>
-		<Content Include="$(MSBuildThisFileDirectory)/../common/orca_params_female.pv">
-			<Link>lib/common/orca_params_female.pv</Link>
-            <PackagePath>content/picovoice/common/orca_params_female.pv</PackagePath>
+		<Content Include="$(MSBuildThisFileDirectory)/../common/orca_params_en_female.pv">
+			<Link>lib/common/orca_params_en_female.pv</Link>
+            <PackagePath>content/picovoice/common/orca_params_en_female.pv</PackagePath>
 			<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
 			<Visible>false</Visible>
 		</Content>

diff --git a/binding/dotnet/Orca/Utils.cs b/binding/dotnet/Orca/Utils.cs
@@ -136,7 +136,7 @@ private static string GetCpuPart()
 
         public static string PvModelPath()
         {
-            return Path.Combine(AppContext.BaseDirectory, "lib/common/orca_params_female.pv");
+            return Path.Combine(AppContext.BaseDirectory, "lib/common/orca_params_en_female.pv");
         }
     }
 }
diff --git a/binding/dotnet/README.md b/binding/dotnet/README.md
@@ -161,10 +161,12 @@ The pronunciation is expressed in [ARPAbet](https://en.wikipedia.org/wiki/ARPABE
 - "{read|R IY D} this as {read|R EH D}, please."
 - "I {live|L IH V} in {Sevilla|S EH V IY Y AH}. We have great {live|L AY V} sports!"
 
-### Voices
+### Language and Voice
+
+Orca Streaming Text-to-Speech can synthesize speech in different languages and with a variety of voices,
+each of which is characterized by a model file (`.pv`) located in [lib/common](../../lib/common).
+The language and gender of the speaker is indicated in the file name.
 
-Orca can synthesize speech with various voices, each of which is characterized by a model file located
-in [lib/common](https://github.com/Picovoice/orca/tree/main/lib/common).
 To create an instance of the engine with a specific voice, use:
 
 ```csharp

diff --git a/binding/ios/Orca-iOS.podspec b/binding/ios/Orca-iOS.podspec
@@ -1,7 +1,7 @@
 Pod::Spec.new do |s|
     s.name = 'Orca-iOS'
     s.module_name = 'Orca'
-    s.version = '1.0.1'
+    s.version = '1.1.0'
     s.license = {:type => 'Apache 2.0'}
     s.summary = 'iOS binding for Picovoice\'s Orca Text-to-Speech Engine.'
     s.description =
@@ -11,7 +11,7 @@ Pod::Spec.new do |s|
     Orca is an on-device text-to-speech engine producing high-quality, realistic, spoken audio with zero latency. Orca is:
       - Private; All voice processing runs locally.
       - Cross-Platform:
-        - Linux (x86_64), macOS (x86_64, arm64), Windows (x86_64)
+        - Linux (x86_64), macOS (x86_64, arm64), Windows (x86_64, arm64)
         - Android and iOS
         - Chrome, Safari, Firefox, and Edge
         - Raspberry Pi (3, 4, 5)

diff --git a/binding/ios/Orca.swift b/binding/ios/Orca.swift
@@ -235,7 +235,7 @@ public class Orca {
         }
 
         var cNumCharacters: Int32 = 0
-        var cCharacters: UnsafeMutablePointer<UnsafePointer<Int8>?>?
+        var cCharacters: UnsafePointer<UnsafePointer<Int8>?>?
         let validCharactersStatus = pv_orca_valid_characters(handle, &cNumCharacters, &cCharacters)
         if validCharactersStatus != PV_STATUS_SUCCESS {
             let messageStack = try getMessageStack()

diff --git a/binding/ios/OrcaAppTest/OrcaAppTest.xcodeproj/project.pbxproj b/binding/ios/OrcaAppTest/OrcaAppTest.xcodeproj/project.pbxproj
@@ -236,7 +236,7 @@
 			);
 			mainGroup = 1E00643F27CEDF9B006FF6E9;
 			packageReferences = (
-				E1F352FD2D00ECD60069B0E6 /* XCRemoteSwiftPackageReference "orca" */,
+				07F723562D6D57E90002D88F /* XCRemoteSwiftPackageReference "orca" */,
 			);
 			productRefGroup = 1E00644927CEDF9B006FF6E9 /* Products */;
 			projectDirPath = "";
@@ -655,12 +655,12 @@
 /* End XCLocalSwiftPackageReference section */
 
 /* Begin XCRemoteSwiftPackageReference section */
-		E1F352FD2D00ECD60069B0E6 /* XCRemoteSwiftPackageReference "orca" */ = {
+		07F723562D6D57E90002D88F /* XCRemoteSwiftPackageReference "orca" */ = {
 			isa = XCRemoteSwiftPackageReference;
 			repositoryURL = "https://github.com/Picovoice/orca";
 			requirement = {
 				kind = exactVersion;
-				version = 1.0.1;
+				version = 1.1.0;
 			};
 		};
 /* End XCRemoteSwiftPackageReference section */

diff --git a/binding/ios/OrcaAppTest/OrcaAppTestUITests/OrcaAppTestUITests.swift b/binding/ios/OrcaAppTest/OrcaAppTestUITests/OrcaAppTestUITests.swift
@@ -412,7 +412,7 @@ class OrcaAppTestUITests: BaseTest {
     func testMessageStack() throws {
         let bundle = Bundle(for: type(of: self))
         let modelPath: String = bundle.path(
-                forResource: "orca_params_female",
+                forResource: "orca_params_en_female",
                 ofType: "pv",
                 inDirectory: "test_resources/model_files")!
 
@@ -436,7 +436,7 @@ class OrcaAppTestUITests: BaseTest {
     func testSynthesizeMessageStack() throws {
         let bundle = Bundle(for: type(of: self))
         let modelPath: String = bundle.path(
-                forResource: "orca_params_female",
+                forResource: "orca_params_en_female",
                 ofType: "pv",
                 inDirectory: "test_resources/model_files")!
 

diff --git a/binding/ios/README.md b/binding/ios/README.md
@@ -130,11 +130,13 @@ The pronunciation is expressed in [ARPAbet](https://en.wikipedia.org/wiki/ARPABE
 - "{read|R IY D} this as {read|R EH D}, please."
 - "I {live|L IH V} in {Sevilla|S EH V IY Y AH}. We have great {live|L AY V} sports!"
 
-### Voices
+### Language and Voice
 
-Orca can synthesize speech with various voices, each of which is characterized by a model file located
-in [lib/common](https://github.com/Picovoice/orca/tree/main/lib/common).
-To create an instance of the engine with a specific voice, use:
+Orca Streaming Text-to-Speech can synthesize speech in different languages and with a variety of voices, 
+each of which is characterized by a model file (`.pv`) located in [lib/common](../../lib/common). 
+The language and gender of the speaker is indicated in the file name.
+
+To create an instance of the engine with a specific language and voice, use:
 
 ```swift
 import Orca
@@ -146,7 +148,7 @@ do {
 } catch { }
 ```
 
-and replace `${MODEL_FILE_PATH}` or `${MODEL_FILE_URL}` with the path to the model file with the desired voice.
+and replace `${MODEL_FILE_PATH}` or `${MODEL_FILE_URL}` with the path to the model file with the desired language/voice.
 
 ### Speech control
 

diff --git a/binding/nodejs/README.md b/binding/nodejs/README.md
@@ -105,7 +105,7 @@ orca.release()
 
 ### Text input
 
-Orca supports a wide range of English characters, including letters, numbers, symbols, and punctuation marks. 
+Orca supports a wide range of English characters, including letters, numbers, symbols, and punctuation marks.
 You can get a list of all supported characters by calling `validCharacters()`.
 Pronunciations of characters or words not supported by this list can be achieved with
 [custom pronunciations](#custom-pronunciations).
@@ -119,10 +119,12 @@ The pronunciation is expressed in [ARPAbet](https://en.wikipedia.org/wiki/ARPABE
 - "{read|R IY D} this as {read|R EH D}, please."
 - "I {live|L IH V} in {Sevilla|S EH V IY Y AH}. We have great {live|L AY V} sports!"
 
-### Voices
+### Language and Voice
+
+Orca Streaming Text-to-Speech can synthesize speech in different languages and with a variety of voices,
+each of which is characterized by a model file (`.pv`) located in [lib/common](../../lib/common).
+The language and gender of the speaker is indicated in the file name.
 
-Orca can synthesize speech with various voices, each of which is characterized by a model file located
-in [lib/common](https://github.com/Picovoice/orca/tree/main/lib/common).
 To create an instance of the engine with a specific voice, use:
 
 ```typescript