This project combines speech recognition, AI-based text processing, and text-to-speech capabilities to create a pipeline for converting spoken input into processed speech output.
- Clone the repository to your local machine.
- Install the required dependencies with the command:
  pip install -r requirements.txt
- Create a .env file in the project root directory with the following contents:
  DG_API_KEY=your_deepgram_api_key
  OPENAI_API_KEY=your_openai_api_key
Note: The project requires Python 3.10 or higher. For pyaudio, you may need to install the portaudio library. Instructions can be found here.
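A quick preflight check can catch a missing key or an old interpreter before any API call is made. The following is a minimal sketch, not part of the project: the helper names `missing_keys`, `python_ok`, and `preflight_report` are hypothetical, and it assumes the values from the .env file have already been loaded into the process environment (for example by the project itself or by a loader such as python-dotenv).

```python
import os
import sys
from typing import Mapping, Sequence

# Variable names taken from the .env template above.
REQUIRED_KEYS = ("DG_API_KEY", "OPENAI_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or blank."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


def python_ok(version: Sequence[int] = sys.version_info[:2]) -> bool:
    """True when the interpreter is Python 3.10 or newer."""
    return tuple(version) >= (3, 10)


def preflight_report(
    env: Mapping[str, str] = os.environ,
    version: Sequence[int] = sys.version_info[:2],
) -> list[str]:
    """Collect human-readable problems; an empty list means ready to run."""
    problems = []
    if not python_ok(version):
        problems.append("Python 3.10 or higher is required.")
    for key in missing_keys(env):
        problems.append(f"Environment variable {key} is not set.")
    return problems


# Print any problems found in the current environment (prints nothing if ready).
for problem in preflight_report():
    print(problem)
```

Returning a list of problems rather than exiting keeps the check easy to call from a script or a test.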
- Import the DOA class from main.py:
import asyncio

from main import DOA

doa = DOA('summarise this text')

# sample input shared by the summary and text-to-speech examples
SAMPLE_TEXT = (
    "I was going to the beach where I encountered Sally selling sea shells "
    "by the sea shore. I bought 2 shells for 2 dollars each and got 2 crowns."
)

# speech to text
def stt():
    result = doa.deepgram_util.start_transcription()
    print(result)

# get summary (async function, so it must be awaited or run with asyncio.run)
async def summary():
    result = await doa.openai_util.Action(SAMPLE_TEXT)
    print(result)

# text to speech
def tts():
    result = doa.openai_util.stream_audio(SAMPLE_TEXT)
    print(result)

def main():
    # cumulative function that runs all three steps
    asyncio.run(doa.start())

if __name__ == "__main__":
    main()
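The usage above mixes blocking calls (speech to text, text to speech) with an async call (the summary). One way such steps could be chained by hand is sketched below. This is an illustration under stated assumptions, not how `doa.start()` is implemented: `fake_transcribe`, `fake_summarise`, and `fake_speak` are hypothetical stand-ins for the real DOA methods so that the example runs without a microphone or API keys.

```python
import asyncio


# Hypothetical stand-in for a blocking speech-to-text call.
def fake_transcribe() -> str:
    return "I bought two shells for two dollars each."


# Hypothetical stand-in for an async summarisation call.
async def fake_summarise(text: str) -> str:
    await asyncio.sleep(0)  # placeholder for a network round trip
    return f"Summary: {text}"


# Hypothetical stand-in for a blocking text-to-speech call.
def fake_speak(text: str) -> str:
    return f"[spoken] {text}"


async def run_pipeline() -> str:
    # Blocking steps run in a worker thread so the event loop stays responsive.
    text = await asyncio.to_thread(fake_transcribe)
    summary = await fake_summarise(text)
    return await asyncio.to_thread(fake_speak, summary)


if __name__ == "__main__":
    print(asyncio.run(run_pipeline()))
```

`asyncio.to_thread` is available from Python 3.9, which fits the project's Python 3.10 requirement.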
- The user has valid API keys for both DeepGram and OpenAI services.
- The system has a working microphone for speech input.
- The system has audio output capabilities for text-to-speech playback.
- The user has a stable internet connection for API calls.
- API Rate Limits:
- Both DeepGram and OpenAI have rate limits. Excessive usage may lead to temporary service interruptions.
- Microphone Access:
- The program may fail if it cannot access the system's microphone or if the microphone is not working properly.
- Audio Playback:
- Issues with the system's audio output could prevent the text-to-speech functionality from working correctly.
- Network Connectivity:
- Poor internet connection may cause delays or failures in API calls.
- Environment Variables:
- If the .env file is not set up correctly or the API keys are invalid, the program will fail to authenticate with the services.
- Dependency Conflicts:
- Ensure all dependencies are installed and compatible with your Python version.
- Asynchronous Execution:
- Improper handling of asynchronous functions may lead to unexpected behavior or errors.
- Language Support:
- The current setup is optimized for English. Using other languages may require adjustments to the DeepGram and OpenAI API calls.
- Resource Usage:
- Continuous use of speech recognition and audio streaming may consume significant system resources and battery life on portable devices.
- Error Handling:
- While basic error handling is implemented, some edge cases may not be fully covered.
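Rate limits and flaky connections, both listed above, are commonly handled by retrying with exponential backoff. The helper below is a generic sketch and is not taken from the project: `call_with_retries` and its parameters are hypothetical, and it catches every `Exception` for brevity, whereas real code should catch only the specific rate-limit or network errors raised by the client library in use.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on failure with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            # Out of retries: let the last error propagate to the caller.
            if attempt == attempts - 1:
                raise
            # Delay doubles each attempt (0.5s, 1s, 2s, ...) with small jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    raise ValueError("attempts must be at least 1")
```

A call such as `call_with_retries(lambda: some_api_call(text))` would wrap a single request, where `some_api_call` stands for whichever function performs it. The `sleep` parameter exists so the delay can be replaced in tests.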
- Work is under way to publish this project to PyPI and to add more features.
- A more performant solution that streams chunk by chunk is still under development and can be found in utils/wip/ou2.py.
- The project is still in beta and may have some issues. Please report any bugs or suggestions for improvement.
- The project is for educational purposes only and should not be used in critical applications without proper testing and validation.