diff --git a/.env-sample b/.env-sample
new file mode 100644
index 00000000..b0f4dffb
--- /dev/null
+++ b/.env-sample
@@ -0,0 +1,6 @@
+OPENAI_API_KEY=YOUR_OPENAI_API_KEY
+GOOGLE_API_KEY=YOUR_GOOGLE_API_KEY
+OLLAMA_HOST='http://localhost:11434/'
+CUSTOM_SYSTEM_PROMPT='You are a helpful assistant that can perform tasks on a computer. You can execute commands, answer questions, and assist with various tasks. Your goal is to help the user efficiently and effectively.'
+OPENROUTER_API_KEY='sddfsdf'
+OPENROUTER_MODEL="google/gemini-2.0-flash-exp:free"
\ No newline at end of file
diff --git a/.github/workflows/python-publish.yml b/.github/workflows/python-publish.yml
new file mode 100644
index 00000000..f25d2162
--- /dev/null
+++ b/.github/workflows/python-publish.yml
@@ -0,0 +1,48 @@
+name: Publish Python Package
+
+on:
+  release:
+    types: [published]
+
+permissions:
+  contents: read
+  id-token: write # Required for PyPI trusted publishing
+
+jobs:
+  publish:
+    runs-on: ubuntu-latest
+    environment:
+      name: pypi
+      url: https://pypi.org/p/self-ai-operating-computer
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.x'
+
+      - name: Install build tools
+        run: python -m pip install build setuptools wheel
+
+      - name: Build package
+        run: python -m build
+
+      - name: Check if version exists on PyPI
+        id: check-version
+        run: |
+          VERSION=$(python setup.py --version)
+          if curl -s "https://pypi.org/pypi/self-ai-operating-computer/$VERSION/json" | grep -q "Not Found"; then
+            echo "version_exists=false" >> $GITHUB_OUTPUT
+          else
+            echo "version_exists=true" >> $GITHUB_OUTPUT
+            echo "Version $VERSION already exists on PyPI. Skipping upload."
+            exit 1
+          fi
+
+      - name: Publish to PyPI
+        if: steps.check-version.outputs.version_exists == 'false'
+        uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          packages-dir: dist/
diff --git a/.gitignore b/.gitignore
index f23fd690..adeb97b4 100644
--- a/.gitignore
+++ b/.gitignore
@@ -162,5 +162,6 @@ cython_debug/
 .DS_Store
 
 # Avoid sending testing screenshots up
-*.png
+# *.png
 operate/screenshots/
+screenshots/*
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 1d4ba68c..717920fb 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -31,8 +31,8 @@ It is recommended that a screenshot of the `evaluate.py` output is included in a
 - **Improve the `SUMMARY_PROMPT`**
 - **Improve Linux and Windows compatibility**: There are still some issues with Linux and Windows compatibility. PRs to fix the issues are encouraged.
 - **Adding New Multimodal Models**: Integration of new multimodal models is welcomed. If you have a specific model in mind that you believe would be a valuable addition, please feel free to integrate it and submit a PR.
-- **Iterate `--accurate` flag functionality**: Look at https://github.com/OthersideAI/self-operating-computer/pull/57 for previous iteration
-- **Enhanced Security**: A feature request to implement a _robust security feature_ that prompts users for _confirmation before executing potentially harmful actions_. This feature aims to _prevent unintended actions_ and _safeguard user data_ as mentioned here in this [OtherSide#25](https://github.com/OthersideAI/self-operating-computer/issues/25)
+- **Iterate `--accurate` flag functionality**: Look at https://github.com/malah-code/self-ai-operating-computer/pull/57 for previous iteration
+- **Enhanced Security**: A feature request to implement a _robust security feature_ that prompts users for _confirmation before executing potentially harmful actions_. This feature aims to _prevent unintended actions_ and _safeguard user data_ as mentioned here in this [OtherSide#25](https://github.com/malah-code/self-ai-operating-computer/issues/25)
 
 ## Guidelines
 
diff --git a/LICENSE b/LICENSE
index 2013c4aa..e5061acf 100644
--- a/LICENSE
+++ b/LICENSE
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2023 OthersideAI
+Copyright (c) 2023 malah-code
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
diff --git a/MANIFEST.in b/MANIFEST.in
new file mode 100644
index 00000000..540b7204
--- /dev/null
+++ b/MANIFEST.in
@@ -0,0 +1 @@
+include requirements.txt
\ No newline at end of file
diff --git a/README.md b/README.md
index 221b3fd8..a090c443 100644
--- a/README.md
+++ b/README.md
@@ -4,13 +4,7 @@ ome

A framework to enable multimodal models to operate a computer.

-

- Using the same inputs and outputs as a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective. Released Nov 2023, the Self-Operating Computer Framework was one of the first examples of using a multimodal model to view the screen and operate a computer. -

- -
- -
+ce