Bring order to your digital chaos with this intelligent file organizer. It uses lightweight machine learning algorithms to automatically categorize and organize your files, all while running entirely on your local machine with no external dependencies.
The Local File Organizer processes your files using:
- Text Analysis: Uses TF-IDF vectorization and NLTK for understanding text content
- Image Analysis: Employs OpenCV and scikit-learn for visual content understanding
- Smart Categorization: Applies machine learning techniques to group similar files
- Intelligent Renaming: Generates descriptive filenames based on file content
Before:
/messy_folder/
├── IMG_20230515.jpg # Picture of an apple
├── IMG_20230516.jpg # Picture of office desk
├── notes_20230517.txt # Meeting minutes
├── data_analysis.xlsx # Sales data
└── document1.pdf # Company policy
After:
/organized_folder/
├── Food_and_Produce/
│ └── Red_Apple_Photo_May15.jpg
├── Office/
│ ├── Workspace_Setup_May16.jpg
│ └── Meeting_Minutes_May17.txt
├── Financial/
│ └── Sales_Analysis_2023.xlsx
└── Documents/
  └── Company_Policy_Guidelines.pdf
- 100% Local Processing: All analysis happens on your machine
- No Internet Required: Works completely offline
- Privacy First: No data leaves your computer
- Resource Efficient: Uses lightweight ML algorithms
- Smart Object Detection: Identifies common objects in images
- Text Understanding: Extracts key topics from documents
- OCR Support: Reads text from images and PDFs
- Images: .png, .jpg, .jpeg, .gif, .bmp
- Documents: .txt, .docx, .md, .pdf
- Data Files: .xlsx, .csv
- Presentations: .ppt, .pptx
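As a rough illustration of how the extension groups above could be looked up in code, here is a minimal sketch. The names `SUPPORTED_TYPES` and `file_kind` and the "Other" fallback are assumptions made for this example; they are not taken from `file_utils.py`.

```python
from pathlib import Path

# Extension groups copied from the list above. The dictionary name and the
# "Other" fallback are illustrative, not taken from the project's code.
SUPPORTED_TYPES = {
    "Images": {".png", ".jpg", ".jpeg", ".gif", ".bmp"},
    "Documents": {".txt", ".docx", ".md", ".pdf"},
    "Data Files": {".xlsx", ".csv"},
    "Presentations": {".ppt", ".pptx"},
}


def file_kind(path: str) -> str:
    """Return the supported-type group for a path, or "Other"."""
    suffix = Path(path).suffix.lower()  # match extensions case-insensitively
    for kind, extensions in SUPPORTED_TYPES.items():
        if suffix in extensions:
            return kind
    return "Other"
```

Lower-casing the suffix means camera files such as `IMG_20230515.JPG` fall into the same group as `.jpg` files.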
- Python 3.11
- Conda package manager
- Windows, macOS, or Linux
- Create and activate conda environment:
conda create --name file-sorter python=3.11 -y
conda activate file-sorter
- Install core dependencies:
conda install -y -c conda-forge numpy=1.24.3 pillow=9.5.0 pandas scikit-learn scipy opencv nltk tesseract
- Install remaining packages:
python -m pip install -r requirements.txt
- Tesseract OCR: Required for text extraction from images
  - Windows: Download from UB-Mannheim/tesseract
  - macOS: brew install tesseract
  - Linux: sudo apt install tesseract-ocr
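To confirm that Tesseract ended up on your PATH after installation, a small check like the one below may help. It is a sketch that relies only on the standard library; the helper name `tesseract_path` is hypothetical and not part of this project.

```python
import shutil


def tesseract_path(executable: str = "tesseract"):
    """Return the full path of the Tesseract binary, or None if it is absent."""
    return shutil.which(executable)


if __name__ == "__main__":
    found = tesseract_path()
    if found:
        print(f"Tesseract found at {found}")
    else:
        print("Tesseract not found on PATH; OCR features may not work.")
```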
Before running the application, you need to create a .env file in the root directory of the project. This file is used to store your Gemini API key.
- Create a .env file in the root of the project.
- Add the following line to the .env file:
GEMINI_API_KEY="your_api_key_here"
Replace "your_api_key_here" with your actual Gemini API key.
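For reference, the sketch below shows one way such a key could be read at startup using only the standard library. How the project itself loads the file is not stated here, so treat `read_env_file` and `get_gemini_api_key` as hypothetical helpers written for this example.

```python
import os
from pathlib import Path


def read_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, as in GEMINI_API_KEY="..."
        values[key.strip()] = value.strip().strip("\"'")
    return values


def get_gemini_api_key(path: str = ".env"):
    """Prefer a real environment variable, then fall back to the .env file."""
    return os.environ.get("GEMINI_API_KEY") or read_env_file(path).get("GEMINI_API_KEY")
```

Checking the process environment first lets you override the file without editing it, for example in a CI job.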
- Activate the conda environment:
conda activate file-sorter
- Run the organizer:
python main.py
- Follow the prompts to select input and output directories
- NLTK for tokenization and text preprocessing
- TF-IDF vectorization for content analysis
- Scikit-learn for text classification
- Extractive summarization for file descriptions
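To make the TF-IDF step above concrete, here is a minimal pure-Python sketch that ranks the most distinctive terms in each document. It is a stand-in, not the project's code: the regex tokenizer replaces NLTK, the scoring is hand-written rather than scikit-learn's `TfidfVectorizer`, and the function names are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase and keep runs of letters (a crude stand-in for NLTK)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(documents: list, top_n: int = 3) -> list:
    """Return the top_n highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores = {
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        # Sort by score, breaking ties alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_n])
    return results
```

Terms that are frequent in one file but rare across the collection score highest, which is why they make reasonable candidates for folder names and filenames.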
- OpenCV for image analysis and object detection
- Color space analysis (HSV) for object recognition
- Shape detection for specific item recognition
- OCR integration for text in images
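The HSV color analysis mentioned above can be illustrated without OpenCV. The sketch below uses the standard-library `colorsys` module in its place so it runs anywhere; the hue cut-offs, thresholds, and function names are illustrative guesses, not values from the project.

```python
import colorsys
from collections import Counter

# Upper hue bounds in degrees. These cut-offs are illustrative guesses,
# not values taken from the project.
HUE_BANDS = [(15, "red"), (45, "orange"), (75, "yellow"), (165, "green"),
             (255, "blue"), (345, "purple"), (360, "red")]


def color_name(rgb) -> str:
    """Map one (R, G, B) pixel in 0-255 to a coarse color name via HSV."""
    r, g, b = (channel / 255.0 for channel in rgb)
    hue, saturation, value = colorsys.rgb_to_hsv(r, g, b)
    if value < 0.2:
        return "black"
    if saturation < 0.2:
        return "white" if value > 0.8 else "gray"
    degrees = hue * 360.0
    for upper, name in HUE_BANDS:
        if degrees < upper:
            return name
    return "red"


def dominant_color(pixels) -> str:
    """Return the most common coarse color among an iterable of pixels."""
    counts = Counter(color_name(p) for p in pixels)
    if not counts:
        raise ValueError("no pixels given")
    return counts.most_common(1)[0][0]
```

Working in HSV separates hue from brightness, so a red apple in shadow and one in sunlight land in the same band, which is harder to achieve with raw RGB thresholds.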
- Content-based clustering for folder creation
- Intelligent filename generation
- Metadata extraction and analysis
- Multi-threaded processing for performance
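The sketch below combines two of the points above: generating a descriptive filename from extracted keywords, and doing that work on a thread pool. It assumes the keywords have already been produced by an earlier analysis step; `descriptive_name` and `rename_all` are hypothetical names, and the naming scheme is only one plausible choice.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def descriptive_name(keywords, original: str) -> str:
    """Build a Title_Case filename from keywords, keeping the extension."""
    words = [re.sub(r"[^A-Za-z0-9]", "", word).capitalize() for word in keywords]
    # Fall back to the original stem when no usable keyword remains.
    stem = "_".join(word for word in words if word) or Path(original).stem
    return stem + Path(original).suffix.lower()


def rename_all(items, max_workers: int = 4) -> list:
    """Compute new names for (keywords, original) pairs on a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order even though the work runs concurrently.
        return list(pool.map(lambda item: descriptive_name(*item), items))
```

Threads pay off mainly when each item involves file or OCR I/O; for pure string work like this, the pool only demonstrates the pattern.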
The organizer can be customized through:
- File type extensions in file_utils.py
- Processing rules in data_processing_common.py
- Output formatting in output_filter.py
This project is dual-licensed under the MIT License and Apache 2.0 License. See the LICENSE file for details.