Smart PDF Manager is a Python tool that automates the organization of PDF files. It uses spaCy, a natural language processing (NLP) library, to analyze the contents of each PDF and classify it by the named entities it extracts, such as organizations, people, and locations. Using spaCy's entity recognition models, the tool can categorize PDFs written in English, German, and French, which makes it suitable for multilingual document management.
- Organize PDFs: Classify and move PDFs into directories based on the most common entity (e.g., Organization, Person) found in the document.
- Supports English, German, and French: Detects entities in all three languages, so document collections that mix them can still be classified.
- Customizable: Add or adjust categories and keywords to tailor the classification logic to your needs.
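
As a rough illustration of the "most common entity" idea, the sketch below counts entity labels and returns the category that occurs most often. It is not the project's actual implementation: the names `SPACY_MODELS`, `LABEL_TO_CATEGORY`, and `dominant_category` are hypothetical, the choice of the small spaCy pipelines is an assumption, and `nlp` stands for any loaded spaCy pipeline. Note that spaCy's English pipelines label people as `PERSON`, while the German and French ones use `PER`, so both are mapped.

```python
from collections import Counter

# Assumption: the small spaCy pipelines for each supported language.
# The project may load different (e.g. larger) models.
SPACY_MODELS = {
    "English": "en_core_web_sm",
    "German": "de_core_news_sm",
    "French": "fr_core_news_sm",
}

# Hypothetical mapping from spaCy entity labels to category names.
# English pipelines emit PERSON/GPE; German and French emit PER/LOC.
LABEL_TO_CATEGORY = {
    "ORG": "Organization",
    "PERSON": "Person",
    "PER": "Person",
    "GPE": "Location",
    "LOC": "Location",
}


def dominant_category(text, nlp, default="Uncategorized"):
    """Return the category of the most frequent mapped entity label in text."""
    doc = nlp(text)
    # Count only entities whose label has a category; ignore the rest.
    counts = Counter(
        LABEL_TO_CATEGORY[ent.label_]
        for ent in doc.ents
        if ent.label_ in LABEL_TO_CATEGORY
    )
    if not counts:
        return default
    return counts.most_common(1)[0][0]
```

With spaCy and a model installed, this could be called as `dominant_category(pdf_text, spacy.load(SPACY_MODELS["English"]))`, where `pdf_text` is the text already extracted from a PDF. Adjusting `LABEL_TO_CATEGORY` is one way the categories could be customized.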
-
Clone this repository:
git clone https://github.com/Dagait/SmartPDFManager.git
cd SmartPDFManager
-
Install the required dependencies:
pip install -r requirements.txt
-
Run the application (only if you are not using the executable):
py -m smart_pdf_manager.ui.smart_pdf_manager_app
-
Select a language for entity recognition (English, German, or French).
-
Place the PDF files you want to organize in a directory of your choice (e.g., pdfs/).
-
Choose an input directory that contains the PDFs and an output directory to which they will be moved.
-
Start the process by clicking on the "Organize PDFs" button.
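
For readers curious what the "Organize PDFs" step amounts to, here is a minimal sketch of moving each PDF from the input directory into a per-category subdirectory of the output directory. It is an illustration under stated assumptions, not the application's code: `organize_pdfs` and the `classify` callable are hypothetical names, and skipping files that already exist at the destination is a design choice made here for safety.

```python
import shutil
from pathlib import Path


def organize_pdfs(input_dir, output_dir, classify):
    """Move each PDF in input_dir into output_dir/<category>/.

    classify is a hypothetical callable that takes a Path to a PDF
    and returns a category name such as "Organization".
    Returns the list of destination paths of the files that were moved.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    moved = []
    # Note: this pattern matches the lowercase ".pdf" extension only.
    for pdf in sorted(input_dir.glob("*.pdf")):
        target_dir = output_dir / classify(pdf)
        target_dir.mkdir(parents=True, exist_ok=True)
        destination = target_dir / pdf.name
        if destination.exists():
            # Assumption: skip rather than overwrite an existing file.
            continue
        shutil.move(str(pdf), str(destination))
        moved.append(destination)
    return moved
```

A call such as `organize_pdfs("pdfs", "sorted", classify)` would leave non-PDF files in place and create category folders like `sorted/Organization/` on demand.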