This project provides a Python script within a Jupyter Notebook to automate the process of organizing hundreds of field photographs. It intelligently sorts image files into folders corresponding to specific data collection points by matching the image's timestamp (from EXIF metadata) with time windows defined in an Excel logsheet.
This script was designed to solve a common problem in field research and data collection: the tedious and error-prone manual task of matching photographic evidence with its corresponding sensor data. By automating this process, it saves significant time and ensures a high degree of accuracy.
- Automated Sorting: Automatically reads image metadata and sorts files into appropriately named folders.
- Time-Based Matching: Uses a precise time window (from the start of one measurement to the start of the next) to associate images with data points.
- Data Anomaly Detection: Identifies and flags data points where recorded times are inconsistent, preventing incorrect file associations.
- Error Handling: Isolates any images that cannot be matched into a separate `other` folder for easy manual review, ensuring no data is lost.
- Reproducible Workflow: As a Jupyter Notebook, the entire process is documented, transparent, and easily reproducible.
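As a rough illustration of the metadata read behind these features, the sketch below parses an EXIF timestamp string into a `datetime`. The function names are hypothetical and not taken from the notebook. It assumes the capture time is stored under the standard `DateTimeOriginal` tag (36867) in the Exif sub-IFD, and falls back to the base `DateTime` tag (306) when that tag is absent.

```python
from datetime import datetime

EXIF_IFD_POINTER = 0x8769      # pointer to the Exif sub-IFD
TAG_DATETIME_ORIGINAL = 36867  # DateTimeOriginal (0x9003)
TAG_DATETIME = 306             # DateTime in the base IFD (0x0132)


def parse_exif_datetime(raw):
    """Convert an EXIF 'YYYY:MM:DD HH:MM:SS' string to a datetime, or None."""
    if not raw:
        return None
    try:
        # Some cameras pad the value with NUL bytes or spaces.
        return datetime.strptime(str(raw).strip("\x00 "), "%Y:%m:%d %H:%M:%S")
    except ValueError:
        return None


def read_image_timestamp(path):
    """Return the capture time of an image, or None if no usable EXIF date exists."""
    from PIL import Image  # Pillow; imported lazily so the parser works without it

    with Image.open(path) as img:
        exif = img.getexif()
        raw = (exif.get_ifd(EXIF_IFD_POINTER).get(TAG_DATETIME_ORIGINAL)
               or exif.get(TAG_DATETIME))
    return parse_exif_datetime(raw)
```

Returning `None` instead of raising keeps images without readable metadata in the pipeline, so they can still be routed to the `other` folder.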
The script follows a logical sequence to process and sort the data:
- Load Data: It begins by loading the measurement data from the `Data Tersortis + logsheet gedong songo.xlsx` file into a pandas DataFrame.
- Chronological Sort: The data is sorted by the `Date Created` column to establish a correct timeline, which is crucial for the next step.
- Define Time Windows: For each data point (row), a time interval is calculated. The window starts at that row's `Date Created` time and ends at the `Date Created` time of the next row. This defines a precise period during which any photos taken should belong to that data point.
- Extract Image Timestamps: The script iterates through all image files in the source folder and uses the `Pillow` library to extract the original creation timestamp from the EXIF metadata of each photo.
- Match and Sort: It then compares each image's timestamp to the calculated time windows. If an image's timestamp falls within a data point's window, the script copies that image into a new folder named after the data point's `Title`.
- Isolate Unmatched Files: Any images whose timestamps do not fall into any of the defined windows are copied to a special `other` folder for manual inspection.
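The windowing and matching steps above can be sketched in plain Python. This is not the notebook's pandas code: `build_windows` and `match_title` are hypothetical names, and three choices are assumptions made for the sketch. Windows are half-open (`start <= t < end`), a record whose successor has the same timestamp is flagged as an anomaly, and the last record gets no window because it has no successor.

```python
from datetime import datetime


def build_windows(records):
    """records: iterable of (title, created) pairs.

    Returns (windows, anomalies), where windows is a list of
    (title, start, end) tuples in chronological order.
    """
    ordered = sorted(records, key=lambda r: r[1])  # chronological sort
    windows, anomalies = [], []
    for (title, start), (_, end) in zip(ordered, ordered[1:]):
        if end > start:
            windows.append((title, start, end))
        else:
            # Identical timestamps leave no room for photos: flag, don't guess.
            anomalies.append(title)
    # Assumption: the final record has no next row, so it gets no window.
    return windows, anomalies


def match_title(timestamp, windows):
    """Return the title whose window contains timestamp, else 'other'."""
    if timestamp is not None:
        for title, start, end in windows:
            if start <= timestamp < end:
                return title
    return "other"
```

For example, with measurements at 10:00 (`L1.1`) and 10:30 (`L1.2`), a photo taken at 10:15 maps to `L1.1`, while a photo with no timestamp maps to `other`.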
Before running the script, ensure your files are organized as follows:
your-project-folder/
│
├── dokumentasi-adlan.ipynb # The Jupyter Notebook file
│
├── Data Tersortis + logsheet gedong songo.xlsx # Your Excel data file
│
└── dokumentasi-adlan/ # Folder containing ALL the images to be sorted
    ├── image_001.jpg
    ├── image_002.jpg
    ├── image_003.jpg
    └── ...
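A small pre-flight check can confirm this layout before the notebook is run. The sketch below is an optional helper and not part of the notebook. The function name `missing_inputs` is hypothetical, and the file and folder names are the ones shown in the tree above.

```python
from pathlib import Path

EXCEL_NAME = "Data Tersortis + logsheet gedong songo.xlsx"
IMAGE_DIR_NAME = "dokumentasi-adlan"


def missing_inputs(project_dir="."):
    """Return the names of expected inputs that are absent from project_dir."""
    root = Path(project_dir)
    missing = []
    if not (root / EXCEL_NAME).is_file():
        missing.append(EXCEL_NAME)
    if not (root / IMAGE_DIR_NAME).is_dir():
        missing.append(IMAGE_DIR_NAME + "/")
    return missing
```

An empty list means both the Excel file and the image folder are in place.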
This script requires Python 3 and the following libraries. You can install them using pip:
pip install pandas openpyxl Pillow

- Place the Excel file and the `dokumentasi-adlan` folder (containing your images) in the same directory as the Jupyter Notebook.
- Open and run the `dokumentasi-adlan.ipynb` notebook.
- You can run all cells sequentially. The script will create a new directory called `sorted_documentation` containing the results.
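For orientation, the copy step that fills the results directory might look like the sketch below. `copy_to_folder` is a hypothetical helper, not a function from the notebook. It assumes images are copied rather than moved, as the workflow description states, and that the target folder is created on demand.

```python
import shutil
from pathlib import Path


def copy_to_folder(image_path, folder_name, output_root="sorted_documentation"):
    """Copy one image into output_root/folder_name and return the new path."""
    target_dir = Path(output_root) / folder_name
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves file timestamps; the source image stays untouched.
    return Path(shutil.copy2(image_path, target_dir / Path(image_path).name))
```

Passing a data point's `Title` as `folder_name` files the image under that data point, and passing `"other"` sets it aside for manual review.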
After the script finishes, a new folder named sorted_documentation will be created with the following structure:
your-project-folder/
│
└── sorted_documentation/
    │
    ├── L1.1/ # Folder for Title 'L1.1'
    │   ├── image_001.jpg
    │   └── ...
    │
    ├── L1.2/ # Folder for Title 'L1.2'
    │   └── ...
    │
    └── other/ # Folder for all unmatched images
        ├── image_unmatched.jpg
        └── ...
This provides a clean, organized dataset that is ready for further analysis, reporting, or archiving.
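To sanity-check a finished run, a per-folder file count gives a quick overview, including how many images ended up in `other`. The sketch below is an optional helper under the folder layout shown above, and `count_sorted_images` is a hypothetical name.

```python
from pathlib import Path


def count_sorted_images(output_root="sorted_documentation"):
    """Map each subfolder of output_root to the number of files it contains."""
    root = Path(output_root)
    return {
        folder.name: sum(1 for f in folder.iterdir() if f.is_file())
        for folder in sorted(root.iterdir())
        if folder.is_dir()
    }
```

A large count under `other` suggests that many photos fell outside every time window, which is worth checking against the logsheet.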