Official code repository for "GeoToken: Hierarchical Geolocalization of Images via Next Token Prediction" [ICDM 2025].
GeoToken is a framework for geographic localization using transformer models with multimodal inputs and retrieval-augmented generation.
Note: This repository is being actively updated. Additional components and documentation will be added soon.
Train the transformer model that integrates the indexed data for geographic token prediction:
python train_transformer_c_n_grouped.py
GeoToken offers multiple inference strategies:
Generate location predictions using standard sampling with temperature from the decoder:
python sample_n.py
Use beam search for location predictions:
python beam_search.py
Leverage the Gemini multimodal LLM to analyze the image and refine predictions using model samples as context:
python ask_gemini_sample_neighbor.py
This approach combines visual understanding from Gemini with the transformer's candidate predictions to determine the most accurate location.
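For readers unfamiliar with the two decoder-side strategies mentioned above (temperature sampling and beam search), the following is a minimal, generic sketch of both over a toy next-token model. It is not the code in sample_n.py or beam_search.py: the function names, the four-token vocabulary, and the made-up `toy_logits` scores are all assumptions for illustration, and a real model would condition its scores on the input image.

```python
import math
import random


def toy_logits(prefix):
    """Hypothetical stand-in for the decoder: scores 4 location tokens.

    The scores are invented and depend only on the prefix length.
    """
    base = [2.0, 1.0, 0.5, 0.1]
    shift = len(prefix) % 4
    return base[shift:] + base[:shift]


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sequence(depth, temperature, rng):
    """Draw one token sequence, one hierarchy level at a time."""
    prefix = []
    for _ in range(depth):
        probs = softmax(toy_logits(prefix), temperature)
        token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        prefix.append(token)
    return prefix


def beam_search(depth, beam_width):
    """Keep the beam_width highest log-probability prefixes per level."""
    beams = [([], 0.0)]  # (token prefix, cumulative log-probability)
    for _ in range(depth):
        candidates = []
        for prefix, score in beams:
            probs = softmax(toy_logits(prefix))
            for token, p in enumerate(probs):
                candidates.append((prefix + [token], score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


if __name__ == "__main__":
    rng = random.Random(0)
    # Several stochastic candidates, e.g. to pass on as context to an LLM.
    print([sample_sequence(3, 1.0, rng) for _ in range(5)])
    # Deterministic top candidates with their log-probabilities.
    print(beam_search(3, 2))
```

Sampling yields a diverse set of candidates, which suits a later refinement step that chooses among them, while beam search returns the highest-scoring sequences deterministically.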
The results obtained from the code are presented here:
