GeoToken

Official code repository for "GeoToken: Hierarchical Geolocalization of Images via Next Token Prediction" [ICDM 2025].

GeoToken is a framework for geographic localization using transformer models with multimodal inputs and retrieval-augmented generation.

Note: This repository is being actively updated. Additional components and documentation will be added soon.

Pipeline

1. Contrastive Learning with CLIP

Will be updated.

2. Data Indexing for RAG

Will be updated.

3. Transformer Training

Train the transformer model that integrates the indexed data for geographic token prediction:

python train_transformer_c_n_grouped.py
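To make "geographic token prediction" concrete, the sketch below shows one way a coordinate can be turned into a coarse-to-fine token sequence that a transformer could predict one token at a time. It uses a plain latitude/longitude quadtree as a simplified stand-in; this is an assumption made for illustration, not the tokenization used by the training script, and the function names `latlon_to_tokens` and `tokens_to_latlon` are hypothetical.

```python
def latlon_to_tokens(lat, lon, depth=8):
    """Encode a coordinate as a coarse-to-fine list of quadrant tokens (0-3).

    Illustrative quadtree only; the repository's real tokenization may differ.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinate out of range")
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    tokens = []
    for _ in range(depth):
        lat_mid = (lat_lo + lat_hi) / 2
        lon_mid = (lon_lo + lon_hi) / 2
        north = lat >= lat_mid
        east = lon >= lon_mid
        # Token layout: 0 = south-west, 1 = south-east, 2 = north-west, 3 = north-east.
        tokens.append(2 * int(north) + int(east))
        # Shrink the bounding box to the chosen quadrant.
        lat_lo, lat_hi = (lat_mid, lat_hi) if north else (lat_lo, lat_mid)
        lon_lo, lon_hi = (lon_mid, lon_hi) if east else (lon_lo, lon_mid)
    return tokens


def tokens_to_latlon(tokens):
    """Decode a token sequence to the center of the cell it identifies."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    for token in tokens:
        if token not in (0, 1, 2, 3):
            raise ValueError(f"invalid token: {token}")
        north, east = divmod(token, 2)
        lat_mid = (lat_lo + lat_hi) / 2
        lon_mid = (lon_lo + lon_hi) / 2
        lat_lo, lat_hi = (lat_mid, lat_hi) if north else (lat_lo, lat_mid)
        lon_lo, lon_hi = (lon_mid, lon_hi) if east else (lon_lo, lon_mid)
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2
```

Because each token refines the cell chosen by the previous ones, a shorter prefix of the sequence still names a valid, coarser region. That property is what makes next-token prediction over such sequences hierarchical.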

Inference Options

GeoToken offers multiple inference strategies:

Standard Sampling

Generate location predictions by sampling from the decoder with temperature:

python sample_n.py
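For readers unfamiliar with temperature sampling, here is a minimal, dependency-free sketch of the general technique. It is not taken from `sample_n.py`; the function names and the `next_logits` callback (a stand-in for a decoder that returns unnormalized scores for the next token) are assumptions for illustration.

```python
import math
import random


def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample one token index from unnormalized logits after temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    weights = [math.exp(value - peak) for value in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def sample_sequence(next_logits, length, temperature=1.0, rng=random):
    """Autoregressively sample `length` tokens.

    `next_logits(prefix)` is a hypothetical callback returning a list of
    logits for the next token given the tokens sampled so far.
    """
    tokens = []
    for _ in range(length):
        tokens.append(sample_with_temperature(next_logits(tokens), temperature, rng))
    return tokens
```

Lower temperatures concentrate probability on the highest-scoring token, so repeated samples agree more often. Higher temperatures flatten the distribution and yield more diverse candidate locations.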

Beam Search

Use beam search for location predictions:

python beam_search.py
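As a rough illustration of beam search over a token hierarchy, the sketch below keeps the `beam_width` best partial sequences at each step. It is a generic version of the technique, not the code in `beam_search.py`; `next_log_probs` and the toy model are hypothetical stand-ins for the trained decoder.

```python
import math


def beam_search(next_log_probs, length, beam_width=3):
    """Return up to `beam_width` best (sequence, log-probability) pairs.

    `next_log_probs(prefix)` is a hypothetical callback returning a dict
    mapping each possible next token to its log-probability.
    """
    beams = [((), 0.0)]
    for _ in range(length):
        candidates = []
        for prefix, score in beams:
            for token, log_prob in next_log_probs(prefix).items():
                candidates.append((prefix + (token,), score + log_prob))
        # Keep only the highest-scoring partial sequences.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


# Toy two-level model: a coarse region token, then a finer cell token.
TOY_MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"a1": 0.5, "a2": 0.5},
    ("B",): {"b1": 0.9, "b2": 0.1},
}


def toy_next_log_probs(prefix):
    """Look up the toy distribution for a prefix and convert it to log space."""
    return {token: math.log(p) for token, p in TOY_MODEL[prefix].items()}
```

In this toy model, `beam_search(toy_next_log_probs, 2, beam_width=1)` behaves greedily and commits to region `A`, whose best completion has probability 0.3. With `beam_width=2` the search also keeps region `B` alive and ranks `("B", "b1")` first, with probability 0.36.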

MLLM-Assisted Geolocalization

Use the Gemini multimodal LLM to analyze the image and refine the predictions, with samples from the transformer as context:

python ask_gemini_sample_neighbor.py

This approach combines visual understanding from Gemini with the transformer's candidate predictions to determine the most accurate location.
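The sketch below shows one way candidate predictions could be passed to a multimodal LLM as text context and the answer parsed back into coordinates. It deliberately makes no API calls. The prompt wording, the `build_refinement_prompt` and `parse_lat_lon` helpers, and the `lat, lon` answer format are all assumptions for illustration and may differ from what `ask_gemini_sample_neighbor.py` does.

```python
import re


def build_refinement_prompt(candidates):
    """Format candidate (lat, lon) pairs as text context for a multimodal LLM.

    Hypothetical prompt layout; the image itself would be sent alongside
    this text by whichever client library is used.
    """
    lines = [
        "You are given a photo and candidate locations predicted by a geolocalization model.",
        "Candidates (latitude, longitude):",
    ]
    for index, (lat, lon) in enumerate(candidates, start=1):
        lines.append(f"{index}. ({lat:.4f}, {lon:.4f})")
    lines.append(
        "Using the image and these candidates, reply with the single most "
        "likely location in the form 'lat, lon'."
    )
    return "\n".join(lines)


def parse_lat_lon(reply):
    """Extract the first 'lat, lon' pair from a model reply, or None if absent or invalid."""
    match = re.search(r"(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)", reply)
    if match is None:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    # Reject values that cannot be valid coordinates.
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    return lat, lon
```

Validating the parsed reply matters because an LLM may answer in free text or return out-of-range values. In that case a caller could fall back to the transformer's top candidate.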

The results obtained from the code are presented here:
