
Commit 0f33a72

Author: DavidHuji · committed Nov 1, 2022
Commit message: cosmetics
1 parent 1356575 · commit 0f33a72

File tree

2 files changed (+63, −21)


‎README.md

+10 −21
````diff
@@ -12,6 +12,7 @@ As shown in the paper, CapDec achieves SOTA image-captioning in the setting of t
 This is the formal repository for CapDec, in which you can easily reproduce the papers results.
 
 ## FlickrStyle7k Examples
+Examples of styled captions from CapDec on the FlickrStyle10K dataset.
 ![alt text](https://github.com/DavidHuji/CapDec/blob/main/examples.png)
 
 
@@ -28,38 +29,26 @@ conda env create -f others/environment.yml
 conda activate CapDec
 ```
 
-## Download Data
-###COCO: Download [train_captions](https://drive.google.com/file/d/1D3EzUK1d1lNhD2hAvRiKPThidiVbP2K_/view?usp=sharing) to `data/coco/annotations`.
-
-Download [training images](http://images.cocodataset.org/zips/train2014.zip) and [validation images](http://images.cocodataset.org/zips/val2014.zip) and unzip (We use Karpathy et el. split).
-### Flickr
-TBD
-### Flickr7KStyle
-TBD
+# Datasets
+1. Download the datasets using the following links: [COCO](https://www.kaggle.com/datasets/shtvkumar/karpathy-splits), [Flickr30K](https://www.kaggle.com/datasets/shtvkumar/karpathy-splits), [FlickrStyle10k](https://zhegan27.github.io/Papers/FlickrStyle_v0.9.zip).
+2. Parse the data into the correct format with our script parse_karpathy.py; just make sure to edit the json paths at the head of the script.
 
 
 #Training
-Extract CLIP features using:
+Make sure to edit the json or pkl paths at the head of the scripts.
+1. Extract CLIP features using the following script:
 ```
 python embeddings_generator.py -h
 ```
-Train with fine-tuning of GPT2:
-```
-python train.py --data ./data/coco/oscar_split_ViT-B_32_train.pkl --out_dir ./coco_train/
-```
 
-Train only transformer mapping network:
+2. Train the model using the following script:
 ```
-python train.py --only_prefix --data ./data/coco/oscar_split_ViT-B_32_train.pkl --out_dir ./coco_train/ --mapping_type transformer --num_layres 8 --prefix_length 40 --prefix_length_clip 40
+python train.py --data clip_embeddings_of_last_stage.pkl --out_dir ./coco_train/
 ```
 
-**If you wish to use ResNet-based CLIP:**
-
-```
-python parse_coco.py --clip_model_type RN50x4
-```
+**There are a few interesting configurable parameters for training, as listed in the:**
 ```
-python train.py --only_prefix --data ./data/coco/oscar_split_RN50x4_train.pkl --out_dir ./coco_train/ --mapping_type transformer --num_layres 8 --prefix_length 40 --prefix_length_clip 40 --is_rn
+output of train.py -h
 ```
 
 # Evaluation
````
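After this commit, train.py consumes a pickle of CLIP embeddings produced by embeddings_generator.py. The actual schema is whatever that script writes; purely as an illustration (the field names below are assumptions, not taken from the repository), such a file could pair each caption with its feature vector:

```python
import pickle

# Hypothetical schema: the real layout is defined by embeddings_generator.py;
# "caption" and "clip_embedding" are illustrative names only.
records = [
    {"caption": "a dog runs on the beach", "clip_embedding": [0.12, -0.33, 0.05]},
    {"caption": "two kids playing soccer", "clip_embedding": [0.48, 0.01, -0.27]},
]

# Write the pickle that would be passed to train.py via --data.
with open("clip_embeddings_of_last_stage.pkl", "wb") as f:
    pickle.dump(records, f)

# Round-trip it the way a training script would load it.
with open("clip_embeddings_of_last_stage.pkl", "rb") as f:
    loaded = pickle.load(f)

print(len(loaded))  # prints 2
```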

‎parse_karpathy.py

+53 −0 (new file)
```python
import pickle, json

# Input: Karpathy-split COCO annotations (Kaggle mirror); edit these paths as needed.
kagle_json = 'annotations/dataset_coco_from_kaggle.json'
new_json_train = 'post_processed_karpthy_coco/train.json'
new_json_test = 'post_processed_karpthy_coco/test.json'
new_json_val = 'post_processed_karpthy_coco/val.json'


def map_format_kaggle_to_clipcap():
    # Convert the Karpathy-split json into flat per-split caption lists.
    def extract_imgid_from_name(filename):
        return str(int(filename.split('.')[0].split('_')[-1]))

    with open(kagle_json) as f:
        kaggle_data = json.load(f)
    train_data = []
    test_data = []
    val_data = []
    # 'restval' images are folded into the training split.
    splits = {'train': train_data, 'test': test_data, 'val': val_data, 'restval': train_data}
    out_names = {'train': new_json_train, 'test': new_json_test, 'val': new_json_val}
    for img in kaggle_data['images']:
        imgid = extract_imgid_from_name(img['filename'])
        for cap in img['sentences']:
            correct_format = {"image_id": int(imgid), "caption": cap['raw'], "id": int(cap['sentid'])}
            splits[img['split']].append(correct_format)

    DBG = False
    if not DBG:
        for name in out_names:
            with open(out_names[name], 'w') as f:
                json.dump(splits[name], f)

        # Also write the {"images": ..., "annotations": ...} layout used by the metrics code.
        for name in out_names:
            with open(out_names[name][:-5] + '_metrics_format.json', 'w') as f:
                annos = splits[name]
                ids = [{"id": int(a["image_id"])} for a in annos]
                final = {"images": ids, "annotations": annos}
                json.dump(final, f)

    if DBG:
        # Sanity check against Ron's annotations.
        with open('annotations/train_caption_of_real_training.json') as f:
            # with open('../../train_caption.json') as f:
            cur_data = json.load(f)
        ids = [str(int(c['image_id'])) for c in cur_data]
        new_ids = [str(int(c['image_id'])) for c in train_data]
        ids.sort()  # in place
        new_ids.sort()
        assert ids == new_ids
        print('OK')


if __name__ == '__main__':
    map_format_kaggle_to_clipcap()
```
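The heart of the new script is a two-step conversion: recover the numeric image id from a COCO filename, then flatten each image's sentences into `{"image_id", "caption", "id"}` records per split (with `restval` folded into `train`), finally re-wrapping them in the `{"images", "annotations"}` metrics layout. A minimal self-contained sketch of the same transformation, using an invented sample record:

```python
import json

def extract_imgid_from_name(filename):
    # "COCO_val2014_000000184613.jpg" -> "184613", as in parse_karpathy.py
    return str(int(filename.split('.')[0].split('_')[-1]))

# Invented Karpathy-split-style sample: one image with two sentences.
kaggle_data = {"images": [{
    "filename": "COCO_val2014_000000184613.jpg",
    "split": "restval",
    "sentences": [
        {"raw": "a dog on a beach", "sentid": 10},
        {"raw": "a dog runs by the sea", "sentid": 11},
    ],
}]}

train_data, test_data, val_data = [], [], []
# 'restval' aliases the train list, so those captions land in train.
splits = {'train': train_data, 'test': test_data, 'val': val_data,
          'restval': train_data}

for img in kaggle_data['images']:
    imgid = extract_imgid_from_name(img['filename'])
    for cap in img['sentences']:
        splits[img['split']].append({"image_id": int(imgid),
                                     "caption": cap['raw'],
                                     "id": int(cap['sentid'])})

# Metrics-format wrapper, mirroring the script's second output file.
metrics_format = {"images": [{"id": a["image_id"]} for a in train_data],
                  "annotations": train_data}

print(json.dumps(metrics_format["images"]))  # prints [{"id": 184613}, {"id": 184613}]
```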
