CamRaDepth: Semantic Guided Depth Estimation Using Monocular Camera and Sparse Radar


๐Ÿ“ฐ Abstract

Our research aims to generate robust and dense 3D depth maps for robotics and autonomous driving applications. Since cameras output 2D images and active sensors such as LiDAR or radar produce sparse depth measurements, dense depth maps need to be estimated. Recent methods based on visual transformer networks have outperformed conventional deep learning approaches in various computer vision tasks including depth prediction but have focused on the use of a single camera image. This paper explores the potential of visual transformers applied to the fusion of monocular images, semantic segmentation, and projected sparse radar reflections for robust monocular depth estimation. The addition of a semantic segmentation branch is used to add object-level understanding and is investigated in a supervised and unsupervised manner. We evaluate our new depth estimation approach on the nuScenes dataset where it outperforms existing state-of-the-art camera-radar depth estimation methods. We show that even without running segmentation at inference, models can benefit from an additional segmentation branch during the training process by transfer learning.

๐Ÿ“ Folder Structure

  • CamRaDepth: Contains the model's scripts: the running pipeline, the PyTorch model, relevant arguments, utility functions, etc.

    • data: Contains dataloader.py and the data-split file, which lists the file names in order: train (0:17901) / validation (17902:20138) / test (20139:22375); see the sketch after this list.
    • models:
      • CamRaDepth.py: The PyTorch model. It is a generic implementation covering all the architecture variants we present.
      • diffGradNorm.py: A novel optimizer: diffGradNorm
      • simplfied_attention.py: The backbone for the visual transformer from: Simplified Transformer
    • visualization: Contains different scripts to plot or visualize data
  • lib: Contains utility functions from RC-PDA to fuse LiDAR or radar point clouds

  • scripts: Contains slightly modified Python scripts from RC-PDA to prepare input data and generate ground-truth depth maps
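For reference, the split file can be inspected as follows (a minimal sketch; the file path, array layout, and the inclusive interpretation of the index ranges above are assumptions):

import numpy as np

# Load the data-split file (adjust the path to wherever your split file lives).
split = np.load("new_split.npy", allow_pickle=True)

# Slice according to the index ranges listed above (interpreted as inclusive).
train_files = split[0:17902]       # train: 0 .. 17901
val_files   = split[17902:20139]   # validation: 17902 .. 20138
test_files  = split[20139:22376]   # test: 20139 .. 22375

print(len(train_files), len(val_files), len(test_files))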

๐Ÿ‹ Docker Setup

We suggest using Docker to avoid compatibility problems.

  1. Go to the root directory of this repository.
cd CamRaDepth/
  2. Build the Docker image.
docker build -t camradepth . 
  3. Run the container and mount the repository and the data.
cd ..
./run_docker.sh

๐Ÿ“‘ Preparation

1. Set Up External Repos

Clone RAFT to external/
and run cd external/RAFT && ./download_models.sh

Clone Panoptic-DeepLab to external/

We use the data preprocessing pipeline from RC-PDA

2. Install Requirements

This project does not require any special installation steps or version-specific packages. Simply create a new Conda environment and install the packages from the provided requirements.txt file:

conda create -n <name_of_environment> python=3.9

and then:

pip install -r requirements.txt 

Note: This project was run solely on Linux, and was not tested on Windows or Mac.

3. Download Data

Download the nuScenes dataset and place it outside of, i.e. next to, this CamRaDepth repository. Note: For a quick setup, the scripts currently use the nuScenes Mini dataset. Change the paths in the following script to run on the full nuScenes dataset.

Note: If you want to download the dataset through the terminal, please follow the discussion found here, including the renaming step. Then, extract the archive using:

tar -zxvf v1.0-mini.tgz -C nuscenes_mini

4. Data Preparation

Run all data preparation scripts:
Adjust DATA_DIR and DATA_VERSION in preprocess_data.sh

./scripts/preprocess_data.sh

Hint: The external repos use deprecated functions, which might cause an error. Replacing the import from torchvision.models.utils with from torch.hub import load_state_dict_from_url fixes this, as shown in the snippet after the file list.
Files:
external/panoptic-deeplab/segmentation/model/backbone/hrnet.py
external/panoptic-deeplab/segmentation/model/backbone/mnasnet.py
external/panoptic-deeplab/segmentation/model/backbone/mobilenet.py
external/panoptic-deeplab/segmentation/model/backbone/resnet.py
external/panoptic-deeplab/segmentation/model/backbone/xception.py
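For example, in each of the listed files the deprecated import can be swapped as follows:

# Deprecated import that raises an error with newer torchvision versions:
# from torchvision.models.utils import load_state_dict_from_url

# Working replacement:
from torch.hub import load_state_dict_from_url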

5. Generate Quasi-Ground-Truth Semantic Segmentation

Use external/mseg-semantic from mseg to generate semantic labels for the images:

mkdir external/mseg && cd external/mseg/
# Download weights
wget --no-check-certificate -O "mseg-3m.pth" "https://github.com/mseg-dataset/mseg-semantic/releases/download/v0.1/mseg-3m-1080p.pth"
# Install mseg-api
git clone https://github.com/mseg-dataset/mseg-api.git
cd mseg-api && sed -i '12s/.*/MSEG_DST_DIR="\/dummy\/path"/' mseg/utils/dataset_config.py
pip install -e .
cd ..
# Install mseg-semantic
git clone https://github.com/mseg-dataset/mseg-semantic.git
cd mseg-semantic && pip install -r requirements.txt
pip install -e .
cd ..

Note: We now assume that the current working directory is CamRaDepth/external/mseg/mseg-semantic

Change line 23 of mseg-semantic/mseg_semantic/utils/img_path_utils.py to:

suffix_fpaths = glob.glob(f"{jpg_dir}/*_im.{suffix}")

Run inference with the correct data source directory:

CONFIG_PATH="mseg_semantic/config/test/default_config_360_ms.yaml"

python -u mseg_semantic/tool/universal_demo.py \
  --config="$CONFIG_PATH" \
  model_name mseg-3m \
  model_path mseg-3m.pth input_file ../../../nuscenes_mini/prepared_data/

Change and combine the labels into the right format:

python scripts/vehicle_seg.py

๐Ÿ“ฅ Download Pretrained Weights

Download pretrained weights:

mkdir src/checkpoints && cd src/checkpoints
wget  https://syncandshare.lrz.de/dl/fi17pZyWBpZf38uxQ5XcS3/checkpoints.zip
unzip checkpoints.zip -d ..

Note 1: "FS" = trained from scratch, "TL" = transfer-learning scheme.
Note 2: It is assumed that the code is run from the main directory, CamRaDepth.
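A downloaded checkpoint can be sanity-checked directly in Python (a minimal sketch; the internal layout of the .pth files is an assumption, the runner loads them via the checkpoint argument):

import torch

# Load the checkpoint on the CPU and peek at its contents.
ckpt = torch.load("checkpoints/Base_TL.pth", map_location="cpu")
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])   # e.g. a state_dict or a dict wrapping one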

๐Ÿ–ฅ Run the Code

Modes and Expected Outputs:

Pass the argument run_mode:

  • train: If save_model is set to True, the model saves intermediate checkpoints and TensorBoard files to a folder specified as output_dir/arch_name/run_name, for example "outputs/CamRaDepth/Base_batchsize_4_run1". Multiple runs can write to the same folder. Checkpoints are saved whenever the best validation performance so far is reached, and their file names state the validation-loss value for reference. Note that each checkpoint file weighs roughly 350 MB.

  • test: Does not save anything to disk; it only prints a summary of the model's performance on the test set, reporting measurements such as RMSE at 50 m and 100 m, MAE, runtime, performance on edge cases, etc. (see the metric sketch below).
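For orientation, capped depth metrics such as RMSE and MAE at 50 m / 100 m are typically computed along the following lines (a minimal sketch with hypothetical names and dummy data; the actual evaluation code in this repository may differ):

import torch

def capped_depth_metrics(pred, gt, max_depth=50.0):
    """Illustrative RMSE / MAE over pixels with valid ground truth up to max_depth (meters)."""
    mask = (gt > 0) & (gt <= max_depth)          # ignore pixels without ground-truth depth
    diff = pred[mask] - gt[mask]
    rmse = torch.sqrt(torch.mean(diff ** 2)).item()
    mae = torch.mean(torch.abs(diff)).item()
    return rmse, mae

# Dummy tensors in the nuScenes image resolution; in practice pred/gt come from the test loop.
pred = torch.rand(1, 900, 1600) * 100.0
gt = torch.rand(1, 900, 1600) * 100.0
print(capped_depth_metrics(pred, gt, 50.0), capped_depth_metrics(pred, gt, 100.0))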

Arguments

  • There are two ways to pass arguments to the model:

    • conventional: Pass the different arguments through the command line in the usual way.
    • manually (recommended): Under utils/args.py you will find the "Manual settings" section. Uncomment it and set your desired values there, which is much more convenient.
  • Needed arguments for training:

    • save_model: Defines whether checkpoints and TensorBoard files should be saved to disk.
    • load_ckpt: A boolean value that defines if a checkpoint should be loaded from disk.
    • distributed: A boolean value that defines if the model should be trained in a distributed manner (torch.nn.DataParallel()).
    • run_mode: A string out of ["train", "test"] that defines whether the model should be trained or tested.
    • model: A string out of ["base (rgb)", "base", "supervised_seg", "unsupervised_seg", "sup_unsup_seg", "sup_unsup_seg (rgb)"].
    • checkpoint: Should be set if transfer learning is desired; the absolute path to the checkpoint that should be loaded.
    • arch_name: A string that helps to distinguish between running modes (e.g. "Debug", "Transformer", etc.).
    • run_name: A specific name for a specific run (e.g. "Base_batchsize_6_run_1").
    • batch_size: The batch size.
    • desired_batch_size: Hardware can be quite limiting, so you can set this parameter to enable gradient accumulation, meaning that the backprop/optimizer step is executed every update_interval = desired_batch_size / batch_size iterations (see the sketch after this list).
    • div_factor: The div_factor argument of PyTorch's OneCycleLR scheduler.
    • learning_rate: The learning rate.
    • cuda_id: The GPU index to use if CUDA is available and there is more than one GPU in the system. The default --> 0.
    • num_steps: Instead of setting a fixed number of epochs, set the number of update steps to run (takes the update_interval into account).
    • num_epochs: If running a specific number of epochs is desired.
    • rgb_only: Set the number of input channels to 3.
    • input_channels: Number of input channels (default --> 7).
    • early_stopping_thresh: The threshold for the early stopping mechanism.
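The interplay of batch_size, desired_batch_size, and div_factor roughly corresponds to the following training-loop pattern (a minimal, self-contained sketch with a dummy model and random data; the actual implementation lives in this repository and may differ):

import torch
import torch.nn as nn

batch_size = 2
desired_batch_size = 6
update_interval = desired_batch_size // batch_size    # optimizer step every N iterations
num_steps = 100                                        # e.g. 60000 in a real run

model = nn.Linear(7, 1)                                # stand-in for the depth network (7 input channels)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, total_steps=num_steps, div_factor=25)  # div_factor as described above
loss_fn = nn.L1Loss()

optimizer.zero_grad()
for step in range(num_steps * update_interval):
    inputs, targets = torch.randn(batch_size, 7), torch.randn(batch_size, 1)  # dummy batch
    loss = loss_fn(model(inputs), targets)
    (loss / update_interval).backward()                # scale so accumulated gradients average out
    if (step + 1) % update_interval == 0:
        optimizer.step()                               # one "effective" batch of desired_batch_size
        scheduler.step()
        optimizer.zero_grad()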

Training Example Command

Note: Training only makes sense with the full dataset, as the mini dataset does not provide enough data for meaningful training.

python src/main/runner.py --run_mode train --model base --save_model --batch_size 2 --desired_batch_size 6 --num_steps 60000 --run_name 'base_batch(2-6)' --split <created_split_for_the_full_dataset.npy>

Evaluation (inference) Example Command

With Full nuScenes Dataset:

python src/main/runner.py --run_mode test --checkpoint checkpoints/Base_TL.pth --model base

With nuScenes Mini:

python src/main/runner.py --run_mode test --checkpoint checkpoints/Base_TL.pth --model base --split <your_mini_split_path> --mini_dataset

Note: The mini_dataset argument is mandatory for the mini split, as the dataloader will then take the entire split as a test set. The default split file argument (<your_mini_split_path>) is new_split.npy.

  • Needed arguments for testing (inference); see above for a more detailed description:
    • run_mode
    • checkpoint
    • model
    • cuda_id

๐Ÿ—บ Visualization

python visualization/visualization.py --vis_num 10

This module creates and saves to disk a variety of visualizations, such as the depth map, the radar and LiDAR projections onto the image plane, a transparent depth projection, the semantic segmentation, etc., for each input instance of the given dataset (for vis_num different inputs). You can view all visualizations as a single collage under output_dir/visualizations/collage, or go to the directory named after the specific instance you would like to examine.

๐Ÿงฎ Model Architecture

๐Ÿ“‡ Contact Info

Dan Halperin, [email protected], Institute of Automotive Technology, School of Engineering and Design, Technical University of Munich, 85748 Garching, Germany

Florian Sauerbeck, [email protected], Institute of Automotive Technology, School of Engineering and Design, Technical University of Munich, 85748 Garching, Germany

๐Ÿ“ƒ Reference

@ARTICLE{sauerbeck2023camradepth,
  author={Sauerbeck, Florian and Halperin, Dan and Connert, Lukas and Betz, Johannes},
  journal={IEEE Sensors Journal}, 
  title={CamRaDepth: Semantic Guided Depth Estimation Using Monocular Camera and Sparse Radar for Automotive Perception}, 
  year={2023},
  volume={},
  number={},
  pages={},
  doi={10.1109/JSEN.2023.3321886}
}