Perspective Transformer Nets

Introduction

This is the TensorFlow implementation of the NIPS 2016 paper "Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision".

Re-implemented by Xinchen Yan, Arkanath Pathak, Jasmine Hsu, Honglak Lee

Reference: Original implementation in Torch

How to run this code

This implementation can be run locally or distributed across multiple machines/tasks. When running in a distributed fashion, set the task number flag for each task. Please refer to the original paper for parameter explanations and training details.

Installation

  • TensorFlow
    • This code requires the latest open-source TensorFlow, which you will need to build from source; the TensorFlow documentation describes the required steps.
  • Bazel
  • matplotlib
    • Follow the instructions here.
    • Alternatively, install it with pip (see the example after this list).
  • scikit-image
    • Follow the instructions here.
    • Alternatively, install it with pip (see the example after this list).
  • PIL
    • Install from here.
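
As a sketch, the Python dependencies can be installed with pip (the package names below are the standard PyPI ones; the PIL interface is provided by the Pillow package):

$ pip install matplotlib scikit-image Pillow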

Dataset

This code requires the dataset to be in tfrecords format with the following features (a serialization sketch follows the list):

  • image
    • Flattened list of images (float representations), one per viewpoint.
  • mask
    • Flattened list of image masks (float representations), one per viewpoint.
  • vox
    • Flattened list of voxels (float representations) for the object.
    • This is needed for the voxel loss and for comparing predictions against ground truth.
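
The following is a minimal sketch (not part of this repo) of serializing one object into this format with the TF 1.x API. The feature names image, mask, and vox follow the list above; the view count, image size, voxel resolution, and output path are illustrative placeholders.

import numpy as np
import tensorflow as tf

def _float_feature(values):
  # Wrap a flat list of floats as a tf.train.Feature.
  return tf.train.Feature(float_list=tf.train.FloatList(value=values))

num_views, height, width = 24, 64, 64  # illustrative values
images = np.random.rand(num_views, height, width, 3).astype(np.float32)
masks = np.random.rand(num_views, height, width, 1).astype(np.float32)
vox = np.random.rand(32, 32, 32).astype(np.float32)  # illustrative voxel grid

# One tf.train.Example per object, with a flattened float list per feature.
example = tf.train.Example(features=tf.train.Features(feature={
    'image': _float_feature(images.ravel().tolist()),
    'mask': _float_feature(masks.ravel().tolist()),
    'vox': _float_feature(vox.ravel().tolist()),
}))

with tf.python_io.TFRecordWriter('/tmp/object.tfrecords') as writer:
  writer.write(example.SerializeToString())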

You can download the ShapeNet Dataset in tfrecords format from here*.

* Disclaimer: This data is hosted personally by Arkanath Pathak for non-commercial research purposes. Please cite the ShapeNet paper when using the dataset in your work.

Pretraining: pretrain_rotator.py for each RNN step

$ bazel run -c opt :pretrain_rotator -- --step_size={} --init_model={}

Pass init_model as the checkpoint path of the model trained in the previous step. You will also need to set the inp_dir flag to the directory where your data resides.
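
For example, a pretraining sequence might look like the following, with each step initialized from the previous step's checkpoint (the step values and paths here are illustrative):

$ bazel run -c opt :pretrain_rotator -- --step_size=1 --inp_dir=/path/to/tfrecords
$ bazel run -c opt :pretrain_rotator -- --step_size=2 --inp_dir=/path/to/tfrecords --init_model=/path/to/step1_checkpoint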

Training: train_ptn.py with the last pretrained model

$ bazel run -c opt :train_ptn -- --init_model={}

Example TensorBoard Visualizations

To compare visualizations across runs, set a different model_name flag for each parameter setting.
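
For example (the model names and checkpoint path are illustrative):

$ bazel run -c opt :train_ptn -- --init_model=/path/to/pretrained_checkpoint --model_name=ptn_proj
$ bazel run -c opt :train_ptn -- --init_model=/path/to/pretrained_checkpoint --model_name=ptn_proj_vox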

This code adds summaries for each loss. For instance, these are the losses we observed during distributed pretraining on the ShapeNet chair dataset with 10 workers and 16 parameter servers:

[Figure: ShapeNet Chair Pretraining loss curves]

After fine-tuning, you can expect images like the following as "grid_vis" under the Image summaries in TensorBoard:

[Figure: ShapeNet Chair experiments with projection weight of 1]

Here the third and fifth columns are the predicted masks and voxels, respectively, alongside their ground-truth values.

A similar image from a model trained on all ShapeNet categories (voxel visualizations might be skewed):

[Figure: ShapeNet All Categories experiments]