This is the baseline model (a seq2seq model with an attention mechanism) for the Formosa Grand Challenge, adapted from practical-pytorch's seq2seq-translation-batched and the TensorFlow Sequence-to-Sequence Models tutorial. I'm planning to integrate Word2Vec into the model soon. You can also try replacing torch.nn.Embedding(num_embeddings, embedding_dim) in the encoder and decoder models with the embedding produced by the CBOW code or another pre-trained Word2Vec, as sketched below.
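A minimal sketch of what that replacement might look like, assuming a gensim Word2Vec model; the path 'word2vec.model' is a placeholder, and the vocabulary here is taken straight from the Word2Vec model rather than from this repo's data pipeline:

```python
# Sketch: load pre-trained Word2Vec vectors into the embedding layer
# used by the encoder/decoder. 'word2vec.model' is a placeholder path.
import torch
import torch.nn as nn
from gensim.models import Word2Vec

w2v = Word2Vec.load('word2vec.model')  # or a model from the CBOW code
# index2word is the gensim <= 3.x attribute; gensim 4.x renames it index_to_key
word2index = {w: i for i, w in enumerate(w2v.wv.index2word)}

embedding = nn.Embedding(len(word2index), w2v.vector_size)
for word, idx in word2index.items():
    # copy each pre-trained vector into the matching embedding row
    embedding.weight.data[idx] = torch.from_numpy(w2v.wv[word])

# Optionally freeze the weights so training does not overwrite them.
embedding.weight.requires_grad = False
```

Pass this embedding to the encoder and decoder in place of the randomly initialized torch.nn.Embedding.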
pytorch v0.2.0
scikit-learn
sconce
Link
Download from the "Link", then replace the "./data" folder.
Note: This dataset is modified from the Noah's Ark Lab Short-Text Conversation dataset. Please cite the following paper if you use the data in your work:
Neural Responding Machine for Short-Text Conversation. Lifeng Shang, Zhengdong Lu, and Hang Li. ACL 2015.
Link
Download from the "Link", then replace the "./save" folder.