Microsoft Research Social Media Conversation Corpus
A collection of 12,696 Tweet Ids representing 4,232 three-step conversational snippets extracted from Twitter logs. Last published: June 1, 2015.
Each row in the dataset represents a single context-message-response triple that crowdsourced annotators rated, on average, 4 or higher on a 5-point Likert scale measuring the quality of the response in context. The data has been randomly binned into tuning (development) and test sets, comprising 2,118 and 2,114 triples respectively. It is released to the natural language processing community for academic research purposes only. To access the underlying tweets and related metadata, you will need to call the Twitter API.

If you use this material in your research, we ask that you cite the following paper: Alessandro Sordoni, Michel Galley, Michael Auli, Chris Brockett, Yangfeng Ji, Meg Mitchell, Jian-Yun Nie, Jianfeng Gao, and Bill Dolan, "A Neural Network Approach to Context-Sensitive Generation of Conversational Responses," Conference of the North American Chapter of the Association for Computational Linguistics – Human Language Technologies (NAACL-HLT 2015), June 2015.

Additional information about this and related projects may be found at http://research.microsoft.com/en-us/projects/convo/.
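The row structure described above can be sketched in Python. This is a minimal illustration, not the corpus's documented format: it assumes each row holds exactly three tab-separated Tweet Ids in context, message, response order, and the names `Triple` and `parse_triples` are hypothetical. It does not call the Twitter API; it only parses Ids and checks the counts stated in the description.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Counts taken from the corpus description.
TUNING_TRIPLES = 2118
TEST_TRIPLES = 2114
TOTAL_TRIPLES = TUNING_TRIPLES + TEST_TRIPLES   # 4,232 triples
TOTAL_TWEET_IDS = 3 * TOTAL_TRIPLES             # 12,696 Tweet Ids


@dataclass(frozen=True)
class Triple:
    """One context-message-response row (hypothetical container)."""
    context_id: str
    message_id: str
    response_id: str


def parse_triples(lines: Iterable[str], sep: str = "\t") -> List[Triple]:
    """Parse rows of three Tweet Ids each.

    Assumption: one triple per line, fields separated by `sep`.
    The real file layout may differ; adjust `sep` and field order.
    """
    triples = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(sep)
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 Ids, got {len(fields)}"
            )
        # Keep Ids as strings so large values are never altered.
        triples.append(Triple(*fields))
    return triples
```

Ids are kept as strings because they serve only as lookup keys for retrieving the tweet text and metadata, which still has to be fetched separately.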
Files
Status: Live. This download is still available on microsoft.com. The downloads below will come directly from the Microsoft Download Center.
System Requirements
Operating Systems: Windows 10, Windows 7, Windows 8
Installation Instructions
- Click Download and follow the instructions.