Multi-Domain Sentiment Dataset

Jan 07 2020
By: John Blitzer, Mark Dredze, Fernando Pereira
From: Association for Computational Linguistics (ACL)
The Multi-Domain Sentiment Dataset contains product reviews taken from 4 product types (domains): Kitchen, Books, DVDs, and Electronics. Each domain has several thousand reviews, but the exact number varies by domain. Reviews contain star ratings (1 to 5 stars) that can be converted into binary labels if needed.

A few notes regarding the data.

1) There are 4 directories, one for each of the four domains. Each directory contains 3 files: one of positive reviews, one of negative reviews, and one of unlabeled reviews. (The Books directory doesn't contain the unlabeled file, but the link is below.) While the positive and negative files contain positive and negative reviews, these aren't necessarily the splits we used in the experiments. We randomly drew from the three files, ignoring the file names.

2) Each file uses a pseudo-XML scheme for encoding the reviews.
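
The conversion from star ratings to binary labels mentioned above can be sketched as follows. This is only an illustration, not the authors' code: the function name star_to_binary is hypothetical, and the thresholds (above 3 stars is positive, below 3 is negative, exactly 3 is dropped as ambiguous) are an assumption that should be checked against the experimental setup you want to reproduce.

```python
from typing import Optional


def star_to_binary(stars: float) -> Optional[int]:
    """Map a 1-5 star rating to 1 (positive), 0 (negative), or None (ambiguous).

    Assumption: ratings above 3 are positive, below 3 are negative,
    and 3-star reviews are treated as ambiguous and left unlabeled.
    """
    if not 1 <= stars <= 5:
        raise ValueError(f"star rating out of range: {stars}")
    if stars > 3:
        return 1
    if stars < 3:
        return 0
    return None


# Example ratings (made up for illustration, not taken from the dataset).
ratings = [5, 1, 3, 4, 2]
labels = [star_to_binary(r) for r in ratings]

# Keep only the reviews that received a binary label.
labeled = [(r, y) for r, y in zip(ratings, labels) if y is not None]
```

Returning None for 3-star reviews keeps the decision to discard them explicit, so a different treatment of neutral reviews only requires changing the final filtering step.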
