The computer science team hopes its software will ultimately be used as a “first round filter” by online review sites – such as TripAdvisor – to identify “opinion spam”, or fake reviews.
The software has so far been tested only on hotel reviews, but the researchers hope to apply it to restaurant reviews and, ultimately, to consumer product reviews. It works by using linguistic and keyword patterns to identify opinion spam.
"While this is the first study of its kind, and there's a lot more to be done, I think our approach will eventually help review sites identify and eliminate these fraudulent reviews," said Myle Ott, one of the researchers who presented the software at the recent 49th annual meeting of the Association for Computational Linguistics in Portland, US.
"Ultimately, cutting down on deception helps everyone. Customers need to be able to trust the reviews they read, and sellers need feedback on how best to improve their services."
The research comes at a time of growing frustration among hotel and restaurant owners, who say their businesses suffer significant damage from fraudulent or malicious reviews.
Truth vs Deception
To develop their software, the Cornell researchers asked a group of people to deliberately write fake, positive reviews of 20 Chicago hotels. These were then compared with an equal number of carefully verified truthful reviews.
The researchers then gave the reviews to a number of volunteer human judges, who were asked to identify which were deceptive. However, the judges “scored no better than chance”, the researchers reported.
Computer analysis based on subtle features of text was then applied to the reviews to identify linguistic and keyword patterns.
This found, for example, that truthful hotel reviews are more likely to use concrete words relating to the hotel, like ‘bathroom’, ‘check-in’ or ‘price’.
“Deceivers”, on the other hand, were found to write more about things that set the scene, like ‘vacation’, ‘business trip’ or ‘my husband’.
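The contrast between concrete and scene-setting vocabulary can be illustrated with a simple word count. This is a minimal sketch, not the researchers' method: the two cue lists contain only the examples quoted above, and the helper names `count_cues` and `cue_profile` are invented for illustration.

```python
import re

# Cue lists taken from the examples quoted in the article. A real system
# would learn its vocabulary from data rather than hard-code it.
CONCRETE_CUES = {"bathroom", "check-in", "price"}
SCENE_CUES = {"vacation", "business trip", "my husband"}


def count_cues(text, cues):
    """Count how many times any cue word or phrase appears in the text."""
    lowered = text.lower()
    total = 0
    for cue in cues:
        # The lookarounds stop "price" from matching inside "priceless".
        pattern = r"(?<!\w)" + re.escape(cue) + r"(?!\w)"
        total += len(re.findall(pattern, lowered))
    return total


def cue_profile(text):
    """Return the number of concrete and scene-setting cues in a review."""
    return {
        "concrete": count_cues(text, CONCRETE_CUES),
        "scene": count_cues(text, SCENE_CUES),
    }
```

A review that scores high on "scene" and low on "concrete" would lean towards the deceptive pattern described here, although a count like this is far too crude to be used on its own.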
“Truth-tellers and deceivers also differ in the use of keywords referring to human behavior and personal life, and sometimes in features like the amount of punctuation or frequency of ‘large words’. In parallel with previous analysis of imaginative vs. informative writing, deceivers use more verbs and truth-tellers use more nouns,” write the researchers.
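Two of the surface features mentioned in the quote, the amount of punctuation and the frequency of "large words", are straightforward to compute. The sketch below is an assumption-laden illustration: the article does not define a large word, so the seven-letter threshold is a guess, and `surface_features` is a hypothetical helper. Counting nouns and verbs would need a part-of-speech tagger and is left out.

```python
import string


def surface_features(text, large_word_min_len=7):
    """Compute two simple surface features of a review.

    large_word_min_len is an assumed threshold; the article does not say
    how long a word must be to count as "large".
    """
    # Strip punctuation from the edges of each whitespace-separated token.
    words = [w.strip(string.punctuation) for w in text.split()]
    words = [w for w in words if w]
    punctuation_count = sum(1 for ch in text if ch in string.punctuation)
    large_words = sum(1 for w in words if len(w) >= large_word_min_len)
    return {
        "punctuation_count": punctuation_count,
        "large_word_share": large_words / len(words) if words else 0.0,
    }
```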
Fake review filter: How it works
Using these approaches, the researchers trained a computer on a subset of true and false reviews, then tested it against the rest of the database.
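The train-then-test procedure can be sketched in a few lines. The article does not name the classifier the researchers used, so a small naive Bayes model is swapped in here purely for illustration; the function names, the 80/20 split and the fixed random seed are all assumptions.

```python
import math
import random
from collections import Counter


def split_reviews(labelled, train_fraction=0.8, seed=0):
    """Shuffle (text, label) pairs and split them into train and test lists."""
    shuffled = list(labelled)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def train_naive_bayes(train):
    """Collect per-label word counts and label counts from the training set."""
    word_counts = {}
    label_counts = Counter()
    for text, label in train:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest add-one-smoothed log probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(label_counts[label] / total_docs)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

The model is fitted with `train_naive_bayes` on the first list returned by `split_reviews`, and `predict` is then called on each review in the second list, which the model has never seen.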
They found that the best results came from combining keyword analysis with the ways certain words are combined in pairs.
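Assuming "pairs" here means adjacent word pairs, often called bigrams, the combined feature set could look something like this. The function name `unigram_bigram_counts` and the whitespace tokenisation are illustrative choices, not details from the study.

```python
from collections import Counter


def unigram_bigram_counts(text):
    """Count single words and adjacent word pairs in a review."""
    tokens = text.lower().split()
    features = Counter(tokens)  # single words
    # Adjacent pairs are joined with a space, so they can never collide
    # with a single-word key.
    features.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return features
```

Pair counts capture phrases such as "my husband" that single-word counts would split into two unrelated keywords.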
After applying the software to 800 reviews of hotels in Chicago, they found that it identified deceptive reviews with 89.8 per cent accuracy.
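Accuracy in this sense is the share of reviews that the software labels correctly. A minimal sketch, using a hypothetical `accuracy` helper that is not part of the researchers' software:

```python
def accuracy(predicted, actual):
    """Return the fraction of predicted labels that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one label is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

Because the study used equal numbers of truthful and deceptive reviews, a classifier that guessed at random would score about 50 per cent on this measure.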
However, Ott cautioned that the work has so far been validated only for hotel reviews.
He said the next step would be to see if the techniques can be extended to other categories, starting with restaurants and eventually moving to consumer products.