ABSTRACT

The popularity of online shopping is steadily increasing. At the same time, fake product reviews are published widely and have the potential to affect consumer purchasing behavior. In response, previous work has developed automated methods for the detection of deceptive product reviews. However, studies vary considerably in terms of classification performance, and many use data that contain potential confounds, which makes it difficult to determine their validity. Two possible confounds are data-origin (i.e., the dataset is composed of more than one source) and product ownership (i.e., reviews written by individuals who do or do not own the reviewed product). In the present study, we investigate the effect of both confounds on fake review detection. Using an experimental design, we manipulate data-origin, product ownership, review polarity, and veracity. Supervised learning analysis suggests that review veracity (60.26-69.87%) is somewhat detectable, but reviews additionally confounded with product ownership (66.19-74.17%) or with data-origin (84.44-86.94%) are easier to classify. Review veracity is most easily classified if confounded with product ownership and data-origin combined (87.78-88.12%), suggesting overestimations of the true performance in other work. These findings are moderated by review polarity.

Introduction

Online shopping is not new, but it is increasing in popularity, as evidenced by the growth of companies such as Amazon and eBay [Palmer, 2020, Soper, 2021, Weise, 2020]. Previous work shows that consumers rely heavily on product reviews posted by other people to guide their purchasing decisions [Anderson and Magruder, 2012, Chevalier and Mayzlin, 2006, Watson, 2018]. While sensible, this has created the opportunity and market for deceptive reviews, which are currently among the most critical problems faced by online shopping platforms and those who use them [Dwoskin and Timberg, 2018, Nguyen, 2018]. Research suggests that for a range of deception detection tasks (e.g., identifying written or verbal lies about an individual's experience, biographical facts, or non-personal events), humans typically perform at chance level [DePaulo et al., 2003, Kleinberg and Verschuere, 2021]. Furthermore, in the context of online reviews, the sheer volume of reviews [Woolf, 2014] makes deception detection implausible for all but the most diligent consumers. With this in mind, the research effort has shifted towards the use and calibration of automated approaches. For written reviews, which are the focus of this article, such approaches typically rely on text mining and supervised machine learning algorithms [Newman et al., 2003, Pérez-Rosas et al., 2018, Ott et al., 2011, 2013]. However, while the general approach is consistent, classification performance varies greatly between studies, as do the approaches to constructing the datasets used. Higher rates of performance are usually found in studies for which data are constructed from several different sources, such as a crowdsourcing platform and an online review platform [Ott et al., 2011, 2013], while lower rates of performance are typically found in studies for which data are extracted from a single source and for which greater experimental control is exercised (e.g., Kleinberg and Verschuere [2021]).
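To illustrate the kind of pipeline this line of work typically relies on, the following minimal sketch trains a TF-IDF based linear classifier in Python with scikit-learn and reports cross-validated accuracy, the same type of metric as the percentage ranges cited above. It is a generic illustration, not the feature set or model of any cited study, and the review texts and labels are hypothetical placeholders.

# Minimal sketch of a typical fake-review classification pipeline
# (TF-IDF features + a linear classifier); illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical placeholder data: review texts and veracity labels (1 = fake, 0 = genuine).
reviews = [
    "Great phone, battery lasts all day and the camera is superb.",
    "Best product ever, everyone should buy it right now!!!",
    "Screen cracked after a week; support was slow to respond.",
    "Absolutely perfect in every way, five stars, amazing, wow.",
] * 10  # repeated only so cross-validation has enough samples
labels = [0, 1, 0, 1] * 10

pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),  # unigram and bigram features
    LogisticRegression(max_iter=1000),
)

# 5-fold cross-validated accuracy; trivially high on this toy data, but the same
# call yields the kind of percentage ranges reported above on real review corpora.
scores = cross_val_score(pipeline, reviews, labels, cv=5, scoring="accuracy")
print(f"Mean accuracy: {scores.mean():.2%}")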
Such findings suggest that confounds associated with the construction of datasets may explain some of the variation in classification performance between studies and highlight the need to explore such issues. In the current study, we explore two possible confounds and estimate their effects on automated classification performance. In what follows, we first identify and explain the two confounds. Next, we outline how we control for them through a highly controlled data collection procedure. Lastly, we run six analyses on subsets of the data to demonstrate the pure and combined effects of the confounds in automated veracity classification tasks.

Confounding factors

In an experiment, confounding variables can lead to an omitted-variable bias, in which the omitted variables affect the dependent variable and their effects are falsely attributed to the independent variable(s). In the case of fake review detection, two potential confounds might explain why some studies report higher, and possibly overestimated, automated classification performance than others. The first concerns the provenance of the data used. For example, deceptive reviews are often collected from participants recruited through crowdsourcing platforms, while truthful reviews are scraped from online platforms [Ott et al., 2011, 2013], such as TripAdvisor, Amazon, Trustpilot, or Yelp. Creating datasets in this way is efficient but introduces a potential confound: the reviews differ not only in veracity but also in their origin. If origin and veracity were counterbalanced, so that half of the fake (and genuine) reviews came from each source, this would be unproblematic, but in some existing studies the two are confounded (see the sketch after this section). A second potential confound concerns ownership. In existing studies, participants who write fake reviews are asked to write about products (or services) that they do not own. In contrast, the scraped reviews, assuming that they are genuine (which is itself a problematic assumption), will have been written by those who own the products (or have used the services). As such, ownership and review veracity (fake or genuine) will also be confounded.
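To make the counterbalancing point concrete, the sketch below draws equal numbers of reviews from every veracity-by-origin cell, so that a classifier cannot exploit origin as a proxy for veracity. The DataFrame and its column names ("text", "veracity", "origin") are hypothetical, and the snippet illustrates the design principle rather than the sampling procedure used in this study.

# Sketch of counterbalanced sampling over hypothetical columns
# "text", "veracity" (fake/genuine) and "origin" (crowdsourced/scraped).
import pandas as pd

def counterbalance(df: pd.DataFrame, n_per_cell: int, seed: int = 42) -> pd.DataFrame:
    cells = []
    for (veracity, origin), group in df.groupby(["veracity", "origin"]):
        if len(group) < n_per_cell:
            raise ValueError(f"Not enough reviews for cell ({veracity}, {origin})")
        # Equal cell sizes keep veracity and origin statistically independent.
        cells.append(group.sample(n=n_per_cell, random_state=seed))
    return pd.concat(cells).sample(frac=1, random_state=seed)  # shuffle rows

# Toy example:
df = pd.DataFrame({
    "text": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "veracity": ["fake", "fake", "genuine", "genuine"] * 2,
    "origin": ["crowdsourced"] * 4 + ["scraped"] * 4,
})
balanced = counterbalance(df, n_per_cell=2)
print(balanced[["veracity", "origin"]].value_counts())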
Confounds in fake review detection

Studies of possible confounding factors in deception detection tasks involving reviews are scarce. Salvetti et al. [2016] investigated whether a machine learning classifier could disentangle the effects of two different types of deception: lies vs. fabrications. For the former, participants recruited through Amazon's Mechanical Turk (AMT) were asked to write a truthful and a deceptive review about an electronic product or a hotel they knew. For the latter, a second group of AMT participants was asked to write deceptive reviews about the same products or hotels, but this time about products or hotels they had no knowledge of, resulting in entirely fabricated reviews. Salvetti et al. [2016] found that the classifier was able to differentiate between truthful reviews and fabricated ones, though not particularly well. However, it could not differentiate between truthful reviews and lies: classification performance was around chance level. These findings suggest that product ownership (measured here in terms of fabrications vs. truthful reviews) is a potentially important factor in deceptive review detection.

A different study examined the ability of a classifier to differentiate truthful and deceptive reviews from Amazon [Fornaciari et al., 2020] using the "DeRev" dataset [Fornaciari and Poesio, 2014]. The dataset contains fake Amazon book reviews that were identified through investigative journalism [Flood, 2012, Streitfeld, 2011]. Truthful reviews were selected from Amazon reviews of other books, by famous authors such as Arthur Conan Doyle, Rudyard Kipling, Ken Follett, and Stephen King, for which it was assumed that writing fake reviews would make little sense. A second corpus of fake reviews, written about the same books, was then generated by participants recruited through crowdsourcing to provide a comparison with the "DeRev" reviews. The authors then compared the performance of a machine learning classifier in distinguishing between different classes of reviews (e.g., crowdsourced-fake vs. Amazon-fake, crowdsourced-fake vs. Amazon-truthful). Most pertinent here is the finding that the crowdsourced-fake reviews differed from the Amazon-fake reviews. Both studies [Fornaciari et al., 2020, Salvetti et al., 2016] hint at the problems of confounding factors in deception detection tasks. However, the combination of both factors (ownership and data-origin), and how the two interact for truthful and deceptive reviews, has not yet been tested.

Aims of this paper

Confounding variables have the potential to distort the findings of studies, leading researchers to conclude that a classifier can distinguish between truthful and fake reviews when, in reality, it is distinguishing between other characteristics of the data, such as the way in which the data were generated. Such confounds would mean that the real-world value of the research is limited (at best). In the current study, we employ an experimental approach to systematically manipulate these possible confounders and to measure their effects for reviews of smartphones. Specifically, we estimate the effect of product ownership by collecting truthful and deceptive reviews from participants who do and do not own the products they were asked to review. To examine the effect of data-origin, we also use data (for the same products) scraped from an online shopping platform. We first examine how well reviews can be differentiated by veracity alone (i.e., without confounds), and whether classification performance changes when veracity is confounded with product ownership, data-origin, or both. If ownership or data-origin do influence review content (we hypothesize that they do), reviews should be easier to differentiate when either of the two confounds is present in veracity classification, and most easily classifiable when both confounds (ownership, data-origin) are present at the same time.

Conclusion

Through experimental control, we found that product ownership and data-origin confound fake review detection, resulting in overestimations of model performance. Data-origin in particular seems to boost classification performance, and this could easily be misattributed to classifying veracity alone.