Mirror of https://github.com/thinkst/zippy
Abstract

In this work we consider the collection of deceptive April Fools' Day (AFD) news articles as a useful addition to existing datasets for deception detection tasks. Such collections have an established ground truth and are relatively easy to construct across languages. As a result, we introduce a corpus that includes diachronic AFD and normal articles from Greek newspapers and news websites. On top of that, we build a rich linguistic feature set, and analyze and compare its deception cues with the only AFD collection currently available, which is in English. Following a current research thread, we also discuss the individualism/collectivism dimension in deception with respect to these two datasets. Lastly, we build classifiers by testing various monolingual and crosslingual settings. The results showcase that AFD datasets can be helpful in deception detection studies, and are in alignment with the observations of other deception detection works.

Introduction

April Fools' Day (AFD for short) is a long-standing custom, mostly in Western societies. It is the only day of the year when practical jokes and deception are expected. This is the case for all social interactions, including journalism, which is generally considered to aim at the presentation of truth. Every year on this day, newspapers and news websites take part in an unofficial competition to invent the most believable, but untrue story. In this respect, AFD news articles fall into the deception spectrum, as they satisfy widely accepted definitions of deception such as that of Masip et al. (2005).

The massive participation of news media in this custom establishes a rich corpus of deceptive articles from a diversity of sources. Although AFD articles may share linguistic instruments with satire news, like exaggeration, humour, irony and paralogism, they are usually considered a distinct category.
This is mainly due to the fact that they also employ other mechanisms which characterize deception in general, like sophisms, and changes in cognitive load and emotions (Hauch et al., 2015) to deceive their audience. AFD articles are often believable, and there exist cases where sophisticated AFD articles have been reproduced by major international news agencies worldwide.

This motivated us to extend our previous work on linguistic cues of deception and their relation to the cultural dimension of individualism and collectivism (Papantoniou et al., 2021), in the context of AFD. That work examines whether differences in the usage of linguistic cues of deception (e.g., pronouns) across cultures can be identified and attributed to the individualism/collectivism divide. Specifically, the contributions of this work are:

• A new corpus that includes diachronic AFD and normal articles from Greek newspapers and news websites, adding one more AFD collection to the currently unique one in English (Dearden and Baron, 2019).
• A study and discussion of the linguistic cues of deception that prevail in the Greek and English collections, along with their similarities.
• A discussion on whether the consideration of the individualism/collectivism cultural dimension in the context of AFD aligns with the results of our previous work.
• An examination of the performance of various classifiers in identifying AFD articles, including multilanguage setups.

Related Work

The creation of reliable and realistic ground truth datasets for the deception detection task is challenging (Fitzpatrick and Bachenko, 2012). Crowdsourcing, in the form of online campaigns in which people express themselves in a truthful and/or deceitful manner for a small payment, is a well-established way to collect deceptive data (Ott et al., 2011). Real-life situations such as trials (Soldner et al., 2019) or the use of data from board games have also been employed (Peskov et al., 2020).
Another popular approach is the reuse of content from sites that debunk articles such as fake news and hoaxes (Wang, 2017; Kochkina et al., 2018). Lastly, satire news is another way to collect deceptive texts, but with some particularities due to humorous deception (Skalicky et al., 2020).

The only work that explores AFD articles is that of Dearden and Baron (2019). They collected 519 AFD and 519 truthful stories and articles in English over a period of years. A large set of features was exploited to identify deception cues in AFD stories. Structural complexity and level of detail were among the most valuable features, while applying the same feature set to a fake news dataset resulted in similar observations.

To the best of our knowledge, the only deception-related dataset for the Greek language is that of Karidi et al. (2005). This work proposed an automatic process for the detection of deception cues, but unfortunately the created corpus over Greek websites is not available. If we also consider that the creation of a fake news dataset for Greek websites is a cumbersome and expensive task, [...] in the work of Dearden and Baron (2019).