Abstract

While recent work on automated fact-checking has focused mainly on verifying and explaining claims, for which the list of claims is readily available, identifying check-worthy claim sentences from a text remains challenging. Current claim identification models rely on manual annotations for each sentence in the text, which is an expensive task and challenging to conduct on a frequent basis across multiple domains. This paper explores a methodology to identify check-worthy claim sentences from fake news articles, irrespective of domain, without explicit sentence-level annotations. We leverage two internal supervisory signals - the headline and the abstractive summary - to rank the sentences based on semantic similarity. We hypothesize that this ranking directly correlates with the check-worthiness of the sentences. To assess this hypothesis, we build pipelines that leverage the ranking of sentences based on either the headline or the abstractive summary. The top-ranked sentences are used by the pipeline for the downstream fact-checking tasks of evidence retrieval and prediction of the article's veracity. Our findings suggest that the top 3 ranked sentences contain enough information for evidence-based fact-checking of a fake news article. We also show that while the headline has more gisting similarity with how a fact-checking website writes a claim, the summary-based pipeline is the most promising for an end-to-end fact-checking system.

Introduction

With the rise of social media in recent years, it has become possible to disseminate fake news to millions of people easily and quickly. An MIT Media Lab study from two years ago (Vosoughi et al., 2018) showed that false information travels six times farther and spreads much faster than real information. Additionally, personalization techniques have enabled targeting people with specific types of fake news based on their interests and confirmation biases.
In response, there has been an increase in the number of fact-checking organizations that manually identify check-worthy claims and correct them based on evidence (Graves and Cherubini, 2016). However, a study shows that 50% of the lifetime spread of some very viral fake news happens in the first 10 minutes, which limits the ability of manual fact-checking - a process that takes a day or two, sometimes a week. Automating any part of the fact-checking process can help scale up fact-checking efforts. Additionally, end-to-end automation can also enable human fact-checkers to devote more time to complex cases that require careful human judgment (Konstantinovskiy et al., 2021).

End-to-end automated fact-checking systems involve three core objectives - (1) identifying check-worthy claims, (2) verifying claims against authoritative sources, and (3) delivering corrections/explanations on the claims (Graves, 2018). The majority of recent work focuses on the verification and explanation objectives, for which a list of claims is readily available (Thorne et al., 2018; Thorne and Vlachos, 2018; Augenstein et al., 2019; Atanasova et al., 2020; Kazemi et al., 2021). Identifying check-worthy claims, which is a critical first step for fact-checking, remains a challenging task.

ClaimBuster is the first work to target check-worthiness (Hassan et al., 2017). It is trained on transcripts of 30 US presidential election debates. Each sentence of the transcripts is annotated with one of three categories - non-factual sentence, unimportant factual sentence, and check-worthy factual sentence. They then build classifiers to classify sentences into these three labels. Another classification-based approach is to predict whether the content of a given statement makes "an assertion about the world that is checkable" (Konstantinovskiy et al., 2021). This approach utilizes annotations for sentences extracted from subtitles of UK political shows.
The models are trained to classify statements into binary labels - claim or non-claim. Finally, a system called ClaimRank (Jaradat et al., 2018) aims to prioritize the sentences that fact-checkers should consider first for fact-checking. ClaimRank is trained on pre-existing annotations on political debates from 9 fact-checking organizations. This approach first classifies each statement as check-worthy or not. The statements are then ranked based on the probabilities that the model assigns to the positive class.

While these works are fundamental to approaching the problem of check-worthy claim identification, their focus is only on a single domain (politics). Additionally, the models rely on sentence-level human annotations, which is an expensive task and challenging to conduct on a frequent basis across multiple domains.
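The annotation-free ranking idea described in the abstract - score every sentence of an article against an internal supervisory signal (the headline or an abstractive summary) and keep the top-ranked sentences - can be sketched as follows. The paper measures semantic similarity (presumably with neural sentence encoders); this sketch substitutes a simple bag-of-words cosine so it stays self-contained, and the example headline and article are purely illustrative, not taken from the paper.

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(count * b[token] for token, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rank_sentences(sentences, signal, k=3):
    # Score each sentence against the supervisory signal (headline or
    # abstractive summary) and return the k highest-scoring sentences.
    signal_vec = Counter(signal.lower().split())
    ranked = sorted(
        sentences,
        key=lambda s: cosine(Counter(s.lower().split()), signal_vec),
        reverse=True,
    )
    return ranked[:k]

headline = "City council approves new stadium funding"
article = [
    "The city council voted 7-2 to approve funding for the new stadium.",
    "Local residents gathered outside the hall during the meeting.",
    "The stadium funding plan allocates 40 million dollars over five years.",
    "Weather delayed the start of the session by an hour.",
]
top = rank_sentences(article, headline, k=2)
```

Under the paper's hypothesis, the top-ranked sentences (here, the two that share the headline's vocabulary) are the check-worthy ones handed to the downstream evidence-retrieval and veracity-prediction stages; swapping the bag-of-words cosine for a sentence-embedding similarity leaves the ranking logic unchanged.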