Many fact-checking datasets and models have been created recently. Fact checking often requires multi-hop reasoning, because several pieces of evidence must be combined to verify a claim. However, most current fact-checking models use only one inference step and do not provide explanations for their decisions.
A recent study introduces PolitiHop, a dataset of real-world claims annotated with evidence reasoning chains. It consists of 500 manually annotated claims, each paired with a corresponding PolitiFact article. In the evaluation, reasoning models based on a multi-hop architecture outperformed those with a single inference step. The best results were achieved when the model was pretrained on in-domain data. PolitiHop could be further improved by providing more examples of evidence from external sources and by generating coherent summaries of the evidence sentences.
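To make the idea of a claim annotated with evidence reasoning chains concrete, the following is a minimal Python sketch of how one such instance might be represented. The class name `AnnotatedClaim`, its field names, the label value, and the placeholder sentences are all hypothetical and chosen for illustration; they are not the dataset's actual schema or contents.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnnotatedClaim:
    """Hypothetical container for one claim and its annotations (not the real schema)."""
    claim: str                    # the claim to verify
    article_sentences: List[str]  # sentences of the paired fact-checking article
    label: str                    # veracity label; the value set is an assumption
    # Each chain is an ordered list of sentence indices that together lead to the verdict.
    evidence_chains: List[List[int]] = field(default_factory=list)

    def chain_text(self, chain_index: int) -> List[str]:
        """Return the sentences of one evidence chain, in hop order."""
        return [self.article_sentences[i] for i in self.evidence_chains[chain_index]]

    def is_multi_hop(self) -> bool:
        """True if at least one chain combines more than one evidence sentence."""
        return any(len(chain) > 1 for chain in self.evidence_chains)


# Placeholder data for illustration only.
example = AnnotatedClaim(
    claim="Placeholder claim text.",
    article_sentences=["Sentence A.", "Sentence B.", "Sentence C."],
    label="false",
    evidence_chains=[[0, 2]],
)
```

Storing chains as ordered index lists keeps the hop order explicit and allows several alternative chains for the same claim.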
Recently, novel multi-hop models and datasets have been introduced to achieve more complex natural language reasoning with neural networks. One notable task that requires multi-hop reasoning is fact checking, where a chain of connected evidence pieces leads to the final verdict of a claim. However, existing datasets do not provide annotations for the gold evidence pieces, which are critical for improving the explainability of fact-checking systems. The only exception is the FEVER dataset, which is artificially constructed from Wikipedia and does not use naturally occurring political claims and evidence pages, a more challenging setting. Most claims in FEVER have only one evidence sentence associated with them and require no reasoning to make label predictions; the small number of instances with two evidence sentences require only simple reasoning. In this paper, we study how to perform more complex claim verification on naturally occurring claims with multiple hops over evidence chunks. We first construct a small annotated dataset, PolitiHop, of reasoning chains for claim verification. We then compare the dataset to other existing multi-hop datasets and study how to transfer knowledge from more extensive in- and out-of-domain resources to PolitiHop. We find that the task is complex, and we achieve the best performance using an architecture that specifically models reasoning over evidence chains in combination with in-domain transfer learning.