    Building Fact-Checking Systems: Catching Repeating False Claims Before They Spread

By Editor Times Featured · September 27, 2025 · 13 min read


Why We Need Automated Fact-Checking

Compared to traditional media, where articles are edited and verified before being published, social media changed the approach completely. Suddenly, everyone could raise their voice. Posts are shared instantly, enabling access to ideas and perspectives from all over the world. That was the dream, at least.

What began as an idea for protecting freedom of speech, giving people the opportunity to express opinions without censorship, has come with a trade-off. Very little information gets checked. And that makes it harder than ever to tell what is accurate and what is not.

A further challenge arises because false claims rarely appear just once. They are often reshared on different platforms, sometimes altered in wording, format, length, or even language, making detection and verification much more difficult. As these variations circulate across platforms, they can seem familiar, and therefore believable, to their readers.

The original idea of a space for open, uncensored, and reliable information has run into a paradox. The very openness meant to empower people also makes it easy for misinformation to spread. That is exactly where fact-checking systems come in.

The Development of Fact-Checking Pipelines

Traditionally, fact-checking was a manual process that relied on experts (journalists, researchers, or fact-checking organizations) to verify claims by cross-referencing them with sources such as official documents or expert opinions. This approach was very reliable and thorough, but also very time-consuming. The delay gave false narratives more time to circulate, shape public opinion, and enable further manipulation.

This is where automation comes in. Researchers have developed fact-checking pipelines that behave like human fact-checking experts but can scale to massive amounts of online content. A fact-checking pipeline follows a structured process, which usually includes the following five steps:

1. Claim Detection – find statements with factual implications.
2. Claim Prioritization – rank them by speed of spread, potential harm, or public interest, prioritizing the most impactful cases.
3. Evidence Retrieval – gather supporting material and provide the context to evaluate it.
4. Veracity Prediction – decide whether the claim is true, false, or something in between.
5. Explanation Generation – produce a justification that readers can understand.
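The five steps above can be sketched as a minimal pipeline skeleton. This is purely illustrative: `Claim`, `run_pipeline`, and the pluggable stage functions are hypothetical placeholders, not part of any specific fact-checking framework.

```python
from dataclasses import dataclass, field

# Hypothetical container for one claim moving through the pipeline.
@dataclass
class Claim:
    text: str
    evidence: list = field(default_factory=list)
    verdict: str = "unverified"
    explanation: str = ""

def run_pipeline(posts, detect, prioritize, retrieve_evidence, predict, explain):
    """Sketch of the five-step fact-checking pipeline.
    Each stage is a pluggable function supplied by the caller."""
    claims = [Claim(text=t) for t in detect(posts)]           # 1. claim detection
    claims = sorted(claims, key=prioritize, reverse=True)      # 2. claim prioritization
    for claim in claims:
        claim.evidence = retrieve_evidence(claim.text)         # 3. evidence retrieval
        claim.verdict = predict(claim.text, claim.evidence)    # 4. veracity prediction
        claim.explanation = explain(claim)                     # 5. explanation generation
    return claims
```

In practice each stage would wrap a trained model; here they can be stubbed with simple functions to see the data flow.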

In addition to these five steps, many pipelines add a sixth: retrieval of previously fact-checked claims (PFCR). Instead of redoing the work from scratch, the system checks whether a claim, even a reformulated one, has already been verified. If so, it is linked to the existing fact-check and its verdict. If not, the pipeline proceeds with evidence retrieval.

This shortcut saves effort, speeds up verification, and is especially valuable in multilingual settings, since it allows fact-checks in one language to support verification in another.

This component is known by many names: verified claim retrieval, claim matching, or previously fact-checked claim retrieval (PFCR). Whatever the name, the idea is the same: reuse knowledge that already exists to fight misinformation faster and more effectively.

Figure 1: Fact-checking pipeline (created by the author)

Designing the PFCR Component (Retrieval Pipeline)

At its core, previously fact-checked claim retrieval (PFCR) is an information retrieval task: given a claim from a social media post, we want to find the most relevant match in a large collection of already fact-checked (verified) claims. If a match exists, we can directly link it to the source and the verdict, so there is no need to start verification from scratch!

Most modern information retrieval systems use a retriever–reranker architecture. The retriever acts as a first-layer filter, returning a larger set of candidate documents (the top k) from the corpus. The reranker then takes these candidates and refines the ranking using a deeper, more computationally intensive model. This two-stage design balances speed (retriever) and accuracy (reranker).

Models used for retrieval can be grouped into two categories:

    • Lexical models: fast, interpretable, and effective when there is strong word overlap. But they struggle when ideas are phrased differently (synonyms, paraphrases, translations).
    • Semantic models: capture meaning rather than surface words, making them ideal for PFCR. They would recognize that, for example, “the Earth orbits the Sun” and “our planet revolves around the star at the center of the solar system” describe the same fact, even though the wording is completely different.

Once candidates are retrieved, the reranking stage applies more powerful models (typically cross-encoders) to carefully re-score the top results, ensuring that the most relevant fact-checks rank higher. Because rerankers are more expensive to run, they are applied only to a smaller pool of candidates (e.g., the top 100).

Together, the retriever–reranker pipeline provides both coverage (by recognizing a wider range of possible matches) and precision (by ranking the most relevant ones higher). For PFCR, this balance is essential: it enables a fast and scalable way to detect repeating claims, yet with high enough accuracy that users can trust the information they read.
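To make the two-stage idea concrete, here is a toy retrieve-then-rerank sketch. Token overlap stands in for a cheap first-stage retriever and Jaccard similarity for a more careful second-stage scorer; in a real system these would be BM25 or embedding models and a cross-encoder. All names here are illustrative.

```python
def retrieve(query, corpus, k):
    """First stage: cheap token-overlap score; keep the top-k candidate indices."""
    q = set(query.lower().split())
    scored = [(len(q & set(doc.lower().split())), i) for i, doc in enumerate(corpus)]
    scored.sort(key=lambda s: (-s[0], s[1]))       # highest overlap first, stable by index
    return [i for _, i in scored[:k]]

def rerank(query, corpus, candidates):
    """Second stage: a (pretend) more expensive scorer, here Jaccard similarity,
    applied only to the small candidate pool returned by the retriever."""
    q = set(query.lower().split())
    def jaccard(i):
        d = set(corpus[i].lower().split())
        return len(q & d) / len(q | d)
    return sorted(candidates, key=jaccard, reverse=True)
```

The key property is structural: the expensive scorer only ever sees the k candidates, never the full corpus.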

Building the Ensemble

The retriever–reranker pipeline already delivers solid performance. But as I evaluated the models and ran the experiments, one thing became clear: no single model is good enough on its own.

Lexical models, like BM25, are great at exact keyword matches, but as soon as the claim is phrased differently, they fail. That is where semantic models step in. They have no problem handling paraphrases, translations, or cross-lingual scenarios, but they sometimes struggle with straightforward matches where wording matters most. Not all semantic models are the same either; each has its own niche: some work better in English, others in multilingual settings, others at capturing subtle contextual nuances. In other words, just as misinformation mutates and reappears in different variations, semantic retrieval models bring different strengths depending on how they were trained. If misinformation is adaptable, then the retrieval system must be as well.

That is where the idea of an ensemble came in. Instead of betting on a single “best” model, I combined the predictions of several models in an ensemble so they could collaborate and complement one another. Instead of relying on a single model, why not let them work as a team?

Before going further into the ensemble design, I will briefly explain the decision-making process behind the choice of retrievers.

Establishing a Baseline (Lexical Models)

BM25 is one of the simplest and most widely used lexical retrieval models, and it often serves as a baseline in modern IR research. Before evaluating the embedding-based (semantic) models, I was curious to see how well (or badly) BM25 would perform. As it turns out, not bad at all!

Tech detail:
BM25 is a ranking function built upon TF-IDF. It improves on TF-IDF by introducing a saturation function and document length normalization. Unlike plain term-frequency scoring, BM25 accounts for repeated occurrences of a term while preventing long documents from being unfairly favoured. It also includes a parameter (b) that controls the weight assigned to term frequency relative to document length.
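For concreteness, a minimal implementation of the standard Okapi BM25 scoring formula (with the common defaults k1 = 1.5 and b = 0.75) might look like the sketch below. It is for illustration only; in practice one would use an optimized library implementation.

```python
import math

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    """Okapi BM25 score of one document for a query.
    corpus: list of tokenized documents (lists of terms)."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)          # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)    # smoothed IDF
        tf = doc_terms.count(term)                         # term frequency
        # Saturation: extra occurrences of a term yield diminishing gains;
        # b controls how strongly document length is normalized.
        denom = tf + k1 * (1 - b + b * len(doc_terms) / avgdl)
        score += idf * tf * (k1 + 1) / denom
    return score
```

Note how a document that merely repeats the query term many times gains less and less per occurrence, unlike raw term-frequency scoring.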

Semantic Models

As a starting point for the semantic (embedding-based) models, I referred to Hugging Face’s Massive Text Embedding Benchmark (MTEB) and evaluated the leading models while keeping GPU resource constraints in mind.

The two models that stood out were E5 (intfloat/multilingual-e5-large-instruct) and BGE (BAAI/bge-m3). Both achieved strong results when retrieving the top 100 candidates, so I selected them for further tuning and integration with BM25.

    Ensemble Design

With the retrievers in place, the question was: how do we combine them? I tested different aggregation strategies, including majority voting, exponential decay weighting, and reciprocal rank fusion (RRF).
RRF worked best because it doesn’t just average scores; it rewards documents that consistently appear high across different rankings, regardless of which model produced them. This way, the ensemble favored claims that multiple models “agreed on,” while still allowing each model to contribute independently.
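Reciprocal rank fusion itself is only a few lines. The sketch below follows the standard formulation, in which each document receives 1/(k + rank) from every ranked list it appears in (k = 60 is the constant proposed in the original RRF paper by Cormack et al.):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.
    rankings: list of lists, each ordered best-first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document gains more from appearing near the top of any list.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because only ranks are used, not raw scores, RRF sidesteps the problem that BM25 scores and embedding similarities live on incompatible scales.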

I also experimented with the number of candidates retrieved in the first stage (the hyperparameter k). The idea is simple: if you pull in only a very small set of candidates, you risk missing relevant fact-checks altogether. On the other hand, if you select too many, the reranker has to sift through a lot of noise, which adds computational cost without actually improving accuracy.

Through the experiments, I found that as k increased, performance improved at first because the ensemble had more chances to find the right fact-checks. But after a certain point, adding more candidates stopped helping. The reranker could already see enough relevant fact-checks to make good decisions, and the extra ones were mostly irrelevant. In practice, this meant finding a “sweet spot” where the candidate pool was large enough to ensure coverage, but not so large that it reduced the reranker’s effectiveness.

As a final step, I adjusted the weights of each model. Reducing BM25’s influence while giving more weight to the semantic retrievers boosted performance. In other words, BM25 is useful, but the heavy lifting is done by E5 and BGE.
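One simple way to realize this weighting is a weighted variant of RRF, where each retriever’s contribution is scaled by a per-model weight. The weights in the example below are illustrative only, not the values tuned in my experiments:

```python
def weighted_rrf(rankings, weights, k=60):
    """RRF where each ranked list contributes in proportion to its weight.
    rankings: list of ranked ID lists; weights: one float per list."""
    scores = {}
    for ranking, w in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Setting the lexical weight below the semantic ones lets BM25 break ties and reward exact matches without dominating the fused ranking.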

To quickly recap the PFCR component: the pipeline consists of retrieval and reranking, where retrieval can use lexical or semantic models, while reranking uses a semantic model. Moreover, we saw that combining several models in an ensemble improves retrieval and reranking performance. Okay, so where do we integrate the ensemble?

Where Does the Ensemble Fit?

The ensemble wasn’t limited to just one part of the pipeline. I applied it within both retrieval and reranking.

    • Retriever stage → I merged the candidate lists produced by BM25, E5, and BGE. This way, the system didn’t rely on a single model’s “view” of what might be relevant but instead pooled their perspectives into a stronger starting set.
    • Reranker stage → I then combined the rankings from several rerankers (again consulting MTEB under my GPU constraints). Since each reranker captures slightly different nuances of similarity, mixing them helped refine the final ordering of fact-checks with greater accuracy.

At the retriever stage, the ensemble produced a wider pool of candidates, ensuring that fewer relevant claims slipped through the cracks (improving recall). The reranker stage then narrowed the focus, pushing the most relevant fact-checks to the top (improving precision).

Figure 2: Retriever–reranker ensemble pipeline (created by the author)

Bringing It All Together (TL;DR)

Long story short: the envisioned digital utopia of open information sharing doesn’t work without verification, and can even create the opposite, a channel for misinformation.

That was the driving force behind the development of automated fact-checking pipelines, which have helped us move closer to that original promise. They make it easier to verify information quickly and at scale, so when false claims pop up in new forms, they can be spotted and addressed without delay, helping maintain accuracy and trust in the digital world.

The takeaway is simple: diversity is key. Just as misinformation spreads by taking on many forms, a resilient fact-checking system benefits from multiple perspectives working together. With an ensemble, the pipeline becomes more robust, more adaptable, and ultimately better able to support a trustworthy digital space.

For the Curious Minds

If you’re interested in a deeper technical dive into the retrieval and ensemble strategies behind this pipeline, you can check out my full paper here. It covers the model choices, experiments, and detailed evaluation metrics of the system.




