## Can Knowledge Graphs Reduce Hallucinations in LLMs? : A Survey

> [!Abstract]-
> The contemporary LLMs are prone to producing hallucinations, stemming mainly from the knowledge gaps within the models. To address this critical limitation, researchers employ diverse strategies to augment the LLMs by incorporating external knowledge, aiming to reduce hallucinations and enhance reasoning accuracy. Among these strategies, leveraging knowledge graphs as a source of external information has demonstrated promising results. In this survey, we comprehensively review these knowledge-graph-based augmentation techniques in LLMs, focusing on their efficacy in mitigating hallucinations. We systematically categorize these methods into three overarching groups, offering methodological comparisons and performance evaluations. Lastly, this survey explores the current trends and challenges associated with these techniques and outlines potential avenues for future research in this emerging field.

> [!Cite]-
> Agrawal, Garima, Tharindu Kumarage, Zeyad Alghamdi, and Huan Liu. “Can Knowledge Graphs Reduce Hallucinations in LLMs? : A Survey.” arXiv:2311.07914. Preprint, arXiv, March 16, 2024. [https://doi.org/10.48550/arXiv.2311.07914](https://doi.org/10.48550/arXiv.2311.07914).
>
> [link](http://arxiv.org/abs/2311.07914) [online](http://zotero.org/users/17587716/items/YU4G6BP2) [local](zotero://select/library/items/YU4G6BP2) [pdf](file://C:\Users\erikt\Zotero\storage\3FKIEIDC\Agrawal%20et%20al.%20-%202024%20-%20Can%20Knowledge%20Graphs%20Reduce%20Hallucinations%20in%20LLMs%20%20A%20Survey.pdf)

## Notes
%% begin notes %%
Agrawal et al. catalogue knowledge-graph-based augmentation techniques for LLMs, focusing on their efficacy in mitigating **hallucinations**. Techniques are categorized as Knowledge-Aware Inference, Knowledge-Aware Learning, and Knowledge-Aware Validation.
%% end notes %%
%% begin annotations %%

### Imported: 2025-07-17 8:53 am

LLMs also face challenges in accurately interpreting phrases or terms when the context is vague and resides in a knowledge gap region of the model, leading to outputs that may sound plausible but are often irrelevant or incorrect (Ji et al., 2023; Lenat and Marcus, 2023). This phenomenon, often termed "hallucinations," undermines the reliability of these models (Mallen et al., 2023). Zheng et al. (Zheng et al., 2023) demonstrate that augmenting these models with comprehensive external knowledge from KGs can boost their performance and facilitate a more robust reasoning process.

(Figure 1: Knowledge Graphs (KG) employed to reduce hallucinations in LLMs at different stages.)

Knowledge graphs (KGs) organize information into a structured format, capturing relationships between real-world entities and making it comprehensible to both humans and machines (Hogan et al., 2021). KGs are used in semantic search to enhance search engines' semantic understanding (Singhal, 2012), enterprise knowledge management (Deng et al., 2023b), supply chain optimization (Deng et al., 2023a), education (Agrawal et al., 2022), financial fraud detection (Mao et al., 2022), cybersecurity (Agrawal et al., 2023b), recommendation systems (Guo et al., 2020), and QA systems (Agrawal et al., 2023a; Omar et al., 2023; Jiang et al., 2021).

This survey comprehensively reviews existing methodologies aimed at mitigating hallucinations and enhancing the reasoning capabilities of LLMs through augmentation with KGs, classified into three groups: Knowledge-Aware Inference, Knowledge-Aware Learning, and Knowledge-Aware Validation.

Baek et al. (Baek et al., 2023) introduced KAPING, which matches entities in questions to retrieve related triples from knowledge graphs for zero-shot question answering. Wu et al. (Wu et al., 2023) found that converting these triples into textualized statements enhances LLM performance.
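The KAPING-style retrieve-verbalize-prepend pipeline can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy KG, the substring entity matcher, and the prompt template are all assumptions standing in for a real entity linker and triple store.

```python
# Toy knowledge graph as (subject, relation, object) triples.
KG = [
    ("Marie Curie", "award", "Nobel Prize in Physics"),
    ("Marie Curie", "field", "physics"),
    ("Albert Einstein", "field", "physics"),
]

def retrieve_triples(question, kg):
    """Keep triples whose subject entity is mentioned in the question."""
    return [t for t in kg if t[0].lower() in question.lower()]

def verbalize(triples):
    """Convert triples into textual statements (cf. Wu et al., 2023)."""
    return [f"{s} {r.replace('_', ' ')} {o}." for s, r, o in triples]

def build_prompt(question, kg):
    """Prepend verbalized facts to the question for zero-shot QA."""
    facts = "\n".join(verbalize(retrieve_triples(question, kg)))
    return f"Context facts:\n{facts}\n\nQuestion: {question}\nAnswer:"

prompt = build_prompt("What award did Marie Curie win?", KG)
print(prompt)
```

The augmented prompt would then be passed to the LLM unchanged; only the retrieval and verbalization steps touch the KG.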
Sen et al. (Sen et al., 2023) developed a retriever module trained on a KGQA model, addressing the inadequacy of similarity-based retrieval for complex questions. StructGPT (Jiang et al., 2023) augments LLMs with data from knowledge graphs, tables, and databases, utilizing structured queries for information extraction. Other notable works include IAG (Zhang et al., 2023b), KICGPT (Wei et al., 2023), and SAFARI (Wang et al., 2023b).

Different knowledge augmentation techniques using knowledge graphs, inspired by CoT and ToT prompting, enhance reasoning in domain-specific and open-domain tasks. The "Rethinking with Retrieval" model (He et al., 2022) uses decomposed reasoning steps from chain-of-thought prompting to retrieve external knowledge, leading to more accurate and faithful explanations. IRCoT (Trivedi et al., 2022) interleaves generating chains of thought (CoT) with retrieving knowledge from graphs, iteratively guiding retrieval and reasoning for multi-step questions. MindMap (Wen et al., 2023) introduces a plug-and-play approach to evoke graph-of-thoughts reasoning in LLMs. Reasoning on Graphs (RoG) (Luo et al., 2023) uses knowledge graphs to create faithful reasoning paths based on various relations, enabling interpretable and accurate reasoning in LLMs. Complementary advancements include MoT (Li and Qiu, 2023), Democratizing Reasoning (Wang et al., 2023c), ReCEval (Prasad et al., 2023), RAP (Hao et al., 2023), EoT (Yin et al., 2023b), and Tree Prompting (Singh et al., 2023), each contributing uniquely to the development of reasoning capabilities in LLMs.

Liu et al. (Liu et al., 2021) used a second model to produce question-related knowledge statements for deductions. Binder (Cheng et al., 2022) uses Codex to parse context and generate task API calls. KB-Binder (Li et al., 2023) also employs Codex to create logical drafts for questions, integrating knowledge graphs for complete answers. Brate et al.
(Brate et al., 2022) create cloze-style prompts for entities in knowledge graphs, enhancing them with auxiliary data via SPARQL queries to improve recall and accuracy. KnowPrompt (Chen et al., 2022) generates prompts from a pre-trained model and tunes them for relation extraction in cloze-style tasks. BeamQA (Atif et al., 2023) uses a language model to generate inference paths for knowledge-graph-embedding-based search in link prediction. ALCUNA (Yin et al., 2023a) and PRCA (Yang et al., 2023) are other significant methods in controlled generation.

Another stage at which hallucination issues in LLMs can be addressed is learning: KGs can optimize training either by improving the quality of training data at the model pre-training stage or by fine-tuning the pre-trained language model (PLM) to adapt it to specific tasks or domains. These methods are classified as Knowledge-Aware Pre-Training and Knowledge-Aware Fine-Tuning.

Knowledge-Enhanced Models: These methods enrich large-scale text corpora with KGs for improved language representation. ERNIE (Zhang et al., 2019) used masked language modeling (MLM) and next sentence prediction (NSP) in pre-training to capture the text's lexical and syntactical elements, combining context with knowledge facts for predictions. ERNIE 3.0 (Sun et al., 2021a) further evolved by integrating an auto-regressive model with an auto-encoding network, addressing the limitations of a single autoregressive framework in exploring enhanced knowledge. Meanwhile, Rosset et al. (Rosset et al., 2020) introduced knowledge-aware input through an entity tokenizer dictionary, enhancing semantic understanding without altering the transformer architecture.

Knowledge-Guided Masking: These methods utilize linked knowledge graphs to mask key entities in texts, enhancing question-answering and knowledge-base completion tasks by leveraging relational knowledge.
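The idea behind knowledge-guided masking can be sketched in a few lines: instead of masking random tokens, mask the spans that link to KG entities, so the MLM objective forces the model to reconstruct facts. The entity list and the `[MASK]` convention below are illustrative, not taken from any specific paper's code.

```python
# Entities assumed to be linked to a knowledge graph (illustrative).
KG_ENTITIES = {"Marie Curie", "Nobel Prize"}

def knowledge_guided_mask(sentence):
    """Replace each KG-linked entity mention with one [MASK] per word,
    leaving non-entity tokens visible for the MLM to condition on."""
    masked = sentence
    for entity in KG_ENTITIES:
        if entity in masked:
            masked = masked.replace(
                entity, " ".join(["[MASK]"] * len(entity.split()))
            )
    return masked

masked = knowledge_guided_mask("Marie Curie won the Nobel Prize in 1903.")
print(masked)  # entity spans masked; "won", "in 1903." stay visible
```

A real pipeline would use an entity linker rather than exact string matching, but the training signal is the same: the model must predict the entity from its relational context.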
Similarly, Sentiment Knowledge Enhanced Pre-training (SKEP) (Tian et al., 2020) employed sentiment masking to develop unified sentiment representations, improving performance across various sentiment analysis tasks.

Knowledge Fusion: These methods integrate KGs into LLMs using graph query encoders (Wang et al., 2021; Ke et al., 2021; He et al., 2019). JointLK (Sun et al., 2021b) employed knowledge fusion and joint reasoning for commonsense question answering, selectively using relevant KG nodes and synchronizing updates between text and graph encoders. LKPNR (Runfeng et al., 2023) combined LLMs with KGs, enhancing semantic understanding in complex news texts to create a personalized news recommendation framework through a KG-augmented encoder.

Knowledge Probing: Knowledge probing involves examining language models to assess their factual and commonsense knowledge (Petroni et al., 2019). This process aids in evaluating and enhancing the models (Kassner et al., 2021; Swamy et al., 2021). Rewire-then-Probe (Meng et al., 2021) introduced a self-supervised contrastive-probing approach, utilizing biomedical knowledge graphs to learn language representations. SKILL (Moiseev et al., 2022) used synthetic sentences converted from WikiData (Seminar et al., 2019). KELM (Agarwal et al., 2020) used KGs to fine-tune pre-trained model checkpoints. KGLM (Youn and Tagkopoulos, 2022) employed an entity-relation embedding layer with KG triples for link prediction tasks.

Fine-tuning language models like ChatGPT, limited by their last knowledge update in 2021, is more efficient than training from scratch: queries beyond the cutoff can be handled using a curated, domain-specific knowledge graph. The extent to which updated knowledge is integrated into the model remains to be determined. Onoe et al.'s (Onoe et al., 2023) evaluation framework indicates that while models can recall facts about new entities, inferring based on these facts is harder.
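Cloze-style knowledge probing in the Petroni et al. (2019) tradition can be sketched as follows: KG triples are rendered into templated cloze queries, and the model's fill-in is scored against the gold object. The template dictionary is illustrative, and `model_fill` is a stub standing in for a real masked-language-model query.

```python
# Relation-to-template mapping (illustrative, LAMA-style).
TEMPLATES = {"capital_of": "[X] is the capital of [Y]."}

def to_cloze(triple):
    """Render a (subject, relation, object) triple as a cloze query
    plus the gold answer the model should produce."""
    s, r, o = triple
    return TEMPLATES[r].replace("[X]", s).replace("[Y]", "[MASK]"), o

def model_fill(cloze):
    # Stub: a real probe would take the MLM's top prediction for [MASK].
    return "France" if "Paris" in cloze else "unknown"

def probe_accuracy(triples):
    """Fraction of triples whose gold object the model recovers."""
    hits = sum(model_fill(to_cloze(t)[0]) == t[2] for t in triples)
    return hits / len(triples)

acc = probe_accuracy([("Paris", "capital_of", "France"),
                      ("Berlin", "capital_of", "Germany")])
print(acc)
```

With the stub above, the probe recovers one of the two facts, which is exactly the kind of gap such probes are designed to expose.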
The effect of updating knowledge on existing entities is still an open research question.

The third category uses structured data as a fact-checking mechanism, providing a reference against which the model can verify information. The fact-aware language model KGLM (Logan IV et al., 2019) referred to a knowledge graph to generate entities and facts relevant to the context. SURGE (Kang et al., 2022b) retrieves high-similarity, context-relevant triples as a sub-graph from a knowledge graph. A "Text critic" classifier (Lango and Dušek, 2023) was proposed to guide generation by assessing the match between the input data and the generated text. FOLK (Wang and Shu, 2023) used first-order-logic (FOL) predicates for claim verification in online misinformation. Beyond verification, FOLK generates explicit explanations, providing valuable assistance to human fact-checkers in understanding and interpreting the model's decisions.

Retrieved facts enhance small LLMs: Smaller models, due to their limited parameter spaces, struggle to incorporate extensive knowledge in pre-training. Augmenting facts from knowledge graphs, rather than increasing model size, enhanced answer correctness by over 80% for question-answering tasks (Baek et al., 2023; Sen et al., 2023; Wu et al., 2023). However, the success of these methods with complex queries heavily relies on the retriever modules, whose capabilities are limited to the knowledge graph (BehnamGhader et al., 2022).

Step-wise reasoning is more effective in larger models: Variations of CoT methods offer cost-effective control and task-specific tuning, enhancing model performance. For instance, RoG (Luo et al., 2023) reported an increase in ChatGPT's accuracy from 66.8% to 85.7% in reasoning tasks with knowledge graph augmentation. Similarly, MindMap (Wen et al., 2023) boosted accuracy in disease diagnosis and drug recommendation to 88.2% using a clinical reasoning graph.
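The validation step common to these fact-checking methods can be sketched as: extract candidate facts from the model's output, then flag any that the reference KG does not support. The extraction step is stubbed out here (real systems use a trained extractor or FOL parser), and the triple format is illustrative.

```python
# Reference knowledge graph as a set of (subject, relation, object) triples.
KG = {
    ("Marie Curie", "award", "Nobel Prize in Physics"),
    ("Marie Curie", "field", "physics"),
}

def verify(candidate_triples, kg):
    """Split candidate facts into KG-supported and unsupported lists."""
    supported = [t for t in candidate_triples if t in kg]
    unsupported = [t for t in candidate_triples if t not in kg]
    return supported, unsupported

# Facts extracted from a (hypothetical) model generation; the second
# one is a hallucination the KG cannot back up.
claims = [("Marie Curie", "award", "Nobel Prize in Physics"),
          ("Marie Curie", "award", "Fields Medal")]

supported, flagged = verify(claims, KG)
print(flagged)  # unsupported claims would be revised or removed
```

This illustrates the trade-off noted above: every generated claim incurs a lookup (extra computational load), and any claim the extractor misses goes unchecked.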
Controlled generation boosts performance: Knowledge-controlled generation methods surpass baseline models in accuracy and contextual relevance, enhancing their ability to handle diverse queries (Chen et al., 2022; Cheng et al., 2022; Atif et al., 2023). However, these methods can vary in quality and are sometimes prone to generating incorrect or irrelevant information.

Fact-checking ensures reliability: Knowledge validation through fact-checking reduces hallucinations by checking model-generated data against a knowledge graph, but it increases computational load and may miss some inaccuracies (Kang et al., 2022b; Lango and Dušek, 2023).

After the extensive GPT series of LLMs, retraining these huge models with billions of parameters became impractical and resource-intensive. More effort went into fine-tuning the models with task-specific data rather than training from scratch. Very recently, there has been a shift towards knowledge-augmented retrieval, reasoning, generation, and validation methods that incur no additional training costs.
%% end annotations %%

%% Import Date: 2025-07-17T08:53:50.678-06:00 %%