## Knowledge Graphs, Large Language Models, and Hallucinations: An NLP Perspective

> [!Abstract]-
> Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP) based applications including automated text generation, question answering, chatbots, and others. However, they face a significant challenge: hallucinations, where models produce plausible-sounding but factually incorrect responses. This undermines trust and limits the applicability of LLMs in different domains. Knowledge Graphs (KGs), on the other hand, provide a structured collection of interconnected facts represented as entities (nodes) and their relationships (edges). In recent research, KGs have been leveraged to provide context that can fill gaps in an LLM's understanding of certain topics, offering a promising approach to mitigate hallucinations in LLMs, enhancing their reliability and accuracy while benefiting from their wide applicability. Nonetheless, it is still a very active area of research with various unresolved open problems. In this paper, we discuss these open challenges covering state-of-the-art datasets and benchmarks as well as methods for knowledge integration and evaluating hallucinations. In our discussion, we consider the current use of KGs in LLM systems and identify future directions within each of these challenges.

> [!Cite]-
> [link](http://arxiv.org/abs/2411.14258) [online](http://zotero.org/users/17587716/items/2QRIC25Y) [local](zotero://select/library/items/2QRIC25Y) [pdf](file://C:\Users\erikt\Zotero\storage\MCNRSG2U\Lavrinovics%20et%20al.%20-%202024%20-%20Knowledge%20Graphs,%20Large%20Language%20Models,%20and%20Hallucinations%20An%20NLP%20Perspective.pdf)

## Notes

%% begin notes %%
%% end notes %%

%% begin annotations %%

### Imported: 2025-08-05 3:18 pm

A major flaw that prevents widespread deployment of LLMs is factual inconsistencies, also referred to as hallucinations, which impair trust in AI systems and even pose societal risks in the form of generating convincing false information (Augenstein et al., 2024; Puccetti et al., 2024). While Perković et al. (2024) point out that hallucinations can be useful for brainstorming or generating artwork, they are a limiting factor in contexts where factuality is a priority, including use cases that require large-scale text processing, such as question answering, information retrieval, summarization, and recommendations. Recent research (Pan et al., 2024, 2023) has identified Knowledge Graphs (KGs) as structured sources of knowledge for factual grounding that LLMs can be synergized with and conditioned on to improve the general factual consistency of an LLM's output.

The evaluation of hallucinations is a complex problem in itself, since for generative tasks it is necessary to evaluate the semantics of the output. With this in mind, there are metrics such as BERTScore (Zhang et al., 2020) and BARTScore (Yuan et al., 2021) that evaluate the semantic similarity between two pieces of text, e.g., an LLM output and a reference text. Additionally, textual entailment models can be used to classify whether a part of a hypothesis (LLM output) entails or contradicts a given premise (factual knowledge); minimal sketches of both approaches are given after the method overview below.

Furthermore, we argue for the importance of the following open research directions in which KGs can play a critical role:

1. Robust detection of hallucinations with a fine-grained overview of particular hallucinatory text spans
2. Effective methods for integrating knowledge in LLMs that move away from textual prompting
3. Evaluation of factuality in a multi-prompt, multilingual, and multitask space for an in-depth analysis of model performance

A common naive method to integrate external knowledge is through prompting. Given a prompt P, the LLM input can be formed from pairs of knowledge K and queries Q, resulting in P = {K, Q}. This is used in RAG applications to append full documents or knowledge triples (Lewis et al., 2020; Sun et al., 2023). Such an approach is problematic because the LLM output depends on hand-crafting the prompt template: the overall phrasing of the query, the quality of the relevant evidence, fixed context window lengths, and the lack of control over how efficiently the model uses the prompt text. These problems are also outlined by Mizrahi et al. (2024), and we support the call for more robust evaluation methodologies, at least through multi-prompt evaluations. Reliance on prompting can also be observed in other previous works (Guo et al., 2024; Jin et al., 2024; Mou et al., 2024).

Prompt-based knowledge injection is also limited by the context window size and does not deal with cases where the model's internal knowledge may conflict with the provided evidence. To address this, context-aware decoding (Shi et al., 2024) proposes a strategy for prioritizing in-prompt knowledge through a learnable parameter. It is worth noting that context-aware decoding requires two inference passes to generate a final output, therefore increasing the computational cost twofold.

Another line of work (Guan et al., 2024) proposes retrofitting the factuality of LLM output by consulting an external KG once an answer has been generated. The methodology follows a 5-stage pipeline: an output is generated, claims are extracted from it, the claims are cross-checked against an external KG, and the original output is then patched up as needed according to a claim verification module. Each of the five stages in the pipeline relies on an LLM performing the designated task.
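To make the semantic-similarity metrics mentioned earlier concrete, here is a minimal sketch that scores an LLM output against a reference text using the `bert-score` package; the example strings are invented for illustration.

```python
# Semantic similarity between an LLM output and a reference text via BERTScore.
# Minimal sketch using the `bert-score` package (pip install bert-score).
from bert_score import score

llm_output = ["Marie Curie won two Nobel Prizes, in Physics and in Chemistry."]
reference = ["Marie Curie was awarded the Nobel Prize in Physics (1903) and in Chemistry (1911)."]

# P, R, F1 are tensors with one entry per candidate/reference pair.
P, R, F1 = score(llm_output, reference, lang="en")
print(f"BERTScore F1: {F1.item():.3f}")  # closer to 1.0 = more semantically similar
```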
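Similarly, the entailment-based check can be sketched with an off-the-shelf NLI model that classifies an LLM output (hypothesis) against retrieved factual knowledge (premise). This sketch assumes the publicly available `roberta-large-mnli` checkpoint; the premise/hypothesis pair is invented.

```python
# Classify whether an LLM claim (hypothesis) entails or contradicts
# factual knowledge (premise) with an off-the-shelf NLI model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "roberta-large-mnli"  # label order: contradiction / neutral / entailment
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "Marie Curie won the Nobel Prize in Physics in 1903."  # factual knowledge
hypothesis = "Marie Curie never received a Nobel Prize."         # LLM output span

inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1).squeeze()

for label, p in zip(["contradiction", "neutral", "entailment"], probs.tolist()):
    print(f"{label}: {p:.3f}")  # a high 'contradiction' score flags a likely hallucination
```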
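The naive prompt-based injection P = {K, Q} described above amounts to verbalizing knowledge triples and prepending them to the query. The `build_prompt` helper and its template below are illustrative stand-ins, not a format prescribed by the cited works.

```python
# Naive prompt-based knowledge injection: the prompt P is formed from
# knowledge K (KG triples verbalized as text) and a query Q.

def build_prompt(triples: list[tuple[str, str, str]], query: str) -> str:
    # Verbalize each (subject, predicate, object) triple as a simple sentence.
    knowledge = "\n".join(f"- {s} {p} {o}." for s, p, o in triples)
    return (
        "Answer the question using only the facts below.\n\n"
        f"Facts:\n{knowledge}\n\n"
        f"Question: {query}\nAnswer:"
    )

triples = [
    ("Marie Curie", "was awarded", "the Nobel Prize in Physics"),
    ("Marie Curie", "was born in", "Warsaw"),
]
print(build_prompt(triples, "Where was Marie Curie born?"))
```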
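Context-aware decoding can be sketched as a greedy decoding loop that contrasts next-token logits computed with and without the evidence context; the two forward passes per step are where the twofold inference cost comes from. This is a reconstruction from the description above, using GPT-2 and a fixed weighting parameter `alpha` rather than a learned one for simplicity.

```python
# Context-aware decoding, greedy variant: up-weight what the evidence context
# contributes by contrasting logits with and without the context in the prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

context = "Evidence: Marie Curie was born in Warsaw in 1867.\n"
query = "Question: Where was Marie Curie born?\nAnswer: Marie Curie was born in"
alpha = 0.5  # fixed here for simplicity

with_ctx = tok(context + query, return_tensors="pt").input_ids
without_ctx = tok(query, return_tensors="pt").input_ids

for _ in range(5):  # generate a few tokens greedily
    with torch.no_grad():
        logits_ctx = lm(with_ctx).logits[:, -1, :]     # pass 1: with evidence
        logits_plain = lm(without_ctx).logits[:, -1, :]  # pass 2: without evidence
    # Prioritize in-context knowledge: (1 + alpha) * with-context - alpha * without.
    adjusted = (1 + alpha) * logits_ctx - alpha * logits_plain
    next_id = adjusted.argmax(dim=-1, keepdim=True)
    with_ctx = torch.cat([with_ctx, next_id], dim=-1)
    without_ctx = torch.cat([without_ctx, next_id], dim=-1)

print(tok.decode(with_ctx[0]))
```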
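Finally, the 5-stage retrofitting pipeline just described can be outlined as a skeleton. Every helper here (`llm`, `query_kg`) is a hypothetical stand-in rather than the authors' implementation; in the paper each stage is itself LLM-driven.

```python
# Skeleton of a retrofitting pipeline in the spirit of Guan et al. (2024):
# generate, extract claims, retrieve KG facts, verify, then patch the answer.

def retrofit_answer(question: str, llm, query_kg) -> str:
    # Stage 1: draft an answer with the LLM.
    draft = llm(f"Answer the question: {question}")

    # Stage 2: extract atomic factual claims from the draft (LLM-driven).
    claims = llm(f"List the factual claims in: {draft}").splitlines()

    verdicts = []
    for claim in claims:
        # Stage 3: retrieve candidate facts for the claim from the external KG.
        facts = query_kg(claim)
        # Stage 4: verify the claim against the retrieved facts (LLM-driven).
        verdict = llm(f"Does the evidence {facts} support the claim '{claim}'? "
                      "Reply SUPPORTED or REFUTED, with a correction if refuted.")
        verdicts.append((claim, verdict))

    # Stage 5: patch the draft according to the verification results.
    return llm(f"Rewrite the answer '{draft}' so it is consistent with: {verdicts}")
```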
Reliance on knowledge pre-training means that the knowledge is encoded statically. While such methods suggest factual and task-specific improvements, this approach does not solve the fundamental problem of rapid knowledge updates required by use cases where knowledge develops continuously.

While the previous methods suggest improvements, hallucination mitigation is still an ongoing research problem with no single solution that is general enough to solve the task at hand. We believe the semantic web and NLP communities together can solve the problem by combining expertise and research within effective and multilingual graph creation and completion, entity extraction from text, graph embedding extraction, multilingual entity linking, and exploring methods of synergizing KGs and LLMs.

%% end annotations %%

%% Import Date: 2025-08-05T15:18:58.330-06:00 %%