## Large Language Models and Knowledge Graphs: Opportunities and Challenges

> [!Abstract]-
> Large Language Models (LLMs) have taken Knowledge Representation -- and the world -- by storm. This inflection point marks a shift from explicit knowledge representation to a renewed focus on the hybrid representation of both explicit knowledge and parametric knowledge. In this position paper, we will discuss some of the common debate points within the community on LLMs (parametric knowledge) and Knowledge Graphs (explicit knowledge) and speculate on opportunities and visions that the renewed focus brings, as well as related research topics and challenges.

> [!Cite]-
> J. Z. Pan _et al._, “Large Language Models and Knowledge Graphs: Opportunities and Challenges,” Aug. 11, 2023, _arXiv_: arXiv:2308.06374. doi: [10.48550/arXiv.2308.06374](https://doi.org/10.48550/arXiv.2308.06374).
>
> [link](http://arxiv.org/abs/2308.06374) [online](http://zotero.org/users/17587716/items/PF428XSK) [local](zotero://select/library/items/PF428XSK) [pdf](file://C:\Users\erikt\Zotero\storage\KFSDUQ9M\Pan%20et%20al.%20-%202023%20-%20Large%20Language%20Models%20and%20Knowledge%20Graphs%20Opportunities%20and%20Challenges.pdf)

## Notes
%% begin notes %%
Pan et al. review the state of the art in integrating LLMs and Knowledge Graphs and lay out their vision for the field:

- In Explicit-Knowledge-First use cases, their vision is that LLMs will enable, advance, and simplify crucial steps in the knowledge engineering pipeline so much as to enable KGs at unprecedented scale, quality, and utility.
- In Parametric-Knowledge-First use cases, their vision is that KGs will improve, ground, and verify LLM generations so as to significantly increase reliability and trust in LLM usage.

The authors acknowledge the field-shattering effect that LLMs have had on knowledge representation in artificial intelligence. They argue that KGs still have a role to play in modern AI applications, but recommend to "murder your (pipeline) darlings," recognizing that LLMs can replace, out of the box, many of the techniques that were so hard-won in the field.
%% end notes %%

%% begin annotations %%

### Imported: 2025-08-04 3:34 pm

Large Language Models (LLMs) have taken Knowledge Representation (KR) -- and the world -- by storm, as they have demonstrated human-level performance on a vast spectrum of natural language tasks, including some tasks requiring human knowledge. Following this, people are gradually starting to accept the possibility of having knowledge represented in the parameters of some language models.

The arrival of LLMs announces the era of Knowledge Computing, in which the notion of reasoning within KR is broadened to many computation tasks based on various knowledge representations.

For a long time, people focused on explicit knowledge, such as that embedded in texts (sometimes also known as unstructured data) and that in structured form, such as in databases and knowledge graphs (KGs) [123].

Historically, humans used texts to pass down their knowledge from one generation to another, until around the 1960s, when researchers started to study knowledge representation for better natural language understanding and developed early systems, such as ELIZA [180] at MIT.
In the early 2000s, the Knowledge Representation and Semantic Web communities worked together to standardize widely used knowledge representation languages at web scale, such as RDF [121] and OWL [55]. The large-scale knowledge bases built with them then became more widely known as KGs [123], due to their helpful graph structures, which enable both logical reasoning and graph-based learning.

This inflection point, with the arrival of LLMs, marks a paradigm shift from explicit knowledge representation to a renewed focus on the hybrid representation of both explicit knowledge and parametric knowledge.

Some works use LLMs to augment KGs for, e.g., knowledge extraction, KG construction, and refinement, while others use KGs to augment LLMs for, e.g., training, prompt learning, or knowledge augmentation.

The usage of parametric and explicit knowledge together is a topic of debate in the Knowledge Computing community, with proponents and skeptics offering different perspectives. Proponents of LLMs like ChatGPT highlight their ability to generalize from large-scale text corpora, capturing a wide range of information, and their excellent language understanding capabilities.

On the one hand, LLMs can generate plausible but incorrect or nonsensical responses, i.e., hallucinations, due to a lack of explicit knowledge representation [193]. There are also doubts about whether LLMs have the ability to learn directional entailments [96] or infer subsumption between concepts [61]. On the other hand, KGs can be costly to build. While LLMs can be expensive to train too, once trained they are readily usable to support many downstream applications.

To sum up, in comparison to the classic trade-off between expressiveness and decidability in Knowledge Representation, here we have a trade-off between precision and recall when using explicit and parametric knowledge in Knowledge Computing tasks.

One of the key research questions on LLMs for the Knowledge Computing community (and beyond) is how much knowledge LLMs remember [107]. Investigations indicate that LLMs' performance deteriorates significantly when dealing with random Wikidata facts, specifically those associated with long-tail entities, in comparison to popular entities, as evidenced in the PopQA dataset [107] and other datasets [133, 167]. This effect can be traced back to a causal relationship between the frequency of an entity's appearance in the pre-training corpus and the LLMs' capacity for memorization [44]. Even sizable LLMs face difficulties when trying to retain information about long-tail entities [80]. KGs inherently present an advantage over LLMs through their provision of knowledge about long-tail entities [78, 167] and thus can further help improve the recall of Knowledge Computing tasks.
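A quick way to see this memorization gap in practice is LAMA-style cloze probing: query a masked language model with fact-shaped templates and check whether the correct entity appears among its top predictions. A minimal sketch using Hugging Face `transformers` follows; the model choice and example facts are illustrative, not taken from the paper:

```python
# Minimal LAMA-style cloze probe: how much factual knowledge does an LM recall?
# Illustrative only -- model and probe facts are arbitrary choices.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

# (cloze prompt, expected answer); a real probe would contrast popular
# subjects with long-tail ones drawn from, e.g., Wikidata.
probes = [
    ("Paris is the capital of [MASK].", "france"),
    ("Tim Berners-Lee was born in [MASK].", "london"),
]

for prompt, expected in probes:
    predictions = unmasker(prompt, top_k=5)
    tokens = [p["token_str"].strip().lower() for p in predictions]
    print(f"{prompt!r}: top-5 = {tokens}, contains {expected!r}? {expected in tokens}")
```

Aggregated over many relations, the hit rate for rarely mentioned subjects drops well below that for popular ones, which mirrors the effect the PopQA-style studies quantify.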
Regarding bias, ontology creation, which generally comprises manual rules shaped by opinions, motivations, and personal choices, is one source [73, 43]. Also, automated pipelines for KG construction exhibit gender bias [109].

KGs are often preferred in scenarios where explainability and interpretability are crucial [28], as they explicitly represent relationships between entities and provide a structured knowledge representation. Skeptics of LLMs argue that these models lack transparency and interpretability, making it difficult to understand how they arrive at their answers or recommendations.

Proponents of LLMs acknowledge the challenge of explainability but argue that recent research efforts [8, 72] are improving LLMs' interpretability through techniques like attention mechanisms and model introspection. Some also argue that Chain-of-Thought (CoT) prompting [177] can improve the explainability of LLMs, although question decomposition and precisely answering sub-questions with LLMs are still far from being solved. Attribution evaluation and augmentation of LLMs with, e.g., source paragraphs and sentences is another recent research topic for improving their explainability in question answering [17].

One of the key questions this paper needs to answer is: now, with the emergence of parametric knowledge, what new opportunities do we have? Here are some of our thoughts on such new opportunities with the arrival of parametric knowledge and its potential integration with explicit knowledge.

Instant access to huge text corpora: As mentioned in the Introduction, for a long time, human beings passed down their knowledge in texts. Thus, a lot of knowledge these days is in textual form. Using LLMs gives access to extremely large text corpora at high speed, and recently even on consumer hardware [65]. This allows AI developers to avoid getting bogged down in previously critical challenges around data gathering, preparation, storage, and querying at scale. It also helps to reduce previously critical dependencies on the field of information retrieval.

Out of the box, with fine-tuning on a few examples, or via few-shot prompting, LLMs have advanced many tasks such as dependency and structured parsing, entity recognition, and relation extraction. By injecting explicit, and in particular structured, knowledge into LLMs, such as through retrieval-augmented methods, one can make explicit knowledge more readily usable for such a wide range of downstream tasks, further realising the vision of 'Knowledge is power'.

Now, with potential novel approaches to combining parametric knowledge with explicit knowledge, it is possible to have even more advanced language understanding, not only for textual entailment, but also for other NLP tasks, such as summarization and consistent generation.

An important step in traditional knowledge engineering is the consolidation and aggregation of conflicting and concurring pieces of information, often requiring elaborate methods for consolidating observations from sentences, patterns, and constraints [149]. In LLM training, this aggregation occurs automatically. Although the step is not entirely understood, it brings the potential for outsourcing a major challenge in knowledge engineering.

In Explicit-Knowledge-First use cases, our vision is that LLMs will enable, advance, and simplify crucial steps in the knowledge engineering pipeline so much as to enable KGs at unprecedented scale, quality, and utility. In Parametric-Knowledge-First use cases, our vision is that KGs will improve, ground, and verify LLM generations so as to significantly increase reliability and trust in LLM usage.

Entity resolution (also known as entity matching, entity linking, or entity alignment) is the process of linking pieces of information that occur in multiple heterogeneous datasets and refer to the same real-world entity. Concluding, entity alignment and matching are necessary pre-processing steps for full-fledged knowledge reasoning.
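To make the embedding-based side of entity resolution concrete, here is a minimal matching sketch: encode entity mentions (name plus a little context) from two datasets and align pairs by cosine similarity. The library, model, and toy records are my own illustrative choices, not the paper's setup:

```python
# Minimal embedding-based entity matching: align records from two datasets
# that refer to the same real-world entity. Illustrative setup only.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Toy records from two heterogeneous sources; added context helps disambiguate.
source_a = ["Big Apple (nickname of New York City)", "Apple Inc. (technology company)"]
source_b = ["New York, NY, largest city in the USA", "Apple, maker of the iPhone"]

emb_a = model.encode(source_a, convert_to_tensor=True)
emb_b = model.encode(source_b, convert_to_tensor=True)

# Cosine similarity matrix: rows = source_a, columns = source_b.
scores = util.cos_sim(emb_a, emb_b)
for i, record in enumerate(source_a):
    j = int(scores[i].argmax())
    print(f"{record!r} -> {source_b[j]!r} (cosine {float(scores[i][j]):.2f})")
```

In practice a similarity threshold and a blocking step would precede this, and, as noted next, LLM-driven rule and labeled-data construction can supply the training pairs such matchers need.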
The combination of general entity linking approaches with embedding-based ones, as well as the leveraging of LLM-driven rule and labeled-data construction, can lead to better integration of LLMs with knowledge reasoning [66].

OntoGPT [25], which uses ChatGPT to extract instances from text in order to populate an ontology, is a known example, but there are no counterparts for tables yet.

LLMs have demonstrated proficiency in extracting knowledge from languages other than English, including low-resource languages, paving the way for cross-lingual knowledge extraction and enabling the utilization of LLMs in diverse linguistic contexts [89].

Also, for constructing domain-specific KGs, the stakes are higher, and hence scrutinizing the generated text (by experts) is necessary. However, it is still a step forward, since human annotation is less expensive than human text generation.

Inductive link prediction (ILP), in contrast, focuses on techniques that can predict links to new entities not originally contained in a KG. The key research question of link prediction is how well a method can learn to infer new triples based on existing ones. LLMs are trained on massive corpora that might overlap with KGs such as Wikidata [169]. Thus, it is not easy to distinguish whether an LLM completes the prediction by utilizing its memory or by reasoning over existing facts.

Cao et al. [24] propose three paradigms for factual knowledge extraction from LLMs: prompt-based, case-based, and context-based. Prompt engineering [10] aims to create prompts that efficiently elicit desired responses from LLMs for a specific task. However, a limited number of manually created prompts reveal only a portion of the model's encoded knowledge [74], as the response can be influenced by the phrasing of the question. Thus, prompt engineering is a crucial part of knowledge retrieval from LLMs.
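A minimal sketch of the prompt-based paradigm: verbalize a relation as a fill-in template, ask the model to complete it, and parse the completion into candidate triples. Here `call_llm` is a hypothetical stand-in for whatever completion API is available, and the template and relation are illustrative:

```python
# Prompt-based factual knowledge extraction, sketched.
from typing import List, Tuple

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real completion API."""
    raise NotImplementedError

PROMPT_TEMPLATE = (
    "Complete the fact with a single entity name.\n"
    "Fact: {subject} | {relation} | "
)

def extract_triples(subjects: List[str], relation: str) -> List[Tuple[str, str, str]]:
    """Elicit one (subject, relation, object) candidate per subject."""
    triples = []
    for subject in subjects:
        completion = call_llm(PROMPT_TEMPLATE.format(subject=subject, relation=relation))
        obj = completion.strip().splitlines()[0]  # keep only the first line
        triples.append((subject, relation, obj))
    return triples

# e.g. extract_triples(["Marie Curie", "Alan Turing"], "place of birth")
```

The case-based and context-based paradigms vary what surrounds the blank (worked examples, retrieved passages) rather than this overall loop; querying several paraphrases of the template and aggregating answers is the usual mitigation for the phrasing sensitivity noted above.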
However, when extracting facts from LLMs, entity disambiguation presents several challenges, since LLMs operate only at the word-token level. Hence, polysemy and homonymy make it difficult to determine the correct entity when a term has multiple meanings or is spelled the same as another term but means something different. Also, the need to resolve co-references, where the same entity is mentioned in various ways within a text, further complicates the process.

Existing LLMs still manifest a low level of precision on long-tail entities. Models may begin to generate incorrect information when they fail to memorize the right facts. The answers provided by these models often lack consistency. Incorrect correlations drawn from the pre-training corpus can lead to various biases in KG completion.

Extracting factual knowledge directly from LLMs does not provide provenance, the origin and credibility of the information, which presents multiple issues. Without provenance, verifying the accuracy of information becomes challenging, potentially leading to the spread of misinformation. Additionally, bias detection is hindered, as the lack of source information makes it difficult to account for potential biases in the data used for training. Provenance also provides critical context, without which information can be misunderstood or misapplied. Lastly, the absence of source information compromises model transparency, making it hard to evaluate the accountability of the LLMs.

A KG is never considered complete, since the closed-world assumption does not hold [40, 128], i.e., it is not possible to conclude that a missing fact is false unless it contradicts another existing fact. Instead, we usually consider that a KG holds under the open-world assumption, that is, a missing fact is simply considered unknown.

In KGs, rules and constraints can take the form of Graph Functional Dependencies [48], declarative first-order logic rules [52], or validating shapes [85, 135]. Nonetheless, a fundamental challenge is how to generate such rules and constraints. Specifying them manually is prohibitively difficult and expensive [2, 136]. The domain experts, who know the semantics of the dataset at hand, may not have the skill set or the background necessary to formally express those rules. Even when skilled, domain experts would require a substantial amount of manual work to exhaustively materialize a complete list of such rules [137].

One of the most promising abilities of LLMs is parsing long texts. In companies and organizations, there exist documents that contain reference governing information, e.g., procedures, regulations, and specifications. Here we see an untapped opportunity in parsing these documents in relation to the entities and predicates in the KG in order to extract constraints. Yet the challenge is that the LLM needs to use the correct vocabulary of entities and relations and the correct rule syntax.
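To ground the "validating shapes" option, here is a minimal SHACL sketch using `rdflib` and `pySHACL`: a shape requiring every person to have exactly one birth year. The vocabulary and data are illustrative, but this is exactly the kind of constraint one might ask an LLM to draft from a policy document, provided it sticks to the KG's terms and valid SHACL syntax:

```python
# Validate a toy KG against a SHACL shape with pySHACL. Illustrative vocabulary.
from rdflib import Graph
from pyshacl import validate

data = Graph().parse(data="""
@prefix ex: <http://example.org/> .
ex:alice a ex:Person ; ex:birthYear 1990 .
ex:bob   a ex:Person .                      # missing birthYear -> violation
""", format="turtle")

shapes = Graph().parse(data="""
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/> .
ex:PersonShape a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [
        sh:path ex:birthYear ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] .
""", format="turtle")

conforms, _report_graph, report_text = validate(data, shacl_graph=shapes)
print(conforms)      # False: ex:bob violates the shape
print(report_text)   # human-readable validation report
```

An LLM-drafted shape would slot into the `shapes` graph above; the validator then gives a mechanical check that the generated constraint is syntactically valid and reports exactly which entities violate it.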
Finally, an even more fundamental challenge is that of transcending the usage of LLMs for NLP alone and using them directly on large sets of facts within a KG. We can think of this setting as a special kind of multi-modal LLM, where the KG is a specific modality.

Besides formally represented knowledge, real-world ontologies, such as the widely used medical ontology SNOMED CT and the food ontology FoodOn, also include a lot of meta information for usability, defined by different annotation properties, such as entity labels, synonyms, and natural language definitions. Taking the concept obo:FOODON_00002809 in FoodOn as an example, it has not only formal knowledge, such as named super-concepts and logical restrictions, but also labels and synonyms (e.g., "edamame"), definitions (e.g., "Edamame is a preparation of immature soybean ..."), comments, and so on. This meta information, especially the natural language text, further motivates people to use LLMs for ontology refinement.

For a refinement task, there are usually quite a few existing examples in the original ontology. Therefore, a straightforward solution, which has been adopted by most current methods, is fine-tuning a pre-trained language model such as BERT together with an attached classifier.

Exploiting LLMs is a promising direction for ontology refinement, but much effort is still needed before they become practical tools. DeepOnto [59], a Python-based package that supports quite a few ontology engineering tasks, already includes some tools for ontology refinement and alignment using LLMs, but more development is needed to make it more accessible and to support generative LLMs like LLaMA and GPT-4.

Ontology alignment (a.k.a. ontology matching), which is to identify cross-ontology mappings between entities that have an equivalence, subsumption, or membership relationship, thus becomes especially important for knowledge integration.

KGs can support LLMs in several ways. Firstly, KGs can be used as training data for LLMs. Secondly, triples in KGs can be used for prompt construction. Last but not least, KGs can be used as external knowledge in retrieval-augmented language models. KGs usually contain information extracted from highly trusted sources, post-processed, and vetted by human evaluation.
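A minimal sketch of the second and third options: retrieve the triples about the entity in a question, verbalize them, and prepend them as grounding context for the LLM. The toy KG is illustrative, and `call_llm` is again a hypothetical completion wrapper:

```python
# KG-augmented prompt construction: verbalize retrieved triples as context.
# Toy KG and `call_llm` stub are illustrative, not the paper's system.
TOY_KG = [
    ("Edamame", "is a", "preparation of immature soybean"),
    ("Edamame", "common in", "East Asian cuisine"),
]

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real completion API."""
    raise NotImplementedError

def retrieve(entity: str):
    """Fetch all triples whose subject matches the entity."""
    return [t for t in TOY_KG if t[0].lower() == entity.lower()]

def answer(question: str, entity: str) -> str:
    facts = ". ".join(f"{s} {p} {o}" for s, p, o in retrieve(entity))
    prompt = (
        f"Use only these facts to answer.\nFacts: {facts}.\n"
        f"Question: {question}\nAnswer:"
    )
    return call_llm(prompt)

# e.g. answer("What is edamame?", "Edamame")
```

Because the context comes from a vetted KG rather than from the model's parameters alone, answers can be checked against, and attributed to, explicit triples, which is the grounding effect the authors envision.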
Information from KGs has been integrated into the pre-training corpus, since natural language text alone can lead to limited information coverage [187, 130, 1, 184]. Using factual knowledge from KGs to pre-train LLMs has also infused structured knowledge [112]. This integration of KGs with LLMs, along with efficient prompts, has made it convenient to inject world knowledge and to incorporate new, evolving information into language models [41]. Additionally, knowledge expressed in high-resource-language KBs has been transferred into LMs tuned for low-resource languages [201, 100]. Furthermore, grounding knowledge from KGs to pre-train LMs has shown performance improvements on generation and QA tasks [30, 142, 120].

In another approach, [166] proposed an interpretable neuro-symbolic KB, where the memory consists of vector representations of entities and relations from an existing KB. These representations are supplied to an LM during pre-training and fine-tuning, enabling the model to excel at knowledge-intensive QA tasks.

The attention received by the integration of KGs and LLMs has grown recently. Approaches like KnowPrompt [31] use KGs to incorporate semantic and prior knowledge about relation labels into prompt-tuning for relation extraction, enhancing the prompt construction process and optimizing the prompts' representation with structured constraints.

Certain studies have utilized LLMs and prompts for the task of reasoning over KGs [34]. E.g., LARK uses the entities and relations in queries to find pertinent sub-graph contexts within abstract KGs and then performs chain reasoning over these contexts using LLM prompts of decomposed logical queries, outperforming previous state-of-the-art approaches by a significant margin.

The integration of KGs and LLMs in a unified approach holds significant potential, as their combination mutually enhances and complements each other in a valuable manner. For instance, KGs provide very accurate and explicit knowledge, which is crucial for some applications, e.g., healthcare, whereas LLMs have been criticized for their lack of factual knowledge, leading to hallucinations and inaccurate facts. Secondly, LLMs lack explainability; KGs, given their symbolic reasoning ability, are instead able to generate interpretable results. On the other hand, KGs are difficult to construct from unstructured text and suffer from incompleteness; LLMs could therefore be utilized to address these challenges through text processing. Various applications have adopted this methodology of combining LLMs with KGs, such as healthcare assistants, question answering systems [188] or chatbots, and sustainability, among others.

ConceptNet is the most well-known commonsense knowledge graph, developed using manual crowdsourcing along with automated refinement techniques [102]. The first study to investigate extracting knowledge from a language model, to the best of our knowledge, was indeed one that targeted commonsense knowledge [159]. The authors mined commonsense triples such as hasProperty(apples, green) from the Google Web 1T n-gram data as well as from Microsoft's web-scale smoothed language models [67]. This was later extended into a large-scale commonsense knowledge graph [161] that covered a range of different relations and became part of the WebChild KG [160].

Digital healthcare is one of the most critical application domains for the adoption of LLMs. The needs of the major stakeholders (i.e., physicians, healthcare providers, and policymakers) run counter to the paradigm behind the creation of LLMs; in particular, there are two major risks, related to the model's accuracy and to the privacy concerns stemming from its usage.

Datasets like Visual Genome [88] annotate images with scene graphs. A scene graph is a small KG that describes, with a structured, formal, graphical representation, the contents of an image in terms of objects (people, animals, items) as nodes connected via pairwise relationships (e.g., actions or positioning relationships) as edges. Therefore, multimodal LLMs can be trained to reason over and exploit this additional representation, offering an advanced ability to understand the contents of an image (or a video).

On the other hand, the digitalization of domain-specific documents, especially contracts, is enabling in-depth applications of machine intelligence to help humans perform time-consuming tasks more effectively. Among these, contract review costs humans substantial time, money, and attention (many law firms spend approximately 50% of their time reviewing contracts, costing hundreds of thousands of dollars) [63]. The Contract Understanding Atticus Dataset (CUAD) is a new dataset for legal contract review [63]. CUAD was created with legal experts and consists of over 13,000 annotations.

We give the following recommendations:

1. Don't throw out the KG with the paradigm shift: For a range of reliability- or safety-critical applications, structured knowledge remains indispensable, and we have outlined many ways in which KGs and LLMs can cross-fertilize. KGs are here to stay; do not ditch them merely because of fashion.
2. Murder your (pipeline) darlings: LLMs have substantially advanced many tasks in the KG and ontology construction pipeline, and have even made some tasks obsolete. Take critical care in examining even the most established pipeline components, and compare them continuously with the LLM-based state of the art.
3. Stay curious, stay critical: LLMs are arguably the most impressive artifact of AI research of the past years. Nonetheless, there is a multitude of exaggerated claims and expectations in the public as well as in the research literature, and one should retain a healthy dose of critical reflection. In particular, a fundamental fix to the so-called problem of hallucinations is not in sight.
4. The past is over, let's begin the new journey: The advances triggered by LLMs have uprooted the field in an unprecedented manner and make it possible to enter the field with significant shortcuts. There is no better time to start anew in fields related to Knowledge Computing than now.

%% end annotations %%

%% Import Date: 2025-08-04T15:35:10.373-06:00 %%