## SustainGraph: A knowledge graph for tracking the progress and the interlinking among the sustainable development goals’ targets
> [!Abstract]-
> The development of solutions to manage or mitigate climate change impacts is very challenging, given the complexity and dynamicity of the socio-environmental and socio-ecological systems that have to be modeled and analyzed, and the need to include qualitative variables that are not easily quantifiable. The existence of qualitative, interoperable and well-interlinked data is considered a requirement rather than a desire in order to support this objective, since scientists from different disciplines will have no option but to collaborate and co-design solutions, overcoming barriers related to the semantic misalignment of the plethora of available data, the existence of multiple data silos that cannot be easily and jointly processed, and the lack of data quality in many of the produced datasets. In the current work, we present the SustainGraph, as a Knowledge Graph that is developed to track information related to the progress towards the achievement of targets defined in the United Nations Sustainable Development Goals (SDGs) at national and regional levels. The SustainGraph aims to act as a unified source of knowledge around information related to the SDGs, by taking advantage of the power provided by the development of graph databases and the exploitation of Machine Learning (ML) techniques for data population, knowledge production and analysis. The main concepts represented in the SustainGraph are detailed, while indicative usage scenarios are provided. A set of opportunities to take advantage of the SustainGraph and open research areas are identified and presented.
> [!Cite]-
> Fotopoulou, Eleni, Ioanna Mandilara, Anastasios Zafeiropoulos, et al. “SustainGraph: A Knowledge Graph for Tracking the Progress and the Interlinking among the Sustainable Development Goals’ Targets.” _Frontiers in Environmental Science_ 10 (October 2022). [https://doi.org/10.3389/fenvs.2022.1003599](https://doi.org/10.3389/fenvs.2022.1003599).
>
> [link](https://www.frontiersin.org/journals/environmental-science/articles/10.3389/fenvs.2022.1003599/full) [online](http://zotero.org/users/17587716/items/TPLNTXJ3) [local](zotero://select/library/items/TPLNTXJ3) [pdf](file://C:\Users\erikt\Zotero\storage\HLEA82V3\Fotopoulou%20et%20al.%20-%202022%20-%20SustainGraph%20A%20knowledge%20graph%20for%20tracking%20the%20progress%20and%20the%20interlinking%20among%20the%20sustainable.pdf)
## Notes
%% begin notes %%
Presents SustainGraph, a property-graph-based KG for interlinking datasets related to the SDGs. Focuses on data interlinking rather than semantic interoperability. Presents interesting use cases that highlight data analysis with linked data (e.g., translates EU-focused and general SDG KPIs to combined datasets). Presented at the KnowledgeGraph conference in 2023. Code available at https://gitlab.com/netmode/sustaingraph (as a series of `.ipynb` notebooks). References [[joshi_2021]] but does not adopt RDF for the graph; instead publishes an ontology separately, available [here](https://netmode.gitlab.io/sustaingraph-ontology/).
%% end notes %%
%% begin annotations %%
### Imported: 2025-10-20 11:26 am
Under this perspective, we present the SustainGraph as a Knowledge Graph (KG) that has been conceptualized and developed to track the progress towards the SDG targets,
A systemic nexus approach has been considered for supporting the data population processes of the KG, while taking advantage of participatory system mapping processes (Matti et al., 2020; Midgley and Lindhult, 2021). By the term systemic nexus, we refer to the interconnection of resource management concepts, considering resources such as energy, water, food, land and climate. In the context of the SDGs, a nexus approach can facilitate the advancement of multiple SDGs simultaneously, while reducing the risk that contributions to one SDG undermine progress on another (van Zanten and van Tulder, 2021).
Specifically, the effective fusion of the collected data and their transformation to systematized nexus-coherent knowledge, can lead to novel insights (Laspidou et al., 2020), significant improvement of the participatory processes (Matti et al., 2020) and the development of collective environmental intelligence (Zafeiropoulos et al., 2021) among the engaged stakeholders and communities.
we detail the implementation of the SustainGraph and the set of data population mechanisms from a plethora of open data sources and data providers. Data population to the KG and data analysis over the KG are assisted through the exploitation of Machine Learning (ML) techniques. In this way, participatory modeling and analysis processes can be designed and implemented, taking advantage of the semantic alignment of the represented terms and the knowledge produced through the analysis of the information that is made available in the SustainGraph.
Systems innovation refers to the development of novel participatory technological solutions and breakthroughs that can lead to major transformation in national and regional economies (De Vicente Lopez and Matti, 2016).
Knowledge management is a fundamental part of the systems innovation approach, since a collective understanding of the system is crucial to develop transformative solutions.
By getting access to semantically aligned and interlinked data, a participatory modeling process can be facilitated. Interdisciplinary scientists can collaborate more easily and co-create their models, given the alignment of terms coming from different scientific domains. Such modeling processes can be based on the adoption of modeling tools, such as System Dynamics Modeling, to better understand complex systems and lead to the creation of new knowledge by revealing feedback loops as well as interlinkages and cascading effects that propagate through the system (Laspidou et al., 2020).
KGs are considered suitable for bridging data silos, by interlinking the concepts represented in the graphs with well-defined semantics (see Figure 3). In this way, the interconnected datasets in the KG can be enriched with meaning, misalignment of terminologies of the same concepts under different data schemas can be tackled, while relationships among concepts can be made explicit. Thus, the main motivation for the development of a KG is the usage of graphs to represent data (that can be interconnected and enriched with meaning) to explicitly represent knowledge (Noy et al., 2019; Hogan et al., 2021).
Data volatility is managed, since relationships among nodes in a KG can be dynamic, making them suitable for representation of complex and dynamic systems (e.g., socio-environmental systems (Zafeiropoulos et al., 2021)).
Keeping a high standard of data quality in a KG is challenging and is related mostly with the data quality of the input data. Quality management processes have to be applied to identify data quality issues (e.g., data inconsistency, data redundancy, missing values) and proceed to improvements (e.g., outliers removal) (Xue and Zou, 2022).
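A minimal sketch of the kind of checks this highlight lists (missing values, redundancy, outliers), using only the Python standard library. The observations, the `quality_report` helper, and the modified z-score rule (constant 0.6745, threshold 3.5, a common convention) are my own assumptions for illustration; the paper does not say which checks or thresholds SustainGraph applies.

```python
from statistics import median

# Hypothetical indicator observations: (country, year, value).
# None marks a missing value; the numbers are made up for illustration.
observations = [
    ("GR", 2018, 10.2),
    ("GR", 2019, 10.5),
    ("GR", 2019, 10.5),   # exact duplicate (redundancy)
    ("GR", 2020, None),   # missing value
    ("GR", 2021, 10.9),
    ("GR", 2022, 95.0),   # suspicious spike
    ("GR", 2023, 11.1),
]

def quality_report(rows, threshold=3.5):
    """Flag missing values, exact duplicates, and MAD-based outliers."""
    missing = [r for r in rows if r[2] is None]
    seen, duplicates, unique = set(), [], []
    for r in rows:
        if r[2] is None:
            continue
        if r in seen:
            duplicates.append(r)
        else:
            seen.add(r)
            unique.append(r)

    outliers = []
    values = [r[2] for r in unique]
    if values:
        med = median(values)
        # Median absolute deviation: robust to the very outliers we look for.
        mad = median([abs(v - med) for v in values])
        if mad > 0:
            outliers = [
                r for r in unique
                if 0.6745 * abs(r[2] - med) / mad > threshold
            ]
    return {"missing": missing, "duplicates": duplicates, "outliers": outliers}

report = quality_report(observations)
```

Flagged rows would still need review before removal, since a spike can be a real event rather than a data error.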
Moving one step further, KGs facilitate reasoning over the available data and support analysis and complex decision-making (see Figure 3). Reasoning over KGs is required to obtain new knowledge, extract insights and conclusions from existing data (Chen X. et al., 2020). Through reasoning, KG completion and evolution can be supported via the identification and prediction of new relationships among entities (Chen Z. et al., 2020; Issa et al., 2021).
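To make "prediction of new relationships" concrete for myself: a minimal sketch of neighbourhood-based link prediction using Jaccard similarity. This is a generic technique that I am swapping in, not the method from the cited works or from SustainGraph; the node identifiers and edges are hypothetical.

```python
from itertools import combinations

# Hypothetical undirected links between entities (identifiers are illustrative only).
edges = [
    ("T1", "T2"), ("T1", "T3"), ("T2", "T3"),
    ("T2", "T4"), ("T3", "T4"), ("T4", "T5"),
]

def neighbours(edge_list):
    """Build an adjacency map: node -> set of directly linked nodes."""
    adj = {}
    for a, b in edge_list:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def jaccard_candidates(edge_list):
    """Score every unlinked pair by the overlap of their neighbourhoods."""
    adj = neighbours(edge_list)
    scores = []
    for a, b in combinations(sorted(adj), 2):
        if b in adj[a]:
            continue  # already linked, nothing to predict
        union = adj[a] | adj[b]
        score = len(adj[a] & adj[b]) / len(union) if union else 0.0
        scores.append((a, b, score))
    # Highest score first: the most plausible missing relationships.
    return sorted(scores, key=lambda s: s[2], reverse=True)

candidates = jaccard_candidates(edges)
```

Here the top candidate is the pair `T1`/`T4`, which share two of their three combined neighbours. Scores like these only suggest candidate links; they would need domain validation before being added to a KG.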
The SustainGraph is specified and developed in the form of a labeled property graph (LPG) model.
However, the LPG model does not support a formal language representation that can be used for automated knowledge reasoning.
To properly detail the semantic information associated with each node and relationship, a SustainGraph ontology has been made available (Mandilara et al., 2022). The ontological description of the main concepts introduced in the SustainGraph can be considered as accompanying information of the structure introduced in the LPG model.
The main set of entities in the SustainGraph has to do with the description of the structure of the UN Sustainable Development Goals (SDGs), building upon an existing formal knowledge organization system for this purpose (Joshi et al., 2021).
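A minimal in-memory sketch of how a labeled property graph could hold the SDG hierarchy (goal, target, indicator). The `PropertyGraph` class, the labels `Goal`/`Target`/`Indicator`, the relationship types, and the property names are my own assumptions for illustration; the actual SustainGraph schema and ontology may name things differently.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    labels: frozenset                      # an LPG node can carry several labels
    properties: dict = field(default_factory=dict)

@dataclass
class Relationship:
    start: str
    rel_type: str                          # relationships are typed and directed
    end: str
    properties: dict = field(default_factory=dict)

class PropertyGraph:
    """Toy labeled property graph; hypothetical, not the SustainGraph code."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, labels, **properties):
        self.nodes[node_id] = Node(node_id, frozenset(labels), properties)

    def relate(self, start, rel_type, end, **properties):
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must exist before relating them")
        self.relationships.append(Relationship(start, rel_type, end, properties))

    def out(self, start, rel_type):
        """Nodes reached from `start` over outgoing `rel_type` relationships."""
        return [
            self.nodes[r.end]
            for r in self.relationships
            if r.start == start and r.rel_type == rel_type
        ]

graph = PropertyGraph()
graph.add_node("goal-13", {"Goal"}, code="13", title="Climate action")
graph.add_node("target-13.1", {"Target"}, code="13.1")
graph.add_node("indicator-13.1.1", {"Indicator"}, code="13.1.1")
graph.relate("goal-13", "HAS_TARGET", "target-13.1")
graph.relate("target-13.1", "HAS_INDICATOR", "indicator-13.1.1")

target_codes = [n.properties["code"] for n in graph.out("goal-13", "HAS_TARGET")]
```

The point of the LPG form is that both nodes and relationships carry key-value properties, which is what separates it from a plain triple store view.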
The SustainGraph is developed based on the Neo4j graph data platform. It is conceptualized in the form of a labeled property graph (LPG) model (Fotopoulou et al., 2022), as well as in the form of an ontology (Mandilara et al., 2022). The data population mechanisms are implemented through Python scripts by using the Py2neo client library and toolkit that is supported by Neo4j. For the data analysis pipelines, the Neo4j Graph Data Science data analytics and machine learning platform is used. Visualizations are produced based on the usage of the NeoDash dashboard builder for the Neo4j graph database, the Neo4j Bloom visualization tool and SemSpect as a scalable graphical exploration interface for knowledge graphs. The SustainGraph is released as an open-source KG that can be adopted and used by the scientific community. It is made openly available in a GitLab repository (Fotopoulou et al., 2022) under an Eclipse Public License 2.0.
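A rough sketch of what an idempotent population step could look like: build parameterized Cypher `MERGE` statements in Python so that re-running an import does not duplicate nodes. The labels, relationship type, property names, and the `target_statements` helper are hypothetical and not taken from the repository's notebooks. Nothing is executed against a database here; the pairs would be handed to a Neo4j client (the paper names Py2neo).

```python
# Hypothetical schema: labels, relationship type and property names are assumptions.
MERGE_TARGET = """
MERGE (g:Goal {code: $goal_code})
MERGE (t:Target {code: $target_code})
MERGE (g)-[:HAS_TARGET]->(t)
SET t.description = $description
"""

def target_statements(rows):
    """Yield (query, parameters) pairs, one per SDG target row."""
    for row in rows:
        # SDG target codes have the form "<goal>.<target>", e.g. "13.1".
        goal_code = row["target_code"].split(".")[0]
        yield MERGE_TARGET, {
            "goal_code": goal_code,
            "target_code": row["target_code"],
            "description": row["description"],
        }

rows = [
    {"target_code": "13.1",
     "description": "Strengthen resilience and adaptive capacity to "
                    "climate-related hazards and natural disasters in all countries"},
    {"target_code": "13.2",
     "description": "Integrate climate change measures into national "
                    "policies, strategies and planning"},
]

statements = list(target_statements(rows))

# With a live database, each pair could be run through a client, for example
# (assumption, not executed here): graph.run(query, parameters)
```

Using `MERGE` with query parameters rather than string formatting keeps the import repeatable and avoids quoting problems in descriptions.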
Even by having performed a basic conceptualization step, a great deal of work is still ahead to make the SustainGraph easily adoptable and exploitable by end users coming from various disciplines and perspectives (e.g., socio-environmental scientists, policy makers, data scientists, citizen observatories).
%% end annotations %%
%% Import Date: 2025-10-20T11:26:44.304-06:00 %%