## Integrating human-centered AI for land use policy: Insights from agricultural interventions in international development
> [!Abstract]-
> Policymakers often struggle with information overload from vast technical documentation, hindering effective evidence-based decision-making. This study explores how a human-centered artificial intelligence model was fine-tuned to analyze agricultural interventions within international development projects, providing a methodological foundation to support the synthesis of complex evidence for more informed land use policy formulation. Engaging domain experts and incorporating human expertise, we developed a taxonomy of land use practices—such as water resource management, land use planning, and agronomic practices—that reflects the nuanced realities of local interventions. By integrating this human-centered taxonomy into the model's training, we ensured that the artificial intelligence system efficiently identified and categorized interventions in a way that upholds humanistic practices and aligns with the needs of policymakers and communities. Our findings demonstrate that this approach enhances the analysis of land use interventions. The model proved to be both scalable and cost-effective, analyzing large volumes of data more rapidly than traditional human analysis. These results underscore the potential of human-centered artificial intelligence in transforming land use policymaking by empowering stakeholders with faster and more accurate data synthesis. This methodological approach has the potential to support policymakers in synthesizing evidence more efficiently, which could ultimately lead to more informed and effective land use policies and improved outcomes in international development.
> [!Cite]-
> Moore, Lindsey, Mindel van de Laar, Pui-Hang Wong, and Cathal O’Donoghue. “Integrating Human-Centered AI for Land Use Policy: Insights from Agricultural Interventions in International Development.” _Land Use Policy_ 158 (November 2025): 107716. [https://doi.org/10.1016/j.landusepol.2025.107716](https://doi.org/10.1016/j.landusepol.2025.107716).
>
> [link](https://www.sciencedirect.com/science/article/pii/S0264837725002509) [online](http://zotero.org/users/17587716/items/RMYM2IZF) [local](zotero://select/library/items/RMYM2IZF) [pdf](file:///home/eriktuck/Zotero/storage/JWIUAMC4/Moore%20et%20al.%20-%202025%20-%20Integrating%20human-centered%20AI%20for%20land%20use%20policy%20Insights%20from%20agricultural%20interventions%20in%20inter.pdf)
## Notes
%% begin notes %%
DELLM is a LoRA adaptation of RoBERTa, fine-tuned on expert-labeled examples of agricultural interventions from USAID evaluation reports (the final coded dataset has 46,599 excerpts). Taxonomy development required 3 years! The model significantly outperformed GPT-3.5 Turbo on extracting taxonomy concepts from evaluation excerpts, although it wasn't clear how GPT was prompted to perform this task, and the result is not necessarily surprising if GPT was given no context on the taxonomy itself.
%% end notes %%
%% begin annotations %%
### Imported: 2025-12-18 1:15 pm
International development agencies rely heavily on rigorous evaluation of agricultural and land management interventions to guide policy and investment choices. However, the sheer volume of project evaluations and technical documentation—amounting to millions of pages—has made systematic evidence synthesis increasingly challenging.
We hypothesize that fine-tuning these artificial intelligence models with domain-specific knowledge significantly improves their performance in interpreting technical documentation.
Where the sector has employed LLMs, their use has primarily been focused on classifying the Sustainable Development Goals (SDGs), categorizing textual descriptions of aid activities, and determining SDG relevance in documents (Hajikhani and Suominen, 2022; LaFleur, 2023; Ricciardi et al., 2020; Toetzke et al., 2022). While valuable, such generalizations do not capture the complexity and specificity required to meaningfully analyze and interpret detailed technical documents.
The primary objective of this study is to introduce and evaluate a novel methodological approach to fine-tune the open-source LLM, Robustly Optimized BERT Pretraining Approach (RoBERTa), specifically adapted to the context of agricultural land use policy in international development.
By incorporating a comprehensive dataset of manually labeled intervention excerpts—coded by domain experts—we created the Development Evidence Large Learning Model (DELLM), designed to accurately identify and categorize agricultural interventions that influence land use policy.
The study assesses whether this fine-tuned model significantly improves the efficiency and accuracy of analyzing development project evaluations compared to both general-purpose models and traditional human-led analysis.
DELLM leverages Low-Rank Adaptation (LoRA) techniques
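For reference, a sketch of the LoRA update in its standard formulation. The symbols and the example sizes below are assumptions for illustration; the excerpts here do not state which rank, scaling, or target layers DELLM used.

$$
h = W_0 x + \frac{\alpha}{r} B A x, \qquad W_0 \in \mathbb{R}^{d \times k},\quad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
$$

The pretrained weight $W_0$ stays frozen and only $A$ and $B$ are trained, so each adapted matrix contributes $r(d + k)$ trainable parameters instead of $dk$. As an illustration, with $d = k = 768$ (the hidden size of RoBERTa-base) and an assumed rank $r = 8$, that is $12{,}288$ trainable parameters against $589{,}824$ frozen ones, roughly 2%.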
Secondly, recognizing the challenge posed by the vast quantity and diversity of existing project documentation, this study establishes a rigorous, expert-validated taxonomy of agricultural and land-use-related interventions. The resulting structured taxonomy, built upon a manually coded dataset of 46,599 excerpts from the United States Agency for International Development (USAID) agricultural project evaluations, enables systematic analysis at a previously unattainable scale.
Initially developed by USAID in 1970, the Logical Framework has become a widely adopted tool for systematically mapping desired outcomes to specific, actionable interventions (Sartorius, 1991).
Devlin et al. (2018) observed that training on a large dataset, followed by fine-tuning for specific tasks, resulted in consistent improvements on the General Language Understanding Evaluation (GLUE) benchmark for English as the model’s hyperparameters were scaled up (Wang et al., 2018). For instance, BioRoBERTa, which has been fine-tuned on biomedical corpora, has achieved notable improvements in tasks such as named entity recognition and relation extraction compared to general-purpose models (Lee et al., 2020). Similarly, LegalRoBERTa, tailored specifically to legal documents, has demonstrated superior performance in legal text classification and case law analysis (Chalkidis et al., 2020).
We drew from the DEC database, focusing on "final evaluations" and "final contractor/grantee reports." Our dataset included 1406 agriculture-related projects, but we also incorporated documents from non-agricultural sectors to improve the model’s ability to distinguish between relevant and irrelevant excerpts. This broader dataset, spanning all countries globally from 1956 until 2020, ensured that the model can accurately classify agricultural interventions in diverse contexts. Each evaluation report is substantial, with an average length of 100 pages.
The aim of the thematic coding process was to develop a custom taxonomy specifically tailored to USAID’s terminology. The initial set of (deductively derived) thematic codes (taxonomy) came from a USAID study that cataloged agricultural interventions implemented between 2010 and 2015. This study analyzed 195 final agricultural project evaluations across 64 countries (USAID, 2016a).
As a final step in the taxonomy creation, the proposed additions were reviewed by technical experts from USAID’s Center for Agriculture to ensure their relevance and accuracy. Their feedback was integrated into the final version of the agricultural framework, which now comprises 129 distinct intervention categories, each with a precise definition.
Each document was independently thematically coded using the derived taxonomy whereby segments of text referencing an intervention were extracted from the document and labeled by an expert based on the intervention categories within the taxonomy.
The coding team was on average allocating two minutes for coding of a single page. This coding was supplemented by an additional two minutes of the reviewer. Lastly, there was time spent in review meetings. The total workload for coding and creating a taxonomy surpassed 1200 workdays. With a team of five experts working part-time, this task was completed in three years. The result of this effort was a final dataset, comprising 46,599 labeled excerpts, that provides the foundational dataset for the fine-tuning of DELLM.
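A rough consistency check on the workload figures, as a minimal sketch. It combines the 1406 projects and 100-page average quoted earlier in these annotations with the per-page times above. The one-report-per-project and 8-hour-workday inputs are assumptions, not figures from the paper.

```python
# Back-of-envelope check of the manual coding workload.
# Assumptions (not from the paper): one 100-page report per project, 8-hour workdays.
PROJECTS = 1406            # agriculture-related projects in the dataset
PAGES_PER_REPORT = 100     # average report length
MINUTES_PER_PAGE = 2 + 2   # coder time plus reviewer time per page
HOURS_PER_WORKDAY = 8      # assumed


def coding_workdays(projects, pages_per_report, minutes_per_page, hours_per_workday):
    """Convert per-page coding time into total workdays."""
    total_minutes = projects * pages_per_report * minutes_per_page
    return total_minutes / 60 / hours_per_workday


workdays = coding_workdays(PROJECTS, PAGES_PER_REPORT, MINUTES_PER_PAGE, HOURS_PER_WORKDAY)

# Reported total of 1200 workdays spread over five experts and three years.
per_expert_per_year = 1200 / 5 / 3

print(f"{workdays:.0f} workdays of coding and review")
print(f"{per_expert_per_year:.0f} workdays per expert per year")
```

Under these assumptions the page-level times alone come to about 1172 workdays, which is in line with the reported total of more than 1200 once review meetings are added, and 80 workdays per expert per year fits the part-time arrangement.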
To address the challenge of processing vast amounts of international development documents, DELLM was developed as a domain-specific LLM tailored for identifying and categorizing agricultural interventions within USAID project documents. DELLM aims to automate the extraction of intervention-related information from these large datasets, with the potential to enhance both the efficiency and accuracy of document analysis. By leveraging advanced NLP techniques, DELLM is designed to assist decision-makers by reducing the reliance on labor-intensive manual review processes.
RoBERTa was chosen as the foundational model for DELLM due to its exceptional ability to handle complex NLP tasks, particularly those requiring a deep understanding of contextual language.
GPT’s unidirectional nature, focusing on generating text from left to right, makes it less suitable for tasks that require a deep understanding of both the preceding and succeeding context, which is essential for precise classification.
Overall, DELLM significantly outperformed manual coding methods in terms of accuracy, speed, and cost-efficiency. The model can process 100 documents within approximately 10 hours at a minimal cost of $3. In comparison, a human researcher, assuming an average rate of two minutes per page, would require roughly 20,000 hours to complete the same task.
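A quick sketch of the manual-versus-model comparison, using the two-minutes-per-page rate from this excerpt. The 100-page document length is an assumption carried over from the average report length quoted earlier in these annotations.

```python
# Manual reading time for 100 documents versus the reported model runtime.
# Assumption: each document is 100 pages (the average length quoted earlier).
DOCUMENTS = 100
PAGES_PER_DOCUMENT = 100   # assumed
MINUTES_PER_PAGE = 2       # human coding rate from the excerpt
MODEL_HOURS = 10           # reported model runtime for 100 documents

manual_minutes = DOCUMENTS * PAGES_PER_DOCUMENT * MINUTES_PER_PAGE
manual_hours = manual_minutes / 60
speedup = manual_hours / MODEL_HOURS

print(f"manual: {manual_minutes} minutes (about {manual_hours:.0f} hours)")
print(f"speedup over manual coding: about {speedup:.0f}x")
```

With these inputs the manual estimate is 20,000 minutes, or about 333 hours, for a speedup of roughly 33x. The quoted 20,000 hours matches that figure in minutes rather than hours, so it may rest on inputs not captured in this excerpt.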
The results of this study highlight significant benefits of integrating human-centered AI, specifically through fine-tuning large language models (LLMs) with domain-specific expertise. DELLM consistently outperformed general-purpose models, such as GPT-3.5 Turbo, across all critical evaluation metrics, including precision, recall, and F1 scores.
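To make the metrics concrete, here is a minimal sketch of per-category precision, recall, and F1 on toy data. The labels are borrowed from the taxonomy examples in the abstract, the gold and predicted values are invented for illustration, and the single-label setup is an assumption; the paper's evaluation may differ.

```python
from collections import Counter


def per_label_scores(gold, predicted):
    """Return {label: (precision, recall, f1)} for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1   # predicted label was wrong
            fn[g] += 1   # gold label was missed
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        predicted_count = tp[label] + fp[label]
        gold_count = tp[label] + fn[label]
        precision = tp[label] / predicted_count if predicted_count else 0.0
        recall = tp[label] / gold_count if gold_count else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(scores):
    """Unweighted mean of per-label F1, so rare labels count as much as common ones."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical toy data: one frequent category and two rarer ones.
gold = (["agronomic practices"] * 6
        + ["water resource management"] * 3
        + ["land use planning"])
predicted = (["agronomic practices"] * 6
             + ["water resource management"] * 2 + ["agronomic practices"]
             + ["agronomic practices"])

scores = per_label_scores(gold, predicted)
accuracy = sum(g == p for g, p in zip(gold, predicted)) / len(gold)
for label, (precision, recall, f1) in scores.items():
    print(f"{label}: P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
print(f"accuracy={accuracy:.2f} macro-F1={macro_f1(scores):.2f}")
```

In this toy case accuracy is 0.80 while macro-F1 is about 0.55, because the single "land use planning" excerpt is missed entirely. Reporting scores per category is one way to see whether a headline number hides weak performance on rare interventions.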
For instance, USAID’s Innovation Technology and Research unit reported accelerating their research capability, allowing analysts to pinpoint relevant examples approximately 300 times faster compared to manual analysis. Similarly, the USAID Tajikistan Mission reclaimed roughly 1250 annual staff hours, substantially alleviating workloads under tight deadlines.
Underidentification or misclassification of less frequently occurring interventions, such as land use planning or ecosystem management, could inadvertently deprioritize these critical areas in project design.
%% end annotations %%
%% Import Date: 2025-12-18T13:16:26.527-07:00 %%