Trustworthiness evaluation of Large language models using multi-criteria decision making

Aksoy, Meltam; Adem, Aylin; Dağdeviren, Metin

dc.contributor.author	Aksoy, Meltam
dc.contributor.author	Adem, Aylin
dc.contributor.author	Dağdeviren, Metin
dc.date.accessioned	2025-10-06T07:58:22Z
dc.date.available	2025-10-06T07:58:22Z
dc.date.issued	2025-09-22
dc.description.abstract	As large language models (LLMs) become increasingly integrated into high-stakes applications, ensuring their trustworthiness has emerged as a critical research concern. This study proposes a novel evaluation framework that applies a multi-criteria decision making (MCDM) methodology, specifically the hesitant fuzzy analytic hierarchy process (AHP), to assess and rank LLMs on five key trust dimensions: fairness, robustness, integrity, explainability, and safety. Drawing on expert evaluations, the framework systematically determines the relative importance of each criterion and applies a weighted scoring approach to compare seven leading LLMs: the proprietary models GPT-3.5, GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5, and the open-source models Llama 3.1, Mistral Large 2, and DeepSeek V3. Results reveal GPT-4o as the most trustworthy model, significantly outperforming its peers, particularly in robustness and fairness. Open-source models showed lower scores, especially in safety and explainability, highlighting persistent gaps in their alignment with trust expectations. The findings demonstrate the effectiveness of MCDM in capturing expert uncertainty and prioritizing trust criteria, offering a robust and adaptable framework for evaluating LLMs in dynamic and sensitive domains.	en
dc.identifier.uri	http://hdl.handle.net/2003/44021
dc.language.iso	en
dc.relation.ispartofseries	IEEE access / Institute of Electrical and Electronics Engineers; 13
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Fuzzy AHP	en
dc.subject	Expert evaluation	en
dc.subject	Multi-criteria decision making	en
dc.subject	Large language models	en
dc.subject	Trustworthiness	en
dc.subject.ddc	004
dc.subject.rswk	Entscheidungsfindung	de
dc.subject.rswk	Großes Sprachmodell	de
dc.subject.rswk	Vertrauenswürdigkeit	de
dc.title	Trustworthiness evaluation of Large language models using multi-criteria decision making	en
dc.type	Text
dc.type.publicationtype	ResearchArticle
dcterms.accessRights	open access
eldorado.dnb.deposit	true
eldorado.doi.register	false
eldorado.secondarypublication	true
eldorado.secondarypublication.primarycitation	Aksoy, M., Adem, A., & Dağdeviren, M. (2025). Trustworthiness evaluation of Large language models using multi-criteria decision making. IEEE Access / Institute of Electrical and Electronics Engineers, 13, 168183–168201. https://doi.org/10.1109/access.2025.3612568
eldorado.secondarypublication.primaryidentifier	https://doi.org/10.1109/ACCESS.2025.3612568

Files

Original bundle

Name: Trustworthiness_Evaluation_of_Large_Language_Models_Using_Multi-Criteria_Decision_Making.pdf
Size: 3.28 MB
Format: Adobe Portable Document Format
Description: DNB

License bundle

Name: license.txt
Size: 4.82 KB
Format: Item-specific license agreed upon to submission

Collections

Chair of Data Science and Data Engineering