Trustworthiness evaluation of Large language models using multi-criteria decision making

dc.contributor.author: Aksoy, Meltam
dc.contributor.author: Adem, Aylin
dc.contributor.author: Dağdeviren, Metin
dc.date.accessioned: 2025-10-06T07:58:22Z
dc.date.available: 2025-10-06T07:58:22Z
dc.date.issued: 2025-09-22
dc.description.abstract: As Large language models (LLMs) become increasingly integrated into high-stakes applications, ensuring their trustworthiness has emerged as a critical research concern. This study proposes a novel evaluation framework that applies a multi-criteria decision making (MCDM) methodology, specifically the hesitant fuzzy analytic hierarchy process (AHP), to assess and rank LLMs based on five key trust dimensions: fairness, robustness, integrity, explainability, and safety. Drawing from expert evaluations, the framework systematically determines the relative importance of each criterion and applies a weighted scoring approach to compare seven leading LLMs, including both proprietary models such as GPT-3.5, GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 and open-source models such as Llama 3.1, Mistral Large 2 and DeepSeek V3. Results reveal GPT-4o as the most trustworthy model, significantly outperforming its peers, particularly in robustness and fairness. Open-source models showed lower scores, especially in safety and explainability, highlighting persistent gaps in their alignment with trust expectations. The findings demonstrate the effectiveness of MCDM in capturing expert uncertainty and prioritizing trust criteria, offering a robust and adaptable framework for evaluating LLMs in dynamic and sensitive domains. [en]
dc.identifier.uri: http://hdl.handle.net/2003/44021
dc.language.iso: en
dc.relation.ispartofseries: IEEE access / Institute of Electrical and Electronics Engineers; 13
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: Fuzzy AHP [en]
dc.subject: Expert evaluation [en]
dc.subject: Multi-criteria decision making [en]
dc.subject: Large language models [en]
dc.subject: Trustworthiness [en]
dc.subject.ddc: 004
dc.subject.rswk: Entscheidungsfindung [de]
dc.subject.rswk: Großes Sprachmodell [de]
dc.subject.rswk: Vertrauenswürdigkeit [de]
dc.title: Trustworthiness evaluation of Large language models using multi-criteria decision making [en]
dc.type: Text
dc.type.publicationtype: ResearchArticle
dcterms.accessRights: open access
eldorado.dnb.deposit: true
eldorado.doi.register: false
eldorado.secondarypublication: true
eldorado.secondarypublication.primarycitation: Aksoy, M., Adem, A., & Dağdeviren, M. (2025). Trustworthiness evaluation of Large language models using multi-criteria decision making. IEEE Access / Institute of Electrical and Electronics Engineers, 13, 168183–168201. https://doi.org/10.1109/access.2025.3612568
eldorado.secondarypublication.primaryidentifier: https://doi.org/10.1109/ACCESS.2025.3612568
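
Illustrative note: the abstract describes deriving criterion weights from expert judgments and then ranking models by a weighted score. The sketch below shows that general idea only. It uses classical (crisp) AHP with the row geometric-mean approximation as a stand-in, not the hesitant fuzzy AHP of the paper. The pairwise matrix, the model names (model_a, model_b, model_c), and all scores are hypothetical placeholders, not data from the study.

```python
import math

# The five trust dimensions named in the abstract.
CRITERIA = ["fairness", "robustness", "integrity", "explainability", "safety"]

# HYPOTHETICAL reciprocal pairwise comparison matrix (entry [i][j] says how
# much more important criterion i is than criterion j). Not from the paper.
HYPOTHETICAL_PAIRWISE = [
    [1,     1 / 2, 2,     3, 1 / 2],
    [2,     1,     3,     4, 1],
    [1 / 2, 1 / 3, 1,     2, 1 / 3],
    [1 / 3, 1 / 4, 1 / 2, 1, 1 / 4],
    [2,     1,     3,     4, 1],
]

# HYPOTHETICAL per-criterion scores in [0, 1], ordered as in CRITERIA.
HYPOTHETICAL_MODELS = {
    "model_a": [0.9, 0.9, 0.8, 0.8, 0.9],
    "model_b": [0.7, 0.6, 0.7, 0.9, 0.6],
    "model_c": [0.5, 0.5, 0.6, 0.4, 0.5],
}


def ahp_weights(pairwise):
    """Approximate crisp AHP priority weights via row geometric means."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def weighted_score(scores, weights):
    """Weighted sum of one model's criterion scores."""
    return sum(w * s for w, s in zip(weights, scores))


def rank_models(models, weights):
    """Return model names sorted from highest to lowest weighted score."""
    return sorted(
        models,
        key=lambda name: weighted_score(models[name], weights),
        reverse=True,
    )


if __name__ == "__main__":
    weights = ahp_weights(HYPOTHETICAL_PAIRWISE)
    for criterion, weight in zip(CRITERIA, weights):
        print(f"{criterion}: {weight:.3f}")
    print(rank_models(HYPOTHETICAL_MODELS, weights))
```

A hesitant fuzzy variant would replace each crisp matrix entry with a set or interval of expert judgments and aggregate them before computing weights; that step is omitted here.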

Files

Original bundle

Name: Trustworthiness_Evaluation_of_Large_Language_Models_Using_Multi-Criteria_Decision_Making.pdf
Size: 3.28 MB
Format: Adobe Portable Document Format
Description: DNB

License bundle

Name: license.txt
Size: 4.82 KB
Format: Item-specific license agreed upon to submission