Trustworthiness evaluation of Large language models using multi-criteria decision making

Aksoy, Meltam; Adem, Aylin; Dağdeviren, Metin


Files

Trustworthiness_Evaluation_of_Large_Language_Models_Using_Multi-Criteria_Decision_Making.pdf (3.28 MB)

Date

2025-09-22


Abstract

As large language models (LLMs) become increasingly integrated into high-stakes applications, ensuring their trustworthiness has emerged as a critical research concern. This study proposes a novel evaluation framework that applies a multi-criteria decision making (MCDM) methodology, specifically the hesitant fuzzy analytic hierarchy process (AHP), to assess and rank LLMs on five key trust dimensions: fairness, robustness, integrity, explainability, and safety. Drawing on expert evaluations, the framework systematically determines the relative importance of each criterion and applies a weighted scoring approach to compare seven leading LLMs: four proprietary models (GPT-3.5, GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5) and three open-source models (Llama 3.1, Mistral Large 2, and DeepSeek V3). The results identify GPT-4o as the most trustworthy model, significantly outperforming its peers, particularly in robustness and fairness. The open-source models received lower scores, especially in safety and explainability, highlighting persistent gaps in their alignment with trust expectations. The findings demonstrate the effectiveness of MCDM in capturing expert uncertainty and prioritizing trust criteria, offering a robust and adaptable framework for evaluating LLMs in dynamic and sensitive domains.
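The abstract describes the pipeline only at a high level: expert pairwise judgments yield criterion weights, and a weighted score ranks the models. The following is a minimal sketch of that general shape under loudly stated assumptions; it is not the authors' hesitant fuzzy AHP. Each hesitant set of Saaty-scale judgments is collapsed to a crisp value by its geometric mean (an assumed simplification that preserves reciprocity), weights come from classical row-geometric-mean AHP, and consistency is checked with Saaty's random index. All judgments, the placeholder names `model_a`/`model_b`/`model_c`, their scores, and the helper names (`collapse`, `build_matrix`, `ahp_weights`, `consistency_ratio`, `rank`) are hypothetical.

```python
import math

# Hypothetical sketch: the five criteria are taken from the abstract, but every
# judgment and score below is invented for illustration and is NOT the paper's data.
CRITERIA = ["fairness", "robustness", "integrity", "explainability", "safety"]

# Saaty's random consistency index for matrix sizes n = 1..10.
RANDOM_INDEX = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45, 1.49]

# Upper-triangle pairwise judgments on Saaty's 1-9 scale. Each entry is a
# "hesitant" set: the values an expert hesitates between for "row vs column".
JUDGMENTS = {
    (0, 1): [1],
    (0, 2): [2, 3],
    (0, 3): [2],
    (0, 4): [1 / 3, 1 / 2],
    (1, 2): [2],
    (1, 3): [2, 3],
    (1, 4): [1 / 2],
    (2, 3): [1],
    (2, 4): [1 / 4, 1 / 5],
    (3, 4): [1 / 4],
}

# Hypothetical per-criterion scores in [0, 1] for three placeholder models.
SCORES = {
    "model_a": [0.8, 0.9, 0.7, 0.6, 0.8],
    "model_b": [0.7, 0.6, 0.6, 0.5, 0.7],
    "model_c": [0.5, 0.5, 0.4, 0.3, 0.4],
}


def collapse(hesitant_set):
    """Assumed simplification: reduce a hesitant set to its geometric mean."""
    return math.prod(hesitant_set) ** (1.0 / len(hesitant_set))


def build_matrix(upper, n):
    """Reciprocal pairwise comparison matrix: m[j][i] = 1 / m[i][j]."""
    m = [[1.0] * n for _ in range(n)]
    for (i, j), values in upper.items():
        m[i][j] = collapse(values)
        m[j][i] = 1.0 / m[i][j]
    return m


def ahp_weights(m):
    """Priority vector from normalized row geometric means (classical AHP)."""
    n = len(m)
    row_gm = [math.prod(row) ** (1.0 / n) for row in m]
    total = sum(row_gm)
    return [g / total for g in row_gm]


def consistency_ratio(m, w):
    """CR = CI / RI, CI = (lambda_max - n) / (n - 1); lambda_max estimated from m.w (needs n >= 3)."""
    n = len(m)
    lambda_max = sum(sum(m[i][j] * w[j] for j in range(n)) / w[i] for i in range(n)) / n
    return ((lambda_max - n) / (n - 1)) / RANDOM_INDEX[n - 1]


def rank(scores, weights):
    """Weighted-sum aggregate per model, sorted best first."""
    totals = {name: sum(w * s for w, s in zip(weights, row)) for name, row in scores.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    matrix = build_matrix(JUDGMENTS, len(CRITERIA))
    weights = ahp_weights(matrix)
    print("weights:", {c: round(w, 3) for c, w in zip(CRITERIA, weights)})
    # A CR below 0.1 is the conventional AHP acceptability threshold.
    print("CR:", round(consistency_ratio(matrix, weights), 3))
    for name, total in rank(SCORES, weights):
        print(name, round(total, 3))
```

With these invented inputs, safety receives the largest weight because it is preferred in every pairwise judgment in which it appears, and `model_a` ranks first regardless of the weights because it scores highest on every criterion. A genuine hesitant fuzzy AHP would carry the hesitant sets further through the computation instead of collapsing them at the first step.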

Keywords

Fuzzy AHP, Expert evaluation, Multi-criteria decision making, Large language models, Trustworthiness

Subjects based on RSWK

Decision making, Large language model, Trustworthiness

URI

http://hdl.handle.net/2003/44021

Collections

Chair of Data Science and Data Engineering
