Statistische Methoden in den Sozialwissenschaften
Recent Submissions
Item: Sequence data mining in cognitive science (2024)
Huang, He; Doebler, Philipp; Pauly, Markus

This thesis summarizes my research work over a five-year period from February 2020 to August 2024, including all of the papers I published during that time. As a cumulative thesis, it provides a concise overview of the contributed articles, omitting exhaustive results and instead referring to the original publications for full details. The main text integrates these publications into a coherent narrative, starting with basic concepts and providing background on the respective research areas. For an in-depth discussion of specific research findings, readers are encouraged to consult the relevant articles directly. This thesis covers the field of sequence data mining (SDM) in cognitive science. Cognitive science increasingly examines sequence data to understand cognitive tasks involving ordered steps or elements, such as language processing, decision-making, and memory formation. SDM techniques are used to uncover patterns and models within sequential data. However, modern data mining techniques like deep learning, which have been broadly applied in other domains, have not been fully integrated into traditional cognitive science tasks. Moreover, cognitive science deals with complex sequence data, such as scanpaths and trajectories, which pose challenges that neither traditional pattern discovery methods nor modern techniques have successfully overcome. This thesis aims to extend SDM methods in cognitive science by focusing on the application of advanced techniques and the creation of new methods specifically tailored for handling these complex, domain-specific sequences. For instance, one of my published papers proposes a machine learning-based pipeline for automated scoring in divergent thinking tasks, using algorithms such as Random Forest, XGBoost, and Support Vector Regression.
Another two papers introduce novel approaches to analyzing scanpaths and handwritten trajectories. Through experimental validation in each paper, the newly developed methods demonstrate superior performance compared to existing approaches. Overall, my research advances SDM by integrating modern data mining techniques to address the challenges posed by complex sequential data in cognitive science.

Item: Item response models for count data (2024)
Beisemann, Marie; Doebler, Philipp; Groll, Andreas

Item response theory (IRT) represents a statistical framework within which responses to psychological tests can be modelled. A psychological test consists of a set of items (e.g., tasks to solve or statements to rate) to which a person taking the test responds. IRT assumes that responses are influenced by respondents' latent traits (e.g., personality traits or cognitive abilities) as well as by items' characteristics (e.g., difficulty). IRT models exist for a variety of different response types; the focus of this thesis lies on count responses. These can, for example, be generated by cognitive tests measuring idea fluency (counts: number of ideas), as process data during test taking (counts: number of clicks), or by reading proficiency assessments (counts: number of errors). Previously comparatively understudied, the field of count item response theory (CIRT) has witnessed a steady increase in interest in recent years. As a result, a number of new CIRT models have been proposed that address limitations of previously existing CIRT models, broadening the empirical applicability of CIRT. An important concern when modelling counts is their dispersion: the most common distribution for counts, the Poisson distribution, assumes its mean equals its variance (so-called equidispersion). By relying on the Poisson distribution, prominent CIRT models assume such equidispersion for responses (conditional on the latent trait(s)).
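To make the equidispersion assumption concrete (an illustrative sketch, not material from the thesis): under a Poisson model the conditional variance equals the conditional mean, which a small simulation verifies.

```python
import math
import random
import statistics

random.seed(1)

def poisson_sample(lam):
    # Knuth's multiplication method; adequate for small rates
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

# simulate count responses of a hypothetical respondent with rate 4
draws = [poisson_sample(4.0) for _ in range(20000)]
m = statistics.mean(draws)
v = statistics.pvariance(draws)
print(f"mean={m:.2f}, variance={v:.2f}")  # both close to 4: equidispersion
```

Overdispersed responses (variance greater than the conditional mean) would violate exactly this equality, which is what motivates the CMP-based models discussed next.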
Research has found this assumption empirically violated for some tests. A recently introduced unidimensional CIRT model, which instead uses the Conway-Maxwell-Poisson (CMP) distribution, accommodates over- and underdispersed conditional responses as well. Nonetheless, the model maintains some of the restrictive assumptions of previous models. Thus, even with new model proposals, CIRT still offers less modelling flexibility than IRT for other response types (such as binary responses). The present cumulative thesis aims to address three such gaps in the CIRT landscape. In the first article, I propose a unidimensional CIRT model with a conditional CMP response distribution which extends a previously proposed model through the inclusion of an additional item parameter (i.e., a discrimination parameter). Because such a model could not previously be estimated with existing methods, I derive a maximum likelihood estimation procedure to this end, using the Expectation-Maximization (EM) algorithm. In the second article, we propose two extensions of this model which allow the inclusion of item- and person-specific covariates, respectively. These extensions make it possible to investigate explanations for differences between items and between participants. Again, we provide corresponding estimation methods. In the third article, we generalize the unidimensional CIRT model proposed in the first article to a multidimensional count item response model framework, with a focus on exploratory models. We provide a corresponding estimation procedure, of which we additionally develop a lasso-penalized variant.
The articles in this thesis are accompanied by the development of an R package that implements the proposed models and estimation methods.

Item: A flexible approach to modelling over-, under- and equidispersed count data in IRT: the Two-Parameter Conway–Maxwell–Poisson model (2022-06-09)
Beisemann, Marie

Several psychometric tests and self-reports generate count data (e.g., divergent thinking tasks). The most prominent count data item response theory model, the Rasch Poisson Counts Model (RPCM), is limited in applicability by two restrictive assumptions: equal item discriminations and equidispersion (conditional mean equal to conditional variance). Violations of these assumptions lead to impaired reliability and standard error estimates. Previous work generalized the RPCM but maintained some limitations. The two-parameter Poisson counts model allows for varying discriminations but retains the equidispersion assumption. The Conway–Maxwell–Poisson Counts Model allows for modelling over- and underdispersion (conditional mean less than and greater than conditional variance, respectively) but still assumes constant discriminations. The present work introduces the Two-Parameter Conway–Maxwell–Poisson (2PCMP) model, which generalizes these three models to allow for varying discriminations and dispersions within one model, helping to better accommodate data from count data tests and self-reports. A marginal maximum likelihood method based on the EM algorithm is derived. An implementation of the 2PCMP model in R and C++ is provided. Two simulation studies examine the model's statistical properties and compare the 2PCMP model to established models.
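As a sketch of how the CMP distribution captures dispersion (illustrative only; the paper's estimation machinery is far more involved): the CMP probability mass function is proportional to λ^k / (k!)^ν, so ν = 1 recovers the Poisson, ν < 1 yields overdispersion, and ν > 1 yields underdispersion.

```python
import math

def cmp_pmf(lam, nu, kmax=80):
    # CMP weights lam^k / (k!)^nu, computed in log space and normalized
    logw = [k * math.log(lam) - nu * math.lgamma(k + 1) for k in range(kmax)]
    mx = max(logw)
    w = [math.exp(x - mx) for x in logw]
    z = sum(w)
    return [x / z for x in w]

def mean_var(pmf):
    m = sum(k * p for k, p in enumerate(pmf))
    v = sum((k - m) ** 2 * p for k, p in enumerate(pmf))
    return m, v

results = {}
for nu in (0.5, 1.0, 2.0):
    m, v = mean_var(cmp_pmf(3.0, nu))
    results[nu] = (m, v)
    print(f"nu={nu}: mean={m:.2f}, variance={v:.2f}")
# nu=1: variance = mean (Poisson); nu<1: variance > mean; nu>1: variance < mean
```

Working in log space avoids the overflow that direct evaluation of (k!)^ν would cause for larger counts.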
Data from divergent thinking tasks are reanalysed with the 2PCMP model to illustrate the model's flexibility and its ability to test assumptions of special cases.

Item: The machines take over: a comparison of various supervised learning approaches for automated scoring of divergent thinking tasks (2022-08-08)
Buczak, Philip; Huang, He; Forthmann, Boris; Doebler, Philipp

Traditionally, researchers employ human raters for scoring responses to creative thinking tasks. Apart from the associated costs, this approach entails two potential risks. First, human raters can be subjective in their scoring behavior (inter-rater variance). Second, individual raters are prone to inconsistent scoring patterns (intra-rater variance). In light of these issues, we present an approach for automated scoring of divergent thinking (DT) tasks. We implemented a pipeline aiming to generate accurate rating predictions for DT responses using text mining and machine learning methods. Based on two existing data sets from two different laboratories, we constructed several prediction models incorporating features representing meta information of the response or features engineered from the response's word embeddings, which were obtained using pre-trained GloVe and Word2Vec word vector spaces. Of these, the word embeddings and features derived from them proved particularly effective. Overall, longer responses and responses that were semantically distant from the stimulus object tended to receive higher ratings.
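An illustrative sketch (with toy vectors standing in for the paper's actual pre-trained GloVe/Word2Vec embeddings) of how semantic distance from the stimulus can be engineered as a feature:

```python
import math

def cosine_distance(u, v):
    # 1 - cosine similarity; larger = more semantically distant
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# toy 3-dimensional embeddings; real word vectors have hundreds of dimensions
emb = {
    "brick":   [0.9, 0.1, 0.2],   # stimulus object
    "house":   [0.8, 0.2, 0.3],   # common, close idea
    "antenna": [0.1, 0.9, 0.7],   # remote idea
}
# a response embedding could be the mean of its word vectors; its distance
# from the stimulus embedding then serves as one engineered feature
d_house = cosine_distance(emb["brick"], emb["house"])
d_antenna = cosine_distance(emb["brick"], emb["antenna"])
print(d_house < d_antenna)  # the remote idea is farther from the stimulus
```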
In our comparison of three state-of-the-art machine learning algorithms, Random Forest and XGBoost tended to slightly outperform Support Vector Regression.

Item: Comparison of random-effects meta-analysis models for the relative risk in the case of rare events - a simulation study (2020-06-08)
Beisemann, Marie; Doebler, Philipp; Holling, Heinz

Pooling the relative risk (RR) across studies investigating rare events, for example adverse events, via meta-analytical methods still presents a challenge to researchers. The main reason for this is the high probability of observing no events in the treatment group, the control group, or both, resulting in an undefined log RR (the basis of standard meta-analysis). Other technical challenges ensue, for example the violation of normality assumptions, or bias due to exclusion of studies and application of continuity corrections, leading to poor performance of standard approaches. In the present simulation study, we compared three recently proposed alternative models (random-effects [RE] Poisson regression, RE zero-inflated Poisson [ZIP] regression, binomial regression) to the standard methods in conjunction with different continuity corrections and to different versions of beta-binomial regression. Based on our investigation of the models' performance in 162 different simulation settings informed by meta-analyses from the Cochrane database and distinguished by different underlying true effects, degrees of between-study heterogeneity, numbers of primary studies, group size ratios, and baseline risks, we recommend the use of the RE Poisson regression model. The beta-binomial model recommended by Kuss (2015) also performed well. The ZIP models also performed decently but had considerable convergence issues. We stress that these recommendations are only valid for meta-analyses with larger numbers of primary studies.
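A minimal sketch (not taken from the paper) of why zero event counts leave the log RR undefined, and what the common 0.5 continuity correction does to each cell of the 2x2 table:

```python
import math

def log_rr(events_t, n_t, events_c, n_c, cc=0.0):
    # log relative risk with optional continuity correction cc
    # (commonly 0.5 added to every cell when a zero event count occurs,
    # which adds 2 * cc to each group size)
    a = events_t + cc
    b = events_c + cc
    return math.log((a / (n_t + 2 * cc)) / (b / (n_c + 2 * cc)))

# zero events in the control arm: log RR is undefined without correction
try:
    log_rr(3, 100, 0, 100)
except ZeroDivisionError:
    print("undefined")

print(round(log_rr(3, 100, 0, 100, cc=0.5), 3))  # → 1.946, i.e. log(3.5/0.5)
```

The corrected estimate depends heavily on the arbitrary choice of cc, which is one source of the bias the study attributes to continuity corrections.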
All models are applied to data from two Cochrane reviews to illustrate differences between and issues with the models. Limitations as well as practical implications and recommendations are discussed; a flowchart summarizing the recommendations is provided.

Item: Evaluating an automated number series item generator using linear logistic test models (2018-04-02)
Loe, Bao Sheng; Sun, Luning; Simonfy, Filip; Doebler, Philipp

This study investigates the item properties of a newly developed Automatic Number Series Item Generator (ANSIG). The foundation of the ANSIG is based on five hypothesised cognitive operators. Thirteen item models were developed using the numGen R package, and eleven were evaluated in this study. The 16-item ICAR (International Cognitive Ability Resource) short form ability test was used to evaluate construct validity. The Rasch model and two Linear Logistic Test Models (LLTM) were employed to estimate and predict the item parameters. Results indicate that a single factor determines the performance on tests composed of items generated by the ANSIG. Under the LLTM approach, all the cognitive operators were significant predictors of item difficulty. Moderate to high correlations were evident between the number series items and the ICAR test scores, with high correlation found for the ICAR Letter-Numeric-Series type items, suggesting adequate nomothetic span. Extended cognitive research is, nevertheless, essential for the automatic generation of an item pool with predictable psychometric properties.
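As an illustrative sketch (with hypothetical operator weights, not estimates from the study): the Rasch model gives the probability of a correct response from ability and item difficulty, and the LLTM decomposes that difficulty into a weighted sum of cognitive-operator effects.

```python
import math

def rasch_p(theta, b):
    # Rasch model: probability of a correct response given
    # ability theta and item difficulty b
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# LLTM: item difficulty is a weighted sum of cognitive-operator effects.
# Hypothetical effects (eta) of three operators and one item's loadings (q):
eta = [0.8, -0.3, 1.1]
q   = [1, 0, 1]          # operators involved in this item
b_item = sum(qk * ek for qk, ek in zip(q, eta))  # = 0.8 + 1.1 = 1.9

# an average-ability respondent (theta = 0) facing this difficult item
print(round(rasch_p(0.0, b_item), 2))
```

Significant operator weights in such a decomposition are what the study uses to argue that the hypothesised cognitive operators genuinely drive item difficulty.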