Although GLMMs offer, in theory, a more appropriate framework for modeling RT data than LMMs, it remains unclear how these mixed modeling approaches, as well as traditional methods, perform relative to one another in terms of statistical power and coverage when applied to complex experimental data. To address this gap, the present study compares five methods with respect to power, coverage, and convergence probability, using virtual experiments based on the datasets from Cipora et al. (2022) and Roth et al. (2025). Because experimental paradigms may differ in data structure and effect size, we broaden the evaluation with additional simulations on a large dataset (Barzykowski et al., 2022) that includes two widely used conflict tasks in cognitive science: the Flanker (Eriksen & Eriksen, 1974) and Stroop (Stroop, 1935) paradigms.