# Confirmed Speakers

McGill University

### Bayesian mediation analysis with N = 1

Single-Case Experimental Designs (SCEDs) are a useful tool for evaluating therapy effectiveness in heterogeneous and low-incidence conditions. Mediation analysis informs researchers about the mechanism through which the intervention leads to changes in the outcome of interest. Hypotheses about mediated effects are often phrased in terms of within-person processes but evaluated using group-level analyses. In recent years, several methods have been proposed for mediation analysis using repeated-measures data on the mediator and outcome of a single participant in at least two treatment phases. In this talk, I will describe the most promising Bayesian methods for mediation analysis in SCEDs.

San Diego State University

### Power for dyadic models

Many psychological researchers examine the degree to which members of a dyad relate to one another. However, researchers often face challenges when attempting to identify the minimal sample size required to detect a specific dyadic effect (i.e., when performing a power analysis). More specifically, researchers often have considerable difficulty identifying the effect sizes (e.g., exact values for fixed effects and variances for random effects) required to perform a power analysis. This talk describes those challenges, provides new methodology for identifying effect sizes (i.e., via pseudo R-square values), demonstrates how to perform a power analysis using those new effect sizes (i.e., via Monte Carlo simulation), and presents general results for the minimal sample sizes needed to detect dyadic effects under a range of conditions.
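As a rough illustration of simulation-based power analysis, the sketch below estimates power by Monte Carlo for a deliberately simplified dyadic effect: the correlation between the two members' scores, tested with a Fisher z test. It is a stand-in for the general idea, not the pseudo R-square methodology described in the talk. The function names `pearson_r` and `simulate_power` and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_power(n_dyads, rho, n_sims=2000, seed=1):
    """Estimate power to detect a within-dyad correlation `rho`.

    Assumption: both members' scores are standard normal and the effect
    of interest is their correlation (a simplification of a dyadic model).
    """
    if n_dyads < 4:
        raise ValueError("the Fisher z test needs at least 4 dyads")
    rng = random.Random(seed)
    z_crit = 1.959964  # two-sided critical value for alpha = .05
    rejections = 0
    for _ in range(n_sims):
        # Simulate one sample of dyads with population correlation rho.
        member_a = [rng.gauss(0.0, 1.0) for _ in range(n_dyads)]
        member_b = [
            rho * a + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
            for a in member_a
        ]
        r = pearson_r(member_a, member_b)
        # Fisher z statistic for H0: rho = 0.
        z = math.atanh(r) * math.sqrt(n_dyads - 3)
        if abs(z) > z_crit:
            rejections += 1
    # Power estimate: the share of simulated samples that reject H0.
    return rejections / n_sims
```

Calling `simulate_power` over a grid of `n_dyads` values and taking the smallest value whose estimated power exceeds a target (for example 0.80) gives a minimal sample size under these assumptions.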

Utrecht University

### Aggregating continuous streams of sensor data

Sensors included in smartphones or other wearables potentially allow for the precise study of human behavior. Accelerometers typically measure a person's movement in three dimensions (x, y, z) at a rate of 60 times per second. Geolocation sensors measure someone's physical location in latitude and longitude at a rate that in practice varies between once per second and once every several minutes. This presentation will explain the challenges of working with intensive longitudinal data from smartphones and wearables, and present ideas on how to aggregate data from those sensors.
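One simple way to aggregate such a stream, sketched here under stated assumptions, is to cut the 60 Hz accelerometer signal into fixed-length epochs and summarize each epoch by its mean vector magnitude. The function name `aggregate_epochs`, the one-second epoch length, and the choice of summary statistic are illustrative assumptions, not the approach presented in the talk.

```python
import math


def aggregate_epochs(samples, rate_hz=60, epoch_seconds=1):
    """Return the mean vector magnitude of each complete epoch.

    `samples` is a list of (x, y, z) tuples recorded at `rate_hz`.
    A trailing incomplete epoch is dropped (an assumption made here
    to keep every summary based on the same number of samples).
    """
    epoch_size = rate_hz * epoch_seconds
    n_full_epochs = len(samples) // epoch_size
    summaries = []
    for i in range(n_full_epochs):
        chunk = samples[i * epoch_size:(i + 1) * epoch_size]
        # Collapse the three axes into a single magnitude per sample.
        magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in chunk]
        summaries.append(sum(magnitudes) / epoch_size)
    return summaries
```

With the defaults, one hour of recording (216,000 samples) reduces to 3,600 values, one per second.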

Utrecht University

### A design perspective on unique challenges of longitudinal surveys: panel attrition and panel conditioning

Longitudinal surveys, which gather information from the same units at different points in time, offer analytic and data collection advantages. However, they have unique drawbacks: panel attrition, whereby first-wave participants discontinue participation, and panel conditioning, a type of measurement error in which different answers are given in subsequent waves as a result of survey participation. These issues can be especially problematic for small samples. Survey methodology offers solutions for estimating and reducing these detrimental effects. In this presentation, I will focus on practices that can help address panel conditioning and attrition when designing data collection. The presentation aims to provide actionable guidelines for traditional data collection (face-to-face, telephone, mail, online) and to discuss how these two challenges should be considered when designing data collection with smartphone sensors.

Utrecht University

### Survey design in an era of mobile web

In this presentation I will discuss the do’s and don’ts related to survey design. Online surveys are being completed on a range of different devices (PC, tablet, mobile phone) and this affects optimal survey design, response rates and data quality. I will discuss survey errors and how to overcome them.

Utrecht University

### RePAIR: a power solution to animal experimentation

The past years have witnessed an increasing awareness that many animal studies are heavily underpowered. Based on a systematic search involving over 1200 articles, we estimated that at best only 12% of animal experiments are sufficiently powered. This poses a serious problem, challenging the reliability of animal models. One could increase the number of animals per study but this raises important ethical and practical issues. We present an alternative solution, RePAIR, a statistical method using information from previous experiments on the same experimental endpoint. We first show, in a simulation study, that RePAIR can increase the power by up to 100% or decrease the total sample size required by 49%. We prove the added value of RePAIR in a unique dataset created by the collaboration of several laboratories around the world, on cognitive effects of early life adversity. RePAIR can be widely used to improve quality of animal experimentation and comes with an open-source web-based tool for this purpose.

Tilburg University

### Sample size reduction by combining data from multiple studies investigating different subpopulations

We often find multiple studies investigating the same intervention in different groups of patients. Combining data from these trials allows us to share information about the effect of the intervention in (a) one of the specific patient groups and/or (b) the general patient population. The aim of information sharing between trials is often twofold: a more comprehensive picture of treatment effects can be obtained, while potentially fewer participants have to be included. However, naively merging datasets increases the risk of a false superiority conclusion, so it is important to correct for differences between trials. The availability of covariates can help address this issue: if we know how subpopulations differ, we may include this information to increase information sharing. This talk covers a procedure for properly combining data from two different subpopulations and demonstrates how sample sizes may be reduced.

University of North Carolina at Chapel Hill

### Testing differential item functioning in small samples: The impact of model complexity

Differential item functioning (DIF) is a pernicious statistical issue that can mask true group differences on a target latent construct. A considerable amount of research has focused on evaluating methods for testing DIF, such as using likelihood ratio tests in item response theory (IRT). Most of this research has focused on the asymptotic properties of DIF testing, in part because many latent variable methods require large samples to obtain stable parameter estimates. Much less research has evaluated these methods in small samples, despite the fact that many social and behavioral scientists frequently encounter small samples in practice. In this article, we examine the extent to which model complexity (the number of model parameters estimated simultaneously) affects the recovery of DIF in small samples. We compare three models that vary in complexity: logistic regression with sum scores, the 1-parameter logistic IRT model, and the 2-parameter logistic IRT model. We expected that logistic regression with sum scores and the 1-parameter logistic IRT model would more accurately estimate DIF because these models yield more stable estimates despite being misspecified. Indeed, a simulation study and an empirical example of adolescent substance use show that, even when data are generated from (or assumed to follow) a 2-parameter logistic IRT model, using parsimonious models in small samples leads to more powerful tests of DIF while adequately controlling Type I error. We also provide evidence for the minimum sample sizes needed to detect DIF, and we evaluate whether applying corrections for multiple testing is advisable. Finally, we provide recommendations for applied researchers who conduct DIF analyses in small samples.

University of Iowa

### Investigating the impact of residualized likelihoods in Bayesian multilevel models with normal residuals

Multilevel models (i.e., mixed-effects models) are used to predict outcomes with one or more sources of dependency, such as in clustered observations or repeated measures. In frequentist settings, the dominant estimation method for multilevel models with normally distributed residuals at each level (i.e., general linear mixed-effects models) is residual maximum likelihood (REML), which provides unbiased estimates of variance components. Use of non-residualized normal distributions (i.e., maximum likelihood or ML) results in negatively biased estimates of the variance components, with the size of the bias related to the sample size and the number of fixed effects in the model. The benefit of REML extends beyond the variance components themselves. Because the standard errors for fixed effects depend on the variance components, the negatively biased variance estimates in ML-estimated models produce standard errors that are also negatively biased. Critically, these REML-related advantages are most pronounced in smaller higher-level samples, in which the use of ML instead can result in too-small variance estimates and, consequently, too-small standard error estimates, leading to greater rates of Type I error for the corresponding fixed effects.
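The direction and size of this bias can be seen in the simplest possible case. The sketch below is an illustration under strong simplifying assumptions, not a multilevel model: it uses a single-level, intercept-only normal model, where the ML variance estimate divides the residual sum of squares by n and the REML estimate divides it by n - 1 (n minus the number of fixed effects). The function name `mean_variance_estimates` and the simulation settings are hypothetical.

```python
import random


def mean_variance_estimates(n, n_sims=20000, seed=1):
    """Average ML and REML variance estimates over simulated samples.

    Assumption: data are i.i.d. standard normal (true variance = 1) and
    the model has one fixed effect (the intercept), so REML reduces to
    the familiar n - 1 divisor.
    """
    rng = random.Random(seed)
    ml_total = 0.0
    reml_total = 0.0
    for _ in range(n_sims):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y_bar = sum(y) / n
        residual_ss = sum((v - y_bar) ** 2 for v in y)
        ml_total += residual_ss / n          # ML: ignores the estimated mean
        reml_total += residual_ss / (n - 1)  # REML: adjusts for it
    return ml_total / n_sims, reml_total / n_sims
```

With `n = 5`, the ML average should sit near (n - 1)/n = 0.8 of the true variance while the REML average sits near 1, and the gap shrinks as `n` grows, which mirrors the small-sample pattern described above.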

In Bayesian multilevel models, the data likelihood most commonly used is the non-residualized normal distribution (the same as in standard ML in the frequentist version), but with two notable procedural differences. First, prior distributions can be used to reduce the influence of the likelihood, which can be particularly advantageous in small higher-level samples. Second, common Bayesian estimation programs display posterior distribution summaries using an expected a posteriori (EAP) estimate. Given that random effects variances are likely to have positively skewed distributions, the use of an EAP estimate (i.e., a mean) instead of a maximum a posteriori (MAP) estimate (i.e., a mode; the analog to a maximum likelihood estimate used in standard ML in frequentist models) can obscure the negative bias in the variance components. However, the incremental benefits of using a residualized likelihood function in Bayesian multilevel models have not yet been explored. The purpose of this study is to fill this gap and demonstrate the effects of doing so in small higher-level samples.
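The gap between EAP and MAP summaries for a positively skewed posterior can be illustrated with a small sketch. Here, draws from a Gamma(2, 1) distribution stand in for MCMC samples of a variance component. That distribution is an arbitrary assumption chosen only because it is right-skewed (mean 2, mode 1), and the histogram-based mode in the hypothetical helper `eap_and_map` is a crude estimate rather than what any particular Bayesian program reports.

```python
import random
from collections import Counter


def eap_and_map(draws, bin_width=0.25):
    """Return (EAP, MAP) summaries of a list of positive posterior draws.

    EAP is the mean of the draws; MAP is approximated by the midpoint
    of the fullest histogram bin of width `bin_width`.
    """
    eap = sum(draws) / len(draws)
    bin_counts = Counter(int(d // bin_width) for d in draws)
    fullest_bin = max(bin_counts, key=bin_counts.get)
    map_estimate = (fullest_bin + 0.5) * bin_width
    return eap, map_estimate


# Stand-in "posterior" draws: Gamma(shape=2, scale=1), an assumption.
rng = random.Random(1)
draws = [rng.gammavariate(2.0, 1.0) for _ in range(50000)]
eap, map_estimate = eap_and_map(draws)
```

Because the mean of a right-skewed distribution lies above its mode, reporting the EAP yields a larger point estimate than the MAP, which is how a mean-based summary can hide a downward-biased mode.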

In this presentation, we show preliminary results for our attempt to bring an analog to the residualized likelihood of REML into Bayes—the development of a residualized likelihood function in a Markov chain Monte Carlo algorithm for Bayesian multilevel models. We first show that, similar to ML-based frequentist estimation results, the use of a traditional ML-inspired non-residualized likelihood leads to posterior distributions of the variance components with negative bias in both their posterior mode (i.e., the MAP estimate of central tendency) and their posterior variance. These same problems then propagate to the posterior variance of the analog of the fixed-effect parameters: as expected, the extent of downward bias in their posterior variance is most pronounced in smaller higher-level samples. We conclude by demonstrating how Bayesian multilevel models with residualized likelihoods may be useful in research and practice with small sample sizes.