Using guess

guess: Adjust Estimates of Learning for Guessing

Over informative processes, naive estimator of learning—difference between post and pre process scores—underestimates actual learning. A heuristic account for why the naive estimator is negatively biased is as follows: people know as much or more after exposed to an informative process than before it. And the less people know, the larger the number of items they don’t know. And greater the opportunity to guess.

Guessing, even when random, only increases the proportion correct. Thus, bias due to guessing for naive measures of knowledge is always positive. On average, thus, there is more positive bias in the pre-process scores than post-process scores. And naturally, subtracting pre-process scores from post-process provides an attenuated estimate of actual learning. For a more complete treatment of the issue, read this paper by Ken Cor and Gaurav Sood.

We provide a few different ways to adjust estimates of learning for guessing. For now, we limit our attention to cases where the same battery of knowledge questions has been asked in both the pre- and the post-process wave. And to cases where closed-ended questions have been asked. (Guessing is not a serious issue on open-ended items. See more evidence for that in DK Means DK by Robert Luskin and John Bullock.) More generally, the package implements the methods to adjust learning for guessing discussed in this paper.

Measuring Learning:

Estimand

Proportion of people who learned a particular piece of information over the course of an informative process.

Other Issues

Measurement of knowledge is fundamentally reactive – we must probe to learn. But probing is not without its problems. For instance, people who don’t know the answer try to triangulate based on the cues in the question itself. For another, people are remarkably averse to confessing to their ignorance. So on a closed ended question, lots of people who don’t know the right answer, guess. Here are some pertinent issues that relate to how we analyze the data:

  1. Dealing with Missing Data
    If you assume that the data are missing completely at random, you can simply ignore them. Generally, however, respondents tend to skip items they don’t know. So missing responses on knowledge questions typically indicate ignorance. (Of course, it is important to investigate other potential reasons behind missing data. And we encourage researchers to take all precautions.) In our treatment, however, for simplicity sake, we treat missing as indicators of ignorance.

  2. Dealing with Don’t Knows
    We now know a little bit about Don’t Know. One generally strategy is to treat Don’t Know responses as ignorance. But research suggests that on average there is approximately 3% hidden knowledge behind Don’t Know responses. See DK Means DK by Robert Luskin and John Bullock. Thus one can also choose to replace Don’t Know responses with .03.

  3. Related Knowledge
    People either know a particular piece of information or they don’t. On an open-ended question, they may struggle to remember it but those kinds of concerns don’t apply to closed-ended questions where the task is simply to identify the correct answer. What does on occassion happen on closed-ended questions is that people have related cognitions. So for instance, asked to identify the prime minister of France, the respondents sometimes know that one of the options is the king of Sudan and may guess randomly between the remaining options. But that isn’t the same as knowing the prime minister of France.

Standard Correction for Guessing

The standard correction for guessing assumes that people guess randomly. And that people either know or don’t know. Using this assumption, it then uses total number of incorrect answers to estimate the total number of items that the person guessed on. For instance, let us assume there are 4 options on a multiple choice question. Say we have data from 100 respondents. Say there are 70 incorrect answers and 30 correct. Incorrect answers reflect attempts of guessing. (We also assume that people aren’t misinformed.) This means we can triangulat the total number of questions respondents guessed on – 70*(4/3). This means that the proportion of people who know the piece of information is roughly .067. Do it for the pre and the post wave and you have estimate of learning adjusted for guessing using the standard correction.

Latent Class Correction for Guessing

See the paper for details.


Installation

To get the current CRAN version:

install.packages("guess")

To get the current development version from GitHub:

# install.packages("devtools")
library(devtools)
#devtools::install_github("soodoku/guess")

Usage

Standard Correction for Guessing

To adjust estimates of learning for standard correction of guessing, use stnd_cor. The function requires takes pre test and post test data frames containing responses to the items on the pre- and the post-test, and a lucky vector that contains the probability of getting an item correct when guessing randomly. Under standard guessing correction, it is taken to be inverse of total number of options.

Structure of the Input Data:

  1. For current purposes, we assume missing responses to indicate ignorance. Thus functions internally code missing responses as 0.
  2. If the items offer an option to mark Don't know, code all Don't Know responses as ‘d’.
# Load library
library(guess)

# Generate some data without DK
pre_test <- data.frame(item1 = c(1, 0, 0, 1, 0), item2 = c(1, NA, 0, 1, 0)) 
pst_test <- pre_test + cbind(c(0, 1, 1, 0, 0), c(0, 1, 0, 0, 1))
lucky <- rep(.25, 2)

# Unadjusted Effect
# Treating Don't Know as ignorance
colMeans(nona(pst_test) - nona(pre_test))

# MCAR
colMeans(pst_test - pre_test, na.rm = T)

# Adjusted Effect
stnd_cor(pre_test, pst_test, lucky)

Transition Matrix

# Without Don't Know
pre_test_var <- c(1, 0, 0, 1, 0, 1, 0) 
pst_test_var <- c(1, 0, 1, 1, 0, 1, 1)
print(transmat(pre_test_var, pst_test_var))

# With Don't Know
pre_test_var <- c(1, 0, NA, 1, "d", "d", 0, 1, 0)
pst_test_var <- c(1, 0, 1, "d", 1, 0, 1, 1, "d")
print(transmat(pre_test_var, pst_test_var))

Adjusting Using the Latent Class Model

# Create example data for demonstration
set.seed(123)
nitems <- 10  # Number of knowledge questions
npeople <- 100  # Number of respondents

# Generate simulated pre-test and post-test responses (0=incorrect, 1=correct)
pre_test <- replicate(nitems, rbinom(npeople, 1, 0.4))  # 40% correct pre-test
post_test <- replicate(nitems, rbinom(npeople, 1, 0.6))  # 60% correct post-test

transmatrix <- multi_transmat(pre_test, post_test)

res <- lca_cor(transmatrix)

round(res$params[,1:4], 3)

round(res$learning[1:4], 3)

# LCA Correction with Don't Know responses
# Generate data with Don't Know responses (coded as 2)
pre_test_dk <- replicate(nitems, sample(c(0,1,2), npeople, replace=TRUE, prob=c(0.5, 0.35, 0.15)))
post_test_dk <- replicate(nitems, sample(c(0,1,2), npeople, replace=TRUE, prob=c(0.4, 0.45, 0.15)))

transmatrix <- multi_transmat(pre_test_dk, post_test_dk, force9 = TRUE)
res_dk <- lca_cor(transmatrix)

round(res_dk$params[,1:4], 3)

round(res_dk$learning[1:4], 3)

Adjust by Groups

Account for propensity to guess and item level attribute that a guess would be lucky.

pre_test_var <- data.frame(pre=c(1, 0, 1, 1, 0, "d", "d", 0, 1, NA, 0, 1, 1, 1, 1, 0, 0, 'd', 0, 0))
pst_test_var <- data.frame(pst=c(1, 0, NA, 1, "d", 1, 0, 1, 1, "d", 1, 1, 1, 0, 1, 1, 0, 1, 1, 0))
grp = c(rep(1, 10), rep(0, 10))

group_adj(pre_test_var, pst_test_var, gamma = 0, dk = 0)$learn
group_adj(pre_test_var, pst_test_var, gamma = .25, dk = 0)$learn
stnd_cor(pre_test_var, pst_test_var, lucky = .25)$learn

grp0_raw <- group_adj(subset(pre_test_var, grp == 0), subset(pst_test_var, grp == 0), gamma = 0, dk = 0)$learn
grp1_raw <- group_adj(subset(pre_test_var, grp == 1), subset(pst_test_var, grp == 1), gamma = 0, dk = 0)$learn

grp0_adj <- group_adj(subset(pre_test_var, grp == 0), subset(pst_test_var, grp == 0), gamma = .25, dk = 0)$learn
grp1_adj <- group_adj(subset(pre_test_var, grp == 1), subset(pst_test_var, grp == 1), gamma = .25, dk = 0)$learn

grp0_raw - grp1_raw
grp0_adj - grp1_adj

Standard Errors

# Raw 
# Generate some data without DK
pre_test <- data.frame(item1 = c(1, 0, 0, 1, 0), item2 = c(1, NA, 0, 1, 0)) 
pst_test <- pre_test + cbind(c(0, 1, 1, 0, 0), c(0, 1, 0, 0, 1))
diff <- pst_test - pre_test
stnd_err <-  sapply(diff, function(x) sqrt(var(x, na.rm = T)/length(x)))

# Bootstrapped s.e.

# LCA model
lca_stnd_err <- lca_se(pre_test, post_test, 10)
sapply(lca_stnd_err, function(x) round(head(x, 1), 3))

lca_dk_stnd_err <- lca_se(pre_test_dk, post_test_dk, 10)
sapply(lca_dk_stnd_err, function(x) round(head(x, 1), 3))

Fit

fit <- fit_nodk(pre_test, post_test, res$params["gamma", ], res$params[c("gg", "gk", "kk"), ])

print(fit[, 1:4])

fit <- fit_dk(pre_test_dk, post_test_dk, res_dk$params["gamma", ], res_dk$params[c("gg", "gk", "gd", "kg", "kk", "kd", "dd"), ], force9 = TRUE)

print(fit[, 1:4])

Model Criticism: Perplexity and Cross-Validation

The package provides tools to evaluate model fit through perplexity and cross-validation. These help assess whether the LCA model adequately captures the data-generating process.

Perplexity measures how well the model predicts the data. Lower values indicate better fit. It can be interpreted as the effective number of equiprobable outcomes the model assigns to each observation.

Cross-validation provides held-out performance estimates, helping detect overfitting.

Two sets of functions are available: - *_items: Work with aggregated transition matrices, split by items - *_individuals: Work with individual-level data, split by individuals

# Generate data
set.seed(42)
nitems <- 10
npeople <- 200

pre_test <- as.data.frame(replicate(nitems, rbinom(npeople, 1, 0.4)))
post_test <- as.data.frame(replicate(nitems, pmin(1, pre_test +
              replicate(nitems, rbinom(npeople, 1, 0.25)))))
names(pre_test) <- names(post_test) <- paste0("item", 1:nitems)

# Fit model using convenience wrapper
fit <- lca_fit(pre_test, post_test)

# Or equivalently:
# transmatrix <- multi_transmat(pre_test, post_test)
# fit <- lca_cor(transmatrix)
Perplexity from aggregated data (items)
transmatrix <- multi_transmat(pre_test, post_test)

# Overall perplexity
perplexity_items(fit, transmatrix)

# Per-item perplexity
sapply(1:nitems, function(i) perplexity_items(fit, transmatrix, item = i))
Perplexity from individual data
# Overall perplexity (averaged across individuals)
perplexity_individuals(fit, pre_test, post_test)

# Per-individual perplexity (identify poorly-fit respondents)
ind_ppl <- perplexity_individuals(fit, pre_test, post_test, per_individual = TRUE)
hist(ind_ppl, main = "Distribution of Individual Perplexity")
Cross-validation over items

Useful when you have many items and want to assess generalization to new items.

transmatrix <- multi_transmat(pre_test, post_test)
cv_result <- cv_items(transmatrix, k = 5, seed = 123)

cv_result$perplexity    # Held-out perplexity
cv_result$se            # Standard error across folds
cv_result$fold_results  # Per-fold details
Cross-validation over individuals

Useful when you want to assess generalization to new respondents.

cv_result <- cv_individuals(pre_test, post_test, k = 5, seed = 123)

cv_result$perplexity    # Held-out perplexity
cv_result$se            # Standard error across folds
cv_result$fold_results  # Per-fold details
Model comparison

Use perplexity to compare different model specifications or parameter settings.

# True parameters (if known from simulation)
true_params <- c(0.4, 0.3, 0.3, 0.25)

# Misspecified parameters
bad_params <- c(0.1, 0.1, 0.8, 0.5)

transmatrix <- multi_transmat(pre_test, post_test)

# Compare: lower perplexity = better fit
perplexity_items(true_params, transmatrix)
perplexity_items(bad_params, transmatrix)
perplexity_items(fit, transmatrix)  # Fitted model should be close to true

Simulation and Parameter Recovery Validation

The package provides functions to simulate data from known parameters, useful for validating the estimation procedure and understanding model behavior.

Basic simulation (No-DK model)
# Simulate data with known learning rate of 30%
sim <- simulate_lca(
  n = 500,           # 500 individuals
  n_items = 2,       # 2 test items
  gg = 0.35,         # 35% in guess->guess (stable ignorance)
  gk = 0.30,         # 30% in guess->know (LEARNED)
  kk = 0.35,         # 35% in know->know (stable knowledge)
  gamma = 0.25,      # 25% chance of guessing correctly
  seed = 123
)

# Fit model and check recovery
fit <- lca_fit(sim$pre, sim$post)
fit$params["gk", ]  # Should be close to 0.30 (true learning rate)
Simulation with Don’t Know responses
# Simulate DK data with known parameters
sim_dk <- simulate_lca_dk(
  n = 500,
  n_items = 1,
  gg = 0.25, gk = 0.15, gd = 0.10,
  kg = 0.10, kk = 0.15, kd = 0.10,
  dd = 0.15, gamma = 0.25,
  seed = 456
)

# Fit DK model
fit_dk <- lca_fit(sim_dk$pre, sim_dk$post)
fit_dk$params["gk", ]  # Should be close to 0.15
Comprehensive parameter recovery validation

Use validate_recovery() for Monte Carlo studies of parameter recovery:

# Run 100 simulations to assess parameter recovery
results <- validate_recovery(
  true_params = c(gg = 0.35, gk = 0.30, kk = 0.35, gamma = 0.25),
  n = 500,        # Sample size per simulation
  n_items = 2,    # Number of items
  n_sims = 100,   # Number of simulations
  seed = 789
)

# View results
print(results)
# Shows: bias, RMSE, standard error, and coverage for each parameter

Statistical Efficiency: Sample Size and Parameter Recovery

Individual-level pre/post response data provides rich information for parameter estimation. As sample size increases, the LCA model produces more precise estimates with lower bias and better confidence interval coverage. This section demonstrates these efficiency gains through simulation studies.

We assess estimation quality using three metrics: - Bias: Mean estimate minus true value (should approach 0) - RMSE: Root mean squared error (reflects both bias and variance) - Coverage: Proportion of 95% CIs containing the true value (should approach 0.95)

Sample Size Effect

The most fundamental efficiency result is that RMSE decreases approximately with 1/√n. Here we compare parameter recovery across different sample sizes:

# Compare parameter recovery across sample sizes
sample_sizes <- c(100, 250, 500, 1000)
true_params <- c(gg = 0.35, gk = 0.30, kk = 0.35, gamma = 0.25)

results_by_n <- lapply(sample_sizes, function(n) {
  validate_recovery(true_params, n = n, n_items = 2, n_sims = 100, seed = 123)
})

# Combine into summary table
efficiency_table <- do.call(rbind, lapply(seq_along(sample_sizes), function(i) {
  r <- results_by_n[[i]]
  data.frame(
    n = sample_sizes[i],
    parameter = r$parameter,
    bias = round(r$bias, 4),
    rmse = round(r$rmse, 4),
    coverage = round(r$coverage_95, 2)
  )
}))

# Display results for the learning parameter (gk)
gk_results <- efficiency_table[efficiency_table$parameter == "gk", ]
print(gk_results)

Expected pattern for the learning parameter (gk): - n=100: RMSE ~0.10-0.12, coverage ~0.85 - n=250: RMSE ~0.06-0.08, coverage ~0.90 - n=500: RMSE ~0.04-0.06, coverage ~0.93 - n=1000: RMSE ~0.03-0.04, coverage ~0.95

The ratio of RMSEs between adjacent sample sizes should be approximately √2 (e.g., RMSE at n=500 / RMSE at n=1000 ≈ √2), confirming the 1/√n efficiency pattern.

Number of Items Effect

Each item provides independent information about the guessing probability (gamma). With more items, gamma is estimated more precisely, which indirectly improves estimation of the latent class proportions:

# Compare recovery with varying number of items
n_items_values <- c(1, 2, 5, 10)
true_params <- c(gg = 0.35, gk = 0.30, kk = 0.35, gamma = 0.25)

results_by_items <- lapply(n_items_values, function(ni) {
  validate_recovery(true_params, n = 500, n_items = ni, n_sims = 50, seed = 456)
})

# Combine results
items_table <- do.call(rbind, lapply(seq_along(n_items_values), function(i) {
  r <- results_by_items[[i]]
  data.frame(
    n_items = n_items_values[i],
    parameter = r$parameter,
    rmse = round(r$rmse, 4),
    coverage = round(r$coverage_95, 2)
  )
}))

# Show gamma results (most affected by number of items)
gamma_results <- items_table[items_table$parameter == "gamma", ]
print(gamma_results)

With more items, gamma estimation improves substantially because each item provides an independent estimate of the guessing rate. This also improves the identifiability of the latent class proportions.

Parameter Scenarios

Recovery performance varies with the true parameter configuration. Some scenarios are inherently harder to estimate:

# Test different true parameter configurations
scenarios <- list(
  balanced = c(gg = 0.35, gk = 0.30, kk = 0.35, gamma = 0.25),
  high_learning = c(gg = 0.20, gk = 0.50, kk = 0.30, gamma = 0.25),
  low_learning = c(gg = 0.50, gk = 0.10, kk = 0.40, gamma = 0.25),
  high_guessing = c(gg = 0.35, gk = 0.30, kk = 0.35, gamma = 0.40)
)

scenario_results <- lapply(scenarios, function(params) {
  validate_recovery(params, n = 500, n_items = 2, n_sims = 50, seed = 789)
})

# Compare learning (gk) recovery across scenarios
gk_comparison <- data.frame(
  scenario = names(scenarios),
  true_gk = sapply(scenarios, function(p) p["gk"]),
  bias = sapply(scenario_results, function(r) round(r$bias[r$parameter == "gk"], 4)),
  rmse = sapply(scenario_results, function(r) round(r$rmse[r$parameter == "gk"], 4)),
  coverage = sapply(scenario_results, function(r) round(r$coverage_95[r$parameter == "gk"], 2))
)
print(gk_comparison)
Why Individual-Level Data Provides Efficiency Gains

The key advantage of individual-level data is that it preserves inter-item correlation within individuals. When we observe the same person’s responses across multiple items, we can exploit the correlation structure:

  • Individuals in the “know-know” (kk) class tend to answer multiple items correctly
  • Individuals in the “guess-guess” (gg) class show inconsistent patterns across items
  • This correlation provides information beyond what aggregated transition matrices capture

With aggregated data, we only observe cell counts (e.g., how many people went from correct to correct). With individual data, we observe that the same person answered correctly on items 1, 2, and 3—information that helps distinguish true knowledge from lucky guessing.