
1Fordham University, 2University of British Columbia, 3University of Cambridge, 4Federal Reserve Bank of Chicago, 5Georgia Institute of Technology, 6Forecasting Research Institute
In general, psychometric models are statistical models that predict item responses. All psychometric models include:
According to this IRT model, the probability of a correct response to a binary item \(i\) for person \(j\) (\(Y_{ij} = 1\)) is:
\[P(Y_{ij} = 1 \mid a_i, b_i, \theta_j) = \mathrm{logit}^{-1}(a_i(\theta_j - b_i))\]
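The response function can be evaluated directly. The following is a minimal sketch, assuming the link is the logistic (inverse-logit) function; the function name `p_correct` and the parameter values are hypothetical and chosen only for illustration.

```python
import math

def p_correct(theta, a, b):
    """Probability of a correct response: inverse logit of a * (theta - b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical values, not estimates from any real test.
print(p_correct(theta=0.0, a=1.2, b=0.0))  # 0.5: ability equals difficulty
print(p_correct(theta=1.0, a=1.2, b=0.0))  # above 0.5: ability exceeds difficulty
```

Here \(a_i\) controls how steeply the probability rises with \(\theta_j\), and \(b_i\) is the ability level at which the probability is one half.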
With psychometric models, not only do we explain the observed responses, but we also obtain item characteristics and person characteristics.
Question: Can we do the same for quantile forecast items in the Forecasting Proficiency Test (FPT; Himmelstein et al., 2024)?

Responses to FPT quantile forecast items are on very different scales (e.g., dollars/gallon, thousands of dollars, percentages, …). To put them on a common scale, we define the outcome measure, the historically scaled signed error, as
\[ Y_i = \frac{\hat{Y}_i - Y_{\mathrm{res},i}}{SD_{Y_{\mathrm{hist},i}}} \]
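\(Y_i\): the number of historical SD units by which the forecast \(\hat{Y}_i\) lies above (positive) or below (negative) the resolution \(Y_{\mathrm{res},i}\).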
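As an illustration of the scaling, here is a minimal sketch assuming that the denominator is the sample standard deviation of an item's historical values. The source does not say whether a sample or population SD is used, and the function name and numbers below are hypothetical.

```python
import statistics

def scaled_signed_error(forecast, resolution, history):
    """Signed forecast error expressed in SD units of the item's history."""
    # Assumption: sample SD (n - 1 denominator).
    sd_hist = statistics.stdev(history)
    return (forecast - resolution) / sd_hist

# Hypothetical gasoline-price item, in dollars/gallon.
history = [3.0, 3.2, 3.4, 3.6, 3.8]
print(scaled_signed_error(forecast=3.9, resolution=3.5, history=history))

# The same item expressed in cents/gallon yields the same scaled error.
history_cents = [300.0, 320.0, 340.0, 360.0, 380.0]
print(scaled_signed_error(forecast=390.0, resolution=350.0, history=history_cents))
```

Because the numerator and denominator share the item's units, the result is unit-free, which is what makes errors comparable across items.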





Item G2404: What will be the 12-month percentage change in the U.S. Consumer Price Index (CPI) for “Food” in the month between May 1, 2024 and May 31, 2024?
We model \(Y_{jiq}\), the signed error of person \(j\) to item \(i\) at quantile \(q\).
\[\begin{aligned} Y_{jiq} &\sim \text{Student-}t(\mu_{iq}, \sigma_{ji}, \mathrm{df}_i) \\ \mu_{iq} &= b_i + Q_q \times d_i \\ \sigma_{ji} &= \frac{\sigma_i}{\exp(a_i \times \theta_j)} \end{aligned}\]
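To make the roles of the parameters concrete, the log-density of one observation can be written out by hand. This is a minimal sketch, not the authors' PyMC code: the function names are hypothetical, and `q_code` stands in for \(Q_q\), whose numerical coding is not given here.

```python
import math

def student_t_logpdf(y, mu, sigma, df):
    """Log-density of a Student-t with location mu, scale sigma, and df degrees of freedom."""
    z = (y - mu) / sigma
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi) - math.log(sigma)
            - (df + 1) / 2 * math.log1p(z * z / df))

def forecast_logpdf(y, q_code, theta, a, b, d, sigma, df):
    """Log-density of signed error y for one person, item, and quantile."""
    mu = b + q_code * d                   # item expected error at this quantile
    scale = sigma / math.exp(a * theta)   # higher theta -> tighter spread
    return student_t_logpdf(y, mu, scale, df)

# Hypothetical item parameters and quantile code.
for theta in (-1.0, 0.0, 1.0):
    print(theta, forecast_logpdf(y=0.9, q_code=1, theta=theta,
                                 a=1.5, b=0.1, d=0.8, sigma=1.0, df=5.0))
```

In this sketch a forecast that sits at the item's expected value becomes more likely as \(\theta_j\) grows, because the person-specific scale \(\sigma_{ji}\) shrinks.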
All models were estimated in PyMC (Abril-Pla et al., 2023) using Markov chain Monte Carlo (MCMC) with 1,000 warmup iterations and 5,000 draws (about 40 minutes). All \(\hat{R}\) values were \(\leq 1.01\).

Distribution of \(\theta\) for the 1194 forecasters (better forecasters have higher \(\theta\) values).
For any set of quantile forecasts, \(\theta\) is maximized if and only if all person forecasts equal exactly the corresponding item expected forecasts.
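One way to see why, sketched under the assumptions that every \(a_i > 0\) and that any prior on \(\theta_j\) is ignored: writing \(r_{jiq} = Y_{jiq} - \mu_{iq}\) for the gap between a person's error and the item's expected value, each observation contributes to the log-likelihood of \(\theta_j\), up to a constant,
\[ \ell_{jiq}(\theta_j) = a_i\theta_j - \frac{\mathrm{df}_i + 1}{2}\log\left(1 + \frac{r_{jiq}^2\, e^{2 a_i \theta_j}}{\mathrm{df}_i\, \sigma_i^2}\right) \]
When \(r_{jiq} = 0\), the second term vanishes and the contribution \(a_i\theta_j\) keeps rising with \(\theta_j\). When \(r_{jiq} \neq 0\), the contribution eventually falls, with slope approaching \(-\mathrm{df}_i\, a_i\), so any miss pulls \(\theta_j\) back down.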
\(\theta\) has less uncertainty around its estimates:





Negative log-likelihood function of \(\theta\) given item parameters and forecast:
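Such a curve can be traced numerically. The following is a minimal sketch, assuming the Student-t model for \(Y_{jiq}\), a hypothetical item `X1` with made-up parameters, and a hypothetical coding of \(Q_q\) as -1, 0, 1. A plain grid search stands in for whatever optimizer or sampler was actually used.

```python
import math

def t_logpdf(y, mu, sigma, df):
    """Log-density of a location-scale Student-t distribution."""
    z = (y - mu) / sigma
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi) - math.log(sigma)
            - (df + 1) / 2 * math.log1p(z * z / df))

def neg_log_lik(theta, errors, items):
    """Negative log-likelihood of theta for one person.

    errors: list of (item_id, q_code, y) scaled signed errors
    items:  dict mapping item_id to its parameters a, b, d, sigma, df
    """
    total = 0.0
    for item_id, q_code, y in errors:
        p = items[item_id]
        mu = p["b"] + q_code * p["d"]
        scale = p["sigma"] / math.exp(p["a"] * theta)
        total -= t_logpdf(y, mu, scale, p["df"])
    return total

# Hypothetical item parameters and one person's three quantile errors.
items = {"X1": {"a": 1.0, "b": 0.1, "d": 0.8, "sigma": 1.0, "df": 5.0}}
errors = [("X1", -1, -0.9), ("X1", 0, 0.3), ("X1", 1, 1.2)]

# Evaluate the curve on a grid from -3 to 3 and take the minimizer.
grid = [g / 100 for g in range(-300, 301)]
theta_hat = min(grid, key=lambda t: neg_log_lik(t, errors, items))
print(theta_hat)
```

With several items, the same function simply sums over more `(item_id, q_code, y)` triples. A finer grid or a one-dimensional optimizer would refine the estimate.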
Given the complexity of the FPT items, item parameters are likely to change depending on many factors. Still, there seems to be reasonable stability over the roughly one-month interval between Wave 1 and Wave 7 (test-retest):
SPUDM 2025