Methodology

Overview

estimador.pt uses Bayesian statistical models to forecast Portuguese elections. Our approach is fundamentally probabilistic: rather than predicting a single outcome, we estimate probability distributions that reflect the inherent uncertainty in the electoral process.

Our models are implemented in PyMC, a probabilistic programming library that lets us define complex statistical models and perform efficient Bayesian inference.

Parliamentary Elections

Model Structure

The parliamentary election model combines polling data with historical election results to estimate the evolution of party support over time. The core structure uses three overlapping Gaussian Processes (GPs) that capture dynamics at different time scales:

1. Baseline GP (Long-Term Trend)

Time scale: ~4 years
Captures structural changes in the party system
Uses an ExpQuad (Exponentiated Quadratic) kernel

2. Medium-Term GP

Time scale: ~1 year
Captures political cycles and gradual shifts in public opinion
Uses a Matern 5/2 kernel

3. Short-Term GP

Time scale: ~14 days
Captures campaign dynamics and reactions to events
Uses a Matern 3/2 kernel to allow faster variations

The combination of these three processes separates long-term signal from short-term noise while still capturing real movements during the campaign.

Technical Details: Gaussian Processes

Gaussian Processes are a non-parametric way to model unknown functions. We define priors over functions rather than fixed parameters, which allows the model to learn the shape of trends from the data.

We use the HSGP (Hilbert Space Gaussian Process) approximation for computational efficiency. This technique approximates the GP using a series of basis functions, drastically reducing computational cost without significant loss of precision.
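To make the basis-function idea concrete, here is a minimal pure-Python sketch of the standard one-dimensional HSGP construction for an exponentiated quadratic kernel. It is an illustration rather than the model's actual code (PyMC handles this internally); the boundary `L`, the number of basis functions `m`, the rescaled inputs, and all function names are assumptions chosen for the example.

```python
import math

def expquad_spectral_density(omega, sigma, ell):
    """Spectral density of the 1-D exponentiated quadratic kernel."""
    return sigma**2 * math.sqrt(2 * math.pi) * ell * math.exp(-0.5 * (ell * omega) ** 2)

def hsgp_basis(x, j, L):
    """j-th sine basis function on [-L, L] and the square root of its eigenvalue."""
    sqrt_lambda = j * math.pi / (2 * L)
    phi = math.sqrt(1 / L) * math.sin(sqrt_lambda * (x + L))
    return phi, sqrt_lambda

def hsgp_cov(x1, x2, sigma, ell, L, m):
    """Covariance implied by the first m basis functions."""
    total = 0.0
    for j in range(1, m + 1):
        phi1, sqrt_lambda = hsgp_basis(x1, j, L)
        phi2, _ = hsgp_basis(x2, j, L)
        total += expquad_spectral_density(sqrt_lambda, sigma, ell) * phi1 * phi2
    return total

def exact_cov(x1, x2, sigma, ell):
    """Exact exponentiated quadratic covariance, for comparison."""
    return sigma**2 * math.exp(-((x1 - x2) ** 2) / (2 * ell**2))

# Illustrative values: inputs rescaled to roughly [-1, 1], boundary at L = 5.
approx = hsgp_cov(0.0, 0.5, sigma=1.0, ell=1.0, L=5.0, m=30)
exact = exact_cov(0.0, 0.5, sigma=1.0, ell=1.0)
print(f"approx={approx:.6f} exact={exact:.6f}")
```

With a few dozen basis functions the approximate and exact covariances agree closely in this example, while the work per evaluation scales with `m` instead of with the number of time points.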

The baseline GP covariance function is:

k(t, t') = σ² × exp(-|t-t'|² / (2ℓ²))

Priors used:

Baseline lengthscale: LogNormal(μ=log(1460), σ=0.3) — centered at ~4 years
Medium-term lengthscale: LogNormal(μ=log(365), σ=0.5) — centered at ~1 year
Short-term lengthscale: LogNormal(μ=log(14), σ=0.3) — centered at ~14 days
GP amplitudes: HalfNormal with σ between 0.2 and 0.3
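The three kernels and the lengthscale priors can be explored with a short pure-Python sketch. The kernel formulas are the textbook ones; fixing each lengthscale at its prior median (1460, 365, and 14 days), using a single amplitude of 0.25 for all components, and summing the three kernels are assumptions made for illustration, not the model's fitted values.

```python
import math

def expquad(r, sigma, ell):
    """Exponentiated quadratic kernel at time lag r (days)."""
    return sigma**2 * math.exp(-(r**2) / (2 * ell**2))

def matern52(r, sigma, ell):
    """Matern 5/2 kernel at time lag r."""
    a = math.sqrt(5) * abs(r) / ell
    return sigma**2 * (1 + a + a**2 / 3) * math.exp(-a)

def matern32(r, sigma, ell):
    """Matern 3/2 kernel at time lag r."""
    a = math.sqrt(3) * abs(r) / ell
    return sigma**2 * (1 + a) * math.exp(-a)

def combined(r, sigma=0.25):
    """Sum of the three components, lengthscales fixed at the prior medians."""
    return expquad(r, sigma, 1460) + matern52(r, sigma, 365) + matern32(r, sigma, 14)

def lognormal_interval(median, sigma, z=1.96):
    """Central ~95% prior interval of LogNormal(mu=log(median), sigma)."""
    mu = math.log(median)
    return math.exp(mu - z * sigma), math.exp(mu + z * sigma)

for lag in (0, 14, 365, 1460):
    print(f"lag={lag:>4} days  covariance={combined(lag):.4f}")

baseline_low, baseline_high = lognormal_interval(1460, 0.3)
print(f"baseline lengthscale prior, ~95% range: {baseline_low:.0f} to {baseline_high:.0f} days")
```

The printed covariances show how quickly each component stops contributing: the short-term term has largely decayed after a few weeks, while the baseline term still correlates observations years apart.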

Pollster House Effects

Each polling organization has systematic tendencies to over- or under-estimate certain parties. Our model estimates these "house effects" from historical data.

For example, if a pollster consistently overestimates PS by 2 percentage points, the model corrects for this tendency when aggregating polls from multiple sources.

House effects are modeled with a zero-sum constraint: if a pollster overestimates one party, it must underestimate at least one other party to compensate. The prior for house effect standard deviation is HalfNormal(σ=0.05) per party, allowing typical biases of ~2-3 percentage points.
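A minimal sketch of the zero-sum idea: centering a pollster's raw per-party effects so they sum to zero, then removing them from a reported poll. The party labels other than PS, the numbers, and the choice to work directly on the share scale are assumptions for readability; the model may apply house effects on a latent scale instead.

```python
def zero_sum(raw_effects):
    """Center effects so that over-estimates and under-estimates cancel out."""
    mean = sum(raw_effects.values()) / len(raw_effects)
    return {party: value - mean for party, value in raw_effects.items()}

# Hypothetical raw effects for one pollster, as share of the vote (0.02 = 2 points).
raw = {"PS": 0.020, "Party B": -0.005, "Party C": 0.003, "Party D": -0.002}
house_effects = zero_sum(raw)

# Hypothetical published poll from that pollster.
reported = {"PS": 0.31, "Party B": 0.29, "Party C": 0.18, "Party D": 0.22}
corrected = {party: reported[party] - house_effects[party] for party in reported}

print(house_effects)
print(corrected)
```

Because the effects sum to zero, the corrected shares add up to the same total as the reported ones; the correction only moves support between parties.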

District Effects

Portugal uses a proportional representation system with seat allocation at the district level. The model includes static district offsets that capture regional differences in party support.

These offsets are estimated from previous election results and translate national vote shares into district-level vote shares. The prior on the scale (standard deviation) of the district offsets is HalfNormal(σ=0.1) per party, reflecting typical regional variations of ~5 percentage points.

Likelihood

Poll observations follow a Dirichlet-Multinomial distribution, a natural choice for respondent counts split across parties whose shares sum to 100%.

Technical Details: Dirichlet-Multinomial

The Dirichlet-Multinomial is parameterized by:

n: poll sample size
α: concentration vector (proportional to each party's probabilities)

This distribution has two sources of variation: sampling error (which depends on n) and extra variation captured by the concentration parameter. This correctly models the fact that polls don't always behave like simple random samples.

Concentration priors:

Polls: Gamma(α=2, β=0.01) — mean ~200, allows extra variation beyond sampling error
Election results: Gamma(α=100, β=0.1) — mean ~1000, tighter fit to actual results
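The effect of the concentration parameter can be sketched with the standard library alone. Under a Dirichlet-Multinomial with total concentration c, the variance of each party's count is the multinomial variance multiplied by (n + c) / (1 + c). Treating the concentration as a single scalar multiplied by the party shares is an assumption consistent with the description above; the shares, sample size, and function names are hypothetical.

```python
import random

def gamma_mean(shape, rate):
    """Mean of a Gamma(shape, rate) distribution."""
    return shape / rate

def dm_variance_inflation(n, concentration):
    """Variance of a Dirichlet-Multinomial count relative to a plain multinomial."""
    return (n + concentration) / (1 + concentration)

def sample_dirichlet_multinomial(n, shares, concentration, rng):
    """Draw one poll: Dirichlet probabilities first, then n respondents."""
    gammas = [rng.gammavariate(concentration * share, 1.0) for share in shares]
    total = sum(gammas)
    probs = [g / total for g in gammas]
    counts = [0] * len(shares)
    for index in rng.choices(range(len(shares)), weights=probs, k=n):
        counts[index] += 1
    return counts

rng = random.Random(0)
shares = [0.30, 0.28, 0.18, 0.24]  # hypothetical true shares
counts = sample_dirichlet_multinomial(600, shares, concentration=200, rng=rng)
inflation = dm_variance_inflation(600, 200)

print("prior mean concentration (polls):", gamma_mean(2, 0.01))
print("simulated poll counts:", counts)
print(f"variance inflation for n=600, concentration=200: {inflation:.2f}")
```

At the prior mean concentration of about 200, a poll of 600 respondents has roughly four times the variance of a simple random sample, so its standard error is about twice as large. A concentration near 1000, as for election results, pulls the observations much closer to the latent shares.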

D'Hondt Method and Seat Allocation

To convert vote forecasts into seat forecasts, we simulate the Portuguese electoral system:

22 electoral districts: 18 mainland districts + 2 autonomous regions + Europe + Outside Europe
D'Hondt method: Sequential divisor system (1, 2, 3, ...) for proportional seat allocation
230 total seats, 4 of which are elected by the two emigration constituencies (as in the 2024 elections)
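The D'Hondt rule itself fits in a few lines. The sketch below awards seats one at a time to the party with the highest quotient votes / (seats won + 1); the party names and vote counts are hypothetical, and ties are resolved in favour of the first party listed, which is a simplification of this example rather than a statement about the legal tie-break.

```python
def dhondt(votes, seats):
    """Allocate seats using the divisor sequence 1, 2, 3, ..."""
    allocation = {party: 0 for party in votes}
    for _ in range(seats):
        # The next seat goes to the party with the largest current quotient.
        winner = max(votes, key=lambda party: votes[party] / (allocation[party] + 1))
        allocation[winner] += 1
    return allocation

example = dhondt({"A": 100_000, "B": 80_000, "C": 30_000, "D": 20_000}, seats=8)
print(example)  # {'A': 4, 'B': 3, 'C': 1, 'D': 0}
```

In this example party D wins no seat despite holding almost 9% of the vote, which illustrates why district size matters so much when translating votes into seats.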

The model runs thousands of Monte Carlo simulations:

For each simulation, draw a sample from the posterior distribution of national vote shares
Apply district offsets to obtain district-level votes
Run the D'Hondt algorithm in each district
Sum each party's seats

The result is a complete distribution of possible parliamentary compositions.
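The four simulation steps can be sketched end to end as follows. Everything concrete here is a stand-in: the national draws are generated from random noise instead of a fitted posterior, the three districts and their seat counts are invented, and adding offsets to log shares before renormalizing is one plausible way to apply them, not necessarily the model's.

```python
import math
import random

def dhondt(votes, seats):
    """Allocate seats with divisors 1, 2, 3, ... (first-listed party wins ties)."""
    won = {party: 0 for party in votes}
    for _ in range(seats):
        winner = max(votes, key=lambda party: votes[party] / (won[party] + 1))
        won[winner] += 1
    return won

def softmax(values):
    """Map real numbers to positive shares that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def simulate_seats(national_draws, offsets, district_seats):
    """One seat total per party for every draw of national vote shares."""
    results = []
    for draw in national_draws:
        parties = list(draw)
        totals = {party: 0 for party in parties}
        for district, seats in district_seats.items():
            # Step 2: shift national shares by the district offsets.
            logits = [math.log(draw[p]) + offsets[district][p] for p in parties]
            local = dict(zip(parties, softmax(logits)))
            # Steps 3 and 4: allocate district seats and add them up.
            for party, n in dhondt(local, seats).items():
                totals[party] += n
        results.append(totals)
    return results

rng = random.Random(42)
base = {"A": 0.32, "B": 0.30, "C": 0.20, "D": 0.18}  # hypothetical national shares
parties = list(base)
national_draws = []
for _ in range(1000):  # Step 1: stand-in for posterior draws
    noisy = softmax([math.log(base[p]) + rng.gauss(0, 0.05) for p in parties])
    national_draws.append(dict(zip(parties, noisy)))

district_seats = {"North": 10, "Centre": 6, "South": 4}  # hypothetical districts
offsets = {
    "North": {"A": 0.10, "B": -0.05, "C": 0.00, "D": -0.05},
    "Centre": {"A": 0.00, "B": 0.05, "C": -0.05, "D": 0.00},
    "South": {"A": -0.10, "B": 0.00, "C": 0.10, "D": 0.00},
}
simulations = simulate_seats(national_draws, offsets, district_seats)
mean_seats = {p: sum(s[p] for s in simulations) / len(simulations) for p in parties}
print(mean_seats)
```

Each element of `simulations` is one possible parliament; keeping all of them, not only the averages, is what allows probabilities of majorities and other outcomes to be computed afterwards.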

Metrics of Interest

From the simulations, we calculate:

Probability of each party winning the most seats
Majority probabilities: right-wing, left-wing, or hung parliament
Expected seats: mean and credible intervals (50% and 80%)
Contested districts: using ENSC (Effective Number of Seat Changes) with a threshold of 0.8
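Most of these metrics are simple summaries over the simulated parliaments. The sketch below assumes each simulation is a dictionary of seats per party; the bloc definitions, the four toy simulations, and the nearest-rank quantile are assumptions for illustration, and the ENSC metric is not covered. The majority threshold of 116 is the absolute majority in a 230-seat parliament.

```python
def quantile(values, q):
    """Simple nearest-rank quantile (no interpolation)."""
    ordered = sorted(values)
    return ordered[round(q * (len(ordered) - 1))]

def summarize(simulations, blocs, majority):
    n = len(simulations)
    parties = list(simulations[0])
    most_seats = {party: 0.0 for party in parties}
    bloc_majority = {name: 0 for name in blocs}
    hung = 0
    for sim in simulations:
        top = max(sim.values())
        leaders = [p for p in parties if sim[p] == top]
        for p in leaders:  # split the credit when parties tie for first
            most_seats[p] += 1 / len(leaders)
        winners = [name for name, members in blocs.items()
                   if sum(sim[p] for p in members) >= majority]
        for name in winners:
            bloc_majority[name] += 1
        if not winners:
            hung += 1
    return {
        "p_most_seats": {p: most_seats[p] / n for p in parties},
        "p_majority": {name: bloc_majority[name] / n for name in blocs},
        "p_hung": hung / n,
        "mean_seats": {p: sum(s[p] for s in simulations) / n for p in parties},
        "interval_80": {p: (quantile([s[p] for s in simulations], 0.10),
                            quantile([s[p] for s in simulations], 0.90))
                        for p in parties},
    }

# Four toy simulations with two hypothetical parties, each forming its own bloc.
toy = [{"A": 120, "B": 110}, {"A": 110, "B": 120},
       {"A": 115, "B": 115}, {"A": 118, "B": 112}]
summary = summarize(toy, blocs={"left": ["A"], "right": ["B"]}, majority=116)
print(summary)
```

In practice the same function would be run over thousands of simulations, where the quantile-based intervals become far more stable than in this four-row example.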

Presidential Elections

Model Structure

The presidential model differs from the parliamentary one because:

Candidates are individuals, not parties with long track records
The system uses two rounds (absolute majority required)
Campaign dynamics matter more than in parliamentary elections

We use a random walk model in log-odds space to capture the evolution of candidate support.

Technical Details: Random Walk

The random walk is defined as:

latent[t] = latent[t-1] + innovation[t]

where innovation[t] ~ ZeroSumNormal(σ) ensures that one candidate's gains correspond to others' losses.

The innovation standard deviation (σ ≈ 0.05 log-odds/day) is calibrated to allow movements of ~5 percentage points over a 50-day campaign. This value is fixed (not learned) following The Economist's methodology, since with few polls it's not possible to reliably estimate volatility.
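A rough check of this calibration, together with a simulation of the walk, can be written with the standard library. Several choices below are assumptions for illustration: innovations are made zero-sum by subtracting their mean (a simple stand-in for ZeroSumNormal), latent values are mapped to shares with a softmax, and the conversion to percentage points uses the local slope p(1 - p) for a candidate near 20%.

```python
import math
import random

def zero_sum_innovation(k, sigma, rng):
    """Gaussian steps for k candidates, centered so they sum to zero."""
    raw = [rng.gauss(0, sigma) for _ in range(k)]
    mean = sum(raw) / k
    return [value - mean for value in raw]

def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def simulate_walk(start_shares, days, sigma, rng):
    """Daily candidate shares implied by a latent random walk."""
    latent = [math.log(share) for share in start_shares]
    path = [softmax(latent)]
    for _ in range(days):
        step = zero_sum_innovation(len(latent), sigma, rng)
        latent = [value + change for value, change in zip(latent, step)]
        path.append(softmax(latent))
    return path

# The spread of a random walk grows with the square root of elapsed time.
cumulative_sd = 0.05 * math.sqrt(50)           # about 0.35 on the latent scale
approx_points = cumulative_sd * 0.2 * 0.8 * 100  # local slope p(1-p) at p = 0.2

rng = random.Random(7)
path = simulate_walk([0.35, 0.25, 0.20, 0.20], days=50, sigma=0.05, rng=rng)
print(f"latent sd after 50 days: {cumulative_sd:.3f}")
print(f"approximate movement near 20%: {approx_points:.1f} percentage points")
print("final shares:", [round(share, 3) for share in path[-1]])
```

For a candidate polling around 20%, the accumulated standard deviation corresponds to a typical movement of roughly 5 to 6 percentage points, in line with the calibration described above; candidates closer to 50% would move somewhat more for the same σ.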

Candidate priors:

Baseline: Normal(μ=logit(party_prior), σ=transformed) — centered on the candidate's party historical support
Likelihood concentration: Fixed at 60 for presidential polls

Candidate-Specific Priors

For candidates with known party affiliation, we use informative priors based on those parties' historical support. This helps stabilize estimates when few polls are available.

Pollster House Effects

When available, we use house effects estimated from the parliamentary model as informative priors for the presidential model. This transfers information about each pollster's systematic biases.

Second Round Modeling

For each simulation:

We check if any candidate reaches >50% in the first round
If not, we identify the top two candidates
We record which pair of candidates advances to the runoff; the share of simulations in which each pair appears gives the probability of that matchup
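These steps amount to a tally over simulated first rounds. In the sketch below, each simulation is a dictionary of first-round shares; the candidate labels and numbers are hypothetical.

```python
from collections import Counter

def first_round_outcome(shares):
    """Return an outright winner, or the pair of candidates going to a runoff."""
    ranked = sorted(shares, key=shares.get, reverse=True)
    if shares[ranked[0]] > 0.5:
        return ("outright", ranked[0])
    return ("runoff", tuple(sorted(ranked[:2])))

def outcome_probabilities(simulations):
    """Share of simulations ending in each outcome."""
    counts = Counter(first_round_outcome(sim) for sim in simulations)
    return {outcome: count / len(simulations) for outcome, count in counts.items()}

toy = [
    {"X": 0.52, "Y": 0.30, "Z": 0.18},
    {"X": 0.45, "Y": 0.35, "Z": 0.20},
    {"X": 0.40, "Y": 0.25, "Z": 0.35},
    {"X": 0.44, "Y": 0.36, "Z": 0.20},
]
probabilities = outcome_probabilities(toy)
print(probabilities)
```

Sorting the two names inside each pair means that "X vs Y" and "Y vs X" are counted as the same matchup.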

Uncertainty Quantification

Our models incorporate multiple sources of uncertainty:

Polling error — Expected variation between polls and actual results
House effects — Uncertainty in estimated pollster biases
Model uncertainty — GP parameters and other model components
Campaign effects — Potential for late shifts not captured in polls

Communicating Uncertainty

We present uncertainty through:

Credible intervals: 50% and 80% bands in trend charts
Probabilities: "70% probability of X winning the most seats"
Simulation distributions: visualizations showing the full range of possible outcomes

Model Validation

We validate our models through:

Backtesting: Apply the model to past elections using only data available before each election
Calibration: Verify that credible intervals contain actual results at the expected frequency
Error metrics: Calculate MAE, RMSE, and log-scores
Post-election comparison: Analyze performance after each election
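The error and calibration metrics have simple definitions, sketched here with made-up numbers. The function names and inputs are hypothetical; in a real backtest the predictions would come from the model fitted only on data available before each election.

```python
import math

def mae(predicted, actual):
    """Mean absolute error between predicted and actual shares."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error; penalizes large misses more than MAE."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def log_score(probabilities_of_observed):
    """Mean log probability the forecast gave to what actually happened."""
    return sum(math.log(p) for p in probabilities_of_observed) / len(probabilities_of_observed)

def coverage(intervals, actual):
    """Fraction of actual results that fall inside their forecast interval."""
    hits = sum(low <= a <= high for (low, high), a in zip(intervals, actual))
    return hits / len(actual)

predicted = [0.30, 0.20]
actual = [0.28, 0.23]
intervals_80 = [(0.25, 0.31), (0.18, 0.22)]

print(f"MAE={mae(predicted, actual):.3f}  RMSE={rmse(predicted, actual):.4f}")
print(f"coverage of 80% intervals: {coverage(intervals_80, actual):.2f}")
print(f"log score: {log_score([0.7, 0.4]):.3f}")
```

A well-calibrated model should show coverage close to the nominal level (about 0.8 for 80% intervals) once enough elections and parties are pooled; two data points, as here, are far too few to judge.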

Data Sources

Polls

Aximage, CESOP, Pitagórica, Intercampus, Eurosondagem, and other ERC-registered polling firms

Election Results

National Elections Commission (CNE)
SGMAI (General Secretariat of the Ministry of Internal Administration)

Demographic Data

National Statistics Institute (INE)

Limitations

Our model has important limitations:

Poll dependence: If polls have systematic biases not captured by historical data, forecasts will be affected
Unforeseen events: Scandals, crises, or other events can rapidly alter the electoral landscape
Historical patterns: We assume past patterns inform the future, which may not hold in unprecedented contexts
Undecided voters: It is difficult to predict how undecided voters will behave

Our forecasts should be interpreted as informed probabilistic estimates, not certainties.

References and Inspiration

Our methodology is inspired by: