Ensuring Point-in-Time Correctness

When preparing datasets for analytics or AI, one of the most common – and most damaging – pitfalls is data leakage. This happens when a model is trained using information that would not have been available at the moment a prediction was made.

The result is misleading: the model appears to perform exceptionally well during training, but once deployed, it fails to match that performance in the real world.

With Event Sourcing, you have the full historical record needed to avoid this trap – but only if projections and features are designed with point-in-time correctness as a core principle.

Why It Matters for AI and Machine Learning

Machine learning models learn from examples. Those examples must:

Reflect only knowledge available at the time of the decision
Exclude any future information – even indirectly
Keep evaluation metrics realistic and honest

If you break these rules:

The model benefits from an unfair advantage during training
Metrics are inflated
Production performance drops sharply

Point-in-time correctness ensures that every record in your dataset reflects exactly what was known at that moment in history – no more, no less.

How Event Sourcing Helps

Event-sourced systems record every change as an immutable event, appended in the order it occurred. This makes it possible to:

Rebuild system state as it existed at any past moment
Generate projections that show only the information available at that time
Avoid accidental inclusion of future events in features

Example: When calculating a member's on-time return rate for a given loan, include only returns that happened before that loan was taken out. The same principle applies to backtesting – you can compare historical predictions with actual outcomes in a fully controlled way.
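
The on-time return rate can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the `Event` class, its fields, and the `BookReturned` event name are assumptions made for the example. The key point is the `as_of` cutoff, which is set to the moment the loan was taken out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; a real event store will have its own schema.
    type: str
    member_id: str
    occurred_at: datetime
    on_time: bool = True  # only meaningful for "BookReturned" events


def on_time_return_rate(
    events: list[Event], member_id: str, as_of: datetime
) -> Optional[float]:
    """Share of the member's returns made on time, counting only
    returns that happened strictly before `as_of`."""
    returns = [
        e for e in events
        if e.type == "BookReturned"
        and e.member_id == member_id
        and e.occurred_at < as_of  # the point-in-time cutoff
    ]
    if not returns:
        return None  # no history yet; the caller decides on a default
    return sum(e.on_time for e in returns) / len(returns)


events = [
    Event("BookReturned", "m1", datetime(2024, 1, 10), on_time=True),
    Event("BookReturned", "m1", datetime(2024, 2, 5), on_time=False),
    # This return happens after the loan below and must not be counted.
    Event("BookReturned", "m1", datetime(2024, 4, 1), on_time=False),
]

loan_taken_at = datetime(2024, 3, 1)
rate = on_time_return_rate(events, "m1", loan_taken_at)
```

Here `rate` is computed from the two returns in January and February only. The April return is part of the member's history today, but it was not known when the loan was taken out, so it stays out of the feature.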

Practical Example

In a library system, you want to predict whether a borrowed book will be returned late.

If you include a LateFeeIncurred event in the training data for that loan:

You are effectively giving the model the answer in advance
This yields near-perfect accuracy in training but poor generalization in production, where that event does not yet exist at prediction time

Point-in-time correctness means stopping the clock at the prediction moment and building features only from data available up to that point.
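
One way to picture "stopping the clock" is to split a loan's events at the prediction moment: features are built only from what was known, and the label is derived only from what happened afterwards. The sketch below assumes events are plain dictionaries with `type`, `loan_id`, and `at` fields; `BookBorrowed` and `LoanExtended` are hypothetical event names used for illustration.

```python
from datetime import datetime

# Hypothetical events for a single loan; field names are assumptions.
loan_events = [
    {"type": "BookBorrowed", "loan_id": "l1", "at": datetime(2024, 3, 1)},
    {"type": "LoanExtended", "loan_id": "l1", "at": datetime(2024, 3, 10)},
    {"type": "LateFeeIncurred", "loan_id": "l1", "at": datetime(2024, 3, 25)},
]


def split_at(events, prediction_time):
    """Separate what was known at prediction time from what happened later."""
    known = [e for e in events if e["at"] <= prediction_time]
    future = [e for e in events if e["at"] > prediction_time]
    return known, future


# Predict at the moment the book is borrowed.
prediction_time = datetime(2024, 3, 1)
known, future = split_at(loan_events, prediction_time)

# Features come only from `known`; the label comes only from `future`.
features = {
    "extensions_so_far": sum(e["type"] == "LoanExtended" for e in known),
}
label_returned_late = any(e["type"] == "LateFeeIncurred" for e in future)
```

With this split, `LateFeeIncurred` can only ever influence the label, never the features. The later `LoanExtended` event is also excluded from the features, because it had not happened when the prediction was made.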

A Discipline, Not Just a Feature

Point-in-time correctness is not something you can simply switch on. It requires:

Thinking carefully about what the system knew and when it knew it
Ensuring all projections, datasets, and feature engineering respect these boundaries
Enforcing the same rules in both training and inference
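
A common way to keep training and inference aligned is to route both through a single feature builder that takes an explicit cutoff and refuses to see anything later. The following is a sketch under assumptions: `build_features`, `events_up_to`, and the dictionary-based event shape are hypothetical names chosen for this example.

```python
from datetime import datetime


def events_up_to(events, as_of):
    """Return only the events that had occurred by `as_of`."""
    return [e for e in events if e["at"] <= as_of]


def build_features(events, as_of):
    """Single feature builder, shared by training and inference."""
    leaked = [e for e in events if e["at"] > as_of]
    if leaked:
        # Fail loudly rather than silently using future information.
        raise ValueError(
            f"{len(leaked)} event(s) occur after the cutoff {as_of.isoformat()}"
        )
    return {"prior_loans": sum(e["type"] == "BookBorrowed" for e in events)}


# Hypothetical history for one member.
history = [
    {"type": "BookBorrowed", "at": datetime(2024, 1, 5)},
    {"type": "BookReturned", "at": datetime(2024, 1, 20)},
    {"type": "BookBorrowed", "at": datetime(2024, 2, 10)},
]

# Training: replay history up to a past prediction moment.
training_cutoff = datetime(2024, 2, 1)
training_row = build_features(events_up_to(history, training_cutoff), training_cutoff)

# Inference: the same function, with a later cutoff standing in for "now".
serving_cutoff = datetime(2024, 3, 1)
serving_row = build_features(events_up_to(history, serving_cutoff), serving_cutoff)
```

Because both paths call the same function with an explicit cutoff, a feature cannot quietly mean one thing in training and another in production. The guard inside `build_features` turns an accidental leak into an immediate error instead of an inflated metric.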

Event Sourcing provides the raw material for this discipline – applying it consistently is up to your data engineering practices.

Next up: Building Analytical Projections – transform event streams into queryable datasets for reporting, statistics, and machine learning.