Parallel Perception
A unified architecture that processes time-indexed data in parallel, performs iterative diffusion-based reasoning, then hands off to a base LLM for sequential token generation.
Video Input: 8 video frames processed simultaneously. Each frame is divided into patches (ViT-style) for parallel encoding.
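The ViT-style patch split can be sketched in NumPy as follows (the 224×224 resolution and 16-pixel patch size are illustrative assumptions; the source specifies only the 8-frame batch):

```python
import numpy as np

# 8 RGB frames at an assumed 224x224 resolution, split into 16x16 patches.
frames = np.random.rand(8, 224, 224, 3)          # (T, H, W, C)
P = 16                                           # patch side length
T, H, W, C = frames.shape

# Reshape so every frame becomes a sequence of flattened patches,
# all frames handled in one batch -- no per-frame loop.
patches = (frames
           .reshape(T, H // P, P, W // P, P, C)
           .transpose(0, 1, 3, 2, 4, 5)
           .reshape(T, (H // P) * (W // P), P * P * C))

print(patches.shape)  # (8, 196, 768): 8 frames x 196 patches x 768 values
```

Because the patch extraction is a pure reshape, every frame and patch is produced in the same vectorised operation.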
Latent Encoding: Vision Transformer converts frame patches into compact latent representations. All frames are processed in a single parallel batch.
Diffusion Reasoning: Reverse-time SDE diffusion across the latent batch. Iteratively refines understanding to produce a compressed plan/spec or adapter weights (ΔW).
Sequential Generation: Adapters OFF. Base LLM generates tokens sequentially using the diffusion-derived plan/spec as additional conditioning context.
All frames processed simultaneously, not sequentially. Vision Transformer encodes patches in parallel batches, dramatically reducing inference latency compared to autoregressive approaches.
High-dimensional video frames compressed into compact latent vectors. The encoder learns to extract essential features while discarding redundant spatial information.
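A toy sketch of the compression step, using a single random linear projection in place of the learned ViT encoder (the 768-dimensional patches and 64-dimensional latents are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the encoder: one linear map from patch space to a
# compact latent space. A real ViT would learn this projection.
patch_batch = rng.standard_normal((8, 196, 768))       # (frames, patches, patch_dim)
W_enc = rng.standard_normal((768, 64)) / np.sqrt(768)  # learned in practice

latents = patch_batch @ W_enc                          # one matmul covers all frames
print(latents.shape)  # (8, 196, 64): 12x fewer values per patch
```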
Diffusion reasoning operates by reversing a stochastic process — starting from noise and iteratively denoising toward coherent plans. This enables probabilistic multi-path reasoning.
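The iterative denoising loop can be sketched as below; `denoise_step` here is a hand-written stand-in for the learned model, pulling a noisy latent toward a fixed "plan" target:

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(x, t):
    # Stand-in for a learned denoiser: pull the state toward a fixed
    # target, more strongly at later (low-noise) steps.
    target = np.ones_like(x)
    return x + (1.0 - t) * 0.5 * (target - x)

x = rng.standard_normal(16)        # start from pure noise
for step in range(50):
    t = 1.0 - step / 50            # t runs 1 -> 0 (noise -> data)
    x = denoise_step(x, t)
    # Shrinking stochastic kick, mirroring the reverse-time SDE's noise term.
    x += 0.05 * np.sqrt(t) * rng.standard_normal(16)

print(np.abs(x - 1.0).mean() < 0.2)  # the latent has converged near the target
```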
Unlike chain-of-thought prompting, diffusion reasoning occurs in latent space, invisible to users. This avoids token overhead while retaining a degree of interpretability via latent probes.
Each component (encoder, diffusion module, LLM) can be swapped independently. This enables task-specific optimisation without retraining the entire pipeline.
Base LLM runs with adapters OFF during inference. The diffusion-derived plan/spec provides conditioning context without modifying model weights at runtime.
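A minimal sketch of this hand-off, assuming the plan/spec is serialised to text and prepended to the prompt (the `[PLAN]` tag format and the example strings are hypothetical):

```python
# Conditioning without touching weights: the diffusion-derived spec is
# simply prepended to the user prompt as extra context.
plan_spec = "GOAL: pick up the red cube; CONSTRAINT: avoid obstacle at (2, 1)"
user_query = "What should the robot do next?"

prompt = f"[PLAN]\n{plan_spec}\n[/PLAN]\n{user_query}"
print(prompt.startswith("[PLAN]"))  # the base LLM sees plan + query, weights unchanged
```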
Each frame can have its own noise schedule (vectorised timestep). Frame-aware Video Diffusion Models (FVDM) assign different denoising rates, capturing fine-grained temporal dependencies.
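A vectorised timestep can be sketched by assigning each of the 8 frame latents its own noise level (the schedule values and latent size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
frames = rng.standard_normal((8, 64))   # 8 frame latents (toy 64-dim)

# Vectorised timestep: one noise level per frame instead of a single global t.
t = np.linspace(0.1, 0.9, 8)            # e.g. later frames noised more heavily
alpha = np.sqrt(1.0 - t)[:, None]       # per-frame signal scale
sigma = np.sqrt(t)[:, None]             # per-frame noise scale

noised = alpha * frames + sigma * rng.standard_normal(frames.shape)
print(noised.shape)  # (8, 64): each frame sits at its own point on the schedule
```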
Motion vectors between frames guide diffusion to maintain temporal consistency. Helps model understand object trajectories and predict plausible future states.
Unsupervised object-centric representation carves scenes into "slots" (objects). Diffusion can predict/manipulate slots independently for compositional generalisation.
Neural network estimates gradient of log-density (score) at each diffusion step. Guides denoising by predicting how to move toward higher-probability states.
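For a Gaussian the score has a closed form, which makes the idea easy to demonstrate with Langevin sampling (the parameters here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# For N(mu, sigma^2) the score is known exactly:
#   grad_x log p(x) = -(x - mu) / sigma^2
mu, sigma = 3.0, 0.5
score = lambda x: -(x - mu) / sigma**2

# Langevin dynamics: follow the score plus noise to reach high-density states.
x = rng.standard_normal(10_000)   # start far from the mode
eps = 0.01
for _ in range(500):
    x += 0.5 * eps * score(x) + np.sqrt(eps) * rng.standard_normal(x.shape)

print(round(x.mean(), 1))  # the samples have drifted toward mu = 3.0
```

A diffusion model replaces the analytic `score` with a neural network trained to estimate it from data.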
Progressive uncertainty reduction: high entropy (noise) → low entropy (focused prediction). Early steps explore broadly, later steps converge to realistic outcomes.
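The entropy claim can be checked directly for a Gaussian latent, whose differential entropy is H = ½·ln(2πeσ²) and therefore falls as the noise scale σ shrinks:

```python
import numpy as np

# Differential entropy of a Gaussian: H = 0.5 * ln(2 * pi * e * sigma^2).
sigmas = np.array([1.0, 0.5, 0.25, 0.1, 0.01])   # illustrative noise schedule
H = 0.5 * np.log(2 * np.pi * np.e * sigmas**2)

print(np.all(np.diff(H) < 0))  # True: each denoising step reduces entropy
```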
Generate multiple futures by sampling noise. Each rollout represents one plausible scenario, enabling probabilistic forecasting and ensemble predictions.
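Ensemble rollouts can be sketched as below; `rollout` is a stand-in that deterministically maps each noise draw to one trajectory (a trained model would play this role):

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(seed_noise):
    # Stand-in sampler: each independent noise draw yields one "future".
    return seed_noise.cumsum()

# Sample K plausible futures from K independent noise draws.
K, horizon = 100, 20
futures = np.stack([rollout(rng.standard_normal(horizon)) for _ in range(K)])

mean_path = futures.mean(axis=0)                     # ensemble forecast
p10, p90 = np.percentile(futures, [10, 90], axis=0)  # uncertainty band
print(futures.shape, mean_path.shape)  # (100, 20) (20,)
```

The spread between `p10` and `p90` is what turns a point forecast into a probabilistic one.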
Autonomous Driving: Predict pedestrian trajectories, vehicle behaviour, and physical dynamics. Generate multiple possible futures to plan safe, robust actions in real time.
Weather: Probabilistic nowcasting and 10-day forecasts. GenCast outperforms traditional physics-based ensembles while running 100× faster with better uncertainty quantification.
Finance: Generate synthetic market scenarios for stress-testing. Detect regime shifts, model tail risks, and simulate order book dynamics for high-frequency trading.
Life Sciences: Design novel proteins (RFdiffusion), predict drug molecules, forecast chemical reactions. Simulate molecular dynamics faster than traditional physics-based methods.
Defence: Wargaming scenarios, intelligence forecasting, and strategic planning. The Thunderforge programme integrates AI agents for multi-domain operation planning.
Document AI: Vision-token OCR (DeepSeek) compresses documents 10× while preserving layout. Enables long-context reasoning over massive document corpora efficiently.
Pilot projects and proofs-of-concept. First demos of autonomous vehicles using diffusion prediction, NOAA trials of diffusion nowcasting, financial firms publishing whitepapers. Initial Thunderforge deployments in military planning exercises.
Early adoption begins. Near real-time robotics prediction, operational diffusion weather models producing public forecasts, hedge funds using generative scenarios. First AI-designed drugs enter clinical trials. Unified multi-modal models demonstrated.
Maturation and scaling. Robots achieve ≥1× real-time physics prediction. AI weather forecasts become backbone of meteorological services. Financial stress tests incorporate AI scenarios. Humanoid robots demonstrate dynamic tasks using predictive world models.
Standardisation. Diffusion-based forecasting becomes standard toolkit across industries. Interpretable latent spaces emerge. Regulatory frameworks established. Integration into daily consumer applications. Personal AI assistants use predictive models.
Based on "Diffusion Forecasting White Paper v4" by Ken Graham (October 2025)
ken@maketech.com.au