LLMs excel at token prediction but struggle to form internal representations of the physical world.
In this AI Tech Experts Webinar, Michał Kulczykowski, Senior ML Engineer, explains why world models are proposed as an alternative foundation for intelligent systems — and where their limitations still lie.
The talk introduces JEPA (Joint Embedding Predictive Architecture), a latent-space, energy-based approach promoted by Yann LeCun, and walks through its key variants:
- I-JEPA for image-based representation learning,
- V-JEPA for video and temporal prediction,
- LeJEPA, a mathematically grounded variant replacing training heuristics with explicit regularization.
You’ll see how predictive learning in representation space differs from pixel- or token-based generation, how model collapse arises, and why contrastive and regularized methods attempt to prevent it. Michał also discusses robotic control via energy minimization, open research challenges, and the lack of strong benchmarks validating world models at scale.
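To make the distinction concrete, here is a minimal toy sketch (not from the talk; all names, dimensions, and weights are illustrative) of JEPA-style predictive learning: instead of reconstructing masked pixels, the loss compares a prediction against the target's *embedding*, computed by a frozen copy of the encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: "images" are flat vectors; the context is a masked view,
# the target is the full view. DIM and LATENT are arbitrary.
DIM, LATENT = 16, 4

W_ctx = rng.normal(size=(LATENT, DIM)) * 0.1  # context encoder (trainable)
W_tgt = W_ctx.copy()                          # target encoder (frozen/EMA copy)
W_pred = np.eye(LATENT)                       # predictor acting in latent space

def encode(W, x):
    return np.tanh(W @ x)

def jepa_loss(x, mask):
    """Predict the target's embedding from a masked context view,
    rather than reconstructing the masked pixels directly."""
    ctx = encode(W_ctx, x * mask)  # embedding of the visible context
    tgt = encode(W_tgt, x)         # embedding of the full target (no gradient)
    pred = W_pred @ ctx            # prediction made in representation space
    return float(np.mean((pred - tgt) ** 2))

x = rng.normal(size=DIM)
mask = (rng.random(DIM) > 0.5).astype(float)
print(jepa_loss(x, mask))  # error measured in embedding space, not pixel space
```

Note that if both encoders collapsed to a constant output, this loss would trivially reach zero, which is exactly the failure mode the contrastive and regularized methods in the talk are designed to prevent.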
Timeline
01:50 What world models are and why they matter
03:36 From early world models to modern scaling
04:48 JEPA overview: energy-based predictive learning
06:13 JEPA architecture and training challenges
08:20 I-JEPA and V-JEPA implementations
12:45 Limitations, criticism & conclusion
Speaker
Michał Kulczykowski
Senior ML Engineer at deepsense.ai