
World Models Explained: JEPA, Energy-Based Learning and the Limits of LLMs

LLMs excel at next-token prediction but struggle to form internal representations of the physical world.

In this AI Tech Experts Webinar, Michał Kulczykowski, Senior ML Engineer, explains why world models are proposed as an alternative foundation for intelligent systems — and where their limitations still lie.

The talk introduces JEPA (Joint Embedding Predictive Architecture), a latent-space, energy-based approach promoted by Yann LeCun, and walks through its key variants:

  • I-JEPA for image-based representation learning,
  • V-JEPA for video and temporal prediction,
  • LeJEPA, a mathematically grounded variant replacing training heuristics with explicit regularization.

You’ll see how predictive learning in representation space differs from pixel- or token-based generation, how model collapse arises, and how contrastive and regularized methods attempt to prevent it. Michał also discusses robotic control via energy minimization, open research challenges, and the lack of strong benchmarks validating world models at scale.
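The core JEPA idea described above can be illustrated in a few lines: encode a masked context view and the full observation into a shared latent space, and train a predictor to minimize an energy (a distance in representation space) between the predicted and actual target embeddings. The sketch below is a deliberately minimal, hypothetical toy (linear encoders, a single sample, hand-written gradient), not the talk's or any paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    # Stand-in for a deep encoder: a fixed nonlinear projection into latent space.
    return np.tanh(W @ x)

def energy(z_pred, z_target):
    # Energy = squared distance in representation space;
    # low energy means predicted and actual embeddings agree.
    return float(np.sum((z_pred - z_target) ** 2))

d_in, d_lat = 8, 4
W_ctx = rng.normal(size=(d_lat, d_in))   # context-encoder weights (frozen here)
W_tgt = W_ctx.copy()                     # target encoder (an EMA copy in practice)
P = np.eye(d_lat)                        # predictor, the only part trained below

x = rng.normal(size=d_in)                # full observation
ctx = x * (rng.random(d_in) > 0.3)       # "masked" context view of the same input

z_tgt = encode(x, W_tgt)
z_ctx = encode(ctx, W_ctx)
e_start = energy(P @ z_ctx, z_tgt)

for _ in range(200):
    z_pred = P @ z_ctx
    grad = 2.0 * np.outer(z_pred - z_tgt, z_ctx)  # dE/dP for squared distance
    P -= 0.05 * grad                              # gradient step on the energy

e_end = energy(P @ z_ctx, z_tgt)
print(e_start, e_end)
```

Note what the sketch leaves out: because only the predictor is trained and the encoders are frozen, it cannot collapse, whereas joint training of encoder and predictor can drive all embeddings to a constant. That is exactly the failure mode the contrastive and regularized methods in the talk are designed to prevent.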

Timeline

01:50 What are World Models and why they matter

03:36 From early world models to modern scaling

04:48 JEPA overview: energy-based predictive learning

06:13 JEPA architecture and training challenges

08:20 I-JEPA and V-JEPA implementations

12:45 Limitations, criticism & conclusion

Speaker