As AI evolves beyond text and video, virtual simulations are becoming the key to creating smarter, more adaptable systems.

The development of artificial intelligence is undergoing a transformative shift, moving beyond text and video to create systems capable of understanding and interacting with the physical world. Today’s AIs, while highly proficient in processing vast amounts of data, are essentially “book smart,” learning from language, images, and videos. To become truly intelligent, they need to be “street smart,” which requires what experts call “world models”—virtual environments that allow AI to learn through trial and error, much like humans and animals do in the real world.
World models are being trained using simulations that mimic real-life scenarios, such as driving a car or flying a plane. These models enable AIs to plan, take actions, and predict future outcomes by incorporating elements like physics and time. Already, this approach is having a tangible impact in fields like drone warfare, robotics, and autonomous vehicles. Venture capitalist Moritz Baier-Lentz highlights the potential of world models to revolutionize technology, stating that they are critical to achieving next-generation AI and, eventually, artificial general intelligence (AGI).
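At its core, this plan-act-predict loop can be sketched in a few lines of code. The example below is a minimal, hypothetical illustration, not any company's actual system: it pairs a toy physics model of a one-dimensional car with a simple random-shooting planner that imagines candidate futures inside the model and executes the action whose imagined outcome scores best.

```python
import random

# Toy "world model": predicts the next state of a 1-D car from the
# current state and an action, using simple physics (position, velocity).
def predict_next(state, action, dt=0.1):
    pos, vel = state
    accel = action              # action is a throttle/brake value in [-1, 1]
    vel = vel + accel * dt
    pos = pos + vel * dt
    return (pos, vel)

# Planning by rollout: imagine many candidate action sequences inside the
# model, score each imagined future, and keep the best first action.
def plan(state, goal_pos, horizon=20, candidates=200):
    best_score, best_first_action = float("-inf"), 0.0
    for _ in range(candidates):
        seq = [random.uniform(-1, 1) for _ in range(horizon)]
        s = state
        for a in seq:
            s = predict_next(s, a)
        score = -abs(s[0] - goal_pos)    # ending closer to the goal is better
        if score > best_score:
            best_score, best_first_action = score, seq[0]
    return best_first_action

state = (0.0, 0.0)                       # start at rest at position 0
for step in range(50):
    action = plan(state, goal_pos=5.0)   # "think ahead" inside the model
    state = predict_next(state, action)  # then act in the (here, simulated) world
print(f"final position: {state[0]:.2f}")
```

Production systems replace the hand-coded physics with a learned neural model and the random search with far stronger planners, but the shape of the loop is the same: predict, evaluate, act.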
Prominent figures in the AI field are investing heavily in this new direction. Stanford professor Fei-Fei Li has raised $230 million to launch World Labs, a startup focused on world-model research. Meanwhile, Nvidia CEO Jensen Huang envisions world models as a key to unlocking “physical AI,” enabling robots and self-driving cars to operate autonomously.
Despite the growing momentum, challenges remain. Current AI systems, including large language models like ChatGPT, generate statistically likely outputs rather than maintaining a consistent internal model of a situation, so their picture of the world can be incomplete or contradictory. This limitation shows up starkly in chess, where even advanced AI can stumble at logical reasoning that older systems with internal world models handle easily: an Atari 2600 running a program from 1979 can beat cutting-edge chatbots because it keeps an accurate internal representation of the game state, while the chatbots merely predict plausible-looking moves.
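What "an accurate internal representation of the game state" means is easy to show concretely. The sketch below uses the open-source python-chess library purely as a stand-in (the 1979 Atari program predates it, of course): the board object is an exact world model, so move legality is a deterministic check rather than a statistical guess.

```python
import chess  # the python-chess library, used here as an illustrative stand-in

# An explicit world model of a chess game: the board object *is* the state.
board = chess.Board()
board.push_san("e4")   # 1. e4
board.push_san("e5")   # 1... e5

# Because the state is exact, legality is a lookup, not a guess:
move = chess.Move.from_uci("g1f3")   # 2. Nf3
assert move in board.legal_moves     # always consistent with the true position
board.push(move)

# A text-predicting model, by contrast, only estimates what move text is
# *likely* to come next; nothing forces its answer to be a legal move.
print(board.fen())  # the full game state, recoverable exactly at any time
```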
To address these issues, researchers are developing more sophisticated simulations. Google DeepMind’s Genie 3, for instance, can generate photorealistic virtual environments from text prompts, giving AI a space to experiment and learn without real-world risks. The system is part of a broader effort to build AI that can operate in the real world, whether navigating complex environments or anticipating human behavior.
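Genie 3's interface isn't detailed here, so the sketch below uses the open-source Gymnasium toolkit as a generic stand-in to show the trial-and-error pattern such simulated environments enable: act, observe the consequences, collect experience, and reset when an episode ends, all at zero real-world risk. The random action policy is a placeholder assumption; a real agent would learn from the transitions it collects.

```python
import gymnasium as gym  # a standard simulation toolkit, standing in for a learned world

# Trial and error inside a simulator: the agent acts, observes the
# consequences, and banks the experience for learning.
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

experience = []
for step in range(500):
    action = env.action_space.sample()  # placeholder: a real agent chooses smarter actions
    next_obs, reward, terminated, truncated, info = env.step(action)
    experience.append((obs, action, reward, next_obs))
    obs = next_obs
    if terminated or truncated:         # episode over: crash costs nothing here
        obs, info = env.reset()

env.close()
print(f"collected {len(experience)} transitions for learning")
```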
Similarly, companies like Waabi are using virtual worlds to train AI to drive trucks. Its simulator, Waabi World, lets the company’s AI log millions of virtual driving miles safely and affordably. Raquel Urtasun, Waabi’s CEO, says she expects the software to autonomously pilot a real truck on real roads by the end of the year.
As these advances continue, the implications for both white-collar and blue-collar jobs are significant. AI has already begun to absorb certain office functions; world-model-based AI could soon extend into hands-on work such as truck driving, plumbing, and caregiving. The future of AI lies not just in processing information but in understanding and interacting with the world in a way that mirrors human intelligence.
Moreover, the data generated in virtual environments, such as user interactions on gaming platforms like Medal.tv, could prove invaluable for teaching AI systems how humans behave and make decisions, accelerating the path toward AGI: systems able to perform any intellectual task a human can.
The journey toward creating truly intelligent AI is still in its early stages, but the development of world models is undeniably shaping the next phase of artificial intelligence. As research continues and more companies invest in this space, the line between AI and human intelligence may become increasingly blurred.




