In pursuit of an AI world model
The last few years have been an incredible time to watch AI progress, but I think we've reached some short-term limits.
LLMs have mastered language, but the world is so much more than just language.
A better model of the world (the kind we've developed naturally by actually living in it) would let AIs reason about physical reality, not just the text written about it.
By introducing new sets of micro-targeted training data, we've improved little by little, mole-whacking our way along, patching little holes in LLM understanding. However, we still don't have a fundamental world model.
From what I understand, the biggest impediment to world model development is training data, which isn't readily available in the same abundance that LLM training data is.
A lot of smart AI researchers are focused on this. Google is building Genie 3. xAI is working on it too.
I was excited to see Fei-Fei Li launch Marble to the public. I played around with it, and it is pretty amazing, allowing you to generate navigable 3D spaces from words or images.
I gave it a photo of my record player, and it built a beautiful living room around it, with one noticeable oddity: the couch faces an empty wall, while the TV looks out towards the kitchen island.
Now personally, I love an old-school living room with no TV, but if you're going to have a TV, there should be a place to sit and watch it. I've sat in poorly designed living rooms, pretending that watching TV from a far-off angle is enjoyable; that experience is part of my world model.
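To make that concrete, you could imagine encoding this bit of spatial common sense as an explicit check over a generated layout. Here's a minimal Python sketch; the positions, headings, and the `faces` helper are all hypothetical, not anything Marble actually exposes:

```python
import math

def faces(source_pos, source_heading_deg, target_pos, tolerance_deg=45):
    """Return True if an object at source_pos, facing source_heading_deg,
    has target_pos within +/- tolerance_deg of its line of sight."""
    dx = target_pos[0] - source_pos[0]
    dy = target_pos[1] - source_pos[1]
    angle_to_target = math.degrees(math.atan2(dy, dx))
    # Normalize the angular difference to [-180, 180) before comparing.
    diff = (angle_to_target - source_heading_deg + 180) % 360 - 180
    return abs(diff) <= tolerance_deg

# Hypothetical layout from a generated scene: positions in meters, headings in degrees.
couch = {"pos": (0.0, 0.0), "heading": 90.0}  # facing the empty wall
tv = {"pos": (4.0, 0.0)}                      # off to the couch's side

if not faces(couch["pos"], couch["heading"], tv["pos"]):
    print("Commonsense violation: the couch doesn't face the TV.")
```

Of course, hand-writing constraints like this is exactly the whack-a-mole patching I complained about above; the point of a real world model is that it wouldn't need them spelled out.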
This makes me wonder: will world models actually put AI any closer to a real understanding of the world, or will they just let us generate 3D spaces the way an LLM can generate sentences, stories, and code?