I've noticed that large language models (LLMs) are getting really good at things like everyday questions, research, and content generation, whether it's text, images, or video. But I'm curious about how they perform on longer, continuous tasks. For instance, I found a study where an LLM struggled with a prolonged task that involved running a vending machine. Are LLMs fundamentally flawed when it comes to this kind of continuous work, or is it just that we haven't optimized them for it yet? Do we need a new architecture to get there? Or is it simply too early in their development?
5 Answers
It seems a bit premature to say LLMs have hit a wall. Just give it some time. They’re constantly improving, and I expect upcoming developments could change how they handle longer tasks significantly.
Or are we just in a phase of waiting for breakthroughs?
Honestly, I think LLMs have shown little to no improvement in this domain so far. If they struggle now, chances are they'll never get much better at continuous tasks. Maybe it's time to rethink our expectations.
I don't think LLMs are permanently flawed for continuous tasks. It’s more that companies haven't optimized them for this kind of work yet. There’s definitely room for improvement!
Why should we take opinions from folks who aren’t AI experts seriously? I mean, they say AGI is right around the corner, in June 2025. Does anyone actually believe that?
Wait, June 2025? Is that supposed to be a serious claim?
We should consider building a new architecture for LLMs that's focused on handling long-running tasks. Right now they're held back by context limits (only so much fits in a single prompt) and statelessness (nothing carries over between calls unless you feed it back in). It's a complex issue, and they might need a fundamental overhaul to be truly capable in these areas.
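To make the context-limit and statelessness point concrete, here's a rough sketch of the usual workaround: keep the task's state outside the model and feed a trimmed copy back in on every call. Everything in it is made up for illustration: `stateless_model` is a dummy stand-in rather than any real LLM API, and `CONTEXT_LIMIT`, `MEMORY_BUDGET`, `TaskMemory`, and `run_task` are hypothetical names. It also counts characters where a real system would count tokens.

```python
from dataclasses import dataclass, field

CONTEXT_LIMIT = 200   # hypothetical cap on prompt size (characters stand in for tokens)
MEMORY_BUDGET = 120   # hypothetical share of the context reserved for carried-over notes


def stateless_model(prompt: str) -> str:
    """Dummy stand-in for an LLM call: it sees only `prompt`, nothing from earlier calls."""
    if len(prompt) > CONTEXT_LIMIT:
        raise ValueError("prompt exceeds the context limit")
    return f"done ({len(prompt)} chars seen)"


@dataclass
class TaskMemory:
    """External state that the caller, not the model, carries between steps."""
    notes: list[str] = field(default_factory=list)

    def render(self) -> str:
        return "\n".join(self.notes)

    def add(self, note: str) -> None:
        self.notes.append(note)
        # Forget the oldest notes once the memory no longer fits its budget.
        # A single oversized note is kept as-is; this sketch assumes short notes.
        while len(self.render()) > MEMORY_BUDGET and len(self.notes) > 1:
            self.notes.pop(0)


def run_task(steps: list[str]) -> TaskMemory:
    memory = TaskMemory()
    for step in steps:
        # Each call gets the surviving notes plus the new step, and nothing else.
        prompt = memory.render() + "\n" + step
        reply = stateless_model(prompt)
        memory.add(f"{step} -> {reply}")
    return memory


if __name__ == "__main__":
    memory = run_task([f"day {i}: restock" for i in range(1, 11)])
    print(memory.render())
```

After ten steps only the last few notes are left, so the model has no way to know what happened on day 1. That's the kind of drift people point to on long tasks. Whether better memory management is enough or a new architecture is needed is exactly the open question.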
Exactly! There are experiments showing how tasks could be managed better, like the one with Minecraft back in 2023.
Yeah, but it does make you wonder if a completely new structure is necessary to pull off those continuous tasks effectively.