This article swings the computational-complexity hammer way too hard and discounts huge progress in every field of AI outside the hot trend of transformers and LLMs. Nobody is saying the future of AI is purely autoregressive, and the article pretty much ignores the research that has been posted here around diffusion-based text generation and how it can be combined with autoregressive methods… it discounts multi-modal models entirely. He also pretty much dismisses everything that's happened with AlphaFold, AlphaGo, reinforcement learning, and so on.
The argument that computational complexity has something to do with this could have merit, but the article certainly doesn't give any indication as to why. Is the brain NP-complete? Maybe, maybe not. I could see many arguments about why modern research will fail to create AGI but just hand-waving "reality is NP-hard" is not enough.
The fact is: something fundamental has changed that enables a computer to pretty effectively understand natural language. That's a discovery on the scale of the internet or Google Search and shouldn't be discounted… and usage proves it. In two years there is a platform with billions of users. On top of that, huge fields of new research are making leaps and bounds with novel methods utilizing AI for chemistry, computational geometry, biology, etc.
I agree with everything you wrote; the technology is unbelievable and 6 years ago, maybe even 3.1 years would have been considered magic.
A steel man argument for why winter might be coming is all the dumb stuff companies are pushing AI for. On one hand (and I believe this) we argue it’s the most consequential technology in generations. On the other, everybody is using it for nonsense like helping you write an email that makes you sound like an empty suit, or providing a summary you didn’t ask for.
There's still a ton of product work needed to cross whatever that valley's called between concept and product, and if that doesn't happen, money is going to start disappearing. The valuation isn't justified by the dumb stuff we do with it; it needs PMF.
> maybe even 3.1 years would have been considered magic
I vividly remember sitting there for literally hours talking to a computer on launch day; it was a very short night. This feeling of living in the future has not left me since. It's gotten quieter, but it's still there. It's still magic after those three years, perhaps even more so. It wasn't supposed to work this well for decades! And yet.
I think you're missing the point of "AI winter". It's not about how good the products are now. It's about how quickly the products are improving and creating the potential for profit. That's what drives investment.
3 things we know about the AI revolution in 2025:
- LLMs are amazing, but they have reached a plateau. AGI is not within reach.
- LLM investment has sacrificed many hundreds of billions of dollars, much of it from the world's pension funds.
- There is no credible path to a high-margin LLM product. Margins will be razor-thin positive at best once the free trial of the century starts to run out of steam.
This all adds up to a rather nasty crunch.
The thing about winter, though, is that it's eventually followed by another summer.
Many technologies plateau, but we don't say they all have winters. Terrestrial radio winter? Television winter? Automobile winter? Air travel winter? Nuclear power comes close in terms of its tarnished image and reluctance to reinvest.
I personally believe contemporary AI is over-hyped, but I cannot say with confidence that it is going to lead to a winter like the last one. It seems like today's products satisfy enough users for AI to remain a significant area, even if it doesn't greatly expand...
The only way I could see it fizzling as a product category is if it turns out it is not economically feasible to operate a sustainable service. Will users pay a fair price to keep the LLM datacenters running, without speculative investment subsidies?
The other aspect of the winter is government investment, rather than commercial. What could the next cycle of academic AI research look like? E.g. exploration that needs to happen in grant-funded university labs instead of venture-funded companies?
The federal funding picture seems unclear, but that's true across the board right now for reasons that have nothing to do with AI per se.
I think of LLMs like brains or CPUs. They're the core that does the processing, but they need to be embedded in a bigger system to be useful. Even if LLMs plateau, there will be a lot of development and improvement in the systems that use these LLMs. We will be seeing a lot of innovation going forward, especially in systems that are able to monetize these LLMs.
> The argument that computational complexity has something to do with this could have merit, but the article certainly doesn't give any indication as to why.
The OP says it's because next-token prediction can be right or wrong, but the output always looks plausible, because plausibility is what the model actually computes. That makes it dangerous, and it can't be fixed, because that is how the approach works in essence.
Another anecdote. I've got a personal benchmark that I try out on these systems every time there's a new release. It is an academic math question which could be understood by an undergraduate, and which seems easy enough to solve if I were just to hammer it out over a few weeks. My prompt includes a big list of mistakes it is likely to fall into and which it should avoid. The models haven't ever made any useful progress on this question. They usually spin their wheels for a while and then output one of the errors I said to avoid.
My hit rate with these models on academic questions is low, but non-trivial. I've definitely learned new math because of using them, but it's really just an indulgence because they make stuff up so frequently.
I get generally good results from prompts asking for something I know definitely exists or is definitely possible, like an ffmpeg command I know I've used in the past but can't remember. Recently I asked how to do something in ImageMagick which I'd not done before but felt like the kind of thing ImageMagick should be able to do. It made up a feature that doesn't exist.
Maybe I should have asked it to write a patch that implements that feature.
To take a different perspective on the same event:
The model expected a feature to exist because it fitted with the overall structure of the interface.
This in itself can be a valuable form of feedback. I currently don't know of any people doing it, but testing interfaces by getting LLMs to use them could be an excellent resource. If the AI runs into trouble, it might be worth checking your designs to see if you have any inconsistencies, redundancies or other confusion-causing issues.
One would assume that a consistent user interface would be easier for both AI and humans. Fixing the issues would improve it for both.
That failure could be leveraged into an automated process that identifies areas to improve.
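As a minimal sketch of what that could look like (a toy framing of mine, not an existing tool): give a model only a CLI's --help text and a task, then flag any options it invents. Inconsistent or surprising interfaces should produce more inventions.

```python
import re
import subprocess

def audit_cli(ask_llm, tool, task):
    """Ask an LLM to drive `tool` using only its documented options.

    `ask_llm` is any callable taking a prompt string and returning the
    model's proposed command string (hypothetical; wire it to whatever
    client you use).
    """
    help_text = subprocess.run(
        [tool, "--help"], capture_output=True, text=True
    ).stdout
    proposed = ask_llm(
        f"Using only these documented options:\n{help_text}\n"
        f"Write a single {tool} command that does the following: {task}"
    )
    # Any long option the model used that never appears in --help is "invented".
    invented = [
        flag for flag in re.findall(r"--[\w-]+", proposed)
        if flag not in help_text
    ]
    return proposed, invented  # non-empty `invented` = a design smell worth reviewing
```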
There is no difference between "hallucination" and "soberness"; it's just a database you can't trust.
The response to your query might not be what you needed, similar to interacting with an RDBMS and mistyping a table name and getting data from another table, or misremembering which tables exist and getting an error. We would not call such faults "hallucinations", and shouldn't when the database is a pile of eldritch vectors either. If we persist in doing so we'll teach other people to develop dangerous and absurd expectations.
No, it's absolutely not. One of these is a generative stochastic process with no guarantee at all that it will produce correct data; in fact, you can make the OPPOSITE guarantee: you are guaranteed to sometimes get incorrect data. The other is a deterministic process of data access. I could perhaps only agree with you in the sense that such faults are not uniquely hallucinatory: all outputs from an LLM are.
I don't agree with the theoretical boundaries you're drawing. Any database can appear to lack determinism, because data might get deleted, corrupted or mutated, and the hardware and software involved might fail intermittently.
The illusion of determinism in RDBMS systems is just that, an illusion. The reason I used the examples of failures I did when interacting with such systems is that most experienced developers are familiar with those situations and can relate to them, while the probability that the reader has experienced a truer apparent indeterminism is lower.
LLMs can provide an illusion of determinism as well; some are quite capable of repeating themselves, e.g. through overfitting, intentional or otherwise.
Yep. All these do is “hallucinate”. It’s hard to work those out of the system because that’s the entire thing it does. Sometimes the hallucinations just happen to be useful.
Sorry, I'm failing to see the danger of this choice of language? People who aren't really technical don't care about these nuances. It's not going to sway their opinion one way or another.
If the information it gives is wrong, but is grammatically correct, then the "AI" has fulfilled its purpose. So it isn't really "wrong output" because that is what the system was designed to do. The problem is when people use "AI" and expect it will produce truthful responses - it was never designed to do that.
But the point is that everyone uses the phrase "hallucinations" and language is just how people use it. In this forum at least, I expect everyone to understand that it is simply the result of next token generation and not an edge case failure mode.
I like asking it about my great great grandparents (without mentioning they were my great great grandparents just saying their names, jobs, places of birth).
It hallucinates whole lives out of nothing but stereotypes.
> It sometimes feels like the only thing saving LLMs are when they’re forced to tap into a better system like running a search engine query.
This is actually very profound. All free models are only reasonable if they scrape 100 web pages (according to their own output) before answering. Even then they usually have multiple errors in their output.
Responding with "skill issue" in a discussion is itself a skill issue. Maybe invest in some conversational skills and learn to be constructive rather than parroting a useless meme.
First of all, there is no such thing as "prompt engineering". Engineering, by definition, is a matter of applying scientific principles to solve practical problems. There are no clear scientific principles here. Writing better prompts is more a matter of heuristics, intuition, and empiricism. And there's nothing wrong with that — it can generate a lot of business value — but don't presume to call it engineering.
Writing better prompts can reduce the frequency of hallucinations, but they still occur frequently even with the latest frontier LLMs, regardless of prompt quality.
So you are saying the acceptable customer experience for these systems is that we need to explicitly tell them to accept defeat when they can't find any training content/web search results that match my query closely enough?
Why don't they have any concept of confidence in their answer, even a rough percentage?
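For what it's worth, some APIs do expose something confidence-adjacent: per-token log-probabilities. A minimal sketch, assuming the OpenAI Python client (the model name and question are just placeholders):

```python
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Which mod adds feature X?"}],
    logprobs=True,  # request per-token log-probabilities
)
# Each generated token comes back with how "sure" the model was about it.
for tok in resp.choices[0].logprobs.content:
    print(f"{tok.token!r}: logprob {tok.logprob:.2f}")
```

But token-level probability isn't answer-level confidence: the model can be perfectly confident in every token of a made-up mod name, which is exactly the complaint.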
It isn’t 2022 anymore, this is supposed to be a mature product.
Why am I even using this thing rather than using the game’s own mod database search tool? Or the wiki documentation?
What value is this system adding for me if I’m supposed to be a prompt engineer?
Technologically, I believe that you're right. On the other hand, the previous AI winters happened despite novel, useful technologies, some of which proved extremely useful and actually changed the world of software. They happened because of overhype, then investors moving on to the next opportunity.
Here, the investors are investing in LLMs. Not in AlphaFold, AlphaGo, neurosymbolic approaches, focus learning, etc. If (when) LLMs prove insufficient for the insane level of hype, and if (when) experience shows that there is only so much money you can make with LLMs, it's possible that the money will move on to other types of AI, but there's a chance it will actually go to something entirely different, perhaps quantum, leaving AI in winter.
> that enables a computer to pretty effectively understand natural language
I'd argue that it pretty effectively mimics natural language. I don't think it really understands anything, it is just the best madlibs generator that the world has ever seen.
For many tasks, this is accurate 99+% of the time, and the failure cases may not matter. Most humans don't perform any better, and arguably regurgitate words without understanding as well.
But if the failure cases matter, then there is no actual understanding and the language the model is generating isn't ever getting "marked to market/reality" because there's no mental world model to check against. That isn't going to be usable if there are real-world consequences of the LLM getting things wrong, and they can wind up making very basic mistakes that humans wouldn't make--because we can innately understand how the world works and aren't always just stringing words together that sound good.
I don't think anybody expects AI development to stop. A winter is defined by a relative drying-up of investment and, importantly, it's almost certain that any winter will eventually be followed by another summer.
The pace of investment in the last 2 years has been so insane that even Altman has claimed that it's a bubble.
> I could see many arguments about why modern research will fail to create AGI
Why is AGI even necessary? If the loop between teaching the AI something and it being able to repeat similar enough tasks becomes short enough, days or hours instead of months, who cares if some ill-defined bar of AGI is met?
> something fundamental has changed that enables a computer to pretty effectively understand natural language.
You understand how the tech works right? It's statistics and tokens. The computer understands nothing. Creating "understanding" would be a breakthrough.
Edit: I wasn't trying to be a jerk. I sincerely wasn't. I don't "understand" how LLMs "understand" anything. I'd be super pumped to learn that bit. I don't have an agenda.
I think it is fair to say that AIs do not yet "understand" what they say or what we ask them.
When I ask it to use a specific MCP to complete a certain task, and it proceeds to not use that MCP, this indicates a clear lack of understanding.
You might say that the fault was mine, that I didn't set up or initialize the MCP tool properly, but wouldn't an understanding AI recognize that it didn't have access to the MCP and tell me that it cannot satisfy my request, rather than blindly carrying on without it?
LLMs consistently prove that they lack the ability to evaluate statements for truth. They lack, as well, an awareness of their unknowing, because they are not trying to understand; their job is to generate (to hallucinate).
It astonishes me that people can be so blind to this weakness of the tool. And when we raise concerns, people always say
"How can you define what 'thinking' is?"
"How can you define 'understanding'?"
These philosophical questions are missing the point. When we say it doesn't "understand", we mean that it doesn't do what we ask. It isn't reliable. It isn't as useful to us as perhaps it has been to you.
Before one may begin to understand something, one must first be able to estimate the level of certainty. Our robot friends, while really helpful and polite, seem to be lacking in that department. They actually think the things we've written on the internet, in books, academic papers, court documents, newspapers, etc. are actually true. Where the humans aren't omniscient, they fill in the blanks with nonsense.
> As someone who was an engineer on the original Copilot team
Right, so "as someone who is a sociopath completely devoid of ethics" you were one of the cogs in the machine who said "fuck your license, we're training our llm on your code whether you like it or otherwise".
> that doesn’t mean something interesting isn’t going on
Wow! Such double negative. Much science. So rigor.
> At this point I’d argue that humans “hallucinate” and/or provide wrong answers far more often than SOTA LLMs.
Yikes, may the Corpo Fascist Gods protect any pitiable humans still in your life.
We could use a little more kindness in discussion. I think the commenter has a very solid understanding of how computers work. The "understanding" part is somewhat complex, but I do agree with you that we are not there yet. I do think, though, that the paradigm shift is more about the fact that we can now interact with the computer in a new way.
Are we even sure we understand the hardware? My understanding is even that is contested, for example orchestrated objective reduction, holonomic brain theory or GVF theory.
The end effect certainly gives off an "understanding" vibe, even if the method of achieving it is different. The commenter obviously didn't mean the way the human brain understands.
"I don't "understand" how LLMs "understand" anything."
Why does the LLM need to understand anything? What today's chatbots have achieved is a software engineering feat. They have taken a stateless token-generation machine that has compressed the entire internet's text to predict the next token, and have 'hacked' a whole state-management machinery around it. The end result is a product that just feels like another human conversing with you and remembering your last birthday.
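In sketch form, that state-management trick is mostly just resending the whole transcript every turn, because the model itself remembers nothing (a minimal illustration, assuming the OpenAI Python client; the model name is only an example):

```python
from openai import OpenAI

client = OpenAI()
history = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_msg = input("you: ")
    history.append({"role": "user", "content": user_msg})
    reply = client.chat.completions.create(
        model="gpt-4o-mini",   # example model name
        messages=history,      # the full transcript, resent on every turn
    )
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    print(answer)
```

Everything that feels like memory (your last birthday included) lives in that `history` list and whatever retrieval is bolted on around it, not in the model.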
Engineering will surely get better and while purists can argue that a new research perspective is needed, the current growth trajectory of chatbots, agents and code generation tools will carry the torch forward for years to come.
If you ask me, this new AI winter will thaw in the atmosphere even before it settles on the ground.
LLMs activate similar neurons for similar concepts not only across languages, but also across input types. I’d like to know if you’d consider that as a good representation of “understanding” and if not, how would you define it?
Anthropic is pretty notorious for peddling hype. This is a marketing article - it has not undergone peer-review and should not be mistaken for scientific research.
If I could understand what the brain scans actually meant, I would consider it a good representation. I don't think we know yet what they mean. I saw some headline the other day about a person with "low brain activity", and said person was in complete denial about it; I would be too.
As I said then, and probably echoing what other commenters are saying: what do you mean by understanding when you say computers understand nothing? Do humans understand anything? If so, how?
“You understand how the brain works right? It’s neurons and electrical charges. The brain understands nothing.”
I’m always struck by how confidently people assert stuff like this, as if the fact that we can easily comprehend the low-level structure somehow invalidates the reality of the higher-level structures. As if we know concretely that the human mind is something other than emergent complexity arising from simpler mechanics.
I’m not necessarily saying these machines are “thinking”. I wish I could say for sure that they’re not, but that would be dishonest: I feel like they aren’t thinking, but I have no evidence to back that up, and I haven’t seen non-self-referential evidence from anyone else.
LLMs aren't as good as humans at understanding, but it's not just statistics. The stochastic parrot meme is wrong. The networks create symbolic representations in training, with huge multidimensional correlations between patterns in the data, whether it's temporal or semantic. The models "understand" concepts like emotions, text, physics, arbitrary social rules and phenomena, and anything else present in the data and context in the same fundamental way that humans do it. We're just better, with representations a few orders of magnitude higher resolution, much wider redundancy, and multi-million node parallelism with asynchronous operation that silicon can't quite match yet.
In some cases, AI is superhuman, and uses better constructs than humans are capable of, in other cases, it uses hacks and shortcuts in representations, mimics where it falls short, and in some cases fails entirely, and has a suite of failure modes that aren't anywhere in the human taxonomy of operation.
LLMs and AI aren't identical to human cognition, but there's a hell of a lot of overlap, and the stochastic parrot "ItS jUsT sTaTiStIcS!11!!" meme should be regarded as an embarrassing opinion to hold.
"Thinking" models that cycle context and systems of problem solving also don't do it the same way humans think, but overlap in some of the important pieces of how we operate. We are many orders of magnitude beyond old ALICE bots and MEgaHAL markov chains - you'd need computers the size of solar systems to run a markov chain equivalent to the effective equivalent 40B LLM, let alone one of the frontier models, and those performance gains are objectively within the domain of "intelligence." We're pushing the theory and practice of AI and ML squarely into the domain of architectures and behaviors that qualify biological intelligence, and the state of the art models clearly demonstrate their capabilities accordingly.
For any definition of understanding you care to lay down, there's significant overlap between the way human brains do it and the way LLMs do it. LLMs are specifically designed to model constructs from data, and to model the systems that produce the data they're trained on, and the data they model comes from humans and human processes.
You appear to be a proper alchemist, but you can't support an argument of understanding if there is no definition of understanding that isn't circular.
If you want to believe the friendly voice really understands you, we have a word for that, faith.
The skeptic sees the interactions with a chatbot as a statistical game that shows how uninteresting (e.g. predictable) humans and our stupid language are.
There are useful gimmicks coming out, like natural language processing for low-risk applications, but this form of AI pseudoscience isn't going to survive. It will take some time, though, for research to catch up and describe the falsehoods of contemporary AI toys.
Understanding is the thing that happens when your neurons coalesce into a network of signaling and processing such that it empowers successful prediction of what happens next. This powers things like extrapolation, filling in missing parts of perceived patterns, temporal projection, and modeling hidden variables.
Understanding is the construction of a valid model. In biological brains, it's a vast parallelized network of columns and neuron clusters in coordinated asynchronous operation, orchestrated to ingest millions of data points both internal and external, resulting in a complex and sophisticated construct comprising the entirety of our subjective experience.
LLMs don't have the subjective experience module, explicitly. They're able to emulate the bits that are relevant to being good at predicting things, so it's possible that every individual token-inference process produces a novel "flash" of subjective experience; but absent the explicit construct, and a persistent and coherent self-construct, it's not mapping the understanding to the larger context of its understanding of itself in the same way humans do. The only place where the algorithmic qualities needed for subjective experience reside in LLMs is the test-time process slice, and because the weights themselves are unchanged in relation to any novel understanding which arises, there's no imprint left behind by the sensory stream (text, image, audio, etc.). Absent the imprint mechanism, there's no possibility to perpetuate the construct we think of as conscious experience, so for LLMs there can never be more than individual flashes of subjectivity, and those would be limited to very low-resolution correlations a degree or more of separation away from the direct experience of any sensory inputs, whereas in humans the streams are tightly coupled to processing, update in real time, and persist through the lifetime of the mind.
The pieces being modeled are the ones that are useful. The utility of consciousness has been underexplored; it's possible that it might be useful in coordination and orchestration of the bits and pieces of "minds" that are needed to operate intelligently over arbitrarily long horizon planning, abstract generalization out of distribution, intuitive leaps between domains that only relate across multiple degrees of separation between abstract principles, and so on. It could be that consciousness will arise as an epiphenomenological outcome from the successful linking together of systems that solve the problems LLMs currently face, and the things which overcome the jagged capabilities differential are the things that make persons out of human minds.
It might also be possible to orchestrate and coordinate those capabilities without bringing a new mind along for the ride, which would be ideal. It's probably very important that we figure out what the case is, and not carelessly summon a tortured soul into existence.
It could very well be that statistics and tokens is how our brains work at the computational level too. Just that our algorithms have slightly better heuristics due to all those millennia of A/B testing of our ancestors.
I think it’s a disingenuous read to assume original commenter means “understanding” in the literal sense. When we talk about LLM “understanding”, we usually mean it from a practical sense. If you give an input to the computer, and it gives you an expected output, then colloquially the computer “understood” your input.
What do you mean by “understand”? Do you mean conscious?
Understand just means “parse language” and is highly subjective. If I talk to someone African in Chinese they do not understand me but they are still conscious.
If I talk to an LLM in Chinese it will understand me but that doesn’t mean it is conscious.
If I talk about physics to a kindergartner they will not understand but that doesn’t mean they don’t understand anything.
GOFAI was also a paradigm shift, regardless of that winter. For example, banks started automating assessments of creditworthiness.
What we didn't get was what had been expected, namely things like expert systems that were actual experts, so called 'general intelligence' and war waged through 'blackboard systems'.
We've had voice-controlled electronics for a long time. On the other hand, machine vision applications have improved massively in certain niches, and have also allowed for new forms of intense tyranny and surveillance, where errors are actually considered a feature rather than a bug, since they erode civil liberties and human rights but are still broadly accepted because 'computer says'.
While you could likely argue "leaps and bounds with novel methods utilizing AI for chemistry, computational geometry, biology etc." by downplaying the first part or clarifying that it is mainly an expectation, I think most people are going to, for the foreseeable future, keep seeing "AI" as more or less synonymous with synthetic infantile chatbot personalities that substitute for human contact.
LLMs are an amazing advancement. The tech side of things is very impressive. Credit where credit is due.
Where the current wave all falls apart is on the financials. None of that makes any sense and there’s no obvious path forward.
Folks say handwavy things like "oh they'll just sell ads", but even a cursory analysis shows the math doesn't add up relative to the sums of money being invested at the moment.
Tech wise I’m bullish. Business wise, AI is setting up to be a big disaster. Those that aimlessly chased the hype are heading for a world of financial pain.
Hard disagree, I'm in the process of deploying several AI solutions in healthcare. We have a process a nurse usually spends about an hour on, which costs $40-$70 depending on whether they are offshore and a few other factors. Our AI can match it at a few dollars, often less. A nurse still reviews the output, but it's way less time. The economics of those tokens is great. We have another solution that just finds money: $10-$30 in tokens can find hundreds of thousands of dollars. The tech isn't perfect (that's why we still have a human in the loop) but it's more than good enough to do useful work, and the use cases are valuable.
It's true, but do you really trust the AI generated + Nurse Review output more than Organic Nurse generated?
In my experience, management types use the fact that AI generated + Nurse Review is faster to push a higher quota of forms generated per hour.
Eventually, from fatigue or boredom, the human in the loop just ends up being a rubber stamper. Would you trust this with your own or your children's life?
The human in the loop becomes a lot less useful when they're pressured to process a certain quota against an AI that's basically a stochastic "most probable next token" machine, aka a professional bullshitter, literally trained to generate plausible outputs with no responsibility for accuracy.
These same questions could be asked about self-driving cars, but they've been shown to be consistently safer drivers than humans. If this guy is getting consistently better results from AI+human than from humans alone, what would it matter that the former results in errors, given that the latter results in more errors and costs more?
TFA's whole point is that there is no easy way to tell if LLM output is correct or not. Driving mistakes provide instant feedback on whether the output of whatever AI is driving is correct. Bad comparison.
If the cars weren't considerably safer drivers than humans they wouldn't be allowed on the road. There isn't as much regulation blocking deploying this healthcare solution... until those errors actually start costing hospitals money from malpractice lawsuits (or not), we don't know whether it will be allowed to remain in use.
I think they were referring to the costs of training and hosting the models. You're counting the cost of what you're buying, but the people selling it to you are in the red.
Wrong. OpenAI is literally the only AI company with horrific financials. You think Google is actually bleeding money on AI? They are funding it all with cash flow and still have monster margins.
OpenAI may be the worst, but I am pretty sure Anthropic is still bleeding money on AI, and I would expect a bunch of smaller dedicated AI firms are too; Google is the main firm with competitive commercial models at the high end across multiple domains that is funding AI efforts largely from its own operations (and even there, AI isn’t self sufficient, its just an internal rather than an external subsidy.)
> You think Google is actually bleeding money on AI? They are funding it all with cash flow and still have monster margins.
They can still be "bleeding money on AI" if they're making enough in other areas to make up for the loss.
The question is: "Are LLMs profitable to train and host?" OpenAI, being a pure LLM company, will go bankrupt if the answer is no. The equivalent for Google is to cut its losses and discontinue the product. Maybe Gemini will have the same fate as Google+.
It appears very much not. There has been some suggestion that inference may be "profitable" on a unit basis, but that's ignoring most of the costs. When factoring everything in, most of these look very much upside down.
While there is demand at the moment, it's also unclear what the demand would be if the prices were "real", aka what it would take to run a sustainable business.
Those sound like typical bootstrap-sized workflow optimization opportunities, which are always available but have a modest ceiling on both sales volume and margin.
That's great that you happened to find a way to use "AI solutions" for this, but it fits precisely inside the parents "tech wise, I'm bullish" statement. It's genuinely new tech, which can unearth some new opportunities like this, by addressing many niche problems that were either out of reach before or couldn't be done efficiently enough before. People like yourself should absolutely be looking for smart new small businesses to build with it, and maybe you'll even be able to grow that business into something incredible for yourself over the next 20 years. Congratulations and good luck.
The AI investment bubble that people are concerned about is about a whole different scale of bet being made; a bet which would only have possibly paid off if this technology completely reconfigured the economy within the next couple years. That really just doesn't seem to be in the cards.
Folks were super bullish tech-wise on the internet when it was new, and that turned out to be correct. It is also true that the .com bubble wiped out a generation of companies, and those that survived took a decade or more to recover.
The same thing is playing out here… tech is great and not going away but also the business side is increasingly looking like another implosion waiting to happen.
>Folks say handwavy things like "oh they'll just sell ads", but even a cursory analysis shows the math doesn't add up relative to the sums of money being invested at the moment.
Ok, so I think there are 2 things here that people get mixed up on.
First, inference on the current state of the art is cheap now. There are no two ways about it. Statements from Google and Altman, as well as the prices 3rd parties charge for tokens of top-tier open-source models, paint a pretty good picture. Ads would be enough to make OpenAI a profitable company selling current SOTA LLMs to consumers.
Here's the other thing that mixes things up. Right now, OpenAI is not just trying to be 'a profitable company'. They're not just trying to stay where they are and build a regular business off it. They are trying to build and serve 'AGI', or as they define it, 'highly autonomous systems that outperform humans at most economically valuable work'. They believe that building and serving this machine to hundreds of millions would require costs order(s) of magnitude greater.
That purpose is where all the 'insane' levels of money are moving. They don't need hundreds of billions of dollars in data centers to stay afloat or be profitable.
If they manage to build this machine, then those costs don't matter, and if things are not working out midway, they can just drop the quest. They will still have an insanely useful product that is already used by hundreds of millions every week, as well as the margins and unit economics to actually make money off of it.
If OpenAI was the only company doing this that argument might sort of make sense.
The problem is they have real competition now, and the market looks like an expensive race to an undifferentiated bottom.
If someone truly invents AGI and it’s not easily copied by others then I agree it’s a whole new ballgame.
The reality is that years into this we seem to be hitting a limit to what LLMs can do with only marginal improvements with each release. On that path this get ugly fast.
As far as consumer LLMs go, they don't really have competition. Well, they do, but it's more Google vs Bing than Android vs Apple. 2nd place is a very, very distant 2nd, and almost all growth and usage is still being funneled to OpenAI. Even if it's 'easily copied', getting there first could prove extremely valuable.
Most of the researchers outside big tech only have access to a handful of consumer GPUs at best. They are under a lot of pressure to invent efficient algorithms. The cost coming down by orders of magnitude seems like a good bet.
The business trajectory will be like Uber. A few big companies (Google, OpenAI) will price their AI services at a loss until consumers find it to be indispensable and competitors run out of money, then they'll steadily ramp up the pricing to the point where they're gouging consumers (and raking in profits) but still a bit cheaper or better than alternatives (humans in this case).
>None of that makes any sense and there’s no obvious path forward.
The top end models with their high compute requirements probably don't but there is value in lower end models for sure.
After all, it's the AWS approach. Most AWS services are stuff you can easily get cheaper if you just rent an EC2 instance and set it up yourself. But because AWS offers a very simple setup, companies don't mind paying for it.
> Tech wise I’m bullish. Business wise, AI is setting up to be a big disaster. Those that aimlessly chased the hype are heading for a world of financial pain.
I'm not going to pretend to be on the cutting edge of news here, but isn't this where on-device models becomes relevant? It sounds like Apple's neural engine or whatever in the M5 have seen noteworthy performance improvements, and maybe in a few more generations, we don't need these openai-sized boondoggles to benefit from the tech?
> Folks say handwavy things like "oh they'll just sell ads", but even a cursory analysis shows the math doesn't add up relative to the sums of money being invested at the moment.
We should factor in that messaging that's seamless and undisclosed in conversational LLM output will be a lot more valuable than what we think of as advertising today.
I don't see that happening. People have stuck with streaming and social networking as they've trended user-hostile. And with LLMs an even greater type of dependence is being cultivated.
This has convinced many non-programmers that they can program, but the results are consistently disastrous, because it still requires genuine expertise to spot the hallucinations.
I've been programming for 30+ years and am now a people manager. Claude Code has enabled me to code again, and I'm several times more productive than I ever was as an IC in the 2000s and 2010s. I suspect this person hasn't really tried the most recent generation; it is quite impressive and works very well if you do know what you are doing
If you’ve been programming for 30+ years, you definitely don’t fall under the category of “non-programmers”.
You have decades upon decades of experience on how to approach software development and solve problems. You know the right questions to ask.
The actual non-programmers I see on Reddit are having discussions about topics such as “I don’t believe that technical debt is a real thing” and “how can I go back in time if Claude Code destroyed my code”.
People learning to code have always had those questions and issues, though. For example, “git ate my code” or “I don't believe in Python using whitespace as a bracket so I'm going to end all my blocks with #endif”.
But it can work well even if you don't know what you are doing (or don't look at the impl).
For example, build a TUI or GUI with Claude Code while only giving it feedback on the UX/QA side. I've done it many times despite 20 years of software experience. -- Some stuff just doesn't justify me spending my time credentializing in the impl.
Hallucinations that lead to code that doesn't work just get fixed. Most code I write isn't like "now write an accurate technical essay about hamsters" where hallucinations can sneak through unless I scrutinize it; rather, the code would just fail to work and trigger the LLM's feedback loop to fix it when it tries to run/lint/compile/typecheck it.
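In sketch form, that loop is nothing exotic (illustrative only, not any particular tool's internals): run the generated code, hand any failure back to the model, repeat.

```python
import subprocess

def refine(regenerate, source_path, max_rounds=5):
    """`regenerate` is any callable that takes an error log and rewrites
    the file at `source_path` (hypothetical; wire it to your LLM of choice)."""
    for _ in range(max_rounds):
        result = subprocess.run(
            ["python", source_path], capture_output=True, text=True
        )
        if result.returncode == 0:
            return True              # the code ran cleanly; stop iterating
        regenerate(result.stderr)    # feed the traceback back to the LLM
    return False                     # give up after max_rounds attempts
```

Swap the `run` command for a linter, compiler, or test suite and it's the same shape.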
But the idea that you can only build with LLMs if you have a software engineer copilot isn't true, and it inches further from the truth every month, so it kinda sounds like a convenient lie we tell ourselves as engineers (and understandably so: it's scary).
> Hallucinations that lead to code that doesn't work just get fixed
How about hallucinations that lead to code that doesn't work outside of the specific conditions that happen to be true in your dev environment? Or, even more subtly, hallucinations that lead to code which works but has critical security vulnerabilities?
The author's headline starts with "LLMs are a failure"; it's hard to take the author seriously with such hyperbole, even if the second part of the headline ("A new AI winter is coming") might be right.
I have a journalist friend with 0 coding experience who has used ChatGPT to help them build tools to scrape data for their work. They run the code, report the errors, repeat, until something usable results. An agent would do an even better job. Current LLMs are pretty good at spotting their own hallucinations if they're given the ability to execute code.
The author seems to have a bias. The truth is that we _do not know_ what is going to happen. It's still too early to judge the economic impact of current technology - companies need time to understand how to use this technology. And, research is still making progress. Scaling of the current paradigms (e.g. reasoning RL) could make the technology more useful/reliable. The enormous amount of investment could yield further breakthroughs. Or.. not! Given the uncertainty, one should be both appropriately invested and diversified.
Last week I gave Antigravity a try, with the latest models and all. It generated subpar code that did the job very quickly, for sure, but no one would ever have accepted this code in a PR, and it took me 10x more time to clean it up than it took Gemini to shit it out.
The only thing I learned is that 90% of devs are code monkeys with very low expectations which basically amount to "it compiles and seems to work then it's good enough for me"
For toy and low effort coding it works fantastic. I can smash out changes and PRs fantastically quick, and they’re mostly correct. However, certain problem domains and tough problems cause it to spin its wheels worse than a junior programmer. Especially if some of the back and forth troubleshooting goes longer than one context compaction. Then it can forget the context of what it’s tried in the past, and goes back to square one (it may know that it tried something, but it won’t know the exact details).
That was true six months ago - the latest versions are much better at memory and adherence, and my senior engineer friends are adopting LLMs quickly for all sorts of advanced development.
> ...and works very well if you do know what you are doing
That's the issue. AI coding agents are only as good as the dev behind the prompt. It works for you because you have an actual background in software engineering of which coding is just one part of the process. AI coding agents can't save the inexperienced from themselves. It just helps amateurs shoot themselves in the foot faster while convincing them they're a marksman.
But if you actually filter it out, instead of (over)reacting to it in either direction, progress has been phenomenal, and the fact that there is visible progress in many areas, including LLMs, on the order of months demonstrates there are no walls.
Visible progress doesn’t mean astounding progress. But any tech that is improving year to year is moving at a good speed.
Huge apparent leaps in recent years seem to have spoiled some people. Or perhaps desensitized them. Or perhaps, created frustration that big leaps don’t happen every week.
I can’t fathom anyone not using models for 1000 things. But we all operate differently, and have different kinds of lives, work and problems. So I take claims that individuals are not getting much from models at face value.
But the fact that some people are not finding value isn't an argument that the value, and the increasing value, some of us are getting isn't real.
We're getting improvements coming out month by month. No reasonable person would look at a technology improving at this frequency and write a blog post about how we've hit the ceiling.
There was value in leaded gas and asbestos insulation too, nobody denies that.
You're blind to all the negative side effects: AI-generated slop ads, engagement traps, political propaganda, scams, &c. The amount of pollution is incredible: search engines are dead, blogs are dead, YouTube is dead, social media is dead. It's virtually impossible to find non-slop content; the ratio is probably already 50:1 by now.
And these are only the most visible things. I know a few companies losing hundreds of hours every month replying to support tickets that are fully LLM-generated and more often than not don't make any sense. Another big topic is education.
Ironically, if generative AI ends up killing social media, it might actually be a net positive. How do you avoid engaging with AI content? Why, go find an actual human IRL and speak with them.
I have pretty negative feelings about all this stuff and where the future is headed, but I also have to admit it's crazy how good it is at so many things I would have considered safe a few years ago, before ChatGPT.
There are a couple of really disingenuous bloggers out there who have big audiences themselves, and are "experts" for other people's audiences, who push really hard on this narrative that AI is a joke that will never progress beyond where it is today, that it is actually completely useless and just a scam. This is comforting for those of us who worry about AI more than we are excited about it, so some eat it up while barely trying it for themselves.
Yeah, after reading the intro lines I noped out of this garbage blog (?). Maybe it's their SEO strategy to write flat-out untrue, incendiary stuff like "the technology is essentially a failure". If that's the case, I don't have time for this.
Maybe they actually think this way; then I certainly have time for this.
What do I have time for? Virtual bonding with fellow HN ppl :)
Interesting take. His argument is basically that LLMs have hit their architectural ceiling and the industry is running on hype and unsustainable economics. I’m not fully convinced, but the points about rising costs and diminishing returns are worth paying attention to. The gap between what these models can actually do and what they’re marketed as might become a real problem if progress slows.
Claude Code + Opus 4.5 does exactly what it says on the box.
Today I pasted a screenshot of a frontend dropdown menu with the prompt "add an option here to clear the query cache". Claude found the relevant frontend files, figured out the appropriate backend routes/controllers to edit, and submitted a PR.
I think the unsustainably cheap consumer-facing AI products are the spoonful of sugar getting us to swallow a technology that will almost entirely be used to make agents that justify mass layoffs.
The existence of an AI hype train doesn’t mean there isn’t a productive AI no-hype train.
Context: I have been writing software for 30 years. I taught myself assembly language and hacked games/apps as a kid, and have been a professional developer for 20 years. I’m not a noob.
I’m currently building a real-time research and alerting side project using a little army of assistant AI developers. Given a choice, I would never go back to how I developed software before this. That isn’t my mind poisoned by hype and marketing.
I think we are not even close to using the full potential of current LLMs. Even if the capabilities of LLMs were to stop improving, we would still see better performance on the software and hardware side. It is no longer a question of "if", but of "when", there will be a Babelfish-like device available. And this is only one obvious application; I am 100% sure that people are still finding useful new applications of AI every day.
However, there is a real risk that AI stocks will crash and pull the entire market down, just like what happened in 2000 with the dotcom bubble. But did we see an internet or dotcom winter after 2000? No, everybody kept using the Internet, Windows, Amazon, eBay, Facebook and all the other "useless crap". Only the stock market froze over for a few years and previously overhyped companies had a hard time, but given the exaggeration before 2000 this was not really a surprise.
What will happen is that the hype train will stop or slow down, and people will no longer get thousands, millions, billions, or trillions in funding just because they slap "AI" onto their otherwise worthless project. Whoever is currently working on such a project should enjoy the time while it lasts - and rest assured that it will not last forever.
Well, the original "AI winter" was caused by defense contracts running out without anything to show for it -- turns out, the generals of the time could only be fooled by Eliza clones for so long...
The current AI hype is fueled by public markets, and as they found out during the pandemic, the first one to blink and acknowledge the elephant in the room loses, bigly.
So, even in the face of a devastating demonstration of "AI" ineffectiveness (which I personally haven't seen, despite things being, well, entirely underwhelming), we may very well be stuck in this cycle for a while yet...
I’m fascinated by people who say that LLMs have failed in practice.
Last week, when I was on PTO, I used AI to do a full redesign of a music community website I run. I touched about 40k lines of code in a week. The redesign is shipped and everyone is using it. AI let me go about 5-10x faster than if I had done this by hand. (In fact, I have tried doing this by hand before, so I really do have an apples-to-apples comparison for velocity. AI enabled it to happen at all: I've tried a few other times in the past but was never able to squeeze it into a week.)
The cited 40% inaccuracy rate doesn’t track for me at all. Claude basically one-shot anything I asked for, to the point that the bottleneck was mostly thinking of what I should ask it to do next.
At this point, saying AI has failed feels like denying reality.
Yes, and I've had similar results. I'm easily 10x more productive with AI, and I'm seeing the same in my professional network. The power of AI is so clear and obvious that I'm astonished so many folks remain in vigorous denial.
So when I read articles like this, I too am fascinated by the motivations and psychology of the author. What is going on there? The closest analogue I can think of is Climate Change denialism.
When the hype is infinite (technological singularity and utopia), any reality will be a let down.
But there is so much real economic value being created (not speculation, but actual business processes, billions of dollars) that it's hard to seriously defend the claim that LLMs are "failures" in any practical sense.
Doesn’t mean we aren’t headed for a winter of sobering reality… but it doesn’t invalidate the disruption either.
Other than inflated tech stocks making money off the promise of AI, what real economic impact has it actually had? I recall plenty of articles claiming that companies are having trouble actually manifesting the promised ROI.
My company’s spending a lot of money doing things they could have done fifteen or more years ago with classic computer vision libraries and other pre-LLM techniques, cheaper and faster.
Most of the value of AI for our organization is the hype itself providing the activation energy, if you will, to make these projects happen. The value added by the AI systems per se has been minimal.
(YMMV but that’s what I’m seeing at the non-tech bigco I’m at—it’s pretty silly but the checks continue to clear so whatever)
Blog posts like this make me think model adoption, and matching the model to the appropriate use case, is... lumpy at best. Every time I read something like this I wonder what tools they are using, and how. Modern systems are not raw transformers. A raw transformer will "always output something", they're right, but nobody deploys naked transformers. This is like claiming CPUs can't do long division because the ALU doesn't natively understand decimals. Also, a model is a statistical approximation trained on the empirical distribution of human knowledge work. It is not trying to compute exact solutions to NP-complete problems. Nature does not require worst-case complexity; real-world cognitive tasks are not worst-case NP-hard instances...
> AI has failed.
>The rumor mill has it that about 95% of generative AI projects in the corporate world are failures.
AI tooling has only just barely reached the point where enterprise CRUD developers can start thinking about it. LangChain only reached v1.0.0 in the last 60 days (Q4 2025); OpenAI effectively announced support for MCP in Q2 2025. The spec didn't even approach maturity until Q4 of 2024. Heck, most LLMs didn't have support for tools in 2024.
In 2-3 years a lot of these libraries will be partway through their roadmaps towards v2.0.0, fixing many of the pain points and fleshing out QOL improvements, and standard patterns will have evolved for integrating different workflows. Consumer streaming of audio and video on the web was a disastrous mess until around 2009, despite browsers having plugins for it going back over a decade. LLMs continue to improve at a rapid rate, but tooling matures more slowly.
Of course previous experiments failed or were abandoned; the technology has been moving faster than the average CRUD developer can implement features. A lot of the "cutting edge" technology we put into our product in 2023 is now a standard feature of the free tier of market leaders like ChatGPT, etc. Why bother maintaining a custom fork of 2023-era (effectively stone-age) technology when free-tier APIs do it better in 2025? MCP might not be the be-all and end-all, but it is at least a standard, maintainable interface, such that developers of mature software can begin conceiving of integrating it into their product as a permanent feature rather than a curiosity MVP built at the behest of a non-technical exec.
A lot of the AI-adjacent libraries we've been using finally hit v1.0.0 this year, or are creeping close to it, providing stable interfaces for maintainable software. It's time to hit the reset button on "X% of internal AI initiatives failed".
I am of the belief that the upcoming winter will look more like normalization than collapse.
The reason is hype deflation and technical stagnation don't have to arrive together. Once people stop promising AGI by Christmas and clamp down on infinite growth + infinite GPU spend, things will start to look more normal.
At this point, it feels more like the financing story is the shaky part, not the tech or the workflows. LLMs have changed workflows in a way that's very hard to unwind now.
I think it's worth discounting against the failure rates of humans, too.
> Depending on the context, and how picky you need to be about recognizing good or bad output, this might be anywhere from a 60% to a 95% success rate, with the remaining 5%-40% being bad results. This just isn't good enough for most practical purposes
This seems to suggest that humans are at 100%. I'd be surprised if I were anywhere close to that after 10 years of programming professionally.
I agree, I have over 20 years of software engineering experience and after vibe coding/engineering/architecting (or whatever you want to call it) for a couple months, I also don't see the technology progressing further. LLMs are more or less the same as 6 months ago, incremental improvements, but no meaningful progress. And what we have is just bad. I can use it because I know the exact code I want generated and I will just re-prompt if I don't get what I want, but I'm unconvinced that this is faster than a good search engine and writing code myself.
I think I will keep using it while it's cheap, but once I have to pay the real costs of training/running a flagship model I think I will quit. It's too expensive as it is for what it does.
> LLMs are more or less the same as 6 months ago, incremental improvements, but no meaningful progress.
Go back to an older version of an LLM and then say the same. You will notice that older LLM versions do less, have more issues, write worse code, etc...
There have been large jumps in the last 2 years, but because it's not like we went straight from an LLM to AGI, people underestimate the gains.
Trust me, try it, try Claude 3.7 > 4.0 > 4.5 > Opus 4.5 ...
> I'm unconvinced that this is faster than a good search engine and writing code myself.
What I see is mostly somebody who is standoffish on LLMs... just like I was. I tried to shoehorn their code generation into "my" code, and while it works reasonably, you tend to see the LLM working on "your code" as an invasion. So you never really use its full capabilities.
LLMs really work best if you plan, plan, plan, and then have them create the code for you. The moment you try to get the LLMs to work inside existing code, especially code structured how YOU like it, people tend to be more standoffish.
> It's too expensive as it is for what it does.
Copilot is about 27 euros/month (annual payment, plus the dollar/euro rate) for 1,500 requests here. We pay 45 euros per month just for basic 100 Mbit internet. I mean... will it get more expensive in the future? Oh yes, for sure. But we may also have alternatives by then. Open-source/open-weight models are getting better and better, especially with MoE.
Pricing is how you look at it... If i do the work what takes me a months in a few days, what is the price then? If i do work that i normally need to outsource. Or code that is some monotoon repeating end me %@#$ ... in a short time by paying a few cents to a LLM, ...
Reality is, thing change, and just like the farmers that complained about the tractor, while their neighbor now did much more work thanks to that thing, LLMs are the same.
I often see the most resistance from us older programmer folks, who are set in our ways, when its us who actually are the best at wrangling LLMs the best. As we have the experience to fast spot where a LLM goes wrong, and guide it down the right path. Tell it, the direction is debugging is totally wrong and where the bug more likely is ...
For the last 2 years i paid just the basic cheap $10 subscription, and use it but never strongly. It helped with those monotoon tasks etc. Until ... a months ago, i needed a specific new large project and decided to just agent / vibe code it, just to try at first. And THEN i realized, how much i was missing out off. Yes, that was not "my" code, and when the click came in my head that "i am the manager, not the programmer", you suddenly gain a lot.
Its that click that is hard for most seasoned veterans. And its ironically often the seasoned guys that complain the most about AI ... when its the same folks that can get the most out of LLMs.
> Trust me, try it, try Claude 3.7 > 4.0 > 4.5 > Opus 4.5 ...
I started with Sonnet 4 and am now using Opus 4.5. I don't see a meaningful difference. I'm a bit more confident one-shotting some issues, but that's it. I think the main issue is that I always knew what and how to prompt (same skill as googling), so I can adjust once I learn what a model can do. Sonnet is kind of the same for me as Opus.
> LLMs really work best if you plan, plan, plan, and then have them create the code for you. The moment you try to get an LLM to work inside existing code, especially code structured how YOU like it, people tend to be more standoffish.
My project is AI code only. Around 30k lines; I never wrote a single line. I know I cannot just let it vibe code, because I lost a month getting rid of the AI spaghetti it created in the beginning. It just got stuck rewriting and making new bugs as soon as it fixed one. Since then I'm doing a lot more handholding and a crazy amount of unit/e2e testing. Which, by the way, is a huge limiting factor: now I want a powerful dev machine again, because if unit + e2e testing takes more than a couple of seconds it slows the LLM down.
> Pricing depends on how you look at it... If I do work that would normally take me a month in a few days, what is the price then?
I spent around 200 USD on subscriptions so far. I wanted to try out Opus 4.5 so I splurged on a Claude Max subscription this month. It's definitely a very expensive hobby.
> And it's ironically often the seasoned guys who complain the most about AI
Token economics are very sound - it's the training which is expensive.
Tokens/week have gone up 23x year-over-year according to https://openrouter.ai/rankings. This is probably around $500M-1B in sales per year.
The real question is where the trajectory of this rocket ship is going. Will per-token pricing be a race to the bottom against budget Chinese model providers? Will we see another 20x year over year over the next 3 years, or will it level out sooner?
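A back-of-envelope sketch of how a revenue claim like that gets computed; both inputs below are illustrative assumptions, not figures from OpenRouter:

    # Every number here is an assumed, illustrative input.
    tokens_per_week = 5e12          # assumed weekly token volume across the platform
    blended_price_per_m = 2.50      # assumed blended $ per million tokens

    annual_revenue = tokens_per_week * 52 * blended_price_per_m / 1e6
    print(f"~${annual_revenue/1e6:,.0f}M per year")   # ~ $650M with these inputs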
I think the author is onto something, but they didn't highlight that there are some scenarios where factual accuracy is unimportant, or maybe even a detractor.
For example, fictional stories. If you want to be entertained and it doesn't matter whether it's true or not, there are no downsides to "hallucinations". You could argue that stories ARE hallucinations.
Another example is advertisements. What matters is how people perceive them, not what's actually true.
Or content for a political campaign.
The more I think about it, genAI really is a perfect match for social media companies.
Have you ever seen how much work goes into writing a novel? Even a short one? Character and world building are not easy and require very logical reasoning even if the premise is imaginary. You can't say a character is human and then give them three hands in the next sentence.
While both are unlikely, if I have to choose one I would bet on AGI than AI winter in the next five years.
AI just got better and better. People thought it couldn't solve math problems without a human formalizing them first. Then it did. People thought it couldn't generate legible text. Then it did.
All while people swore it had reached a "plateau," "architecture ceiling," "inherent limit," or whatever synonym of the goalpost.
AlexNet was only released in 2012. The progress made in just 13 years has been insane. So while I do agree that we are approaching a "back to the drawing board" era, calling the past 13 years a "failure" is just not right.
> This means that they should never be used in medicine, for evaluation in school or college, for law enforcement, for tax assessment, or a myriad of other similar cases.
If AI models can deliver measurably better accuracy than doctors, clearer evaluations than professors, and fairer prosecutions than courts, then they should be adopted. Waymo has already shown a measurable decrease in loss of life by eliminating humans from driving.
I believe modern LLMs are technically sufficient to meaningfully disrupt the aforementioned professions as Waymo has done for taxis. Waymo's success relies on two non-LLM factors that we've yet to see for other professions. First is exhaustive collection and labelling of in-domain, high-quality data. Second is the destruction of the pro-human regulatory lobby (thanks to work done by Uber in the ZIRP era that came before).
To me, an AI winter isn't a concern, because AI is not the bottleneck. It is regulatory opposition and sourcing human experts who will train their own replacements. Both are significantly harder to get around for high-status white collar work. The great-AI-replacement may still fail, but it won't be because of the limitations of LLMs.
> My advice: unwind as much exposure as possible you might have to a forthcoming AI bubble crash.
Hedging when you have much at stake is always a good idea. Bubble or no bubble.
Eric S. Raymond (yes, that ESR; insert long-winded DonHopkins rant, extensively cross-referenced to his own USENET and UNIX-HATERS mailing list postings, about what a horrible person "Eric the Flute" is) reports a shortening of time to implement his latest project from weeks to hours. He says the advantages of LLMs in the programming space are yet to truly unfold, because they not only let you do things faster, they let you try things you wouldn't have started otherwise because it'd be too much of a time/effort commitment.
Assuming these claims are even partially true, we'd be stupid—at the personal and societal level—not to avail ourselves of these tools and reap the productivity gains. So I don't see AI going away any time soon. Nor will it be a passing fad like Krugman assumed the internet would be. We'd have to course-correct on its usage, but it truly is a game changer.
The winters are the best part, economic harm aside.
Winters are when technology falls out of the vice grip of Capital and into the hands of the everyman.
Winters are when you’ll see folks abandon this AIaaS model for every conceivable use case, and start shifting processing power back to the end user.
Winters ensure only the strongest survive into the next Spring. They’re consequences for hubris (“LLMs will replace all the jobs”) that give space for new things to emerge.
So, yeah, I’m looking forward to another AI winter, because that’s when we finally see what does and does not work. My personal guess is that agents and programming-assistants will be more tightly integrated into some local IDEs instead of pricey software subscriptions, foundational models won’t be trained nearly as often, and some accessibility interfaces will see improvement from the language processing capabilities of LLMs (real-time translation, as an example, or speech-to-action).
That, I’m looking forward to. AI in the hands of the common man, not locked behind subscription paywalls, advertising slop, or VC Capital.
I'm so annoyed by this negativity. Is AI perfect? No, far from it. Does it have a positive impact on productivity? 100%. Do I care about financials? Absolutely not. There's the regular hype cycle. But these idiotic takes like "dotcom 2.0" and "AI winter" are just that, and show that the author has no clue what they are talking about.
That 100% claim is doing an awful lot of work. There is ample documentation from most corporate rollouts showing the opposite, and as a SWE I find it worse than useless...
Most “AI is rubbish” takes treat it as an open-loop system: prompt → code → judgment. That’s not how development works. Even humans can’t read a spec, dump code, and ship it. Real work is closed-loop: test, compare to spec, refine, repeat. AI shines in that iterative feedback cycle, which is where these critiques miss the point.
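A minimal sketch of that closed loop, under the assumption that "the model" is just a function you can call: propose() stands in for the LLM/agent and check() stands in for the test suite, both of them toy placeholders rather than any real framework.

    def check(candidate):
        # "Spec" stand-in: the candidate must actually compute a + b.
        if "a + b" in candidate:
            return True, ""
        return False, "expected the candidate to compute a + b"

    def propose(spec, feedback):
        # Stand-in for an LLM call; a real system would send spec + feedback.
        if "a + b" in feedback:
            return "def add(a, b):\n    return a + b\n"
        return "def add(a, b):\n    return a - b\n"   # first (wrong) attempt

    def closed_loop(spec, max_iters=5):
        feedback = ""
        for i in range(max_iters):
            candidate = propose(spec, feedback)
            ok, feedback = check(candidate)          # "run the tests"
            print(f"iteration {i}: {'pass' if ok else 'fail - ' + feedback}")
            if ok:
                return candidate                     # spec satisfied per the tests
        return None                                  # give up, hand back to a human

    closed_loop("write add(a, b)")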
What’s tough for me is figuring out where people are realizing significant improvements from this.
If you have to set up good tests [edit: and gather/generate good test data!] and get the spec hammered out in detail and well-described in writing, plus all the ancillary stuff like access to any systems you need, sign-offs from stakeholders… dude that’s more than 90% of the work, I’d say. I mean fuck, lots of places just skip half that and figure it out in the code as they go.
What a curious coincidence that a soft-hard AI landing would happen to begin at the exact same time as the US government launches a Totally Not a Bailout Strategic Investment Plan Bro I Promise. Who could have predicted this?
Maybe. Depends on whose hype. But I think it is fine to say that we don't have AGI today (however that is defined) and that some people hyped that up.
2) LLMs haven't failed outright
I think that this is a vast understatement.
LLMs have been a wild success. At big tech, over 40% of checked-in code is LLM-generated. At smaller companies the proportion is larger. ChatGPT has over 800 million weekly active users.
Students throughout the world, and especially in the developed world, are using "AI" at rates of 85-90% (according to some surveys).
Between 40% and 90% of professionals (depending on the survey and profession) are using "AI".
This is 3 years after the launch of ChatGPT (and the capabilities of GPT-3.5 were so limited compared to today that it is a shame they get bundled together in our discussions). I would say that, instead of "failed outright", they are the most successful consumer product of all time (so far).
From what I've seen in a several-thousand-engineer company: LLMs generally produce vastly more code than is necessary, so they quickly outpace human coders. They could easily be producing half or more of all the code even if only 10% of the teams use them, particularly because huge changes often get approved with just an "lgtm", and LLM-coding teams also often use and trust LLMs for reviews.
But they do that while making the codebase substantially worse for the next person or LLM: large code size, inconsistent behavior, duplicates of duplicates of duplicates strewn everywhere with little to no pattern, so you might have to fix something a dozen times in a dozen ways for a dozen reasons before it actually works, and nothing handles it efficiently.
The only thing that matters in a business is value produced, and I'm far from convinced they'd even be break-even in most cases if they were free. They're burning the future with tech debt, on the hope that AI will be able to handle it where humans cannot, which does not seem true at all to me.
Measuring the value is very difficult. However there are proxies (of varying quality) which are measured, and they are showing that AI code is clearly better than copy-pasted code (which used to be the #1 source of lines of code) and at least as "good" (again, I can't get into the metrics) as human code.
Hopefully one of the major companies will release a comprehensive report to the public, but they seem to guard these metrics.
> At big tech, over 40% of checked-in code is LLM-generated.
Assuming this is true though, how much of that 40% is boilerplate or simple, low effort code that could have been knocked out in a few minutes previously? It's always been the case that 10% of the code is particularly thorny and takes 80% of the time, or whatever.
Not to discount your overall point, LLMs are definitely a technical success.
Before LLMs I used whatever autocomplete tech came with VSCode and the plugins I used. Now with Cursor, a lot of what the autocomplete did is replaced with LLM output, at much greater cost. Counting this in the "LLM generated" statistic is misleading at best, and I'm sure it's being counted.
As someone who is an expert in the area, everything in this article is misleading nonsense, failing at even the most basic CS101 principles. The level of confusion here is astounding.
> People were saying that this meant that the AI winter was over
The last AI winter was over 20 years ago. Transformers came during an AI boom.
> First time around, AI was largely symbolic
Neural networks were already hot and the state of the art across many disciplines when Transformers came out.
> The other huge problem with traditional AI was that many of its algorithms were NP-complete
Algorithms are not NP-complete. That's a type error. Problems can be NP-complete, not algorithms.
> with the algorithm taking an arbitrarily long time to terminate
This has no relationship to something being NP-complete at all.
> but I strongly suspect that 'true AI', for useful definitions of that term, is at best NP-complete, possibly much worse
I think the author means that "true AI" returns answers quickly and with high accuracy? A statement that has no relationship to NP-completeness at all.
> For the uninitiated, a transformer is basically a big pile of linear algebra that takes a sequence of tokens and computes the likeliest next token
This is wrong on many levels. A Transformer is not a linear network; linear networks are well-characterized and they aren't powerful enough to do much. It's the non-linearities in the Transformer that allow it to work. And only decoders compute the distribution over the next token.
> More specifically, they are fed one token at a time, which builds an internal state that ultimately guides the generation of the next token
Totally wrong. This is why Transformers killed RNNs. Transformers are given all input tokens simultaneously and attend to them in parallel, then produce the next token one at a time. RNNs don't have that ability to process tokens simultaneously. This is just totally the wrong mental model of what a Transformer is.
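A toy illustration of the difference (numpy, single head, no learned projections, toy sizes, nothing from a real model): the causal attention layer updates every position in one shot, while the RNN has no choice but to walk the sequence step by step.

    import numpy as np

    T, d = 5, 8                      # sequence length, model width
    rng = np.random.default_rng(0)
    x = rng.normal(size=(T, d))      # embeddings for all T input tokens

    def causal_self_attention(x):
        # All T tokens are processed at once; the mask only blocks attention
        # to future positions.
        q, k, v = x, x, x            # single head, no projections, for brevity
        scores = q @ k.T / np.sqrt(d)
        mask = np.triu(np.ones((T, T), dtype=bool), k=1)
        scores[mask] = -np.inf
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v           # shape (T, d), computed in parallel

    def rnn(x):
        # The RNN consumes the sequence one token at a time.
        W = rng.normal(size=(d, d)) * 0.1
        h = np.zeros(d)
        for t in range(T):           # inherently sequential
            h = np.tanh(x[t] + W @ h)
        return h

    print(causal_self_attention(x).shape)  # (5, 8): every position updated at once
    print(rnn(x).shape)                    # (8,): one state after T sequential steps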
> This sounds bizarre and probably impossible, but the huge research breakthrough was figuring out that, by starting with essentially random coefficients (weights and biases) in the linear algebra, and during training back-propagating errors, these weights and biases could eventually converge on something that worked.
Again, totally wrong. Gradient descent dates back to the 1800s. Backprop dates back to the 60s and 70s. So this clearly wasn't the key breakthrough of Transformers.
> This inner loop isn't Turing-complete – a simple program with a while loop in it is computationally more powerful. If you allow a transformer to keep generating tokens indefinitely this is probably Turing-complete, though nobody actually does that because of the cost.
This isn't what Turing-completeness is. And by definition all practical computing is not a Turing Machine, simply because TMs require an infinite tape. Our actual machines are all roughly Linear Bounded Automata. What's interesting is that this doesn't really provide us with anything useful.
> Transformers also solved scaling, because their training can be unsupervised
Unsupervised methods predate Transformers by decades and were already the state of the art in computer vision by the time Transformers came out.
> In practice, the transformer actually generates a number for every possible output token, with the highest number being chosen in order to determine the token.
Greedy decoding isn't the default in most applications.
> The problem with this approach is that the model will always generate a token, regardless of whether the context has anything to do with its training data.
Absolutely not. We have things like end tokens exactly for this, to allow the model to stop generating.
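A toy sketch of those last two points together (sampling rather than always taking the top logit, and stopping when an end token is drawn); the vocabulary, the fake model, and the EOS id are all made up for illustration, and real decoders add top-k/top-p, penalties, etc.

    import numpy as np

    VOCAB = ["the", "cat", "sat", "<eos>"]
    EOS_ID = 3
    rng = np.random.default_rng(0)

    def fake_model(tokens):
        # Stand-in for a real model: returns one logit per vocabulary entry.
        logits = rng.normal(size=len(VOCAB))
        logits[EOS_ID] += 0.3 * len(tokens)   # EOS gets likelier as output grows
        return logits

    def sample(logits, temperature=0.8):
        # Temperature sampling: the highest logit is only more likely, not forced.
        probs = np.exp(logits / temperature)
        probs /= probs.sum()
        return rng.choice(len(logits), p=probs)

    tokens = []
    for _ in range(20):                        # hard cap as a safety net
        next_id = sample(fake_model(tokens))
        if next_id == EOS_ID:                  # the model "chose" to stop
            break
        tokens.append(next_id)

    print(" ".join(VOCAB[t] for t in tokens))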
I got tired of reading at this point. This is drivel by someone who has no clue what's going on.
> This isn't what Turing-completeness is. And by definition all practical computing is not a Turing Machine, simply because TMs require an infinite tape.
I think you are too triggered and entitled in your nit-picking. It's obvious that in a potentially finite universe an infinite tape can't exist, but for practical purposes in CS, Turing-completeness means the logic is expressive enough to emulate a TM, regardless of tape size.
>Expect OpenAI to crash, hard, with investors losing their shirts.
Lol, someone doesn't understand how the power structure works: "the golden rule". There is a saying that if you owe the bank 100k, you have a problem; if you owe the bank ten million, the bank has a problem. OpenAI and the other players have made this bubble so big that there is no way the power system will allow itself to take the hit. Expect some sort of tax-subsidized bailout in the near future.
Tax-subsidized bailouts are usually structured to protect the corporation as a separate entity, creditors (compared to letting the org continue on its unimpeded path), employees, and sometimes, beyond their position as current creditors, suppliers, but they less often protect existing equity holders.
I truly hate how absurdly over-hyped this LLM concept of AI is. However, it can do some cool things. Normally that's the path to PMF. The real problem is the revenue: there is no extant revenue model whatsoever. But there are industries that are described as "a business of pennies", e.g. telephony. Someone may yet eke out a win. But the hype-to-reality conversion will come first.
This blog post is full of bizarre statements and the author seems almost entirely ignorant of the history or present of AI. I think it's fair to argue there may be an AI bubble that will burst, but this blog post is plainly wrong in many ways.
Here's a few clarifications (sorry this is so long...):
"I should explain for anyone who hasn't heard that term [AI winter]... there was much hope, as there is now, but ultimately the technology stagnated. "
The term AI winter typically refers to a period of reduced funding for AI research/development, not the technology stagnating (the technology failing to deliver on expectations was the cause of the AI winter, not the definition of AI winter).
"[When GPT3 came out, pre-ChatGPT] People were saying that this meant that the AI winter was over, and a new era was beginning."
People tend to agree there were two AI winters already, one having to do with symbolic AI disappointments/general lack of progress (70s), and the latter related to expert systems (late 80s). That AI winter has long been over. The Deep Learning revolution started in ~2012, and by 2020 (GPT 3) huge amount of talent and money were already going into AI for years. This trend just accelerated with ChatGPT.
"[After symbolic AI] So then came transformers. Seemingly capable of true AI, or, at least, scaling to being good enough to be called true AI, with astonishing capabilities ... the huge research breakthrough was figuring out that, by starting with essentially random coefficients (weights and biases) in the linear algebra, and during training back-propagating errors, these weights and biases could eventually converge on something that worked."
Transformers came about in 2017. The first wave of excitement about neural nets and backpropagation goes all the way back to the late 80s/early 90s, and AI (computer vision, NLP, to a lesser extent robotics) were already heavily ML-based by the 2000s, just not neural-net based (this changed in roughly 2012).
"All transformers have a fundamental limitation, which can not be eliminated by scaling to larger models, more training data or better fine-tuning ... This is the root of the hallucination problem in transformers, and is unsolveable because hallucinating is all that transformers can do."
The 'highest number' token is not necessarily chosen, this depends on the decoding algorithm. That aside, 'the next token will be generated to match that bad choice' makes it sound like once you generate one 'wrong' token the rest of the output is also wrong. A token is a few characters, and need not 'poison' the rest of the output.
That aside, there are plenty of ways to 'recover' from starting to go down the wrong route. A key aspect of why reasoning in LLMs works well is that it typically incorporates backtracking - going earlier in the reasoning to verify details or whatnot. You can do uncertainty estimation in the decoding algorithm, use a secondary model, plenty of things (here is a detailed survey https://arxiv.org/pdf/2311.05232 , one of several that is easy to find).
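A small sketch of the first of those options, estimating uncertainty straight from the decoder's own distribution and flagging low-confidence tokens (e.g. to trigger a retry, a secondary model, or a citation check); the logits below are toy values, not from any real model.

    import numpy as np

    def token_confidence(logits):
        # Softmax the logits, then use max probability and entropy as two
        # cheap per-token uncertainty signals.
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        entropy = -np.sum(probs * np.log(probs + 1e-12))
        return probs.max(), entropy

    # Pretend logits for three generated tokens: confident, confident, unsure.
    steps = [
        np.array([8.0, 1.0, 0.5, 0.2]),
        np.array([6.5, 2.0, 1.0, 0.1]),
        np.array([2.1, 2.0, 1.9, 1.8]),   # nearly flat distribution -> low confidence
    ]

    for i, logits in enumerate(steps):
        p_max, h = token_confidence(logits)
        flag = "FLAG" if p_max < 0.5 else "ok"
        print(f"token {i}: p_max={p_max:.2f} entropy={h:.2f} {flag}")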
"The technology won't disappear – existing models, particularly in the open source domain, will still be available, and will still be used, but expect a few 'killer app' use cases to remain, with the rest falling away."
A quick google search shows ChatGPT currently has 800 million weekly active users who are using it for all sorts of things. AI-assisted programming is certainly here to stay, and there are plenty of other industries in which AI will be part of the workflow (helping do research, take notes, summarize, build presentations, etc.)
I think discussion is good, but it's disappointing to see stuff with this level of accuracy on the front page of HN.
"The technology is essentially a failure" is in the headline of this article. I have to disagree with that. For the first time in the history of the UNIVERSE, an entity exists that can converse in human language at the same level that humans can.
But that's me being a sucker. Because in reality this is just a clickbait headline for an article basically saying that the tech won't fully get us to AGI and that the bubble will likely pop and only a few players will remain. Which I completely agree with. It's really not that profound.
The poster is right. LLMs are Gish Gallop machines that produce convincing-sounding output.
People have figured it out by now. Generative "AI" will fail; other forms may continue, though it would be interesting to hear from experts in other fields how much fraud there is. There are tons of materials science "AI" startups, and it is hard to believe they all deliver.
Well, correctness (though not only correctness) sounds convincing, the most convincing even, and ought to be cheaper, information-theory-wise, to generate than a fabrication, I think.
So if this assumption holds, the current tech might have some ceiling left if we just continue to pour resources down the hole.
How do LLMs do on things that are common confusions? Do they specifically have to be trained against them?
I'm imagining a Monty Hall problem that isn't in the training set tripping them up the same way a full wine glass does
Modern LLMs could be like the equivalent of those steam powered toys the Romans had in their time. Steam tech went through a long winter before finally being fully utilized for the Industrial Revolution. We’ll probably never see the true AI revolution in our lifetime, only glimpses of what could be, through toys like LLMs.
What we should underscore though, is that even if there is a new AI winter, the world isn’t going back to what it was before AI. This is it, forever.
Generations ahead will gaslight themselves into thinking this AI world is better, because who wants to grow up knowing they live in a shitty era full of slop? Don’t believe it.
The development of steam technology is a great metaphor. The basic understanding of steam as a thing that could yield some kind of mechanical force almost certainly predated even the Romans. That said, it was the synthesis of other technologies with these basic concepts that started yielding really interesting results.
Put another way, the advent of industrialized steam power wasn't so much about steam per se, but rather the intersection of a number of factors (steam itself obviously being an important one). This intersection became a lot more likely as the pace of innovation in general began accelerating with the Enlightenment and the ease with which this information could be collected and synthesized.
I suspect that the LLM itself may also prove to be less significant than the density of innovation and information of the world it's developed in. It's not a certainty that there's a killer app on the scale of mechanized steam, but the odds of such significant inventions arguably increase as the basics of modern AI become basic knowledge for more and more people.
It's mostly metallurgy. The fact that we became so much better and more precise at metallurgy enabled us to make use of steam machines. Of course a lot of other things helped (glassmaking and whale oil immediately come to mind), but mostly, metallurgy.
I remember reading an article that argued that it was basically a matter of being path dependent. The earliest steam engines that could do useful work were notoriously large and fuel-inefficient, which is why their first application was for pumps in coal mines - it effectively made the fuel problem moot and similarly their other limitations were not important in that context, while at the same time rising wages in UK made even those inefficient engines more affordable than manual labor. And then their use in that very narrow niche allowed them to be gradually improved to the point where they became suitable for other contexts as well.
But if that analogy holds, then LLM use in software development is the "new coal mines" where it will be perfected until it spills over into other areas. We're definitely not at the "Roman stage" anymore.
The development of LLM's required access to huge amounts of decent training data.
It could very well be that the current generation of AI has poisoned the well for any future endeavors of creating AI. You can't trivially filter out the AI slop, and humans are less likely to make their handcrafted content freely available for training. In fact, training models on GPL code in violation of the license might be ruled illegal, along with generally stricter rules on which data you are allowed to use for training.
We might have reached a local optimum that is very difficult to escape from. There might be a long, long AI winter ahead of us, for better or worse.
> the world isn’t going back to what it was before AI. This is it, forever.
I feel this so much. I thought my longing for the pre-smartphone days was bad, but damn, we have lost so much.
Maybe the path to redemption for AI stealing jobs would be if people would have to be rehired en masse to write and produce art, about whatever they want, so that it can be used to train more advanced AI.
LLMs are useful tools, but certainly have big limitations.
I think we'll continue to see anything be automated that can be automated in a way that reduces head count. So you have the dumb AI as a first line of defense and lay off half the customer service you had before.
In the meantime, fewer and fewer jobs (especially entry level), a rising poor class as the middle class is eliminated and a greater wealth gap than ever before. The markets are going to also collapse from this AI bubble. It's just a matter of when.
The only way I could see it fizzling as a product category is if it turns out it is not economically feasible to operate a sustainable service. Will users pay a fair price to keep the LLM datacenters running, without speculative investment subsidies?
The other aspect of the winter is government investment, rather than commercial. What could the next cycle of academic AI research look like? E.g. exploration that needs to happen in grant-funded university labs instead of venture-funded companies?
The federal funding picture seems unclear, but that's true across the board right now for reasons that have nothing to do with AI per se.
I think of LLMs like brains or CPUs. They're the core that does the processing, but they need to be embedded in a bigger system to be useful. Even if LLMs plateau, there will be a lot of development and improvement in the systems that use these LLMs. We will be seeing a lot of innovation going forward, especially in systems that are able to monetize these LLMs.
> I agree with everything you wrote, the technology is unbelievable and 6 years ago, maybe even 3.1 years would have been considered magic.
People said the same thing about ELIZA in 1967.
> The argument that computational complexity has something to do with this could have merit but the article certainly doesn’t give indication as to why.
OP says it is because the predicted next token can be correct or not, but it always looks plausible, because plausibility is what the model calculates. Therefore it is dangerous and cannot be fixed, because that is how it works in essence.
I just want to point out a random anecdote.
Literally yesterday, ChatGPT hallucinated an entire feature of a mod for a video game I am playing, including making up a fake console command.
It just straight up doesn’t exist, it just seemed like a relatively plausible thing to exist.
This is still happening. It never stopped happening. I don’t even see a real slowdown in how often it happens.
It sometimes feels like the only thing saving LLMs are when they’re forced to tap into a better system like running a search engine query.
Another anecdote. I've got a personal benchmark that I try out on these systems every time there's a new release. It is an academic math question which could be understood by an undergraduate, and which seems easy enough to solve if I were just to hammer it out over a few weeks. My prompt includes a big list of mistakes it is likely to fall into and which it should avoid. The models haven't ever made any useful progress on this question. They usually spin their wheels for a while and then output one of the errors I said to avoid.
My hit/miss rate with using these models for academic questions is low, but non-trivial. I've definitely learned new math because of using them, but it's really just an indulgence because they make stuff up so frequently.
I get generally good results from prompts asking for something I know definitely exists or is definitely possible, like an ffmpeg command I know I've used in the past but can't remember. Recently I asked how to do something in ImageMagick which I'd not done before but which felt like the kind of thing ImageMagick should be able to do. It made up a feature that doesn't exist.
Maybe I should have asked it to write a patch that implements that feature.
To take a different perspective on the same event.
The model expected a feature to exist because it fitted with the overall structure of the interface.
This in itself can be a valuable form of feedback. I currently don't know of anyone doing it, but testing interfaces by getting LLMs to use them could be an excellent resource. If the AI runs into trouble, it might be worth checking your designs to see if you have any inconsistencies, redundancies, or other confusion-causing issues.
One would assume that a consistent user interface would be easier for both AI and humans. Fixing the issues would improve it for both.
That failure could be leveraged into an automated process that identified areas to improve.
When asking questions, I use ChatGPT only as a turbo search engine. Having it double-check its sources and citations has helped tremendously.
There is no difference between "hallucination" and "soberness", it's just a database you can't trust.
The response to your query might not be what you needed, similar to interacting with an RDBMS and mistyping a table name and getting data from another table or misremembering which tables exist and getting an error. We would not call such faults "hallucinations", and shouldn't when the database is a pile of eldritch vectors either. If we persist in doing so we'll teach other people to develop dangerous and absurd expectations.
No it's absolutely not. One of these is a generative stochastic process that has no guarantee at all that it will produce correct data, and in fact you can make the OPPOSITE guarantee, you are guaranteed to sometimes get incorrect data. The other is a deterministic process of data access. I could perhaps only agree with you in the sense that such faults are not uniquely hallucinatory, all outputs from an LLM are.
I don't agree with these theoretical boundaries you provide. Any database can appear to lack in determinism, because data might get deleted, corrupted or mutated. Hardware and software involved might fail intermittently.
The illusion of determinism in RDBMS systems is just that, an illusion. The reason why I used the examples of failures in interacting with such systems that I did is that most experienced developers are familiar with those situations and can relate to them, while the probability for the reader to having experienced a truer apparent indeterminism is lower.
LLMs can provide an illusion of determinism as well; some are quite capable of repeating themselves, e.g. overfitting, intentional or otherwise.
Yep. All these do is “hallucinate”. It’s hard to work those out of the system because that’s the entire thing it does. Sometimes the hallucinations just happen to be useful.
This seems unnecessarily pedantic. We know how the system works, we just use "hallucination" colloquially when the system produces wrong output.
Other people do not, hence the danger and the responsibility of not giving them the wrong impression of what they're dealing with.
Sorry, I'm failing to see the danger of this choice of language? People who aren't really technical don't care about these nuances. It's not going to sway their opinion one way or another.
If the information it gives is wrong, but is grammatically correct, then the "AI" has fulfilled its purpose. So it isn't really "wrong output" because that is what the system was designed to do. The problem is when people use "AI" and expect it will produce truthful responses - it was never designed to do that.
You are preaching to the choir.
But the point is that everyone uses the phrase "hallucinations" and language is just how people use it. In this forum at least, I expect everyone to understand that it is simply the result of next token generation and not an edge case failure mode.
"Eldritch vectors" is a perfect descriptor, thank you.
I like asking it about my great great grandparents (without mentioning they were my great great grandparents just saying their names, jobs, places of birth).
It hallucinates whole lives out of nothing but stereotypes.
> It sometimes feels like the only thing saving LLMs are when they’re forced to tap into a better system like running a search engine query.
This is actually very profound. All free models are only reasonable if they scrape 100 web pages (according to their own output) before answering. Even then they usually have multiple errors in their output.
Responding with "skill issue" in a discussion is itself a skill issue. Maybe invest in some conversational skills and learn to be constructive rather than parroting a useless meme.
First of all, there is no such thing as "prompt engineering". Engineering, by definition, is a matter of applying scientific principles to solve practical problems. There are no clear scientific principles here. Writing better prompts is more a matter of heuristics, intuition, and empiricism. And there's nothing wrong with that — it can generate a lot of business value — but don't presume to call it engineering.
Writing better prompts can reduce the frequency of hallucinations but frequent hallucinations still occur even with the latest frontier LLMs regardless of prompt quality.
So you are saying the acceptable customer experience for these systems is that we need to explicitly tell them to accept defeat when they can't find any training content or web search results that match my query closely enough?
Why don't they have any concept of a percentage of confidence in their answer? (A rough sketch of what that could look like is below.)
It isn’t 2022 anymore, this is supposed to be a mature product.
Why am I even using this thing rather than using the game’s own mod database search tool? Or the wiki documentation?
What value is this system adding for me if I’m supposed to be a prompt engineer?
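For what it's worth, a rough sketch of the "accept defeat" behaviour being asked for above: gate the answer on whether retrieval actually found something relevant, and surface a crude confidence instead of always producing prose. The search function, relevance scores, and threshold are all hypothetical placeholders.

    from dataclasses import dataclass

    @dataclass
    class SearchHit:
        url: str
        relevance: float  # assume the retriever returns a 0..1 relevance score

    def fake_search(query):
        # Stand-in for a real search/RAG backend.
        if "console command" in query:
            return []                                  # nothing relevant found
        return [SearchHit("https://example.com/doc", 0.82)]

    def answer(query, threshold=0.6):
        hits = fake_search(query)
        best = max((h.relevance for h in hits), default=0.0)
        if best < threshold:
            return f"I couldn't find a reliable source for this (confidence {best:.0%})."
        return f"Answer drawn from {hits[0].url} (confidence {best:.0%})."

    print(answer("what is the console command for mod X"))
    print(answer("how do I resize an image"))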
> What value is this system adding....
https://news.ycombinator.com/item?id=44588383
Is this supposed to be some kind of mic drop?
Technologically, I believe that you're right. On the other hand, the previous AI winters happened despite novel, useful technologies, some of which proved extremely useful and actually changed the world of software. They happened because of overhype, and then investors moving on to the next opportunity.
Here, the investors are investing in LLMs. Not in AlphaFold, AlphaGo, neurosymbolic approaches, etc. If (when) LLMs prove insufficient to the insane level of hype, and if (when) experience shows that there is only so much money you can make with LLMs, it's possible that the money will move on to other types of AI, but there's a chance it will actually go to something entirely different, perhaps quantum, leaving AI in winter.
> that enables a computer to pretty effectively understand natural language
I'd argue that it pretty effectively mimics natural language. I don't think it really understands anything, it is just the best madlibs generator that the world has ever seen.
For many tasks, this is accurate 99+% of the time, and the failure cases may not matter. Most humans don't perform any better, and arguably regurgitate words without understanding as well.
But if the failure cases matter, then there is no actual understanding and the language the model is generating isn't ever getting "marked to market/reality" because there's no mental world model to check against. That isn't going to be usable if there are real-world consequences of the LLM getting things wrong, and they can wind up making very basic mistakes that humans wouldn't make--because we can innately understand how the world works and aren't always just stringing words together that sound good.
I don't think anybody expects AI development to stop. A winter is defined by a relative drying-up of investment and, importantly, it's almost certain that any winter will eventually be followed by another summer.
The pace of investment in the last 2 years has been so insane that even Altman has claimed that it's a bubble.
> I could see many arguments about why modern research will fail to create AGI
Why is AGI even necessary? If the loop between teaching the AI something and it being able to repeat similar enough tasks becomes short enough, days or hours instead of months, who cares if some ill-defined bar of AGI is met?
> something fundamental has changed that enables a computer to pretty effectively understand natural language.
You understand how the tech works right? It's statistics and tokens. The computer understands nothing. Creating "understanding" would be a breakthrough.
Edit: I wasn't trying to be a jerk. I sincerely wasn't. I don't "understand" how LLMs "understand" anything. I'd be super pumped to learn that bit. I don't have an agenda.
It astonishes me how people can make categorical judgements on things as hard to define as 'understanding'.
I would say that, except for the observable and testable performance, what else can you say about understanding?
It is a fact that LLMs are getting better at many tasks. From their performance, they seem to have an understanding of say python.
The mechanistic way this understanding arises is different than humans.
How can you then say it is 'not real' without invoking the hard problem of consciousness, at which point we've hit a completely open question?
To be fair, it can be hard to define “chair” to the satisfaction of an unsympathetic judge.
“Do chairs exist?”:
https://m.youtube.com/watch?v=fXW-QjBsruE
I think it is fair to say that AIs do not yet "understand" what they say or what we ask them.
When I ask it to use a specific MCP to complete a certain task, and it proceeds to not use that MCP, this indicates a clear lack of understanding.
You might say that the fault was mine, that I didn't setup or initialize the MCP tool properly, but wouldn't an understanding AI recognize that it didn't have access to the MCP and tell me that it cannot satisfy my request, rather than blindly carrying on without it?
LLMs consistently prove that they lack the ability to evaluate statements for truth. They lack, as well, an awareness of their unknowing, because they are not trying to understand; their job is to generate (to hallucinate).
It astonishes me that people can be so blind to this weakness of the tool. And when we raise concerns, people always say
"How can you define what 'thinking' is?" "How can you define 'understanding'?"
These philosophical questions are missing the point. When we say it doesn't "understand", we mean that it doesn't do what we ask. It isn't reliable. It isn't as useful to us as perhaps it has been to you.
As someone who was an engineer on the original Copilot team, yes I understand how tech works.
You don’t know how your own mind “understands” something. No one on the planet can even describe how human understanding works.
Yes, LLMs are vast statistical engines but that doesn’t mean something interesting isn’t going on.
At this point I’d argue that humans “hallucinate” and/or provide wrong answers far more often than SOTA LLMs.
I expect to see responses like yours on Reddit, not HN.
Before one may begin to understand something, one must first be able to estimate the level of certainty. Our robot friends, while really helpful and polite, seem to be lacking in that department. They actually think the things we've written on the internet, in books, academic papers, court documents, newspapers, etc. are true. Where humans aren't omniscient, it fills in the blanks with nonsense.
> As someone who was an engineer on the original Copilot team
Right, so "as someone who is a sociopath completely devoid of ethics" you were one of the cogs in the machine who said "fuck your license, we're training our llm on your code whether you like it or otherwise".
> that doesn’t mean something interesting isn’t going on
Wow! Such double negative. Much science. So rigor.
> At this point I’d argue that humans “hallucinate” and/or provide wrong answers far more often than SOTA LLMs.
Yikes, may the Corpo Fascist Gods protect any pitiable humans still in your life.
> I expect to see responses like yours on Reddit, not HN.
I suppose that says something about both of us.
We could use a little more kindness in discussion. I think the commenter has a very solid understanding of how computers work. The "understanding" question is somewhat complex, but I do agree with you that we are not there yet. I do think the paradigm shift, though, is more about the fact that we can now interact with the computer in a new way.
You understand how the brain works right? It's probability distributions mapped to sodium ion channels. The human understands nothing.
I've heard that this human brain is rigged to find what it wants to find.
That's how the brain works, not how the mind works. We understand the hardware, not the software.
Are we even sure we understand the hardware? My understanding is even that is contested, for example orchestrated objective reduction, holonomic brain theory or GVF theory.
The end effect certainly gives off an "understanding" vibe, even if the method of achieving it is different. The commenter obviously didn't mean the way a human brain understands.
Birds and planes operate using somewhat different mechanics, but they do both achieve flight.
Birds and planes are very similar other than the propulsion and landing gear, and construction materials. Maybe bird vs helicopter, or bird vs rocket.
"I don't "understand" how LLMs "understand" anything."
Why does the LLM need to understand anything? What today's chatbots have achieved is a software engineering feat. They have taken a stateless token-generation machine that has compressed the internet's text to predict the next token, and have 'hacked' a whole state-management machinery around it (a toy sketch of that loop is below). The end result is a product that just feels like another human conversing with you and remembering your last birthday.
Engineering will surely get better and while purists can argue that a new research perspective is needed, the current growth trajectory of chatbots, agents and code generation tools will carry the torch forward for years to come.
If you ask me, this new AI winter will thaw in the atmosphere even before it settles on the ground.
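A toy sketch of that state-management point a few lines up: the model call itself has no memory, so the wrapper re-sends the accumulated transcript on every turn. The model function here is a fake stand-in, not any vendor's API.

    def stateless_model(prompt: str) -> str:
        # Pretend next-token machine: it only ever sees the text it is given.
        if "birthday" in prompt.lower():
            return "Happy birthday again! You mentioned it earlier."
        return "Noted."

    def chat():
        history = []                              # all "state" lives out here
        for user_msg in ["My birthday was yesterday.", "What did I just tell you?"]:
            history.append(f"User: {user_msg}")
            prompt = "\n".join(history) + "\nAssistant:"
            reply = stateless_model(prompt)       # whole transcript passed each turn
            history.append(f"Assistant: {reply}")
            print(f"> {user_msg}\n{reply}")

    chat()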
Every time I see comments like these I think about this research from anthropic: https://www.anthropic.com/research/mapping-mind-language-mod...
LLMs activate similar neurons for similar concepts not only across languages, but also across input types. I’d like to know if you’d consider that as a good representation of “understanding” and if not, how would you define it?
Anthropic is pretty notorious for peddling hype. This is a marketing article - it has not undergone peer-review and should not be mistaken for scientific research.
it has a proper paper attached right at the beginning of the article
It’s not peer-reviewed, and was never accepted by a scientific journal. It’s a marketing paper masquerading as science.
If I could understand what the brain scans actually meant, I would consider it a good representation. I don't think we know yet what they mean. I saw some headline the other day about a person with "low brain activity", and said person was in complete denial about it; I would be too.
As I said then, and probably echoing what other commenters are saying: what do you mean by understanding when you say computers understand nothing? Do humans understand anything? If so, how?
“You understand how the brain works right? It’s neurons and electrical charges. The brain understands nothing.”
I’m always struck by how confidently people assert stuff like this, as if the fact that we can easily comprehend the low-level structure somehow invalidates the reality of the higher-level structures. As if we know concretely that the human mind is something other than emergent complexity arising from simpler mechanics.
I’m not necessarily saying these machines are “thinking”. I wish I could say for sure that they’re not, but that would be dishonest: I feel like they aren’t thinking, but I have no evidence to back that up, and I haven’t seen non-self-referential evidence from anyone else.
You don't understand how the tech works, then.
LLMs aren't as good as humans at understanding, but it's not just statistics. The stochastic parrot meme is wrong. The networks create symbolic representations in training, with huge multidimensional correlations between patterns in the data, whether they're temporal or semantic. The models "understand" concepts like emotions, text, physics, arbitrary social rules and phenomena, and anything else present in the data and context in the same fundamental way that humans do. We're just better, with representations a few orders of magnitude higher resolution, much wider redundancy, and multi-million-node parallelism with asynchronous operation that silicon can't quite match yet.
In some cases, AI is superhuman, and uses better constructs than humans are capable of, in other cases, it uses hacks and shortcuts in representations, mimics where it falls short, and in some cases fails entirely, and has a suite of failure modes that aren't anywhere in the human taxonomy of operation.
LLMs and AI aren't identical to human cognition, but there's a hell of a lot of overlap, and the stochastic parrot "ItS jUsT sTaTiStIcS!11!!" meme should be regarded as an embarrassing opinion to hold.
"Thinking" models that cycle context and systems of problem solving also don't do it the same way humans think, but overlap in some of the important pieces of how we operate. We are many orders of magnitude beyond old ALICE bots and MEgaHAL markov chains - you'd need computers the size of solar systems to run a markov chain equivalent to the effective equivalent 40B LLM, let alone one of the frontier models, and those performance gains are objectively within the domain of "intelligence." We're pushing the theory and practice of AI and ML squarely into the domain of architectures and behaviors that qualify biological intelligence, and the state of the art models clearly demonstrate their capabilities accordingly.
For any definition of understanding you care to lay down, there's significant overlap between the way human brains do it and the way LLMs do it. LLMs are specifically designed to model constructs from data, and to model the systems that produce the data they're trained on, and the data they model comes from humans and human processes.
You appear to be a proper alchemist, but you can't support an argument of understanding if there is no definition of understanding that isn't circular. If you want to believe the friendly voice really understands you, we have a word for that: faith. The skeptic sees the interactions with a chatbot as a statistical game that shows how uninteresting (e.g. predictable) humans and our stupid language are. There are useful gimmicks coming out, like natural language processing for low-risk applications, but this form of AI pseudoscience isn't going to survive; it will just take some time for research to catch up and describe the falsehoods of contemporary AI toys.
Understanding is the thing that happens when your neurons coalesce into a network of signaling and processing such that it empowers successful prediction of what happens next. This powers things like extrapolation, filling in missing parts of perceived patterns, temporal projection, and modeling hidden variables.
Understanding is the construction of a valid model. In biological brains, it's a vast parallelized network of columns and neuron clusters in coordinated asynchronous operation, orchestrated to ingest millions of data points both internal and external, which results in a complex and sophisticated construct comprising the entirety of our subjective experience.
LLMs don't have the subjective experience module, explicitly. They're able to emulate the bits that are relevant to being good at predicting things, so it's possible that every individual token inference process produces a novel "flash" of subjective experience, but absent the explicit construct and a persistent and coherent self construct, it's not mapping the understanding to the larger context of its understanding of its self in the same way humans do it. The only place where the algorithmic qualities needed for subjective experience reside in LLMs is the test-time process slice, and because the weights themselves are unchanged in relation to any novel understanding which arises, there's no imprint left behind by the sensory stream (text, image, audio, etc.) Absent the imprint mechanism, there's no possibility to perpetuate the construct we think of as conscious experience, so for LLMs, there can never be more than individual flashes of subjectivity, and those would be limited to very low resolution correlations a degree or more of separation away from the direct experience of any sensory inputs, whereas in humans the streams are tightly coupled to processing, update in real-time, and persist through the lifetime of the mind.
The pieces being modeled are the ones that are useful. The utility of consciousness has been underexplored; it's possible that it might be useful in coordination and orchestration of the bits and pieces of "minds" that are needed to operate intelligently over arbitrarily long horizon planning, abstract generalization out of distribution, intuitive leaps between domains that only relate across multiple degrees of separation between abstract principles, and so on. It could be that consciousness will arise as an epiphenomenological outcome from the successful linking together of systems that solve the problems LLMs currently face, and the things which overcome the jagged capabilities differential are the things that make persons out of human minds.
It might also be possible to orchestrate and coordinate those capabilities without bringing a new mind along for the ride, which would be ideal. It's probably very important that we figure out what the case is, and not carelessly summon a tortured soul into existence.
It could very well be that statistics and tokens is how our brains work at the computational level too. Just that our algorithms have slightly better heuristics due to all those millennia of A/B testing of our ancestors.
Except we know for a fact that the brain doesn’t work that way. You’re ignoring the entire history of neuroscience.
I think it’s a disingenuous read to assume original commenter means “understanding” in the literal sense. When we talk about LLM “understanding”, we usually mean it from a practical sense. If you give an input to the computer, and it gives you an expected output, then colloquially the computer “understood” your input.
What do you mean by “understand”? Do you mean conscious?
Understand just means “parse language” and is highly subjective. If I talk to someone African in Chinese they do not understand me but they are still conscious.
If I talk to an LLM in Chinese it will understand me but that doesn’t mean it is conscious.
If I talk about physics to a kindergartner they will not understand but that doesn’t mean they don’t understand anything.
Do you see where I am going?
GOFAI was also a paradigm shift, regardless of that winter. For example, banks started automating assessments of creditworthiness.
What we didn't get was what had been expected, namely things like expert systems that were actual experts, so called 'general intelligence' and war waged through 'blackboard systems'.
We've had voice controlled electronics for a long time. On the other hand, machine vision applications have improved massively in certain niches, and also allowed for new forms of intense tyranny and surveillance where errors are actually considered a feature rather than a bug since they erode civil liberties and human rights but are still broadly accepted because 'computer says'.
While you could likely argue "leaps and bounds with novel methods utilizing AI for chemistry, computational geometry, biology etc." by downplaying the first part or clarifying that it is mainly an expectation, I think most people are going to, for the foreseeable future, keep seeing "AI" as more or less synonymous with synthetic infantile chatbot personalities that substitute for human contact.
LLMs are an amazing advancement. The tech side of things is very impressive. Credit where credit is due.
Where the current wave all falls apart is on the financials. None of that makes any sense and there’s no obvious path forward.
Folks say handwavy things like “oh they’ll just sell ads” but even a cursory analysis shows that math doesn’t add up relative to the sums of money being invested at the moment.
Tech wise I’m bullish. Business wise, AI is setting up to be a big disaster. Those that aimlessly chased the hype are heading for a world of financial pain.
Hard disagree. I'm in the process of deploying several AI solutions in healthcare. We have a process a nurse usually spends about an hour on, and it costs $40-$70 depending on whether they are offshore and a few other factors. Our AI can match it at a few dollars, often less. A nurse still reviews the output, but it takes way less time. The economics of those tokens is great. We have another solution that just finds money: $10-$30 in tokens can find hundreds of thousands of dollars. The tech isn't perfect (that's why we still have a human in the loop), but it's more than good enough to do useful work, and the use cases are valuable.
It's true, but do you really trust the AI generated + Nurse Review output more than Organic Nurse generated?
In my experience, management types use the fact that AI generated + Nurse Review is faster to push a higher quota of forms generated per hour.
Eventually, from fatigue or boredom, the human in the loop just ends up being a rubber stamper. Would you trust this with your own or your children's life?
The human in the loop becomes a lot less useful when they're pressured to hit a quota against an AI that's basically stochastic "most probable next token" generation, aka a professional bullshitter, literally trained to produce plausible outputs with no accountability for accurate ones.
These same questions could be asked about self-driving cars, but they've been shown to be consistently safer drivers than humans. If this guy is getting consistently better results from AI+human than from humans alone, what does it matter that the former makes errors, given that the latter makes more of them and costs more?
TFA's whole point is that there is no easy way to tell whether LLM output is correct. Driving mistakes provide instant feedback on whether the AI doing the driving got it right. Bad comparison.
If the cars weren't considerably safer drivers than humans they wouldn't be allowed on the road. There isn't as much regulation blocking deploying this healthcare solution... until those errors actually start costing hospitals money from malpractice lawsuits (or not), we don't know whether it will be allowed to remain in use.
I think they were referring to the costs of training and hosting the models. You're counting the cost of what you're buying, but the people selling it to you are in the red.
Correct
Wrong. OpenAI is literally the only AI company with horrific financials. You think Google is actually bleeding money on AI? They are funding it all with cash flow and still have monster margins.
OpenAI may be the worst, but I am pretty sure Anthropic is still bleeding money on AI, and I would expect a bunch of smaller dedicated AI firms are too; Google is the main firm with competitive commercial models at the high end across multiple domains that is funding AI efforts largely from its own operations (and even there, AI isn’t self sufficient, its just an internal rather than an external subsidy.)
> You think Google is actually bleeding money on AI? They are funding it all with cash flow and still have monster margins.
They can still be "bleeding money on AI" if they're making enough in other areas to make up for the loss.
The question is: "Are LLMs profitable to train and host?" OpenAI, being a pure LLM company, will go bankrupt if the answer is no. The equivalent for Google is to cut its losses and discontinue the product. Maybe Gemini will have the same fate as Google+.
Are the companies providing these AI services actually profitable? My impression is that AI prices are grossly suppressed and might explode soon.
It appears very much not. There has been some suggestion that inference may be “profitable” on a unit basis but that’s ignoring most of the costs. When factoring everything in most of these look very much upside down.
While there is demand at the moment, it's also unclear what the demand would be if the prices were "real", i.e. what it would take to run a sustainable business.
Those sound like typical bootstrap-sized workflow optimization opportunities, which are always available but have a modest ceiling on both sales volume and margin.
That's great that you happened to find a way to use "AI solutions" for this, but it fits precisely inside the parents "tech wise, I'm bullish" statement. It's genuinely new tech, which can unearth some new opportunities like this, by addressing many niche problems that were either out of reach before or couldn't be done efficiently enough before. People like yourself should absolutely be looking for smart new small businesses to build with it, and maybe you'll even be able to grow that business into something incredible for yourself over the next 20 years. Congratulations and good luck.
The AI investment bubble that people are concerned about is about a whole different scale of bet being made; a bet which would only have possibly paid off if this technology completely reconfigured the economy within the next couple years. That really just doesn't seem to be in the cards.
Well said.
Folks were super bullish tech-wise on the internet when it was new, and that turned out to be correct. It was also correct that the .com bubble wiped out a generation of companies, and those that survived took a decade or more to recover.
The same thing is playing out here… tech is great and not going away but also the business side is increasingly looking like another implosion waiting to happen.
>Folks say handwavy things like “oh they’ll just sell ads” but even a cursory analysis shows that math doesn’t add up relative to the sums of money being invested at the moment.
Ok, so I think there's 2 things here that people get mixed on.
First, inference of the current state of the art is cheap now. There are no two ways about it. Statements from Google and Altman, as well as the prices third parties charge for tokens of top-tier open-source models, paint a pretty good picture. Ads would be enough to make OpenAI a profitable company selling current SOTA LLMs to consumers.
Here's the other thing that mixes things up. Right now, OpenAI is not just trying to be 'a profitable company'. They're not just trying to stay where they are and build a regular business off it. They are trying to build and serve 'AGI', or as they define it, 'highly autonomous systems that outperform humans at most economically valuable work'. They believe that building and serving this machine to hundreds of millions would require costs an order (or orders) of magnitude greater.
In service of that purpose is where all the 'insane' levels of money are moving. They don't need hundreds of billions of dollars in data centers to stay afloat or be profitable.
If they manage to build this machine, then those costs don't matter, and if things are not working out midway, they can just drop the quest. They will still have an insanely useful product that is already used by hundreds of millions every week, as well as the margins and unit economics to actually make money off of it.
If OpenAI was the only company doing this that argument might sort of make sense.
The problem is they have real competition now and that market now looks like an expensive race to an undifferentiated bottom.
If someone truly invents AGI and it’s not easily copied by others then I agree it’s a whole new ballgame.
The reality is that years into this we seem to be hitting a limit to what LLMs can do, with only marginal improvements with each release. On that path, this gets ugly fast.
As far as consumer LLMs go, they don't really have competition. Well, they do, but it's more Google vs Bing than Android vs Apple. Second place is a very, very distant second, and almost all growth and usage is still being funneled to OpenAI. Even if it's 'easily copied', getting there first could prove extremely valuable.
Most of the researchers outside big tech only have access to a handful of consumer GPUs at best. They are under a lot of pressure to invent efficient algorithms. The cost coming down by orders of magnitude seems like a good bet.
The business trajectory will be like Uber. A few big companies (Google, OpenAI) will price their AI services at a loss until consumers find it to be indispensable and competitors run out of money, then they'll steadily ramp up the pricing to the point where they're gouging consumers (and raking in profits) but still a bit cheaper or better than alternatives (humans in this case).
>None of that makes any sense and there’s no obvious path forward.
The top end models with their high compute requirements probably don't but there is value in lower end models for sure.
After all, it's the AWS approach. Most of AWS's services are stuff you can easily get cheaper if you just rent an EC2 instance and set it up yourself. But because AWS offers a very simple setup, companies don't mind paying for it.
Don't confuse OpenAI financials with Google financials. OpenAI could fold and Google would be fine.
Google would actually be in grave danger… of drowning themselves in champagne.
> Tech wise I’m bullish. Business wise, AI is setting up to be a big disaster. Those that aimlessly chased the hype are heading for a world of financial pain.
I'm not going to pretend to be on the cutting edge of news here, but isn't this where on-device models becomes relevant? It sounds like Apple's neural engine or whatever in the M5 have seen noteworthy performance improvements, and maybe in a few more generations, we don't need these openai-sized boondoggles to benefit from the tech?
> Folks say handwavy things like “oh they’ll just sell ads” but even a cursory analysis shows that math doesn’t add up relative to the sums of money being invested at the moment.
We should factor in that messaging that's seamless and undisclosed in conversational LLM output will be a lot more valuable than what we think of as advertising today.
But will also completely poison the well, reducing over-all trust and usage.
I don't see that happening. People have stuck with streaming and social networking as they've trended user-hostile. And with LLMs an even greater type of dependence is being cultivated.
This has convinced many non-programmers that they can program, but the results are consistently disastrous, because it still requires genuine expertise to spot the hallucinations.
I've been programming for 30+ years and am now a people manager. Claude Code has enabled me to code again, and I'm several times more productive than I ever was as an IC in the 2000s and 2010s. I suspect this person hasn't really tried the most recent generation; it is quite impressive and works very well if you do know what you are doing.
If you’ve been programming for 30+ years, you definitely don’t fall under the category of “non-programmers”.
You have decades upon decades of experience on how to approach software development and solve problems. You know the right questions to ask.
The actual non-programmers I see on Reddit are having discussions about topics such as “I don’t believe that technical debt is a real thing” and “how can I go back in time if Claude Code destroyed my code”.
People learning to code have always had those questions and issues, though. For example, "git ate my code" or "I don't believe in Python using whitespace as a bracket, so I'm going to end all my blocks with #endif".
Isn't that what the author means?
"it still requires genuine expertise to spot the hallucinations"
"works very well if you do know what you are doing"
But it can work well even if you don't know what you are doing (or don't look at the impl).
For example, build a TUI or GUI with Claude Code while only giving it feedback on the UX/QA side. I've done it many times despite 20 years of software experience. -- Some stuff just doesn't justify me spending my time credentializing in the impl.
Hallucinations that lead to code that doesn't work just get fixed. Most code I write isn't like "now write an accurate technical essay about hamsters", where hallucinations can sneak through unless I scrutinize it; rather, the code would simply fail to work and trigger the LLM's feedback loop to fix it when it tries to run/lint/compile/typecheck it.
But the idea that you can only build with LLMs if you have a software engineer copilot isn't true and inches further away from true every month, so it kinda sounds like a convenient lie we tell ourselves as engineers (and understandably so: it's scary).
> Hallucinations that lead to code that doesn't work just get fixed
How about hallucinations that lead to code that doesn't work outside of the specific conditions that happen to be true in your dev environment? Or, even more subtly, hallucinations that lead to code which works but has critical security vulnerabilities?
The article's headline starts with "LLMs are a failure"; it's hard to take the author seriously with such hyperbole, even if the second part of the headline ("A new AI winter is coming") might be right.
I have a journalist friend with 0 coding experience who has used ChatGPT to help them build tools to scrape data for their work. They run the code, report the errors, repeat, until something usable results. An agent would do an even better job. Current LLMs are pretty good at spotting their own hallucinations if they're given the ability to execute code.
The author seems to have a bias. The truth is that we _do not know_ what is going to happen. It's still too early to judge the economic impact of current technology - companies need time to understand how to use this technology. And, research is still making progress. Scaling of the current paradigms (e.g. reasoning RL) could make the technology more useful/reliable. The enormous amount of investment could yield further breakthroughs. Or.. not! Given the uncertainty, one should be both appropriately invested and diversified.
Last week I gave Antigravity a try, with the latest models and all. It generated subpar code that did the job very quickly, for sure, but no one would ever have accepted this code in a PR; it took me 10x more time to clean it up than it took Gemini to shit it out.
The only thing I learned is that 90% of devs are code monkeys with very low expectations which basically amount to "it compiles and seems to work then it's good enough for me"
For toy and low effort coding it works fantastic. I can smash out changes and PRs fantastically quick, and they’re mostly correct. However, certain problem domains and tough problems cause it to spin its wheels worse than a junior programmer. Especially if some of the back and forth troubleshooting goes longer than one context compaction. Then it can forget the context of what it’s tried in the past, and goes back to square one (it may know that it tried something, but it won’t know the exact details).
That was true six months ago - the latest versions are much better at memory and adherence, and my senior engineer friends are adopting LLMs quickly for all sorts of advanced development.
..and works very well if you do know what you are doing
That's the issue. AI coding agents are only as good as the dev behind the prompt. It works for you because you have an actual background in software engineering of which coding is just one part of the process. AI coding agents can't save the inexperienced from themselves. It just helps amateurs shoot themselves in the foot faster while convincing them they're a marksman.
It seems to work well if you DONT really know what you are doing. Because you can not spot the issues.
If you know what you are doing it works kind of mid. You see how anything more then a prototype will create lots of issues in the long run.
Dunning-Kruger effect in action.
I am simply stunned at the negativity.
Yes, there is hype.
But if you actually filter it out, instead of (over)reacting to it in either direction, progress has been phenomenal, and the fact that there is visible progress in many areas, including LLMs, on the order of months suggests there are no walls.
Visible progress doesn’t mean astounding progress. But any tech that is improving year to year is moving at a good speed.
Huge apparent leaps in recent years seem to have spoiled some people. Or perhaps desensitized them. Or perhaps, created frustration that big leaps don’t happen every week.
I can’t fathom anyone not using models for 1000 things. But we all operate differently, and have different kinds of lives, work and problems. So I take claims that individuals are not getting much from models at face value.
But the fact that some people are not finding value isn't an argument that the value, and increasing value, the rest of us are getting isn't real.
We're getting improvements coming out month by month. No reasonable person would look at a technology improving at this frequency and write a blog post about how we've hit the ceiling.
Yeah, totally agreed - there is still far too little negativity in comparison to what is going on.
There was value in leaded gas and asbestos insulation too, nobody denies that.
You're blind to all the negative side effects: AI-generated slop ads, engagement traps, political propaganda, scams, &c. The amount of pollution is incredible. Search engines are dead, blogs are dead, YouTube is dead, social media is dead; it's virtually impossible to find non-slop content, and the ratio is probably already 50:1 by now.
And these are only the most visible things. I know a few companies losing hundreds of hours every month replying to support tickets that are fully LLM-generated and more often than not don't make any sense. Another big topic is education.
Ironically, if generative AI ends up killing social media, it might actually be a net positive. How do you avoid engaging with AI content? Why, go find an actual human IRL and speak with them.
I have pretty negative feelings about all this stuff and how the future will look, but I also have to admit it's crazy how good it is at so many things I would have considered safe a few years ago, before ChatGPT.
There are a couple of really disingenuous bloggers out there who have big audiences themselves, and are "experts" for other people's audiences, who push hard on this narrative that AI is a joke, will never progress beyond where it is today, and is actually completely useless and just a scam. This is comforting for those of us who worry about AI more than we are excited about it, so some eat it up while barely trying it for themselves.
Yeah after reading the intro lines I noped out of this garbage blog (?). Maybe it's their SEO strategy to write the flat-out untrue incendiary stuff like "the technology is essentially a failure" - If that's the case, I don't have time for this.
Maybe they actually think this way, then, I certainly have time for this.
What do I have time for? Virtual bonding with fellow HN ppl :)
Interesting take. His argument is basically that LLMs have hit their architectural ceiling and the industry is running on hype and unsustainable economics. I’m not fully convinced, but the points about rising costs and diminishing returns are worth paying attention to. The gap between what these models can actually do and what they’re marketed as might become a real problem if progress slows.
Claude Code + Opus 4.5 does exactly what it says on the box.
Today I pasted a screenshot of a frontend dropdown menu with the prompt "add an option here to clear the query cache". Claude found the relevant frontend files, figured out the appropriate backend routes/controllers to edit, and submitted a PR.
I think the unsustainably cheap consumer-facing AI products are the spoonful of sugar getting us to swallow a technology that will almost entirely be used to make agents that justify mass layoffs.
And surveil and rat on us, sometimes incorrectly.
The existence of an AI hype train doesn’t mean there isn’t a productive AI no-hype train.
Context: I have been writing software for 30 years. I taught myself assembly language and hacked games/apps as a kid, and have been a professional developer for 20 years. I’m not a noob.
I’m currently building a real-time research and alerting side project using a little army of assistant AI developers. Given a choice, I would never go back to how I developed software before this. That isn’t my mind poisoned by hype and marketing.
I think we are not even close to using the potential of current LLMs. Even if the capabilities of LLMs did not improve, we will see better performance on the software and hardware side. It is no longer a question of "if" but of "when" there will be a Babelfish-like device available. And this is only one obvious application; I am 100% sure that people are still finding useful new applications of AI every day.
However, there is a real risk that AI stocks will crash and pull the entire market down, just like it happened in 2000 with the dotcom bubble. But did we see an internet or dotcom winter after 2000? No, everybody kept using the Internet, Windows, Amazon, Ebay, Facebook and all the other "useless crap". Only the stock market froze over for a few years and previously overhyped companies had a hard time, but given the exaggeration before 2000 this was not really a surprise.
What will happen is that the hype train will stop or slow down, and people will no longer get thousands, millions, billions, or trillions in funding just because they slap "AI" to their otherwise worthless project. Whoever is currently working on such a project should enjoy the time while it lasts - and rest assured that it will not last forever.
Well, the original "AI winter" was caused by defense contracts running out without anything to show for it -- turns out, the generals of the time could only be fooled by Eliza clones for so long...
The current AI hype is fueled by public markets, and as they found out during the pandemic, the first one to blink and acknowledge the elephant in the room loses, bigly.
So, even in the face of a devastating demonstration of "AI" ineffectiveness (which I personally haven't seen, despite things being, well, entirely underwhelming), we may very well be stuck in this cycle for a while yet...
I’m fascinated by people who say that LLMs have failed in practice.
Last week, when I was on PTO, I used AI to do a full redesign of a music community website I run. I touched about 40k lines of code in a week. The redesign is shipped and everyone is using it. AI let me go about 5-10x faster than if I had done this by hand. (In fact, I have tried doing this in the past, so I really do have an apples-to-apples comparison for velocity. AI enabled it happening at all: I've tried a few other times in the past but never been able to squeeze it into a week.)
The cited 40% inaccuracy rate doesn’t track for me at all. Claude basically one-shot anything I asked for, to the point that the bottleneck was mostly thinking of what I should ask it to do next.
At this point, saying AI has failed feels like denying reality.
Yes, and I've had similar results. I'm easily 10x more productive with AI, and I'm seeing the same in my professional network. The power of AI is so clear and obvious that I'm astonished so many folks remain in vigorous denial.
So when I read articles like this, I too am fascinated by the motivations and psychology of the author. What is going on there? The closest analogue I can think of is Climate Change denialism.
On a music blog, yes! Now go try to rewrite the firmware for your car.
95% of the work engineers on this forum do is high-level apps like a "music blog", not car firmware...
And then there's the small % doing music firmware :)
(and no, I don't find LLMs much use on this)
When the hype is infinite (technological singularity and utopia), any reality will be a let down.
But there is so much real economic value being created - not speculation, but actual business processes - billions of dollars - it’s hard to seriously defend the claim that LLMs are “failures” in any practical sense.
Doesn’t mean we aren’t headed for a winter of sobering reality… but it doesn’t invalidate the disruption either.
Other than inflated tech stocks making money off the promise of AI, what real economic impact has it actually had? I recall plenty of articles claiming that companies are having trouble actually manifesting the promised ROI.
My company’s spending a lot of money doing things they could have done fifteen or more years ago with classic computer vision libraries and other pre-LLM techniques, cheaper and faster.
Most of the value of AI for our organization is the hype itself providing the activation energy, if you will, to make these projects happen. The value added by the AI systems per se has been minimal.
(YMMV but that’s what I’m seeing at the non-tech bigco I’m at—it’s pretty silly but the checks continue to clear so whatever)
> not speculation, but actual business processes
Is there really a clear-cut distinction between the two in today's VC and acquisition based economy?
Hype Infinity is a form of apologia that I haven’t seen before.
This is why I hate hype.
"We just cured cancer! All cancer! With a simple pill!"
"But you promised it would rejuvenate everyone to the metabolism of a 20 year old and make us biologically immortal!"
New headline: "After spending billions, project to achieve immortality has little to show..."
Blog posts like this make me think model adoption, and matching the model to an appropriate use case, is... lumpy at best. Every time I read something like it I wonder what tools they are using and how. Modern systems are not raw transformers. A raw transformer will "always output something," they're right, but nobody deploys naked transformers. This is like claiming CPUs can't do long division because the ALU doesn't natively understand decimals. Also, a model is a statistical approximation trained on the empirical distribution of human knowledge work. It is not trying to compute exact solutions to NP-complete problems. Nature does not require worst-case complexity, and real-world cognitive tasks are not worst-case NP-hard instances...
> AI has failed.
> The rumor mill has it that about 95% of generative AI projects in the corporate world are failures.
AI tooling has only just barely reached the point where enterprise CRUD developers can start thinking about it. LangChain only reached v1.0.0 in the last 60 days (Q4 2025); OpenAI effectively announced support for MCP in Q2 2025. The spec didn't even approach maturity until Q4 of 2024. Heck, most LLMs didn't have support for tools in 2024.
In 2-3 years, a lot of these libraries will be partway through their roadmaps towards v2.0.0, fixing many of the pain points, fleshing out QOL improvements, and establishing standard patterns for integrating different workflows. Consumer streaming of audio and video on the web was a mess until around 2009, despite browsers having plugins for it going back over a decade. LLMs continue to improve at a rapid rate, but tooling matures more slowly.
Of course previous experiments failed or were abandoned; the technology has been moving faster than the average CRUD developer can implement features. A lot of the "cutting edge" technology we put into our product in 2023 is now a standard feature in the free tier of market leaders like ChatGPT. Why bother maintaining a custom fork of 2023-era (effectively stone age) technology when free-tier APIs do it better in 2025? MCP might not be the be-all, end-all, but at least it is a standard interface that's maintainable, in a way that developers of mature software can begin conceiving of integrating into their product as a permanent feature, rather than as a curiosity MVP built at the behest of a non-technical exec.
A lot of the AI-adjacent libraries we've been using finally hit v1.0.0 this year, or are creeping close to it, providing stable interfaces for maintainable software. It's time to hit the reset button on "X% of internal AI initiatives failed".
I am of the belief that the upcoming winter will look more like normalization than collapse.
The reason is that hype deflation and technical stagnation don't have to arrive together. Once people stop promising AGI by Christmas and clamp down on infinite growth + infinite GPU spend, things will start to look more normal.
At this point, it feels like the financing story was the shaky part, not the tech or the workflows. LLMs have changed workflows in a way that's very hard to unwind now.
I think it's worth discounting against the failure rates of humans too.
> Depending on the context, and how picky you need to be about recognizing good or bad output, this might be anywhere from a 60% to a 95% success rate, with the remaining 5%-40% being bad results. This just isn't good enough for most practical purposes
This seems to suggest that humans are 100%. I'd be surprised if I were anywhere close to that after 10 years of programming professionally.
I agree, I have over 20 years of software engineering experience and after vibe coding/engineering/architecting (or whatever you want to call it) for a couple months, I also don't see the technology progressing further. LLMs are more or less the same as 6 months ago, incremental improvements, but no meaningful progress. And what we have is just bad. I can use it because I know the exact code I want generated and I will just re-prompt if I don't get what I want, but I'm unconvinced that this is faster than a good search engine and writing code myself.
I think I will keep using it while it's cheap, but once I have to pay the real costs of training/running a flagship model I think I will quit. It's too expensive as it is for what it does.
Yet, the real cost of inference for a given model complexity is dropping.
> LLMs are more or less the same as 6 months ago, incremental improvements, but no meaningful progress.
Go back to an older version of an LLM and then say the same. You will notice that older LLM versions do less, have more issues, write worse code, etc.
There have been large jumps in the last 2 years, but because it's not like we go from an LLM straight to AGI, people underestimate the gains.
Trust me, try it: Claude 3.7 > 4.0 > 4.5 > Opus 4.5 ...
> I'm unconvinced that this is faster than a good search engine and writing code myself.
What I see is mostly somebody who is standoffish about LLMs... just like I was. I tried to shoehorn their code generation into "my" code, and while that works reasonably, you tend to see the LLM working on "your code" as an invasion. So you never really use its full capabilities.
LLMs really work best if you plan, plan, plan, and then have them create the code for you. The moment you try to get an LLM to work inside existing code, especially code structured the way YOU like it, you tend to be more standoffish.
> It's too expensive as it is for what it does.
Copilot is around 27 euros/month (annual payment, plus the dollar/euro rate) for 1,500 requests here. We pay 45 euros per month just for basic 100 Mbit internet. Will it get more expensive in the future? Oh yes, for sure. But we may also have alternatives by then. Open-source / open-weight models are getting better and better, especially with MoE.
Pricing is how you look at it. If I do work in a few days that would normally take me a month, what is the price then? Or work I would normally need to outsource, or some monotonous, repetitive, end-me-now %@#$ code, done in a short time by paying a few cents to an LLM...
Reality is, things change. Just like the farmers who complained about the tractor while their neighbor got much more work done thanks to that thing, LLMs are the same.
I often see the most resistance from us older programmer folks, who are set in our ways, when it's us who are actually the best at wrangling LLMs, because we have the experience to quickly spot where an LLM goes wrong and guide it down the right path. Tell it that the direction it's debugging in is totally wrong, and where the bug more likely is...
For the last 2 years I paid just the basic cheap $10 subscription and used it, but never heavily. It helped with those monotonous tasks, etc. Until a month ago, when I needed a specific new large project and decided to just agent/vibe code it, at first just to try. And THEN I realized how much I was missing out on. Yes, that was not "my" code, and when the click came in my head that "I am the manager, not the programmer", you suddenly gain a lot.
It's that click that is hard for most seasoned veterans. And it's ironically often the seasoned guys who complain the most about AI... when it's the same folks who can get the most out of LLMs.
> Trust me, try it, try Claude 3.7 > 4.0 > 4.5 > Opus 4.5 ...
I started with Sonnet 4 and am now using Opus 4.5. I don't see a meaningful difference. I think I'm a bit more confident to one-prompt some issues, but that's it. I think the main issue is that I always knew what and how to prompt (the same skill as googling), so I can adjust once I learn what a model can do. Sonnet is kinda the same for me as Opus.
> LLMs really work the best, if you plan, plan, plan, and then have them create the code for you. The moment you try to get the LLMs work inside existing code, that is especially structured how YOU like it, people tend to be more standoffish.
My project is AI code only. Around 30k lines, and I never wrote a single line. I know I cannot just let it vibe code, because I lost a month getting rid of the AI spaghetti it created in the beginning. It just got stuck rewriting and making new bugs as soon as it fixed one. Since then I'm doing a lot more handholding and a crazy amount of unit/e2e testing. Which, btw, is a huge limiting factor. Now I want a powerful dev machine again, because if unit + e2e testing takes more than a couple of seconds it slows down the LLMs.
> Pricing is how you look at it... If i do the work what takes me a months in a few days, what is the price then?
I spent around 200 USD on subscriptions so far. I wanted to try out Opus 4.5 so I splurged on a Claude Max subscription this month. It's definitely a very expensive hobby.
> And its ironically often the seasoned guys that complain the most about AI
I think because we understand what it can do.
Is "A New AI Winter Is Coming" the new "The Year of Linux Desktop"?
No, the new "Year of the Linux Desktop" is "This Year Software Engineers Won't Exist Any More Because Of LLMs". Very obviously.
Token economics are very sound - it's the training which is expensive.
Tokens/week have gone up 23x year-over-year according to https://openrouter.ai/rankings. This is probably around $500M-1B in sales per year.
The real question is where the trajectory of this rocket ship is going. Will per-token pricing be a race to the bottom against budget Chinese model providers? Will we see another 20x year-over-year over the next 3 years, or will it level out sooner?
> it's a horrible liability to have to maintain a codebase that nobody on the team actually authored.
exactly - just like 99.9873% of all codebases currently running in production worldwide :)
I think the author is onto something. but (s)he didn’t highlight that there are some scenarios where factual accuracy is unimportant, or maybe even a detractor.
for example, fictional stories. If you want to be entertained and it doesn’t matter whether it’s true or not, there are no downsides to “hallucinations”. you could argue that stories ARE hallucinations.
another example is advertisements. what matters is how people perceive them, not what’s actually true.
or, content for a political campaign.
the more I think about it, genAI really is a perfect match for social media companies
Have you ever seen how much work goes into writing a novel? Even a short one? Character and world building are not easy, and they require very logical reasoning even if the premises are imaginary. You can't say a character is human and then give them three hands in the next sentence.
While both are unlikely, if I have to choose one I would bet on AGI than AI winter in the next five years.
AI just got better and better. People thought it couldn't solve math problems without a human formalizing them first. Then it did. People thought it couldn't generate legible text. Then it did.
All while people swore it had reached a "plateau," "architecture ceiling," "inherent limit," or whatever synonym of the goalpost.
Can't wait
These sorts of articles are the new "economists have predicted 17 of the past 3 recessions".
AlexNet was only released in 2012. The progress made in just 13 years has been insane. So while I do agree that we are approaching a "back to the drawing board" era, calling the past 13 years a "failure" is just not right.
> This means that they should never be used in medicine, for evaluation in school or college, for law enforcement, for tax assessment, or a myriad of other similar cases.
If AI models can deliver measurably better accuracy than doctors, clearer evaluations than professors and fairer prosecutions than courts, then it should be adopted. Waymo has already shown a measurable decrease in loss of life by eliminating humans from driving.
I believe that, technically, modern LLMs are sufficiently advanced to meaningfully disrupt the aforementioned professions, as Waymo has done for taxis. Waymo's success relies on two non-LLM factors that we've yet to see for other professions. First is exhaustive collection and labelling of in-domain, high-quality data. Second is the destruction of the pro-human regulatory lobby (thanks to work done by Uber in the ZIRP era that came before).
To me, an AI winter isn't a concern, because AI is not the bottleneck. It is regulatory opposition and sourcing human experts who will train their own replacements. Both are significantly harder to get around for high-status white collar work. The great-AI-replacement may still fail, but it won't be because of the limitations of LLMs.
> My advice: unwind as much exposure as possible you might have to a forthcoming AI bubble crash.
Hedging when you have much at stake is always a good idea. Bubble or no bubble.
Eric S. Raymond (yes, that ESR; insert long-winded DonHopkins rant, extensively cross-referenced to his own USENET and UNIX-HATERS mailing list postings, about what a horrible person "Eric the Flute" is) reports a shortening of time to implement his latest project from weeks to hours. He says the advantages of LLMs in the programming space are yet to truly unfold, because they not only let you do things faster, they let you try things you wouldn't have started otherwise because it'd be too much of a time/effort commitment.
Assuming these claims are even partially true, we'd be stupid—at the personal and societal level—not to avail ourselves of these tools and reap the productivity gains. So I don't see AI going away any time soon. Nor will it be a passing fad like Krugman assumed the internet would be. We'd have to course-correct on its usage, but it truly is a game changer.
The winters are the best part, economic harm aside.
Winters are when technology falls out of the vice grip of Capital and into the hands of the everyman.
Winters are when you’ll see folks abandon this AIaaS model for every conceivable use case, and start shifting processing power back to the end user.
Winters ensure only the strongest survive into the next Spring. They’re consequences for hubris (“LLMs will replace all the jobs”) that give space for new things to emerge.
So, yeah, I’m looking forward to another AI winter, because that’s when we finally see what does and does not work. My personal guess is that agents and programming-assistants will be more tightly integrated into some local IDEs instead of pricey software subscriptions, foundational models won’t be trained nearly as often, and some accessibility interfaces will see improvement from the language processing capabilities of LLMs (real-time translation, as an example, or speech-to-action).
That, I’m looking forward to. AI in the hands of the common man, not locked behind subscription paywalls, advertising slop, or VC Capital.
I'm so annoyed by this negativity. Is AI perfect? No, far from it. Does it have a positive impact on productivity? 100%. Do I care about financials? Absolutely not. There's the regular hype cycle. But these idiotic takes like "dotcom 2.0" and "AI winter" just show that the author has no clue what they are talking about.
That 100% claim is doing an awful lot of work. There is ample documentation from most corporate rollouts showing the opposite, and as a SWE I find it worse than useless...
Most “AI is rubbish” takes treat it as an open-loop system: prompt → code → judgment. That’s not how development works. Even humans can’t read a spec, dump code, and ship it. Real work is closed-loop: test, compare to spec, refine, repeat. AI shines in that iterative feedback cycle, which is where these critiques miss the point.
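To make that closed-loop point concrete, here is a minimal sketch (not from the article) of what such a cycle can look like. The names ask_llm and test_cmd are hypothetical placeholders for whatever model API and test runner you actually use; the only point is that failures are fed back in as context rather than judged once:

    import subprocess
    import tempfile

    def ask_llm(prompt: str) -> str:
        # Hypothetical placeholder: call whatever model/API you actually use
        # and return generated Python source code as a string.
        raise NotImplementedError

    def closed_loop(spec: str, test_cmd: list[str], max_rounds: int = 5) -> str:
        """Generate code, run the tests, feed failures back in, repeat."""
        feedback = ""
        for _ in range(max_rounds):
            code = ask_llm(f"Spec:\n{spec}\n\nLast test output:\n{feedback}")
            with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
                f.write(code)
                path = f.name
            # Assumed convention: the test command takes the generated file's path.
            result = subprocess.run(test_cmd + [path], capture_output=True, text=True)
            if result.returncode == 0:
                return code  # tests pass: accept this iteration
            feedback = result.stdout + result.stderr  # close the loop
        raise RuntimeError("no passing version within the round budget")

The judgment step is automated feedback from tests, not a one-shot verdict on the first draft.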
What’s tough for me is figuring out where people are realizing significant improvements from this.
If you have to set up good tests [edit: and gather/generate good test data!] and get the spec hammered out in detail and well-described in writing, plus all the ancillary stuff like access to any systems you need, sign-offs from stakeholders… dude that’s more than 90% of the work, I’d say. I mean fuck, lots of places just skip half that and figure it out in the code as they go.
How’s this meaningfully speeding things up?
what a curious coincidence that a soft-hard AI landing would happen to begin at the exact same time as the US government launches a Totally Not a Bailout Strategic Investment Plan Bro I Promise. who could have predicted this?
I sure hope another winter is on the way, the LLM hype wagon is beyond exhausting.
Every critique of AI assumes to some degree that contemporary implementations will not, or cannot, be improved upon.
Lemma: any statement about AI which uses the word "never" to preclude some feature from future realization is false.
Lemma: contemporary implementations have almost always already been improved upon, but are unevenly distributed.
(Ximm's Law)
LLMs have failed to live up to the hype, but they haven't failed outright.
Two claims here:
1) LLMs have failed to live up to the hype.
Maybe. Depends upon's who's hype. But I think it is fine to say that we don't have AGI today (however that is defined) and that some people hyped that up.
2) LLMs haven't failed outright
I think that this is a vast understatement.
LLMs have been a wild success. At big tech over 40% of checked in code is LLM generated. At smaller companies the proportion is larger. ChatGPT has over 800 million weekly active users.
Students throughout the world, and especially in the developed world, are using "AI" at 85-90% rates (per some surveys).
Between 40% and 90% of professionals (depending upon the survey and profession) are using "AI".
This is 3 years after the launch of ChatGPT (and the capabilities of ChatGPT 3.5 were so limited compared to today that it's a shame they get bundled together in our discussions). I would say that instead of "failed outright", they are the most successful consumer product of all time (so far).
> At big tech over 40% of checked in code is LLM generated. At smaller companies the proportion is larger.
I have a really hard time believing that stat without any context; is there a source for this?
from what I've seen in a several-thousand-eng company: LLMs generally produce vastly more code than is necessary, so they quickly out-pace human coders. they could easily be producing half or more of all of the code even if only 10% of the teams use it. particularly because huge changes often get approved with just a "lgtm", and LLM-coding teams also often use/trust LLMs for reviews.
but they do that while making the codebase substantially worse for the next person or LLM. large code size, inconsistent behavior, duplicates of duplicates of duplicates strewn everywhere with little to no pattern so you might have to fix something a dozen times in a dozen ways for a dozen reasons before it actually works, nothing handles it efficiently.
the only thing that matters in a business is value produced, and I'm far from convinced that they're even break-even if they were free in most cases. they're burning the future with tech debt, on the hopes that it will be able to handle it where humans cannot, which does not seem true at all to me.
Measuring the value is very difficult. However there are proxies (of varying quality) which are measured, and they are showing that AI code is clearly better than copy-pasted code (which used to be the #1 source of lines of code) and at least as "good" (again, I can't get into the metrics) as human code.
Hopefully one of the major companies will release a comprehensive report to the public, but they seem to guard these metrics.
many value/productivity metrics in use are just "Lines Of Code" in a trenchcoat. a game which LLMs are fantastic at playing.
> At big tech over 40% of checked in code is LLM generated.
Assuming this is true though, how much of that 40% is boilerplate or simple, low effort code that could have been knocked out in a few minutes previously? It's always been the case that 10% of the code is particularly thorny and takes 80% of the time, or whatever.
Not to discount your overall point, LLMs are definitely a technical success.
Before LLMs I used whatever autocomplete tech came with VSCode and the plugins I used. Now with Cursor a lot of what the autocomplete did is replaced with LLM output, at much greater cost. Counting this in the "LLM generated" statistic is misleading at best, and I'm sure it's being counted
> The technology is essentially a failure
Really? I derive a ton of value from it. For me it’s a phenomenal advancement and not a failure at all.
Zero citations, random speculations. Random blogpost versus the most rapidly adopted technology in history...
As someone who is an expert in the area, everything in this article is misleading nonsense, failing at even the most basic CS101 principles. The level of confusion here is astounding.
> People were saying that this meant that the AI winter was over
The last AI winter was over 20 years ago. Transformers came during an AI boom.
> First time around, AI was largely symbolic
Neural networks were already hot and the state of the art across many disciplines when Transformers came out.
> The other huge problem with traditional AI was that many of its algorithms were NP-complete
Algorithms are not NP-complete. That's a type error. Problems can be NP-complete, not algorithms.
> with the algorithm taking an arbitrarily long time to terminate
This has no relationship to something being NP-complete at all.
> but I strongly suspect that 'true AI', for useful definitions of that term, is at best NP-complete, possibly much worse
I think the author means that "true AI" returns answers quickly and with high accuracy? A statement that has no relationship to NP-completeness at all.
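For reference (this is standard complexity theory, nothing specific to the article), the definition itself makes the type distinction explicit:

    A decision problem $L$ is NP-complete iff
    (1) $L \in \mathrm{NP}$, and
    (2) $\forall L' \in \mathrm{NP}:\; L' \le_p L$, i.e. every problem in NP reduces to $L$ in polynomial time.
    An algorithm, by contrast, merely has a running time $T(n)$; NP-completeness is not a predicate that applies to it.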
> For the uninitiated, a transformer is basically a big pile of linear algebra that takes a sequence of tokens and computes the likeliest next token
This is wrong on many levels. A Transformer is not a linear network, linear networks are well-characterized and they aren't powerful enough to do much. It's the non-linearities in the Transformer that allows it to work. And only Decoders compute the distribution over the next token.
> More specifically, they are fed one token at a time, which builds an internal state that ultimately guides the generation of the next token
Totally wrong. This is why Transformers killed RNNs. Transformers are provided all tokens simultaneously and then produce a next token one at a time. RNNs don't have that ability to simultaneously process tokens. This is just totally the wrong mental model of what a Transformer is.
> This sounds bizarre and probably impossible, but the huge research breakthrough was figuring out that, by starting with essentially random coefficients (weights and biases) in the linear algebra, and during training back-propagating errors, these weights and biases could eventually converge on something that worked.
Again, totally wrong. Gradient descent dates back to the late 1800s early 1900s. Backprop dates back to the 60s and 70s. So this clearly wasn't the key breakthrough of Transformers.
> This inner loop isn't Turing-complete – a simple program with a while loop in it is computationally more powerful. If you allow a transformer to keep generating tokens indefinitely this is probably Turing-complete, though nobody actually does that because of the cost.
This isn't what Turing-completeness is. And by definition all practical computing is not a Turing Machine, simply because TMs require an infinite tape. Our actual machines are all roughly Linear Bounded Automata. What's interesting is that this doesn't really provide us with anything useful.
> Transformers also solved scaling, because their training can be unsupervised
Unsupervised methods predate Transformers by decades and were already the state of the art in computer vision by the time Transformers came out.
> In practice, the transformer actually generates a number for every possible output token, with the highest number being chosen in order to determine the token.
Greedy decoding isn't the default in most applications.
> The problem with this approach is that the model will always generate a token, regardless of whether the context has anything to do with its training data.
Absolutely not. We have things like end tokens exactly for this, to allow the model to stop generating.
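To put the last few corrections in one place (the whole context is handed to the model at once, sampling rather than always taking the argmax, and stopping on an end token), here is a toy decoding loop. logits_fn, the vocabulary size, and EOS_ID are made-up stand-ins, not any particular model's API:

    import numpy as np

    EOS_ID = 0  # made-up end-of-sequence token id

    def logits_fn(token_ids: list[int]) -> np.ndarray:
        # Hypothetical stand-in for a decoder-only transformer forward pass:
        # it sees the whole context at once (causal masking happens inside)
        # and returns one logit per vocabulary entry for the next position.
        rng = np.random.default_rng(len(token_ids))
        return rng.normal(size=50)  # pretend vocabulary of 50 tokens

    def sample_next(logits: np.ndarray, temperature: float = 0.8) -> int:
        # Temperature sampling, not greedy argmax.
        scaled = (logits - logits.max()) / temperature
        probs = np.exp(scaled)
        probs /= probs.sum()
        return int(np.random.default_rng().choice(len(probs), p=probs))

    def generate(prompt_ids: list[int], max_new: int = 32) -> list[int]:
        out = list(prompt_ids)
        for _ in range(max_new):
            next_id = sample_next(logits_fn(out))
            if next_id == EOS_ID:  # the model can choose to stop here
                break
            out.append(next_id)
        return out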
I got tired of reading at this point. This is drivel by someone who has no clue what's going on.
> This isn't what Turing-completeness is. And by definition all practical computing is not a Turing Machine, simply because TMs require an infinite tape.
I think you are too triggered and entitled in your nit-picking. It's obvious that in a possibly finite universe an infinite tape can't exist, but for practical purposes in CS, Turing-completeness means the expressiveness of the logic to emulate a TM regardless of tape size.
I've been reading about this supposed AI winter for at least 5 years by now, and in the meantime any AI-related stock has gone 10x and more.
5 years? That's so interesting bro. Tell us more about what your thoughts on ChatGPT were back in 2020.
Go ahead and double check when the LLM craze started and perhaps reconsider making things up.
May 2018: AI winter is well on its way (piekniewski.info) [1]
January 2020: Researchers: Are we on the cusp of an ‘AI winter’? (bbc.co.uk) [2]
I'm sure you can easily find more. Felt good to be called a "bro", though, made me feel younger.
[1] HN discussion, almost 500 comments: https://news.ycombinator.com/item?id=17184054
[2] HN discussion on BBC article, ~110 comments: https://news.ycombinator.com/item?id=22069204
>Expect OpenAI to crash, hard, with investors losing their shirts.
Lol, someone doesn't understand how the power structure works ("the golden rule"). There is a saying: if you owe the bank 100k, you have a problem; if you owe the bank ten million, the bank has a problem. OpenAI and the other players have made this bubble so big that there is no way the power system will allow itself to take the hit. Expect some sort of tax-subsidized bailout in the near future.
Tax-subsidized bailouts are usually structured to protect the corporation as a separate entity, creditors (compared to letting the org continue on its unimpeded path), employees, and sometimes, beyond their position as current creditors, suppliers, but they less often protect existing equity holders.
The difference is that OpenAI isn't financed by borrowed money.
I truly hate how absurdly over-hyped this LLM concept of AI is. However, it can do some cool things. Normally that's the path to PMF. The real problem is the revenue: there is no extant revenue model whatsoever. But there are industries described as "a business of pennies", e.g. telephony. Someone may yet eke out a win. But the hype-to-reality conversion will come first.
This blog post is full of bizarre statements and the author seems almost entirely ignorant of the history or present of AI. I think it's fair to argue there may be an AI bubble that will burst, but this blog post is plainly wrong in many ways.
Here's a few clarifications (sorry this is so long...):
"I should explain for anyone who hasn't heard that term [AI winter]... there was much hope, as there is now, but ultimately the technology stagnated. "
The term AI winter typically refers to a period of reduced funding for AI research/development, not the technology stagnating (the technology failing to deliver on expectations was the cause of the AI winter, not the definition of AI winter).
"[When GPT3 came out, pre-ChatGPT] People were saying that this meant that the AI winter was over, and a new era was beginning."
People tend to agree there were two AI winters already, one having to do with symbolic AI disappointments/general lack of progress (70s), and the latter related to expert systems (late 80s). That AI winter has long been over. The Deep Learning revolution started in ~2012, and by 2020 (GPT 3) huge amount of talent and money were already going into AI for years. This trend just accelerated with ChatGPT.
"[After symbolic AI] So then came transformers. Seemingly capable of true AI, or, at least, scaling to being good enough to be called true AI, with astonishing capabilities ... the huge research breakthrough was figuring out that, by starting with essentially random coefficients (weights and biases) in the linear algebra, and during training back-propagating errors, these weights and biases could eventually converge on something that worked."
Transformers came about in 2017. The first wave of excitement about neural nets and backpropagation goes all the way back to the late 80s/early 90s, and AI (computer vision, NLP, to a lesser extent robotics) were already heavily ML-based by the 2000s, just not neural-net based (this changed in roughly 2012).
"All transformers have a fundamental limitation, which can not be eliminated by scaling to larger models, more training data or better fine-tuning ... This is the root of the hallucination problem in transformers, and is unsolveable because hallucinating is all that transformers can do."
The 'highest number' token is not necessarily chosen; this depends on the decoding algorithm. That aside, 'the next token will be generated to match that bad choice' makes it sound like once you generate one 'wrong' token, the rest of the output is also wrong. A token is a few characters, and it need not 'poison' the rest of the output.
Beyond that, there are plenty of ways to 'recover' from starting down the wrong route. A key aspect of why reasoning in LLMs works well is that it typically incorporates backtracking - going back earlier in the reasoning to verify details or whatnot. You can do uncertainty estimation in the decoding algorithm, use a secondary model, plenty of other things (here is a detailed survey, one of several that is easy to find: https://arxiv.org/pdf/2311.05232).
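To make the decoding point concrete, here's a minimal toy sketch (the tokens and logits are made up, and this isn't any particular library's API) of temperature plus top-k sampling, which is roughly what common decoders do: the argmax token is favored but not guaranteed.

    import math
    import random

    def sample_next_token(logits, temperature=0.8, top_k=3):
        # keep only the top_k highest-scoring candidate tokens
        candidates = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        # temperature-scaled softmax over the surviving logits
        scaled = [score / temperature for _, score in candidates]
        m = max(scaled)
        weights = [math.exp(s - m) for s in scaled]
        total = sum(weights)
        probs = [w / total for w in weights]
        # sample instead of taking the argmax, so a lower-ranked token can still be emitted
        return random.choices([tok for tok, _ in candidates], weights=probs, k=1)[0]

    # made-up logits for the continuation of "The capital of France is"
    toy_logits = {" Paris": 9.1, " London": 7.4, " Lyon": 6.9, " banana": 1.2}
    print(sample_next_token(toy_logits))  # usually " Paris", but not always

Run it a few times and you'll occasionally get " London" or " Lyon". The point is just that 'pick the highest number' is one decoding choice among several, not a property of the model itself.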
"The technology won't disappear – existing models, particularly in the open source domain, will still be available, and will still be used, but expect a few 'killer app' use cases to remain, with the rest falling away."
A quick Google search shows ChatGPT currently has 800 million weekly active users who are using it for all sorts of things. AI-assisted programming is certainly here to stay, and there are plenty of other industries in which AI will be part of the workflow (helping do research, take notes, summarize, build presentations, etc.).
I think discussion is good, but it's disappointing to see stuff with this level of accuracy on the front page of HN.
"The technology is essentially a failure" is in the headline of this article. I have to disagree with that. For the first time in the history of the UNIVERSE, an entity exists that can converse in human language at the same level that humans can.
But that's me being a sucker. Because in reality this is just a clickbait headline for an article basically saying that the tech won't fully get us to AGI and that the bubble will likely pop and only a few players will remain. Which I completely agree with. It's really not that profound.
The poster is right. LLMs are Gish Gallop machines that produce convincing sounding output.
People have figured it out by now. Generative "AI" will fail; other forms may continue, though it would be interesting to hear from experts in other fields how much fraud there is. There are tons of materials science "AI" startups, and it is hard to believe they all deliver.
>produce convincing sounding output
Well, correctness (though not only correctness) sounds convincing, the most convincing even, and ought to be cheaper, information-theoretically, to generate than a fabrication, I think.
So if this assumption holds, the current tech might have some ceiling left if we just continue to pour resources down the hole.
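One rough way to poke at that assumption (this is just a probe, not a proof; it assumes the Hugging Face transformers package and the small public "gpt2" checkpoint, and the example sentences are made up) is to compare the average per-token surprisal a model assigns to a correct statement versus a fabricated one, lower surprisal being "cheaper" in the loose sense above:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    def avg_surprisal(text):
        # mean per-token cross-entropy in nats; lower means the text is "cheaper" for the model
        ids = tok(text, return_tensors="pt").input_ids
        with torch.no_grad():
            out = model(input_ids=ids, labels=ids)
        return out.loss.item()

    print(avg_surprisal("Water boils at 100 degrees Celsius at sea level."))
    print(avg_surprisal("Water boils at 73 degrees Celsius at sea level."))

Whether that kind of gap holds up at the frontier, and whether it actually translates into remaining headroom, is exactly the open question.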
How do LLMs do on things that are common confusions? Do they specifically have to be trained against them? I'm imagining a Monty Hall variant that isn't in the training set tripping them up the same way a full wine glass does.
Modern LLMs could be the equivalent of those steam-powered toys the Romans had in their time. Steam tech went through a long winter before finally being fully utilized for the Industrial Revolution. We'll probably never see the true AI revolution in our lifetime, only glimpses of what could be, through toys like LLMs.
What we should underscore though, is that even if there is a new AI winter, the world isn’t going back to what it was before AI. This is it, forever.
Generations ahead will gaslight themselves into thinking this AI world is better, because who wants to grow up knowing they live in a shitty era full of slop? Don’t believe it.
The development of steam technology is a great metaphor. The basic understanding of steam as a thing that could yield some kind of mechanical force almost certainly predated even the Romans. That said, it was the synthesis of other technologies with these basic concepts that started yielding really interesting results.
Put another way, the advent of industrialized steam power wasn't so much about steam per se, but rather the intersection of a number of factors (steam itself obviously being an important one). This intersection became a lot more likely as the pace of innovation in general began accelerating with the Enlightenment and the ease with which this information could be collected and synthesized.
I suspect that the LLM itself may also prove to be less significant than the density of innovation and information of the world it's developed in. It's not a certainty that there's a killer app on the scale of mechanized steam, but the odds of such significant inventions arguably increase as the basics of modern AI become basic knowledge for more and more people.
It's mostly metallurgy. The fact that we became so much better and more precise at metallurgy is what enabled us to make use of steam machines. Of course a lot of other things helped (glassmaking and whale oil immediately come to mind), but mostly, metallurgy.
I remember reading an article that argued it was basically a matter of path dependence. The earliest steam engines that could do useful work were notoriously large and fuel-inefficient, which is why their first application was for pumps in coal mines - that context effectively made the fuel problem moot, and their other limitations similarly didn't matter there, while at the same time rising wages in the UK made even those inefficient engines more affordable than manual labor. And then their use in that very narrow niche allowed them to be gradually improved to the point where they became suitable for other contexts as well.
But if that analogy holds, then LLM use in software development is the "new coal mines" where it will be perfected until it spills over into other areas. We're definitely not at the "Roman stage" anymore.
The development of LLMs required access to huge amounts of decent training data.
It could very well be that the current generation of AI has poisoned the well for any future attempts at creating AI. You can't trivially filter out the AI slop, and humans are less likely to make their handcrafted content freely available for training. On top of that, training models on GPL code might be ruled a license violation, and we may see generally stricter rules on which data you are allowed to use for training.
We might have reached a local optimum that is very difficult to escape from. There might be a long, long AI winter ahead of us, for better or worse.
> the world isn’t going back to what it was before AI. This is it, forever.
I feel this so much. I thought my longing for the pre-smartphone days was bad, but damn, we have lost so much.
That would create the impetus to create a training data marketplace.
Maybe the path to redemption for AI stealing jobs would be people having to be rehired en masse to write and produce art, about whatever they want, so that it can be used to train more advanced AI.
LLMs are useful tools, but certainly have big limitations.
I think we'll continue to see anything that can be automated in a way that reduces head count get automated. So you have the dumb AI as a first line of defense and lay off half the customer service staff you had before.
In the meantime: fewer and fewer jobs (especially entry-level), a rising poor class as the middle class is eliminated, and a greater wealth gap than ever before. The markets are also going to collapse from this AI bubble; it's just a matter of when.