> This is a search tool that will only return content created before ChatGPT's first public release on November 30, 2022.
The problem is that Google's search engine - and, oddly enough, ALL search engines - had already gotten worse before that. I noticed the decline several years before 2022. So AI further decreased the quality, but the quality was already trending downward. There are some attempts to analyse this on YouTube (also owned by Google - Google ruins our digital world); some explanations made sense to me, but even then I am not 100% certain why Google decided to ruin Google search.
One key observation I made was that the YouTube search was copied onto Google's regular search, which makes no sense for Google search. If I casually search for a video on YouTube, I may be semi-interested in unrelated videos. But if I search on Google search for specific terms, I am not interested in crap such as "others also searched for xyz" - that is just ruining the UI with irrelevant information. This is not the only example: Google made the search results worse here and tries to confuse the user into clicking on things. Plus the placement of ads. The quality really worsened.
I’ve had much better results with Kagi than with Google in the past few months. I’d trialed them a couple times in the past and been disappointed, but that’s no longer the case.
There are a few other powerful countries, with countless Web services, that freely wage wars on other countries and support wars in many different ways. Is there a way to avoid their products?
Whataboutism doesn't get us anywhere — saying "but what about X" (insert anything for X here) usually results in doing nothing.
Some of us would rather take a stand, imperfect as it is, than just sit and do nothing. Especially in the very clear case of someone (Kagi) doing business with a country that invaded a neighboring country for no reason, and keeps killing people there.
Why this particular stand? Is doing nothing any better than taking what are essentially random stands? Obviously if you are Ukrainian this will be an important stand to you, but otherwise doing things based on a mix of what the media you like focuses on or whatever is not really very different from doing nothing.
I think "no wars of conquest" is a bright line that was crossed by Russia, that hasn't been crossed by other nations in a long time. And I think it's important for the whole world to take a stand on that, not just the nation that was invaded. It's not a "random stand."
I find it much easier to take a strong stand on Russia/Ukraine than on Israel/Palestine. The history of Israel/Palestine is much more of a gray area. Palestine has used plenty of aggressive actions and rhetoric that make Israel's actions more understandable (if not justified).
Example of actions: Gaza invaded Israel and killed, raped, and kidnapped civilians on October 7. Ukraine had no such triggering event that caused Russia to invade.
Example of rhetoric: Gaza's political leaders have said they want to destroy Israel. I don't think anyone in power in Ukraine has said they want to destroy the Russian state.
I am amused that my (unpopular and by now downvoted) comment about the scourge of "whataboutism" sparked a discussion where comments begin with "how about" :-)
That is exactly my point! Saying "but what about" is akin to saying "you shouldn't do anything, because there is another unrelated $thing happening elsewhere". I refuse to follow this line of thinking.
First, any stand is better than whataboutism and just sitting there doing nothing.
Second, this stand results from my thoughts. It is my stand. There are many like it, but this one is mine.
Third, in the history of the modern world there were very few black&white situations where there was one side which was clearly the aggressor. This is one of them.
> First, any stand is better than whataboutism and just sitting there doing nothing.
I definitely disagree with this. There are many cases where you might take the wrong stand, especially where you do not have detailed knowledge of the issue you're taking a stand on.
Yandex has the best image search, and others are years behind it. Furthermore, Nebius has sold all of the group's businesses in Russia and certain international markets. They have been completely divested from Russia for 1.5 years already: https://nebius.com/newsroom/ynv-announces-successful-complet...
The post you linked was written when the divestment was already underway, so it is at least dishonest if not malicious.
Yandex is the government approved search engine in Russia, which is impossible without the state exerting control over it. I wouldn't pay much attention to divestment, it's not how any of that works.
Some clarification. In 2024, Yandex NV split into Nebius (an NL-registered, NASDAQ-listed company, no longer a search engine) and the Russian-based Yandex. The latter is fully controlled by Russian investors.
The governments where a software company's offices are physically located exert control over it. To follow this logic to its end and apply it even-handedly surely results in nation-based NIH syndrome?
You are talking about an entity whose ownership is 99.8% Russian nationals and state companies; whose employees for the most part are Russian nationals, whose main market is Russia and with very little tangible assets that can be arrested in the Netherlands. The only reason for this "divestment" is sanctions evasion.
At least in my area, there are legal avenues if alimony goes unpaid. Assets can be seized to pay off late payments and wages can be garnished.
Its a different story if the payer truly can't afford to pay the alimony, but at that point they wouldn't have the immense power you are concerned with.
You are mistaken to think that zealots can be reasoned with. They have been conditioned to react upon anything “Russia” like a Pavlovian cue, a command of the trained animal. They are a herd that moves as a herd, based on cues of lead animals. No amount of proof or evidence will ever dissuade them from a position that the herd is moving in. They cannot reason on their own and lack the courage to separate, let alone say something that the herd disapproves of, lest they be expelled from the herd and ganged up on.
I find this amusing, because it seems like Kagi's target audience (politically polarized) dislikes this, while I, as someone who is not Kagi's target audience (politically neutral), like it.
There's a secret third dimension you can ascend to through a hole in the neutral middle where the forces of the other two axes cancel out. 'The Elites' doesn't want you to know this.
Wait, what? Their choice is specifically a politically neutral one, wouldn't that mean their target audience is a politically neutral one? Why is your impression that Kagi's target audience is politically polarized users? Been a paying user of Kagi for years, never got that impression.
FWIW, I don't think Kagi should remove or avoid indexing content from countries that invade others, because a lot of the time websites in those countries have useful information on them. If Kagi were to enact such a block, it would no longer surface results from HN, Reddit, and a bunch of other communities, effectively making the search engine a lot less useful.
Why is supporting Yandex, who are involved in Russian politics and linked to the ruling regime, a neutral decision? That is very much a political decision, in the same way that working with US tech companies is a political decision. You need to decide what you're willing to tolerate and where your ethical lines are drawn; the alternative isn't neutrality, it's nihilism.
Imo, Kagi is still the better option, because it isn't supporting the global surveillance mechanism we call advertising. All these people, missing the forest for the single yandex tree.
Why's that something to be aware of? Yandex is actually a good search engine, so I'm told, as long as you don't search for things related to Russian politics. Kagi presumably knows this and won't use their results related to Russian politics.
Feels more like a scare campaign to me - someone doesn't want you to use Kagi, and points to Yandex as a reason for that.
So if America invades Venezuela should we all stop using google? Should we have stopped using google when the U.S. invaded Iraq and killed 150,000 people[1]?
Should we stop using products imported from China for the cultural genocide they've perpetrated against the Uyghurs?[2]
You can take whatever stand you want. When there's a country that killed, raped, and tried to exterminate most of Eastern Europe, we can choose to cut any and all ties with it and consider them, for all intents and purposes, terrorists.
I sort-of see where you're coming from, but it also looks like a double standard to me. Don't buy search from a company that uses an API from another company that is (or was? unclear) based in a country that invaded another country and completely upended the world order. For some people that's a line they don't want to cross, and I get it.
However if that's the case how can they continue buying Chinese products when China has done the same thing, but worse, and for longer, to their own population? Because it's less convenient to stop? _That_ to me lands squarely in the "take whatever stand you want" category with the addendum of, "and don't worry if it doesn't make sense."
Is it because it's within their own borders and therefore isn't our problem?
And the fact that there are other countries that should also be considered terrorists, doesn't mean we shouldn't boycott this one. It means we should boycott them all. But boycotting a few is still better than nothing.
Honest answers are yes, yes, and yes. It may be unavoidable for the average person to avoid imported goods from China, but we should remain aware of our place in the world and try where we can. If the US does invade Venezuela, I sincerely hope that individuals and business owners try to cut as many ties with complicit US tech companies as possible. Honestly, with this clusterfuck of war crimes going on over "drug boats," I hope they're already starting.
I don't agree with this logic. It implies that people who use Google, Bing, and a million other products made by US-based companies are supportive of the huge number of atrocities committed or aided by the United States. Or other countries. It feels very odd to single out Russia's invasion of Ukraine while minimizing the Israeli genocide of Palestinians in Gaza, the multiple unjust wars waged by the United States all over the world, etc.
It's often fairly easy to find US government-centric news and criticism with Google.
But as one counterexample: The end of the US penny was formed and announced not with public legislative discourse, nor even with an executive order, but with a brief social media post by the president.
And I don't mean that it's atrocious or anything, but I wanted to see that social media post myself. Not a report about it, or someone's interpretation of it, but -- you know -- the actual utterance from the horse's mouth.
Which should be a simple matter. After all, it's the WWW.
And I've been Googling for as long as there has been a Google to Google with. I'd like to think that I am proficient at getting results from it.
But it was like pulling teeth to get Google to eventually, kicking and screaming, produce a link to the original message on Truth Social.
If that kind of active reluctance isn't censorship on Google's part, then what might it be described as instead?
And if they're seeking to keep me away from the root of this very minor issue, then what else might they also be working to keep me from?
There certainly is a huge army of people ready to spout this sort of nonsense in response to anyone talking about doing anything.
Hard to know what percentage of these folks are trying to assuage their own guilt and what percentage are state actors. Russia and Israel are very chronically online, and it behooves us internet citizens to keep that in mind.
If you are concerned about heinous war crimes and the slaughter of civilians to the point that you don't want to use private services from countries that conduct such acts, you should avoid both already.
so it's like humans vs. robots has started? robots ask humans questions to verify they are not robots. humans mark content as robot-generated to filter it out.
My first instinct is that users will abuse it like they do any other report/downvote mechanism. They see something they just plain don't like, and they report it as AI slop.
I've been using DuckDuckGo for the last... decade or so. And it still seems to return fairly relevant documentation towards the top.
To be fair, most of what I use search for these days is "<<Programming Language | Tool | Library | or whatever>> <<keyword | function | package>>", then navigate to the documentation, double check the versions align with what I'm writing software in, read... move on.
Sometimes I also search for "movie showtimes nyc" or for a specific venue or something.
So maybe my use cases are too specific to screw up, who knows. If not, maybe DDG is worth a try.
There is also the fact that automatically generated content predates ChatGPT by a lot. By around 2020, most Google searches already returned lots of SEO-optimized pages made from scraped content, or keyword soups made by rudimentary language models or Markov chains.
Well, there's also the fact that the GPT-3 API was released in June 2020, and its writing capabilities were essentially on par with ChatGPT's initial release. It was just a bit harder to use because it wasn't yet trained to follow instructions; it only worked as a very good "autocomplete" model, so prompting was a bit different, and you couldn't do things like "rewrite this existing article in your own words" at all. But if you just wanted to write some bullshit SEO spam from scratch, it was already as good as ChatGPT would be two years later.
Counterpoint: The experience of quickly finding succinct accurate responses to queries has never been better.
Years ago, I would consider a search "failed" if the page with related information wasn't somewhere in the top 10. Now a search is "failed" if the AI answer doesn't give me exactly what I'm looking for directly.
> if I search on Google search for specific terms, I am not interested in crap such as "others also searched for xyz" - that is just ruining the UI with irrelevant information
You assume the aim here is for you to find relevant information, not increase user retention time. (I just love the corporate speak for making people's lives worse in various ways.)
That's a separate problem. The search algorithm applied on top of the underlying content is a separate problem from the quality or origin of the underlying content, in aggregate.
Sure, but I think that the underlying assumption is that, after the public release of ChatGPT, the amount of autogenerated content on the web became significantly bigger. Plus, the auto-generated content was easier to spot before.
Honestly, the biggest failing is just that SEO spam sites got too good at defeating the algorithm. The number of bloody listicles, Quora nonsense, or backlink-farming websites that come up in search is crazy.
Certainly seems that way if you observed the waves of usability Google search underwent in its first 15 years. There were several distinct cycles where the results were great, then garbage, then great again. They would be flooded with SEO spam, then they would tweak and penalize the SEO spam heavily, then SEO would catch up.
The funny thing is that when they gave up, it doesn't seem to have been because of some new advancement in the arms race. It was well before LLMs hit the scene, and the SEO spam was still incredibly obvious to a human reader. It really seems like some data-driven approach demonstrated that surrendering on this front led to increased ad revenue.
For most commerce-related terms, I suspect that if you got rid of all "spammy" results you would be left with almost nothing. No independent blogger is gonna write about the best credit card with travel points.
Sites like Credit Karma / NerdWallet exist. While I think they are rife with affiliate link nonsense and paid promotion masquerading as advice, I'm also pretty sure they have paid researchers and writers generating genuine content. Not sure that quite falls into the bucket of SEO blogspam.
I had a coworker who kept up a blog about random purchases she’d made, where she would earn some money via affiliate links. I thought it was horrendously boring and weird, and the money made was basically pocket change, but she seemed to enjoy it. You might be surprised, people write about all sorts of things.
This is bullshit the search engines want you to believe. It's trivial to detect sites that "defeat" the algorithm; you simply detect their incentives (ads/affiliate links) instead.
Problem is that no mainstream search engine will do it because they happen to also be in the ad business and wouldn't want to reduce their own revenue stream.
Yes, this is true. It was revealed in Google emails released during antitrust hearings. Google absolutely made a deliberate decision to enshittify their search results for short term gains.
Though maybe it's a long term gain. I know many normal (i.e. non-IT) people who've noticed the poor search results, yet they continue to use Google search.
Significant changes were made to Google and YouTube in 2016 and 2017 in response to the US election. The changes provided more editorial and reputation based filtering, over best content matching.
somebody once said we are mining "low-background tokens" the way we mined low-background (radiation) steel post-WW2, and i couldn't shake the concept out of my head
(wrote up in https://www.latent.space/i/139368545/the-concept-of-low-back... - but ironically repeating something somebody else said online is kinda what i'm willingly participating in, and it's unclear why human-origin tokens should be that much higher signal than ai-origin ones)
"...began to fall in 1963, when the Partial Nuclear Test Ban Treaty was enacted, and by 2008 it had decreased to only 0.005 mSv/yr above natural levels. This has made special low-background steel no longer necessary for most radiation-sensitive uses, as new steel now has a low enough radioactive signature."
Interesting. I guess that analogously, we might find that X years after some future AI content production ban, we could similarly start ignoring the low background token issue?
"Winter" in AI (or cryptocurrency, or any ecosystem at all) denotes a period of low activity, with a focus on fundamentals instead of hype.
What we're seeing now is something more like the peak of summer. If it ends up being a bubble and it bursts, some months after that will be the "AI Winter": investors won't want to keep chucking money at problems anymore, and it'll go back to being "in-the-background research" again, as it was before.
Multiple people have coined the idea repeatedly, way before you. The oldest comment on HN I could find was in December 2022 by user spawarotti: https://news.ycombinator.com/item?id=33856172
> we called it "standing on the shoulders of giants"
We do not see nearly so far though.
Because these days we are standing on the shoulders of giants that have been put into a blender and ground down into a slippery pink paste and levelled out to a statistically typical 7.3mm high layer of goo.
This sounds like an Alan Kay quote. He meant it in regard to useful inventions. AI-generated spam just decreases the quality.

We'd need a real alternative to this garbage from Google, but all the other search engines are also bad. And their UI is also horrible - not as bad as Google's, but still bad. Qwant just tries to copy/paste Google, for instance (though interestingly enough, it sometimes has better results than Google - but also fewer in general, even ignoring false positive results).
Deep Research reports are, I think, above average internet quality: they collect hundreds of sources, synthesize and contrast them, and provide backlinks. Almost like a generative Wikipedia.
I think all we can expect from internet information is a good description of the distribution of materials out there, not truth. This is totally within the capabilities of LLMs. For additional confidence run 3 reports on different models.
We have two optimization mechanisms though which reduce noise with respect to their optimization functions: evolution and science. They are implicitly part of "standing on the shoulders of giants", you pick the giant to stand on (or it is picked for you).
Whether or not those optimization functions align with human survival - and thus whether our whole existence is not itself slop - we're about to find out.
Listen, lad. I built this kingdom up from nothing. When I started here, all there was was swamp. Other kings said I was daft to build a castle on a swamp, but I built it all the same, just to show 'em. It sank into the swamp. So, I built a second one. That sank into the swamp. So, I built a third one. That burned down, fell over, then sank into the swamp, but the fourth one... stayed up! And that's what you're gonna get, lad: the strongest castle in these islands.
While this is religious:
[24] “Everyone then who hears these words of mine and does them will be like a wise man who built his house on the rock. [25] And the rain fell, and the floods came, and the winds blew and beat on that house, but it did not fall, because it had been founded on the rock. [26] And everyone who hears these words of mine and does not do them will be like a foolish man who built his house on the sand. [27] And the rain fell, and the floods came, and the winds blew and beat against that house, and it fell, and great was the fall of it.”
Humans build not on each other's slop, but on each other's success.
Capitalism, freedom of expression, the marketplace of ideas, democracy: at their best these things are ways to bend the wisdom of the crowds (such as it is) to the benefit of all; and their failures are when crowds are not wise.
The "slop" of capitalism is polluted skies, soil and water, are wage slaves and fast fashion that barely lasts one use, and are the reason why workplace health and safety rules are written in blood. The "slop" of freedom of expression includes dishonest marketing, libel, slander, and propaganda. The "slop" of democracy is populists promising everything to everyone with no way to deliver it all. The "slop" of the marketplace of ideas is every idiot demanding their own un-informed rambling be given the same weight as the considered opinions of experts.
None of these things contributed to our social, technological, or economic advancement; they are simply things which happened at the same time.
AI has stuff to contribute, but using it to make an endless feed of mediocrity is not it. As for the flood of low-effort GenAI stuff filling feeds and drowning signal in noise: as others have said, just give us your prompt.
Projects like this remind me of a plot point in the Cyberpunk 2077 game universe. The "first internet" got too infected with dangerous AIs, so much so that a massive firewall needed to be built, and a "new" internet was built that specifically kept out the harmful AIs.
(Or something like that: it's been a while since I played the game, and I don't remember the specific details of the story.)
It makes me wonder if a new human-only internet will need to be made at some point. It's mostly sci-fi speculation at this point, and you'd really need to hash out the details, but I am thinking of something like a meatspace-first network that continually verifies your humanity in order for you to retain access. That doesn't solve the copy-paste problem, or a thousand other ones, but I'm just thinking out loud here.
The problem really is that it is impossible to verify that the content someone uploads came from their mind and not a computer program. And at some point, probably all content is at least influenced by AI. The real issue is not that I used ChatGPT to look up a synonym or asked it a question before writing an article; the problem is when I copy paste the content and claim I wrote it.
> the problem is when I copy paste the content and claim I wrote it
Why is this the problem and not the reverse - using AI without adding anything original into the soup? I could paraphrase an AI response in my own words and it would be no better. But even if I used AI, if it writes my ideas, then it is not AI slop.
Ignoring the privacy and security issues for a moment, how would having a digital ID prove that the blog post I put on my site came only out of my own mind and I didn't use an LLM for it?
There doesn't need to be any difference in treatment between AI slop and human slop. The point isn't to keep AI out - it's to keep spam and slop out. It doesn't matter whether it's produced by a being made of carbon or silicon.
If someone can consistently produce high-quality content with AI assistance, so be it. Let them. Most don't, though.
I share an opinion with Nick Bostrom: once a civilization-disrupting idea (like LLMs) is pulled out of the bag, there is no putting it back. People in isolation will recreate it simply because it's now possible. All we can do is adapt.

That being said, the idea of a new, freer internet is already a reality - Mastodon is a great example. I think private havens like Discord/Matrix/Telegram are an important step on the way.
In person web of trust in order to join any private community. It'll suck and be hard in the beginning, but once you reach a threshold, it'll be OK. Ban entire trees of users when you discover bots/puppets, to set an example.
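The "ban entire trees" idea amounts to walking an invite graph. A toy sketch of what that could look like (the Community class and all names here are invented for illustration, not any real system):

```python
# Toy model of an invite-tree community: each member records who vouched
# for them, and banning a member also bans everyone in the subtree
# they invited.
class Community:
    def __init__(self):
        self.invited_by = {}   # member -> the member who vouched for them
        self.banned = set()

    def join(self, member, voucher=None):
        self.invited_by[member] = voucher

    def ban_tree(self, member):
        """Ban a member and, recursively, everyone they vouched for."""
        self.banned.add(member)
        for m, voucher in self.invited_by.items():
            if voucher == member and m not in self.banned:
                self.ban_tree(m)

c = Community()
c.join("alice")
c.join("bob", voucher="alice")
c.join("eve", voucher="bob")
c.ban_tree("bob")   # bans bob and eve; alice, who vouched for bob, is spared
```

Whether the voucher should also be punished (to give vouching real cost) is a policy choice this sketch leaves open.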
So we expect either 1. people using AI and copy pasting into the human-only network, or 2. other people claiming your text sounds like AI and ostracizing you for no good reason. It won't be a happy place - I know from anti-generative AI forums.
Double hyphen is replaced in some software with an en-dash (and in those, a triple hyphen is often replaced with an em-dash), and in some with an em-dash. It's usually used (other than as input to one of those pieces of software) in places where an em-dash would be appropriate, but in contexts where both an em-dash set closed and an en-dash set open might be used, it is often set open.
So it's not unambiguously a substitute for either; it's essentially its own punctuation mark, used in ASCII-only environments, with some influence from both the use of em-dashes and that of en-dashes in more formal environments.
In German you use en-dashes with spaces, whereas in English it’s em-dashes without spaces. Some people dislike em-dashes in English though and use en-dashes with spaces as well.
In British English en-dashes with spaces is more common than em-dashes without spaces, I think, but I don't have any data for that, just a general impression.
In English, em-dashes are typically set without spaces, or with thin spaces, when used to separate appositives/parentheticals (though that style isn't universal even in professional print: there are places that set them open, and en-dashes set open can also be used in this role). When representing an interruption, they generally have no space before but frequently have a space following. And other uses have other patterns.
Em dashes used as parenthetical dividers, and en dashes used as word joiners, are usually set continuous with the text. However, such a dash can optionally be surrounded with a hair space, U+200A, or thin space, U+2009 (the HTML named entities &hairsp; and &thinsp;). These spaces are much thinner than a normal space (except in a monospaced, non-proportional font), with the hair space in particular being the thinnest of the horizontal whitespace characters.
1. (letterpress typography) A piece of metal type used to create the narrowest space.
2. (typography, US) The narrowest space appearing between letters and punctuation.
Now I'd like to see what the metal type looks like, but ehm... it's difficult googling it.
Also a whole collection of space types and what they're called in other languages.
I once spent a day debugging some data that came from an English doc written by someone in Japan that had been pasted into a system and caused problems. Turned out to be an en-dash issue that was basically invisible to the eye. No love for en-dash!
Compiler error while working on some ObjC. Nothing obviously wrong. Copy-pasted the line, same thing on the copy. Typed it out again, no issue with the re-typed version. Put the error version and the ok version next to each other, apparently identical.
I ended up discovering I'd accidentally leant on the option key while pressing the "-". Monospace font, Xcode: em-dash and minus looked identical.
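A quick way to catch this class of bug is to dump each character's codepoint and Unicode name; lookalike dashes become obvious immediately. A minimal sketch (the reveal helper is my own, not from any tool mentioned here):

```python
import unicodedata

def reveal(s):
    """Return one line per character with its codepoint and Unicode name,
    so visually identical dashes and minus signs become distinguishable."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in s]

# ASCII hyphen-minus, minus sign, en dash, em dash: four different
# codepoints that can render identically in a monospace editor font.
for line in reveal("-\u2212\u2013\u2014"):
    print(line)
```

Pasting the offending source line into something like this would have shown the stray non-ASCII dash in seconds.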
Apparently, it's not only the em-dash that's distinctive. I went through the leader's comments and spotted that he also uses the typographic apostrophe "’" instead of the straight apostrophe.
besides training future models, is this really such a big deal? most of the AI-gened text content is just replacing content-farm SEO spam anyway. the same stuff that any half-aware person wouldn't have read in the past is now slightly better written, using more em dashes and instances of the word "delve". if you're consistently being caught out by this stuff then you likely need to improve your search hygiene, nothing so drastic as this
the only place I've ever had any issue with AI content is r/chess, where people love to ask ChatGPT a question and then post the answer as if they wrote it, half the time seemingly innocently, which, call me racist, but I suspect is mostly due to the influence of the large and young Indian contingent. otherwise I really don't understand where the issue lies. follow the exact same rules you do for avoiding SEO spam and you will be fine
It's not the future. Tell him not to do that. If it happens again, bring it to the attention of his manager. Because that's not what he's being paid for. If he continues to do it, that's grounds for firing.
What you're describing is not the future. It's a fireable offense.
> the only place I've ever had any issue with AI content is r/chess, where people love to ask ChatGPT a question and then post the answer as if they wrote it, half the time seemingly innocently
Some of the science, energy, and technology subreddits receive a lot of ChatGPT repost comments. There are a lot of people who think they’ve made a scientific or philosophical breakthrough with ChatGPT and need to share it with the world.
Even the /r/localllama subreddit gets constant AI spam from people who think they’ve vibecoded some new AI breakthrough. There have been some recent incidents where someone posted something convincing and then others wasted a lot of time until realizing the code didn’t accomplish what the post claimed it did.
Even on HN, some of the “Show HN” posts are AI garbage from people trying to build portfolios. I wasted too much time trying to understand one of them until I realized they had (unknowingly?) duplicated some commits from an upstream project and then let the LLM vibe-code a README that sounded like an amazing breakthrough. It was actually good work, but it wasn’t theirs. It was just some vibecoding tool eventually arriving at the same code as upstream and then putting the classic LLM-written, emoji-filled bullet points in the README.
In the past, I'd find one wrong answer and I could easily spot the copies. Now there's a dozen different sites with the same wrong answer, just with better formatting and nicer text.
The trick is to only search for topics where there are no answers, or only one answer leading to that blog post you wrote 10 years ago and forgot about.
> besides training future models, is this really such a big deal? most of the AI-gened text content is just replacing content-farm SEO spam anyway.
Yes, it is, because of the other side of the coin. If you were writing human-generated, curated content, previously you would just do it in your small patch of the Internet, and search engines (Google...) would probably pick it up anyway because it was good quality content. You just didn't have to care about the SEO-driven shit.
Now you nicely hand-written content is going to be fed into LLM training and it's going to be used - whatever you want it or not - in the next generation of AI slop content.
It's not slop if it is inspired by good content. Basically, you need to add your own original spices to the soup to make it not slop, or have the LLM do deep-research-style work, contrasting hundreds of sources.
Slop did not originate with AI itself, but with the feed-ranking algorithms that set the criteria for visibility. They "prompt" humans to write slop.
AI slop is just an extension of this process, and it started long before LLMs. Platforms optimizing for their own interest at the expense of both users and creators is the source of slop.
SEO-spam was often at least somewhat factual and not complete generated garbage. Recipe sites, for example, usually have a button that lets you skip the SEO stuff and get to the actual recipe.
Also, the AI slop is covering almost every sentence or phrase you can think of to search. Before, if I used more niche search phrases and exact searches, I was pretty much guaranteed to get specific results. Now, I have to wade through pages and pages of nonsense.
Yes, it is a big deal. I can't find new artists without fearing their art is AI-generated, same for books and music. I also can't post my stuff to the internet anymore because I know it's going to be fed into LLM training data. The internet is mostly dead to me, and thankfully I've lost almost all interest in being on my computer as much as I used to be.
Yes indeed, it is a problem. The good old sites have now turned into AI-slop sites, because they can't fight the spammers while writing slowly with humans.
* ChatGPT put it in my memory, so it persisted between conversations
* When asked for a citation, ChatGPT found two AI-created articles to back itself up
It took a while, but I eventually found human written documentation from the organization that created the technical thingy I was investigating.
This happens A LOT for topics at the edge of knowledge easily found on the Web, where you have to do true research, evaluate sources, and make good decisions about what you trust.
AI reminds me of combing through Stack Overflow answers. The first one might work... or it might not. Try again, find a different SO problem and answer. Maybe the third time's the charm...
Except it's all via the chat bot and it isn't as easy to get it to move off of a broken solution.
For images, https://same.energy is a nice option: having been abandoned a few years ago but still functioning, it seems naturally not to have crawled any AI images. And it’s an all-around great product.
If you took Google of 2006, and used that iteration of the pagerank algorithm, you’d probably not get most of the SEO spam that’s so prevalent in Google results today.
That's specifically for AI generated content, but there are other indicators like how many affiliate links are on the page and how many other users have downvoted the site in their results. The other aspect is network effect, in that everyone tunes their sites to rank highly on Google. That's presumably less effective on other indices?
Most college courses and school books haven't changed in decades. Some reputable colleges keep courses in Pascal and Fortran instead of Python or Java just to preserve their reputation for being classical or pure, or to match the style of their campus buildings.
give me quite different results. In the second query, most results have a date listed next to them on the results page, and that date is always prior to 2022, so the date filtering is "working". However, most of the dates are actually Google making a mistake and misinterpreting some unimportant date it found on the page as the date the page was created. At least one result is a YouTube video posted before 2022 that edited its title after ChatGPT was released to mention ChatGPT.
I noticed AI-generated slop taking over Google search results well before ChatGPT. So I don't agree with this site's premise that "you can be sure that it was written or produced by the human hand."
I didn’t know “eccentric engineering” was even a term before reading this. It’s fascinating how much creativity went into solving problems before large models existed. There’s something refreshing about seeing humans brute force the weird edges of a system instead of outsourcing everything to an LLM.
It also makes me wonder how future kids will see this era. Maybe it will look the same way early mechanical computers look to us. A short period where people had to be unusually inquisitive just to make things work.
It doesn't really work. I tried my website and it shows up, while definitely being built after 2023. There is a mistake in the metadata of the page that shows it as from 2011.
Just the other evening, as my family argued about whether some fact was or was not fake, I detached from the conversation and began fantasizing about whether it was still possible to buy a paper encyclopedia.
In the end the analogy doesn't really work, because 'eternal September' referred to what used to be a regular, temporary thing (an influx of noobs disrupting the online culture, before eventually leaving or assimilating) becoming the new normal. 'Eternal {month associated with ChatGPT}' doesn't fit because LLM-generated content was never a periodic phenomenon.
to be honest, GPT-3, which was pretty solid and extremely capable of producing webslop, had been out for a good while before ChatGPT, and GPT-2 had been used for blogslop years before that. Maybe ChatGPT was the beginning of when the public became aware of it, but it was going on well beforehand. And, as the sibling commenter points out, the analogy doesn't quite fit structurally either.
They are the same. I was looking for something and tried AI. It gave me a list of stuff. When I asked for its sources, it linked me to some SEO/Amazon affiliate slop.
All AI is doing is making it harder to know what is good information and what is slop, because it obscures the source, or people ignore the source links.
I've started just going to more things in person, asking friends for recommendations, and reading more books (should've been doing all of these anyway). There are some niche communities online I still like, and the fediverse is really neat, but I'm not sure we can stem the Great Pacific Garbage Patch-levels of slop, at this point. It's really sad. The web, as we know and love it, is well and truly dead.
I hope there's an uncensored version of the Internet Archive somewhere, I wish I could look at my website ca. 2001, but I think it got removed because of some fraudulent DMCA claim somewhere in the early 2010s.
Is that still the case? And even if so, how are they going to keep it that way in the future? Are they going to stop scraping new content, or filter it with a tool that recognizes their own content?
It's a known problem in ML. I think Grok solved it partially, and ChatGPT uses another model on top to search the web, as suggested below. Hence the MLOps field appeared, to handle model management.
I find it a bit annoying to navigate between hallucinations and outdated content. Too much invalid information to filter out.
What kind of heuristics does it use to determine age? A lot of content on Google is actually backdated for some reason... presumably some sort of SEO scam?
The slop is getting worse: there is so much LLM-generated shit online that new models are now being trained on the slop. Slop training slop, training slop. We have come full circle in just a matter of a few years.
I was replaying Cyberpunk 2077 and trying to think of all the ways one might have dialed the dystopia up to 11 (beyond what the game does), and pervasive AI slop was never on my radar. Kinda reminds me of the foreword in Neuromancer pointing out that the book was written before cellphones became popular.

It's already fucking with my mind. I recently watched Frankenstein 2025 and was 100% sure gen AI had a role in the CGI, only to find out the director hates it so much he'd rather die than use it. I've been noticing little things in old movies and anime where I thought to myself: if I didn't know this was made before gen AI, I would have thought it was generated for sure. One example (https://www.youtube.com/watch?v=pGSNhVQFbOc&t=412): the cityscape background in this outro scene, with buildings built on top of buildings, gave me AI vibes (really the only thing in the whole anime), yet it came out around 1990. So I can already recognize a paranoia/bias in myself and really can't reliably tell what's real. Other people probably have this too, which is why some non-zero number of people always think every blog post that comes out was written by gen AI.
I had the same experience, watching a nature documentary on a streaming service recently. It was... not so good, at least at the beginning. I was wondering if this was a pilot for AI generated content on this streaming service.
Actually, it came out in 2015 and was just low budget.
Not affiliated, but I've been using kagi's date range filter to similar effect. The difference in results for car maintenance subjects is astounding (and slightly infuriating).
For that purpose, I do not update my book about Ruby on LeanPub. I just know that one day people are gonna read it more, because human-written content will be gold.
"This browser extension uses the Google search API to only return content published before Nov 30th, 2022 so you can be sure that it was written or produced by the human hand."
In hindsight, that would've been a real utility use case for NFTs: a decentralized cryptographic proof that some content existed in a particular form at a particular moment.
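NFTs aside, the useful primitive here is just a content hash published somewhere append-only before a known date. A minimal sketch of the hashing half, using Python's standard `hashlib` (the ledger/publication part is left out):

```python
import hashlib

def content_fingerprint(content: bytes) -> str:
    """SHA-256 digest of the exact bytes. Publishing this digest
    before date D lets anyone later verify the content existed,
    unmodified, by D - without revealing the content itself."""
    return hashlib.sha256(content).hexdigest()

post = b"My hand-written blog post, 2021"
digest = content_fingerprint(post)

# Later verification: recompute and compare to the published digest.
assert content_fingerprint(post) == digest
# Any change to the content breaks the match.
assert content_fingerprint(post + b" (edited)") != digest
```

This is essentially what trusted-timestamping services do; the blockchain angle only adds a decentralized place to publish the digest.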
At least when reading human-made material you can spot the author's uncertainty in some topics. Usually, when someone doesn't know something, they don't try to describe it. AI, however, will try to convince you that pigs can fly.
Interesting concept. As a side benefit this would allow you to make steady progress fighting SEO slop as well, since there can be no arms race if you are ignoring new content.
You could even add options for later cutoffs… for example, you could use today’s AIs to detect yesterday’s AI slop.
True, but there's probably many ways to do this and unless AI content starts falsifying tons of its metadata (which I'm sure would have other consequences), there's definitely a way.
Plus, other sites that link to the content could also give away its date of creation, which is out of the AI content's control.
I have heard of a forum (I believe it was Physics Forums) which was very popular in the older days of the internet where some of the older posts were actually edited so that they were completely rewritten with new content. I forget what the reasoning behind it was, but it did feel shady and unethical. If I remember correctly, the impetus behind it was that the website probably went under new ownership and the new owners felt that it was okay to take over the accounts of people who hadn't logged on in several years and to completely rewrite the content of their posts.
If it's just using Google search "before <x date>" filtering I don't think there's a way to game it... but I guess that depends on whether Google uses the date that it indexed a page versus the date that a page itself declares.
None of these documents were actually published on the web by then, including a Watergate PDF bearing the date Nov 21, 1974 - almost 20 years before the PDF format was released. Of course, the WWW itself only started in 1991.
Google Search's date filter is useful for finding documents about historical topics, but unreliable for proving when information actually became publicly available online.
Are you sure it works the same way for documents that Google indexed at the time of publication? (Because obviously for things that existed before Google, they had to accept the publication date at face value).
Yes, it works the same way even for content Google indexed at publication time. For example, here are chatgpt.com links that Google displays as being from 2010-2020, a period when Google existed but ChatGPT did not:
So it looks like Google uses inferred dates over its own indexing timestamps, even for recently crawled pages from domains that didn't exist during the claimed date range.
"Gamed quite easily" seems like a stretch, given that the target is definitionally not moving. The search engine is fundamentally searching an immutable dataset that "just" needs to be cleaned.
How? Do they have an index from a previous date, with nothing new allowed in since? A whole copy of the internet? I don't think so... I'm guessing, like others, that it's based on the date the user/website/blog lists in the post, which they can change at any time.
Is it really here to stay? If the wheels fell off the investment train and ChatGPT etc. disappeared tomorrow, how many people would be running inference locally? I suspect most people either wouldn't meet the hardware requirements or would be too frustrated by the slow token generation to bother. My mom certainly wouldn't be talking to it anymore.
Remember that a year or two ago, people were saying something similar about NFTs - that they were the future of sharing content online and we should all get used to it. Now they still exist, it's true, but they're much less pervasive and annoying than they once were.
Maybe you don't love your mom enough to do this, but if ChatGPT disappeared tomorrow and it was something she really used and loved, I wouldn't think twice before buying her a rig powerful enough to run a quantized downloadable model, though I'm not current on which model or software would best suit her purposes. I get that your relationship with your mother, or your financial situation, might be different though.
I don't agree it is 'almost worse' than the slop but it sure can be annoying. On one hand it seems even somewhat positive that some people developed a more critical attitude and question things they see, on the other hand they're not critical enough to realize their own criticism might be invalid. Plus I feel bad for all the resources (both human and machine) wasted on this. Like perfectly normal things being shown, but people not knowing anything about the subject chiming in to claim that it must be AI because they see something they do not fully understand.
Point still stands. It’s not going anywhere. And the literal hate and pure vitriol I’ve seen towards people on social media, even when they say “oh yeah; this is AI”, is unbelievable.
So many online groups have just become toxic shitholes because someone once or twice a week posts something AI generated
The entire US GDP for the last few quarters is being propped up by GPU vendors and one singular chatbot company, all betting that they can make a trillion dollars on $20-per-month "it's not just X, it's Y" Markov chain generators. We have six to 12 more months of this before the first investor says "wait a minute, we're not making enough money", and the house of cards comes tumbling down.
Also, maybe consider why people are upset about being consistently and sneakily lied to about whether or not an actual human wrote something. What's more likely: that everyone who's angry is wrong, or that you're misunderstanding why they're upset?
I feel like this is the kind of dodgy take that'll be dispelled by half an hour's concerted use of the thing you're talking about
short of massive technological regression, there's literally never going to be a situation where the use of what amounts to a second brain with access to all the world's public information is not going to be incredibly marketable
I dare you to try building a project with Cursor or a better cousin and then come back and repeat this comment
>What's more likely: that everyone who's angry is wrong, or that you're misunderstanding why they're upset?
your patronising tone aside, GP didn't say everyone was wrong, did he? If he didn't, which he didn't, then it's a completely useless and fallacious rhetorical question. What he actually said was that it's very common, and, factually, it is. I can't count the number of these types of Instagram comments I've seen on obviously real videos. Most people have next to no understanding of AI and its limitations and typical features, and "surprising visual occurrence in a video" or "article with correct grammar and punctuation" is enough for them to think they've figured something out.
> I dare you to try building a project with Cursor or a better cousin and then come back and repeat this comment
I always try every new technology, to understand how it works, and expand my perspective. I've written a few simple websites with Cursor (one mistake and it wiped everything, and I could never get it to produce any acceptable result again), tried writing the script for a YouTube video with ChatGPT and Claude (full of hallucinations, which – after a few rewrites – led to us writing a video about hallucinations), generated subtitles with Whisper (with every single sentence having at least some mistake) and finally used Suno and ChatGPT to generate some songs and images (both of which were massively improved once I just made them myself).
Whether Android apps or websites, scripts, songs, or memes, so far AI is significantly worse at internet research and creation than a human. And cleaning up the work AI did always ended up taking longer than just doing it myself from scratch. AI certainly makes you feel more productive, and it seems like you're getting things done faster, even though you're not.
Fascinatingly, as we found out from this HN post, Markov chains don't work when scaled up, for technical reasons, so the whole transformers thing is actually necessary for this current generation of AI.
What isn't going anywhere? You're kidding yourself if you think every single place AI is used will withstand the test of time. You're also kidding yourself if you think consumer sentiment will play no part in determining which uses of AI will eventually die off.
I don't think anyone seriously believes the technology will categorically stop being used anytime soon. But then again, we still keep using tech that's 50+ years old as it is.
> This is a search tool that will only return content created before ChatGPT's first public release on November 30, 2022.
The problem is that Google's search engine - and, oddly enough, ALL search engines - had already gotten worse before that. I noticed search engines getting worse several years before 2022. So AI further decreased the quality, but the quality was already trending downward. There are some attempts to analyze this on YouTube (also owned by Google - Google ruins our digital world); some explanations made sense to me, but even then I am not 100% certain why Google decided to ruin Google Search.
One key observation I made was that YouTube's search was copied onto Google's regular search, which makes no sense there. If I casually search for a video on YouTube, I may be semi-interested in unrelated videos. But if I search Google for specific terms, I am not interested in crap such as "others also searched for xyz" - that just ruins the UI with irrelevant information. And this is not the only example: Google made the search results worse and tries to confuse the user into clicking on things. Plus the placement of ads. The quality really worsened.
Are you aware of Kagi (kagi.com)?
With them, at least the AI stuff can be turned off.
Membership is presently about 61k, and seems to be growing about 2k per month: https://kagi.com/stats
The AI stuff in Google search can be turned off.
My default browser search tool is set to Google with ?udm=14 automatically appended.

I’ve had much better results with Kagi than with Google in the past few months. I’d trialed them a couple of times in the past and been disappointed, but that’s no longer the case.
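For reference, that udm=14 setup is just a query parameter on the normal search URL (udm=14 is the plain "Web" results view, as publicly reported, not an official API). A minimal sketch:

```python
from urllib.parse import quote_plus

def web_only_search_url(query: str) -> str:
    """Google search URL with udm=14 appended, which shows the
    plain 'Web' results tab without the AI Overview or extra
    result modules."""
    return f"https://www.google.com/search?q={quote_plus(query)}&udm=14"
```

In most browsers you can get the same effect by registering `https://www.google.com/search?q=%s&udm=14` as a custom search engine.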
Be aware of:
https://www.reddit.com/r/SearchKagi/comments/1gvlqhm/disappo...
I directly use Yandex sometimes, because there are huge blind spots for all the US-based engines I'm aware of, and it fills some of them in.
If someone can point me to a better index for that purpose, I'd love to avoid Yandex. Please inform me.
There are a few other powerful countries, with countless Web services, that freely wage war(s) on other countries and support wars in many different ways. Is there a way to avoid their products?
As a European, I'm also increasingly in favour of avoiding American companies. Especially the big corrupting near-monopolists.
It's worth pointing out the flaws of all bad actors. The more info we have, the more effectively we can act.
Whataboutism doesn't get us anywhere — saying "but what about X" (insert anything for X here) usually results in doing nothing.
Some of us would rather take a stand, imperfect as it is, than just sit and do nothing. Especially in the very clear case of someone (Kagi) doing business with a country that invaded a neighboring country for no reason, and keeps killing people there.
Why this particular stand? Is doing nothing any better than taking what are essentially random stands? Obviously if you are Ukrainian this will be an important stand to you, but otherwise doing things based on a mix of what the media you like focuses on or whatever is not really very different from doing nothing.
I think "no wars of conquest" is a bright line that was crossed by Russia, that hasn't been crossed by other nations in a long time. And I think it's important for the whole world to take a stand on that, not just the nation that was invaded. It's not a "random stand."
>"no wars of conquest"
how about "no wars of genocide"? you know, like the one the collective West had enthusiastically supported for a while now?
I find it much easier to take a strong stand on Russia/Ukraine than on Israel/Palestine. The history of Israel/Palestine is much more of a gray area. Palestine has used plenty of aggressive actions and rhetoric that make Israel's actions more understandable (if not justified).
Example of actions: Gaza invaded Israel and killed, raped, and kidnapped civilians on October 7. Ukraine had no such triggering event that caused Russia to invade.
Example of rhetoric: Gaza's political leaders have said they want to destroy Israel. I don't think anyone in power in Ukraine has said they want to destroy the Russian state.
"enthusiastic support"
https://yougov.co.uk/international/articles/52279-net-favour...
https://www.pewresearch.org/politics/2025/10/03/how-american...
etc etc....
I'm not sure what collective West you're referring to; but apparently it excludes every major Western European nation, America, and Canada.
Plenty of people boycott Israeli goods and there's an increasing trend of moving away from reliance on American services also.
I am amused that my (unpopular and by now downvoted) comment about the scourge of "whataboutism" sparked a discussion where comments begin with "how about" :-)
That is exactly my point! Saying "but what about" is akin to saying "you shouldn't do anything, because there is another unrelated $thing happening elsewhere". I refuse to follow this line of thinking.
did you just "but what about X" to the previous comment which is the whole point of this thread?
Doing something is literally the opposite of doing nothing. This is complete gibberish.
> Why this particular stand?
First, any stand is better than whataboutism and just sitting there doing nothing.
Second, this stand results from my thoughts. It is my stand. There are many like it, but this one is mine.
Third, in the history of the modern world there have been very few black-and-white situations where one side was clearly the aggressor. This is one of them.
> First, any stand is better than whataboutism and just sitting there doing nothing.
I definitely disagree with this. There are many cases where you might take the wrong stand, especially when you do not have detailed knowledge of the issue you're taking a stand over.
“whataboutism” is the reddit word for calling out hypocrisy
Yandex has the best image search, and others are years behind it. Furthermore, Nebius has sold all of the group's businesses in Russia and certain international markets. They have been completely divested from Russia for 1.5 years already: https://nebius.com/newsroom/ynv-announces-successful-complet...
The post you linked was made when the divestment was already underway, so it is at least dishonest if not malicious.
Yandex is the government approved search engine in Russia, which is impossible without the state exerting control over it. I wouldn't pay much attention to divestment, it's not how any of that works.
For instance here you can learn that Yandex NV is fully controlled by a group of Russian investors: https://www.rbc.ru/business/06/03/2024/65e7a0f29a7947609ea39...
Some clarification: since 2024, Yandex NV has split into Nebius (an NL-registered, NASDAQ-listed company, no longer a search engine) and Russia-based Yandex. The latter is fully controlled by Russian investors.
The governments where a software company's offices are physically located exert control over it. Doesn't following this logic to its end, and applying it even-handedly, result in nation-based NIH syndrome?
You are talking about an entity whose ownership is 99.8% Russian nationals and state companies, whose employees are for the most part Russian nationals, whose main market is Russia, and which has very few tangible assets that could be seized in the Netherlands. The only reason for this "divestment" is sanctions evasion.
you clearly don't know anything about nebius
They have a lot of hardware in e.g. Finland. I don't think they provide GPU access to Russian companies; feel free to correct me.
We were talking about search engines here, but interesting indeed! What's the name of Nebius's CEO?
I wouldn’t trust a divorce where one party still provides for the other.
You don't "trust" a divorce is alimony was part of the settlement?
Yep, when the party paying can decide not to pay and there are no teeth to extract payment, that gives immense power to the payer.
At least in my area, there are legal avenues if alimony goes unpaid. Assets can be seized to pay off late payments and wages can be garnished.
It's a different story if the payer truly can't afford the alimony, but at that point they wouldn't have the immense power you are concerned about.
https://news.ycombinator.com/item?id=42349797 (11 months ago)
https://som.yale.edu/story/2022/over-1000-companies-have-cur...
You pays your money, you takes your choice.
You are mistaken to think that zealots can be reasoned with. They have been conditioned to react upon anything “Russia” like a Pavlovian cue, a command of the trained animal. They are a herd that moves as a herd, based on cues of lead animals. No amount of proof or evidence will ever dissuade them from a position that the herd is moving in. They cannot reason on their own and lack the courage to separate, let alone say something that the herd disapproves of, lest they be expelled from the herd and ganged up on.
I find this amusing, because it seems like Kagi's target audience dislikes this choice (seeing it as politically polarizing), while I, not being Kagi's target audience, like it (seeing it as politically neutral).
Politics is not just a 1 dimensional line.
Yeah, it's two dimensional. One axis goes from good to evil. The other axis, chaotic to lawful.
There's a secret third dimension you can ascend to through a hole in the neutral middle where the forces of the other two axes cancel out. 'The Elites' doesn't want you to know this.
/hj?
Wait, what? Their choice is specifically a politically neutral one, wouldn't that mean their target audience is a politically neutral one? Why is your impression that Kagi's target audience is politically polarized users? Been a paying user of Kagi for years, never got that impression.
FWIW, I don't think Kagi should remove or avoid indexing content from countries that invade others, because a lot of the times websites in those countries have useful information on them. If Kagi were to enact such a block, it would mean it would no longer surface results from HN, reddit and a bunch of other communities, effectively making the search engine a lot less useful.
Why is supporting Yandex, who are involved in Russian politics and linked to the ruling regime, a neutral decision? That is very much a political decision, in the same way that working with US tech companies is a political decision. You need to decide what you're willing to tolerate and where your ethical lines are drawn; the alternative isn't neutrality, it's nihilism.
Solution: Kagi as it is, but with a ‘remove Yandex’ toggle. Even if it was a paid upgrade, I’d take it.
Damn. I didn't know that.
Now we need a 2nd Kagi, so we can switch to that one instead. :(
Imo, Kagi is still the better option, because it isn't supporting the global surveillance mechanism we call advertising. All these people, missing the forest for the single yandex tree.
Yeah I kept thinking "man I should try kagi" and then that :(
Try it anyway.
Naw, the well is poisoned and I question the company's decision making at this point.
He probably doesn’t want to support genocide.
Hope he doesn't pay his taxes then considering where US aid ends up
Why's that something to be aware of? Yandex is actually a good search engine, so I'm told, as long as you don't search for things related to Russian politics. Kagi presumably knows this and won't use their results related to Russian politics.
Feels more like a scare campaign to me - someone doesn't want you to use Kagi, and points to Yandex as a reason for that.
So if America invades Venezuela, should we all stop using Google? Should we have stopped using Google when the U.S. invaded Iraq and killed 150,000 people[1]?
Should we stop using products imported from China for the cultural genocide they've perpetrated against the Uyghurs?[2]
Is Yandex Russia?
[1] https://en.wikipedia.org/wiki/Casualties_of_the_Iraq_War
[2] https://en.wikipedia.org/wiki/Persecution_of_Uyghurs_in_Chin...
You can take whatever stand you want. When there's a country that has killed, raped, and tried to exterminate most of Eastern Europe, we can choose to cut any and all ties with it and consider them, for all intents and purposes, terrorists.
I sort of see where you're coming from, but it also looks like a double standard to me. Don't buy search from a company that uses an API from another company that is (or was? unclear) based in a country that invaded another country and completely upended the world order - for some people that's a line they don't want to cross, and I get it.
However, if that's the case, how can they continue buying Chinese products when China has done the same thing, but worse, and for longer, to its own population? Because it's less convenient to stop? _That_, to me, lands squarely in the "take whatever stand you want" category, with the addendum "and don't worry if it doesn't make sense."
Is it because it's within their own borders and therefore isn't our problem?
And the fact that there are other countries that should also be considered terrorists, doesn't mean we shouldn't boycott this one. It means we should boycott them all. But boycotting a few is still better than nothing.
Honest answers are yes, yes, and yes. It may be unavoidable for the average person to avoid imported goods from China, but we should remain aware of our place in the world and try where we can. If the US does invade Venezuela, I sincerely hope that individuals and business owners try to cut as many ties with complicit US tech companies as possible. Honestly, with this clusterfuck of war crimes going on over "drug boats," I hope they're already starting.
I don't agree with this logic. It implies that people who use Google, Bing, and a million other products made by US-based companies are supportive of the huge number of atrocities committed or aided by the United States. Or other countries. It feels very odd to single out Russia's invasion of Ukraine while minimizing the Israeli genocide of Palestinians in Gaza, the multiple unjust wars waged by the United States all over the world, etc.
Google doesn’t censor those atrocities for the US government. That’s the key difference.
It's often fairly easy to find US government-centric news and criticism with Google.
But as one counterexample: The end of the US penny was formed and announced not with public legislative discourse, nor even with an executive order, but with a brief social media post by the president.
And I don't mean that it's atrocious or anything, but I wanted to see that social media post myself. Not a report about it, or someone's interpretation of it, but -- you know -- the actual utterance from the horse's mouth.
Which should be a simple matter. After all, it's the WWW.
And I've been Googling for as long as there has been a Google to Google with. I'd like to think that I am proficient at getting results from it.
But it was like pulling teeth to get Google to eventually, kicking and screaming, produce a link to the original message on Truth Social.
If that kind of active reluctance isn't censorship on Google's part, then what might it be described as instead?
And if they're seeking to keep me away from the root of this very minor issue, then what else might they also be working to keep me from?
It doesn’t imply any of that at all.
There certainly is a huge army of people ready to spout this sort of nonsense in response to anyone talking about doing anything.
Hard to know what percentage of these folks are trying to assuage their own guilt and what percentage are state actors. Russia and Israel are very chronically online, and it behooves us internet citizens to keep that in mind.
Thank you. Didn't know that and was, until now, considering paying for a Kagi subscription.
Kagi is based in the United States, as is YC.
If you are concerned about heinous war crimes and the slaughter of civilians to the point that you don't want to use private services from countries that conduct such acts, you should avoid both already.
> "We do not discriminate based on current geopolitical issues."
That's one way of phrasing it.
Meh. Most people, including myself, couldn't care less, and Yandex image search is very capable.
based Vlad tbh
Haven't looked back since I signed up.
How does Kagi know what is AI stuff? I don't see how they can 'just turn it off'
By "turn it off" I mostly mean that Kagi have their own AI driven tools available, but a toggle in your user settings disables it completely.
ie it's not forced down your throat, nor mysteriously/accidentally/etc turned back on occasionally
It's driven by community ratings.
https://news.ycombinator.com/item?id=45919067
so it is like humans vs robots started? robots ask humans questions to verify they are not robots. humans mark content as robot-generated to filter it out.
My first instinct is that users will abuse it like they do any other report/downvote mechanism: they see something they just plain don't like, and they report it as AI slop.
I've been using DuckDuckGo for the last... decade or so. And it still seems to return fairly relevant documentation towards the top.
To be fair, most of what I use search for these days is "<<Programming Language | Tool | Library | or whatever>> <<keyword | function | package>>", then navigating to the documentation, double-checking that the versions align with what I'm writing software in, reading... and moving on.
Sometimes I also search for "movie showtimes nyc" or for a specific venue or something.
So maybe my use cases are too specific to screw up, who knows. If not, maybe DDG is worth a try.
There is also the fact that automatically generated content predates ChatGPT by a lot. By around 2020, most Google searches already returned lots of SEO-optimized pages made from scraped content, or keyword soups produced by rudimentary language models or Markov chains.
Well, there's also the fact that the GPT-3 API was released in June 2020, and its writing capabilities were essentially on par with ChatGPT's initial release. It was just a bit harder to use because it wasn't yet trained to follow instructions; it only worked as a very good "autocomplete" model, so prompting was a bit different, and you couldn't do things like "rewrite this existing article in your own words" at all. But if you just wanted to write some bullshit SEO spam from scratch, it was already as good as ChatGPT would be two years later.
Also the full release of GPT-2 in late 2019. While GPT-2 wasn't really "good" at writing, it was more than good enough to make SEO spam.
I didn't remember that, but it would explain the exponential growth of spam back then.
And 10 years ago, Reddit was already experimenting with auto-generated subreddits: https://www.reddit.com/r/SubredditSimulator.
It was popular way before 2020, but Google managed to keep up with the SEO tricks for a good decade-plus before that. Guess it got to a breaking point.
> Google made the search results worse here
Did you mean:
worse results near me
are worse results worth it
worse results net worth
best worse results
worse results reddit
search: Emacs
Tbh, this sounds like a Google Easter egg.
Because it is
Counterpoint: The experience of quickly finding succinct accurate responses to queries has never been better.
Years ago, I would consider a search "failed" if the page with related information wasn't somewhere in the top 10. Now a search is "failed" if the AI answer doesn't give me exactly what I'm looking for directly.
> if I search on Google search for specific terms, I am not interested in crap such as "others also searched for xyz" - that is just ruining the UI with irrelevant information
You assume the aim here is for you to find relevant information, not increase user retention time. (I just love the corporate speak for making people's lives worse in various ways.)
You finding relevant information used to be the aim. Enshittification started when they let go of that aim.
> The problem
That's a separate problem: the quality of the search algorithm applied on top of the underlying content is distinct from the quality or origin of that content in aggregate.
I think this is about trustworthy content, not about a good search engine per se
But it's not necessarily trustworthy content, we had autogenerated listicles and keyword list sites before ChatGPT.
Sure, but I think that the underlying assumption is that, after the public release of ChatGPT, the amount of autogenerated content on the web became significantly bigger. Plus, the auto-generated content was easier to spot before.
ML and AI killed it between 2011-2016 somewhere. https://en.wikipedia.org/wiki/Dead_Internet_theory
Honestly, the biggest failing is just that SEO spam sites got too good at defeating the algorithm. The amount of bloody listicles, Quora nonsense, or backlink-farming websites that come up in search is crazy.
I feel like google gave up the fight at some point. I think HN had some good articles that indicated that.
Certainly seems that way if you observed the waves of usability Google search underwent in its first 15 years. There were several distinct cycles where the results were great, then garbage, then great again. They would be flooded with SEO spam, then Google would tweak the algorithm and penalize the SEO spam heavily, then SEO would catch up.
The funny thing is that when they gave up, it doesn't seem to have been because of some new advancement in the arms race. It was well before LLMs hit the scene, and the SEO spam was still incredibly obvious to a human reader. It really seems like some data-driven approach demonstrated that surrendering on this front led to increased ad revenue.
For most commerce-related terms, I suspect that if you got rid of all "spammy" results you would be left with almost nothing. No independent blogger is going to write about the best credit card with travel points.
Sites like Credit Karma / NerdWallet exist. While I think they are rife with affiliate link nonsense and paid promotion masquerading as advice, I'm also pretty sure they have paid researchers and writers generating genuine content. Not sure that quite falls into the bucket of SEO blogspam.
I had a coworker who kept up a blog about random purchases she’d made, where she would earn some money via affiliate links. I thought it was horrendously boring and weird, and the money made was basically pocket change, but she seemed to enjoy it. You might be surprised, people write about all sorts of things.
I agree with your point, but you picked a poor example. Have you met any credit reward min-maxers?
This is bullshit the search engines want you to believe. It's trivial to detect sites that "defeat" the algorithm; you simply detect their incentives (ads/affiliate links) instead.
Problem is that no mainstream search engine will do it because they happen to also be in the ad business and wouldn't want to reduce their own revenue stream.
Afaik they did not lose the fight. They stopped trying, because it was good for short-term earnings.
Yes, this is true. It was revealed in Google emails released during antitrust hearings. Google absolutely made a deliberate decision to enshittify their search results for short term gains.
Though maybe it's a long term gain. I know many normal (i.e. non-IT) people who've noticed the poor search results, yet they continue to use Google search.
> I am not 100% certain why Google decided to ruin google search.
Ask Prabhakar Raghavan. Bet he knows.
Significant changes were made to Google and YouTube in 2016 and 2017 in response to the US election. The changes provided more editorial and reputation based filtering, over best content matching.
Goodhart's law applies to links, too. Google monetized them and destroyed their value as a signal.
The problem is that before Nov 30, 2022 we also had plenty of human-generated slop bearing down on the web. SEO content specifically.
The main theory is that with bad results you have to search more, so you get more exposure to ads and more revenue for Google. It's enshittification.
Somebody once said we are mining "low-background tokens" the way we mined low-background (radiation) steel post-WW2, and I couldn't shake the concept out of my head.
(wrote up in https://www.latent.space/i/139368545/the-concept-of-low-back... - but ironically repeating something somebody else said online is kinda what i'm willingly participating in, and it's unclear why human-origin tokens should be that much higher signal than ai-origin ones)
Low background steel is no longer necessary.
"...began to fall in 1963, when the Partial Nuclear Test Ban Treaty was enacted, and by 2008 it had decreased to only 0.005 mSv/yr above natural levels. This has made special low-background steel no longer necessary for most radiation-sensitive uses, as new steel now has a low enough radioactive signature."
https://en.wikipedia.org/wiki/Low-background_steel
Interesting. I guess that analogously, we might find that X years after some future AI content production ban, we could similarly start ignoring the low background token issue?
We used a rather low number of atmospheric bombs, while we are carpet bombing the internet every day with AI marketing copy.
The eternal September has finally ended. We've now entered the AI winter. It promises to be long, dark, and full of annoyances.
"Winter" in AI (or cryptocurrency, or any at all) ecosystems denote a period of low activity, and a focus on fundamentals instead of driven by hype.
What we're seeing now is something more like the peak of summer. If it ends up being a bubble, and it bursts, some months after that will be the "AI winter": investors won't want to keep chucking money at problems anymore, and it'll go back to being "in-the-background research" again, as it was before.
It was a continuation of the nuclear analogy, a nuclear winter following a large scale nuclear exchange.
Also that winter comes after September (fall)
We're bombing the internet into extinction. But we were way before AI. It got real bad during the SEO/monetization phase. AI was just the final nail.
What’s the half-life of a viral meme?
[dead]
Can't wait, in fifty years we will have our data clean again.
Since synthetic data for training is pretty ubiquitous seems like a novelty
that was me swyx
Multiple people have independently coined the idea, way before you. The oldest comment on HN I could find was in December 2022 by user spawarotti: https://news.ycombinator.com/item?id=33856172
Here is an even older comment chain about it from 2020: https://news.ycombinator.com/item?id=23895706
Apparently, comparing low-background steel to pre-LLM text is a rather obvious analogy.
As well as that people often do think alike.
If you have a thought, it's likely it's not new.
Oh wow, great find! That’s really early days.
i didnt claim to invent it.
i claimed swyx heard it through me - which he did
you did!!
every human generation built upon the slop of the previous one
but we appreciated that, we called it "standing on the shoulders of giants"
> we called it "standing on the shoulders of giants"
We do not see nearly so far though.
Because these days we are standing on the shoulders of giants that have been put into a blender and ground down into a slippery pink paste and levelled out to a statistically typical 7.3mm high layer of goo.
The secret is you then have to heat up that goo. When the temperature gets high enough things get interesting again.
Just simulate some evolution here and there.
You get Flubber?
This sounds like an Alan Kay quote. He meant that in regard to useful inventions. AI-generated spam just decreases the quality. We'd need a real alternative to this garbage from Google, but all the other search engines are also bad. And their UIs are also horrible: not as bad as Google's, but still bad. Qwant just tries to copy/paste Google, for instance (though interestingly enough, it sometimes has better results than Google, but also fewer in general, even ignoring false positives).
Deep Research reports I think are above average internet quality, they collect hundreds of sources, synthesize and contrast them & provide backlinks. Almost like a generative wikipedia.
I think all we can expect from internet information is a good description of the distribution of materials out there, not truth. This is totally within the capabilities of LLMs. For additional confidence run 3 reports on different models.
We have two optimization mechanisms though which reduce noise with respect to their optimization functions: evolution and science. They are implicitly part of "standing on the shoulders of giants", you pick the giant to stand on (or it is picked for you).
Whether or not the optimization functions align with human survival, and thus our whole existence is not a slop, we're about to find out.
That's because the things we built on weren't slop
You may have one point.
The industrial age was built on dinosaur slop, and they were giant.
Nothing conveys better the idea of a solid foundation to build upon than the word ‘slop’.
You can't build on slop because slop is a slippery slope
Maybe we'll have to slurp the slop so we don't slip on the slope.
[dead]
How to make fire or kill a woolly mammoth was not slop come on.
There's a reason this is comedy:
While this is religious: Humans build not on each other's slop, but on each other's success. Capitalism, freedom of expression, the marketplace of ideas, democracy: at their best these things are ways to bend the wisdom of the crowds (such as it is) to the benefit of all; and their failures are when crowds are not wise.
The "slop" of capitalism is polluted skies, soil and water, are wage slaves and fast fashion that barely lasts one use, and are the reason why workplace health and safety rules are written in blood. The "slop" of freedom of expression includes dishonest marketing, libel, slander, and propaganda. The "slop" of democracy is populists promising everything to everyone with no way to deliver it all. The "slop" of the marketplace of ideas is every idiot demanding their own un-informed rambling be given the same weight as the considered opinions of experts.
None of these things contributed to our social, technological, or economic advancement; they are simply things which happened at the same time.
AI has stuff to contribute, but using it to make an endless feed of mediocrity is not it. The flood of low-effort GenAI stuff filling feeds and drowning signal with noise, as others have said: just give us your prompt.
the only structure you can build with slop is a burial mound
What is unhardened concrete but slop?
Because the pyramids, the theory of general relativity and the Linux kernel are all totally comparable to ChatGPT output. /s
Why is anybody still surprised that the AI bubble made it that big?
For every theory of relativity there is the religious nonsense and superstition of the medieval ages, or of today.
> for every theory of relativity there is the religious non-sense and superstitions of the medieval ages or today
If Einstein came up with relativity by standing on "the religious non-sense and superstitions of the medieval ages," you'd have a point.
If we have billions of AIs one might pick the correct learning materials. Same way human Einstein did.
I know we're just pointlessly abusing the analogy here, but... mediaeval cathedrals are a greater work of artifice than pyramids.
Projects like this remind me of a plot point in the Cyberpunk 2077 game universe. The "first internet" got too infected with dangerous AIs, so much so that a massive firewall needed to be built, and a "new" internet was built that specifically kept out the harmful AIs.
(Or something like that: it's been a while since I played the game, and I don't remember the specific details of the story.)
It makes me wonder if a new human-only internet will need to be made at some point. It's mostly sci-fi speculation at this point, and you'd really need to hash out the details, but I am thinking of something like a meatspace-first network that continually verifies your humanity in order for you to retain access. That doesn't solve the copy-paste problem, or a thousand other ones, but I'm just thinking out loud here.
The problem really is that it is impossible to verify that the content someone uploads came from their mind and not a computer program. And at some point, probably all content is at least influenced by AI. The real issue is also not that I used ChatGPT to look up a synonym or asked a question before writing an article; the problem is when I copy paste the content and claim I wrote it.
> the problem is when I copy paste the content and claim I wrote it
Why is this the problem and not the reverse - using AI without adding anything original into the soup? I could paraphrase an AI response in my own words and it will be no better. But even if I used AI, if it writes my ideas, then it would not be AI slop.
> The problem really is that it is impossible to verify that the content someone uploads came from their mind and not a computer program.
Er...digital id.
Ignoring the privacy and security issues for a moment, how would having a digital ID prove that the blog post I put on my site came only out of my own mind and I didn't use an LLM for it?
There doesn't need to be any difference in treatment between AI slop and human slop. The point isn't to keep AI out - it's to keep spam and slop out. It doesn't matter whether it's produced by a being made of carbon or silicon.
If someone can consistently produce high-quality content with AI assistance, so be it. Let them. Most don't, though.
I share an opinion with Nick Bostrom, once a civilization disrupting idea (like LLMs) is pulled out of the bag, there is no putting it back. People in isolation will recreate it simply because it's now possible. All we can do is adapt.
That being said, the idea of a new freer internet is reality.. Mastodon is a great example. I think private havens like discord/matrix/telegram are an important step on the way.
how does one keep ai out of private havens? thorough verification? is that the future? private havens on platforms?
In person web of trust in order to join any private community. It'll suck and be hard in the beginning, but once you reach a threshold, it'll be OK. Ban entire trees of users when you discover bots/puppets, to set an example.
So we expect either 1. people using AI and copy pasting into the human-only network, or 2. other people claiming your text sounds like AI and ostracizing you for no good reason. It won't be a happy place - I know from anti-generative AI forums.
Yep and then you deperson them
There were also similar plot points mentioned in Peter Watts' Starfish trilogy, and Neal Stephenson's Anathem.
> a new human-only internet
Only if those humans don't take their leads from AI. If they read AI and write, not much benefit.
Arguably this is already happening with much human-to-human interactions moving to private groups on Signal, WhatsApp, Telegram, etc.
Somewhat related, the leaderboard of em-dash users on HN before ChatGPT:
https://www.gally.net/miscellaneous/hn-em-dash-user-leaderbo...
They should include users who used a double hyphen, too -- not everyone has easy access to em dashes.
That would false positive me. I have used double dashes to delimit quote attribution for decades.
Like this:
"You can't believe everything you read on the internet." -- Abraham Lincoln, personal correspondence, 1863
That's literally a standard use of em-dash being approximated by a double hyphen, though.
Double-hyphen is an en-dash. Triple-hyphen is an em-dash.
Double hyphen is replaced in some software with an en-dash (and in those, a triple hyphen is often replaced with an em-dash), and in some with an em-dash; it's usually used (other than as input to one of those pieces of software) in places where an em-dash would be appropriate, but in contexts where both an em-dash set closed and an en-dash set open might be used, it is often set open.
So, it’s not unambiguously a substitute for either; it's essentially its own punctuation mark, used in ASCII-only environments, with some influence from both the use of em-dashes and that of en-dashes in more formal environments.
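For anyone curious, a leaderboard like the one linked above could distinguish the three marks with a few lines of Python. This is an illustrative sketch under my own assumptions, not the actual scoring code; the function name and sample text are made up:

```python
import re

def dash_stats(text: str) -> dict:
    # Count em dashes and en dashes by their Unicode code points, and
    # ASCII double hyphens only when "--" is not part of a longer run
    # (so a triple hyphen "---" is not miscounted as a double).
    return {
        "em": text.count("\u2014"),
        "en": text.count("\u2013"),
        "double_hyphen": len(re.findall(r"(?<!-)--(?!-)", text)),
    }

print(dash_stats("wait -- really? Yes\u2014truly."))
```

Counting "--" with a plain `str.count` would also match inside "---", which is why the lookarounds are there.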
Does AI use double hyphens? I thought the point was to find who wasn't AI that used proper em dashes.
Anytime I do this — and I did it long before AI did — they are always em dashes, because iOS/macOS translates double dashes to em dashes.
I think there may be a way to disable this, but I don’t care enough to bother.
If people want to think my posts are AI generated, oh well.
> Anytime I do this — and I did it long before AI did — they are always em dashes
It depends if you put the space before and after the dashes--that, to be clear, are meant to be there--or if you don't.
I cannot remember ever reading a book where there was a space around the dashes.
That depends on the language — whereas German puts spaces around —, English afaik usually doesn’t.
Similarly, French puts spaces before and after ? ! while English and German only put spaces afterwards.
[EDIT: I originally wrote that French treats . , ! ? specially. In reality, french only treats ? and ! specially.]
In German you use en-dashes with spaces, whereas in English it’s em-dashes without spaces. Some people dislike em-dashes in English though and use en-dashes with spaces as well.
In British English en-dashes with spaces is more common than em-dashes without spaces, I think, but I don't have any data for that, just a general impression.
In English, typically em-dashes are set without spaces, or with thin spaces, when used to separate appositives/parentheticals (though that style isn't universal even in professional print; there are places that set them open, and en-dashes set open can also be used in this role); when representing an interruption, they generally have no space before but frequently have a space following. And other uses have other patterns.
> whereas in English it’s em-dashes without spaces
Didn't know! Woot, I win!
Why does AI have a preference for doing it differently?
French doesn't put one before the period.
French does "," and "." like the British and Germans; the rest is space before, space after.
Technically, there are supposed to be hair spaces around the dashes, not regular spaces. They're small enough to be sometimes confused for kerning.
Em dashes used as parenthetical dividers, and en dashes when used as word joiners, are usually set continuous with the text. However, such a dash can optionally be surrounded with a hair space, U+200A, or thin space, U+2009 (the HTML named entities &hairsp; and &thinsp;). These spaces are much thinner than a normal space (except in a monospaced, non-proportional font), with the hair space in particular being the thinnest of the horizontal whitespace characters.
https://en.wikipedia.org/wiki/Whitespace_character#Hair_spac...
Typographers usually add space to the left side of the following marks:
And they usually add space to the right of these: https://www.smashingmagazine.com/2020/05/micro-typography-sp...

1. (letterpress typography) A piece of metal type used to create the narrowest space.
2. (typography, US) The narrowest space appearing between letters and punctuation.
https://en.wiktionary.org/wiki/hair_space
Now I'd like to see what the metal type looks like, but ehm... it's difficult googling it. Also a whole collection of space types and what they're called in other languages.
What, no love for our friend the en-dash?
- vs – vs —
I once spent a day debugging some data that came from an English doc written by someone in Japan that had been pasted into a system and caused problems. Turned out to be an en-dash issue that was basically invisible to the eye. No love for en-dash!
Similar.
Compiler error while working on some ObjC. Nothing obviously wrong. Copy-pasted the line, same thing on the copy. Typed it out again, no issue with the re-typed version. Put the error version and the ok version next to each other, apparently identical.
I ended up discovering I'd accidentally leant on the option key while pressing the "-"; monospace font, Xcode, em-dash and minus looked identical.
This issue also exists with (so called) "smart" quotes.
Which, the iOS keyboard “helpfully” uses for you.
Pretty much the first thing I turn off on a new laptop (it's in the keyboard settings on iOS too.)
Especially when you're sending some quick scratch code in a slack message.
There is also the difference in using space around em-dashes.
Oof, I feel like you'll accidentally capture a lot of getopt_long() fans. ;)
Excluding those with asymmetrical whitespace around might be enough
Apparently, it's not only the em-dash that's distinctive. I went through the leader's comments and spotted that he also uses the typographic apostrophe "’" instead of the straight ASCII one.
Just to be clear this is done automatically by macOS or iOS browsers when configured properly.
I (~100 in the leaderboard, regardless of how you sort) also frequently use ’ (unicode apostrophe) instead of ' :D
Amazing! But no love for en dashes?
You don’t need an extension to do this. Simply add a "before:" search filter to your search query, e.g. https://www.google.com/search?q=Happiness+before%3A2022
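If you want to build such URLs programmatically, here's a minimal sketch. The `before:` operator is Google's own search syntax; the helper function is hypothetical:

```python
from urllib.parse import urlencode

def dated_search_url(query: str, before: str = "2022-11-30") -> str:
    # Google's "before:" operator restricts results to pages dated
    # before the given date; it's appended to the query itself, then
    # the whole string is URL-encoded as the "q" parameter.
    return "https://www.google.com/search?" + urlencode({"q": f"{query} before:{before}"})

print(dated_search_url("happiness"))
# -> https://www.google.com/search?q=happiness+before%3A2022-11-30
```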
For a while I've been saying it's a pity we hadn't been regularly trusted-timestamping everything before that point as a matter of course.
Besides training future models, is this really such a big deal? Most of the AI-generated text content is just replacing content-farm SEO spam anyway: the same stuff that any half-aware person wouldn't have read in the past is now slightly better written, using more em dashes and instances of the word "delve". If you're consistently being caught out by this stuff, then you likely need to improve your search hygiene; nothing so drastic as this.
The only place I've ever had any issue with AI content is r/chess, where people love to ask ChatGPT a question and then post the answer as if they wrote it, half the time seemingly innocently, which (call me racist, but) I suspect is mostly due to the influence of the large and young Indian contingent. Otherwise I really don't understand where the issue lies. Follow the exact same rules you do for avoiding SEO spam and you will be fine.
A colleague sent me a confident ChatGPT formatted bug report.
It misidentified what the actual bug was.
But the tone was so confident, and he replied to my later messages using chat gpt itself, which insisted I was wrong.
I don't like this future.
It's not the future. Tell him not to do that. If it happens again, bring it to the attention of his manager. Because that's not what he's being paid for. If he continues to do it, that's grounds for firing.
What you're describing is not the future. It's a fireable offense.
I have dozens of these over the years - many of the people responsible have "Head of ..." or "Chief ..." job titles now.
Did you call his ass out for being lazy and wasting your time?
> the only place I've ever had any issue with AI content is r/chess, where people love to ask ChatGPT a question and then post the answer as if they wrote it, half the time seemingly innocently
Some of the science, energy, and technology subreddits receive a lot of ChatGPT repost comments. There are a lot of people who think they’ve made a scientific or philosophical breakthrough with ChatGPT and need to share it with the world.
Even the /r/localllama subreddit gets constant AI spam from people who think they’ve vibecoded some new AI breakthrough. There have been some recent incidents where someone posted something convincing and then others wasted a lot of time until realizing the code didn’t accomplish what the post claimed it did.
Even on HN, some of the “Show HN” posts are AI garbage from people trying to build portfolios. I wasted too much time trying to understand one of them until I realized they had (unknowingly?) duplicated some commits from an upstream project and then let the LLM vibe-code a README that sounded like an amazing breakthrough. It was actually good work, but it wasn’t theirs. It was just some vibecoding tool eventually arriving at the same code as upstream and then putting the classic LLM-written, emoji-filled bullet points in the README.
In the past, I'd find one wrong answer and I could easily spot the copies. Now there's a dozen different sites with the same wrong answer, just with better formatting and nicer text.
The trick is to only search for topics where there are no answers, or only one answer leading to that blog post you wrote 10 years ago and forgot about.
> besides for training future models, is this really such a big deal? most of the AI-gened text content is just replacing content-farm SEO-spam anyway.
Yes, it is, because of the other side of the coin. If you were writing human-generated, curated content, previously you would just do it in your small patch of the Internet, and search engines (Google...) would probably pick it up anyway because it was good-quality content; you just didn't care about the SEO-driven shit. Now your nicely hand-written content is going to be fed into LLM training, and it's going to be used, whether you want it or not, in the next generation of AI-slop content.
It's not slop if it is inspired from good content. Basically you need to add your original spices into the soup to make it not slop, or have the LLM do deep research kind of work to contrast among hundreds of sources.
Slop did not originate from AI itself, but from the feed ranking Algorithm which sets the criteria for visibility. They "prompt" humans to write slop.
AI slop is just an extension of this process, and it started long before LLMs. Platforms optimizing for their own interest at the expense of both users and creators is the source of slop.
SEO-spam was often at least somewhat factual and not complete generated garbage. Recipe sites, for example, usually have a button that lets you skip the SEO stuff and get to the actual recipe.
Also, the AI slop is covering almost every sentence or phrase you can think of to search. Before, if I used more niche search phrases and exact searches, I was pretty much guaranteed to get specific results. Now, I have to wade through pages and pages of nonsense.
Yes, it is a big deal. I can't find new artists without fearing that their art is AI generated; same for books and music. I also can't post my stuff to the internet anymore, because I know it's going to be fed into LLM training data. The internet is mostly dead to me, and thankfully I've lost almost all interest in being on my computer as much as I used to be.
Yes indeed, it is a problem. Now the good old sites have turned into AI-slop sites, because they can't fight the spammers while writing slowly with humans.
I really thought this was going to be the Dewey Decimal system. Exclude sources from this century. It’s the only way to be sure.
The low-background steel of the internet
https://en.wikipedia.org/wiki/Low-background_steel
As mentioned half a year ago at https://news.ycombinator.com/item?id=44239481
As mentioned 7 months ago https://news.ycombinator.com/item?id=43811732
The other day I was researching with ChatGPT.
* ChatGPT hallucinated an answer
* ChatGPT put it in my memory, so it persisted between conversations
* When asked for a citation, ChatGPT found 2 AI created articles to back itself up
It took a while, but I eventually found human written documentation from the organization that created the technical thingy I was investigating.
This happens A LOT for topics on the edge of knowledge easily found on the Web. Where you have to do true research, evaluate sources, and make good decisions on what you trust.
AI reminds me of combing through Stack Overflow answers. The first one might work... or it might not. Try again, find a different SO problem and answer. Maybe the third time's the charm...
Except it's all via the chat bot and it isn't as easy to get it to move off of a broken solution.
Simple solution: run the same query on 3 different LLMs with different search integrations; if they concur, the chances of hallucination are low.
Or you could just… not use LLMs
For images, https://same.energy is a nice option that, having been abandoned but still functioning for a few years now, seems naturally not to have crawled any AI images. And it's an all-around great product.
How about a search engine that only returns what you searched for, and not a million other unrelated things that it hopes you might like to buy?
This goes for you, too, website search.
google results were already 90% SEO crap long before ChatGPT
just use Kagi and block all SEO sites...
How do we (or Kagi) know which ones are "SEO sites"? Is there some filter list or other method to determine that?
If you took Google of 2006, and used that iteration of the pagerank algorithm, you’d probably not get most of the SEO spam that’s so prevalent in Google results today.
It seems like a mixture of heuristics, explicit filtering and user reports.
https://help.kagi.com/kagi/features/slopstop.html
That's specifically for AI generated content, but there are other indicators like how many affiliate links are on the page and how many other users have downvoted the site in their results. The other aspect is network effect, in that everyone tunes their sites to rank highly on Google. That's presumably less effective on other indices?
Most college courses and school books haven't changed in decades. Some reputed colleges keep courses in Pascal and Fortran instead of Python or Java, just because it might affect their reputation for being classical or pure, or to match the style of their campus buildings.
Or because the core knowledge stays the same no matter how it is expressed.
Why use this when you can use the before: syntax on most search engines?
It doesn't actually do anything anymore in Google or Bing.
Searching Google for
chatgpt
vs
chatgpt before:2022-01-01
gives me quite different results. In the 2nd query, most results have a date listed next to them on the results page, and that date is always prior to 2022. So the date filtering is "working". However, most of the dates are actually Google making a mistake and misinterpreting some unimportant date it found on the page as the date the page was created. At least one result is a YouTube video posted before 2022 that edited its title after ChatGPT was released to say ChatGPT.
Disclosure: I work at Google, but not on search.
I noticed AI-generated slop taking over Google search results well before ChatGPT. So I don't agree with the premise on this site that "you can be sure that it was written or produced by the human hand."
FWIW Mojeek (an organic search engine in the classic sense) can do this with the before: operator.
https://www.mojeek.com/search?q=britney+spears+before%3A2010...
I didn’t know “eccentric engineering” was even a term before reading this. It’s fascinating how much creativity went into solving problems before large models existed. There’s something refreshing about seeing humans brute force the weird edges of a system instead of outsourcing everything to an LLM.
It also makes me wonder how future kids will see this era. Maybe it will look the same way early mechanical computers look to us. A short period where people had to be unusually inquisitive just to make things work.
Maybe like how I view my dad and the punchcard era: cool and endearing that he went through that, but thankful that I don’t have to.
It doesn't really work. I tried my website and it shows up, despite definitely being built after 2023. There is a mistake in the page's metadata that shows it as being from 2011.
https://audiala.com/changelog
Just the other evening, as my family argued about whether some fact was or was not fake, I detached from the conversation and began fantasizing about whether it was still possible to buy a paper encyclopedia.
You should call it Predecember, referring to the eternal December.
September?
ChatGPT was released exactly 3 years ago (on the 30th of November) so December it is in this context.
surely that would be eternal November then
everything is dead after november passes
No, being released on Nov 30th means November was still before the slop era.
In the end the analogy doesn't really work, because 'eternal September' referred to what used to be a regular, temporary thing (an influx of noobs disrupting the online culture, before eventually leaving or assimilating) becoming the new normal. 'Eternal {month associated with ChatGPT}' doesn't fit because LLM-generated content was never a periodic phenomenon.
AI R&D certainly was periodic. Good thing we put a stop to that!
To be honest, GPT-3, which was pretty solid and extremely capable of producing webslop, had been out for a good while before ChatGPT, and GPT-2 had been used for blogslop years before that. Maybe ChatGPT was when the public became aware of it, but it was going on well beforehand. And, as the sibling commenter points out, the analogy doesn't quite fit structurally either.
Yes, and this site is for everything before the slop era, hence eternal November.
aka 0 December
Does this filter out traditional SEO blogfarms?
Yeah, might prefer AI-slop to marketing-slop.
They are the same. I was looking for something and tried AI. It gave me a list of stuff. When I asked for its sources, it linked me to some SEO/Amazon affiliate slop.
All AI is doing is making it harder to know what is good information and what is slop, because it obscures the source, or people ignore the source links.
I've started just going to more things in person, asking friends for recommendations, and reading more books (should've been doing all of these anyway). There are some niche communities online I still like, and the fediverse is really neat, but I'm not sure we can stem the Great Pacific Garbage Patch-levels of slop, at this point. It's really sad. The web, as we know and love it, is well and truly dead.
> This is a search tool that will only return content created before ChatGPT's first public release on November 30, 2022.
How does it do that? At least Google seems to take website creation date metadata at face value.
I hope there's an uncensored version of the Internet Archive somewhere, I wish I could look at my website ca. 2001, but I think it got removed because of some fraudulent DMCA claim somewhere in the early 2010s.
ChatGPT also returns content only created before ChatGPT release, which is why I still have to google damn it!
Is that still the case? And even if so how is it going to avoid keeping it like that in the future? Are they going to stop scraping new content, or are they going to filter it with a tool which recognizes their own content?
It's a known problem in ML. I think Grok partially solved it, and ChatGPT uses another model on top to search the web, as suggested below. Hence the MLOps field appeared, to handle model management.
I find it a bit annoying to navigate between hallucinations and outdated content. Too much invalid information to filter out.
Click the globe icon below the input box to enable web searching by ChatGPT.
so it's a filter by date and you chose the chatgpt's public release?
The real gold is content created before the internet!
What kind of heuristics does it use to determine age? A lot of content on Google is actually backdated for some reason... presumably some sort of SEO scam?
The slop is getting worse, as there is so much llm generated shit online, now new models are getting trained on the slop. Slop training slop, and slop. We have gone full circle just in a matter of a few years.
I was replaying Cyberpunk 2077 and trying to think of all the ways one might have dialed up the dystopia to 11 (beyond what the game does). And pervasive AI slop was never on my radar. It kinda reminds me of the foreword in Neuromancer bringing attention to the fact that the book was written before cellphones became popular. It's already fucking with my mind. I recently watched Frankenstein 2025 and 100% thought gen AI had a role in the CGI, only to find out the director hates it so much he'd rather die than use it. I've been noticing little things in old movies and anime where I thought to myself: if I didn't know this was made before gen AI, I would have thought this was generated for sure. One example (https://www.youtube.com/watch?v=pGSNhVQFbOc&t=412): the cityscape background in this outro scene, with buildings built on top of buildings, gave me AI vibes (really the only thing in this whole anime), yet it came out ~1990. So I can already recognize a paranoia/bias in myself and really can't reliably tell what's real. Probably other people have this too, which is why some non-zero number of people always think every blog post that comes out was written by gen AI.
I had the same experience, watching a nature documentary on a streaming service recently. It was... not so good, at least at the beginning. I was wondering if this was a pilot for AI generated content on this streaming service.
Actually, it came out in 2015 and was just low budget.
Not affiliated, but I've been using kagi's date range filter to similar effect. The difference in results for car maintenance subjects is astounding (and slightly infuriating).
For that reason, I do not update my book about Ruby on LeanPub. I just know that one day people are going to read it more, because human-written content will be gold.
"This browser extension uses the Google search API to only return content published before Nov 30th, 2022 so you can be sure that it was written or produced by the human hand."
In hindsight, that would've been a real utility use case for NFTs: a decentralized cryptographic proof that some content existed in a particular form at a particular moment.
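The core of any such timestamping scheme, on-chain or otherwise, is just a content hash: you publish the digest at a known time, and later anyone can verify the content matches it. A minimal sketch of the hashing side (the ledger-anchoring step itself is out of scope here):

```python
import hashlib

def content_fingerprint(text: str) -> str:
    """SHA-256 digest of the content. In a timestamping scheme it is this
    digest, not the content itself, that would be anchored on a public ledger."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

post = "My hand-written blog post, exactly as published."
digest = content_fingerprint(post)
print(digest)

# Any later edit, however small, produces an unrelated digest,
# so the anchored digest proves the content is unmodified.
assert content_fingerprint(post + " ") != digest
```

Services like trusted timestamping authorities (RFC 3161) have offered essentially this for years; the NFT angle would only add decentralization of the anchor, not a new proof mechanism.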
This is such a great idea
Of course my first thought was: Let's use this as a tool for AI searches (when I don't need recent news).
Something generated by humans does not mean high quality.
Yes, but AI-generated is always low quality so it makes sense to filter it out.
Grokipedia would like a word
I wouldn't say always... Especially because you probably only noticed the bad slop. Usually it is crap though.
At least when reading human-made material you can spot the author's uncertainty on some topics. Usually, when someone doesn't have knowledge of something, they don't try to describe it. AI, however, will try to convince you that pigs can fly.
[dead]
Interesting concept. As a side benefit this would allow you to make steady progress fighting SEO slop as well, since there can be no arms race if you are ignoring new content.
You could even add options for later cutoffs… for example, you could use today’s AIs to detect yesterday’s AI slop.
Technically you can ask ChatGPT to return the same results by asking it to filter by year.
I mean I get it, but it seems a bit silly. What's next - an image search engine that only returns images created before photoshop?
I'm grateful that I published a large body of content pre-ChatGPT so that I have proof that I'm not completely inarticulate without AI.
We now need an extension to hide 3 years of the internet because it was written by robots. This timeline is undefeated.
I don't know how this works under the hood but it seems like no matter how it works, it could be gamed quite easily.
True, but there's probably many ways to do this and unless AI content starts falsifying tons of its metadata (which I'm sure would have other consequences), there's definitely a way.
Plus, other sites that link to the content could also give away its date of creation, which is out of the control of the AI content.
I have heard of a forum (I believe it was Physics Forums) which was very popular in the older days of the internet where some of the older posts were actually edited so that they were completely rewritten with new content. I forget what the reasoning behind it was, but it did feel shady and unethical. If I remember correctly, the impetus behind it was that the website probably went under new ownership and the new owners felt that it was okay to take over the accounts of people who hadn't logged on in several years and to completely rewrite the content of their posts.
I believe I learned about it through HN, and it was this blog post: https://hallofdreams.org/posts/physicsforums/
It kind of reminds me of why some people really covet older accounts when they are trying to do a social engineering attack.
> website probably went under new ownership
According to the article, it was the founder himself who was doing this.
If it's just using Google search "before <x date>" filtering I don't think there's a way to game it... but I guess that depends on whether Google uses the date that it indexed a page versus the date that a page itself declares.
Date displayed in Google Search results is often the self-described date from the document itself. Take a look at this "FOIA + before Jan 1, 1990" search: https://www.google.com/search?q=foia&tbs=cdr:1,cd_max:1/1/19...
None of these documents were actually published on the web by then, including a Watergate PDF bearing a date of Nov 21, 1974, almost 20 years before the PDF format was released. And of course, the WWW itself only started in 1991.
Google Search's date filter is useful for finding documents about historical topics, but unreliable for proving when information actually became publicly available online.
Are you sure it works the same way for documents that Google indexed at the time of publication? (Because obviously for things that existed before Google, they had to accept the publication date at face value).
Yes, it works the same way even for content Google indexed at publication time. For example, here are chatgpt.com links that Google displays as being from 2010-2020, a period when Google existed but ChatGPT did not:
https://www.google.com/search?q=site%3Achatgpt.com&tbs=cdr%3...
So it looks like Google uses inferred dates over its own indexing timestamps, even for recently crawled pages from domains that didn't exist during the claimed date range.
Interesting, thanks.
I wonder why they do that when they could use time of first indexing instead.
"Gamed quite easily" seems like a stretch, given that the target is definitionally not moving. The search engine is fundamentally searching an immutable dataset that "just" needs to be cleaned.
How? They have an index from a previous date and nothing new will be allowed since that date? A whole copy of the internet? I don't think so.... I'm guessing, like others, it's based on the date the user/website/blog lists in the post. Which they can change at any time.
Yes, they do. It's called Common Crawl, and it is available from your chosen hyperscaler vendor.
Can't we just append "before:2021-01-01" to Google?
I use this to find old news articles for instance.
This tool has no future. We have that in common with it, I fear.
What we really need to do is build an AI tool to filter out the AI automatically. Anybody want to help me found this company?
[dead]
[dead]
[dead]
[dead]
[dead]
[flagged]
Is it really here to stay? If the wheels fell off the investment train and ChatGPT etc. disappeared tomorrow, how many people would be running inference locally? I suspect most people either wouldn't meet the hardware requirements or would be too frustrated with the slow token generation to bother. My mom certainly wouldn't be talking to it anymore.
Remember that a year or two ago, people were saying something similar about NFTs: that they were the future of sharing content online and we should all get used to it. Now, they still might exist, it's true, but they're much less pervasive and annoying than they once were.
>that they were the future of sharing content online
nobody was saying that
People right here on HN were adamant my next house would be purchased using an NFT. And similar absurd claims about blockchain before that.
And it's at least interesting that it's a lot of the same people pitching AI now who were all so excited about blockchain and NFTs and the metaverse.
Maybe you don't love your mom enough to do this, but if ChatGPT disappeared tomorrow and it was something she really used and loved, I wouldn't think twice before buying her a rig powerful enough to run a quantized downlodable model on, though I'm not current on which model or software would be the best for her purposes. I get that your relationship with your mother, or your financial situation might be different though.
> Maybe you don't love your mom enough to do this
I actually love my mom enough not to do this.
Maybe you should talk more to your mother so she does not need a imaginary friend.
Please tell me this is satire.
That's just your average AI user. Too much "you are right" makes them detached from reality.
No man, this must be satire.
> I get that your relationship with your mother, or your financial situation might be different though.
Fucking hell
I don't agree it is 'almost worse' than the slop but it sure can be annoying. On one hand it seems even somewhat positive that some people developed a more critical attitude and question things they see, on the other hand they're not critical enough to realize their own criticism might be invalid. Plus I feel bad for all the resources (both human and machine) wasted on this. Like perfectly normal things being shown, but people not knowing anything about the subject chiming in to claim that it must be AI because they see something they do not fully understand.
"You know what's almost worse than something bad? People complaining about something bad."
Shrug. Sure.
Point still stands. It’s not going anywhere. And the literal hate and pure vitriol I’ve seen towards people on social media, even when they say “oh yeah; this is AI”, is unbelievable.
So many online groups have just become toxic shitholes because someone posts something AI generated once or twice a week.
The entire US GDP for the last few quarters is being propped up by GPU vendors and one singular chatbot company, all betting that they can make a trillion dollars on $20-per-month "it's not just X, it's Y" Markov chain generators. We have six to 12 more months of this before the first investor says "wait a minute, we're not making enough money", and the house of cards comes tumbling down.
Also, maybe consider why people are upset about being consistently and sneakily lied to about whether or not an actual human wrote something. What's more likely: that everyone who's angry is wrong, or that you're misunderstanding why they're upset?
I feel like this is the kind of dodgy take that'll be dispelled by half an hour's concerted use of the thing you're talking about
short of massive technological regression, there's literally never going to be a situation where the use of what amounts to a second brain with access to all the world's public information is not going to be incredibly marketable
I dare you to try building a project with Cursor or a better cousin and then come back and repeat this comment
>What's more likely: that everyone who's angry is wrong, or that you're misunderstanding why they're upset?
Your patronising tone aside, GP didn't say everyone was wrong, did he? If he didn't, which he didn't, then it's a completely useless and fallacious rhetorical question. What he actually said was that it's very common. And, factually, it is. I can't count the number of these types of Instagram comments I've seen on obviously real videos. Most people have next to no understanding of AI and its limitations and typical features, and "surprising visual occurrence in video" or "article with correct grammar and punctuation" are enough for them to think they've figured something out.
> I dare you to try building a project with Cursor or a better cousin and then come back and repeat this comment
I always try every new technology, to understand how it works, and expand my perspective. I've written a few simple websites with Cursor (one mistake and it wiped everything, and I could never get it to produce any acceptable result again), tried writing the script for a YouTube video with ChatGPT and Claude (full of hallucinations, which – after a few rewrites – led to us writing a video about hallucinations), generated subtitles with Whisper (with every single sentence having at least some mistake) and finally used Suno and ChatGPT to generate some songs and images (both of which were massively improved once I just made them myself).
Whether Android apps or websites, scripts, songs, or memes, so far AI is significantly worse at internet research and creation than a human. And cleaning up the work AI did always ended up taking longer than just doing it myself from scratch. AI certainly makes you feel more productive, and it seems like you're getting things done faster, even though you're not.
Fascinatingly, as we found out from this HN post Markov chains don't work when scaled up, for technical reasons, so that whole transformers thing is actually necessary for this current generation of AI.
https://news.ycombinator.com/item?id=45958004
This kind of pressure is actually good, because it helps fight "lazy AI use" while letting people use AI in addition to their own brain.
And that's a good thing, because as much as I like LLMs as a technology, I really don't want people blindly copy-pasting stuff from them without thinking.
What isn't going anywhere? You're kidding yourself if you think every single place AI is used will withstand the test of time. You're also kidding yourself if you think consumer sentiment will play no part in determining which uses of AI will eventually die off.
I don't think anyone seriously believes the technology will categorically stop being used anytime soon. But then again we still keep using tech thats 50+ years old as it is.
[flagged]
Besides this being spam, the linked leaderboard is pre-ChatGPT; it doesn't care about comments made now.
[flagged]