Large language models like ChatGPT learn by ingesting vast amounts of text, including published articles and blog posts. Some creators want to opt out of this "scraping," but doing so may limit the reach of their ideas. Though attribution concerns are valid, engaging thoughtfully with AI systems could help creators build community, develop distinctive voices, and get paid for originating influential concepts. The rise of AI content may ultimately expand opportunities for human originality and income.
ChatGPT and other large language models “learn” by digesting incredible amounts of text. That includes articles, blog posts, and other human communications. In some cases, users can prompt ChatGPT to visit a specific web address and summarize or synthesize its contents to answer a question or generate a new text.
Many people are understandably wary or angry about this. And OpenAI, the creator of ChatGPT, recently announced a way for publishers to opt out of this type of content “scraping” or “crawling.”
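For readers wondering what opting out looks like in practice, here is a minimal sketch. It assumes the opt-out works through ordinary robots.txt rules addressed to the GPTBot user agent (the crawler name mentioned in the Thompson quote below). The site URL, the robots.txt contents, and the helper name `is_allowed` are hypothetical, chosen only for illustration. The code uses Python's standard `urllib.robotparser` to check which crawlers such a file would admit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site:
# block the GPTBot crawler, allow every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com and SomeOtherBot are placeholders, not real endpoints or crawlers.
gptbot_allowed = is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/essay")
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/essay")

print("GPTBot allowed:", gptbot_allowed)
print("Other crawlers allowed:", other_allowed)
```

Note that robots.txt is a request, not an enforcement mechanism: it only works for crawlers that choose to honor it.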
But should they? If you’re producing content, would you like popular AI tools to understand it and learn from it, or would you rather keep these ideas for yourself — and for the humans and other crawlers who already read them?
These questions transcend the specific issue at hand. They touch on the broader relationship between human creativity and AI, and on how ideas are produced, valued, and shared.
Before we get to the heart of the matter, it’s important to note the business rationale for OpenAI’s move, as Ben Thompson points out. A cynical reading of the new policy is that OpenAI, the market leader, launched a hit product, ChatGPT, after scraping millions of web pages. Now that OpenAI is ahead, it introduces new standards that make it harder for everyone else to follow the same playbook and catch up. To use an imperfect analogy, it would be akin to an artist who made millions from freely sampling other people’s work calling for an end to free sampling (while he continues to enjoy royalties from his sample-based songs).
Aside from Ben Thompson’s insight on OpenAI’s motives, it’s interesting to see his own business maneuvers in response to AI scrapers. As one of the most prolific and highest-earning online writers, Ben’s approach may seem surprising:
“I don’t plan on blocking the GPTBot crawler or any other crawlers.... The high-minded reason is because my primary goal when I started [my site] Stratechery was to share my ideas; my only economic goal was to match my tech industry salary. Contributing to AI feels in line with that goal.
On a more personal level, I used to get pretty upset when other writers would blatantly steal my ideas or insights; I realized, though, that this reaction was counter to the aforementioned goal, and also, one of my insights was that ideas, once published, are economically worthless — I made them non-exclusive by choice! The truth is that Stratechery’s economic value is really not about what I have already published but rather my ability to keep publishing ideas that are interesting…”
His argument is compelling. Anyone writing online should assume their ideas will be immediately replicated and shared, often without attribution. Thus, one’s value proposition should focus not on extracting value from old ideas but on the promise of a constant flow of new ideas. As Ben put it elsewhere, online subscribers pay for “the regular delivery of well-defined value.”
Makes sense, no?
Well, it depends for whom.
To a cynical ear, Ben’s strategy rhymes with OpenAI’s: Treating old content as worthless is great if you’re already ahead. An established writer with a paying audience and a track record of well-known ideas (perhaps) has little to fear from AI, especially if he can keep coming up with new ideas or riffs on his old ones. But what about everyone else? If you start writing and AI scrapes your ideas before anyone has even heard of you, and you operate in a world with even more free (AI-generated) content, it might be much harder to stand out and build an audience from scratch.
To be clear, I don’t think Ben was being cynical or manipulative in sharing his strategy. And he did acknowledge that he had already reached his financial goals and was thus not fussed either way. Still, Ben’s position and interests stand in contrast to those of other writers (of most writers) who have yet to become as well-known and successful. The best among them might have their ideas “stolen” by AI without ever receiving credit or converting impressed readers into paid subscribers.
I don’t think this will be AI’s actual impact, but it’s possible. We’ll explore a couple of other alternatives in a moment. But for now, the point is that when it comes to dealing with AI scrapers, what’s viable for Ben Thompson might not be viable for other creators. In fact, it might not even be viable for Ben himself.
The German pastor Martin Niemöller wrote a famous poem about Germany’s gradual descent into tyranny:
First they came for the Communists
And I did not speak out
Because I was not a Communist
Then they came for the Socialists
And I did not speak out
Because I was not a Socialist
Then they came for the trade unionists
And I did not speak out
Because I was not a trade unionist
Then they came for the Jews
And I did not speak out
Because I was not a Jew
Then they came for me
And there was no one left
To speak out for me
Everyone involved thought the fire wouldn’t reach them — until it did. Online publishing might be experiencing a descent of its own. Established writers with a paying audience and a track record of well-known ideas don’t have to worry about AI yet. But ultimately, it might become good enough to outshine (or flood) even the best of them or to cement new business models that will make old ones untenable.
Niemöller’s poem captures a feeling, but I do not want to imply there’s some sort of analogy between the rise of Nazism and the rise of AI scrapers. And I believe that good writers will always be able to make a living online, though this belief might be due to my own blind spots and the comfort of my position. Still, I see two ways in which scrapers will make it possible for more creators to make a living online.
Find the others
My friend Paul Millerd wrote a book, The Pathless Path, about an alternative way to live and work. He writes a newsletter on the same topic and runs an online community called Find the Others. What does “online community” mean? In Paul’s case, paying members enjoy access to exclusive content; they get to share their own thoughts with each other, participate in meetups, encourage and support each other’s journeys, and more. It all revolves around — or at least begins from — everyone’s interest in forging a path different from the traditional college-corporate-job-constant-thinking-about-money trajectory.
Paul is not the only one doing it. Many other creators pair content and community in a way that meets Ben Thompson’s benchmark of “the regular delivery of well-defined value.” In some cases, the “community” offering is less explicit but not less valuable. Popular blogs/newsletters like The Diff and Marginal Revolution have lively comment sections with readers who comment on and debate each post in a way that exceeds the narrow definition of “commenting.” The readers of these publications are regulars, they are members, and they interact according to implicit and explicit rules and using terminology specific to their community.
All this is to say that niche content and community are having a moment. And it’s no coincidence that this moment coincides with the rise of generative AI. The better machines become at producing human-sounding content, the more human creators will push into offerings that only humans can provide. Bringing together and “managing” a community of like-minded people is something that humans are still much better at.
Humans are also better at having a specific point of view. Large language models like ChatGPT produce average, mild content by definition. Beyond their preset “filters” that aim not to offend anyone, their ideas are a mishmash of more content than humans can ever digest. Even if they scrape the whole internet… actually, because they scrape the whole internet, their answer to every question is the average of everything anyone ever said about a topic. Of course, AI can be more specific with some instruction or tweaks, but being pareve is its essence. In Yiddish, pareve (pronounced pa-rev) is food that is neither meat nor milk and goes well with anything; it’s food without a point of view.
AI will push more humans to become less pareve. This, in turn, will create an opportunity for more humans to make a living from online content.
The Long and Fat Tail
I mentioned Ben Thompson’s observation that web users would pay for “the regular delivery of well-defined value.” In the same article, Thompson provides a specific definition of “well-defined value” on the internet.
When asking people to pay, quality matters far more than quantity, and the ratio matters: a publication with 1 valuable article a day about a well-defined topic will more easily earn subscriptions than one with 3 valuable articles and 20 worthless ones covering a variety of subjects.
Based on the above, we can say that a paid newsletter is almost the exact opposite of a newspaper. It’s not “all the news that’s fit to print”; it’s “here’s only what you should know.” What Thompson calls “quality” is not an objective standard but a subjective one: Quality is whatever specific people find compelling. And, on the internet, it’s enough to produce content that appeals to 1,000 or even 100 people in order to make a living.
The online definition of “quality” goes even further. The word “content” is sometimes interchangeable with “news” or “information.” But online, the information quotient of a piece of content is lower and probably diminishing over time. What I mean is that what determines a paid creator's success is not the information they deliver but how it is delivered and even who delivers it. The creator’s literal and figurative voice, their background and journey, their aspirations, and even their weaknesses and mistakes — are part of the value proposition; they are part of the reason people want to hear from them rather than anyone else.
I see this play out in the online courses I teach. I have incredible participants from the world’s leading companies who sign up to learn from me about emerging technologies and business models. I am currently teaching a Practical AI course (the next cohort begins next week! Learn more here). Students enjoy my courses not because I’m the world’s greatest expert on AI; they enjoy my classes because I am not the world’s greatest expert. I am constantly learning new things as an outsider, and I strive to help other outsiders learn these things and figure out how to make use of them in their own work. I don’t want to “dump” content on people; I want them to get it, to feel comfortable with it, to be able to talk about it with their friends, and to know enough to keep learning and experimenting on their own.
But enough about me. The point is that online, in a world of abundant content, the definition of quality is increasingly subjective and specific. This means there’s room for many more people to write about (and teach) different topics. So even if AI becomes really good at writing objectively correct but pareve content about various topics, there will be room for more humans to take this content and deliver it in a manner that appeals to other humans. In fact, there will be more room because (1) more people will have access to correct information to build their own content around and (2) as the world gets flooded with free, machine-made, pareve content, more people will hunger for actual “meat and milk.” And some of this meat and milk will include offline interactions, yet another thing humans are better at than machines.
And if that’s not enough optimism for one day, there’s at least one more way AI can help more creators make a living.
Whose line is it anyway?
Humans are worried that AI might steal their ideas. Search engines point people to websites that provide answers to their questions. As such, they help writers and publishers gain readers. But AI can answer questions directly. As such, it might use someone’s idea to answer a question without even sending any readers their way.
But it can also do the opposite. As AI scrapes everything humans ever wrote (or said in a video or podcast), it becomes pretty good at tracing ideas back to their source. For example, in 1964, Marshall McLuhan observed that once horses were no longer needed for work, they took on new jobs in entertainment and leisure activity. Saying so, he hinted at what might become of machines or even humans once many of them are no longer needed to produce basic necessities. I love this idea, but let’s say I didn’t know who actually said it — I just read it in some newsletter in 2023.
How can I figure out its origin?
I’ll ask ChatGPT. I gave it the following prompt: “Someone wrote something about horses becoming entertainers once they’re no longer needed for work. Who wrote about it first?” To this, ChatGPT replied:
The idea that horses would primarily become creatures of entertainment once they were no longer needed for work is commonly attributed to the historian and philosopher Lewis Mumford. In his 1934 work "Technics and Civilization", Mumford discussed the transition of the horse from a primary means of transportation and work to a more recreational and entertainment role as machinery and automobiles took over. He noted that horses, once a central part of urban life and the economy, were becoming more associated with leisure activities like horse racing and recreational riding.
So it wasn’t McLuhan after all. Or maybe it was. Looking into the book ChatGPT mentioned, I saw that Mumford indeed discussed the evolution of horses and other “technologies” and how they transition from toys to industrial tools and back to toys. But he did not explicitly frame it the way McLuhan did. Still, it was a pretty good effort on ChatGPT’s part.
In any case, my point is that AI is becoming pretty good at tracing the source of different ideas. It can make it easier to give credit to original thinkers and to discredit those who simply recycle other people’s ideas. I can envision a world in which creators get paid for fractions of their content that are cited and repeated elsewhere. In fact, there have been some experiments to this effect in the world of web3 over the past few years — but it’s no longer fashionable to write about these.
So, should you let AI scrape your site? If you do, it might repackage your insights without giving you credit. But if you don’t, AI might never find out that you were the source of all these great ideas to begin with. I’m going to let it scrape. What will you do?
Let me know in the comments below.
Thank you for reading. To harness AI for your own work, check out the upcoming Practical AI course. We’ll clarify all the jargon and hype and get hands-on experience with tools and workflows for marketing, financial analysis, and general business automation. Learn more here.
I research technology's impact on how humans live, work, and invest.