On the latest Search Off the Record podcast, Google's Gary Illyes described to Martin Splitt how search engine crawlers have changed over the years.
He also said that while Googlebot does not support HTTP/3 yet, it eventually will because the protocol is more efficient.
Crawling has changed in a few ways, including:
(1) The move from header-less HTTP (think HTTP/0.9) to versions with HTTP headers, and now HTTP/2 and HTTP/3
(2) The robots.txt protocol (although that is super old)
(3) Dealing with spammers and scammers
(4) How AI products are consuming more content now (kinda).
This discussion starts at the 23:23 mark of the podcast; here is the embed:
Martin Splitt asked Gary: “Do you see a change in the way that crawlers work or behave over the years?”
Gary replied:
Behave, yes. How they crawl, there’s probably not that much to change. Well, I guess back in the days we had, what, HTTP/1.1, or probably they were not crawling on /0.9 because no headers and stuff, like that’s probably hard. But, anyway, nowadays you have h2/h3. I mean, we don’t support h3 at the moment, but eventually, why wouldn’t we? And that enables crawling much more efficiently because you can stream stuff–stream, meaning that you open one
connection and then you just do multiple things on that one connection instead of opening a bunch of connections. So like the way the HTTP clients work under the hood, that changes, but technically crawling doesn’t actually change.
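Gary's point about "streaming" is HTTP/2-style multiplexing: many requests share a single connection instead of each opening its own. As a rough illustration (a toy model, not a real HTTP client; the `Connection` class and `fetch` coroutine here are made up for the sketch), you can picture several concurrent fetches all reusing one connection object:

```python
import asyncio

class Connection:
    """Toy stand-in for a single TCP/QUIC connection; counts how many were opened."""
    opens = 0

    def __init__(self):
        Connection.opens += 1

async def fetch(conn, path):
    # In HTTP/2, each request is a "stream" on the shared connection.
    await asyncio.sleep(0)  # yield control so streams can interleave
    return (id(conn), path)

async def crawl(paths):
    conn = Connection()  # one connection serves every request
    return await asyncio.gather(*(fetch(conn, p) for p in paths))

results = asyncio.run(crawl(["/a", "/b", "/c"]))
conn_ids = {cid for cid, _ in results}
print(len(conn_ids), Connection.opens)  # all three fetches shared one connection
```

With HTTP/1.1, a crawler wanting parallelism would typically open multiple connections instead, which is the extra overhead Gary is alluding to.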
He then added:
And then how different companies set policies for their crawlers, that of course differs greatly. If you are involved in discussions at the IETF, for example, the Internet Engineering Task Force, about crawler behavior, then you can see that some publishers are complaining that crawler X or crawler B or crawler Y was doing something that they would
have considered not nice. The policies might differ between crawler operators, but in general, I think the well-behaved crawlers, they would all try to honor robots.txt, or Robots Exclusion Protocol, in general, and pay some attention to the signals that sites give about their own load or their servers load and back out when they can. And then you also have, what are they called, the adversarial crawlers like malware scanners and privacy scanners and whatnot. And
then you would probably need a different kind of policy for them because they are doing something that they want to hide. Not for a malicious reason, but because malware distributors would probably try to hide their malware if they knew that a malware scanner is coming in, let’s say. I was trying to come up with another example, but I can’t. Anyway. Yeah. What else do you have?
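To make "honor robots.txt" concrete, here is a minimal sketch using Python's standard-library parser for the Robots Exclusion Protocol (the `rules` text and hostnames below are invented for illustration):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything except /private/,
# while all other crawlers are disallowed entirely.
rules = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A well-behaved crawler checks before fetching:
print(parser.can_fetch("Googlebot", "https://example.com/public/page"))    # True
print(parser.can_fetch("Googlebot", "https://example.com/private/page"))   # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/public/page")) # False
```

Adversarial crawlers like the malware scanners Gary mentions would skip this check by design, which is why he says they need a different kind of policy.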
He added later:
Yeah. I mean, that’s one thing that we’ve been doing last year, right? Like, we were trying to reduce our footprint on the internet. Of course, it’s not helping that then new products are launching or new AI products that do fetching for various reasons. And then basically you saved seven bytes from each request that you make. And then this new product will add back eight. The internet can handle the load from crawlers. I firmly believe that–this will be controversial and I will get yelled at on the internet for this–but it’s not crawling that is eating up the resources; it’s indexing and potentially serving or what you are doing with the data when you are processing that data that you fetch, that’s what’s expensive and resource-intensive. Yeah, I will stop there before I get in more trouble.
I mean, not much has changed, but listening to this wasn't too bad (looking at you, Gary).
Forum discussion at LinkedIn.
Image credit to Lizzi