Robots.txt File: What It Is, How to Create and Update It [SEO GUIDE]
Posted: Tue Dec 03, 2024 3:48 am
Robots.txt is a simple text file, uploaded to the root directory of your website, that indicates which sections of the site should not be crawled by search engine crawlers. The file is mainly used to prevent crawlers from overloading the site with requests.
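As a point of reference, here is a minimal sketch of what a robots.txt file might look like; the blocked directory and the Sitemap URL are purely illustrative, and each element is explained in detail below.
#Hypothetical file served at https://www.example.com/robots.txt
User-agent: *
Disallow: /private/
Sitemap: https://www.example.com/sitemap.xml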
The task of writing the robots.txt file falls to the SEO Specialist, who must be able to:
allow bots to access the pages of the site that are considered most important;
limit the scanning of pages deemed to be of low value;
configure the file correctly to avoid wasting “crawl budget”.
All the main elements of the robots.txt file
A robots.txt file has syntax rules and a well-defined structure to respect, so that bots can correctly understand all the instructions it contains. Here are the main elements that can make up a robots.txt file.
User-Agent
The User-Agent element is used to specify the name of the crawler to which the immediately following rules apply. Examples of crawlers are Google's Googlebot and Bing's Bingbot. The wildcard character * indicates that the directives that follow are valid for all User Agents.
#Google
User-agent: Googlebot
#Bing
User-agent: Bingbot
#All User-agents
User-agent: *
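To illustrate how grouping works, here is a minimal sketch with purely illustrative directory names: the first group of rules applies only to Googlebot, while the second applies to every other crawler.
#Rules for Googlebot only
User-agent: Googlebot
Disallow: /test/
#Rules for all other crawlers
User-agent: *
Disallow: /private/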
Disallow
The Disallow directive is used to indicate pages, directories or files that you want to exclude from being crawled by search engine crawlers. However, the rule only indirectly impacts indexing: a disallowed page can still be indexed and appear in search results if, for example, it is linked from other pages.
#Block access to the entire site
Disallow: /
#Block access to a page
Disallow: /readme.txt
#Block access to a directory
Disallow: /landing/
The Disallow directive should not be used to:
remove a resource from the index;
block access to files essential for page rendering (images, CSS and JS files);
put the site into maintenance mode;
prevent access to private resources.
Allow
The Allow directive explicitly grants permission to crawl a certain URL, directory, or file. This behavior is applied by default to all resources on a website, which is why this rule is mostly used to override a specific Disallow directive.
#Access to media not allowed except for file italia.pdf
User-agent: *
Disallow: /media/
Allow: /media/italia.pdf

Sitemap
The Sitemap rule is used to tell the search engine the URL where the XML Sitemap of the website can be retrieved. The URL must be given as an absolute URL.
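For example, with a purely illustrative domain:
Sitemap: https://www.example.com/sitemap.xml
The Sitemap rule is independent of the User-agent groups and can be repeated if the site has more than one XML Sitemap.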