You can control which files web crawlers are permitted to access on your website with a robots.txt file.
The robots.txt file lives at the root of your website. For example, if your site is www.imranonline.net, the robots.txt file can be found at https://www.imranonline.net/robots.txt. It is a plain text file that follows the Robots Exclusion Standard and consists of one or more rules. Each rule grants or denies a specific web crawler access to particular file paths on the domain or subdomain where the robots.txt file is hosted. Unless you specify otherwise in your robots.txt file, all files are implicitly allowed for crawling.
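If you want to check programmatically how a crawler would interpret a robots.txt file, Python's standard-library urllib.robotparser module implements the Robots Exclusion Standard. The sketch below is a minimal example; the domain matches the one above, and /private/page.html is a hypothetical path used only for illustration.

from urllib import robotparser

# Point the parser at the robots.txt file, which always sits at the site root.
rp = robotparser.RobotFileParser()
rp.set_url("https://www.imranonline.net/robots.txt")
rp.read()  # fetch the file over HTTP and parse it

# Ask whether a given user agent may crawl a given URL.
# /private/page.html is a hypothetical path used only for illustration.
if rp.can_fetch("Googlebot", "https://www.imranonline.net/private/page.html"):
    print("Googlebot may crawl this URL")
else:
    print("Googlebot is blocked from this URL")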
Here is a simple robots.txt file with two rules:
User-agent: Googlebot
Disallow: /nogooglebot/

User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml
Here’s what that robots.txt file means:
- The user agent named Googlebot is not allowed to crawl any URL that starts with https://example.com/nogooglebot/.
- All other user agents are allowed to crawl the entire site. This rule could have been omitted and the result would be the same; the default behavior is that user agents are allowed to crawl the entire site.
- The site's sitemap file is located at https://www.example.com/sitemap.xml.
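You can confirm this reading without fetching anything over the network: urllib.robotparser's parse() method accepts the file's lines directly. The following sketch evaluates the example rules above; the URLs passed to can_fetch() are illustrative.

from urllib import robotparser

# The example file from above, supplied as a list of lines.
rules = [
    "User-agent: Googlebot",
    "Disallow: /nogooglebot/",
    "",
    "User-agent: *",
    "Allow: /",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Googlebot is blocked from paths under /nogooglebot/ ...
print(rp.can_fetch("Googlebot", "https://example.com/nogooglebot/page.html"))  # False
# ... but may crawl everything else, like any other user agent.
print(rp.can_fetch("Googlebot", "https://example.com/"))                       # True
print(rp.can_fetch("SomeOtherBot", "https://example.com/any/other/path"))      # True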
Resource: https://developers.google.com/search/docs/crawling-indexing/robots/create-robots-txt
Simple Robots.txt File
User-agent: *
Disallow: /index.php
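This file tells every crawler (User-agent: *) not to fetch any URL whose path starts with /index.php; everything else on the site remains implicitly allowed. Note that Disallow values match by prefix, so under Google's matching rules /index.php also blocks URLs such as /index.php?page=home.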