Robots.txt

A robots.txt file is a text file placed in the root directory of a website that gives instructions to search engine crawlers about which pages or sections should be crawled or excluded. It is part of the Robots Exclusion Protocol and is one of the first files crawlers check before scanning a site.
This file helps website owners manage how search engines interact with their content. For example, a company may use robots.txt to block crawlers from accessing admin pages, staging environments, or duplicate content, while still allowing public-facing content to be discovered and ranked.
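A minimal file covering that scenario might look like the sketch below; the /admin/ and /staging/ paths are hypothetical placeholders rather than directives taken from any real site.

```
# Block all crawlers from private or duplicate sections (paths are hypothetical examples)
User-agent: *
Disallow: /admin/
Disallow: /staging/

# Anything not disallowed above remains crawlable by default
```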
Advanced
Robots.txt works by defining rules for user-agents (search engine bots). Directives such as Disallow, Allow, Crawl-delay, and Sitemap tell crawlers what they can or cannot access. The file name and the paths in its rules are case-sensitive, and the file must sit at the root of the domain (for example, rubixstudios.com.au/robots.txt) to be recognized.
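The sketch below shows those directives together; the paths, delay value, and sitemap URL are illustrative assumptions, and support for Crawl-delay varies by crawler (Googlebot, for example, ignores it).

```
User-agent: *
Disallow: /search/      # keep internal search result pages out of the crawl
Allow: /search/help     # re-allow one path beneath a disallowed section
Crawl-delay: 10         # seconds between requests; honored by some bots, ignored by Googlebot

Sitemap: https://rubixstudios.com.au/sitemap.xml
```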
Advanced usage involves creating separate rules for different bots, such as Googlebot, Bingbot, or ad crawlers. While robots.txt can restrict crawling, it cannot guarantee content won’t appear in search results; blocked URLs may still show if they are linked externally. For complete control, it is often combined with meta robots tags or noindex directives. Monitoring crawl logs and Search Console reports ensures the rules are functioning as intended.
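Per-bot rules could be laid out as in the sketch below; Googlebot and Bingbot are real user-agent tokens, but the paths and groupings are illustrative assumptions.

```
# Rules applied only to Googlebot
User-agent: Googlebot
Disallow: /internal-reports/

# Rules applied only to Bingbot
User-agent: Bingbot
Disallow: /internal-reports/
Crawl-delay: 5

# Default rules for every other crawler
User-agent: *
Disallow: /internal-reports/
Disallow: /tmp/
```

Note that to keep a page out of search results entirely, the page itself would typically carry a <meta name="robots" content="noindex"> tag and must not be blocked in robots.txt, since a crawler has to fetch the page to see that tag.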
Example
An online store mistakenly blocks /products/ in its robots.txt file, preventing all product pages from being crawled and indexed. After correcting the directive to allow Googlebot access, the pages begin appearing in search results again, restoring lost traffic.
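One way to catch a mistake like this before it costs traffic is to test the file programmatically. The sketch below uses Python's standard urllib.robotparser; the file contents and URLs are hypothetical stand-ins for the store in the example, not a real deployment.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents reproducing the mistake in the example
broken_rules = """
User-agent: *
Disallow: /products/
"""

parser = RobotFileParser()
parser.parse(broken_rules.splitlines())

# can_fetch() reports whether a given user-agent may crawl a URL under these rules
url = "https://example-store.com/products/blue-widget"
if not parser.can_fetch("Googlebot", url):
    print(f"Blocked: {url} cannot be crawled by Googlebot")  # fires with the broken rules
else:
    print(f"Allowed: {url} can be crawled by Googlebot")
```

In practice, the live file could be loaded with set_url("https://example-store.com/robots.txt") followed by read() instead of parse(), so the check runs against what crawlers actually see.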