{"id":61165,"date":"2021-07-24T17:05:24","date_gmt":"2021-07-25T00:05:24","guid":{"rendered":"https:\/\/sacramentowebdesigngroup.com\/?p=61165"},"modified":"2026-02-25T22:09:33","modified_gmt":"2026-02-26T06:09:33","slug":"understanding-and-configuring-the-wordpress-robots-txt-file","status":"publish","type":"post","link":"https:\/\/dev.sacramentowebdesigngroup.com\/sacweb\/understanding-and-configuring-the-wordpress-robots-txt-file\/","title":{"rendered":"Understanding and Configuring the WordPress robots.txt File"},"content":{"rendered":"<div class=\"content-block-wysi\" data-content-block-type=\"Wysi\">\n<p>One of my previous tutorials covered the basics of <a href=\"https:\/\/webdesign.tutsplus.com\/tutorials\/understanding-and-configuring-the-htaccess-file-in-wordpress--cms-37360\" target=\"_blank\" rel=\"noopener\">understanding and configuring the .htaccess file in WordPress<\/a>. The <strong>robots.txt<\/strong> file is a special file just like the <strong>.htaccess<\/strong> file. However, it serves a very different purpose. As you might have guessed from the name, the <strong>robots.txt<\/strong> file is meant for bots, such as the crawlers used by search engines like Google and Bing.<\/p>\n<p>This tutorial will help you understand the basics of the <strong>robots.txt<\/strong> file and how to configure it for WordPress. Let&#8217;s get started.<\/p>\n<h2>Purpose of the robots.txt File<\/h2>\n<p>As I mentioned earlier, the <strong>robots.txt<\/strong> file is meant for scraping bots. These are mainly search engine crawlers but can include other bots as well.<\/p>\n<p>You might already know that search engines find all the pages and content on your website by crawling it\u2014moving from one page to another through links either on the page itself or in the sitemap. This allows them to collect data from your website.<\/p>\n<p>However, there could be some pages on a website that you don&#8217;t want the bots to crawl. 
The <strong>robots.txt<\/strong> file gives you the option to specify which pages they are allowed to visit and which pages they shouldn&#8217;t crawl.<\/p>\n<p>Please note that the instructions you provide in the <strong>robots.txt<\/strong> file are not binding. This means that, although reputable bots like the Google search crawler will respect the limitations in <strong>robots.txt<\/strong>, some bots will probably ignore whatever you put in there and crawl your website anyway. Others might even use it to find links that you specifically don&#8217;t want to be crawled and then crawl them.<\/p>\n<p>Basically, it is not advisable to rely on this file to prevent malicious bots from scraping your website. It is more like a guide that good bots follow.<\/p>\n<h2>Where Should I Put My robots.txt File?<\/h2>\n<p>The <strong>robots.txt<\/strong> file is supposed to be in the root directory of your website. This is different from <strong>.htaccess<\/strong> files, which can be placed in different directories. The <strong>robots.txt<\/strong> file only works if it is in the root directory and is exactly named <strong>robots.txt<\/strong>.<\/p>\n<p>You can create this file manually and place it inside your web root directory if it doesn&#8217;t already exist.<\/p>\n<h2>Understanding the Contents of the <strong>robots.txt<\/strong> File<\/h2>\n<p>The <strong>robots.txt<\/strong> file tells different bots what they should and should not crawl on your website. It uses a handful of commands to do that. Three such commands that you will use very often are <code>User-Agent<\/code>, <code>Allow<\/code> and <code>Disallow<\/code>.<\/p>\n<p>The <code>User-Agent<\/code> command identifies the bots to which you want to apply the current set of <code>Allow<\/code> and <code>Disallow<\/code> commands. You can set it to <code>*<\/code> to target all bots. You can also narrow down the list of bots by specifying values like <code>Googlebot<\/code> and <code>Bingbot<\/code>. 
These are some of the most common crawler bots for the Google and Bing search engines respectively. There are many others out there from different companies which you might want to target specifically.<\/p>\n<p>The <code>Allow<\/code> command lets you specify a webpage or directory on your website that the bots are free to access. Keep in mind that any paths you specify need to be relative to the root directory.<\/p>\n<p>The <code>Disallow<\/code> command, on the other hand, tells the bots that they shouldn&#8217;t crawl the listed directory or webpage.<\/p>\n<p>You can only provide one directory or webpage for each <code>Allow<\/code> or <code>Disallow<\/code> command. However, you can use multiple <code>Allow<\/code> and <code>Disallow<\/code> commands within the same set. Here is an example:<\/p>\n<pre class=\"brush: plain noskimlinks noskimwords\">User-Agent: *\nDisallow: \/uploads\/\nDisallow: \/includes\/\nAllow: \/uploads\/images\/\nDisallow: \/login.php<\/pre>\n<p>In the above example, we told the bots that they shouldn&#8217;t crawl the contents of the <strong>uploads<\/strong> directory. However, we used the <code>Allow<\/code> command to tell them to still crawl the <strong>images<\/strong> sub-directory found inside <strong>uploads<\/strong>.<\/p>\n<p>Any bot will assume that it is allowed to crawl all pages that you have not explicitly disallowed. This means that there is no need for you to allow the crawling of directories one at a time.<\/p>\n<p>You should also keep in mind that the values you provide are case-sensitive. The bots will treat <code>uploads<\/code> and <code>UPLOADS<\/code> as referring to different directories.<\/p>\n<p>The <strong>robots.txt<\/strong> file can also contain links to one or more sitemaps on your website. 
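A sitemap is linked with the <code>Sitemap<\/code> directive; the URL below is only a placeholder for your own sitemap address:<\/p>\n<pre class=\"brush: plain noskimlinks noskimwords\">Sitemap: https:\/\/example.com\/sitemap.xml<\/pre>\n<p>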
This makes it easier for bots to find all the posts and web pages on your website that you want them to crawl.<\/p>\n<h2>Configuring the robots.txt File in WordPress<\/h2>\n<p>It is important to be careful when you are creating a <strong>robots.txt<\/strong> file to go along with your WordPress website. This is because small mistakes or oversights can prevent search engines from crawling content on your website. All the work that you put into SEO will be in vain if the search engines can&#8217;t even crawl it.<\/p>\n<p>A good rule of thumb is to disallow as little as possible. One approach is to just put the following in your <strong>robots.txt<\/strong> file. This tells all the bots that they are free to crawl all content on the website.<\/p>\n<pre class=\"brush: plain noskimlinks noskimwords\">User-agent: *\nDisallow:<\/pre>\n<p>Another option is to use the following version, which tells them to avoid crawling the <strong>wp-admin<\/strong> directory but still crawl all the other content on the website. We also provide a link to the sitemap of the website in this example, but that is entirely optional.<\/p>\n<pre class=\"brush: plain noskimlinks noskimwords\">User-agent: *\nDisallow: \/wp-admin\/\nAllow: \/wp-admin\/admin-ajax.php\n\nSitemap: https:\/\/your-website.com\/sitemap.xml<\/pre>\n<p>It is important not to be too aggressive with the <code>Disallow<\/code> command and block access to CSS or JavaScript files that might affect the appearance of the content on the front end. Nowadays, search engines also look at many other aspects of a webpage, like its appearance or the user-friendliness of the layout, before they determine how the content should be ranked. Blocking them from accessing CSS or JavaScript files will result in issues sooner or later.<\/p>\n<h2>When You Shouldn&#8217;t Be Using robots.txt<\/h2>\n<p>As I have mentioned before, the <strong>robots.txt<\/strong> file is not used to enforce any rules. 
The rules you specify in the file only provide guidance to good and obedient bots. This means that you should not be using this file to restrict access to some content on your website. There are two common situations that you might face if you used a <strong>robots.txt<\/strong> file for this purpose.<\/p>\n<p>Even though malicious bots won&#8217;t follow the guidelines provided in <strong>robots.txt<\/strong>, they could still use it to figure out exactly what you don&#8217;t want them to crawl. This could inflict more damage if you were using this file as a security measure.<\/p>\n<p>This file isn&#8217;t helpful in preventing your web pages from appearing in search results either. The webpage you are trying to hide will still show up in search results, but its description will simply say <strong>No information is available for this page<\/strong>. This can happen when you block Google from reading a certain page with the <strong>robots.txt<\/strong> file, but that page is still being linked to from somewhere else.<\/p>\n<p>If you want to block a page from appearing in search results, <a href=\"https:\/\/developers.google.com\/search\/docs\/advanced\/crawling\/block-indexing\" target=\"_self\" rel=\"noopener\">Google recommends using the <code>noindex<\/code> option in the HTTP response header or adding a <code>noindex<\/code> meta tag to the HTML file<\/a>.<\/p>\n<p>There&#8217;s an easy way to do this if you are using WordPress. 
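(For reference, the meta tag itself is just one line in the page&#8217;s <code>head<\/code> section.)<\/p>\n<pre class=\"brush: plain noskimlinks noskimwords\">&lt;meta name=\"robots\" content=\"noindex\"&gt;<\/pre>\n<p>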
Just go to <strong>Settings &gt; Reading<\/strong> in the WordPress admin dashboard and then check the <strong>Search engine visibility<\/strong> option, labelled &#8220;Discourage search engines from indexing this site&#8221;. Note that this setting applies to your entire site rather than a single page.<\/p>\n<p>Removing a webpage from search results requires you to take some other actions, like removing the page itself from the website, password protecting it, or using the <code>noindex<\/code> option for bots.<\/p>\n<p>Similar to the <strong>robots.txt<\/strong> file, only well-behaved and trustworthy bots will respect the <code>noindex<\/code> option, so if you want to secure sensitive information on your site, you&#8217;ll need to do it another way. For example, you could password-protect that page, or remove it from your website entirely.<\/p>\n<h2>Final Thoughts<\/h2>\n<p>Our aim with this post was to introduce you to the basics of the <strong>robots.txt<\/strong> file so that you can get an idea of what this file does. After that, we discussed the optimum configuration of <strong>robots.txt<\/strong> with respect to WordPress. We also saw how to set the <code>noindex<\/code> option using the WordPress admin.<\/p>\n<p>In the end, I would like to repeat just one more time that you should not be using <strong>robots.txt<\/strong> to block access to sensitive content on the website. This will usually have the opposite effect with malicious bots!<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>One of my previous tutorials covered the basics of understanding and configuring the .htaccess file in WordPress. The robots.txt file is a special file just like the .htaccess file. However, it serves a very different purpose. As you might have guessed from the name, the robots.txt file is meant for bots. 
For example, bots from [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":61167,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_breakdance_hide_in_design_set":false,"_breakdance_tags":"","footnotes":""},"categories":[16,32,18,19,21],"tags":[34,33,17,20],"class_list":["post-61165","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-content-curation","category-google-for-business","category-reputation-management","category-search-engine-optimization","category-wordpress-design","tag-css","tag-google","tag-html","tag-wordpress"],"_links":{"self":[{"href":"https:\/\/dev.sacramentowebdesigngroup.com\/sacweb\/wp-json\/wp\/v2\/posts\/61165","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dev.sacramentowebdesigngroup.com\/sacweb\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dev.sacramentowebdesigngroup.com\/sacweb\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dev.sacramentowebdesigngroup.com\/sacweb\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dev.sacramentowebdesigngroup.com\/sacweb\/wp-json\/wp\/v2\/comments?post=61165"}],"version-history":[{"count":1,"href":"https:\/\/dev.sacramentowebdesigngroup.com\/sacweb\/wp-json\/wp\/v2\/posts\/61165\/revisions"}],"predecessor-version":[{"id":72225,"href":"https:\/\/dev.sacramentowebdesigngroup.com\/sacweb\/wp-json\/wp\/v2\/posts\/61165\/revisions\/72225"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dev.sacramentowebdesigngroup.com\/sacweb\/wp-json\/"}],"wp:attachment":[{"href":"https:\/\/dev.sacramentowebdesigngroup.com\/sacweb\/wp-json\/wp\/v2\/media?parent=61165"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dev.sacramentowebdesigngroup.com\/sacweb\/wp-json\/wp\/v2\/categories?post=61165"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dev.sacramentowebdesigngroup.com\/s
acweb\/wp-json\/wp\/v2\/tags?post=61165"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}