Per-Environment robots.txt
In this article we'll learn how to serve a custom robots.txt file per environment, which will prevent Google from indexing your staging site while still allowing the live site to be indexed.
Recently Andrew Weaver wrote about Preventing Google from Indexing Staging Sites. Go read it and then come back. Seriously.
The key takeaway from that article is being able to stop Google from indexing your staging site while allowing indexing of the live site. Andrew goes into detail about how to do that using his custom multi-environment setup. I, however, prefer the multi-environment setup from the Craft documentation.
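For reference, that setup keys your craft/config/general.php off the domain, with a '*' entry for settings shared across every environment. Here's a minimal sketch (the live domain and the shared setting are placeholders for your own):
// craft/config/general.php
return array(
    // Settings shared by every environment
    '*' => array(
        'omitScriptNameInUrls' => true,
    ),
    // Local
    'demo.dev' => array(
        'devMode' => true,
    ),
    // Live (placeholder domain)
    'demo.com' => array(
        'devMode' => false,
    ),
);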
All you need to do is add one line to each environment in your general.php file. Inside the nested environmentVariables array I've added an environment key. Add it to each environment and set its value to 'local', 'staging', or 'live' as needed. The local environment looks like this:
// Local environment only
'demo.dev' => array(
    'devMode' => true,
    'siteUrl' => array(
        'en' => 'http://demo.dev'
    ),
    'environmentVariables' => array(
        'environment' => 'local',
    )
),
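The staging and live entries follow the same pattern; only the domain, siteUrl, and environment value change. A rough sketch, using placeholder domains:
// Staging only (placeholder domain)
'staging.demo.com' => array(
    'devMode' => false,
    'siteUrl' => array(
        'en' => 'http://staging.demo.com'
    ),
    'environmentVariables' => array(
        'environment' => 'staging',
    )
),
// Live only (placeholder domain)
'demo.com' => array(
    'devMode' => false,
    'siteUrl' => array(
        'en' => 'http://demo.com'
    ),
    'environmentVariables' => array(
        'environment' => 'live',
    )
),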
Then, in SEOmatic's template meta page, update your robots.txt to use the following Twig code and you'll be set:
# robots.txt for {{ siteUrl }}
Sitemap: {{ siteUrl }}sitemap.xml
{% switch craft.config.environmentVariables.environment %}
{% case "live" %}
# Live - don't allow web crawlers to index Craft
User-agent: *
Disallow: /craft/
{% case "staging" %}
# Staging - disallow all
User-agent: *
Disallow: /
{% default %}
# Default - don't allow web crawlers to index Craft
User-agent: *
Disallow: /craft/
{% endswitch %}
As per Andrew's article, you can check that it's working correctly by visiting /robots.txt in each environment.
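For example, assuming the staging site lives at a placeholder URL like http://staging.demo.com/, the rendered robots.txt on staging should look roughly like this:
# robots.txt for http://staging.demo.com/
Sitemap: http://staging.demo.com/sitemap.xml

# Staging - disallow all
User-agent: *
Disallow: /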
Important: don't forget to remove any robots.txt file from your public directory, as a static file there will override the SEOmatic settings.