# robots.txt

A [robots.txt file](https://developers.google.com/search/docs/advanced/robots/intro) tells search engine crawlers which pages or files they can or can't request from your site. The `robots.txt` file is a web standard file that most [good bots](https://www.cloudflare.com/en-ca/learning/bots/how-to-manage-good-bots/#:~:text=By%20maintaining%20a%20list%20of,blocklist%20of%20known%20bad%20bots) consult before requesting anything from a domain. Web scraping frameworks like Scrapy respect `robots.txt` by default, but you can change a setting to ignore it.
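
For example, Scrapy controls this behaviour through its `ROBOTSTXT_OBEY` setting. A minimal sketch, assuming a standard project layout with a `settings.py`:

```python
# settings.py -- project-wide Scrapy settings

# Projects generated by `scrapy startproject` set this to True, so the
# crawler downloads and honours robots.txt before requesting pages.
ROBOTSTXT_OBEY = True

# Set it to False only if you deliberately choose to ignore robots.txt.
# ROBOTSTXT_OBEY = False
```

Individual spiders can also override this via the `custom_settings` class attribute if only part of a project should behave differently.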