robots.txt
A robots.txt file tells search engine crawlers which pages or files they may or may not request from your site. It is a web-standard file that most well-behaved bots fetch and honor before requesting anything else from a domain. You can check a site's rules programmatically, as in the sketch below.
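As an illustration, here is a minimal sketch using Python's standard-library urllib.robotparser to check whether a URL may be fetched. The domain and the "MyBot" user-agent string are placeholder assumptions, not part of any real policy.

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse the site's robots.txt (python.org is just an example domain).
parser = RobotFileParser()
parser.set_url("https://www.python.org/robots.txt")
parser.read()

# Ask whether a given user agent is allowed to request a given URL.
allowed = parser.can_fetch("MyBot", "https://www.python.org/about/")
print("Allowed:", allowed)
```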
Web scraper frameworks like Scrapy respect robots.txt by default, but you can change that setting to ignore it, as shown in the sketch below.
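The behavior is controlled by Scrapy's ROBOTSTXT_OBEY setting, which projects generated with `scrapy startproject` enable in settings.py. The spider name below is hypothetical; this is a sketch of where the flag lives, not a complete project.

```python
# settings.py of a Scrapy project

# Generated Scrapy projects honor robots.txt because of this flag.
# Setting it to False tells Scrapy to skip robots.txt entirely.
ROBOTSTXT_OBEY = True
```

A spider can also override the project-wide value for itself via its custom_settings dictionary:

```python
import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"  # hypothetical spider name
    # Ignore robots.txt for this spider only.
    custom_settings = {"ROBOTSTXT_OBEY": False}
```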