We can write a robots.txt file to keep robots and crawlers from crawling all over our site.
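For reference, here is a minimal sketch of what such a file might look like (the Disallow paths are just an illustration); a well-behaved crawler fetches /robots.txt first and skips anything it is told to stay out of:

User-agent: Wget
Disallow: /

User-agent: *
Disallow: /private/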
wget honors robots.txt by default, but there is still a way to pretend we are not a robot:
wget -e robots=off [url]
The -e option lets you supply, on the command line, any setting that isn't written in your wgetrc:
--execute command
Execute command as if it were a part of .wgetrc. A command thus invoked will be executed after the commands in .wgetrc, thus taking precedence over them. If you need to specify more than one wgetrc command, use multiple instances of -e.
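Following the manual's note about multiple instances, here is a sketch that stacks robots=off with a spoofed user agent (user_agent is a real wgetrc command; the UA string is just a placeholder to look less like a bot):

wget -e robots=off -e "user_agent=Mozilla/5.0 (X11; Linux x86_64)" [url]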
Reference
http://www.gnu.org/software/wget/manual/html_node/Robot-Exclusion.html