Thursday, June 27, 2013
git archive to tarballs
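A handy little tool I came across today: git can pack the whole repo tree at a given commit into a tarball you can carry off. The archive holds the tracked files only, not the .git history.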
git archive --format=tar --prefix=proj-1.2.3/ HEAD > proj-1.2.3.tar
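As a rough sketch of a compressed variant: the script below builds a throwaway repository in a temp directory (the `demo` repo, the `README` file and the committer identity are all made up for illustration), then archives it as a gzip-compressed tarball. It assumes a reasonably recent git, where `tar.gz` is accepted as a format name, and it uses `-o` instead of a shell redirect.

```shell
set -eu

# Throwaway repository so the example does not touch a real project.
workdir=$(mktemp -d)
cd "$workdir"
git init -q demo
cd demo
echo "hello" > README
git add README
git -c user.name="Example" -c user.email="example@example.com" commit -q -m "initial commit"

# Same idea as the command above, but gzip-compressed and written with -o.
# The trailing slash on --prefix matters: without it the prefix is glued
# onto each file name instead of forming a top-level directory.
git archive --format=tar.gz --prefix=proj-1.2.3/ -o "$workdir/proj-1.2.3.tar.gz" HEAD

# List the archive contents to check that the prefix was applied.
tar -tzf "$workdir/proj-1.2.3.tar.gz"
```

Every path in the listing should sit under `proj-1.2.3/`, so unpacking the tarball creates a single versioned directory.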
Reference
https://www.kernel.org/pub/software/scm/git/docs/git-archive.html
Wednesday, June 26, 2013
let wget ignore robots.txt
We can write a robots.txt to keep robots and crawlers from crawling all over our site.
wget follows robots.txt faithfully, but there is a way to make it stop checking the file:
wget -e robots=off [url]
-e lets you pass a wgetrc command on the command line, for settings that are not written in your wgetrc:
--execute command
Execute command as if it were a part of .wgetrc. A command thus invoked will be executed after the commands in .wgetrc, thus taking precedence over them. If you need to specify more than one wgetrc command, use multiple instances of -e.
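Since -e simply runs a wgetrc command, the same setting could presumably be made permanent in the startup file instead. A minimal sketch, assuming a per-user `~/.wgetrc` (the file location is an assumption here; the `robots` command is the one used with -e above):

```
# ~/.wgetrc (assumed location of the per-user startup file)
# Equivalent to passing -e robots=off on every invocation.
robots = off
```

Putting it in the file affects every wget run for that user, so the per-command -e form is the safer choice when only one download should skip robots.txt.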
Reference
http://www.gnu.org/software/wget/manual/html_node/Robot-Exclusion.html
Thursday, June 6, 2013