Thursday, June 27, 2013

git archive to tarballs


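A handy trick I came across today: git can pack up a whole repo into a tarball by itself. The archive holds the files as of the given commit (HEAD here), not the .git history.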

git archive --format=tar --prefix=proj-1.2.3/ HEAD > proj-1.2.3.tar
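The command above writes an uncompressed tar. Here is a minimal sketch of a gzip-compressed variant, run against a throwaway repository so it can be tried anywhere. The repo name `demo`, the `README` file, and the commit identity are placeholders I made up for illustration.

```shell
#!/bin/sh
set -eu

# Build a throwaway repository so the example is self-contained.
# (The names "demo" and "README" are hypothetical.)
workdir=$(mktemp -d)
cd "$workdir"
git init -q demo
cd demo
echo "hello" > README
git add README
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "initial commit"

# Same archive command as above, piped through gzip.
git archive --format=tar --prefix=proj-1.2.3/ HEAD | gzip > ../proj-1.2.3.tar.gz

# List the archive to confirm every path sits under the prefix.
listing=$(tar -tzf ../proj-1.2.3.tar.gz)
echo "$listing"
```

The trailing slash in `--prefix=proj-1.2.3/` matters: without it the prefix is glued onto each file name instead of becoming a top-level directory.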

Reference
https://www.kernel.org/pub/software/scm/git/docs/git-archive.html

Wednesday, June 26, 2013

let wget ignore robots.txt

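We can write a robots.txt to keep robots and crawlers from crawling all over our site.

wget follows robots.txt faithfully when it downloads recursively, but it can be told to ignore the file: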

wget -e robots=off [url]

-e runs a command as if it were a line in .wgetrc, so you can apply settings that your .wgetrc does not contain.


--execute command
           Execute command as if it were a part of .wgetrc.  A command thus invoked will be executed after the commands in .wgetrc, thus taking precedence over them.  If you need to specify more than one wgetrc command, use multiple instances of -e.
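
As the man page excerpt says, several wgetrc commands need one -e each. Here is a small sketch that builds such a command and prints it for review instead of running it. The URL is a placeholder, and pairing robots=off with wait=1 is my own choice for illustration, not something from the manual.

```shell
#!/bin/sh
# Hypothetical target; replace it with a site you are allowed to crawl.
url="https://example.com/docs/"

# One -e per wgetrc command:
#   robots=off : do not consult robots.txt
#   wait=1     : pause one second between retrievals
# The equivalent lines in ~/.wgetrc would be "robots = off" and "wait = 1".
cmd="wget -r -e robots=off -e wait=1 $url"

# Print the command rather than executing it, so nothing is fetched here.
echo "$cmd"
```

Adding a wait is a courtesy to the server, since ignoring robots.txt removes the limits the site owner asked for.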


Reference

http://www.gnu.org/software/wget/manual/html_node/Robot-Exclusion.html

Thursday, June 6, 2013