zzzhc's Blog

stay curious

Ack - Better Grep

Ack is a grep for programmers. It uses Perl regular expressions rather than the POSIX/GNU subset.

why ack?

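  • Uses Perl regular expressions, so you can forget grep's unfriendly pattern syntax
  • Fast, and searches only source code files by default
  • Automatically ignores directories such as .svn, .git, and CVS, and recurses into subdirectories by default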
# with grep
$ grep pattern $(find . -type f | grep -v '\.svn')
# or
$ grep -R --exclude-dir=.svn pattern .
# with ack
$ ack pattern
  • Can restrict the search to specific file types
# search only Ruby code
$ ack --ruby pattern

install

# mac
$ brew install ack
# ubuntu (the command is installed as ack-grep)
$ sudo apt-get install ack-grep

integrate into vim

  • install ack.vim
  • set grepprg=ack in vimrc

customize ack

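Ack ships with many filetype => extensions mappings, but newer languages are not necessarily covered. Fortunately, ack provides --type-add TYPE=.EXTENSION[,.EXT2[,...]] and --type-set TYPE=.EXTENSION[,.EXT2[,...]] to extend them. Frequently used settings can go into ~/.ackrc. My .ackrc: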

$ cat ~/.ackrc
--type-add
ruby=.haml,.ru
--type-add
css=.scss,.sass,.less
--type-add
js=.coffee

Youdao Dictionary Chrome Extension

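The Youdao Dictionary Chrome extension logs the word under the mouse pointer to the console, which is annoying while debugging. I looked at the code: it calls console.log directly, and the calls were not commented out for release. I went to ~/Library/Application Support/Google/Chrome/Default/Extensions/nbndkplefmmhmcmfjanjaakhhkiegogd/1.0_0 and commented out every console.log in content.js and background.html. Quiet at last.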

Skimming the extension code, I found two interesting things:

  • The package even includes the .svn directory
$ svn info
Path: .
URL: https://dev.corp.youdao.com/svn/outfox/products/desktop/incubator/mac/GetWordExtension/Chrome/extension
Repository Root: https://dev.corp.youdao.com/svn/outfox
Repository UUID: 36a6777f-fe3c-0410-890b-904d6044f29d
Revision: 285097
Node Kind: directory
Schedule: normal
Last Changed Author: huangdx
Last Changed Rev: 277738
Last Changed Date: 2011-09-13 14:00:17 +0800 (二, 13  9 2011)
  • Word lookup calls an HTTP interface provided by the local Youdao Dictionary app
//in background.html
     function SendResult(word, pos, type) {
         var s = new XMLHttpRequest;
         s.open("GET", "http://localhost:32445/getword?word=" + word + "&pos=" + pos + "&type=" + type, true);
         //console.log('sending...')
         s.send()
     };

I tried it with curl: the request goes through, but the response is empty.

$ curl -vv 'http://localhost:32445/getword?word=for%20suppliers&pos=8&type=0'
* About to connect() to localhost port 32445 (#0)
*   Trying ::1... Connection refused
*   Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 32445 (#0)
> GET /getword?word=for%20suppliers&pos=8&type=0 HTTP/1.1
> User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
> Host: localhost:32445
> Accept: */*
>
* Empty reply from server
* Connection #0 to host localhost left intact
curl: (52) Empty reply from server
* Closing connection #0

If any one of the three parameters word, pos, and type is missing, Youdao Dictionary crashes.
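
Since all three parameters have to be present, it can help to build the trigger URL in one place. Here is a minimal Python sketch, assuming the endpoint behaves as in the curl call above. The name getword_url is made up for this example. It also percent-encodes the word, which the string concatenation in the background.html snippet does not appear to do.

```python
from urllib.parse import quote, urlencode


def getword_url(word, pos, type_, port=32445):
    """Build the local trigger URL; word, pos and type are always included."""
    # quote_via=quote encodes spaces as %20, matching the curl example.
    query = urlencode({"word": word, "pos": pos, "type": type_}, quote_via=quote)
    return f"http://localhost:{port}/getword?{query}"


print(getword_url("for suppliers", 8, 0))
# http://localhost:32445/getword?word=for%20suppliers&pos=8&type=0
```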

This HTTP request is only a trigger; Youdao Dictionary then sends a request to dict.youdao.com:

$ curl -vv 'http://dict.youdao.com/fsearch?q=for%20suppliers&pos=8&keyfrom=mac.scrtrans.0&id=E33EC7736AFDCABB184051CD3757CA73&vendor=cidian.youdao.com&client=macdict'
* About to connect() to dict.youdao.com port 80 (#0)
*   Trying 61.135.218.32... connected
* Connected to dict.youdao.com (61.135.218.32) port 80 (#0)
> GET /fsearch?q=for%20suppliers&pos=8&keyfrom=mac.scrtrans.0&id=E33EC7736AFDCABB184051CD3757CA73&vendor=cidian.youdao.com&client=macdict HTTP/1.1
> User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
> Host: dict.youdao.com
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx
< Date: Tue, 13 Dec 2011 07:10:55 GMT
< Content-Type: text/xml; charset=utf-8
< Connection: keep-alive
< Cache-Control: private
< Content-Language: en-US
< Set-Cookie: OUTFOX_SEARCH_USER_ID=-447518408@123.117.56.163; domain=.youdao.com; expires=Thu, 05-Dec-2041 07:10:55 GMT
< Set-Cookie: JSESSIONID=abcdjNBcCWPPK4igOq2qt; domain=youdao.com; path=/
< Vary: Accept-Encoding
< Content-Length: 1477
<
<?xml version="1.0" encoding="UTF-8"?>

<yodaodict>
  <return-phrase><![CDATA[suppliers]]></return-phrase>

                            <phonetic-symbol>sə'plaiəz</phonetic-symbol>
                  <dictcn-speach>suppliers</dictcn-speach>
                          <custom-translation>
        <type>ec</type>
                  <translation><content><![CDATA[n. 供应商(supplier的复数)]]></content></translation>
                </custom-translation>

              <yodao-web-dict>
                    <web-translation>
                <key><![CDATA[Suppliers]]></key>
                          <trans><value><![CDATA[供应商]]></value></trans>
                  <trans><value><![CDATA[供货商]]></value></trans>
                  <trans><value><![CDATA[数据库]]></value></trans>
                </web-translation>
              <web-translation>
                <key><![CDATA[Overview Suppliers]]></key>
                          <trans><value><![CDATA[概览]]></value></trans>
                  <trans><value><![CDATA[供应商]]></value></trans>
                </web-translation>
              <web-translation>
                <key><![CDATA[select suppliers]]></key>
                          <trans><value><![CDATA[挑选供应商]]></value></trans>
                </web-translation>
            </yodao-web-dict>

                               <recommend><![CDATA[supplier]]></recommend>
                    <sexp>0</sexp>
</yodaodict>
* Connection #0 to host dict.youdao.com left intact
* Closing connection #0

Once the result comes back, Youdao Dictionary shows a popup window.
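
For anyone who wants to read that XML outside the app, here is a rough Python sketch that pulls out the phrase, the dictionary translation, and the web translations. The element names come from the response above; SAMPLE is a trimmed copy of it, and parse_fsearch is a name invented for this example, not anything Youdao provides.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response shown above (XML declaration omitted).
SAMPLE = """<yodaodict>
  <return-phrase><![CDATA[suppliers]]></return-phrase>
  <custom-translation>
    <type>ec</type>
    <translation><content><![CDATA[n. 供应商(supplier的复数)]]></content></translation>
  </custom-translation>
  <yodao-web-dict>
    <web-translation>
      <key><![CDATA[Suppliers]]></key>
      <trans><value><![CDATA[供应商]]></value></trans>
      <trans><value><![CDATA[供货商]]></value></trans>
    </web-translation>
  </yodao-web-dict>
</yodaodict>"""


def parse_fsearch(xml_text):
    """Return (phrase, dictionary translations, web translations by key)."""
    root = ET.fromstring(xml_text)
    phrase = root.findtext("return-phrase")
    translations = [
        node.text for node in root.iterfind("custom-translation/translation/content")
    ]
    web = {
        item.findtext("key"): [value.text for value in item.iterfind("trans/value")]
        for item in root.iterfind("yodao-web-dict/web-translation")
    }
    return phrase, translations, web


phrase, translations, web = parse_fsearch(SAMPLE)
print(phrase, translations, web)
```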

This implementation is fairly neat, and it makes the HTTP interface easy for other apps to use. But having port 32445 not even return an HTTP 200 is a bit crude.

Speed Up Rails Asset Pipeline

The asset pipeline is a feature introduced in Rails 3.1. It helps concatenate and compress JS/CSS, and it provides better support for caching assets.

The asset pipeline itself is just a thin wrapper around Sprockets; Rails.application.assets returns Sprockets' own Rack app.

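If application.js pulls in a pile of JS files, pages become slow in development mode, even with active_reload enabled to speed things up. The Network tab in Chrome's developer tools shows every JS request taking more than 100 ms. The cause is that each request passes through all the middleware defined by Rails and the application; skipping that middleware makes things much faster. A small change to config.ru and the world is harmonious again: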

# This file is used by Rack-based servers to start the application.
require ::File.expand_path('../config/environment',  __FILE__)

map '/assets' do
  assets = Rails.application.assets
  run assets
end

map '/' do
  run Rails.application
end

The 7 Habits of Highly Effective Text Editing

Seven habits of effective text editing

Editing a single file

1. Move around quickly

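  • Find other occurrences of the word under the cursor in the current file
    1. * searches forward
    2. # searches backward
  • Search for text with /pattern
  • Use % to jump to the matching end/start of a block; it works even better with the matchit plugin installed
  • Use gd to jump to a variable's definition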

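Three basic steps to learning more efficient editing commands:

  • Notice the actions you repeat while editing and the places where you spend the most time
  • Find a command that gets them done faster
  • Practice until it becomes a habit and you can type the command by instinct, without thinking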

2. Don't type it twice

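  • Use the . command to repeat the last change
  • Use completion; with supertab installed, the Tab key triggers it
  • Record macros: qa starts recording into register a, q stops recording, @a replays the macro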

3. Fix spelling mistakes automatically

  • abbr
  • syntax highlighting

Editing multiple files

4. You rarely work on just one file

  • ctags: Ctrl+] jumps to the definition
  • :grep, :cn

5. Work together with other programs

:!command

6. Text is structured

Automate the compile-and-fix cycle

:make
:set errorformat=xxx

Sharpen the saw

7. Make it a habit

This is the most important point: spending a lot of time finding the right command and then quickly forgetting it is a poor trade.

Crontab Tips

crontab command

# set the editor
$ export EDITOR=vim
# edit the crontab
$ crontab -e [-u user]
# list the crontab contents
$ crontab -l [-u user]
# remove the crontab file
$ crontab -r [-u user]

crontab syntax

* * * * *  command
| | | | |
| | | | |- day of week(0-6) 0 means sunday
| | | |--- month(1-12)
| | |----- day of month(1-31)
| |------- hour(0-23)
|--------- minute(0-59)

A field shown as * can list multiple values separated by commas (for example 5,15,25), or specify a step: */5 in the minute field means run every 5 minutes.
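
To make the list and step notation concrete, here is a small Python sketch that expands a single field into the values it matches. This is only an illustration, not how crond is implemented. The function expand_field and its lo/hi arguments are names invented for this example, and ranges such as 1-5 are left out.

```python
def expand_field(field, lo, hi):
    """Expand one cron field into the sorted list of values it matches.

    Handles *, */step, single numbers, and comma-separated lists.
    """
    values = set()
    for part in field.split(","):
        if part == "*":
            values.update(range(lo, hi + 1))
        elif part.startswith("*/"):
            # A step counts from the lowest legal value of the field.
            values.update(range(lo, hi + 1, int(part[2:])))
        else:
            values.add(int(part))
    return sorted(values)


print(expand_field("5,15,25", 0, 59))  # [5, 15, 25]
print(expand_field("*/5", 0, 59))      # every fifth minute, starting at 0
```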

crontab environment

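A cron job runs in a different environment from a normal user session: ~/.bashrc is not executed, which often causes problems. Setting environment variables in the crontab reduces the impact.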

# PATH, SHELL, and MAILTO are the most commonly used, for example
PATH=/usr/bin:/usr/sbin:/bin:/sbin
SHELL=/bin/bash
MAILTO=xxx@xxx.com

1 * * * * find /var/data/upload/ -mtime 30 -exec rm -- {} +
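
One way to catch environment problems before cron does is to run the command by hand with a similarly bare environment. Below is a Python sketch of that idea. The values in CRON_LIKE_ENV are assumptions (the real defaults differ between cron implementations), and run_like_cron is a made-up helper name.

```python
import subprocess

# Assumed minimal environment; check your cron's documentation for the real defaults.
CRON_LIKE_ENV = {"PATH": "/usr/bin:/bin", "SHELL": "/bin/sh"}


def run_like_cron(command, extra_env=None):
    """Run a shell command with only the cron-like variables set."""
    env = dict(CRON_LIKE_ENV)
    env.update(extra_env or {})
    return subprocess.run(
        ["/bin/sh", "-c", command], env=env, capture_output=True, text=True
    )


result = run_like_cron('echo "$PATH"')
print(result.stdout.strip())  # /usr/bin:/bin
```

A script that works in the terminal but fails here most likely depends on something set in ~/.bashrc.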

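By default cron schedules jobs in the OS time zone, not the user's. The TZ environment variable can be set to the time zone you actually want, though whether TZ affects scheduling itself or only the job's environment depends on the cron implementation (cronie, for instance, uses CRON_TZ for scheduling).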

TZ=UTC
1 * * * * find /var/data/upload/ -mtime 30 -exec rm -- {} +

crontab notification

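If the command produces output (stdout/stderr), crond sends a notification email. The recipient can be set with MAILTO and defaults to the crontab's owner. stdout can usually be ignored, which is why crontabs often contain command >/dev/null. Never discard stderr, though; otherwise you won't know when a cron job hits an error and fails to run properly.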
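
The effect of command >/dev/null can be checked without waiting for cron. The Python sketch below is a rough model: it runs a command with stdout discarded and returns whatever is left on stderr, which is what would end up in the mail. The name cron_mail_body is invented for this example, and real crond behavior (headers, exit status handling) is not modeled.

```python
import subprocess


def cron_mail_body(command):
    """Return the output that survives `command >/dev/null`, i.e. stderr only."""
    result = subprocess.run(
        ["sh", "-c", command],
        stdout=subprocess.DEVNULL,  # same effect as >/dev/null
        stderr=subprocess.PIPE,
        text=True,
    )
    return result.stderr


print(repr(cron_mail_body("echo done")))                    # ''
print(repr(cron_mail_body("echo done; echo failed >&2")))   # 'failed\n'
```

An empty string means no mail; anything else is an error message you would otherwise have missed.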

Bash Tips - 0

  • Set a default value for a variable
: ${BIND_PORT:=9999}
  • Get the PID of the most recent background process
$ sleep 10 &
$ echo $!
  • Absolute path of the current script
SCRIPT_PATH=$(readlink -f "$0")

Rvm Global Gems

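After installing several Ruby versions with rvm, frequently used gems have to be reinstalled under each version, which is tedious. Fortunately rvm already has a solution: edit ~/.rvm/gemsets/global.gems to add gems to the global gemset, such as the indispensable bundler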

bundler

A version can also be specified

bundler -v~>1.0.21

The Fastest Chinese Tokenizer for Lucene 3

BaoBao tokenizer (包包分词器): a fast dictionary-based Chinese tokenizer

source code

features

  • Simple: 1000 LOC
  • Fast: 7M+ chars/second
  • Supports Chinese, English, and numbers
  • Automatically recognizes out-of-vocabulary words
  • Supports OffsetAttribute
  • Supports TypeAttribute
  • Supports PositionIncrementAttribute

usage

Dict dict = new Dict();
dict.addAllSpecialTypes();
BufferedReader dictReader = new BufferedReader(new InputStreamReader(
    new FileInputStream("dict.txt"), "UTF-8"));
dict.load(dictReader);
dictReader.close();
dict.optimize();
DictAnalyzer dictAnalyzer = new DictAnalyzer(dict);

benchmark

ant benchmark

supported features:
                  CharTerm  Offset  PositionIncrement  Term  Type
      IKAnalyzer         Y       Y                  N     Y     N
   MMSegAnalyzer         Y       Y                  N     Y     Y
 PaodingAnalyzer         Y       Y                  N     Y     Y
StandardAnalyzer         Y       Y                  Y     Y     Y
  BaoBaoAnalyzer         Y       Y                  Y     Y     Y

test 1, sample length=26265
            name          chars           time         tokens speed(chars/second)
 PaodingAnalyzer          26265          0.610          12542            43036.87
   MMSegAnalyzer          26265          0.314          14007            83566.52
      IKAnalyzer          26265          0.262          16016           100177.91
StandardAnalyzer          26265          0.141          22366           185727.87
  BaoBaoAnalyzer          26265          0.038          18185           695682.16

test 2, sample length=262650
            name          chars           time         tokens speed(chars/second)
 PaodingAnalyzer         262650          0.187         125420          1402139.61
      IKAnalyzer         262650          0.163         160160          1613693.16
   MMSegAnalyzer         262650          0.158         140070          1664009.53
  BaoBaoAnalyzer         262650          0.041         181850          6362134.44
StandardAnalyzer         262650          0.020         223660         12905789.80

test 3, sample length=2626500
            name          chars           time         tokens speed(chars/second)
      IKAnalyzer        2626500          2.251        1601600          1166564.72
 PaodingAnalyzer        2626500          1.462        1254200          1796381.55
   MMSegAnalyzer        2626500          1.043        1400700          2519010.94
  BaoBaoAnalyzer        2626500          0.352        1818500          7458959.20
StandardAnalyzer        2626500          0.202        2236600         13015280.16

A Baffling Bug in Mysqld 5.1.41

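Today I wanted to set up two databases locally for testing. I initialized the MySQL data directory with mysql_install_db --datadir=/opt/mysql, but it failed no matter what I tried, even after changing the permissions of /opt and /opt/mysql to 0777.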

OS:

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.04
DISTRIB_CODENAME=lucid
DISTRIB_DESCRIPTION="Ubuntu 10.04.1 LTS"
$ mysql_install_db --datadir=/opt/mysql/
Installing MySQL system tables...
110609 20:59:26 [Warning] Can't create test file /opt/mysql/zzzhc-laptop.lower-test
110609 20:59:26 [Warning] Can't create test file /opt/mysql/zzzhc-laptop.lower-test

Installation of system tables failed!  Examine the logs in
/opt/mysql/ for more information.
...

Switching to /tmp with mysql_install_db --datadir=/tmp, it miraculously succeeded.

bash -x mysql_install_db --datadir=/opt/mysql/ shows where the problem is:

+ mysqld_install_cmd_line='/usr/sbin/mysqld  --language=/usr/share/mysql/english --bootstrap   --basedir=/usr --datadir=/opt/mysql/ --log-warnings=0 --loose-skip-innodb   --loose-skip-ndbcluster  --user=mysql --max_allowed_packet=8M   --default-storage-engine=myisam   --net_buffer_length=16K'
+ s_echo 'Installing MySQL system tables...'
+ test 0 -eq 0 -a 0 -eq 0
+ echo 'Installing MySQL system tables...'
Installing MySQL system tables...
+ /usr/sbin/mysqld --language=/usr/share/mysql/english --bootstrap --basedir=/usr --datadir=/opt/mysql/ --log-warnings=0 --loose-skip-innodb --loose-skip-ndbcluster --user=mysql --max_allowed_packet=8M --default-storage-engine=myisam --net_buffer_length=16K
+ echo 'use mysql;'
+ cat /usr/share/mysql/mysql_system_tables.sql /usr/share/mysql/mysql_system_tables_data.sql
110609 21:04:14 [Warning] Can't create test file /opt/mysql/zzzhc-laptop.lower-test
110609 21:04:14 [Warning] Can't create test file /opt/mysql/zzzhc-laptop.lower-test
+ eval cat
++ cat
+ echo

+ echo 'Installation of system tables failed!  Examine the logs in'
Installation of system tables failed!  Examine the logs in
+ echo '/opt/mysql/ for more information.'
/opt/mysql/ for more information.

strace /usr/sbin/mysqld --language=/usr/share/mysql/english --bootstrap --basedir=/usr --datadir=/opt/mysql/ --log-warnings=0 --loose-skip-innodb --loose-skip-ndbcluster --user=mysql --max_allowed_packet=8M --default-storage-engine=myisam --net_buffer_length=16K 2>&1 | tee strace.log

stat64("/usr/share/mysql/charsets/Index.xml", {st_mode=S_IFREG|0644, st_size=18261, ...}) = 0
brk(0x22064000)                         = 0x22064000
open("/usr/share/mysql/charsets/Index.xml", O_RDONLY|O_LARGEFILE) = 3
read(3, "..., 18261) = 18261
close(3)                                = 0 
unlink("/opt/mysql/zzzhc-laptop.LOWER-TEST") = -1 ENOENT (No such file or directory)
open("/opt/mysql/zzzhc-laptop.lower-test", O_RDWR|O_CREAT|O_LARGEFILE, 0666) = -1 EACCES (Permission denied)
time(NULL)                              = 1307624820
write(2, "110609 21:07:00 [Warning] Can't "..., 84110609 21:07:00 [Warning] Can't create test file /opt/mysql/zzzhc-laptop.lower-test
) = 84
unlink("/opt/mysql/zzzhc-laptop.LOWER-TEST") = -1 ENOENT (No such file or directory)
open("/opt/mysql/zzzhc-laptop.lower-test", O_RDWR|O_CREAT|O_LARGEFILE, 0666) = -1 EACCES (Permission denied)
time(NULL)                              = 1307624820
write(2, "110609 21:07:00 [Warning] Can't "..., 84110609 21:07:00 [Warning] Can't create test file /opt/mysql/zzzhc-laptop.lower-test
) = 84

open("/opt/mysql/zzzhc-laptop.lower-test", O_RDWR|O_CREAT|O_LARGEFILE, 0666) = -1 EACCES (Permission denied) makes no sense at all. I compared the strace output for the /tmp and /opt/mysql cases and found nothing suspicious. Does /tmp have some odd special property?

I tried 0.77 and everything worked. Bugs really are everywhere!

Doing Things

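Getting something done and getting it done the right way look like the same result. But if every task leaves behind one or two pitfalls, after a few years you will no longer be able to move. What is the right way, then? There is no settled answer; it differs from stage to stage.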