Introduction

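To make my own code more readable and better polished, I found some related articles online and started studying. After reading one on code style, I decided I should study some Pythonic open-source projects. I have long meant to start reading source code but never did so in earnest; only when I hit a serious bug would I occasionally (and reluctantly) read part of it. Now, to take my coding ability up a level, I have decided it is time to start reading source code at scale. ^_^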

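Following the recommendation in Reading Great Code, I am starting with howdoi. To be honest, I picked it because it is a single file of only about 270 lines, so I am starting with the easy target.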

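First, what is howdoi? It is a tool that lets you run simple searches for programming questions from the command line, such as the official examples of formatting a date or looking up tar options.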

Installation

Installation is almost absurdly simple. To add a little complexity, I recommend installing it inside a virtualenv:

virtualenv ~/.virtualenvs/howdoi
source ~/.virtualenvs/howdoi/bin/activate
pip install howdoi

Usage

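I gave it a quick try. Unfortunately, the tool does not work out of the box for users in mainland China (they can try my modified version based on Bing.com), because it is hard-wired to search through Google, as the SEARCH_URL setting shows: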

if os.getenv('HOWDOI_DISABLE_SSL'):  # Set http instead of https
    SEARCH_URL = 'http://www.google.com/search?q=site:{0}%20{1}'
    VERIFY_SSL_CERTIFICATE = False
else:
    SEARCH_URL = 'https://www.google.com/search?q=site:{0}%20{1}'
    VERIFY_SSL_CERTIFICATE = True
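To see what this template expands to, here is a minimal sketch. The `build_search_url` helper is hypothetical (it is not a howdoi function), and both the default site `stackoverflow.com` and the use of `urllib.parse.quote` are assumptions about how the two placeholders get filled.

```python
import os
from urllib.parse import quote

# Same template selection as in the excerpt above.
if os.getenv('HOWDOI_DISABLE_SSL'):
    SEARCH_URL = 'http://www.google.com/search?q=site:{0}%20{1}'
else:
    SEARCH_URL = 'https://www.google.com/search?q=site:{0}%20{1}'


def build_search_url(query, site='stackoverflow.com'):
    """Hypothetical helper: {0} takes the site, {1} the URL-quoted query."""
    return SEARCH_URL.format(site, quote(query))


print(build_search_url('format date bash'))
```

The `site:` operator restricts Google to a single domain, which is why every result link can later be treated as a candidate question page.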

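That is no real obstacle: most programmers already know how to get around the firewall, though usually only in the browser. How do you route the terminal through a proxy as well? Interested readers can click here for the details.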

With everything in place, let's see how well it works:

$ proxychains4 howdoi find file
grep -rnw '/path/to/somewhere/' -e "pattern"

Impressive. How does it do that? Let's go through it step by step.

Analysis

1. Running with no arguments

Running howdoi on its own, with no arguments, prints the help message:

$ proxychains4 howdoi
usage: howdoi [-h] [-p POS] [-a] [-l] [-c] [-n NUM_ANSWERS] [-C] [-v]
              [QUERY [QUERY ...]]

instant coding answers via the command line

positional arguments:
  QUERY                 the question to answer

optional arguments:
  -h, --help            show this help message and exit
  -p POS, --pos POS     select answer in specified position (default: 1)
  -a, --all             display the full text of the answer
  -l, --link            display only the answer link
  -c, --color           enable colorized output
  -n NUM_ANSWERS, --num-answers NUM_ANSWERS
                        number of answers to return
  -C, --clear-cache     clear the cache
  -v, --version         displays the current version of howdoi

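Like most programs, howdoi prints its help text when run without arguments. The path to that output starts at the entry point at the very bottom of the file: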

if __name__ == '__main__':
    command_line_runner()

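command_line_runner() first calls get_parser to obtain the argument definitions; argparse is the Python standard-library module for handling command-line arguments. Because this run has no query argument, execution enters the if not args['query'] branch and prints the help message: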

if not args['query']:
    parser.print_help()
    return
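The flow above can be sketched with a parser reconstructed from the help text. This is only an approximation: it covers a subset of the flags, the default of 1 for `--num-answers` is an assumption, and the `argv` parameter is added here for illustration so the function can be called without touching `sys.argv`.

```python
import argparse


def get_parser():
    """Sketch of a parser rebuilt from the help output; not the original code."""
    parser = argparse.ArgumentParser(
        prog='howdoi',
        description='instant coding answers via the command line')
    # nargs='*' collects zero or more words into a list.
    parser.add_argument('query', metavar='QUERY', type=str, nargs='*',
                        help='the question to answer')
    parser.add_argument('-p', '--pos', type=int, default=1,
                        help='select answer in specified position (default: 1)')
    parser.add_argument('-a', '--all', action='store_true',
                        help='display the full text of the answer')
    parser.add_argument('-n', '--num-answers', type=int, default=1,
                        help='number of answers to return')
    return parser


def command_line_runner(argv=None):
    parser = get_parser()
    # vars() turns the Namespace into a dict, hence the args['query'] lookups.
    args = vars(parser.parse_args(argv))
    if not args['query']:
        # An empty list is falsy, so a bare invocation lands here.
        parser.print_help()
        return None
    return args


print(command_line_runner(['format', 'date', 'bash']))
```

Because `nargs='*'` yields an empty list when no words are given, the `if not args['query']` test needs no explicit length check.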

2. A basic search

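Now let's try a basic search with the example arguments format date bash. This time execution gets past the empty-query check and reaches the cache check below it. Since we did not disable the cache, it calls _enable_cache: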

def _enable_cache():
    if not os.path.exists(CACHE_DIR):
        os.makedirs(CACHE_DIR)
    requests_cache.install_cache(CACHE_FILE)

_enable_cache uses requests-cache to cache responses, so repeating an identical request returns faster.

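Finally the program enters howdoi, where the real work begins. howdoi first joins the query words into a single string and calls _get_instructions, which in turn calls _get_links to fetch the answer links. There, _get_result issues the network request and returns the response; pyquery then parses the page and extracts the result links from the Google results page, and these come back as links in _get_instructions. If Google's results contain no pages at all, _get_instructions returns False and the command line prints Sorry, couldn't find any help with that topic, as shown at line 216: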

def howdoi(args):
    args['query'] = ' '.join(args['query']).replace('?', '')
    try:
        return _get_instructions(args) or 'Sorry, couldn\'t find any help with that topic\n'
    except (ConnectionError, SSLError):
        return 'Failed to establish network connection\n'
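The fallback message relies on Python's `or` operator returning its right operand when the left one is falsy. A minimal sketch of that behaviour follows; the `_get_instructions` here is a stand-in that fakes a failed search for the made-up query `zzz`, and the network error handling is left out.

```python
def _get_instructions(args):
    """Stand-in for the real function: pretend 'zzz' finds nothing."""
    if args['query'] == 'zzz':
        return False
    return 'date +"%Y-%m-%d"'


def howdoi(args):
    # argparse delivers the query as a list of words; join them and drop '?'.
    args['query'] = ' '.join(args['query']).replace('?', '')
    # `a or b` evaluates to b whenever a is falsy (False, '', None).
    return _get_instructions(args) or 'Sorry, couldn\'t find any help with that topic\n'


print(howdoi({'query': ['format', 'date', 'bash?']}))
print(howdoi({'query': ['zzz']}))
```

One consequence of this idiom is that an empty-string answer is treated the same as no answer at all.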

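Normally, of course, Google does return some results, and the program starts collecting answers. Once it knows how many answers the user asked for (args['num_answers']) and the position of the first answer to fetch, it calls _get_answer for each position and joins the answers in a fixed format for printing to the terminal: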

answers = []
append_header = args['num_answers'] > 1
initial_position = args['pos']
for answer_number in range(args['num_answers']):
    current_position = answer_number + initial_position
    args['pos'] = current_position
    answer = _get_answer(args, links)
    if not answer:
        continue
    if append_header:
        answer = ANSWER_HEADER.format(current_position, answer)
    answer += '\n'
    answers.append(answer)
return '\n'.join(answers)
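To watch the loop run without any network access, the sketch below wraps it in a function and swaps in a stub `_get_answer`. The `collect_answers` name, the `FAKE_ANSWERS` data, and the exact text of `ANSWER_HEADER` are all assumptions made for illustration.

```python
# Assumed header template; the real constant is defined elsewhere in howdoi.py.
ANSWER_HEADER = '--- Answer {0} ---\n{1}'

# Made-up data: position -> answer text. Position 2 has no usable answer.
FAKE_ANSWERS = {1: 'tar -xzf archive.tar.gz', 3: 'tar -xjf archive.tar.bz2'}


def _get_answer(args, links):
    """Stub: look the answer up by position instead of fetching a page."""
    return FAKE_ANSWERS.get(args['pos'])


def collect_answers(args, links):
    answers = []
    # Headers are only useful when more than one answer is requested.
    append_header = args['num_answers'] > 1
    initial_position = args['pos']
    for answer_number in range(args['num_answers']):
        current_position = answer_number + initial_position
        args['pos'] = current_position
        answer = _get_answer(args, links)
        if not answer:
            continue  # skip positions that yield nothing
        if append_header:
            answer = ANSWER_HEADER.format(current_position, answer)
        answer += '\n'
        answers.append(answer)
    return '\n'.join(answers)


print(collect_answers({'pos': 1, 'num_answers': 3}, links=[]))
```

Note that the header keeps the original position number, so a skipped answer leaves a visible gap (1, then 3) rather than renumbering the output.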

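_get_answer filters the links through _get_questions, keeping only those that point to questions on stackoverflow.com, then takes the link at the position the user specified. If the user only wants the link itself, it returns the link; otherwise it fetches the page the link points to.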

if args.get('link'):
    return link
page = _get_result(link + '?answertab=votes')
html = pq(page)
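The filtering and position lookup that precede this excerpt might look roughly like the sketch below. Both function bodies are assumptions: the `'questions/'` substring test and the fall-back-to-last-link rule are plausible guesses, and the URLs are invented sample data.

```python
def _get_questions(links):
    # Assumption: question pages are recognised by 'questions/' in the URL.
    return [link for link in links if 'questions/' in link]


def get_link_at_pos(links, position):
    """Hypothetical helper: 1-based pick, falling back to the last link."""
    if not links:
        return False
    if len(links) >= position:
        return links[position - 1]
    return links[-1]


# Invented sample data standing in for parsed search results.
links = [
    'https://stackoverflow.com/questions/1/format-date-in-bash',
    'https://stackoverflow.com/tags/bash',
    'https://stackoverflow.com/questions/2/date-arithmetic',
]
question_links = _get_questions(links)
link = get_link_at_pos(question_links, 2)
# Appending ?answertab=votes asks for the answers sorted by votes.
print(link + '?answertab=votes')
```

Sorting by votes matters because the parsing step that follows only looks at the first answer on the page.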

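When parsing the question page, the first step is naturally to handle the top-ranked answer. instructions holds the code snippets found in that answer (it is empty if there are none), and args['tags'] holds the tags attached to the question. If no code was found and the user did not pass the all flag, the full text of the first answer is returned. If the user did pass all, every element of the first answer is processed in order: plain text is appended as-is, while code is formatted with _format_output first. In the remaining case, only the first code snippet is returned, formatted with _format_output.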

first_answer = html('.answer').eq(0)
instructions = first_answer.find('pre') or first_answer.find('code')
args['tags'] = [t.text for t in html('.post-tag')]

if not instructions and not args['all']:
    text = first_answer.find('.post-text').eq(0).text()
elif args['all']:
    texts = []
    for html_tag in first_answer.items('.post-text > *'):
        current_text = html_tag.text()
        if current_text:
            if html_tag[0].tag in ['pre', 'code']:
                texts.append(_format_output(current_text, args))
            else:
                texts.append(current_text)
    texts.append('\n---\nAnswer from {0}'.format(link))
    text = '\n'.join(texts)
else:
    text = _format_output(instructions.eq(0).text(), args)
if text is None:
    text = NO_ANSWER_MSG
text = text.strip()
return text

And with that, a standard query is complete.

Published

10 December 2016

Reading Source Code, Part One: `Howdoi`
