Tuesday, November 04, 2014

A Guide to Code and Data Work in the Social Sciences

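I recently came across a recommendation on Zhihu for a short booklet, "Code and Data for the Social Sciences: A Practitioner's Guide". It is written for analysts and researchers without a computer-science background and explains how to organize their analysis code and research data. Having read it, I find it very well summarized and genuinely useful for setting up efficient working conventions and workflows. Some of its basic rules are summarized below: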

Automate
(A) Automate everything that can be automated.
(B) Write a single script that executes all code from beginning to end.
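A minimal sketch of rule (B), assuming the pipeline is split into stage scripts that must run in a fixed order. The stage file names under `code/` are hypothetical placeholders; the point is that one entry point runs everything, and any failing stage stops the whole run.

```python
import subprocess
import sys

# Hypothetical stage scripts, listed in the order they must run.
STAGES = [
    "code/01_clean_data.py",
    "code/02_build_tables.py",
    "code/03_make_figures.py",
]


def run_all(commands):
    """Run each command in order; abort on the first failure.

    Returns the number of commands that ran successfully.
    """
    for cmd in commands:
        print("running:", " ".join(cmd))
        # check=True raises CalledProcessError if a stage exits non-zero,
        # so a broken stage stops the run instead of being silently skipped.
        subprocess.run(cmd, check=True)
    return len(commands)
```

To run the real pipeline, the end of such a script would call `run_all([[sys.executable, stage] for stage in STAGES])`. An R or SQL stage can be listed the same way with its own command (for example `["Rscript", "code/some_stage.R"]`, again a hypothetical path).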

Version Control
(A) Store code and data under version control.
(B) Run the whole directory before checking it back in.
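One way to enforce rule (B) with Git is a pre-commit hook. Git aborts the commit when the `.git/hooks/pre-commit` script exits with a non-zero status. The sketch below assumes the whole pipeline is started by a single command; `run_all.py` is a hypothetical name for that entry script.

```python
import subprocess
import sys


def pre_commit_check(cmd):
    """Run the full pipeline; return its exit code (0 lets the commit proceed)."""
    result = subprocess.run(cmd)
    if result.returncode != 0:
        # Anything other than 0 makes Git refuse the commit.
        print("pipeline failed; commit aborted", file=sys.stderr)
    return result.returncode
```

Saved as `.git/hooks/pre-commit` (with a `#!/usr/bin/env python3` first line, marked executable, and ending in `sys.exit(pre_commit_check([sys.executable, "run_all.py"]))`), every commit first reruns the whole directory. Note that files under `.git/hooks` are not themselves tracked by version control.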

Directories
(A) Separate directories by function.
(B) Separate files into inputs and outputs.
(C) Make directories portable.

\input CSV
\code R, SQL
\output pic, ppt
\temp
readme.txt
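A small sketch that creates the layout above and keeps paths portable by building them relative to a project root rather than hard-coding absolute paths. The helper names `make_project` and `in_project` are illustrative, not from the booklet.

```python
from pathlib import Path

# Mirrors the layout above; "temp" holds disposable intermediate files.
SUBDIRS = ["input", "code", "output", "temp"]


def make_project(root):
    """Create the standard subdirectories and a readme under root (idempotent)."""
    root = Path(root)
    for name in SUBDIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    readme = root / "readme.txt"
    if not readme.exists():
        readme.write_text("Describe the project and how to run it.\n")
    return root


def in_project(root, *parts):
    """Build a path relative to the project root, e.g. in_project(root, "input", "data.csv").

    Avoid absolute paths such as C:\\Users\\me\\project\\input\\data.csv:
    they break as soon as the directory is moved or copied to another machine.
    """
    return Path(root).joinpath(*parts)
```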

Keys
(A) Store cleaned data in tables with unique, non-missing keys.
(B) Keep data normalized as far into your code pipeline as you can.
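A stdlib-only sketch of both rules, assuming tables are lists of dicts. `check_keys` rejects missing or duplicate keys; `split_table` is one simple form of normalization: attributes that depend only on an entity (here a county's state, in hypothetical example data) move into their own table, so they are stored once instead of being repeated on every county-year row.

```python
def check_keys(rows, key):
    """Raise ValueError unless every row has a non-missing, unique value for key.

    key is a tuple of column names; returns the number of rows checked.
    """
    seen = set()
    for i, row in enumerate(rows):
        k = tuple(row.get(col) for col in key)
        if any(v is None or v == "" for v in k):
            raise ValueError(f"row {i}: missing key {k}")
        if k in seen:
            raise ValueError(f"row {i}: duplicate key {k}")
        seen.add(k)
    return len(seen)


def split_table(rows, entity_key, entity_cols):
    """Move columns that depend only on entity_key into a separate table."""
    entities = {}
    facts = []
    for row in rows:
        k = row[entity_key]
        attrs = {c: row[c] for c in entity_cols}
        # The same entity must always carry the same attributes.
        if entities.setdefault(k, attrs) != attrs:
            raise ValueError(f"inconsistent attributes for {entity_key}={k}")
        facts.append({c: v for c, v in row.items() if c not in entity_cols})
    entity_rows = [{entity_key: k, **attrs} for k, attrs in entities.items()]
    return facts, entity_rows


rows = [
    {"county": "A", "year": 2000, "state": "X", "income": 1},
    {"county": "A", "year": 2001, "state": "X", "income": 2},
    {"county": "B", "year": 2000, "state": "Y", "income": 3},
]
facts, counties = split_table(rows, "county", ["state"])
check_keys(facts, ("county", "year"))  # key of the county-year table
check_keys(counties, ("county",))      # key of the county table
```

The two tables are merged back together only at the point where an analysis actually needs the combined data.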

Abstraction
(A) Abstract to eliminate redundancy.
(B) Abstract to improve clarity.
(C) Otherwise, don't abstract.
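A minimal illustration of rule (A), using hypothetical column names: instead of copying the same computation once per variable, write it once as a function and apply it to every column.

```python
# Redundant version: the same line copied for each variable, e.g.
#   income_mean = sum(income) / len(income)
#   age_mean = sum(age) / len(age)
# A fix to one copy is easily forgotten in the others.


def column_mean(values):
    """Mean of one numeric column; refuses empty input instead of dividing by zero."""
    if len(values) == 0:
        raise ValueError("cannot take the mean of an empty column")
    return sum(values) / len(values)


def means_by_column(table):
    """Apply the same summary to every column of a dict-of-lists table."""
    return {name: column_mean(values) for name, values in table.items()}


means = means_by_column({"income": [1, 2, 3], "age": [10, 20]})
```

Per rule (C), a one-off computation that appears only once and is already clear would not need such a wrapper.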

Documentation
(A) Don't write documentation you will not maintain.
(B) Code should be self-documenting.
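A small sketch of rule (B): descriptive names and a named constant carry the meaning that would otherwise need a comment (which could then drift out of date). The age cutoff and field name are hypothetical.

```python
# Opaque version that needs a comment to be understood:
#   x = [r for r in d if r["age"] > 17]

ADULT_AGE = 18  # hypothetical cutoff, named once instead of repeated inline


def select_adults(people):
    """Keep the records whose age is at least ADULT_AGE."""
    return [person for person in people if person["age"] >= ADULT_AGE]


adults = select_adults([{"age": 17}, {"age": 18}, {"age": 40}])
```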

Management
(A) Manage tasks with a task management system.
(B) E-mail is not a task management system.