DeMON lab GitHub Guidelines
General Organization
Motivations
A lab-wide GitHub organization serves several purposes:
- The code and scripts for every lab paper are easy to find, maintain, and reproduce;
- Project code and scripts are version controlled;
- Future labmates can easily follow and reuse existing code and scripts;
- ...
Project-wise (paper-wise) repo
You are encouraged to create a separate repo for each of your projects. Usually one repo corresponds to one paper; if your papers are heavily entangled with each other, discuss the repo structure with Jake.
Recommended repo structure
In general, your repo should cover:
- func directory for all your functions/classes;
- script directory for all your scripts that call the functions/classes in func to run analyses;
- data directory for all your data (pre)processing, splitting, and QC procedures;
- example or tutorial directory so that users can easily run your analyses using fake/public data.
In our lab, most of the data we use are restricted-access. NEVER PUT YOUR DATA ON CLOUD STORAGE (GitHub, Dropbox, Google Drive, ...). It will cause serious problems!
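The recommended layout can be set up by hand or with a small helper. The following is a minimal sketch, not a lab-provided tool: the function name scaffold_repo and the README text written into each directory are assumptions made for illustration, and the example directory could equally be named tutorial. It creates directories only and never touches data files.

```python
from pathlib import Path

# Directory names follow the recommended structure; the one-line
# descriptions are paraphrased from the guideline and are only illustrative.
LAYOUT = {
    "func": "Functions and classes",
    "script": "Scripts that call code in func to run analyses",
    "data": "Data (pre)processing, splitting, and QC procedures (code only)",
    "example": "Runnable examples that use fake or public data",
}


def scaffold_repo(root):
    """Create the recommended directories under `root` and return their paths."""
    root = Path(root)
    created = []
    for name, purpose in LAYOUT.items():
        directory = root / name
        directory.mkdir(parents=True, exist_ok=True)
        # Git does not track empty directories, so add a short README to each.
        readme = directory / "README.md"
        if not readme.exists():
            readme.write_text(f"# {name}\n\n{purpose}\n", encoding="utf-8")
        created.append(directory)
    return created
```

Calling scaffold_repo("my_project") (a placeholder name) creates the four directories and is safe to repeat, since existing directories and README files are left unchanged.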
Private mode
You can control the visibility level of your code (public, lab-wide, or only yourself) by setting the access restrictions of your project repo.
Reproducibility
In DeMON lab, we strongly support open science. Every paper's code should be reviewed independently by another co-author (the reproducibility person, RP) to reduce errors and check reproducibility.
The reproducibility procedure is conducted through Pull Requests.
When your paper is almost ready (for example, at the final revision round), you should begin to clean up your code so that it is ready for code review.
In general, the reproducibility procedure should cover:
- Highly structured code/scripts with clear comments for easy reading;
- Accessible data for the RP to run your code;
- Detailed comments/checks by the RP;
- Validation of results:
  - Jupyter Notebook
  - Quarto document
  - Output folder/Reference documents
As an RP, you can provide different levels of comments/checks, agreed on with Jake, depending on your experience and your involvement in the project.
- Level 1: Successfully run the code and pass all result validation checks;
- Level 2: Provide coding suggestions;
- Level 3: Provide in-depth comments on the algorithms/approaches used in the project.
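A result validation check for Level 1 can be as simple as comparing the numbers the RP reproduces against reference values saved by the author. The sketch below is one possible way to do this, under stated assumptions: both result sets are stored as flat JSON objects mapping a result name to a number, and the function name validate_results, the file layout, and the tolerances are all hypothetical choices, not a lab standard.

```python
import json
import math
from pathlib import Path


def validate_results(output_file, reference_file, rel_tol=1e-6, abs_tol=1e-12):
    """Compare numeric results in two JSON files and return a list of mismatches.

    Both files are assumed to hold a flat JSON object such as {"auc": 0.8123}.
    An empty list means every reference value was reproduced within tolerance.
    """
    produced = json.loads(Path(output_file).read_text(encoding="utf-8"))
    expected = json.loads(Path(reference_file).read_text(encoding="utf-8"))

    problems = []
    for name, ref_value in expected.items():
        if name not in produced:
            problems.append(f"{name}: missing from output")
        elif not math.isclose(produced[name], ref_value,
                              rel_tol=rel_tol, abs_tol=abs_tol):
            problems.append(f"{name}: got {produced[name]}, expected {ref_value}")
    return problems
```

Using a tolerance rather than exact equality allows for small floating-point differences between machines; how loose the tolerance should be depends on the analysis and is worth agreeing on during review.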
Documentation
Although perfect code is self-explanatory, academia is never perfect. You are strongly encouraged to write detailed documentation for your project. Users are more likely to try out your work (leading to higher impact) if your project is easy to run and replicate.
You should at least have a top-level README.md in your repo with:
- Introduction/Background
- Link to your paper
- Usage
- Contact
For your functions/classes, it is highly recommended to write docstrings that describe the input, output, and expected behavior of your code.
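As an illustration, here is what such a docstring might look like in Python. The function standardize is a made-up example, and the NumPy-style section headings are just one common convention, assumed here for illustration rather than prescribed by the lab.

```python
def standardize(values):
    """Rescale values to zero mean and unit (population) variance.

    Parameters
    ----------
    values : list of float
        Observations to rescale. Must contain at least two distinct values.

    Returns
    -------
    list of float
        The rescaled values, in the same order as the input.

    Raises
    ------
    ValueError
        If `values` has fewer than two elements or zero variance.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    if variance == 0:
        raise ValueError("values have zero variance")
    std = variance ** 0.5
    return [(v - mean) / std for v in values]
```

The docstring states the input, the output, and the failure cases, so a reader can use the function via help(standardize) without reading its body.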
Coding guidelines
You are strongly encouraged to keep the following points in mind while coding/refactoring:
- Modularity
- Avoid hard coding
- Follow language-specific guidelines (e.g., PEP 8 for Python)
- Clear and readable code (e.g., meaningful variable names)
- Short functions/scripts (e.g., no one can follow a 2000-line function)
- Comments for important/complex code blocks
- Progress bars for long-running tasks
- No data leakage (e.g., rows printed out in a Jupyter Notebook)
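Several of these points can be seen together in a short script. The sketch below is only an illustration under assumptions: the analysis (counting long lines in a text file) is a stand-in, and every name in it (parse_args, with_progress, count_long_lines, the --input and --min-length options) is hypothetical. It reads the path and threshold from the command line instead of hard coding them, keeps each function short, reports progress using only the standard library (a dedicated package such as tqdm would also work), and prints an aggregate count rather than any rows of the input.

```python
import argparse
import sys


def parse_args(argv=None):
    """Read the input path and threshold from the command line, not from the code."""
    parser = argparse.ArgumentParser(description="Count long lines in a text file.")
    parser.add_argument("--input", required=True, help="path to the input file")
    parser.add_argument("--min-length", type=int, default=80,
                        help="minimum number of characters for a line to count")
    return parser.parse_args(argv)


def with_progress(items, total, stream=sys.stderr):
    """Yield items unchanged while writing a one-line progress counter to `stream`."""
    for done, item in enumerate(items, start=1):
        stream.write(f"\r{done}/{total}")
        stream.flush()
        yield item
    stream.write("\n")


def count_long_lines(path, min_length):
    """Count the lines in `path` that have at least `min_length` characters."""
    with open(path, encoding="utf-8") as handle:
        lines = handle.read().splitlines()
    return sum(1 for line in with_progress(lines, total=len(lines))
               if len(line) >= min_length)


def main(argv=None):
    """Parse the arguments, run the analysis, and return an aggregate result only."""
    args = parse_args(argv)
    return count_long_lines(args.input, args.min_length)
```

Calling main(["--input", "toy.txt", "--min-length", "3"]) runs the analysis on a hypothetical file toy.txt; passing argv explicitly also makes the script easy to test. Progress goes to standard error, so it does not mix with results written to standard output.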
Utilities
Please stay tuned.
Bugs and questions
If you have any questions about this page or find any bugs in it, please talk to Jake.