Module 2: Digitalisation in Research
Coding best practices
Jana Lasser
TU Graz & CSH Vienna
2021-12-16
The aims of coding best practices
Enable others to ...
... find your code.
... understand your code.
... run your code.
... one year from now.
The aims of coding best practices
Enable yourself to ...
... find your code.
... understand your code.
... run your code.
... one year from now.
The aims of coding best practices
Enable others to ...
... find your code → code repositories
... understand your code → comments & documentation
... run your code → dependencies & containers
... one year from now.
Enable others to understand your code
Commenting best practices
# mesa models already implement fixed seeds through their own random
# number generation. Sadly, we need to use the Weibull distribution
# here, which is not implemented in mesa's random number generation
# module. Therefore, we need to initialize the numpy random number
# generator with the given seed as well
if seed is not None:
    np.random.seed(seed)
# sets the (daily) transmission risk for a household contact without
# any precautions. Target infection ratios are taken from literature
# and the value of the base_transmission_risk is calibrated such that
# the simulation produces the correct infection ratios in a household
# setting with the given distributions for epidemiological parameters
# of agents
self.base_transmission_risk = base_transmission_risk
Write why you did something, not what you did.
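The difference between a "what" comment and a "why" comment can be illustrated with a small sketch. The function `draw_incubation_times` and its parameter values are hypothetical and not part of the code shown on the slide; the example uses Python's standard `random` module instead of numpy to stay free of dependencies.

```python
import random


def draw_incubation_times(n_agents, seed=None):
    """Draw one Weibull-distributed incubation time per agent (illustrative only)."""
    # "What" comment (restates the code, adds no information):
    # create a random number generator and draw n_agents Weibull samples

    # "Why" comment (records the reasoning a reader cannot see in the code):
    # we use a dedicated generator seeded by the caller so that two runs with
    # the same seed give identical incubation times and results stay
    # reproducible. The scale (5.0) and shape (2.0) are placeholders chosen
    # for this illustration, not calibrated values.
    rng = random.Random(seed)
    return [rng.weibullvariate(5.0, 2.0) for _ in range(n_agents)]
```

Calling the function twice with the same seed returns the same list, which is exactly the behaviour the "why" comment promises.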
Commenting best practices
def get_floor_distribution(N_floors, N_classes):
    """
    Distribute the number of classes evenly over the number of available floors.

    Parameters
    ----------
    N_floors : int
        Number of available floors.
    N_classes : int
        Number of classes in the school.

    Returns
    -------
    floors : dictionary
        Dictionary of the form {floor1:[class_1, class_2, ...], ...}
    floors_inv : dictionary
        Dictionary of the form {class1:floor1, ..., class_N:floor_N}
    """
    floors = {i:[] for i in range(N_floors)} # starts with 0 (ground floor)
    classes = list(range(1, N_classes + 1))
    classes_per_floor = int(N_classes / N_floors)
Comment your functions and classes.
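The excerpt above stops after the first three lines of the function body. One way the documented behaviour could be completed is sketched below. This is not the original implementation: the name `distribute_classes` is hypothetical, and the rule that leftover classes go to the lowest floors first is an assumption.

```python
def distribute_classes(N_floors, N_classes):
    """
    Sketch: distribute classes 1..N_classes as evenly as possible over floors.

    Returns the two dictionaries described in the docstring above:
    floors ({floor: [classes]}) and floors_inv ({class: floor}).
    """
    floors = {i: [] for i in range(N_floors)}  # starts with 0 (ground floor)
    classes = list(range(1, N_classes + 1))

    # every floor gets classes_per_floor classes; `leftover` classes remain
    # if N_classes is not divisible by N_floors
    classes_per_floor, leftover = divmod(N_classes, N_floors)

    start = 0
    for floor in range(N_floors):
        # assumption: the lowest floors each take one of the leftover classes
        n_on_floor = classes_per_floor + (1 if floor < leftover else 0)
        floors[floor] = classes[start:start + n_on_floor]
        start += n_on_floor

    # invert the mapping so that each class points to its floor
    floors_inv = {c: floor for floor, cs in floors.items() for c in cs}
    return floors, floors_inv


floors, floors_inv = distribute_classes(2, 5)
# floors == {0: [1, 2, 3], 1: [4, 5]}
```

Note that the docstring tells a reader what goes in and what comes out, while the inline comments explain the choices made inside the function.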
Commenting best practices
Comment while you code and/or while your code is running.
Commenting best practices
You copied some code from Stack Overflow?
Best: understand it and comment in your own words.
Second best: leave a comment linking to the post where you found the code.
Code style: making the code readable
A style guide is about consistency. Consistency with this style guide is important. Consistency within a project is more important. Consistency within one module or function is the most important.
PEP 8 - Style Guide for Python Code
When you start a new project:
- Have a look at the language's style guide.
- What conventions make sense for your project?
- Consider using a code formatter (for example, black).
Enable others to run your code
Challenges
Code dependencies
Dependency versions
System requirements
Code dependencies
import math # standard module
import numpy as np # third-party package
import mesa # third-party package
import scseirx # custom package
Almost all modern code depends on other code (libraries, packages).
List all code dependencies
appdirs==1.4.3
argon2-cffi==20.1.0
arrow==0.17.0
async-generator==1.10
attrs==20.3.0
backcall==0.2.0
binaryornot==0.4.4
...
For Python: write a requirements file:
pip freeze > requirements.txt
Install all requirements for a repository:
pip install -r requirements.txt
Encapsulate your coding projects in virtual environments.
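Virtual environments are usually created on the command line with `python -m venv`. The same thing can be done from Python through the standard library's `venv` module, as in the minimal sketch below. The helper name `create_project_env` and the directory name `.venv` are assumptions made for this example.

```python
import venv
from pathlib import Path


def create_project_env(project_dir):
    """Create an isolated virtual environment in <project_dir>/.venv."""
    env_dir = Path(project_dir) / ".venv"
    # with_pip=False keeps the sketch fast and offline; in a real project
    # you would typically pass with_pip=True so that
    # `pip install -r requirements.txt` works inside the environment
    venv.create(env_dir, with_pip=False)
    return env_dir
```

Packages installed into such an environment do not interfere with other projects, so each repository can pin its own dependency versions.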
Code versions
Packages are under development themselves. Releases are indicated by versions.
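To see which release of a package is installed, the standard library module `importlib.metadata` (Python 3.8 and later) can be queried. The sketch below assumes two hypothetical helpers, `installed_version` and `pin_versions`, and produces lines in the same `name==version` form as the requirements file shown earlier.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


def pin_versions(package_names):
    """Build 'name==version' lines for every package that is installed."""
    pins = []
    for name in package_names:
        found = installed_version(name)
        if found is not None:
            pins.append(f"{name}=={found}")
    return pins
```

For example, `pin_versions(["numpy", "mesa"])` returns one pinned line for each of the two packages that is installed in the current environment, and skips the ones that are not.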
Code versions
Aim: make your code usable by others: consider releasing a package (workflow for Python).
Aim: make your code citable by others: assign a DOI to a code version (workflow for Zenodo).
System requirements
Hardware requirements
Language versions
Driver versions
Compiler versions
...
Document these in the repository's README!
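Part of this information can be collected automatically. The following sketch uses the standard library's `platform` module to gather the language version and basic system details and formats them as a list that could be pasted into a README. The function names and the choice of fields are assumptions; driver and compiler versions for compiled code would still have to be added by hand.

```python
import platform


def describe_system():
    """Collect basic facts about the system the code was run on."""
    return {
        "python_version": platform.python_version(),
        "python_implementation": platform.python_implementation(),
        "operating_system": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
    }


def format_for_readme(info):
    """Format the collected facts as a plain bullet list."""
    return "\n".join(f"- {key}: {value}" for key, value in info.items())


print(format_for_readme(describe_system()))
```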
A note on languages
Python: very versatile, open source, easy to learn, can be efficient, machine learning, NLP, very large community.
R: a bit less versatile, open source, easy to learn, not so efficient, statistics, large community.
Julia: versatile, open source, OK to learn, very efficient, growing community, machine learning.
Matlab: can be efficient, good support ($$), computer simulations.
STATA, SPSS: supposedly easier to learn & use (GUIs), statistics.
C++: very efficient, computer simulations, computer graphics.
C, Fortran: extremely efficient, computer simulations, sensors.