Statistics with python, Part 1

João Castelo
4 min readFeb 12, 2021

Hello everyone! Sorry for the 2 month break from posts, been busy finishing the residency!!

Photo by Christopher Burns on Unsplash

Statistics is the main framework we have to test our hypothesis about data!

Most of the time we want to test if our new way to do something is really working.

When applying a new strategy in the clinic, we begin by testing once, twice, three times, and then we agree if it is worthy to test some more or not.

But how many times we have to test, which test should we perform and how should we display the analyzed data?

In my opinion, you should be using python to figure it out!

Easy python, no installing

Let’s use Google Colaboratory to write python code!

Log in with a google account and create a new notebook!

Change the notebook name to ESAPIAnalysis.ipynb

Let’s write a simple hello world:

viewer = "😎 Physicist" ## variable assingment
print(f"Hello world {viewer}")

Paste this code in the first cell and press Alt+Enter

Let’s reuse the variable viewer in another cell!

Note that over each cell there are two buttons:

If you choose text:

With the hashtags you can create sections and subsections!

And then you have hyperlinks in the contents tab:

In python it is usual to import libraries and use someone else’s code.

Use the keyword import:

Numpy is a library with useful methods for dealing with arrays.

We’re using the normal method from the numpy random class.

The keyword to create 60 pseudo-random numbers based on a normal distribution with mean 30 and standard deviation 5 is:

normal_dist = np.random.normal(30,5,60)

Matplotlib is a library with methods for plotting data.

Plot the distribution:

import matplotlib.pyplot as plt
figure = plt.hist(normal_dist)

Let`s create another normal distribution with another mean and variance:

normal_dist2 = np.random.normal(34,3,60)

Plot both:

fig1 = plt.hist(normal_dist)
fig2 = plt.hist(normal_dist2)

Now let`s use the student`s t test to check the hypothesis that both independent samples come from the same distribution:

from scipy.stats import ttest_ind
test = ttest_ind(normal_dist,normal_dist2)
print(test)
Result, the statistic is -5, and p-value = 1e-8

We can have paired samples too:

from scipy.stats import ttest_rel
test = ttest_rel(normal_dist,normal_dist2)
print(test)

Remember that t-test assumes normality and equal variance between samples, when those cant be assumed, one can use the wilcoxon paired test (for paired samples):

from scipy.stats import wilcoxontest
test = wilcoxon(normal_dist,normal_dist2)
print(test)

or the U-Mann-Whitney (for independent samples):

from scipy.stats import mannwhitneyu
test = mannwhitneyu(normal_dist,normal_dist2)
print(test)

Do you like this kind of post?

Next we’ll get some ESAPI data with a single file plugin, export as CSV and use python to analyze it based on Chaickh et al!

Chaikh, A., Giraud, JY., Perrin, E. et al. The choice of statistical methods for comparisons of dosimetric data in radiotherapy. Radiat Oncol 9, 205 (2014). https://doi.org/10.1186/1748-717X-9-205

Thanks for reading!

Thanks to Jonas for reviewing!

--

--

João Castelo

Radiation Therapy Medical Physicist and Programmer