# Statistics with python, Part 1

Hello everyone! Sorry for the 2 month break from posts, been busy finishing the residency!!

Statistics is the main framework we have to test our hypothesis about data!

Most of the time we want to test if our new way to do something is really working.

When applying a new strategy in the clinic, we begin by testing once, twice, three times, and then we agree if it is worthy to test some more or not.

But how many times we have to test, which test should we perform and how should we display the analyzed data?

In my opinion, you should be using **python** to figure it out!

## Easy python, no installing

Let’s use **Google Colaboratory **to write python code!

Log in with a google account and create a new notebook!

Change the notebook name to **ESAPIAnalysis.ipynb**

Let’s write a simple hello world:

**viewer = "😎 Physicist" ## variable assingment**

print(f"Hello world {viewer}")

Paste this code in the first cell and press **Alt+Enter**

Let’s reuse the variable **viewer **in another cell!

Note that over each cell there are two buttons:

If you choose text:

With the hashtags you can create sections and subsections!

And then you have **hyperlinks **in the contents tab:

In python it is usual to import libraries and use someone else’s code.

Use the keyword **import**:

**Numpy **is a library with useful methods for dealing with arrays.

We’re using the normal method from the numpy random class.

The keyword to create 60 pseudo-random numbers based on a normal distribution with mean 30 and standard deviation 5 is:

**normal_dist = np.random.normal(30,5,60)**

**Matplotlib **is a library with methods for plotting data.

Plot the distribution:

`import matplotlib.pyplot as plt`

**figure = plt.hist(normal_dist)**

Let`s create another normal distribution with another mean and variance:

**normal_dist2 = np.random.normal(34,3,60)**

Plot both:

**fig1 = plt.hist(normal_dist)**

fig2 = plt.hist(normal_dist2)

Now let`s use the student`s t test to check the hypothesis that both independent samples come from the same distribution:

**from scipy.stats import ttest_ind**

test = ttest_ind(normal_dist,normal_dist2)

print(test)

We can have paired samples too:

**from scipy.stats import ttest_rel**

test = ttest_rel(normal_dist,normal_dist2)

print(test)

Remember that t-test assumes normality and equal variance between samples, when those cant be assumed, one can use the wilcoxon paired test (for paired samples):

**from scipy.stats import wilcoxontest**

test = wilcoxon(normal_dist,normal_dist2)

print(test)

or the U-Mann-Whitney (for independent samples):

**from scipy.stats import mannwhitneyu**

test = mannwhitneyu(normal_dist,normal_dist2)

print(test)

Do you like this kind of post?

Next we’ll get some **ESAPI** data with a single file plugin, export as CSV and use python to analyze it based on Chaickh et al!

Chaikh, A., Giraud, JY., Perrin, E. *et al.* The choice of statistical methods for comparisons of dosimetric data in radiotherapy. *Radiat Oncol* **9, **205 (2014). https://doi.org/10.1186/1748-717X-9-205

Thanks for reading!

Thanks to Jonas for reviewing!