Basic Statistical Analysis Using Excel
June 28, 2025 | by Bloom Code Studio
Excel provides many statistical functions and statistical distributions that are used in data science applications.
Excel Analysis for Measures of Center and Dispersion
Excel allows you to type a formula into a cell. Using a formula, you can refer to a value from another cell or produce an output based on a value in a cell (or values from multiple cells). A formula always starts with an equal sign (=). Following is an example formula, calculating an average of three numbers in cells A1, A2, and A3.
= AVERAGE(A1, A2, A3)
Suppose you have numbers in cells A1, A2, and A3 and want to show the average of the three numbers in cell B1. Figure A22 shows how to use AVERAGE in this scenario.
Figure A22 Use of AVERAGE in MS Excel (Used with permission from Microsoft)
Once you are done typing the formula in B1, press enter. The formula in B1 will change to the result of the formula – 20 (see Figure A23).
Figure A23 The Result of Using AVERAGE in MS Excel (Used with permission from Microsoft)
Excel does NOT save the formula when the file is saved as CSV
You can use Excel formulas with no problem on a CSV file, but Excel does not save those when the file is saved in the CSV format. If you want to keep the formula you have used, make sure to save your file as XLSX (File > Save As… > Excel Workbook (XLSX)). XLSX stores more information beyond text, such as formulas and drawings.
EXAMPLE A.6
Problem
On ch1-SOCR-small.csv, calculate the average height of the MLB players in the dataset using AVERAGE.
Solution
On the worksheet where the MLB Players dataset is open, type this formula in any empty cell and press the enter key (Figure A24).
=AVERAGE(D2:D13)
This will fill the average across cells D2, D3, D4, …, through D13. The resulting average height is 73.3333.
Figure A24 Calculate the Average Height Using AVERAGE in MS Excel (Used with permission from Microsoft)
Other Useful Formulas for Summarizing Data
There are a lot of other ways to use simple Excel formulas to analyze data values. You can type each of the example formulas listed in Table A1 and see how it works.
| Formula | Description |
|---|---|
| =MEDIAN(D2:D13) | Compute median of D2, D3, …, and D13. Returns 72.5 as a result. |
| =STDEV(D2:D13) | Compute standard deviation of D2, D3, …, and D13. Returns 3.28 as a result. |
| =MIN(D2:D13) | Find the minimum value among D2, D3, …, and D13. Returns 70 as a result. |
| =MAX(D2:D13) | Find the maximum value among D2, D3, …, and D13. Returns 82 as a result. |
| =SUM(D2:D13) | Sum the values of D2, D3, …, through D13. Returns 880 as a result. |
| =COUNTIF(D2:D13, 72) | Count the number of occurrences of “72” within the range of D2, D3, …, through D13. Returns 3 as a result (i.e., there are three 72s in D2:D13). |
Table A1 Example MS Excel Formulas
EXAMPLE A.7
Problem
On ch1-SOCR-small.csv, write a formula that calculates the median weight of the players.
Solution
The formula is
=MEDIAN(E2:E13)
This will find the median across cell E2, E3, E4, …, through E13, which is 200. If you do not know what median is, don’t worry! You will learn more about it in Measures of Position.
Similar to the built-in function for the mean (=AVERAGE), Excel provides a function to calculate the sample standard deviation of a dataset =STDEV.S (for the sample standard deviation). To calculate these statistical results in Excel, enter the data values in a column. Let’s assume the data values are placed in cells A2 through A11. In any cell, type the Excel command =AVERAGE(A2:A11) and press enter. Excel will calculate the arithmetic mean in this cell. Then, in any other cell, type the Excel command =STDEV.S(A2:A11) and press enter. Excel will calculate the sample standard deviation in this cell. Figure A25 shows the mean and standard deviation for the 10 ages.
Figure A25 Mean and Standard Deviation in MS Excel (Used with permission from Microsoft)
Here is a more comprehensive listing of various statistical functions available within Excel. The reader is encouraged to go to the Formulas menu in Excel and select any of the submenus such as “Financial,” “Math & Trig,” Lookup & Reference,” etc. to see various Excel formulas. Under the option for “More Functions,” select STATISTICAL to see a menu of various statistical functions available within Excel. Table A2 shows a sample of some statistical functions available within Excel:
| Function | Purpose |
|---|---|
| =AVERAGE(A1:A10) | Find the mean of a set of numbers. |
| =MEDIAN(A1:A10) | Find the median of a set of numbers. |
| =STDEV.S(A1:A10) | Find the standard deviation for a set of numbers representing a sample. |
| =STDEV.P(A1:A10) | Find the standard deviation for a set of numbers representing a population. |
| =VAR.S(A1:A10) | Find the variance for a set of numbers representing a sample. |
| =VAR.P(A1:A10) | Find the variance for a set of numbers representing a population. |
| =MIN(A1:A10) | Find the minimum of a set of numbers. |
| =MAX(A1:A10) | Find the maximum of a set of numbers. |
| =MAX(A1:A10) – MIN(A1:A10) | Find the range of a set of numbers. |
| =CORREL(A1:A10, B1:B10) | Find the correlation coefficient “r” for (x, y) data. x-data is in cells A1 to A10; y-data is in cells B1 to B10. |
| =QUARTILE (A1:A10, 1) | Find the first quartile for a set of numbers. |
| =QUARTILE (A1:A10, 2) | Find the second quartile for a set of numbers (note that the second quartile is the same as the median). |
| =QUARTILE (A1:A10, 3) | Find the third quartile for a set of numbers. |
Table A2 Statistical Functions Available within MS Excel
Excel Analysis for Probability Distributions
As discussed in Measures of Position, data scientists are often interested in various probability distributions such as the normal distribution, binomial distribution, and Poisson distribution. Excel provides built-in functions to analyze many probability distributions.
Excel uses the command =NORM.DIST to find the area under the normal curve to the left of a specified value:
=NORM.DIST(XVALUE, MEAN, STANDARD_DEV, TRUE)
For example, suppose that at a financial consulting company, the mean employee salary is $60,000 with a standard deviation of $7,500. A normal curve can be drawn to represent this scenario, in which the mean of $60,000 would be plotted on the horizontal axis, corresponding to the peak of the curve. Then, to find the probability that an employee earns more than $75,000, you would calculate the area under the normal curve to the right of the data value $75,000.
For example, at the financial consulting company mentioned previously, the mean employee salary is $60,000 with a standard deviation of $7,500. To find the probability that a random employee’s salary is less than $55,000 using Excel, this is the command you would use:
=NORM.DIST(55000, 60000, 7500, TRUE)
Result: 0.25249
Thus, there is a probability of about 25% that a random employee has a salary less than $55,000.
Excel also provides built-in functions for binomial and Poisson distributions as shown in Table A3:
| Probability Distribution | Excel Command | Usage |
|---|---|---|
| Binomial | =BINOMIAL.DIST | =BINOMIAL.DIST(Number_s, Trials, Probability_S, Cumulative) |
| Poisson | =POISSON | =POSSON(X, Mean, Cumulative) |
Table A3 Built-In Functions for Binomial and Poisson Distributions in MS Excel
Excel Analysis for Correlation and Regression
In Inferential Statistics and Regression Analysis, the concepts of correlation and regression were discussed.
In correlation analysis, the data scientist is often interested in calculating the correlation coefficient, which is a numerical measure of the direction and strength of the correlation between two numeric quantities.
The Excel command to calculate the correlation coefficient uses the following format:
=CORREL(A1:A10, B1:B10)
In regression analysis, the data scientist is interested in calculating the equation of the best-fit linear model for two numeric quantities (assuming the correlation is significant).
Recall the equation of the best fit line is
yˆ=a+bx
where m is the slope of the line and b is the y-intercept of the line.
To calculate the slope and y-intercept of the linear model using Excel, start by entering the (x, y) data in two columns in Excel. Then the Excel commands =SLOPE and =INTERCEPT can be used to calculate the slope and intercept, respectively.
The dataset in Table A4 will be used as an example: the monthly amount spent on advertising and the monthly revenue for a Fortune 500 company for 12 months.
| Month | Amount Spent on Advertising | Revenue |
|---|---|---|
| Jan | 49 | 12210 |
| Feb | 145 | 17590 |
| Mar | 57 | 13215 |
| Apr | 153 | 19200 |
| May | 92 | 14600 |
| Jun | 83 | 14100 |
| Jul | 117 | 17100 |
| Aug | 142 | 18400 |
| Sep | 69 | 14100 |
| Oct | 106 | 15500 |
| Nov | 109 | 16300 |
| Dec | 121 | 17020 |
Table A4 Revenue versus Advertising for a Fortune 500 Company ($000s) Monthly Amount Spent on Advertising and the Monthly Revenue for a Fortune 500 Company
To calculate the slope of the regression model, use the Excel command
=SLOPE(y-data range, x-data range)
It’s important to note that this Excel command expects that the y-data range is entered first and the x-data range is entered second. Since revenue depends on amount spent on advertising, revenue is considered the y-variable and amount spent on advertising is considered the x-variable. Notice the y-data is contained in cells C2 through C13 and the x-data is contained in cells B2 through B13. Thus the Excel command for slope would be entered as
=SLOPE(C2:C13, B2:B13)
In the same way, the Excel command to calculate the y-intercept of the regression model is
=INTERCEPT(y-data range, x-data range)
For the dataset shown in the above table, the Excel command would be
=INTERCEPT(C2:C13, B2:B13)
The results are as follows (see Figure A26):
slope bintercept a==61.89,376.7
Figure A26 Revenue versus Advertising for a Fortune 500 Company ($000s) Showing Slope and y-Intercept Calculation in MS Excel (Used with permission from Microsoft)
Based on this, the regression equation can be written as
yˆyˆ==a+bx9,376.7+61.8x
where x represents the amount spent on advertising (in thousands of dollars) and y represents the amount of revenue (in thousands of dollars).
RELATED POSTS
View all