Basic Data Analysis Using Excel
June 28, 2025 | by Bloom Code Studio
A screenshot of the main Excel menu when it is open is provided in Figure A1. As you see, Excel consists of a series of submenus such as Home, Insert, Draw, Page Layout, Formulas, etc. Clicking on each Menu item opens a submenu with additional functions. For example, clicking on the Formulas menu item opens a submenu as shown in Figure A2.
Figure A1 MS Excel Menu (Used with permission from Microsoft)
Figure A2 Formulas Menu in MS Excel (Used with permission from Microsoft)
Excel presents data in the form of table with rows and columns. We call each rectangle, defined by the combination of row number and column name (e.g., A1, B3), a cell. We can use a mouse pointer to select a specific cell and then edit the data inside. Once a cell is selected, Namebox shows the identifier of the selected cell Figure A3.
Formula Bar is used to read/write a formula for a cell, which is identified by Namebox.
Figure A3 MS Excel Showing Rows and Columns (Used with permission from Microsoft)
Load Data Using Excel
Loading a dataset to Excel can be done simply by clicking File > Open File. That will open up a file explorer window, and you can simply select the CSV file you want to load. We are going to use a part of a public dataset, titled “SOCR- 1035 Records of Heights (in) and Weights (lbs) of Major League Baseball (MLB) Players” (ch1-SOCR-small.csv).
Once loaded, the dataset will show up on the Worksheet Window, as shown in Figure A4.
Figure A4 SOCR-small Dataset Opened with MS Excel (Used with permission from Microsoft)
Summarize Data Using Excel
We will continue to refer to the same dataset about the MLB players. For simplicity, only 12 items of SD Outfielders and Pitchers are included in Figure A5.
Figure A5 ch1-SOCR-small.csv Opened with MS Excel
EXAMPLE A.1
Problem
On ch1-SOCR-small.csv, write a formula that counts the number of Outfielders.
Solution
The formula is
=COUNTIF(C2:C13, “Outfielder”)”
This will count the rows whose position, located at column C, is “Outfielder.” Note that double quotation marks (i.e., “ ”) are used to indicate “Outfielder” is a string. The result is 7.
Search Data Using Excel
You can search data in a table in multiple ways with Excel. We will first explore ways without using Excel formulas, and then we’ll look at some useful formulas.
You can search for the cells containing a certain number or text by using “Find,” as in Figure A6. The Find pop-up window will have a text box in which you can type in any number or text, and then Excel will search and find all the cells that include number/text you typed in.
Figure A6 Find Feature in MS Excel (Used with permission from Microsoft)
If you are more interested in viewing all the rows that fulfill certain criteria, you can use “Filter,” as in Figure A7. Once Filter is enabled, you will see drop-down arrows on each column of the table. The arrows allow you to select specific values that you are searching for and then presents only the rows with such values.
Figure A7 Filter Feature in MS Excel (Used with permission from Microsoft)
For example, if you click “Outfielder” in the Position window, the resulting table will only have the rows whose Position is Outfielder, as shown in Figure A8.
Figure A8 Filter by Position on MS Excel (Used with permission from Microsoft)
Now let’s do some more sophisticated searches using some Excel formulas.
INDEX
Index returns a cell value at a specific location, given a range of cells. The location is identified as a relative row and column number within the range. For example, the following formula on ch1-SOCR-small.csv will return 72 since the cell located at the fifth row and second column within the range of C8:E13 will refer to the row 12 and column D—so the cell D12. The value at D12 is 72.
=INDEX(C8:E13,5,2)
EXAMPLE A.2
Problem
On ch1-SOCR-small.csv, what value does the following formula return?
=INDEX(A3:D9,7,3)
Solution
This formula finds the seventh row and third column within the range of A3:D9, which indicates C9. Thus, the resulting value is “Outfielder.”
MATCH
Given a range of single-column cells, MATCH returns the relative row number at which the cell value matches the value you specify. MATCH takes three arguments—the value to match, the range of search, and the match type. This example illustrates how to locate the value 75 within the cell D2, D3, …, through D13. The third argument, match type, is set to 0 so that it seeks for an exact match (i.e., exactly 75, not 74 or 75.5).
=MATCH(75, D2:D13, 0)
The formula above returns 5, a relative row number. The relative row number 5 indicates the fifth row within the range, and this corresponds to the absolute row number 6. Notice that D6 has 75.
What happens if there are multiple matching rows?
If there are multiple rows with a match, MATCH only returns the very first instance. For example, this formula will simply return 2 (i.e., the second row within D2:D13, which is D3) even though there are three occurrences of 72 at rows 3, 5, and 12.
=MATCH(72, D2:D13, 0)
EXAMPLE A.3
Problem
Using MATCH, find the absolute row number of a first player whose weight is 185 lb on ch1-SOCR-small.csv.
Solution
Weight is located in column E. The correct use of MATCH is provided next. It looks through E2, E3, …, E13 and finds the first exact match of 185. The first occurrence of 185 is in the second row in the range E2:E13, which is E3, so this formula will return 2.
=MATCH(185, E2:E13, 0)
To get the absolute row number, you need to offset the output above with a constant number. Notice that the search range starts from E2, so you should offset the output by +1. Therefore, the final answer is:
=MATCH(185, E2:E13, 0) + 1
VLOOKUP
Given the search range, VLOOKUP locates a specific search key within the first column in terms of the row number and then returns a value at the column of your choice along that row. It requires four arguments—the search key, the range of search, the relative position of the column of your interest, and the match type. For example, the formula below locates the first pitcher along column C from C2 to C13 first. Similarly to MATCH, VLOOKUP supports multiple match types and FALSE indicates the exact match (i.e., not the “partial” match) for VLOOKUP.
=VLOOKUP(“Pitcher”, C2:E13, 3, FALSE)
Then, the formula reads a specific cell value in the same row. The column number is indicated as the third argument in the formula, called the relative position. It refers to how many columns the search key is away to the right from the column that is specified in the first argument. The first exact match is found at row 2, and the formula reads values in the third column to the right from column C: E. Thus, the value 220 in cell E2 is the result.
EXAMPLE A.4
Problem
Using VLOOKUP, find the age of Mike Thompson on ch1-SOCR-small.csv.
Solution
First you need to search for the player Mike Thompson. His row can be located by his name, “Mike_Thompson.” Then, the relative position argument of VLOOKUP allows you to read his age. The age column, F, is 6 columns to the left of Name, so the relative position argument should be 6. The fourth argument remains FALSE as you are looking for an exact match.
=VLOOKUP(“Mike_Thompson”, A2:F13, 6, FALSE)
The formula above will return 26.31.
RELATED POSTS
View all