(Basic Statistics for Citizen Data Scientist)
Workbooks and worksheets
Excel works with files called workbooks. Each workbook contains one or more spreadsheets, called worksheets. Each worksheet consists of cells organized in a rectangular grid. The rows of a worksheet are labeled with a number and the columns are labeled with a letter or series of letters. The first row is labeled 1, the next 2 and so on. The first column is labeled A, the next B, etc. The column after Z is labeled AA, and then AB, AC, and so on. The column after AZ is BA, and then BB, BC, etc. The column after ZZ is AAA, and then AAB, AAC, etc.
Figure 1 – Sample Excel Worksheet
Each worksheet can contain up to 1,048,578 rows and 16,384 columns (i.e. column A through XFD).
Cell values, addresses and ranges
You can enter any of the following into a cell:
- A number (e.g. 45 or -23.006)
- Text (e.g. London)
- A truth value (TRUE or FALSE)
- A formula (e.g. =SUM(A4:B7)/4)
If you want Excel to treat a number as text then you need to precede the number with a single quote – e.g. ‘4.5 is considered to be the text 4.5.
Each cell also has an address, consisting of a row and a column label. E.g., in Figure 1 the cell with address A5 contains the text “London”, while the cell with address C6 contains the number 8.1.
You can also reference a range of cells. For our purposes, we only consider rectangular ranges, which consist of a rectangular collection of cells. Such ranges are specified as two cell addresses separated by a colon. E.g. the range C5:C10 consists of the 5 cells from cell C5 to C10, which for Figure 1 corresponds to the data elements for Brand B. The range A4:D10 consists of all the cells in the rectangle whose opposite corners are A4 and D10, which for Figure 2.1 corresponds to all the data in the table, including row and column headings, but excluding totals.
A cell reference consists of an address of a single cell (e.g. G17 or AB8) or of a cell range (e.g. A1:D6 or ZZ1:AAB14). Cell references can also be named. E.g. you can highlight the range B5:D5 in the worksheet in Figure 1, right-click and select Name a Range … to assign the name London to the range B5:D5.
The usual way of selecting a cell is to simply move the mouse pointer to that cell and left-click. To select a range of cells, click on a cell in one of the four corners of that range and then highlight the remaining cells in the range using the mouse. If the cells you want to select are not visible you can use the horizontal and vertical scroll bars in the usual manner to make the desired cell range visible.
If the contents of a cell are too big for the space allocated to that cell you may see the contents displayed as #######. To properly see the real contents you may need to increase the width of the column containing this cell. E.g. suppose the cell B5 is not wide enough for its contents. You can increase the width of column B by moving the mouse pointer to the vertical line at the border between the header for column B and column C (the column to the right of B). Once it is in the correct position the mouse pointer changes shape; you now hold down the left mouse button and move the mouse pointer to the right to increase the column width (or left to decrease the column width). You can also change the width of a column by right-clicking on the column heading and selecting the Column Width… option.
As mentioned above, besides numbers and text, cells can contain formulas. Formulas are built up from the following components preceded by an equal symbol (=):
- Cell references
- Worksheet functions
Operators include the following:
- Arithmetic operators: addition (+), subtraction (-), multiplication (*), division (/) and exponentiation (^)
- The concatenation operator: & (used to concatenate text)
- Logical comparison operators: less than (<), greater than (>), less than or equal (<=), greater than or equal (>=), equal (=) and not equal (<>)
The usual precedence rules apply, namely multiplication and division are applied before addition and subtraction, and exponentiation is applied before any of the other operators. Parentheses are used to change the order that operators are applied. For example:
4+5*2 = 14 (4+5)*2 = 18
3^2-1 = 8 3^(2-1) = 3
-(3^2)+1 = -8 (-3)^2+1 = 10
For some reason, Excel violates the usual precedence rules by giving a unary minus precedence over exponentiation. E.g. Excel calculates -3^2+1 as if it were (-3)^2+1 and so returns a value of 10 instead of -8. Similarly -1^2 is evaluated as 1 instead of -1. To get the correct answer you need to use the expression -(1^2).
Excel provides a variety of worksheet functions such as SUM, MIN, LOG, etc. We will describe these in more detail shortly.
Each cell has a value. For cells containing a number, text or a truth value, the value of the cell is simply the contents of the cell. For cells containing a formula, the value is the evaluation of the formula based on the values of any referenced cells. Some examples of the values of formulas are given in Figure 2.
Figure 2 – Examples of Excel Formulas
If we look at the example in Figure 1, we notice that the cell B11 contains the formula =SUM(B5:B10). The value of this formula, and therefore the value of cell B11, is 106.1. This is the value that is actually displayed in the cell. The formula is displayed in the Formula Bar, just above the grid, to the right of the symbol fx.
Note too that if we had assigned the name BrandA to the range B5:B10, then the formula =SUM(BrandA) would be equivalent to =SUM(B5:B10) and would have the same value.
In specifying a formula such as =A1+3 you can simply type the formula character by character. Alternatively, you can type the character = and click on the A1 cell and then continue by typing +3. Excel will automatically specify the correct formula. Similarly, for a formula that contains a cell range such as =SUM(A1:B4) you can type =SUM( and highlight the cell range A1:B4 and then type the right parenthesis to complete the formula.
If a cell has an illegal value you will see an error value displayed in the cell. E.g., if you enter =A1/0 into cell B1 then cell B1 will have the value #DIV/0 indicating that you are attempting to divide by zero. These error values all start with the symbol # and include #DIV/0, #N/A, #VALUE, #NAME?, etc.
One of the things that gives Excel such power is that when you change the value of any cell then the value of all formulas that reference that cell also change. E.g., if you change the value of cell B5 in the worksheet in Figure 1 from 23.5 to 13.5 then the value of cell B11 will automatically change from 106.1 to 96.1.
You can copy the contents of any cell into another cell (or, in fact, any cell range into another cell range, usually of the same size and shape). If the content of the first cell is a number, text or truth value, then the second cell will contain the same number, text or truth value. When the contents of the first cell is a formula then the second cell will also contain a formula, but the specific formula depends on the type of addressing that is used.
Relative and absolute addressing
Excel provides two types of addressing when defining a cell or cell range: absolute and relative addressing. The default is relative addressing. For example, as mentioned above, the contents of cell B11 in the worksheet in Figure 1 is =SUM(B5:B10). Here B5:B10 is a relative address. If this formula is copied into cell D11, then D11 will contain the formula =SUM(D5:D10), which accomplishes the same function as the formula in cell B11, namely to sum the values in the 5 cells above, and so the value of cell D11 will be 43.3. The usual way of copying the contents of one cell into another is to click on the first cell (B11 in the example above) and press Ctrl-C, and then click on the second cell (D11 in the example above) and press Ctrl-V.
The dollar symbol $ is used to specify absolute addressing. When copying a formula which contains a dollar sign to another cell location, any part of an address that contains a $ does not change. E.g., suppose cell B11 in the worksheet from Figure 1 contains =SUM($B$5:$B$10). Its value is still the same, namely 106.1, but this time if we copy the contains of cell B11 into D11, then D11 will also contain the formula =SUM($B$5:$B$10), and so the value of cell D11 will be 106.1 and not 43.3.
Note that we can use absolute addressing on any part of a cell address, namely, B11 (no absolute addressing), B$11 (absolute addressing for the row, but relative addressing for the column), $B11 (absolute addressing for the column, but relative addressing for the row) and $B$11 (absolute addressing for both row and column).
To change from relative addressing (the default) to absolute addressing, you simply put the $ in the appropriate place. Alternatively, you can highlight the cell address in the Formula Bar or in the cell and press the function key F4. E.g. if you highlight the cell reference A3 and press the F4 key, then A3 changes to $A$3. If you press F4 again it changes to A$3. Pressing F4 again changes the reference to $A3, and finally pressing F4 one more time returns the cell address to the original reference of A3.
You can also reference cells in another worksheet. To do this you need to precede the reference (absolute or relative) by the name of the worksheet followed by !. The default names for worksheets in Excel are Sheet1, Sheet2, etc., although these names can be changed as explained below. Thus the (relative) reference to range A3:B4 in Sheet1 is Sheet1!A3:B4. You don’t really need to worry about all of this since by clicking on a cell in another worksheet, Excel automatically copies the correct reference address into the formula.
Excel formula for Beginners – How to Sum last n columns in Excel
Statistics for Beginners – Introduction to Excel Spreadsheets
Disclaimer: The information and code presented within this recipe/tutorial is only for educational and coaching purposes for beginners and developers. Anyone can practice and apply the recipe/tutorial presented here, but the reader is taking full responsibility for his/her actions. The author (content curator) of this recipe (code / program) has made every effort to ensure the accuracy of the information was correct at time of publication. The author (content curator) does not assume and hereby disclaims any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from accident, negligence, or any other cause. The information presented here could also be found in public knowledge domains.
Learn by Coding: v-Tutorials on Applied Machine Learning and Data Science for Beginners
Latest end-to-end Learn by Coding Projects (Jupyter Notebooks) in Python and R:
All Notebooks in One Bundle: Data Science Recipes and Examples in Python & R.
End-to-End Python Machine Learning Recipes & Examples.
End-to-End R Machine Learning Recipes & Examples.
Applied Statistics with R for Beginners and Business Professionals
Data Science and Machine Learning Projects in Python: Tabular Data Analytics
Data Science and Machine Learning Projects in R: Tabular Data Analytics
Python Machine Learning & Data Science Recipes: Learn by Coding
R Machine Learning & Data Science Recipes: Learn by Coding
Comparing Different Machine Learning Algorithms in Python for Classification (FREE)
There are 2000+ End-to-End Python & R Notebooks are available to build Professional Portfolio as a Data Scientist and/or Machine Learning Specialist. All Notebooks are only $29.95. We would like to request you to have a look at the website for FREE the end-to-end notebooks, and then decide whether you would like to purchase or not.