(SQL tutorials for Business Analyst)
In this end-to-end example, you will learn the following: SQL Tutorials for Business Analysts, SQL Introduction.
Structured Query Language (SQL)
Structured Query Language (SQL) is a standard database language used to create, maintain and query relational databases. Following are some interesting facts about SQL.
- SQL keywords are case insensitive. It is still recommended practice to write keywords (like SELECT, UPDATE, CREATE, etc.) in capital letters and user-defined names (like table names, column names, etc.) in small letters. Whether table names and string comparisons are case sensitive depends on the database system and its configuration.
- We can write comments in SQL using "--" (double hyphen); everything from the double hyphen to the end of the line is ignored. Some systems, such as MySQL, require a space after the double hyphen.
- SQL is the language for relational databases (explained below) like MySQL, Oracle, Sybase, SQL Server, PostgreSQL, etc. Non-relational databases (also called NoSQL databases) like MongoDB, DynamoDB, etc. generally do not use SQL as their native query language.
- Although there is an ISO standard for SQL, most implementations vary slightly in syntax. So we may encounter queries that work in SQL Server but do not work in MySQL.
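The first two facts can be checked with a small sketch. It uses Python's built-in sqlite3 module and an in-memory SQLite database, which is only one possible choice of engine; the cut-down two-column student table is hypothetical, and other database systems may treat the case of names and of stored data differently.

```python
import sqlite3

# In-memory SQLite database; the two-column table is a cut-down, hypothetical STUDENT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER, name TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'RAM')")

# Keywords and identifiers written in any letter case give the same result in SQLite.
upper_keywords = conn.execute("SELECT name FROM student").fetchall()
mixed_case = conn.execute("select NAME from Student").fetchall()

# Everything after a double hyphen is ignored up to the end of the line.
with_comments = conn.execute(
    """
    -- this whole line is a comment
    SELECT name FROM student  -- a comment can also follow code
    """
).fetchall()

# The stored data itself is compared case-sensitively by SQLite's default collation.
lowercase_match = conn.execute(
    "SELECT name FROM student WHERE name = 'ram'"
).fetchall()

print(upper_keywords, mixed_case, with_comments, lowercase_match)
```

So "case insensitive" applies to how the query is written, not necessarily to the values stored in the table.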
What is a Relational Database?
A relational database stores and retrieves data in the form of relations (tables). Table 1 shows a relational database with only one relation, called STUDENT, which stores the ROLL_NO, NAME, ADDRESS, PHONE and AGE of students.
Table 1: STUDENT
ROLL_NO | NAME | ADDRESS | PHONE | AGE |
1 | RAM | DELHI | 9455123451 | 18 |
2 | RAMESH | GURGAON | 9652431543 | 18 |
3 | SUJIT | ROHTAK | 9156253131 | 20 |
4 | SURESH | DELHI | 9156768971 | 18 |
These are some important terms used when describing a relation.
Attribute: Attributes are the properties that define a relation, e.g. ROLL_NO, NAME, etc.
Tuple: Each row in the relation is known as a tuple. The above relation contains 4 tuples, one of which is shown as:
1 | RAM | DELHI | 9455123451 | 18 |
Degree: The number of attributes in the relation is known as the degree of the relation. The STUDENT relation defined above has degree 5.
Cardinality: The number of tuples in a relation is known as its cardinality. The STUDENT relation defined above has cardinality 4.
Column: A column represents the set of values for a particular attribute. The column ROLL_NO extracted from the relation STUDENT is:
ROLL_NO |
1 |
2 |
3 |
4 |
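As a rough illustration of these terms, the sketch below loads Table 1 into an in-memory SQLite database through Python's sqlite3 module and reads off the attributes, degree, cardinality and the ROLL_NO column. The column types (INTEGER, TEXT) are assumptions, since Table 1 only gives names and values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumptions; Table 1 only gives names and values.
conn.execute(
    "CREATE TABLE STUDENT "
    "(ROLL_NO INTEGER, NAME TEXT, ADDRESS TEXT, PHONE TEXT, AGE INTEGER)"
)
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?)",
    [
        (1, "RAM", "DELHI", "9455123451", 18),
        (2, "RAMESH", "GURGAON", "9652431543", 18),
        (3, "SUJIT", "ROHTAK", "9156253131", 20),
        (4, "SURESH", "DELHI", "9156768971", 18),
    ],
)

cursor = conn.execute("SELECT * FROM STUDENT ORDER BY ROLL_NO")

# Attributes are the column names; the degree is how many there are.
attributes = [column[0] for column in cursor.description]
degree = len(attributes)

# Each fetched row is one tuple; the cardinality is the number of tuples.
tuples = cursor.fetchall()
cardinality = len(tuples)

# A column is the set of values of one attribute across all tuples.
roll_no_column = [row[0] for row in tuples]

print(attributes, degree, cardinality, roll_no_column)
```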
The queries used with a relational database can be categorized as:
Data Definition Language: It is used to define the structure of the database, e.g. CREATE TABLE, ALTER TABLE (with ADD COLUMN or DROP COLUMN) and so on.
Data Manipulation Language: It is used to manipulate data in the relations, e.g. INSERT, DELETE, UPDATE and so on.
Data Query Language: It is used to extract data from the relations, e.g. SELECT.
So first we will consider the Data Query Language. A generic query to retrieve data from a relational database is:
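To make the three categories concrete, here is a minimal sketch that runs one or two statements of each kind against an in-memory SQLite database using Python's sqlite3 module. The column types and the specific UPDATE and DELETE statements are chosen only for illustration and are not part of the tutorial's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data Definition Language: define and then change the table structure.
conn.execute(
    "CREATE TABLE STUDENT (ROLL_NO INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER)"
)
conn.execute("ALTER TABLE STUDENT ADD COLUMN ADDRESS TEXT")

# Data Manipulation Language: insert, update and delete rows.
conn.execute(
    "INSERT INTO STUDENT (ROLL_NO, NAME, AGE, ADDRESS) VALUES (1, 'RAM', 18, 'DELHI')"
)
conn.execute(
    "INSERT INTO STUDENT (ROLL_NO, NAME, AGE, ADDRESS) VALUES (2, 'RAMESH', 18, 'GURGAON')"
)
conn.execute("UPDATE STUDENT SET AGE = 19 WHERE ROLL_NO = 1")  # illustrative change
conn.execute("DELETE FROM STUDENT WHERE ROLL_NO = 2")          # illustrative change

# Data Query Language: read the data back.
rows = conn.execute("SELECT ROLL_NO, NAME, AGE, ADDRESS FROM STUDENT").fetchall()
print(rows)
```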
- SELECT [DISTINCT] Attribute_List FROM R1, R2, ..., RM
- [WHERE condition]
- [GROUP BY (Attributes) [HAVING condition]]
- [ORDER BY (Attributes) [DESC]];
The SELECT ... FROM part of the query is compulsory if you want to retrieve data from a relational database. The clauses written inside [] are optional. We will look at possible query combinations on the relation shown in Table 1.
Case 1: If we want to retrieve attributes ROLL_NO and NAME of all students, the query will be:
SELECT ROLL_NO, NAME FROM STUDENT;
ROLL_NO | NAME |
1 | RAM |
2 | RAMESH |
3 | SUJIT |
4 | SURESH |
Case 2: If we want to retrieve ROLL_NO and NAME of the students whose ROLL_NO is greater than 2, the query will be:
SELECT ROLL_NO, NAME FROM STUDENT WHERE ROLL_NO>2;
ROLL_NO | NAME |
3 | SUJIT |
4 | SURESH |
Case 3: If we want to retrieve all attributes of the students whose ROLL_NO is greater than 2, we can write * instead of listing every attribute:
SELECT * FROM STUDENT WHERE ROLL_NO>2;
ROLL_NO | NAME | ADDRESS | PHONE | AGE |
3 | SUJIT | ROHTAK | 9156253131 | 20 |
4 | SURESH | DELHI | 9156768971 | 18 |
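Cases 1 to 3 can be reproduced with the sketch below, which loads Table 1 into an in-memory SQLite database via Python's sqlite3 module and runs the three queries unchanged. The column types are assumptions, and the results are sorted in Python before printing because a query without ORDER BY does not promise any row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumptions; Table 1 only gives names and values.
conn.execute(
    "CREATE TABLE STUDENT "
    "(ROLL_NO INTEGER, NAME TEXT, ADDRESS TEXT, PHONE TEXT, AGE INTEGER)"
)
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?)",
    [
        (1, "RAM", "DELHI", "9455123451", 18),
        (2, "RAMESH", "GURGAON", "9652431543", 18),
        (3, "SUJIT", "ROHTAK", "9156253131", 20),
        (4, "SURESH", "DELHI", "9156768971", 18),
    ],
)

# Case 1: two attributes of every student.
case1 = sorted(conn.execute("SELECT ROLL_NO, NAME FROM STUDENT;").fetchall())

# Case 2: the same attributes, restricted by a WHERE condition.
case2 = sorted(
    conn.execute("SELECT ROLL_NO, NAME FROM STUDENT WHERE ROLL_NO>2;").fetchall()
)

# Case 3: * stands for all attributes of the matching rows.
case3 = sorted(conn.execute("SELECT * FROM STUDENT WHERE ROLL_NO>2;").fetchall())

for label, rows in (("Case 1", case1), ("Case 2", case2), ("Case 3", case3)):
    print(label, rows)
```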
Case 4: If we want to display the relation sorted in ascending order of AGE, we can use the ORDER BY clause as:
SELECT * FROM STUDENT ORDER BY AGE;
ROLL_NO | NAME | ADDRESS | PHONE | AGE |
1 | RAM | DELHI | 9455123451 | 18 |
2 | RAMESH | GURGAON | 9652431543 | 18 |
4 | SURESH | DELHI | 9156768971 | 18 |
3 | SUJIT | ROHTAK | 9156253131 | 20 |
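Note: ORDER BY AGE is equivalent to ORDER BY AGE ASC. If we want to retrieve the results in descending order of AGE, we can use ORDER BY AGE DESC. Rows with the same AGE may come back in any order unless a further sort attribute is added, e.g. ORDER BY AGE, ROLL_NO.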
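The sorting behaviour in the note can be tried with the following sketch, again using Python's sqlite3 module with an in-memory copy of Table 1 (column types assumed). ROLL_NO is added as a second sort attribute, which is an addition to the tutorial's query, so that the order of the three 18-year-olds is fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumptions; Table 1 only gives names and values.
conn.execute(
    "CREATE TABLE STUDENT "
    "(ROLL_NO INTEGER, NAME TEXT, ADDRESS TEXT, PHONE TEXT, AGE INTEGER)"
)
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?)",
    [
        (1, "RAM", "DELHI", "9455123451", 18),
        (2, "RAMESH", "GURGAON", "9652431543", 18),
        (3, "SUJIT", "ROHTAK", "9156253131", 20),
        (4, "SURESH", "DELHI", "9156768971", 18),
    ],
)

# Ascending is the default; ROLL_NO breaks ties between equal ages.
ascending = conn.execute(
    "SELECT ROLL_NO, AGE FROM STUDENT ORDER BY AGE, ROLL_NO"
).fetchall()

# Writing ASC explicitly gives the same order.
explicit_asc = conn.execute(
    "SELECT ROLL_NO, AGE FROM STUDENT ORDER BY AGE ASC, ROLL_NO"
).fetchall()

# DESC applies only to the attribute it follows (AGE here).
descending = conn.execute(
    "SELECT ROLL_NO, AGE FROM STUDENT ORDER BY AGE DESC, ROLL_NO"
).fetchall()

print(ascending)
print(descending)
```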
Case 5: If we want to retrieve the distinct values of an attribute or a group of attributes, DISTINCT is used as in:
SELECT DISTINCT ADDRESS FROM STUDENT;
ADDRESS |
DELHI |
GURGAON |
ROHTAK |
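A quick way to see the effect of DISTINCT is to run the query with and without it. The sketch below does this with Python's sqlite3 module on an in-memory copy of Table 1 (column types assumed); the ORDER BY clauses and the two-attribute variant are additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumptions; Table 1 only gives names and values.
conn.execute(
    "CREATE TABLE STUDENT "
    "(ROLL_NO INTEGER, NAME TEXT, ADDRESS TEXT, PHONE TEXT, AGE INTEGER)"
)
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?)",
    [
        (1, "RAM", "DELHI", "9455123451", 18),
        (2, "RAMESH", "GURGAON", "9652431543", 18),
        (3, "SUJIT", "ROHTAK", "9156253131", 20),
        (4, "SURESH", "DELHI", "9156768971", 18),
    ],
)

# Without DISTINCT every row contributes its ADDRESS, duplicates included.
all_addresses = [
    row[0] for row in conn.execute("SELECT ADDRESS FROM STUDENT ORDER BY ADDRESS")
]

# With DISTINCT each ADDRESS value appears once.
distinct_addresses = [
    row[0]
    for row in conn.execute("SELECT DISTINCT ADDRESS FROM STUDENT ORDER BY ADDRESS")
]

# DISTINCT on a group of attributes removes duplicate combinations.
distinct_pairs = conn.execute(
    "SELECT DISTINCT ADDRESS, AGE FROM STUDENT ORDER BY ADDRESS, AGE"
).fetchall()

print(all_addresses)
print(distinct_addresses)
print(distinct_pairs)
```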
If DISTINCT is not used, DELHI will appear twice in the result set. Before understanding GROUP BY and HAVING, we need to understand aggregation functions in SQL.
AGGREGATION FUNCTIONS: Aggregation functions compute a single value from the values of an attribute across a set of tuples. Some of the common aggregation functions used in SQL are:
- COUNT: COUNT(attribute) counts the rows in which the attribute is not NULL, while COUNT(*) counts all rows of the relation. e.g.
SELECT COUNT(PHONE) FROM STUDENT;
COUNT(PHONE) |
4 |
- SUM: The SUM function adds up the values of an attribute in a relation. e.g.
SELECT SUM(AGE) FROM STUDENT;
SUM(AGE) |
74 |
In the same way, MIN, MAX and AVG can be used. As we have seen above, an aggregation function applied to the whole relation (without GROUP BY) returns only 1 row.
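The COUNT and SUM queries above can be checked with the sketch below, which uses Python's sqlite3 module and an in-memory copy of Table 1 (column types assumed). The final UPDATE that blanks one phone number is not part of the tutorial's data; it is there only to show how COUNT(PHONE) and COUNT(*) can differ once a NULL is present.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumptions; Table 1 only gives names and values.
conn.execute(
    "CREATE TABLE STUDENT "
    "(ROLL_NO INTEGER, NAME TEXT, ADDRESS TEXT, PHONE TEXT, AGE INTEGER)"
)
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?)",
    [
        (1, "RAM", "DELHI", "9455123451", 18),
        (2, "RAMESH", "GURGAON", "9652431543", 18),
        (3, "SUJIT", "ROHTAK", "9156253131", 20),
        (4, "SURESH", "DELHI", "9156768971", 18),
    ],
)

# Each aggregate over the whole table yields exactly one row with one value.
count_phone = conn.execute("SELECT COUNT(PHONE) FROM STUDENT").fetchone()[0]
sum_age = conn.execute("SELECT SUM(AGE) FROM STUDENT").fetchone()[0]

# Illustration only: remove one phone number to introduce a NULL.
conn.execute("UPDATE STUDENT SET PHONE = NULL WHERE ROLL_NO = 4")
count_phone_with_null = conn.execute("SELECT COUNT(PHONE) FROM STUDENT").fetchone()[0]
count_all_rows = conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]

print(count_phone, sum_age, count_phone_with_null, count_all_rows)
```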
AVERAGE: It gives the average of an attribute's values across the tuples. It can also be computed as the sum divided by the count of the values.
Syntax: AVG(attributename)
OR
Syntax: SUM(attributename)/COUNT(attributename)
The second form also retrieves the average value, but when the attribute is an integer some database systems perform integer division and truncate the result.
MAXIMUM: It extracts the maximum value of an attribute among the set of tuples.
Syntax: MAX(attributename)
MINIMUM: It extracts the minimum value of an attribute among the set of tuples.
Syntax: MIN(attributename)
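The sketch below applies AVG, MIN and MAX to the AGE attribute of Table 1 using Python's sqlite3 module (in-memory database, column types assumed). It also compares AVG with the SUM/COUNT form: in SQLite, dividing two integers truncates, so multiplying by 1.0 is one way to force a fractional result. Other database systems may behave differently here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumptions; Table 1 only gives names and values.
conn.execute(
    "CREATE TABLE STUDENT "
    "(ROLL_NO INTEGER, NAME TEXT, ADDRESS TEXT, PHONE TEXT, AGE INTEGER)"
)
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?)",
    [
        (1, "RAM", "DELHI", "9455123451", 18),
        (2, "RAMESH", "GURGAON", "9652431543", 18),
        (3, "SUJIT", "ROHTAK", "9156253131", 20),
        (4, "SURESH", "DELHI", "9156768971", 18),
    ],
)

avg_age, min_age, max_age = conn.execute(
    "SELECT AVG(AGE), MIN(AGE), MAX(AGE) FROM STUDENT"
).fetchone()

# Integer SUM divided by integer COUNT is integer division in SQLite.
truncated = conn.execute("SELECT SUM(AGE) / COUNT(AGE) FROM STUDENT").fetchone()[0]

# Multiplying by 1.0 turns the division into a floating-point one.
exact = conn.execute("SELECT SUM(AGE) * 1.0 / COUNT(AGE) FROM STUDENT").fetchone()[0]

print(avg_age, min_age, max_age, truncated, exact)
```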
GROUP BY: GROUP BY is used to group the tuples of a relation based on an attribute or a group of attributes. It is typically combined with an aggregation function, which is then computed once per group. e.g.
SELECT ADDRESS, SUM(AGE) FROM STUDENT GROUP BY (ADDRESS);
In this query, SUM(AGE) is computed not for the entire table but for each address, i.e. the sum of AGE for the address DELHI (18+18=36), and similarly for the other addresses. The output is:
ADDRESS | SUM(AGE) |
DELHI | 36 |
GURGAON | 18 |
ROHTAK | 20 |
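The grouped sums above can be reproduced with the following sketch, which runs the GROUP BY query through Python's sqlite3 module on an in-memory copy of Table 1 (column types assumed). The ORDER BY clause and the extra COUNT(*) column are additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumptions; Table 1 only gives names and values.
conn.execute(
    "CREATE TABLE STUDENT "
    "(ROLL_NO INTEGER, NAME TEXT, ADDRESS TEXT, PHONE TEXT, AGE INTEGER)"
)
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?)",
    [
        (1, "RAM", "DELHI", "9455123451", 18),
        (2, "RAMESH", "GURGAON", "9652431543", 18),
        (3, "SUJIT", "ROHTAK", "9156253131", 20),
        (4, "SURESH", "DELHI", "9156768971", 18),
    ],
)

# One output row per distinct ADDRESS; SUM(AGE) is computed inside each group.
sum_by_address = conn.execute(
    "SELECT ADDRESS, SUM(AGE) FROM STUDENT GROUP BY ADDRESS ORDER BY ADDRESS"
).fetchall()

# Several aggregates can be computed for the same groups.
count_by_address = conn.execute(
    "SELECT ADDRESS, COUNT(*) FROM STUDENT GROUP BY ADDRESS ORDER BY ADDRESS"
).fetchall()

print(sum_by_address)
print(count_by_address)
```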
If we try to execute the query given below, most database systems will report an error: although we have computed SUM(AGE) for each address, an address group can contain more than one ROLL_NO (DELHI has two), so there is no single ROLL_NO to display for the group. A few systems accept the query but return a ROLL_NO picked arbitrarily from each group. Whenever we use GROUP BY, every column after SELECT that is not part of the GROUP BY clause needs to be wrapped in an aggregate function for the result set to make sense.
SELECT ROLL_NO, ADDRESS, SUM(AGE) FROM STUDENT GROUP BY (ADDRESS);
NOTE: An attribute which is not part of the GROUP BY clause can't appear in the SELECT list on its own. Any attribute which is part of the GROUP BY clause can be selected, but this is not mandatory. Attributes which are not part of the GROUP BY clause can still be used inside an aggregate function.
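One way to repair the failing query is to wrap ROLL_NO in an aggregate, as the note suggests. The sketch below, using Python's sqlite3 module on an in-memory copy of Table 1 (column types assumed), shows that corrected form, followed by the HAVING clause from the generic query, which filters whole groups by an aggregate value. The choice of COUNT and MIN and the threshold 20 are only illustrative. SQLite happens to accept the uncorrected query without an error, so it is not run here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumptions; Table 1 only gives names and values.
conn.execute(
    "CREATE TABLE STUDENT "
    "(ROLL_NO INTEGER, NAME TEXT, ADDRESS TEXT, PHONE TEXT, AGE INTEGER)"
)
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?)",
    [
        (1, "RAM", "DELHI", "9455123451", 18),
        (2, "RAMESH", "GURGAON", "9652431543", 18),
        (3, "SUJIT", "ROHTAK", "9156253131", 20),
        (4, "SURESH", "DELHI", "9156768971", 18),
    ],
)

# ROLL_NO is not in GROUP BY, so it appears only inside aggregate functions.
per_address = conn.execute(
    "SELECT ADDRESS, COUNT(ROLL_NO), MIN(ROLL_NO), SUM(AGE) "
    "FROM STUDENT GROUP BY ADDRESS ORDER BY ADDRESS"
).fetchall()

# HAVING keeps only the groups whose aggregate satisfies the condition.
large_groups = conn.execute(
    "SELECT ADDRESS, SUM(AGE) FROM STUDENT "
    "GROUP BY ADDRESS HAVING SUM(AGE) > 20 ORDER BY ADDRESS"
).fetchall()

print(per_address)
print(large_groups)
```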