SQL Tutorial
Table of Contents
1. Introduction to SQL
1.1 History of SQL
SQL (Structured Query Language) was developed in the 1970s by IBM researchers Donald D. Chamberlin and Raymond F. Boyce. It was initially called SEQUEL (Structured English Query Language) and was designed to manipulate and retrieve data stored in IBM's original quasi-relational database management system, System R.
Key milestones in SQL history:
- 1974: SEQUEL language developed at IBM
- 1979: Oracle introduces the first commercial SQL-based RDBMS
- 1986: SQL becomes an ANSI standard
- 1987: SQL becomes an ISO standard
- 1989: SQL-89 released
- 1992: SQL-92 released (major revision)
- 1999: SQL:1999 introduces recursive queries, triggers, and object-oriented features
- 2003-2016: Subsequent versions (SQL:2003, SQL:2008, SQL:2011, SQL:2016) add more features
1.2 Overview of Relational Databases
Relational databases are based on the relational model of data, proposed by E.F. Codd in 1970. Key concepts include:
- Tables: Data is organized into tables (relations)
- Rows: Each entry in a table is a row (tuple)
- Columns: Each attribute of an entry is a column (attribute)
- Keys: Unique identifiers for rows (primary keys, foreign keys)
- Relationships: Tables can be related to each other
Example of a simple relational database structure:
| Customers | Orders |
|---|---|
|
id (PK) name |
id (PK) customer_id (FK) order_date total_amount |
1.3 SQL Standards and Versions
SQL has evolved through several versions, each adding new features and capabilities:
- SQL-86: First published ANSI standard
- SQL-89: Minor revision
- SQL-92: Major revision (widely supported)
- SQL:1999: Added recursive queries, triggers, regular expressions, and object-oriented features
- SQL:2003: XML-related features, window functions, standardized sequences
- SQL:2008: ORDER BY in views and derived tables, INSTEAD OF triggers
- SQL:2011: Temporal databases, pipelined DML, enhancements to window functions
- SQL:2016: JSON, polymorphic table functions, row pattern matching
Despite these standards, many database systems implement their own extensions and variations.
1.4 Common SQL Dialects
While SQL is standardized, different database management systems have their own dialects with unique features and syntax variations:
- MySQL: Open-source, widely used for web applications
- PostgreSQL: Open-source, known for standards compliance and advanced features
- SQL Server (T-SQL): Microsoft's dialect, tightly integrated with .NET
- Oracle (PL/SQL): Powerful enterprise-level database system
- SQLite: Lightweight, serverless database engine
Example of dialect differences (getting the current date):
-- MySQL, PostgreSQL, SQLite
SELECT CURRENT_DATE;
-- SQL Server
SELECT GETDATE();
-- Oracle
SELECT SYSDATE FROM DUAL;
2. Basics of SQL
2.1 Understanding Databases, Tables, and Schemas
In SQL, data is organized hierarchically:
- Database: A collection of related data, typically for a specific application or purpose
- Schema: A logical container for tables, views, and other database objects (in some systems, schemas and databases are synonymous)
- Table: A structured set of data elements organized in columns and rows
- Column: A vertical entity in a table that contains all information associated with a specific field in a table
- Row: A horizontal entity in a table that represents a single, implicitly structured data item in a table
Example structure:
Database: OnlineStore
+-- Schema: Sales
| +-- Table: Customers
| | +-- Column: CustomerID
| | +-- Column: Name
| | +-- Column: Email
| +-- Table: Orders
| +-- Column: OrderID
| +-- Column: CustomerID
| +-- Column: OrderDate
+-- Schema: Inventory
+-- Table: Products
+-- Column: ProductID
+-- Column: Name
+-- Column: Price
2.2 Data Types in SQL
SQL supports various data types to store different kinds of information:
- Numeric Types:
- INT / INTEGER: Whole numbers
- DECIMAL(M,N) / NUMERIC(M,N): Exact decimal numbers
- FLOAT / REAL: Approximate decimal numbers
- String Types:
- CHAR(N): Fixed-length character string
- VARCHAR(N): Variable-length character string
- TEXT: Long text string
- Date and Time Types:
- DATE: Date values
- TIME: Time values
- DATETIME / TIMESTAMP: Date and time values
- Boolean Type:
- BOOLEAN: True/False values
- Binary Types:
- BLOB: Binary Large Object
Example usage:
CREATE TABLE Products (
ProductID INT PRIMARY KEY,
Name VARCHAR(100),
Price DECIMAL(10, 2),
Description TEXT,
CreatedAt TIMESTAMP,
IsAvailable BOOLEAN
);
2.3 Basic SQL Syntax
SQL syntax follows a few basic principles:
- SQL keywords are typically written in uppercase (not required, but a common convention)
- Statements end with a semicolon (;)
- Strings are enclosed in single quotes (')
- SQL is not case-sensitive for keywords, but is often case-sensitive for object names (depending on the database system)
Basic structure of an SQL query:
SELECT column1, column2
FROM table_name
WHERE condition
ORDER BY column1 [ASC|DESC];
Example query:
SELECT FirstName, LastName
FROM Customers
WHERE Country = 'USA'
ORDER BY LastName ASC;
2.4 Creating and Dropping Databases
In SQL, you can create new databases and remove existing ones using the CREATE DATABASE and DROP DATABASE statements.
Creating a Database
To create a new database, use the CREATE DATABASE statement followed by the desired database name:
CREATE DATABASE database_name;
Dropping a Database
To remove an existing database, use the DROP DATABASE statement followed by the database name:
DROP DATABASE database_name;
Be cautious when using DROP DATABASE as it permanently deletes the database and all its contents.
2.5 Creating and Dropping Tables
Tables are the fundamental structures for storing data in SQL databases. You can create and remove tables using CREATE TABLE and DROP TABLE statements.
Current database structure:
| Table Name | Columns |
|---|---|
| No tables yet | - |
Creating a Table
The CREATE TABLE statement defines the structure of a new table, including its columns and their data types:
CREATE TABLE table_name (
column1 datatype [constraints],
column2 datatype [constraints],
...
[table_constraints]
);
Dropping a Table
To remove an existing table and all its data, use the DROP TABLE statement:
DROP TABLE table_name;
2.6 Inserting Data
After creating a table, you can add data to it using the INSERT INTO statement.
Current 'employees' table:
| id | name | department | salary |
|---|---|---|---|
| No data yet | |||
Basic INSERT Syntax
INSERT INTO table_name (column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);
Inserting Multiple Rows
INSERT INTO table_name (column1, column2, column3, ...)
VALUES
(value1, value2, value3, ...),
(value1, value2, value3, ...),
...;
2.7 Retrieving Data with SELECT
The SELECT statement is used to retrieve data from one or more tables in a database.
Current 'employees' table:
| id | name | department | salary |
|---|---|---|---|
| 1 | John Doe | IT | 75000.00 |
| 2 | Jane Smith | HR | 65000.00 |
| 3 | Bob Johnson | Sales | 80000.00 |
Basic SELECT Syntax
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Common SELECT Clauses
- WHERE: Filters rows based on a condition
- ORDER BY: Sorts the result set
- LIMIT: Restricts the number of rows returned
- JOIN: Combines rows from two or more tables
2.8 Updating Data
The UPDATE statement is used to modify existing records in a table.
Current 'employees' table:
| id | name | department | salary |
|---|---|---|---|
| 1 | John Doe | IT | 75000.00 |
| 2 | Jane Smith | HR | 65000.00 |
| 3 | Bob Johnson | Sales | 80000.00 |
Basic UPDATE Syntax
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
2.9 Deleting Data
The DELETE statement is used to remove records from a table.
Current 'employees' table:
| id | name | department | salary |
|---|---|---|---|
| 1 | John Doe | IT | 75000.00 |
| 2 | Jane Smith | HR | 65000.00 |
| 3 | Bob Johnson | Sales | 80000.00 |
Basic DELETE Syntax
DELETE FROM table_name
WHERE condition;
3. Advanced SELECT Statements
3.1 Filtering Data with WHERE
The WHERE clause is used to filter records based on specified conditions.
Current 'employees' table:
| id | name | department | salary |
|---|---|---|---|
| 1 | John Doe | IT | 75000.00 |
| 2 | Jane Smith | HR | 65000.00 |
| 3 | Bob Johnson | Sales | 80000.00 |
Basic WHERE Syntax
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Common WHERE Operators
- =, !=, <, >, <=, >=: Comparison operators
- BETWEEN: Selects values within a given range
- LIKE: Search for a specified pattern
- IN: Matches any value in a list of values
- IS NULL: Checks for NULL values
Example: Filtering data from the 'employees' table
SELECT name, salary FROM employees WHERE department = 'IT';
3.2 Sorting Data with ORDER BY
The ORDER BY clause is used to sort the result set in ascending or descending order.
Current 'employees' table:
| id | name | department | salary |
|---|---|---|---|
| 1 | John Doe | IT | 75000.00 |
| 2 | Jane Smith | HR | 65000.00 |
| 3 | Bob Johnson | Sales | 80000.00 |
Basic ORDER BY Syntax
SELECT column1, column2, ...
FROM table_name
ORDER BY column1 [ASC|DESC];
Example: Sorting data from the 'employees' table
SELECT name, salary FROM employees ORDER BY salary DESC;
3.3 Limiting Data with LIMIT/OFFSET
The LIMIT clause is used to limit the number of rows returned in the result set.
Current 'employees' table:
| id | name | department | salary |
|---|---|---|---|
| 1 | John Doe | IT | 75000.00 |
| 2 | Jane Smith | HR | 65000.00 |
| 3 | Bob Johnson | Sales | 80000.00 |
Basic LIMIT Syntax
SELECT column1, column2, ...
FROM table_name
LIMIT number;
OFFSET Syntax
SELECT column1, column2, ...
FROM table_name
LIMIT number OFFSET offset;
Example: Limiting data from the 'employees' table
SELECT name, salary FROM employees LIMIT 2;
3.4 Aggregation Functions (COUNT, SUM, AVG, MAX, MIN)
Aggregation functions are used to perform calculations on a set of values and return a single value.
Current 'employees' table:
| id | name | department | salary |
|---|---|---|---|
| 1 | John Doe | IT | 75000.00 |
| 2 | Jane Smith | HR | 65000.00 |
| 3 | Bob Johnson | Sales | 80000.00 |
Common Aggregation Functions
- COUNT: Returns the number of rows that matches a specified criteria
- SUM: Returns the sum of a numeric column
- AVG: Returns the average value of a numeric column
- MAX: Returns the maximum value of a numeric column
- MIN: Returns the minimum value of a numeric column
Example: Using aggregation functions on the 'employees' table
SELECT COUNT(*) AS total_employees, AVG(salary) AS average_salary FROM employees;
3.5 Grouping Data with GROUP BY
The GROUP BY clause is used to group rows that have the same values in specified columns into aggregated data.
Current 'employees' table:
| id | name | department | salary |
|---|---|---|---|
| 1 | John Doe | IT | 75000.00 |
| 2 | Jane Smith | HR | 65000.00 |
| 3 | Bob Johnson | Sales | 80000.00 |
Basic GROUP BY Syntax
SELECT column1, column2, ...
FROM table_name
GROUP BY column1, column2, ...;
Example: Grouping data from the 'employees' table
SELECT department, COUNT(*) AS total_employees, AVG(salary) AS average_salary FROM employees GROUP BY department;
3.6 Filtering Groups with HAVING
The HAVING clause is used to filter the result set based on a condition applied to the grouped data.
Current 'employees' table:
| id | name | department | salary |
|---|---|---|---|
| 1 | John Doe | IT | 75000.00 |
| 2 | Jane Smith | HR | 65000.00 |
| 3 | Bob Johnson | Sales | 80000.00 |
Basic HAVING Syntax
SELECT column1, column2, ...
FROM table_name
GROUP BY column1, column2, ...
HAVING condition;
Example: Filtering grouped data from the 'employees' table
SELECT department, COUNT(*) AS total_employees, AVG(salary) AS average_salary FROM employees GROUP BY department HAVING total_employees > 1;
3.7 Aliasing Columns and Tables
Aliasing is used to assign a temporary name to a table or column in a query.
Current 'employees' table:
| id | name | department | salary |
|---|---|---|---|
| 1 | John Doe | IT | 75000.00 |
| 2 | Jane Smith | HR | 65000.00 |
| 3 | Bob Johnson | Sales | 80000.00 |
Basic Aliasing Syntax
SELECT column1 AS alias1, column2 AS alias2, ...
FROM table_name AS alias_table;
Example: Aliasing columns and tables in a query
SELECT e.name AS employee_name, e.department AS department_name, e.salary AS employee_salary FROM employees AS e;
3.8 Using Subqueries
A subquery is a query nested inside another query. Subqueries are used to perform complex operations and return results that can be used in the outer query.
Current 'employees' table:
| id | name | department | salary |
|---|---|---|---|
| 1 | John Doe | IT | 75000.00 |
| 2 | Jane Smith | HR | 65000.00 |
| 3 | Bob Johnson | Sales | 80000.00 |
Basic Subquery Syntax
SELECT column1, column2, ...
FROM table_name
WHERE column1 = (SELECT column1 FROM table_name2 WHERE condition);
Example: Using a subquery to find employees with a salary greater than the average salary
SELECT name, salary FROM employees WHERE salary > (SELECT AVG(salary) FROM employees);
4. SQL Joins
4.1 Understanding Joins
SQL joins are used to combine rows from two or more tables based on a related column between them.
Current database structure:
Database: OnlineStore
+-- Schema: Sales
| +-- Table: Customers
| | +-- Column: CustomerID
| | +-- Column: Name
| | +-- Column: Email
| +-- Table: Orders
| +-- Column: OrderID
| +-- Column: CustomerID
| +-- Column: OrderDate
+-- Schema: Inventory
+-- Table: Products
+-- Column: ProductID
+-- Column: Name
+-- Column: Price
4.2 INNER JOIN
The INNER JOIN keyword selects records that have matching values in both tables.
Current database structure:
Database: OnlineStore
+-- Schema: Sales
| +-- Table: Customers
| | +-- Column: CustomerID
| | +-- Column: Name
| | +-- Column: Email
| +-- Table: Orders
| +-- Column: OrderID
| +-- Column: CustomerID
| +-- Column: OrderDate
+-- Schema: Inventory
+-- Table: Products
+-- Column: ProductID
+-- Column: Name
+-- Column: Price
Basic INNER JOIN Syntax
SELECT column1, column2, ...
FROM table1
INNER JOIN table2
ON table1.column = table2.column;
Example: INNER JOIN between Customers and Orders tables
SELECT Customers.Name, Orders.OrderDate
FROM Customers
INNER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID;
4.3 LEFT JOIN
The LEFT JOIN keyword returns all records from the left table (table1), and the matched records from the right table (table2). The result is NULL from the right side, if there is no match.
Current database structure:
Database: OnlineStore
+-- Schema: Sales
| +-- Table: Customers
| | +-- Column: CustomerID
| | +-- Column: Name
| | +-- Column: Email
| +-- Table: Orders
| +-- Column: OrderID
| +-- Column: CustomerID
| +-- Column: OrderDate
+-- Schema: Inventory
+-- Table: Products
+-- Column: ProductID
+-- Column: Name
+-- Column: Price
Basic LEFT JOIN Syntax
SELECT column1, column2, ...
FROM table1
LEFT JOIN table2
ON table1.column = table2.column;
Example: LEFT JOIN between Customers and Orders tables
SELECT Customers.Name, Orders.OrderDate
FROM Customers
LEFT JOIN Orders
ON Customers.CustomerID = Orders.CustomerID;
4.4 RIGHT JOIN
The RIGHT JOIN keyword returns all records from the right table (table2), and the matched records from the left table (table1). The result is NULL from the left side, if there is no match.
Current database structure:
Database: OnlineStore
+-- Schema: Sales
| +-- Table: Customers
| | +-- Column: CustomerID
| | +-- Column: Name
| | +-- Column: Email
| +-- Table: Orders
| +-- Column: OrderID
| +-- Column: CustomerID
| +-- Column: OrderDate
+-- Schema: Inventory
+-- Table: Products
+-- Column: ProductID
+-- Column: Name
+-- Column: Price
Basic RIGHT JOIN Syntax
SELECT column1, column2, ...
FROM table1
RIGHT JOIN table2
ON table1.column = table2.column;
Example: RIGHT JOIN between Customers and Orders tables
-- Viewing the execution plan
EXPLAIN SELECT * FROM employees WHERE salary > 50000;
-- or in SQL Server
SET SHOWPLAN_XML ON;
GO
SELECT * FROM employees WHERE salary > 50000;
GO
SET SHOWPLAN_XML OFF;
6.7 Optimizing SQL Queries
Query optimization involves restructuring queries to improve performance.
-- Avoid using wildcards at the beginning of LIKE patterns
-- Bad:
SELECT * FROM employees WHERE name LIKE '%John%';
-- Better:
SELECT * FROM employees WHERE name LIKE 'John%';
-- Use EXISTS instead of IN for related subqueries
-- Bad:
SELECT * FROM orders WHERE customer_id IN (SELECT id FROM customers WHERE country = 'USA');
-- Better:
SELECT * FROM orders o WHERE EXISTS (SELECT 1 FROM customers c WHERE c.id = o.customer_id AND c.country = 'USA');
-- Avoid using functions in WHERE clauses
-- Bad:
SELECT * FROM orders WHERE YEAR(order_date) = 2023;
-- Better:
SELECT * FROM orders WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01';
-- Use JOIN instead of correlated subqueries
-- Bad:
SELECT e.name, (SELECT d.name FROM departments d WHERE d.id = e.department_id) AS department
FROM employees e;
-- Better:
SELECT e.name, d.name AS department
FROM employees e
JOIN departments d ON e.department_id = d.id;
-- Use UNION ALL instead of UNION when duplicates are acceptable
-- Bad:
SELECT name FROM employees
UNION
SELECT name FROM contractors;
-- Better:
SELECT name FROM employees
UNION ALL
SELECT name FROM contractors;
-- Use table variables or CTEs instead of temporary tables when possible
-- Bad:
CREATE TEMPORARY TABLE temp_high_salary AS
SELECT * FROM employees WHERE salary > 50000;
-- Better:
WITH high_salary AS (
SELECT * FROM employees WHERE salary > 50000
)
SELECT * FROM high_salary;
-- Use covering indexes
CREATE INDEX idx_employee_salary_dept ON employees(salary) INCLUDE(department);
SELECT department FROM employees WHERE salary > 50000;
-- Avoid implicit conversions
```
Comments & Discussion
Facing issues? Have questions? Post them here! We're happy to help!