SQL — Structured Query Language
PRA: SQL queries and database concepts
Sprints: Sprint 2 (database integration)
SQL appears in almost every exam. Easy marks once you know the patterns.
SQL Basics & Terminology
Imagine you have thousands of customer records. You could store them in a giant Excel file — but what happens when 50 people need to read and update that file at the same time? Chaos. Data gets overwritten, duplicated, corrupted.
That is exactly why databases exist. A database is like a super-powered Excel that can handle millions of rows, lets hundreds of people access it simultaneously, and ensures data never gets corrupted. SQL (Structured Query Language) is the language you use to talk to this database — to ask questions, insert data, update records, or delete rows.
| Term | Meaning | Real-World Analogy |
|---|---|---|
| Database | Organized collection of structured data | The entire warehouse of filing cabinets |
| Table | Collection of rows and columns (like a spreadsheet) | One spreadsheet/filing cabinet for "employees" |
| Row (Record/Tuple) | A single entry in a table | One employee's complete file |
| Column (Field/Attribute) | A property of the data | One piece of info — name, salary, department |
| Schema | Structure/blueprint of the database | The blueprint that says "employees table has columns: id, name, salary" |
| Primary Key | Uniquely identifies each row | Like an Aadhaar number — no two people share one |
| Foreign Key | Links one table to another | Your department ID on your employee file — it points to the departments table |
| RDBMS | Relational Database Management System (Oracle, MySQL, PostgreSQL) | The warehouse manager software that runs everything |
SQL Command Categories
SQL commands are grouped by what they do. This is one of the most commonly tested concepts — expect at least one MCQ on this.
| Category | Commands | Purpose | Memory Trick |
|---|---|---|---|
| DDL (Data Definition) | CREATE, ALTER, DROP, TRUNCATE | Define/modify table structure | "Building the filing cabinet itself" |
| DML (Data Manipulation) | INSERT, UPDATE, DELETE, SELECT | Manipulate data in tables | "Working with the files inside the cabinet" |
| DCL (Data Control) | GRANT, REVOKE | Control access/permissions | "Giving/taking away keys to the cabinet" |
| TCL (Transaction Control) | COMMIT, ROLLBACK, SAVEPOINT | Manage transactions | "Save game / load last checkpoint" |
Which SQL command category does TRUNCATE belong to?
Data Types
Before you store data, the database needs to know what kind of data each column will hold. You would not store someone's name in a number field, right? Data types are like labels on filing cabinet drawers — "this drawer only holds numbers," "this one holds text up to 50 characters."
| Data Type | Description | Example |
|---|---|---|
NUMBER(p,s) | Numeric — p = precision (total digits), s = scale (decimal digits) | NUMBER(8,2) → 123456.78 |
VARCHAR2(n) | Variable-length string (Oracle) | VARCHAR2(50) |
VARCHAR(n) | Variable-length string (MySQL) | VARCHAR(100) |
CHAR(n) | Fixed-length string, padded with spaces | CHAR(10) |
DATE | Date value | '2026-03-22' |
INT / INTEGER | Whole number | 42 |
FLOAT | Floating-point number | 3.14 |
CLOB | Large text (Character Large Object) | Long articles |
BLOB | Binary data (Binary Large Object) | Images, files |
CHAR(2) — it is always exactly 2 characters, so fixed-length is perfect and slightly faster.Now imagine a column for employee names — "Raj" is 3 characters, "Darshanakrishnamurthy" is 21 characters. Use
VARCHAR(50) — it only stores what you give it. CHAR(50) would waste space by padding "Raj" with 47 spaces.
If you define a column as CHAR(10) and store the value 'Hi', how many bytes does it occupy?
DDL — CREATE, ALTER, DROP
DDL is about building and modifying the structure — the table itself, not the data inside it. Think of it as construction work: you are building the filing cabinet (CREATE), adding a new drawer to it (ALTER), or demolishing the entire cabinet (DROP).
CREATE TABLE
CREATE TABLE employees ( emp_id NUMBER(6) PRIMARY KEY, emp_name VARCHAR2(50) NOT NULL, salary NUMBER(10,2) DEFAULT 0, dept_id NUMBER(4), hire_date DATE, FOREIGN KEY (dept_id) REFERENCES departments(dept_id) );
ALTER TABLE
Your table is already live with data in it, but you realize you forgot to add an email column. You cannot DROP and recreate — you'd lose all data. ALTER lets you modify a live table.
-- Add a column ALTER TABLE employees ADD email VARCHAR2(100); -- Modify a column ALTER TABLE employees MODIFY emp_name VARCHAR2(100); -- Drop a column ALTER TABLE employees DROP COLUMN email; -- Rename a column (Oracle) ALTER TABLE employees RENAME COLUMN emp_name TO full_name; -- Add a constraint ALTER TABLE employees ADD CONSTRAINT chk_sal CHECK (salary > 0);
DROP & TRUNCATE
-- Drop table entirely (structure + data gone) DROP TABLE employees; -- Remove all rows but keep structure (cannot rollback) TRUNCATE TABLE employees;
TRUNCATE = empty every file from the cabinet, but the cabinet itself stays. Cannot be rolled back because it is DDL.
DELETE = remove specific files (or all files) from the cabinet. Can be rolled back because it is DML. Can use WHERE to pick which rows to remove.
Speed: TRUNCATE is much faster than DELETE for large tables because it does not log individual row deletions.
You accidentally ran TRUNCATE TABLE employees; on production. Can you ROLLBACK to undo it?
DML — INSERT, UPDATE, DELETE
If DDL builds the cabinet, DML is what you do with the files inside it every day — adding new files (INSERT), updating information on existing files (UPDATE), and removing files you no longer need (DELETE).
INSERT
-- Insert single row INSERT INTO employees (emp_id, emp_name, salary, dept_id) VALUES (101, 'Darshan', 50000, 10); -- Insert multiple rows (MySQL) INSERT INTO employees (emp_id, emp_name, salary) VALUES (102, 'Ankit', 45000), (103, 'Priya', 55000); -- Insert from another table INSERT INTO emp_backup SELECT * FROM employees WHERE dept_id = 10;
UPDATE
-- Update specific rows UPDATE employees SET salary = salary * 1.10 WHERE dept_id = 10; -- Update multiple columns UPDATE employees SET salary = 60000, dept_id = 20 WHERE emp_id = 101;
UPDATE employees SET salary = 0; — This updates EVERY row in the table. Always double-check your WHERE clause before running UPDATE or DELETE.
DELETE
-- Delete specific rows DELETE FROM employees WHERE emp_id = 103; -- Delete all rows (can be rolled back) DELETE FROM employees;
What happens if you run UPDATE employees SET salary = salary * 2; without a WHERE clause?
SELECT & Filtering
SELECT is the most important SQL command — it is how you ask questions to your database. Think of it like Google Search for your data. Want to know which employees earn more than 50,000? SELECT. Want a list of all departments? SELECT. You will use this in literally every exam and every real project.
SELECT emp_name, salary FROM employees WHERE dept_id = 10 reads as: "Get me the name and salary FROM the employees table WHERE the department is 10." SQL reads almost like English — that was the whole point when it was designed.
Basic SELECT
-- Select all columns (the * means "everything") SELECT * FROM employees; -- Select specific columns (always better — don't fetch what you don't need) SELECT emp_name, salary FROM employees; -- Alias — give a column a friendly display name SELECT emp_name AS "Employee Name", salary * 12 AS annual_salary FROM employees; -- DISTINCT — remove duplicates SELECT DISTINCT dept_id FROM employees;
WHERE Clause — Filtering Rows
Without WHERE, SELECT returns every row. WHERE is your filter — like telling Excel "show me only rows where salary > 50000." The database scans through all rows and only returns the ones that match your condition.
SELECT * FROM employees WHERE salary > 50000; SELECT * FROM employees WHERE dept_id = 10 AND salary >= 40000; SELECT * FROM employees WHERE dept_id = 10 OR dept_id = 20; SELECT * FROM employees WHERE NOT dept_id = 30;
ORDER BY
By default, SQL does not guarantee any order of results. If you want sorted results — highest salary first, alphabetical names, etc. — you must explicitly ask with ORDER BY.
-- Ascending (default) — lowest to highest SELECT * FROM employees ORDER BY salary ASC; -- Descending — highest to lowest SELECT * FROM employees ORDER BY salary DESC; -- Multiple columns: first sort by dept, then within each dept sort by salary descending SELECT * FROM employees ORDER BY dept_id ASC, salary DESC;
LIMIT / ROWNUM / FETCH
-- MySQL: LIMIT SELECT * FROM employees ORDER BY salary DESC LIMIT 5; -- Oracle: ROWNUM (old style) SELECT * FROM employees WHERE ROWNUM <= 5; -- Oracle 12c+: FETCH SELECT * FROM employees ORDER BY salary DESC FETCH FIRST 5 ROWS ONLY;
What does SELECT DISTINCT dept_id FROM employees; return?
Which query correctly finds employees whose name starts with 'D' and salary is above 40000?
Operators & Conditions
Operators are the building blocks of your WHERE clause. They let you express conditions like "between these dates," "in this list," or "matching this pattern." Master these and you can filter anything.
| Operator | Usage | Example |
|---|---|---|
= | Equal to | WHERE dept_id = 10 |
!= or <> | Not equal | WHERE dept_id != 10 |
>, <, >=, <= | Comparison | WHERE salary > 50000 |
BETWEEN | Range (inclusive on BOTH ends) | WHERE salary BETWEEN 30000 AND 60000 |
IN | Match any in list | WHERE dept_id IN (10, 20, 30) |
NOT IN | Not in list | WHERE dept_id NOT IN (40, 50) |
LIKE | Pattern matching | WHERE name LIKE 'D%' |
IS NULL | Check for NULL | WHERE email IS NULL |
IS NOT NULL | Check not NULL | WHERE email IS NOT NULL |
WHERE salary BETWEEN 30000 AND 60000 is the same as WHERE salary >= 30000 AND salary <= 60000. Both endpoints are included. This is a common exam trap — they will ask if BETWEEN includes the boundary values. Answer: YES.
LIKE Wildcards
LIKE is SQL's version of "search with wildcards." Two special characters: % means "any number of characters" (including zero), and _ means "exactly one character."
| Pattern | Meaning | Matches |
|---|---|---|
'D%' | Starts with D | Darshan, Deepak, Dhruv |
'%an' | Ends with "an" | Darshan, Rohan |
'%ar%' | Contains "ar" | Darshan, Karan |
'_a%' | Second letter is "a" | Darshan, Karan |
'___' | Exactly 3 characters | Raj, Ram |
= or !=. Think of it this way: is an unknown value equal to 10? We don't know! So the comparison returns neither true nor false — it returns UNKNOWN, and the row is excluded.WHERE salary = NULL → always returns nothing (WRONG).WHERE salary IS NULL → correct way to check for NULL.
SELECT * FROM employees WHERE salary BETWEEN 30000 AND 50000; — Does this include employees with salary exactly 30000 and exactly 50000?
BETWEEN 30000 AND 50000 is equivalent to >= 30000 AND <= 50000. Employees earning exactly 30000 or exactly 50000 will be included.SQL Functions
Functions in SQL are pre-built tools that transform data. Instead of doing calculations in your application code, SQL lets you do it right inside the query. There are two types:
- Single-row functions — work on each row individually (like applying a formula to each cell in Excel)
- Aggregate functions — work on a group of rows and return one result (like SUM or AVERAGE in Excel)
Single-Row Functions (work on each row)
String Functions:
| Function | Result |
|---|---|
UPPER('hello') | 'HELLO' |
LOWER('HELLO') | 'hello' |
INITCAP('hello world') | 'Hello World' |
LENGTH('Darshan') | 7 |
SUBSTR('Darshan', 1, 3) | 'Dar' |
CONCAT('Hello', ' World') | 'Hello World' |
REPLACE('Hello', 'l', 'r') | 'Herro' |
TRIM(' hi ') | 'hi' |
LPAD('42', 5, '0') | '00042' |
RPAD('Hi', 5, '*') | 'Hi***' |
INSTR('Darshan', 'sh') | 4 |
Number Functions:
| Function | Result |
|---|---|
ROUND(15.678, 2) | 15.68 |
ROUND(15.678, 0) | 16 |
TRUNC(15.678, 2) | 15.67 |
CEIL(15.1) | 16 |
FLOOR(15.9) | 15 |
MOD(10, 3) | 1 |
ABS(-42) | 42 |
POWER(2, 3) | 8 |
SQRT(144) | 12 |
ROUND(15.678, 2) = 15.68 (rounds UP the last digit)TRUNC(15.678, 2) = 15.67 (just CHOPS off — no rounding)Think of TRUNC as cutting with scissors — it does not care what comes after.
Date Functions (Oracle):
SELECT SYSDATE FROM dual; -- Current date SELECT MONTHS_BETWEEN('01-MAR-26', '01-JAN-26') FROM dual; -- 2 SELECT ADD_MONTHS(SYSDATE, 3) FROM dual; -- 3 months later SELECT NEXT_DAY(SYSDATE, 'FRIDAY') FROM dual; -- Next Friday SELECT LAST_DAY(SYSDATE) FROM dual; -- Last day of month SELECT EXTRACT(YEAR FROM SYSDATE) FROM dual; -- 2026
Conversion Functions:
TO_CHAR(SYSDATE, 'DD-MON-YYYY') -- '22-MAR-2026' TO_CHAR(salary, '$99,999.00') -- '$50,000.00' TO_DATE('22-03-2026', 'DD-MM-YYYY') -- date value TO_NUMBER('12345') -- 12345
NULL-handling Functions:
NULL is the trickiest part of SQL. Any operation with NULL returns NULL: 5 + NULL = NULL, 'hello' || NULL = NULL. These functions help you handle NULLs gracefully.
NVL(commission, 0) -- If NULL, return 0 NVL2(comm, 'Has', 'None') -- If NOT NULL → 'Has', else 'None' NULLIF(10, 10) -- Returns NULL (both equal) COALESCE(a, b, c, 0) -- First non-NULL value
NVL(a, b) takes exactly 2 args. COALESCE(a, b, c, ...) takes any number — returns first non-NULL. COALESCE is ANSI standard (works everywhere), NVL is Oracle-only.
Aggregate Functions (work on groups of rows)
Aggregate functions crunch multiple rows into a single result. Think of them like Excel's SUM, AVERAGE, COUNT functions at the bottom of a column.
| Function | Description | Example |
|---|---|---|
COUNT(*) | Count all rows | SELECT COUNT(*) FROM employees |
COUNT(col) | Count non-NULL values | SELECT COUNT(email) FROM employees |
SUM(col) | Total | SELECT SUM(salary) FROM employees |
AVG(col) | Average | SELECT AVG(salary) FROM employees |
MIN(col) | Minimum value | SELECT MIN(salary) FROM employees |
MAX(col) | Maximum value | SELECT MAX(salary) FROM employees |
COUNT(*) counts ALL rows including those with NULLs.COUNT(column_name) counts only NON-NULL values in that column.If you have 10 employees but 3 have NULL email,
COUNT(*) = 10 but COUNT(email) = 7.
Table has 10 rows. Column "commission" has NULL in 4 rows. What does SELECT COUNT(commission) FROM employees; return?
GROUP BY & HAVING
Why does GROUP BY exist? Imagine your manager asks: "What is the average salary in each department?" You cannot just run AVG(salary) — that gives you ONE number for the entire company. You need to first group employees by department, and THEN calculate the average within each group.
-- Count employees per department SELECT dept_id, COUNT(*) AS emp_count FROM employees GROUP BY dept_id; -- Average salary per department SELECT dept_id, AVG(salary) AS avg_sal FROM employees GROUP BY dept_id ORDER BY avg_sal DESC;
Wrong:
SELECT dept_id, emp_name, AVG(salary) FROM employees GROUP BY dept_id;Right:
SELECT dept_id, AVG(salary) FROM employees GROUP BY dept_id;
HAVING (filter groups)
Why can't we just use WHERE? Because WHERE runs before grouping happens. If you want to say "show me only departments that have more than 5 employees," you need a filter that runs after the groups are formed. That is what HAVING does.
-- Departments with more than 5 employees SELECT dept_id, COUNT(*) AS emp_count FROM employees GROUP BY dept_id HAVING COUNT(*) > 5; -- Departments where average salary exceeds 50000 SELECT dept_id, AVG(salary) AS avg_sal FROM employees WHERE salary > 10000 GROUP BY dept_id HAVING AVG(salary) > 50000;
HAVING filters groups AFTER grouping. Like removing baskets that have fewer than 5 apples.
You cannot use aggregate functions in WHERE.
WHERE COUNT(*) > 5 is an error. Use HAVING instead.
SQL Execution Order
SQL does NOT execute in the order you write it. Understanding this order is the key to understanding why WHERE cannot use aliases and HAVING can use aggregates.
| Order | Clause | Purpose | Analogy |
|---|---|---|---|
| 1 | FROM | Choose table | Open the filing cabinet |
| 2 | WHERE | Filter rows | Remove files that don't match |
| 3 | GROUP BY | Group rows | Sort remaining files into piles by department |
| 4 | HAVING | Filter groups | Remove piles that are too small |
| 5 | SELECT | Choose columns | Pick which info to read from each pile |
| 6 | ORDER BY | Sort results | Arrange the final output |
Which query correctly finds departments where the average salary exceeds 60000?
In the SQL execution order, which runs first — SELECT or WHERE?
JOINs
Why do JOINs exist? In a well-designed database, data is split across multiple tables to avoid repetition. Employees are in one table, departments in another. But when your manager asks "show me each employee with their department name," that data lives in TWO tables. A JOIN combines them.
| JOIN Type | Returns | When to use |
|---|---|---|
| INNER JOIN | Only matching rows from both tables | You want only employees who HAVE a department |
| LEFT JOIN (LEFT OUTER) | All rows from left table + matching from right | You want ALL employees, even those without a department |
| RIGHT JOIN (RIGHT OUTER) | All rows from right table + matching from left | You want ALL departments, even empty ones |
| FULL OUTER JOIN | All rows from both tables | You want everything — unmatched rows from both sides |
| CROSS JOIN | Cartesian product (every row x every row) | Rarely used. 10 employees x 5 depts = 50 rows |
| SELF JOIN | Table joined with itself | Finding relationships within the same table (e.g., employee → manager) |
INNER JOIN
Only returns rows where there is a match in BOTH tables. If an employee has no department (dept_id is NULL), that employee is excluded from the results.
SELECT e.emp_name, d.dept_name FROM employees e INNER JOIN departments d ON e.dept_id = d.dept_id;
departments table: 10=Finance, 20=IT, 30=HR
INNER JOIN result: Darshan-Finance, Priya-IT
Raj is excluded (no matching dept). HR is excluded (no employees). Only the overlap survives.
LEFT JOIN
Returns ALL rows from the left table (employees), even if there is no match in the right table (departments). Unmatched right-side columns show NULL.
-- All employees, even those without a department SELECT e.emp_name, d.dept_name FROM employees e LEFT JOIN departments d ON e.dept_id = d.dept_id;
Raj appears now! His dept_name is NULL because he has no department. The LEFT table (employees) keeps ALL its rows.
RIGHT JOIN
The mirror of LEFT JOIN. Returns ALL rows from the right table, even if there is no match in the left table.
-- All departments, even those with no employees SELECT e.emp_name, d.dept_name FROM employees e RIGHT JOIN departments d ON e.dept_id = d.dept_id;
HR appears now! No employee is in HR, so emp_name is NULL. The RIGHT table (departments) keeps ALL its rows.
FULL OUTER JOIN
Returns EVERYTHING from both tables. No row is left behind — unmatched rows from both sides appear with NULLs.
SELECT e.emp_name, d.dept_name FROM employees e FULL OUTER JOIN departments d ON e.dept_id = d.dept_id;
Everyone shows up — even Raj (no dept) and HR (no employees).
CROSS JOIN
-- Every employee × every department (rarely useful) SELECT e.emp_name, d.dept_name FROM employees e CROSS JOIN departments d;
SELF JOIN
A table joined with itself. The classic use case: an employees table where each employee has a manager_id that points to another employee in the SAME table.
-- Find each employee's manager SELECT e.emp_name AS employee, m.emp_name AS manager FROM employees e LEFT JOIN employees m ON e.manager_id = m.emp_id;
INNER = only the overlap
LEFT = entire left circle + overlap
RIGHT = entire right circle + overlap
FULL OUTER = both entire circles
"What is a Cartesian product?" → CROSS JOIN (no ON condition)
Employees table has 10 rows, Departments table has 5 rows. How many rows does SELECT * FROM employees CROSS JOIN departments; produce?
You want ALL employees listed, including those not assigned to any department. Their department name should show NULL. Which JOIN do you use?
Subqueries
Why do subqueries exist? Sometimes the answer to your question depends on another question. Example: "Show me employees who earn more than the average salary." You cannot write WHERE salary > AVG(salary) — that is illegal in SQL (aggregate functions are not allowed in WHERE). So you need a subquery to first calculate the average, and then the outer query uses that result.
Single-Row Subquery
The inner query returns exactly ONE value. You can use =, >, <, >=, <= with it.
-- Employees earning more than the average salary SELECT emp_name, salary FROM employees WHERE salary > (SELECT AVG(salary) FROM employees);
SELECT AVG(salary) FROM employees → returns 520002. Outer query becomes:
SELECT emp_name, salary FROM employees WHERE salary > 520003. Result: all employees earning more than 52000
Multi-Row Subquery (use IN, ANY, ALL)
When the inner query returns MULTIPLE values, you cannot use =. You need IN (matches any value in the list), ANY, or ALL.
-- Employees in departments located in Mumbai SELECT emp_name FROM employees WHERE dept_id IN (SELECT dept_id FROM departments WHERE location = 'Mumbai'); -- Salary greater than ALL in dept 20 (greater than the maximum) SELECT emp_name, salary FROM employees WHERE salary > ALL (SELECT salary FROM employees WHERE dept_id = 20); -- Salary greater than ANY in dept 20 (greater than the minimum) SELECT emp_name, salary FROM employees WHERE salary > ANY (SELECT salary FROM employees WHERE dept_id = 20);
Correlated Subquery
In a normal subquery, the inner query runs once and its result is reused. In a correlated subquery, the inner query runs once for EACH row of the outer query — because it references a value from the outer query. This is slower but more powerful.
-- Employees earning more than the average of THEIR OWN department SELECT emp_name, salary, dept_id FROM employees e WHERE salary > (SELECT AVG(salary) FROM employees WHERE dept_id = e.dept_id);
For Priya (dept 20): inner query calculates AVG salary of dept 20 → checks if Priya's salary exceeds it.
The inner query re-runs for every row with a different dept_id reference. That is what "correlated" means — the inner query is linked to the outer query.
EXISTS
EXISTS checks if the subquery returns ANY rows at all. It does not care about the actual values — just whether rows exist. It is often faster than IN for large datasets.
-- Departments that have at least one employee SELECT dept_name FROM departments d WHERE EXISTS (SELECT 1 FROM employees WHERE dept_id = d.dept_id);
> ALL (subquery) = greater than the maximum value returned. You must beat EVERYONE.> ANY (subquery) = greater than the minimum value returned. You just need to beat ONE person.< ALL = less than the minimum. < ANY = less than the maximum.
Dept 20 has salaries: 40000, 50000, 60000. Which employees are returned by WHERE salary > ALL (SELECT salary FROM employees WHERE dept_id = 20)?
What is the key difference between a regular subquery and a correlated subquery?
e.dept_id), so it must re-execute for each row of the outer query. This makes correlated subqueries slower but capable of row-specific comparisons.Set Operations
Set operations let you combine the results of two separate SELECT queries. Think of them like math operations on sets: union (combine), intersection (overlap), and difference (subtract).
| Operation | Description | Analogy |
|---|---|---|
UNION | Combine results, remove duplicates | Merge two guest lists, removing duplicate names |
UNION ALL | Combine results, keep duplicates | Just staple the two lists together |
INTERSECT | Only rows in BOTH queries | People who appear on BOTH guest lists |
MINUS (Oracle) / EXCEPT (MySQL) | Rows in first query but NOT in second | People on list A who are NOT on list B |
-- All cities where we have employees or departments SELECT city FROM employees UNION SELECT city FROM departments; -- Employees who are also managers SELECT emp_id FROM employees INTERSECT SELECT manager_id FROM employees;
What is the difference between UNION and UNION ALL?
Constraints & Keys
Why do constraints exist? Without constraints, anyone could insert garbage data: an employee with a negative salary, two employees with the same ID, an employee assigned to a department that does not exist. Constraints are the database's bouncers — they enforce rules so your data stays clean and consistent.
| Constraint | Description | What it prevents |
|---|---|---|
PRIMARY KEY | Unique + NOT NULL. Only one per table. | Duplicate or empty IDs |
FOREIGN KEY | References PRIMARY KEY of another table | Orphan records (referencing non-existent data) |
UNIQUE | All values must be different (allows one NULL) | Duplicate emails, duplicate phone numbers |
NOT NULL | Column cannot have NULL value | Missing required fields |
CHECK | Values must satisfy a condition | Invalid data (negative salary, age below 0) |
DEFAULT | Default value if none provided | Nothing — just fills in a value automatically |
CREATE TABLE students ( student_id NUMBER PRIMARY KEY, name VARCHAR2(50) NOT NULL, email VARCHAR2(100) UNIQUE, age NUMBER CHECK (age >= 18), status VARCHAR2(10) DEFAULT 'active', dept_id NUMBER REFERENCES departments(dept_id) );
Composite Primary Key
Sometimes a single column cannot uniquely identify a row. In an "enrollments" table, neither student_id alone nor course_id alone is unique (a student can enroll in many courses, a course can have many students). But the COMBINATION of both is unique — a student can enroll in a specific course only once.
CREATE TABLE enrollments ( student_id NUMBER, course_id NUMBER, grade CHAR(2), PRIMARY KEY (student_id, course_id) );
UNIQUE = values must be different, but allows ONE NULL. Multiple UNIQUE constraints per table allowed.
Example: A student has one student_id (PRIMARY KEY) but also a unique email and unique phone number (both UNIQUE constraints on the same table).
ON DELETE actions (Foreign Key)
What happens when you delete a department that still has employees assigned to it? The ON DELETE action decides.
| Action | What happens when parent row is deleted | Example |
|---|---|---|
ON DELETE CASCADE | Delete child rows too | Delete department 10 → all employees in dept 10 are also deleted |
ON DELETE SET NULL | Set foreign key to NULL in child rows | Delete department 10 → employees in dept 10 get dept_id = NULL |
ON DELETE RESTRICT | Prevent deletion if child rows exist (default) | Cannot delete department 10 while it has employees — error is thrown |
A table has a PRIMARY KEY and three UNIQUE constraints. How many total unique identifier columns does it have?
If a FOREIGN KEY is defined with ON DELETE CASCADE, what happens when the parent row is deleted?
Normalization
Why does normalization matter? Imagine a table where every employee row stores the department name, department location, and department manager. Now the HR department moves to a new office. You have to update the location in 200 employee rows. Miss even one? Your data is now inconsistent — some rows say "Mumbai," others say "Pune" for the same department.
Normalization is the process of organizing your tables so that each piece of information is stored in exactly ONE place. Update it once, and it is correct everywhere. It eliminates redundancy and prevents the kind of update anomalies described above.
| Normal Form | Rule | In Simple Terms | Example Violation |
|---|---|---|---|
| 1NF | Atomic values, no repeating groups | Each cell has ONE value. No comma-separated lists. | courses = "Math, Physics, Chemistry" in one cell |
| 2NF | 1NF + no partial dependency | Every non-key column depends on the WHOLE primary key (not just part of it). | In (student_id, course_id) → student_name depends only on student_id, not on the full key |
| 3NF | 2NF + no transitive dependency | Non-key columns depend ONLY on the primary key, not on other non-key columns. | emp table has dept_id AND dept_name — dept_name depends on dept_id, not on emp_id |
| BCNF | 3NF + every determinant is a candidate key | Stricter version of 3NF. | Rare edge cases where 3NF still has redundancy |
1NF: Split into separate rows — one row per student-course combination. Now each cell has ONE atomic value.
2NF: If the primary key is (student_id, course_id), then student_name depends only on student_id (partial dependency). Fix: move student info to a separate Students table.
3NF: If the Students table has dept_id AND dept_name, the dept_name depends on dept_id (not on student_id). This is a transitive dependency. Fix: move dept_name to a separate Departments table.
"Removing transitive dependency achieves?" → 3NF
"Removing partial dependency achieves?" → 2NF
A table has columns: (student_id PK, student_name, dept_id, dept_name). Which normal form is violated and why?
A cell in a table contains "Java, Python, SQL" as a single value. Which normal form rule is violated?
Views
Why do views exist? Imagine you have a complex query with 5 JOINs that you run 10 times a day. Instead of writing that monster query every time, you save it as a view — a virtual table that runs the query behind the scenes whenever you access it. Views also let you control what data different users can see.
-- Create a view CREATE VIEW high_earners AS SELECT emp_name, salary, dept_id FROM employees WHERE salary > 50000; -- Use the view like a table SELECT * FROM high_earners; -- Replace a view CREATE OR REPLACE VIEW high_earners AS SELECT emp_name, salary FROM employees WHERE salary > 60000; -- Drop a view DROP VIEW high_earners;
- Security: Give interns access to a view that only shows name and department — not salary. They never see the base table.
- Simplicity: A 50-line JOIN query becomes
SELECT * FROM quarterly_report. - Consistency: Everyone uses the same view definition — no one writes a slightly different version of the query.
Does a view store data physically in the database?
Indexes
Why do indexes exist? Without an index, searching for an employee named "Darshan" in a table of 1 million rows means the database scans every single row — like reading a 500-page book from page 1 to find one topic. An index is like the book's index at the back — you look up "Darshan," find it points to row 47892, and jump straight there.
A book with an index: you check the back, see "polymorphism — page 234," and jump directly there.
A database index works the same way. It creates a sorted lookup structure (usually a B-tree) that lets the database find your data in milliseconds instead of scanning millions of rows.
-- Create an index CREATE INDEX idx_emp_name ON employees(emp_name); -- Create a unique index CREATE UNIQUE INDEX idx_emp_email ON employees(email); -- Composite index (useful when you often search by both columns together) CREATE INDEX idx_dept_sal ON employees(dept_id, salary); -- Drop an index DROP INDEX idx_emp_name;
Rule of thumb: Index columns that appear frequently in WHERE, JOIN, and ORDER BY. Do NOT index every column — the maintenance overhead is not worth it for rarely-searched columns.
Which statement about indexes is FALSE?
You have a table with 10 million rows. Queries filtering by WHERE city = 'Mumbai' are slow. What is the best solution?
Transactions & TCL
Why do transactions exist? Imagine transferring Rs. 1000 from your account to a friend's account. This involves TWO operations: (1) subtract 1000 from your balance, (2) add 1000 to friend's balance. What if the system crashes after step 1 but before step 2? Your money vanishes into thin air. Transactions prevent this by treating both operations as ONE atomic unit — either BOTH succeed, or BOTH are undone.
ACID Properties
Every transaction follows four rules, remembered as ACID. This is a guaranteed exam question.
| Property | Meaning | Real-World Example |
|---|---|---|
| Atomicity | All or nothing — entire transaction succeeds or fails | Bank transfer: both debit AND credit happen, or neither does |
| Consistency | Database moves from one valid state to another | Total money in the system remains the same after transfer |
| Isolation | Concurrent transactions don't interfere | Two people transferring money simultaneously don't see each other's half-done work |
| Durability | Once committed, changes survive crashes | After bank confirms transfer, even if the server crashes, your money is safe |
-- Bank transfer: a classic transaction example UPDATE accounts SET balance = balance - 1000 WHERE acc_id = 1; UPDATE accounts SET balance = balance + 1000 WHERE acc_id = 2; COMMIT; -- Save changes permanently — cannot be undone after this -- Something went wrong? Undo everything since last COMMIT ROLLBACK; -- Savepoints — partial rollback SAVEPOINT sp1; UPDATE employees SET salary = 70000 WHERE emp_id = 101; SAVEPOINT sp2; DELETE FROM employees WHERE emp_id = 103; ROLLBACK TO sp2; -- Undo DELETE but keep the UPDATE
Which ACID property ensures that a bank transfer either fully completes or fully rolls back?
You run an UPDATE, then a CREATE TABLE, then a ROLLBACK. What happens?
DCL — GRANT & REVOKE
Why does DCL exist? In a company, not everyone should have the same access. An intern should not be able to DELETE employee records. The CEO's assistant might need to read salary data but never modify it. DCL controls who can do what.
-- Grant permissions GRANT SELECT, INSERT ON employees TO user1; GRANT ALL ON employees TO admin_user; -- Revoke permissions REVOKE INSERT ON employees FROM user1; -- Grant with pass-along ability GRANT SELECT ON employees TO user1 WITH GRANT OPTION;
What does GRANT SELECT ON employees TO user1 WITH GRANT OPTION do?
Miscellaneous Concepts
CASE Expression
CASE is SQL's version of if-else. It lets you create calculated columns based on conditions — like adding a "level" label based on salary ranges.
SELECT emp_name, salary, CASE WHEN salary > 80000 THEN 'Senior' WHEN salary > 50000 THEN 'Mid' ELSE 'Junior' END AS level FROM employees;
CREATE TABLE AS SELECT (CTAS)
-- Create a backup table from existing data CREATE TABLE emp_backup AS SELECT * FROM employees WHERE dept_id = 10;
RENAME & COMMENT
-- Rename a table RENAME employees TO staff; -- Add a comment on table COMMENT ON TABLE employees IS 'Main employee table';
Pseudo-columns (Oracle)
| Pseudo-column | Description |
|---|---|
ROWNUM | Row number in result set |
ROWID | Physical address of the row |
SYSDATE | Current date and time |
USER | Current logged-in user |
DECODE (Oracle-specific IF-THEN)
SELECT emp_name, DECODE(dept_id, 10, 'Finance', 20, 'IT', 30, 'HR', 'Other') AS dept_name FROM employees;
DECODE is Oracle-only, works for simple equality checks only.CASE is ANSI standard, works everywhere, supports complex conditions (ranges, LIKE, IS NULL). Prefer CASE.
Sequences (Oracle — auto-increment)
-- Create a sequence CREATE SEQUENCE emp_seq START WITH 1 INCREMENT BY 1; -- Use in INSERT INSERT INTO employees (emp_id, emp_name) VALUES (emp_seq.NEXTVAL, 'Darshan'); -- Get current value SELECT emp_seq.CURRVAL FROM dual;
Synonyms (Oracle)
-- Create a synonym (alias for a table) CREATE SYNONYM emp FOR hr.employees; -- Now use the short name SELECT * FROM emp;
Practice Questions
Which SQL command is used to remove all rows from a table without logging individual row deletions?
What is the output of: SELECT COUNT(*) FROM employees WHERE commission IS NULL; if 3 out of 10 employees have NULL commission?
Which JOIN returns all rows from both tables, including unmatched rows?
Which clause is used to filter groups after GROUP BY?
A table with no repeating groups and only atomic values in each cell satisfies which normal form?
What does GRANT SELECT ON employees TO user1 WITH GRANT OPTION do?
Which function returns the first non-NULL value from a list?
What does SELECT emp_name FROM employees WHERE salary > ALL (SELECT salary FROM employees WHERE dept_id = 20); return?
The ACID property that ensures a transaction is "all or nothing" is:
What is the difference between DELETE FROM employees; and TRUNCATE TABLE employees;?