# Grouping and Aggregation
> `GROUP BY` groups rows that have the same values into summary rows.
## HAVING
When grouping data, you may need to filter out undesired data from your result set based on groups of data rather than based on the raw data.
`group by` clause runs after `where` clause has been evaluated, you cannot add filter conditions to your `where` clause for this purpose.
You cannot refer to aggregate function `count(*)` in the `where` clause, because the groups have not yet been generated at the time the where clause is evaluated. Instead, you must put your group filter conditions in the having clause.
```sql
-- Error
SELECT open_emp_id, COUNT(*) how_many
FROM account
WHERE COUNT(*) > 4
GROUP BY open_emp_id;
-- Solution
SELECT open_emp_id, COUNT(*) how_many
FROM account
GROUP BY open_emp_id
HAVING COUNT(*) > 4;
```
## Aggregate Functions
| Aggregate Function | Description |
| ------------------ | ------------------------------------------ |
| `Max()` | Returns the maximum value within a set |
| `Min()` | Returns the minimum value within a set |
| `Avg()` | Returns the average value across a set |
| `Sum()` | Returns the sum of the values across a set |
| `Count()` | Returns the number of values in a set |
```sql
SELECT MAX(avail_balance) max_balance,
MIN(avail_balance) min_balance,
AVG(avail_balance) avg_balance,
SUM(avail_balance) tot_balance,
COUNT(*) num_accounts
FROM account
WHERE product_cd = 'CHK';
```
## Counting Distinct Values
```sql
SELECT COUNT(DISTINCT id)
FROM account;
```
## Using Expressions
```sql
SELECT MAX(pending_balance - avail_balance) max_uncleared
FROM account;
```
## How is NULL Handled?
Even with the addition of the null value to the table, the `sum()`, `max()`, and `avg()` functions all return the same values, indicating that they ignore any null values encountered.
## Grouping via Expressions
Group by the Year of **start_date**:
```sql
SELECT EXTRACT(YEAR FROM start_date) year, COUNT(*) how_many
FROM employee
GROUP BY EXTRACT(YEAR FROM start_date);
```
## Generating Rollups
Let’s say, however, that along with the total balances for each product/branch combination, you also want total balances for each distinct product.
```sql
SELECT product_cd,
open_branch_id,
SUM(avail_balance) tot_balance
FROM account
GROUP BY product_cd, open_branch_id
WITH ROLLUP;
```
<img src="GroupingAggregation.assets/image-20200823181800797.png" alt="image-20200823181800797" width="500" />
Seven additional rows in the result set, one for each of the six distinct products and one for the grand total (all products combined).
### Oracle Database
```sql
GROUP BY ROLLUP(product_cd, open_branch_id)
```
Advantage of this syntax: allows you to perform rollups on a subset of columns in the `group by` clause.
To perform rollups on only b and c:
```sql
GROUP BY a, ROLLUP(b, c)
```
### with cube
If, along with totals by product, you also want to calculate totals per branch, then you can use the `with cube` option, which generates summary rows for all possible combinations of the grouping columns.
```sql
SELECT product_cd, open_branch_id, SUM(avail_balance) tot_balance
FROM account
GROUP BY product_cd, open_branch_id
WITH CUBE;
```
<img src="GroupingAggregation.assets/image-20200823182517074.png" alt="image-20200823182517074" width="500" />
Row 2-5 gives the total balance per branch regardless of **product_cd**.
#### Oracle Database
```sql
GROUP BY CUBE(product_cd, open_branch_id)
```