STATISTICS IN SQL

 

Note: Can be done in the database, but best done in a spreadsheet or using statistics software!

 

          

EXAMPLES

E.g.: Price frequency (upper result) and mode (lower result)


price frequency and mode


SQL for the two queries:


SELECT Products.UnitPrice, Count(*) AS Frequency
FROM Products
GROUP BY Products.UnitPrice
ORDER BY Count(*) DESC , UnitPrice;


SELECT UnitPrice, Count(*) AS Frequency
FROM Products
GROUP BY UnitPrice
HAVING Count(*) >= ALL
   (SELECT Count(*)
   FROM Products
   GROUP BY UnitPrice);


Note:  We need the subquery because aggregate functions cannot be nested.


           E.g.:  Assign sequential numbers to customers based on their customer ID.


sequential customer numbers

Note:  CustomerNum is not stored in the database - it is generated whenever needed by running the query.
SQL for the query:

SELECT count(p1.CustomerID) AS CustomerNum, p1.CompanyName,  p1.CustomerID
FROM Customers  p1
   INNER JOIN Customers p2
   ON p1.CustomerID >= p2.CustomerID
GROUP BY p1.CompanyName, p1.CustomerID
ORDER BY 1;

 

E.g.:  Calculate the running total $ amount by date for orders placed by a particular customer (ID RATTC). To simplify the SQL, first calculate total for this customer without generating a running total and then the running total based on the first query (results below, on the left):

running totals

The result on the right includes a running count based on date as well as the running total.

SQL for the three queries:

Query at top left (RATTC_ForRunSum)

SELECT a.OrderID, a.OrderDate, Sum(a.ExtendedPrice) AS OrderTotal
FROM Invoices AS a
WHERE a.CustomerID = "RATTC"
GROUP BY a.OrderID, a.OrderDate;

Query at lower left with running total only (RATTC_RunningTotal):

SELECT a.OrderID, a.OrderDate, Sum(b.OrderTotal) AS RunningTotal
FROM RATTC_ForRunSum AS a, RATTC_ForRunSum AS b
WHERE b.OrderDate <= a.OrderDate
GROUP BY a.OrderID, a.OrderDate;

Query at lower right with running total and count (RATTC_RunningTotalAndCount):

SELECT Count(b.OrderDate) AS DateCount, a.OrderID, a.OrderDate, Sum(b.OrderTotal) AS RunningTotal
FROM RATTC_ForRunSum AS a, RATTC_ForRunSum AS b
WHERE b.OrderDate <= a.OrderDate
GROUP BY a.OrderID, a.OrderDate;


E.g.:  Find the median unit price for all products.  Build the query in three steps (results below the third SQL statement):


SELECT Count(a.ProductID) AS PriceNum, a.ProductName, a.UnitPrice
FROM Products AS a
   INNER JOIN Products AS b
   ON a.UnitPrice>=b.UnitPrice
GROUP BY a.ProductID, a.ProductName, a.UnitPrice
ORDER BY a.UnitPrice;


SELECT Count(a.ProductID) AS PriceNum, a.ProductName, a.UnitPrice
FROM Products AS a
   INNER JOIN Products AS b
   ON a.UnitPrice >= b.UnitPrice
GROUP BY a.ProductID, a.ProductName, a.UnitPrice
HAVING Count(a.ProductID)    
   Between (SELECT (Count(ProductID)/2)-0.5                  
                    FROM Products)   
   And        (SELECT (Count(a.ProductID)/2)+0.5            
                    FROM Products)
ORDER BY a.UnitPrice;



SELECT Avg(UnitPrice) AS MedianPrice
FROM
   (SELECT Count(a.ProductID) AS PriceNum, a.ProductName, a.UnitPrice
   FROM Products AS a
      INNER JOIN Products AS b
      ON a.UnitPrice>=b.UnitPrice
   GROUP BY a.ProductID, a.ProductName, a.UnitPrice
   HAVING Count(a.ProductID)        
      Between
         (SELECT (Count(ProductID)/2)-0.5                                       
          FROM Products)       
      And       
         (SELECT (Count(a.ProductID)/2)+0.5                                 
         FROM Products));

median product unit price