Whenever we use the COUNT function, we invariably use * as an argument to it. It works fine as long as its not joined with another tables. For example, in SQL Sever Pubs database, there are two tables – jobs and employee. If we need to find the number of employees for each job, the following query will not work
SELECT jobs.job_DESC
,count(*)
FROM JOBS
LEFT JOIN EMPLOYEE
ON JOBS.JOB_ID = EMPLOYEE.JOB_ID
GROUP BY JOBS.JOB_DESC
ORDER BY 2
The records returned by above query
Business Operations Manager 1
Chief Executive Officer 1
Chief Financial Officier 1
New Hire – Job not specified 1
Designer 3
Editor 3
Sales Representative 3
Acquisitions Manager 4
Managing Editor 4
Marketing Manager 4
Operations Manager 4
Productions Manager 4
Public Relations Manager 4
Publisher 7
But in database, the job "New Hire – Job not specified" has no records in employee. So, it should zero.
After doing some research, I found how silly I am. The correct query will be
SELECT jobs.job_DESC
,count(EMPLOYEE.JOB_ID)
FROM JOBS
LEFT JOIN EMPLOYEE
ON JOBS.JOB_ID = EMPLOYEE.JOB_ID
GROUP BY JOBS.JOB_DESC
ORDER BY 2
The results of this query is
New Hire – Job not specified 0
Business Operations Manager 1
Chief Executive Officer 1
Chief Financial Officier 1
Designer 3
Editor 3
Sales Representative 3
Acquisitions Manager 4
Managing Editor 4
Marketing Manager 4
Operations Manager 4
Productions Manager 4
Public Relations Manager 4
Publisher 7
Then, I went to books online and checked the argument list for COUT… it clearly states how the COUNT works..
COUNT(*) returns the number of items in a group, including NULL values and duplicates.
COUNT(ALL expression) evaluates expression for each row in a group and returns the number of nonnull values.
COUNT(DISTINCT expression) evaluates expression for each row in a group and returns the number of unique, nonnull values.
A simple thing, but today it took me sometime to figure this out. Good that I learnt this atleast now. I then thought how else can I achieve the above results by giving * as the argument to COUNT (but then, these are only possibilities and not efficient ways). It can be achieved by using Correlated sub queries and Union operation.
Correlated subquery
SELECT job_Desc
,(select count(*) from employee where employee.job_id = jobs.job_id) as count
FROM Jobs
ORDER BY 2
UNION operation
SELECT jobs.job_DESC
,count(EMPLOYEE.JOB_ID)
FROM JOBS
JOIN EMPLOYEE
ON JOBS.JOB_ID = EMPLOYEE.JOB_ID
GROUP BY JOBS.JOB_DESC
UNION
SELECT job_desc, 0
FROM jobs
WHERE job_id NOT in (select job_id from EMPLOYEE)
ORDER BY 2
Three different (may be more) ways to achieve a result. Certainly one is better than other two. No wonder why we programmers write bad code…