Subqueries (subqueries) in SQL Server
You can use a SELECT statement or a query to return records that will be used as criteria for another SELECT statement or query. The outer query is called the parent query and the inner query is called the subquery. The purpose of a subquery is to return results to the outer query. In other words, the inner query statement must return the column or columns used in the outer query’s criteria.
The simplest form of a subquery is a query that returns only one column. The parent query can use the results of this subquery with the “=”. Syntax:
SELECT <ColumnName> FROM <table> WHERE <ColumnName> = (SELECT <ColumnName> FROM <Table> WHERE <ColumnName> = <Condition>)
In a subquery, the innermost SELECT statement is executed first and its results are passed as the criteria for the outer SELECT statement.
Consider a situation where it is necessary to determine the due and delivery dates of the most recent orders.
SELECT DueDate,ShipDate FROM Sales.SalesOrderHeader WHERE Sales.SalesOrderHeader.OrderDate = (SELECT MAX(OrderDate) FROM Sales.SalesOrderHeader)
Here, a subquery was used to achieve the desired output. The inner query or subquery retrieves the most recent order date. This result is then passed to an external query, showing the due date and delivery date for all orders, orders that were fulfilled on that particular day.
Based on the results returned by the inner query, a subquery can be classified as either a scalar subquery ( scalar ) or a multi-valued query ( multi-valued ).
The following descriptions are available:
- The scalar subquery returns a single value. Here, the outer query must be written to process a single result.
- The multi-valued subquery returns the same result as a single-column table. Here, the outer query must be written to handle multiple possible results.
Working with multi-valued queries
If the “=” operator is used with a subquery, the subquery must return a single scalar value. If more than one value is returned, there will be an error and the query will not be processed. In such cases, the ANY, ALL, IN, and EXISTS keywords can be used with the WHERE clause of the SELECT statement when the query returns one column but one or more rows.
These keywords, also known as predicates, are used with multivalued queries. For example, consider that all first and last names of employees with the job title of ‘Research and Development Manager’ must be displayed. Here, the inner query may return more than one row, as there can be multiple employees with that job title. To ensure that the outer query can use the results of the inner query, the IN keyword will have to be used.
SELECT FirstName,LastName FROM Person.Person WHERE Person.Person.BusinessEntityID IN (SELECT BusinessEntityID FROM HumanResources.Employee WHERE JobTitle='Research and Development Manager');
Which of the SOME or ANY keywords evaluates to true if the result is an inner query containing at one row equal to the comparison. Compares a scalar value with a column of values. SOME and ANY are equivalent, both return the same result. They are rarely used.
Some guidelines when using sub queries:
- Data types ntext, text, image cannot be used in SELECT in subqueries
- The subquery SELECT list introduced with the comparison operator can have only one expression or column name.
- Sub queries when working with comparison operators with ANY or ALL keywords cannot use GROUP BY and HAVING clauses
- You cannot use the DISTINCT keyword with sub queries that contain a GROUP BY . clause
- You can specify ORDER BY only when TOP is also specified when using sub query
Besides scalar and multi valued subquery, you can also choose between independent subqueries and correlated subqueries. They are defined as follows:
- The standalone subquery is written as a standalone query, without any dependency on the external query. An independent subquery is processed once when the outer query runs and its results are passed to the outer query.
- The related subqueries refer to one or more columns from the outer query and therefore depend on the outer query. The related subqueries cannot be run separately from the outer query.
The EXISTS keyword is used with the subquery to check for the existence of the row returned by the subquery. It doesn’t actually return data and returns TRUE or FALSE
The syntax of the subquery containing the EXISTS keyword:
SELECT <ColumnName> FROM <table> WHERE [NOT] EXISTS (subquery_statement)
use AdventureWorks2019 go SELECT FirstName,LastName FROM Person.Person AS p WHERE EXISTS (SELECT * FROM HumanResources.Employee AS e WHERE JobTitle='Research and Development Manager' AND p.BusinessEntityID=e.BusinessEntityID )
Here, the inner subquery retrieves all records that match the job title of ‘Research and Development Manager’ and the BusinessEntityID has a BusinessEntityID that matches that record in the Person table. If there are no records that match both of these conditions, the inner subquery will not return any rows. However, the below code will return two rows because the given conditions are satisfied.
Similarly, you can use the NOT EXISTS keyword, which is the negative clause of EXISTS .
Nested Subqueries (Nested Subqueries)
A subquery defined within another subquery is called nested subqueries . Considering the situation where you want to retrieve and display the names of people from Canada in adventureworks2019, there is no direct way to retrieve this information because the Sales.SalesTerritory table is not related to the Person.Person table. Therefore, a nested subquery can be the solution.
SELECT LastName, FirstName FROM Person.Person WHERE BusinessEntityID IN (SELECT BusinessEntityID FROM Sales.SalesPerson WHERE TerritoryID IN (SELECT TerritoryID FROM Sales.SalesTerritory WHERE Name='Canada'))
Correlated Queries (Related Queries)
In Multiple Queries containing subqueries, the subquery is evaluated only once to provide the values required by the parent query. This is because in most queries, the subquery doesn’t refer to the parent query, so the value in the subquery stays the same.
However, if the subquery references the parent query, the subquery must be reevaluated for every iteration in the parent query. This is because the search criteria in the subquery depends on the value of a particular record in the parent query.
When a subquery receives parameters from the parent query, it is called a related subquery. Consider a situation where you want to retrieve all the BusinessEntityIDs of people whose contact information was last modified after 2012. To do this, you can use the related subquery as follows:
SELECT a.BusinessEntityID FROM Person.BusinessEntityContact a WHERE a.ContactTypeID IN (SELECT c.ContactTypeID FROM Person.ContactType c WHERE YEAR (a.ModifiedDate) >= 12)
In the code snippet, the inner query retrieves the contact type ids for all the people whose contact information was last modified after 2012. These results are then passed to the outer query, which results This matches the ContactTypeID in the Person.BusinessEntityContact table.