|
Intro To Structured Query Language (SQL)
The intent of this tutorial is to provide a basic introduction to writing queries so that beginners can begin to extract data from an existing relational database. While there are other SQL commands that can be executed to create tables and indexes or to insert, update and delete data, this tutorial will concentrate only on SQL queries to assist the reader in retrieving pre-existing data through definition and relevant example.
The SQL used in this tutorial is "ANSI" (American National Standards Institute), or standard SQL. No specific database management systems features will be referenced.
SQL is a query language that allows access to data residing in relational database management systems (RDBMS), such as Sybase, Oracle, Informix, DB2, Microsoft SQL Server, Access and many others. To retrieve information users execute 'queries' to pull the requested information from the database using criteria that is defined by the user.
A query, in its simplest form is constructed using the following basic query statements SELECT, FROM, WHERE and ORDER BY. The SELECT clause defines what columns or fields you want to see in your results, the FROM clause defines from what table the columns reside in, the WHERE clause defines any special criteria that must be met in order to be displayed, and finally the ORDER BY clause in which you define the sequence you want to display the results. While the only two query clauses that are required are SELECT and FROM, they are almost always accompanied by the WHERE and ORDER BY clauses to restrict the amount of data retrieved and to present it in an orderly fashion.
Database Tables
Keys
SQL Select Clause
SQL Where Clause
Order By
Compound Conditions
Using IN & BETWEEN
Using LIKE
Joining Tables
Using Aliases & Column Concatenation
Aggregate Functions
Group By
Distinct
For More Information
Database Tables
Tables are probably a good place to start. Data that resides in a relational database is stored in what are referred to as tables. If you think of a table as a series of rows and columns similar to a spreadsheet then then you should have no problem following this tutorial. Each column contains a piece of data such as first name and birth date, while each row contains all of the data elements for a single record, or individual. For discussion purposes we'll use a few sample tables shown below, one to store family related information and one to store information specific to each individual family member. Another table describes attendee types and would be referred to as a code table. A couple of other tables list classes and classes attended by individuals that began on a certain date. In the example below you will notice that the information in the individual table contains a column that also exists in the family table, FAM_ID. This unique identifier links the two tables together, that is where the term relational comes into play. The other tables contain similarly shared columns, but more about that later, here is what the sample tables look like.
There are 4 family records in the Family table.
FAMILY |
FAM_ID | NAME_LAST | SALUTATION | STREET_ADDRESS | CITY | STATE | MAIL_CODE |
001 | Samples | John & Mary | 1234 Main Street | Anytown | OH | 40000 |
002 | Jones | Bob | 376 North West Street | Somwhere | OK | 60000 |
003 | Smith | John & Nancy | 123 Treelined Lane | Overthere | CA | 90000 |
004 | Brown | Tom | 123 Elm Street | Anywhere | FL | 20000 |
There are 7 individual or family member records in the Individual table related to 4 family records.
INDIVIDUAL |
IND_ID | FAM_ID | NAME_LAST | NAME_FIRST | BIRTH_DT | MARITAL_STATUS | SEX | SALARY | ATTEND_TP |
001 | 001 | Samples | John | 1959-06-25 | Married | M | 35000 | MEMBR |
002 | 001 | Samples | Mary | 1959-12-31 | Married | F | 60000 | MEMBR |
003 | 002 | Jones | Bob | 1960-02-12 | Single | M | 40000 | MEMBR |
004 | 003 | Smith | John | 1963-08-23 | Married | M | 105000 | VISIT |
005 | 003 | Smith | Nancy | 1962-11-14 | Married | F | 0 | VISIT |
006 | 004 | Brown | Tom | 1961-11-11 | Divorced | M | 80000 | MEMBR |
007 | 003 | Smith | Susie | 1997-04-28 | Single | F | 0 | UNK |
There are 3 records in the Attendee Type table.
ATTENDEE_TYPE |
ATTEND_TP | ATTEND_DESC |
MEMBR | Member |
VISIT | Visitor |
UNK | Unknown |
There are 4 records in the CLASS table.
CLASS |
CLASS_ID | CLASS_DESC |
E101 | English 101 |
E201 | English 201 |
S101 | Spanish 101 |
S201 | Spanish 201 |
There are 9 records in the CLASS_HISTORY table.
CLASS_HISTORY |
CLASS_ID | CLASS_DT | IND_ID |
E101 | 1999-03-01 | 001 |
E101 | 1999-03-01 | 002 |
E101 | 1999-03-01 | 004 |
E101 | 1999-03-01 | 005 |
E101 | 2000-03-01 | 003 |
E101 | 2000-03-01 | 006 |
E201 | 2000-09-01 | 002 |
E201 | 2000-09-01 | 004 |
E201 | 2000-09-01 | 005 |
Back To Top
Keys
A primary key is a column or set of columns that uniquely identifies the rest of the data in any given row. For example, in the FAMILY table, the FAM_ID column uniquely identifies that row. The IND_ID column does the same in the INDIVIDUAL table. These unique column keys are used throughout the database to track the individuals and families, rather than the individual or family names. So even if two John Smith's existed in the INDIVIDUAL table they could be uniquely referenced using the IND_ID value.
A foreign key is a column in a table where that column is a primary key of another table, which means that any data in a foreign key column must have corresponding data in the other table where that column is the primary key. This relationship preserves the referential integrity of the database.
Note that each record in the INDIVIDUAL table references a column (FAM_ID) from the FAMILY table. The INDIVIDUAL FAM_ID column in this case is a foreign key because it is a primary key in the FAMILY table. A record could not reside in the INDIVIDUAL table without a related record in the FAMILY table.
Back To Top
SQL Select Clause
Remember that the minimum requirements for writing a SQL query is the SELECT & FROM clauses. The SELECT clause is where you request the pieces of information, the columns (separated by commas), that you want to see, and the FROM clause is where you define the table or tables from which the columns reside in.
For example:
SELECT column_name, column_name, column_name ...
FROM table_name
So if we wanted to retrieve a list of all families and their addresses, the query would look like this;
SELECT NAME_LAST, SALUTATION, STREET_ADDRESS, CITY, STATE, MAIL_CODE
FROM FAMILY;
The results of the query would be:
NAME_LAST | SALUTATION | STREET_ADDRESS | CITY | STATE | MAIL_CODE |
Samples | John & Mary | 1234 Main Street | Anytown | OH | 40000 |
Jones | Bob | 376 North West Street | Somwhere | OK | 60000 |
Smith | John & Nancy | 123 Treelined Lane | Overthere | CA | 90000 |
Brown | Tom | 123 Elm Street | Anywhere | FL | 20000 |
Note that we did not request to see the FAM_ID column, specifically we asked to see the columns called NAME_LAST, SALUTATION, STREET_ADDRESS, CITY, STATE and MAIL_CODE. Each column and table name must be specified exactly as it is defined and will not contain any spaces.
If you wanted to see every column without having to type each column name you could use an asterisk (*);
SELECT *
FROM table_name
Or in our example:
SELECT *
FROM FAMILY;
Note that in our example the column FAM_ID would then appear if you used the asterisk.
Back To Top
SQL Where Clause
The WHERE clause is used to conditionally retrieve only the information that you want to display. Here is a list of the operators that can be used. It is important to note that when querying character or date/datetime type of data that the parameters used in the WHERE clause be surrounded with an apostrophe ( ' ), when querying against numeric type data it is not required.
Relational Operators |
= | Equal |
<> or != (check your manual) | Not Equal |
< | Less Than |
> | Greater Than |
<= | Less Than or Equal To |
>= | Greater Than or Equal To |
To see how the WHERE clause works lets take a look at some examples.
If we wanted to see a list of individuals that were married you could run the following query;
SELECT NAME_LAST, NAME_FIRST
FROM INDIVIDUAL
WHERE MARITAL_STATUS = 'Married';
NAME_LAST | NAME_FIRST |
Samples | John |
Samples | Mary |
Smith | John |
Smith | Nancy |
Back To Top
Order By
The ORDER BY clause is used to sort the result set in the desired sequence. The syntax for order by is;
ORDER BY COLUMNNAME [ASC/DESC]
The following example sorts the result in descending first name within ascending last name order. The default for the ORDER BY clause is ascending order so if you want your results to be in ascending order it is not necessary to specify the ASC parameter.
SELECT NAME_LAST, NAME_FIRST
FROM INDIVIDUAL
WHERE MARITAL_STATUS = 'Married';
ORDER BY NAME_LAST ASC, NAME_FIRST DESC
NAME_LAST | NAME_FIRST |
Samples | Mary |
Samples | John |
Smith | Nancy |
Smith | John |
An alternate syntax for ORDER BY is to use the column numbers as they appear in the SELECT statement. In the example we could have used;
ORDER BY 1, 2 DESC
In certain situations (such as with union queries which is not discussed in this tutorial) you are required to use the column number because the column names can be different.
Back To Top
Compound Conditions
The AND operator joins two or more conditions, and displays a row only if ALL of the conditions are met.
For example, building upon our query above, we may want to further restrict the query results using the date of birth column. If we only want to retrieve Individuals that are married and have a birth date on or before December 31, 1959 we could write;
SELECT NAME_LAST, NAME_FIRST
FROM INDIVIDUAL
WHERE MARITAL_STATUS = 'Married' AND BIRTH_DT <= '1959-12-31';
NAME_LAST | NAME_FIRST |
Samples | John |
Samples | Mary |
Note: we could also have written the birth_dt portion of the where clause using the < operator;
...AND BIRTH_DT < '1960-01-01'
also noteworthy is the fact that many databases combine date and time column types into a single datetime type of field which could prove extremely important when writing queries. When adding or updating a datetime field unless the time portion is explicity defined it will usually default to 00:00:00 (hours, minutes, seconds), which is the equivalent to midnight 12 a.m.
For instance, if the time portion of the BIRTH_DT for Mary Samples was 7:39 p.m. (19:39:00) the above query would not retrieve the record for Mary Samples. In this case we might want to query using the following example.
...AND BIRTH_DT <= '1959-12-31 23:59:59'
so that it would include the entire day for 12/31/1959 through 11:59:59 p.m.
The OR operator joins two or more conditions, and displays a row if ANY of the conditions are met.
For example, if we wanted to see a list of who had a marital_status of single or divorced with a birth date earlier than January 1, 1990 we could write a query like this;
SELECT NAME_LAST, NAME_FIRST
FROM INDIVIDUAL
WHERE ( MARITAL_STATUS = 'Single' OR MARITAL_STATUS = 'Divorced' ) AND BIRTH_DT <= '1990-01-01';
NAME_LAST | NAME_FIRST |
Jones | Bob |
Brown | Tom |
The above query is for demonstration purposes only. Using an IN statement might provide a more efficient method to retrieve the information when querying against a single column using multiple arguments.
You can combine AND and OR statements in your WHERE clause. If you do you will want to parenthesize your different statements to make sure you will be returning the results you think you are requesting, plus will add clarity to your query.
For example:
...WHERE column_x = 0 AND (column_y = 1 OR column_z = 2)
will not return the same results as
...WHERE (column_x = 0 AND column_y = 1) OR column_z = 2
Back To Top
Using IN & BETWEEN
In some cases a more efficient method to using compound statements is to use the IN or BETWEEN statements.
In our previous example we wrote a query to retrieve those individuals with a marital status of either 'Single' or 'Divorced' that used a compound where clause (using OR).
A more clear and efficient method to write the query is to use the IN statement, which does the equivalent of multiple OR statements for you. The IN statement is followed by a comma separated list of values to search for within a set of parenthesis. The following query will return the same result set as the earlier example;
SELECT NAME_LAST, NAME_FIRST
FROM INDIVIDUAL
WHERE MARITAL_STATUS IN ('Single','Divorced') AND BIRTH_DT <= '1990-01-01';
The BETWEEN statement will search a column based upon a range of values. For example if we wanted return a list of individuals with a birth date within a certain range, for instance January 1, 1959 and December 31, 1959, we could write the following query.
SELECT NAME_LAST, NAME_FIRST
FROM INDIVIDUAL
WHERE BIRTH_DT BETWEEN '1959-01-01 00:00:00' AND '1959-12-31 23:59:59';
NAME_LAST | NAME_FIRST |
Samples | John |
Samples | Mary |
Note that the search is inclusive of both parameters, that means that anyone born at midnight on 1/1/1959 or at 11:59 pm on 12/31/1959 would meet the query requirements and be retrieved. It is a better alternative than querying where a date is >= AND <= to some date range.
Another option is to use NOT with an IN or BETWEEN. This would retrieve the direct opposite results. Examples of using NOT are;
...WHERE MARITAL_STATUS NOT IN ('Single','Divorced');
or
...WHERE BIRTH_DT NOT BETWEEN '1959-01-01 00:00:00' AND '1959-12-31 23:59:59';
Back To Top
Using LIKE
The LIKE operator can be used to search a column at a somewhat higher level using special wildcard characters. For instance if you want to return records where a column value begins or ends with a certain character or set of characters.
If you wanted to find all of the individuals whose last name began with an 'S' you could write a query like this;
SELECT NAME_LAST, NAME_FIRST
FROM INDIVIDUAL
WHERE NAME_LAST LIKE 'S%'
NAME_LAST | NAME_FIRST |
Samples | John |
Samples | Mary |
Smith | John |
Smith | Nancy |
Smith | Susie |
The percent sign ( % ) represents any possible character or group of characters that would follow the 'S'.
To search for those individuals whose last name ends with an 'S' you would use
...LIKE '%S'
To search for those individuals whose last name contained an 'S' somewhere you would use
...LIKE '%S%'
The underscore character ( _ ) can be used to represent a single character. This comes in handy when you know the exact position you are expecting a certain character to appear. For instance if you wanted to search for those individuals whose last name contained an S in the second position you would use
...LIKE '_S%'
As with IN and BETWEEN, you can also use the NOT operator with LIKE, to search for individuals whose last name did not start with an 'S' you would use
...NAME_LAST NOT LIKE 'S%'
For further details on using LIKE and other SQL commands you should consult your database manual or a good SQL reference book.
Back To Top
Joining Tables
Recalling our discussion about keys, the reason that they exist is to uniquely identify records in a table and are used to preserve relationships between data across multiple tables so that the data does not have to be duplicated in every table. Therefore it is necessary to retrieve data from two or more tables in order to produce the desired results.
In our example we see that a relationship between the INDIVIDUAL and FAMILY tables exist through the FAM_ID column. This allows us to tie the two tables together and determine the mailing address for each individual. The following example joins the INDIVIDUAL and FAMILY tables together to obtain the mailing address for women;
SELECT INDIVIDUAL.NAME_LAST, INDIVIDUAL.NAME_FIRST,
FAMILY.STREET_ADDRESS, FAMILY.CITY, FAMILY.STATE, FAMILY.MAIL_CODE
FROM INDIVIDUAL, FAMILY
WHERE INDIVIDUAL.FAM_ID = FAMILY.FAM_ID
AND INDIVIDUAL.SEX='F';
Note that both tables that we are querying to obtain information from are listed in the FROM clause. The WHERE clause contains a statement that ties the two tables together by comparing the FAM_ID column from both tables, and the WHERE clause contains a statement to further restrict the results to only women.
Also note that columns in the query are prefixed using the table from which they reside followed by a period ( . ), this is referred to as dot notation and avoids ambiguity in having the query engine determine which column to use on its own. This is only necessary when a column exists in more than one table being joined. For instance because the column SEX only resides in the INDIVIDUAL table and the address fields only reside inthe FAMILY table, it was not necessary to use dot notation, but was instead used for clarity.
We can join to a third table, ATTENDEE_TYPE to display the Attendee Type Description for each individual. A relationship between the INDIVIDUAL and ATTENDEE_TYPE table exists through the column ATTEND_TP, which is the primary key in the ATTENDEE_TYPE table and a foreign key in the INDIVIDUAL table.
SELECT INDIVIDUAL.NAME_LAST, INDIVIDUAL.NAME_FIRST,
FAMILY.STREET_ADDRESS, FAMILY.CITY, FAMILY.STATE, FAMILY.MAIL_CODE,
ATTENDEE_TYPE.ATTEND_DESC
FROM INDIVIDUAL, FAMILY, ATTENDEE_TYPE
WHERE INDIVIDUAL.FAM_ID = FAMILY.FAM_ID
AND INDIVIDUAL.ATTEND_TP = ATTENDEE_TYPE.ATTEND_TP
AND INDIVIDUAL.SEX='F';
Here's another example. To produce a report showing the ID and Name of those individuals that have taken a class, we will need to join the CLASS table, CLASS_HISTORY and INDIVIDUAL tables together. The CLASS_HISTORY table lists all individuals that attended a class beginning on a certain day, the CLASS table lists the description of each class and the INDIVIDUAL table lists each individual record.
SELECT INDIVIDUAL.NAME_LAST, INDIVIDUAL.NAME_FIRST, CLASS_HISTORY.CLASS_ID, CLASS.CLASS_DESC
FROM INDIVIDUAL IND, CLASS_HISTORY HST, CLASS CLS
WHERE CLASS_HISTORY.CLASS_ID = CLASS.CLASS_ID
AND CLASS_HISTORY.IND_ID = INDIVIDUAL.IND_ID
Back To Top
Using Alias's & Column Concatenation
Alias's are names that are assigned to tables in the FROM clause of a query that are then used as prefixes in all dot notations for columns referenced in the query. This makes it easier to write queries as well as provides clarity in reading a query, it may also be required to eliminate ambiguity caused when two or more tables are joined together and the query engine cannot determine which table the column you are referencing should be used. The alias can be one or more characters as long as it is unique and helps make the query more clear to you. It could be as simple as the letters a, b, c which would result in columns being referenced as a.columnname, b.columnname, etc.
Concatenation is a fancy term used to describe the process of combining two or more character columns together.
In the example below the alias's are IND for the INDIVIDUAL table and FAM for the FAMILY table. We are also using column concatenation to combine the last and first name columns and defining them as Name. The STREET_ADDRESS column is being alias'd as Street Address, and we are also concatenating the CITY, STATE & MAIL_CODE columns and aliasing it as City State Zip. Note that in our concatenation definition we are also embedding commas and spaces where necessary.
SELECT 'Name' = IND.NAME_LAST + ', ' + IND.NAME_FIRST,
'Street Address' = FAM.STREET_ADDRESS,
'City State Zip' = FAM.CITY + ', ' + FAM.STATE + ' ' + FAM.MAIL_CODE
FROM INDIVIDUAL IND, FAMILY FAM
WHERE IND.FAM_ID = FAM.FAM_ID
AND IND.SEX='F';
Name | Street Address | City State Zip |
Samples, Mary | 1234 Main Street | Anytown, OH 40000 |
Smith, Nancy | 123 Treelined Lane | Overthere, CA 90000 |
Smith, Susie | 123 Treelined Lane | Overthere, CA 90000 |
An alternate method to create a column alias can also be used using the following syntax;
SELECT COLUMNNAME AS ALIASNAME ...
Back To Top
Aggregate Functions
Aggregate functions are functions that summarize results of a query instead of reporting each detail record. This document will discuss the following aggregate functions;
- SUM (columnname) reports the column total for all rows meeting the conditions in the WHERE clause, column must be numeric
- AVG (columnname) reports the average for the column
- MAX (columnname) reports the the largest figure for the column
- MIN (columnname) reports the smallest figure for the column
- COUNT(*) reports the number of rows meeting the conditions in the WHERE clause
This query reports the total number of records (COUNT), and the total (SUM) and average (AVG) of all salaries from the INDIVIDUAL table.
SELECT COUNT(*), SUM(SALARY), AVG(SALARY)
FROM INDIVIDUAL;
This query reports the smallest (MIN) and largest (MAX) salary amount from the INDIVIDUAL table WHERE the ATTEND_TP is equal to MEMBR and the SALARY amount is greater than zero.
SELECT MIN(SALARY), MAX(SALARY)
FROM INDIVIDUAL
WHERE ATTEND_TP = 'MEMBR'
AND SALARY > 0;
MIN(SALARY) | MAX(SALARY) |
35000 | 80000 |
Aggregate functions can be combined with the GROUP BY clause to report aggregate figures at sub total levels. Refer to the Group By section for more information.
Back To Top
Group By
The GROUP BY clause can be used to report aggregate figures at sub total levels. For instance we could further break down the query example discussed in the aggregate section to report totals by ATTEND_TP instead of reporting a single grand total figure for all individuals that meet the WHERE criteria.
Here is an example of a query that reports the total (SUM) and average (AVG) salary amount from the INDIVIDUAL table for each ATTEND_TP where the SALARY amount is greater than zero. This time however the results are broken down by attendee_type.
SELECT 'Attendee Type'=ATTEND_TP 'Total Salary'=SUM(SALARY), 'Average Salary'=AVG(SALARY)
FROM INDIVIDUAL
WHERE SALARY > 0
GROUP BY ATTEND_TP
Note that the GROUP BY clause follows the WHERE clause.
Attendee Type | Total Salary | Average Salary |
MEMBR | 215000 | 53750 |
VISIT | 105000 | 105000 |
Back To Top
Distinct
Getting Rid of Duplicates. If you wanted to produce a report showing the ID and Name of only those individuals that have ever taken a class, you may not want them to be listed more than once if they had attended more than one class. To prevent duplicate names you can use the DISTINCT keyword. The DISTINCT keyword is placed in the SELECT clause preceeding the column that would produce duplication.
SELECT DISTINCT IND.NAME_LAST, IND.NAME_FIRST
FROM INDIVIDUAL IND, CLASS_HISTORY HST
WHERE HST.IND_ID = IND.IND_ID
Back To Top
For More Information
As mentioned at the start, there are many other SQL commands that can be used to write much more complex queries. The intent of this tutorial was meant to provide a basic introduction to writing queries for the beginner to extract data from an existing relational database. For more information I encourage you to seek additional resources available over the world wide web as well as from a variety of books available from your local or on-line bookstore. Here are a couple of books to get you started available from Amazon.com.
|
Sams Teach Yourself SQL in 10 Minutes (4th Edition)
Sams Teach Yourself SQL in 10 Minutes has established itself as the gold standard for introductory SQL books, offering a fast-paced accessible tutorial to the major themes and techniques involved in applying the SQL language. Forta's examples are clear and his writing style is crisp and concise. As with earlier editions, this revision includes coverage of current versions of all major commercial SQL platforms. New this time around is coverage of MySQL, and PostgreSQL. All examples have been tested against each SQL platform, with incompatibilities or platform distinctives called out and explained.
|
| |
|
Beginning SQL Queries: From Novice to Professional (Books for Professionals by Professionals)
Beginning SQL Queries is aimed at laypeople who need to extract information from a database who are new to SQL. The book is especially useful for business intelligence analysts who must ask more complex questions of their database than their GUI based reporting software supports. Such people might be business owners wanting to target specific customers, scientists and students needing to extract subsets of their research data, or end users wanting to make the best use of databases for their clubs and societies.
|
Back To Top
|
|
|