Intro To Structured Query Language (SQL)
Database Tables
Database TablesTables are probably a good place to start. Data that resides in a relational database is stored in what are referred to as tables. If you think of a table as a series of rows and columns similar to a spreadsheet then then you should have no problem following this tutorial. Each column contains a piece of data such as first name and birth date, while each row contains all of the data elements for a single record, or individual. For discussion purposes we'll use a few sample tables shown below, one to store family related information and one to store information specific to each individual family member. Another table describes attendee types and would be referred to as a code table. A couple of other tables list classes and classes attended by individuals that began on a certain date. In the example below you will notice that the information in the individual table contains a column that also exists in the family table, FAM_ID. This unique identifier links the two tables together, that is where the term relational comes into play. The other tables contain similarly shared columns, but more about that later, here is what the sample tables look like.There are 4 family records in the Family table.
There are 7 individual or family member records in the Individual table related to 4 family records.
There are 3 records in the Attendee Type table.
There are 4 records in the CLASS table.
There are 9 records in the CLASS_HISTORY table.
Back To Top KeysA primary key is a column or set of columns that uniquely identifies the rest of the data in any given row. For example, in the FAMILY table, the FAM_ID column uniquely identifies that row. The IND_ID column does the same in the INDIVIDUAL table. These unique column keys are used throughout the database to track the individuals and families, rather than the individual or family names. So even if two John Smith's existed in the INDIVIDUAL table they could be uniquely referenced using the IND_ID value.A foreign key is a column in a table where that column is a primary key of another table, which means that any data in a foreign key column must have corresponding data in the other table where that column is the primary key. This relationship preserves the referential integrity of the database. Note that each record in the INDIVIDUAL table references a column (FAM_ID) from the FAMILY table. The INDIVIDUAL FAM_ID column in this case is a foreign key because it is a primary key in the FAMILY table. A record could not reside in the INDIVIDUAL table without a related record in the FAMILY table. Back To Top SQL Select ClauseRemember that the minimum requirements for writing a SQL query is the SELECT & FROM clauses. The SELECT clause is where you request the pieces of information, the columns (separated by commas), that you want to see, and the FROM clause is where you define the table or tables from which the columns reside in.For example: SELECT column_name, column_name, column_name ... FROM table_name So if we wanted to retrieve a list of all families and their addresses, the query would look like this; SELECT NAME_LAST, SALUTATION, STREET_ADDRESS, CITY, STATE, MAIL_CODE FROM FAMILY; The results of the query would be:
Note that we did not request to see the FAM_ID column, specifically we asked to see the columns called NAME_LAST, SALUTATION, STREET_ADDRESS, CITY, STATE and MAIL_CODE. Each column and table name must be specified exactly as it is defined and will not contain any spaces. If you wanted to see every column without having to type each column name you could use an asterisk (*); SELECT * FROM table_name Or in our example: SELECT * FROM FAMILY; Note that in our example the column FAM_ID would then appear if you used the asterisk. Back To Top SQL Where ClauseThe WHERE clause is used to conditionally retrieve only the information that you want to display. Here is a list of the operators that can be used. It is important to note that when querying character or date/datetime type of data that the parameters used in the WHERE clause be surrounded with an apostrophe ( ' ), when querying against numeric type data it is not required.
To see how the WHERE clause works lets take a look at some examples. If we wanted to see a list of individuals that were married you could run the following query; SELECT NAME_LAST, NAME_FIRST FROM INDIVIDUAL WHERE MARITAL_STATUS = 'Married';
Back To Top Order ByThe ORDER BY clause is used to sort the result set in the desired sequence. The syntax for order by is;ORDER BY COLUMNNAME [ASC/DESC] The following example sorts the result in descending first name within ascending last name order. The default for the ORDER BY clause is ascending order so if you want your results to be in ascending order it is not necessary to specify the ASC parameter. SELECT NAME_LAST, NAME_FIRST FROM INDIVIDUAL WHERE MARITAL_STATUS = 'Married'; ORDER BY NAME_LAST ASC, NAME_FIRST DESC
An alternate syntax for ORDER BY is to use the column numbers as they appear in the SELECT statement. In the example we could have used; ORDER BY 1, 2 DESC In certain situations (such as with union queries which is not discussed in this tutorial) you are required to use the column number because the column names can be different. Back To Top Compound ConditionsThe AND operator joins two or more conditions, and displays a row only if ALL of the conditions are met.For example, building upon our query above, we may want to further restrict the query results using the date of birth column. If we only want to retrieve Individuals that are married and have a birth date on or before December 31, 1959 we could write; SELECT NAME_LAST, NAME_FIRST FROM INDIVIDUAL WHERE MARITAL_STATUS = 'Married' AND BIRTH_DT <= '1959-12-31';
Note: we could also have written the birth_dt portion of the where clause using the < operator; ...AND BIRTH_DT < '1960-01-01' also noteworthy is the fact that many databases combine date and time column types into a single datetime type of field which could prove extremely important when writing queries. When adding or updating a datetime field unless the time portion is explicity defined it will usually default to 00:00:00 (hours, minutes, seconds), which is the equivalent to midnight 12 a.m. For instance, if the time portion of the BIRTH_DT for Mary Samples was 7:39 p.m. (19:39:00) the above query would not retrieve the record for Mary Samples. In this case we might want to query using the following example. ...AND BIRTH_DT <= '1959-12-31 23:59:59' so that it would include the entire day for 12/31/1959 through 11:59:59 p.m. The OR operator joins two or more conditions, and displays a row if ANY of the conditions are met. For example, if we wanted to see a list of who had a marital_status of single or divorced with a birth date earlier than January 1, 1990 we could write a query like this; SELECT NAME_LAST, NAME_FIRST FROM INDIVIDUAL WHERE ( MARITAL_STATUS = 'Single' OR MARITAL_STATUS = 'Divorced' ) AND BIRTH_DT <= '1990-01-01';
The above query is for demonstration purposes only. Using an IN statement might provide a more efficient method to retrieve the information when querying against a single column using multiple arguments. You can combine AND and OR statements in your WHERE clause. If you do you will want to parenthesize your different statements to make sure you will be returning the results you think you are requesting, plus will add clarity to your query. For example: ...WHERE column_x = 0 AND (column_y = 1 OR column_z = 2) will not return the same results as ...WHERE (column_x = 0 AND column_y = 1) OR column_z = 2 Back To Top Using IN & BETWEENIn some cases a more efficient method to using compound statements is to use the IN or BETWEEN statements.In our previous example we wrote a query to retrieve those individuals with a marital status of either 'Single' or 'Divorced' that used a compound where clause (using OR). A more clear and efficient method to write the query is to use the IN statement, which does the equivalent of multiple OR statements for you. The IN statement is followed by a comma separated list of values to search for within a set of parenthesis. The following query will return the same result set as the earlier example; SELECT NAME_LAST, NAME_FIRST FROM INDIVIDUAL WHERE MARITAL_STATUS IN ('Single','Divorced') AND BIRTH_DT <= '1990-01-01'; The BETWEEN statement will search a column based upon a range of values. For example if we wanted return a list of individuals with a birth date within a certain range, for instance January 1, 1959 and December 31, 1959, we could write the following query. SELECT NAME_LAST, NAME_FIRST FROM INDIVIDUAL WHERE BIRTH_DT BETWEEN '1959-01-01 00:00:00' AND '1959-12-31 23:59:59';
Note that the search is inclusive of both parameters, that means that anyone born at midnight on 1/1/1959 or at 11:59 pm on 12/31/1959 would meet the query requirements and be retrieved. It is a better alternative than querying where a date is >= AND <= to some date range. Another option is to use NOT with an IN or BETWEEN. This would retrieve the direct opposite results. Examples of using NOT are; ...WHERE MARITAL_STATUS NOT IN ('Single','Divorced'); or ...WHERE BIRTH_DT NOT BETWEEN '1959-01-01 00:00:00' AND '1959-12-31 23:59:59'; Back To Top Using LIKEThe LIKE operator can be used to search a column at a somewhat higher level using special wildcard characters. For instance if you want to return records where a column value begins or ends with a certain character or set of characters.If you wanted to find all of the individuals whose last name began with an 'S' you could write a query like this; SELECT NAME_LAST, NAME_FIRST FROM INDIVIDUAL WHERE NAME_LAST LIKE 'S%'
The percent sign ( % ) represents any possible character or group of characters that would follow the 'S'. To search for those individuals whose last name ends with an 'S' you would use ...LIKE '%S' To search for those individuals whose last name contained an 'S' somewhere you would use ...LIKE '%S%' The underscore character ( _ ) can be used to represent a single character. This comes in handy when you know the exact position you are expecting a certain character to appear. For instance if you wanted to search for those individuals whose last name contained an S in the second position you would use ...LIKE '_S%' As with IN and BETWEEN, you can also use the NOT operator with LIKE, to search for individuals whose last name did not start with an 'S' you would use ...NAME_LAST NOT LIKE 'S%' For further details on using LIKE and other SQL commands you should consult your database manual or a good SQL reference book. Back To Top Joining TablesRecalling our discussion about keys, the reason that they exist is to uniquely identify records in a table and are used to preserve relationships between data across multiple tables so that the data does not have to be duplicated in every table. Therefore it is necessary to retrieve data from two or more tables in order to produce the desired results.In our example we see that a relationship between the INDIVIDUAL and FAMILY tables exist through the FAM_ID column. This allows us to tie the two tables together and determine the mailing address for each individual. The following example joins the INDIVIDUAL and FAMILY tables together to obtain the mailing address for women; SELECT INDIVIDUAL.NAME_LAST, INDIVIDUAL.NAME_FIRST, FAMILY.STREET_ADDRESS, FAMILY.CITY, FAMILY.STATE, FAMILY.MAIL_CODE FROM INDIVIDUAL, FAMILY WHERE INDIVIDUAL.FAM_ID = FAMILY.FAM_ID AND INDIVIDUAL.SEX='F'; Note that both tables that we are querying to obtain information from are listed in the FROM clause. The WHERE clause contains a statement that ties the two tables together by comparing the FAM_ID column from both tables, and the WHERE clause contains a statement to further restrict the results to only women. Also note that columns in the query are prefixed using the table from which they reside followed by a period ( . ), this is referred to as dot notation and avoids ambiguity in having the query engine determine which column to use on its own. This is only necessary when a column exists in more than one table being joined. For instance because the column SEX only resides in the INDIVIDUAL table and the address fields only reside inthe FAMILY table, it was not necessary to use dot notation, but was instead used for clarity. We can join to a third table, ATTENDEE_TYPE to display the Attendee Type Description for each individual. A relationship between the INDIVIDUAL and ATTENDEE_TYPE table exists through the column ATTEND_TP, which is the primary key in the ATTENDEE_TYPE table and a foreign key in the INDIVIDUAL table. SELECT INDIVIDUAL.NAME_LAST, INDIVIDUAL.NAME_FIRST, FAMILY.STREET_ADDRESS, FAMILY.CITY, FAMILY.STATE, FAMILY.MAIL_CODE, ATTENDEE_TYPE.ATTEND_DESC FROM INDIVIDUAL, FAMILY, ATTENDEE_TYPE WHERE INDIVIDUAL.FAM_ID = FAMILY.FAM_ID AND INDIVIDUAL.ATTEND_TP = ATTENDEE_TYPE.ATTEND_TP AND INDIVIDUAL.SEX='F'; Here's another example. To produce a report showing the ID and Name of those individuals that have taken a class, we will need to join the CLASS table, CLASS_HISTORY and INDIVIDUAL tables together. The CLASS_HISTORY table lists all individuals that attended a class beginning on a certain day, the CLASS table lists the description of each class and the INDIVIDUAL table lists each individual record. SELECT INDIVIDUAL.NAME_LAST, INDIVIDUAL.NAME_FIRST, CLASS_HISTORY.CLASS_ID, CLASS.CLASS_DESC FROM INDIVIDUAL IND, CLASS_HISTORY HST, CLASS CLS WHERE CLASS_HISTORY.CLASS_ID = CLASS.CLASS_ID AND CLASS_HISTORY.IND_ID = INDIVIDUAL.IND_ID Back To Top Using Alias's & Column ConcatenationAlias's are names that are assigned to tables in the FROM clause of a query that are then used as prefixes in all dot notations for columns referenced in the query. This makes it easier to write queries as well as provides clarity in reading a query, it may also be required to eliminate ambiguity caused when two or more tables are joined together and the query engine cannot determine which table the column you are referencing should be used. The alias can be one or more characters as long as it is unique and helps make the query more clear to you. It could be as simple as the letters a, b, c which would result in columns being referenced as a.columnname, b.columnname, etc.Concatenation is a fancy term used to describe the process of combining two or more character columns together. In the example below the alias's are IND for the INDIVIDUAL table and FAM for the FAMILY table. We are also using column concatenation to combine the last and first name columns and defining them as Name. The STREET_ADDRESS column is being alias'd as Street Address, and we are also concatenating the CITY, STATE & MAIL_CODE columns and aliasing it as City State Zip. Note that in our concatenation definition we are also embedding commas and spaces where necessary. SELECT 'Name' = IND.NAME_LAST + ', ' + IND.NAME_FIRST, 'Street Address' = FAM.STREET_ADDRESS, 'City State Zip' = FAM.CITY + ', ' + FAM.STATE + ' ' + FAM.MAIL_CODE FROM INDIVIDUAL IND, FAMILY FAM WHERE IND.FAM_ID = FAM.FAM_ID AND IND.SEX='F';
An alternate method to create a column alias can also be used using the following syntax; SELECT COLUMNNAME AS ALIASNAME ... Back To Top Aggregate FunctionsAggregate functions are functions that summarize results of a query instead of reporting each detail record. This document will discuss the following aggregate functions;
SELECT COUNT(*), SUM(SALARY), AVG(SALARY) FROM INDIVIDUAL; This query reports the smallest (MIN) and largest (MAX) salary amount from the INDIVIDUAL table WHERE the ATTEND_TP is equal to MEMBR and the SALARY amount is greater than zero. SELECT MIN(SALARY), MAX(SALARY) FROM INDIVIDUAL WHERE ATTEND_TP = 'MEMBR' AND SALARY > 0;
Aggregate functions can be combined with the GROUP BY clause to report aggregate figures at sub total levels. Refer to the Group By section for more information. Back To Top Group ByThe GROUP BY clause can be used to report aggregate figures at sub total levels. For instance we could further break down the query example discussed in the aggregate section to report totals by ATTEND_TP instead of reporting a single grand total figure for all individuals that meet the WHERE criteria.Here is an example of a query that reports the total (SUM) and average (AVG) salary amount from the INDIVIDUAL table for each ATTEND_TP where the SALARY amount is greater than zero. This time however the results are broken down by attendee_type. SELECT 'Attendee Type'=ATTEND_TP 'Total Salary'=SUM(SALARY), 'Average Salary'=AVG(SALARY) FROM INDIVIDUAL WHERE SALARY > 0 GROUP BY ATTEND_TP Note that the GROUP BY clause follows the WHERE clause.
Back To Top DistinctGetting Rid of Duplicates. If you wanted to produce a report showing the ID and Name of only those individuals that have ever taken a class, you may not want them to be listed more than once if they had attended more than one class. To prevent duplicate names you can use the DISTINCT keyword. The DISTINCT keyword is placed in the SELECT clause preceeding the column that would produce duplication.SELECT DISTINCT IND.NAME_LAST, IND.NAME_FIRST FROM INDIVIDUAL IND, CLASS_HISTORY HST WHERE HST.IND_ID = IND.IND_ID Back To Top For More InformationAs mentioned at the start, there are many other SQL commands that can be used to write much more complex queries. The intent of this tutorial was meant to provide a basic introduction to writing queries for the beginner to extract data from an existing relational database. For more information I encourage you to seek additional resources available over the world wide web as well as from a variety of books available from your local or on-line bookstore. Here are a couple of books to get you started available from Amazon.com.
Back To Top This article also available in Serbo-Croatian language translated by Jovana Milutinovich of Webhostinggeeks.com. © Copyright 1996-2012, Mount Vernon, Ohio USA, Thunder Software |