dbaspot
Tags Register FAQ Calendar Search Today's Posts Mark Forums Read

25 SQL Commandments - databases

This is a discussion on 25 SQL Commandments - databases ; 25 SQL Commandments By Suresh Aiyer These hints--many of them previously unpublished--can help you avoid the plague of poorly written SQL statements The importance of efficient SQL statements in an Oracle-based application cannot be downplayed; poorly written ones can cause ...


Home > Database Forum > Other Databases > databases > 25 SQL Commandments

Reply

 

LinkBack Thread Tools Display Modes
  #1  
Old 09-08-2005, 11:05 PM
Database Bot
 
Join Date: Sep 2009
Posts: 1,236,254
Database Administrator is on a distinguished road
Default 25 SQL Commandments




25 SQL Commandments
By Suresh Aiyer
These hints--many of them previously unpublished--can help you avoid
the
plague of poorly written SQL statements
The importance of efficient SQL statements in an Oracle-based
application
cannot be downplayed; poorly written ones can cause havoc in the
database.
Because in many organizations power users access the production
databases
via reporting tools and direct queries, efficiently written SQL not
only
results in better application performance but also reduces traffic on
the
network. Therefore, power users as well as developers must have a good
understanding of Oracle's optimizer and of tuning concepts that can
make
their statements more efficient and less risky.
In this article, I'll summarize 25 of the most useful SQL
tuning
tips for making SQL statements run faster. Although some of these
techniques have been previously described in Oracle manuals and various
journals, many others have never been published and are appearing here
for
the first time.

1. Know your data and business application well.
Identical information can often be retrieved from different business
data
sources. Familiarize yourself with these sources; you must be aware of
the
data volume and distribution in your database. Your should also have a
thorough understanding of your data model--such as the relationships
among
business entities--before writing your SQL. This understanding will
help
you write much better queries for retrieving information from multiple
tables. CASE tools such as Designer/2000 do a very good job of
documenting
relations among different business and database objects.
2. Test your queries with realistic data.
Most organizations have three database environments: development, test,
and
production. Programmers use the development database environment to
create
and test applications, which are then more rigorously examined in the
test
environment by programmers and users before they are migrated to the
production environment.
When a SQL is being tested in the test environment, make
sure
the test database contains data that reflects the production database.
A
SQL statement tested with unrealistic data may behave differently when
used
in production. To ensure rigorous testing, the data distribution in the
test environment must also closely resemble that in the production
environment.
3. Write identical SQL statements in your applications.
Take full advantage of bind variables, stored procedures, and packages
wherever possible. The benefits of identical SQL statements include
reduced
memory use on the database server and faster execution, as parsing is
unnecessary. For example, these statements are not identical:

select * from employee where empid = 10;
SELECT * FROM EMPLOYEE WHERE EMPID = 10;
select * from employee where empid = 20;
but when using a bind variable named i_empid, the statement would be:
select * from employee where empid = :i_empid;


4. Use indexes on the tables craefully.
Be sure to create all the necessary indexes on the tables. However, too
many of them can degrade performance. So how do you choose which
columns to
index?
* Make indexes on columns that are used frequently in the
WHERE
clause of the application SQL or queries used by end users.
* Index those columns that are used frequently to join
tables in
SQL statements.
* Use only index columns that have a small percentage of
rows
with the same value.
* Do not make indexes on those columns that are used only
with
functions and operators in the WHERE clause of queries.
* Do not index columns that are frequently modified or when
the
efficiency gained due to index creation results in more efficiency loss
due
to inserts, updates, and deletes. These operations will slow down due
to
the need to maintain indexes.
* Unique indexes are better than nonunique indexes because
of
better selectivity. Use unique indexes on primary key columns and
nonunique
indexes on foreign keys and columns frequently used in the WHERE
clause.
* Create the index so that the columns used in the WHERE
clause
make up a leading portion of the index.
5. Make an indexed path available.
To take advantage of indexes, write your SQL in such a manner that an
indexed path is available to it. The optimizer cannot use an access
path
that uses an index simply because the index exists; the access path
must be
made available by the SQL. Using SQL hints is one of the ways to ensure
the
index is used. See the following tips to select a particular access
path.
6. Use Explain Plan and TKPROF where possible.
If your SQL statements are not well-tuned, they may be inefficient even
if
the Oracle database itself is "well-oiled." Become familiar with
Explain
Plan and TKPROF tools and use them to your advantage. Explain Plan
helps in
discovering the access path used by a SQL; TKPROF shows its actual
performance statistics. These tools are bundled with Oracle server
software
and can help you improve SQL performance.
7. Understand the optimizer.
SQL can be executed by using either a rule-based optimizer or
cost-based
optimizer approach. The rule-based optimizer approach is more common in
older applications; many Oracle shops have used this approach for years
and
are quite happy with it. But for new applications, consider using the
cost-
based optimizer. Oracle is updating the cost-based optimizer in each
release, making it more stable and reliable. If you choose to use it,
be
sure to perform ANALYZE schema regularly. ANALYZE schema stores
database
statistics in the data dictionary tables, which are then used by the
cost-
based optimizer.
SQL can be tuned only when using the cost-based optimizer.
If
you're planning to switch from the rule-based approach to the
cost-based
optimizer approach, you should evaluate the performance of every SQL
statement in every application using the database.
8. Think globally when acting locally.
Always remember: Any changes you make in the database to tune one SQL
statement may affect the performance of other statements used by
applications and users.
9. The WHERE clause is crucial.
The following WHERE clauses would not use the index access path even if
an
index is available (COL1 and COL2 are in the same table and the index
is
created on COL1):
* COL1 > COL2
* COL1 < COL2
* COL1 > = COL2
* COL1 <= COL2
* COL1 IS NULL
* COL1 IS NOT NULL. An index does not store the ROWIDs for
columns that have null values. So querying for rows with null values
cannot
use that index.
* COL1 NOT IN (value1, value2 )
* COL1 != expression
* COL1 LIKE '%pattern'. In this case, the leading edge of
the
index is suppressed so the index cannot be used. On the other side,
COL1
LIKE 'pattern%' or COL1 LIKE 'pattern%pattern%' can use index because
they
would result in a bounded-range index scan.
* NOT EXISTS subquery
* expression1 = expression2. Any expressions, functions, or
calculations involving indexed columns would prohibit usage of the
index.
In the following example, using the UPPER SQL function would prevent
index
scan and result in a full table scan.
SELECT DEPT_NAME
FROM DEPARTMENT
WHERE UPPER(DEPT_NAME) like 'SALES%';
10. Use WHERE instead of HAVING for record filtering.
Avoid using the HAVING clause along with GROUP BY on an indexed column.
In
this case, the index is not used. Furthermore, exclude rows with WHERE
clause rather than using HAVING. If the EMP table has an index on
column
DEPTID, the following query cannot take advantage of it:

SELECT DEPTID,
SUM(SALARY)
FROM EMP
GROUP BY DEPTID
HAVING DEPTID = 100;


However, the same query can be rewritten to exploit the
index:

SELECT DEPTID,
SUM(SALARY)
FROM EMP
WHERE DEPTID = 100
GROUP BY DEPTID;


11. Specify the leading index columns in WHERE clauses.
For a composite index, the query would use the index as long as the
leading
column of the index is specified in the WHERE clause. The following
query
would use the composite index based on the primary key constraint on
the
PART_NUM and PRODUCT_ID columns:

SELECT *
FROM PARTS
WHERE PART_NUM = 100;


whereas this query would not use the composite index:

SELECT *
FROM PARTS
WHERE PRODUCT_ID = 5555;


The same request can rewritten to take advantage of the
index.
In this query, it is assumed that the PART_NUM column will always have
a
value greater than zero:

SELECT *
FROM PARTS
WHERE PART_NUM > 0
AND PRODUCT_ID = 5555;


12. Evaluate index scan vs. full table scan.
If selecting more than 15 percent of the rows from a table, full table
scan
is usually faster than an index access path. In such cases, write your
SQLs
so that they use full table scans.
The following statements would not use index scans even if
an
index is created on the SALARY column. In the first SQL, using the FULL
hint forces Oracle to employ full table scan. When using an index does
more
harm than good, you can also use these techniques to suppress the use
of
the index.

SELECT * --+FULL
FROM EMP
WHERE SALARY = 50000;

SELECT *
FROM EMP
WHERE SALARY+0 = 50000;


The following statements would not use index scans even if
an
index is created on the SS# column:

SELECT *
FROM EMP
WHERE SS# || ' ' = '111-22-333';


An index is also not used if Oracle has to perform implicit
data
conversion. For the following example, SALARY is a numeric column in
the
table EMP and a character value is converted into a numeric value:

SELECT *
FROM EMP
WHERE SALARY = '50000';


When the percentage of table rows accessed is 15 percent or
less, an index scan will work better because it results in multiple
logical
reads per row accessed, whereas a full table scan can read all the rows
in
a block in one logical read. Thus, the performance of full table scan
is
better when accessing a large percentage of rows from a table.
To illustrate this point, say the ANALYZE command is issued
against the EMP table and all its indexes. Oracle generates the
following
statistics in the data dictionary table USER_TABLES and USER_INDEXES:

Table Statistics:
NUM_ROWS = 1000
BLOCKS = 100


Index Statistics:
BLEVEL = 2
AVG_LEAF_BLOCKS_PER_KEY = 1
AVG_DATA_BLOCKS_PER_KEY = 1


Based on these statistics, the following would be the
logical
reads (block accessed) for different types of scans:

Use of index to return one row = 3
(BLEVEL+(AVG_LEAF_BLOCKS_PER_KEY - 1) + AVG_DATA_PER_KEY

Full table scan = 100
(BLOCKS)

Use of index to return all rows = 3000
(NUM_ROWS * Blocks accessed to return one row using index)

13. Use ORDER BY for index scan.
Oracle's optimizer will use an index scan if the ORDER BY clause is on
an
indexed column. The following query illustrates this point. This query
would use the index available on EMPID column even though the column is
not
specified in the WHERE clause. The query would retrieve ROWID for each
row
from the index and access the table using the ROWID.

SELECT SALARY
FROM EMP
ORDER BY EMPID;

If this query performs poorly, you can try another
alternative
by rewriting the same query using the FULL hint described earlier
(Number
12).

14. Know thy data.
As I've already explained, you have to know your data intimately. For
example, say you have a table called BOXER containing two columns--
BOXER_NAME and SEX--with a nonunique index on column SEX. If there are
an
equal number of male and female boxers, the following query will run
faster
if Oracle performs a full table scan:

SELECT BOXER_NAME
FROM BOXER
WHERE SEX = 'F';
You can ensure the query performs a full table scan by rewriting it as
:

SELECT BOXER_NAME --+ FULL
FROM BOXER
WHERE SEX = 'F';

If the table contains 980 male boxers, this query would be
faster because it results in index scan:

SELECT BOXER_NAME --+ INDEX (BOXER BOXER_SEX)
FROM BOXER
WHERE SEX = 'F';

This example illustrates the importance of being familiar
with
data distribution. SQL performance will vary as the database grows and
data
distribution changes. Oracle added a HISTOGRAMS function with Oracle7.3
so
that the optimizer can be aware of the data distribution in a table and
select an appropriate execution plan.

15. Know when to use large-table scans.
When retrieving rows from a very small table or very large table, a
full
table scan may result in better performance over an index scan. An
index
scan on a very large table may require scanning of many index and table
blocks. When these blocks are brought in the database buffer cache,
they
are kept as long as possible. These blocks may not be needed by other
queries so the database buffer hit ratio may decline and the
performance of
a multiuser system may suffer. However, the blocks read by a full table
scan are removed from the database buffer cache much earlier and the
database buffer hit ratio is not affected.

16. Minimize table passes.
Usually, reducing the number of table passes in a SQL query results in
better performance. Queries with fewer table passes mean faster
queries.
Here's an example. The STUDENT table has four columns named NAME,
STATUS,
PARENT_INCOME, and SELF_INCOME. The name is the primary key. The values
of
the STATUS column is 0 for independent students and 1 for dependent
students.
The following query returns the name and income for
dependent
as well as independent students. It results in two passes through the
STUDENT table, creates a temporary table for processing, and initiates
a
sort to eliminate duplicates:

SELECT NAME, PARENT_INCOME
FROM STUDENT
WHERE STATUS = 1
UNION
SELECT NAME, SELF_INCOME
FROM STUDENT
WHERE STATUS = 0;


The same request can be fulfilled by this query, which
results
in only one pass through the table:

SELECT NAME, PARENT_INCOME*STATUS + SELF_INCOME(1-STATUS)
FROM STUDENT;


17. Join tables in the proper order.
The order in which tables are joined in a multiple table join is
crucial.
Overall, fewer rows will be processed if tables are joined in the
correct
order.
Always perform the most restrictive search first to filter
out
the maximum number of rows in the early phases of a multiple table
join.
This way, the optimizer will have to work with fewer rows in the
subsequent
phases of join, improving performance. You should carry fewer rows to
the
subsequent phases of a join. Furthermore, make sure that the driving
table
(the table searched first in the nested loop join) is the table that
returns the least number of rows. In a master and detail tables join
(i.e.,
ORDERS and ORDER LINE ITEMS tables), ensure that the master table is
connected first; connecting a detail table first will usually result in
more rows getting joined.
For the rule-based optimizer, the driving table should be
the
last table specified in the FROM clause. The rule-based optimizer makes
the
last table specified in the FROM clause the driving table in the nested
loop join. If the join is resulting in nested loop join, consider
making
index available for the inner table search. Explain Plan and TKPROF
output
show the join type, table join order, and number of rows processed in
every
phase of a join.
For the cost-based optimizer, the order in which tables
appear
in the WHERE clause is irrelevant as the optimizer will try to find the
best execution plan on its own. Consider using the ORDERED hint to
control
the table join order. In the following query, tables would be joined in
the
order in which they appear in the WHERE clause:

SELECT ORDERS.CUSTID, ORDERS.ORDERNO,
ORDER_LINE_ITEMS.PRODUCTNO --+ORDERED
FROM ORDERS, ORDER_LINE_ITEMS
WHERE ORDERS.ORDERNO = ORDER_LINE_ITEMS.ORDERNO;


18. Use index-only searches when possible.
If possible, write queries that use index-only searches. The optimizer
will
need to search only the index and not the table to satisfy such
queries, so
performance is better; the optimizer will use an index-only search if
all
the information to satisfy the SQL can be found in the index itself. If
the
EMP table has a composite index on LANME and FNAME columns, the
following
query will result in index-only search:

SELECT FNAME
FROM EMP
WHERE LNAME = 'SMITH';

whereas this query results in an index scan as well as table search by
ROWID:

SELECT FNAME , SALARY
FROM EMP
WHERE LNAME = 'SMITH';

Therefore, pay close attention to the column list in the
SELECT
clause. Include only those columns that you really need.

19. Redundancy is good.
Provide as much information as possible in the WHERE clause. For
example,
if the WHERE clause is WHERE COL1 = COL2 and COL1 = 10, the optimizer
will
infer that COL2 =10. But if the WHERE clause is WHERE COL1 = COL2 and
COL2
= COL3, then optimizer will not infer that COL1 = COL3.

20. Keep it simple, stupid.
Make your SQL statements as simple as possible. Very complex SQL
statements
can overwhelm the optimizer; sometimes writing multiple, simpler SQLs
will
yield better performance than a single complex SQL statement. Oracle's
cost-
based optimizer is not completely robust. It's in the development stage
but
getting better with each Oracle release. As a result, you have to keep
an
eye on the Explain Plan cost estimate. "Cost" is a relative
term--nobody
seems to know what the cost numeric value means--but the smaller this
value, the better the SQL performance. So tune the statement for the
lower
cost.
It may often be more efficient to use temporary tables and
break up a complex SQL join involving many tables. For example, if a
join
involves eight tables with large amount of data, it might be better to
split the SQL into two or three SQLs, each involving at most a
four-table
join and storing the intermediate results in precreated temporary
tables.

21. You can reach the same destination in different ways.
In many cases, more than one SQL statement can get you the same desired
results. Each SQL may use a different access path and may perform
differently. For example, the MINUS operator can be much faster than
using
WHERE NOT IN (SELECT ) or WHERE NOT EXISTS.
Let's say we have an index on a STATE column and another
index
on an AREA_CODE column. Despite the availability of indexes, the
following
statement will require a full table scan due to the usage of the NOT IN
predicate:

SELECT CUSTOMER_ID
FROM CUSTOMERS
WHERE STATE IN ('VA', 'DC', 'MD')
AND AREA_CODE NOT IN (804, 410);


However, if the same query is rewritten as the following, it will
result in
index scans:

SELECT CUSTOMER_ID
FROM CUSTOMERS
WHERE STATE IN ('VA', 'DC', 'MD')
MINUS
SELECT CUSTOMER_ID
FROM CUSTOMERS
WHERE AREA_CODE IN (804, 410);


If a SQL involves OR in the WHERE clause, it can also be
rewritten by substituting UNION for OR in the WHERE clause. You must
carefully evaluate execution plans of all SQLs before selecting one to
satisfy the information request. You can use Explain Plan and TKPROF
tools
for this process.

22. Use the special columns.
Take advantage of ROWID and ROWNUM columns. Remember, the ROWID search
is
the fastest. Here's an example of UPDATE using ROWID scan:

SELECT ROWID, SALARY
INTO TEMP_ROWID, TEMP_SALARY
FROM EMPLOYEE;

UPDATE EMPLOYEE
SET SALARY = TEMP_SALARY * 1.5
WHERE ROWID = TEMP_ROWID;


A ROWID value is not constant in a database, so don't
hard-code
a ROWID value in your SQLs and applications.
Use ROWNUM column to limit the number of rows returned. If
you're not sure how many rows a SELECT statement will return, use
ROWNUM to
restrict the number of rows returned. The following statement would not
return more than 100 rows:

SELECT EMPLOYE.SS#, DEPARTMENT.DEPT_NAME
FROM EMPLOYEE, DEPENDENT
WHERE EMPLOYEE.DEPT_ID = DEPARTMENT.DEPT_ID
AND ROWNUM < 100;

23. Use explicit cursors over implicit cursors.
Implicit cursors result in an extra fetch. Explicit cursors are opened
by
developers by using DECLARE, OPEN, FETCH and CLOSE cursor statements.
Implicit cursors are opened by Oracle for DELETE, UPDATE, INSERT and
SELECT
statements.

24. Explore and take advantage of the Oracle parallel query
option.
Using the parallel query option, you can execute SQL in parallel for
faster
performance. In Oracle7, only queries based on a full table scan can be
processed in parallel. In Oracle8, queries based on indexed range scans
can
be parallelized if the index is partitioned. Parallel query option
should
be used only for SMP and MPP systems containing multiple disk drives.
The Oracle server has many features, but the presence of
these
features alone does not ensure faster performance. You must configure
the
database for these features and write SQL specifically to take
advantage of
them. For example, the following SQL would be executed in parallel:

SELECT * --+PARALLEL(ORDERS,6)
FROM ORDERS;


25. Reduce network traffic and increase throughput.
Using array processing and PL/SQL blocks can achieve better performance
as
well as reduce network traffic. Array processing allows a single SQL
statement to process multiple rows. For example, using arrays in an
INSERT
statement can insert 1,000 rows in a table. Significant performance
gains
can be achieved in client/server and batch systems using this
technique.
Multiple SQL statements can cause heavy network traffic.
However, if the SQL statements are within a single PL/SQL block, the
entire
block can be sent to the Oracle server, processed there, and results
returned to the application running on the client.


Faster Than Fast

Developers and users often treat SQL as a simple method of sending and
retrieving data from the database. Sometimes applications are written
using
code generators without writing SQL directly, resulting in significant
performance problems. Their performance degrades as the database
continues
to grow.
Because SQL is flexible, one can get the same results with
many
different statements, but some statements are more efficient than
others.
Using the tips and techniques described here, applications and
customized
reports can be designed to provide information to users as quickly as
possible.

Suresh Aiyer is a senior consultant based in the Washington,
D.C. area. He specializes in the design, development, and
administration of
Oracle-based systems. He can be reached at (703) 716-2555.



search - home - archives - contacts - site index


Copyright 1997 Miller Freeman Inc. All Rights Reserved
Redistribution without permission is prohibited.
Questions? Comments? We would love to hear from you!

Reply With Quote
  #2  
Old 09-09-2005, 04:59 PM
Database Bot
 
Join Date: Sep 2009
Posts: 1,236,254
Database Administrator is on a distinguished road
Default Re: 25 SQL Commandments

well I guess it was relevant in 1997, since then the optimizers have
improved, so things like this:

"* Use only index columns that have a small percentage of
rows
with the same value.
* Do not make indexes on those columns that are used only
with
functions and operators in the WHERE clause of queries

Create the index so that the columns used in the WHERE
clause
make up a leading portion of the index.

SQL hints is one of the ways to ensure
the
index is used

Avoid using the HAVING clause along with GROUP BY on an indexed column.

In
this case, the index is not used.

For a composite index, the query would use the index as long as the
leading
column of the index is specified in the WHERE clause.
"

are no longer always true

Regarding this magical number 15:

"If selecting more than 15 percent of the rows from a table, full table

scan
is usually faster than an index access path.

When the percentage of table rows accessed is 15 percent or
less, an index scan will work better because it results in multiple
logical
reads per row accessed, "

I don't recall if it was completely true even in 1997. to my best
knowledge It depends and used to depend on things like number of rows
per page and cluster factor. But maybe I'm wrong and there was a
hardcoded constant 15 at that time...

Reply With Quote
Reply

Thread Tools
Display Modes


Similar Threads

Thread Thread Starter Forum Replies Last Post
SQL Server high utilization Database Administrator ms-sqlserver 20 05-07-2007 09:29 PM
Is NTBACKUP breaking SQL Server 2005 differential backup chain..again? Database Administrator ms-sqlserver 12 04-13-2007 02:26 AM
RE: SQL Server 2000 Newbie Database Administrator ms-sqlserver 0 01-22-2007 08:09 PM
Sybase FAQ: 1/19 - index Database Administrator sybase 13 07-20-2004 12:16 AM


All times are GMT -4. The time now is 03:37 AM.