Why Indexed Views?
Views have been available throughout the
history of Microsoft SQL Server. However, using views that return very large
result sets can lead to poor performance, as the result set is not indexed and
the entire result must be table scanned if the view is used in a join or a sub-query
of a T-SQL command. Additionally, products like Oracle have come out with the
concept of a Materialized View that give an additional performance boost by
being able to have indexes built on a view. So in the continuing evolution of
the SQL Server product line and in response to Oracle’s Materialized View,
Microsoft SQL Server 2000 has a new feature called the View Index. View Indexes
give the product the capability to define an index on a view. Additionally, SQL
Server View Indexes are dynamic in that changes to the data in the base tables
are automatically reflected in the indexed view. Also the SQL Server query
optimizer will try to use an indexed view even if the view is not referenced in
the from clause of a T-SQL command. These features are not available in
Oracle’s Materialized Views.
Before SQL Server 2000,
what was a View?
Typically a view is thought of as a virtual
table, or a stored query. The results of using a view are not permanently
stored in the database. The data accessed through a view is actually
constructed using standard T-SQL select command and can come from one to many
different base tables or even other views. This T-SQL select command is stored
as a database object (a view). Developers can use the results from the view by
referencing the view name in T-SQL statements the same way they would reference
a real table. When referenced, the stored T-SQL that represents the view is
merged with the referencing T-SQL code and executed to come up with the final
results. Views have additional benefits of:
- Views provide a security mechanism by subsetting the data by
rows (All Active Customers, all customers in a certain state).
- Views
provide a security mechanism by subsetting the data by columns (Payroll
fields not shown in the Employee Phone List View).
- Views
can simplify complex queries into a single reference. Complex Join
operations that can make a normalized database design of several tables
into a single row in the result set of the view. This is great for
reporting tools like Crystal and Cognos.
- Views
give us aggregation capabilities (Min, Max, Count, Sum) where the data is
not stored but calculated.
- Views
can create other calculated fields based on values in the real underlying
tables.
- Views
can reference another view as one its “Base Tables”.
- Views
can hide the complexity of partitioned data (Sales from 1998 are in the
1998 table, Sales from 1999 are in the 1999 table, Sales from 2000 are in
the Current Table) .
- Views
can be updateable in certain situations (only update to 1 of the base
tables!).
- Views
do not incur overhead of additional permanent storage.
What are SQL Server Indexed
views?
Views in SQL Server are very similar to
those in previous versions with a few major exceptions when using Indexed
views. When a clustered index is created on the view, SQL Server immediately
allocates storage space to store the results of the view. You can then treat
the view like any other table by adding additional nonclustered indexes.
What are the
requirements for Indexed views?
There are several requirements that you must take into
consideration when using Indexed views.
- View definition must always return the same results from the
same underlying data.
- Views
cannot use non-deterministic functions.
- The
first index on a View must be a clustered, UNIQUE index.
- If
you use Group By, you must include the new COUNT_BIG(*) in the select
list.
- View
definition cannot contain the following
- TOP
- Text,
ntext or image columns
- DISTINCT
- MIN,
MAX, COUNT, STDEV, VARIANCE, AVG
- SUM
on a nullable expression
- A
derived table
- Rowset
function
- Another
view
- UNION
- Subqueries,
outer joins, self joins
- Full-text
predicates like CONTAIN or FREETEXT
- COMPUTE
or COMPUTE BY
- Cannot
include order by in view definition
Notice that Indexed Views change the very essence of what a view
was before this version of Sql Server. First, the data represented by the view
is actually stored in the database. Secondly, the view definition must always
return the same results for the same underlying data and all functions and
expressions must be deterministic no matter what the current session settings.
To make sure that you can meet this requirement,
the following session options must be set when you create an index view, when
you modify any of the tables included in an indexed view or when the optimizer
decides to use the indexed view as part of a query plan. Session Options that
must be on
ANSI_NULLS
ANSI_PADDING
ANSI_WARNINGS
ARITHABORT
CONCAT_NULL_YEILDS_NULL
QUOTED_IDENTIFIERS
Session
options that must be off
NUMERIC_ROUNDABORT
Functions like GetDate(), rand() are non-deterministic because of different session settings can
return different values and the settings for one user may not be the same as
for another. The list of deterministic and non-deterministic functions will be
included in the final version of Books Online for SQL Server 2000. (Look for
topic Deterministic and Nondeterministic Functions in the Books Online)
Besides these restrictions, the underlying tables
that make up the view must be protected from schema changes. Part of the syntax
of the create view command is the “with SCHEMABINDING”
phrase. This is required to create a View Index and this will prevent the
dropping or altering of tables participating in an Indexed View. Note that
dropping the clustered index of an Indexed View will return it to the standard
SQL view as it was as described above in the section Before SQL Server 2000,
what was a View?
How do I create an Indexed View?
- Make
sure that session properties are properly set.
- Create a
deterministic view with new SCHEMABINDING syntax.
- Create unique
Clustered Index.
- Optionally,
create additional nonclustered indexes.
Below you will find the code that you can paste into the Sql
Server Query Analyzer to test this yourself. This example is based on the
Northwind sample database.
-- Use the northwind database
USE NORTHWIND
GO
-- Make sure
that all of the session settings are set properly
IF sessionproperty('ARITHABORT') = 0 SET ARITHABORT ON
IF sessionproperty('CONCAT_NULL_YIELDS_NULL')
= 0 SET CONCAT_NULL_YIELDS_NULL ON
IF sessionproperty('QUOTED_IDENTIFIER') = 0 SET QUOTED_IDENTIFIER ON
IF sessionproperty('ANSI_NULLS') = 0 SET ANSI_NULLS ON
IF sessionproperty('ANSI_PADDING') = 0 SET ANSI_PADDING ON
IF sessionproperty('ANSI_WARNINGS') = 0 SET ANSI_WARNINGS ON
IF sessionproperty('NUMERIC_ROUNDABORT') = 1 SET NUMERIC_ROUNDABORT OFF
go
-- Create the
view, it must comply with the rules (deterministic)
CREATE VIEW PRODUCTS_BY_CUSTOMER
WITH SCHEMABINDING
AS
select customers.companyname,
products.productname,
sum(odetail.unitprice*odetail.quantity) as TotalPurchase,
count_big(*) as cnt
from dbo."order details" as
odetail
inner join dbo.orders as omain on omain.orderid = odetail.orderid
INNER join dbo.customers as customers on
customers.customerid =
omain.customerid
INNER join dbo.products as products on
products.productid =
odetail.productid
group by customers.companyname,
products.productname
go
-- the following
statement will cause an error if the view has not been
-- indexed
--EXEC
SP_SPACEUSED 'PRODUCTS_BY_CUSTOMER'
--Server: Msg
15235, Level 16, State 1, Procedure sp_spaceused, Line 91
--Views do not
have space allocated.
-- Check to see
if the indexes can be created
if ObjectProperty(object_id('products_by_customer'),'IsIndexable') = 1
BEGIN
-- Create a
clustered index, it MUST be unique
CREATE UNIQUE CLUSTERED INDEX PRODUCTS_BY_CUSTOMER_UNIQUE ON
PRODUCTS_BY_CUSTOMER(COMPANYNAME,
PRODUCTNAME)
EXEC SP_SPACEUSED 'PRODUCTS_BY_CUSTOMER'
-- Create
NonClustered Indexes
CREATE INDEX PRODUCTS_BY_CUSTOMER_1 ON
PRODUCTS_BY_CUSTOMER(COMPANYNAME)
EXEC SP_SPACEUSED 'PRODUCTS_BY_CUSTOMER'
-- Create
NonClustered Indexes
CREATE INDEX PRODUCTS_BY_CUSTOMER_2 ON
PRODUCTS_BY_CUSTOMER(PRODUCTNAME)
EXEC SP_SPACEUSED 'PRODUCTS_BY_CUSTOMER'
END
Please note the ObjectProperty(object_id('products_by_customer'),'IsIndexable') = 1 in the
above code listing. This command will tell you if all of the requirements for
indexing a view have been met so that you can programmatically determine if a
view can be indexed or not.
Also note that no space is allocated in the
database for this view until the clustered index is created. If you try to use
the SP_SPACEUSED stored
procedure on a view that is not indexed, you get an error. The results of the SP_SPACEUSED commands that are
sprinkled throughout the above code listing gives the following results on my
test machine.
|
# of Rows
|
Data
|
Index
|
Total Used
|
After Clustered Index Created
|
1685
|
168 kb
|
16 kb
|
184 kb
|
After NonClustered Index 1
|
1685
|
168 kb
|
168 kb
|
336 kb
|
After NonClustered Index 2
|
1685
|
168 kb
|
320 kb
|
488 kb
|
How do I use the
Indexed View?
You can use the view like you would any
other view. Also, the SQL Server query optimizer will attempt to use a View
Index even if the view is not referenced in the from clause, although you can
override this behavior with the Expand Views hint.
From the sample created in the above code
example, you could use the view as follows
Example 1: select * from
products_by_customer
Example 1 above lets the query optimizer
determine whether or not to use the view and its indexes or to use the base
tables. Surprising on my test machine, this example uses the base tables, not
the Indexed View. The query optimizer is a complex piece of technology but it
isn’t always perfect. Based on my testing with this sample data in the
Northwind database, I had to use the (noexpand) hint seen in the next example
to force the optimizer to use the View Index. The speed of this on my test
machine was about 3 times faster with 1685 records. By increasing the number of
records in the base tables (orders 3000 records and order details 224,696
records), I found that the query optimizer did use the View Index without
specifying the hint and the resulting query speeds where approximately 50 times
faster. The # of records in the view, after adding all of these records in the
base tables, was 1880 records. I conclude that the query optimizer with a small
number of records in the base table (Orders had about 830 and order details had
about 2155 records when I started) lean towards using the base tables instead
of the View index. More testing would be needed to nail down the break even
point but this just points out why the hints are still around and how much
faster performance can be when the View Indexes are used.
Example 2: select *
from products_by_customer with (noexpand)
Example 2 uses a hint to force the query
optimizer to consider only the view and its indexes in the execution plan.
Example 3: select *
from products_by_customer option (Expand Views)
Example 3 uses a hint to force the query
optimizer to expand all indexed views into their underlying Select statements
so the optimizer won’t consider any View Indexes in the execution plan.
When would I want to
use a View Index?
If you have an application that is a Data-Mart, Data-Mining or
decision-support type application, you can improve performance with View
Indexes. Applications that do any of the following may benefit as well:
- Joins and aggregations of big tables
- Repeated
patterns of queries
- Repeated
aggregations on the same or overlapping sets of columns
- Repeated
joins of the same tables on the same keys
Also, situations where you might consider
de-normalizing a set of tables by storing aggregate information in the parent
table may be good situations to consider creating an aggregate view on the
child table and creating the appropriate View Index. In essence, the Indexed
View replaces your de-normalized fields and all the work of keeping this
de-normalized aggregate field up to date is taken care of by the database
engine.
When would I NOT want
to use a View Index?
You obviously cannot use a View Index if
you need to include syntax in the view definition that is not allowed. It seems
to me that Top, Min, Max, Count, using another view, union, subqueiries and
outer joins are serious restrictions that would disqualify a large number of
views that I might want to optimize using the View Index.
Storage may also be a major consideration
as the data in the view is physically and permanently stored not only in its
base table, but in the clustered index of the View Index. Effectively, the data
is stored twice, once in its base table and once in the clustered index of the
Indexed View.
Also, On-Line Transaction Processing systems
(OLTP) will actually suffer performance loss if you try to use View Indexes.
Databases with frequent inserts, updates, deletes suffer as these commands must
not only process the table, but any associated Indexed Views that the base
table is participating in. Also, views that are simply subsets of rows or
columns with no aggregation or computation provide no benefit over a standard
SQL view or a T-SQL command directly against the base tables. This additional
overhead to update the data in the clustered index of the view I believe is the
reason that the clustered index must be unique for an Indexed view. It uses
this unique value to quickly update the appropriate record in the Indexed View.
Without a unique index, the processing for updating the Indexed View records
could be unacceptably long.
What are the
performance benefits?
As I indicated earlier, I experienced query
times 3 times quicker using the Indexed Views over the same query not using the
Indexed Views on the sample data in the NorthWind database. With a much bigger
data set and with the same database objects defined, I got query times as much
as 50 times faster. Microsoft has reported performance improvements of 10 to
100 times with applications that access indexed views instead of base tables. I
also experimented with using other queries that did not directly reference the
Indexed View and got similar performance gains when the optimizer selected the
Indexed View over other possible execution plans.
Summary
As you can see, even
with its restrictions, the View Index is a powerful new tool in the SQL Server
Developer’s toolbox. Because the optimizer can use a View Index, you won’t even
have to change your existing T-SQL to take advantage of the performance
benefits of the View Index. So take into consideration the information above
when evaluating whether a View Index is right for your application.
Posted By: Mr. Palash Paul