Wednesday, 12 October 2011

Indexed Views in SQL Server


Why Indexed Views?

Views have been available throughout the history of Microsoft SQL Server. However, using views that return very large result sets can lead to poor performance, as the result set is not indexed and the entire result must be table scanned if the view is used in a join or a sub-query of a T-SQL command. Additionally, products like Oracle have come out with the concept of a Materialized View that give an additional performance boost by being able to have indexes built on a view. So in the continuing evolution of the SQL Server product line and in response to Oracle’s Materialized View, Microsoft SQL Server 2000 has a new feature called the View Index. View Indexes give the product the capability to define an index on a view. Additionally, SQL Server View Indexes are dynamic in that changes to the data in the base tables are automatically reflected in the indexed view. Also the SQL Server query optimizer will try to use an indexed view even if the view is not referenced in the from clause of a T-SQL command. These features are not available in Oracle’s Materialized Views.

Before SQL Server 2000, what was a View?

Typically a view is thought of as a virtual table, or a stored query. The results of using a view are not permanently stored in the database. The data accessed through a view is actually constructed using standard T-SQL select command and can come from one to many different base tables or even other views. This T-SQL select command is stored as a database object (a view). Developers can use the results from the view by referencing the view name in T-SQL statements the same way they would reference a real table. When referenced, the stored T-SQL that represents the view is merged with the referencing T-SQL code and executed to come up with the final results. Views have additional benefits of:
  1. Views provide a security mechanism by subsetting the data by rows (All Active Customers, all customers in a certain state).
  2. Views provide a security mechanism by subsetting the data by columns (Payroll fields not shown in the Employee Phone List View).
  3. Views can simplify complex queries into a single reference. Complex Join operations that can make a normalized database design of several tables into a single row in the result set of the view. This is great for reporting tools like Crystal and Cognos.
  4. Views give us aggregation capabilities (Min, Max, Count, Sum) where the data is not stored but calculated.
  5. Views can create other calculated fields based on values in the real underlying tables.
  6. Views can reference another view as one its “Base Tables”.
  7. Views can hide the complexity of partitioned data (Sales from 1998 are in the 1998 table, Sales from 1999 are in the 1999 table, Sales from 2000 are in the Current Table) .
  8. Views can be updateable in certain situations (only update to 1 of the base tables!).
  9. Views do not incur overhead of additional permanent storage.

What are SQL Server Indexed views?

Views in SQL Server are very similar to those in previous versions with a few major exceptions when using Indexed views. When a clustered index is created on the view, SQL Server immediately allocates storage space to store the results of the view. You can then treat the view like any other table by adding additional nonclustered indexes. 

What are the requirements for Indexed views?

There are several requirements that you must take into consideration when using Indexed views.
  1. View definition must always return the same results from the same underlying data.
  2. Views cannot use non-deterministic functions.
  3. The first index on a View must be a clustered, UNIQUE index.
  4. If you use Group By, you must include the new COUNT_BIG(*) in the select list.
  5. View definition cannot contain the following
    1. TOP
    2. Text, ntext or image columns
    3. DISTINCT
    4. MIN, MAX, COUNT, STDEV, VARIANCE, AVG
    5. SUM on a nullable expression
    6. A derived table
    7. Rowset function
    8. Another view
    9. UNION
    10. Subqueries, outer joins, self joins
    11. Full-text predicates like CONTAIN or FREETEXT
    12. COMPUTE or COMPUTE BY
    13. Cannot include order by in view definition
Notice that Indexed Views change the very essence of what a view was before this version of Sql Server. First, the data represented by the view is actually stored in the database. Secondly, the view definition must always return the same results for the same underlying data and all functions and expressions must be deterministic no matter what the current session settings.

To make sure that you can meet this requirement, the following session options must be set when you create an index view, when you modify any of the tables included in an indexed view or when the optimizer decides to use the indexed view as part of a query plan. Session Options that must be on
ANSI_NULLS 
ANSI_PADDING
ANSI_WARNINGS
ARITHABORT
CONCAT_NULL_YEILDS_NULL
QUOTED_IDENTIFIERS
Session options that must be off
NUMERIC_ROUNDABORT
Functions like GetDate(), rand() are non-deterministic because of different session settings can return different values and the settings for one user may not be the same as for another. The list of deterministic and non-deterministic functions will be included in the final version of Books Online for SQL Server 2000. (Look for topic Deterministic and Nondeterministic Functions in the Books Online)

Besides these restrictions, the underlying tables that make up the view must be protected from schema changes. Part of the syntax of the create view command is the “
with SCHEMABINDING” phrase. This is required to create a View Index and this will prevent the dropping or altering of tables participating in an Indexed View. Note that dropping the clustered index of an Indexed View will return it to the standard SQL view as it was as described above in the section Before SQL Server 2000, what was a View?
How do I create an Indexed View?
  1. Make sure that session properties are properly set.
  2. Create a deterministic view with new SCHEMABINDING syntax.
  3. Create unique Clustered Index.
  4. Optionally, create additional nonclustered indexes.
Below you will find the code that you can paste into the Sql Server Query Analyzer to test this yourself. This example is based on the Northwind sample database.

-- Use the northwind database
USE NORTHWIND
GO

-- Make sure that all of the session settings are set properly
IF sessionproperty('ARITHABORT') = 0 SET ARITHABORT ON
IF sessionproperty('CONCAT_NULL_YIELDS_NULL') = 0 SET CONCAT_NULL_YIELDS_NULL ON
IF sessionproperty('QUOTED_IDENTIFIER') = 0 SET QUOTED_IDENTIFIER ON
IF sessionproperty('ANSI_NULLS') = 0 SET ANSI_NULLS ON
IF sessionproperty('ANSI_PADDING') = 0 SET ANSI_PADDING ON
IF sessionproperty('ANSI_WARNINGS') = 0 SET ANSI_WARNINGS ON
IF sessionproperty('NUMERIC_ROUNDABORT') = 1 SET NUMERIC_ROUNDABORT OFF
go

-- Create the view, it must comply with the rules (deterministic)
CREATE VIEW PRODUCTS_BY_CUSTOMER
WITH SCHEMABINDING
AS
select customers.companyname,
         products.productname,
         sum(odetail.unitprice*odetail.quantity) as TotalPurchase,
         count_big(*) as cnt
from dbo."order details" as odetail
inner join dbo.orders as omain on omain.orderid = odetail.orderid
INNER join dbo.customers as customers on customers.customerid = omain.customerid
INNER join dbo.products as products on products.productid = odetail.productid
group by customers.companyname,
         products.productname
go

-- the following statement will cause an error if the view has not been
-- indexed
--EXEC SP_SPACEUSED 'PRODUCTS_BY_CUSTOMER'
--Server: Msg 15235, Level 16, State 1, Procedure sp_spaceused, Line 91
--Views do not have space allocated.

-- Check to see if the indexes can be created
if ObjectProperty(object_id('products_by_customer'),'IsIndexable') = 1
BEGIN
-- Create a clustered index, it MUST be unique
CREATE UNIQUE CLUSTERED INDEX PRODUCTS_BY_CUSTOMER_UNIQUE ON
PRODUCTS_BY_CUSTOMER(COMPANYNAME, PRODUCTNAME)

EXEC SP_SPACEUSED 'PRODUCTS_BY_CUSTOMER'

-- Create NonClustered Indexes
CREATE INDEX PRODUCTS_BY_CUSTOMER_1 ON
PRODUCTS_BY_CUSTOMER(COMPANYNAME)

EXEC SP_SPACEUSED 'PRODUCTS_BY_CUSTOMER'

-- Create NonClustered Indexes
CREATE INDEX PRODUCTS_BY_CUSTOMER_2 ON
PRODUCTS_BY_CUSTOMER(PRODUCTNAME)

EXEC SP_SPACEUSED 'PRODUCTS_BY_CUSTOMER'

END

Please note the 
ObjectProperty(object_id('products_by_customer'),'IsIndexable') = 1 in the above code listing. This command will tell you if all of the requirements for indexing a view have been met so that you can programmatically determine if a view can be indexed or not.

Also note that no space is allocated in the database for this view until the clustered index is created. If you try to use the
SP_SPACEUSED stored procedure on a view that is not indexed, you get an error. The results of the SP_SPACEUSED commands that are sprinkled throughout the above code listing gives the following results on my test machine.
# of Rows
Data
Index
Total Used
After Clustered Index Created
1685
168 kb
16 kb
184 kb
After NonClustered Index 1
1685
168 kb
168 kb
336 kb
After NonClustered Index 2
1685
168 kb
320 kb
488 kb

How do I use the Indexed View?

You can use the view like you would any other view. Also, the SQL Server query optimizer will attempt to use a View Index even if the view is not referenced in the from clause, although you can override this behavior with the Expand Views hint.
From the sample created in the above code example, you could use the view as follows

Example 1: 
select * from products_by_customer
Example 1 above lets the query optimizer determine whether or not to use the view and its indexes or to use the base tables. Surprising on my test machine, this example uses the base tables, not the Indexed View. The query optimizer is a complex piece of technology but it isn’t always perfect. Based on my testing with this sample data in the Northwind database, I had to use the (noexpand) hint seen in the next example to force the optimizer to use the View Index. The speed of this on my test machine was about 3 times faster with 1685 records. By increasing the number of records in the base tables (orders 3000 records and order details 224,696 records), I found that the query optimizer did use the View Index without specifying the hint and the resulting query speeds where approximately 50 times faster. The # of records in the view, after adding all of these records in the base tables, was 1880 records. I conclude that the query optimizer with a small number of records in the base table (Orders had about 830 and order details had about 2155 records when I started) lean towards using the base tables instead of the View index. More testing would be needed to nail down the break even point but this just points out why the hints are still around and how much faster performance can be when the View Indexes are used.
Example 2: select * from products_by_customer with (noexpand)
Example 2 uses a hint to force the query optimizer to consider only the view and its indexes in the execution plan.
Example 3: select * from products_by_customer option (Expand Views)
Example 3 uses a hint to force the query optimizer to expand all indexed views into their underlying Select statements so the optimizer won’t consider any View Indexes in the execution plan.

When would I want to use a View Index?

If you have an application that is a Data-Mart, Data-Mining or decision-support type application, you can improve performance with View Indexes. Applications that do any of the following may benefit as well:
  • Joins and aggregations of big tables
  • Repeated patterns of queries
  • Repeated aggregations on the same or overlapping sets of columns
  • Repeated joins of the same tables on the same keys
Also, situations where you might consider de-normalizing a set of tables by storing aggregate information in the parent table may be good situations to consider creating an aggregate view on the child table and creating the appropriate View Index. In essence, the Indexed View replaces your de-normalized fields and all the work of keeping this de-normalized aggregate field up to date is taken care of by the database engine.

When would I NOT want to use a View Index?

You obviously cannot use a View Index if you need to include syntax in the view definition that is not allowed. It seems to me that Top, Min, Max, Count, using another view, union, subqueiries and outer joins are serious restrictions that would disqualify a large number of views that I might want to optimize using the View Index. 
Storage may also be a major consideration as the data in the view is physically and permanently stored not only in its base table, but in the clustered index of the View Index. Effectively, the data is stored twice, once in its base table and once in the clustered index of the Indexed View.

Also, On-Line Transaction Processing systems (OLTP) will actually suffer performance loss if you try to use View Indexes. Databases with frequent inserts, updates, deletes suffer as these commands must not only process the table, but any associated Indexed Views that the base table is participating in. Also, views that are simply subsets of rows or columns with no aggregation or computation provide no benefit over a standard SQL view or a T-SQL command directly against the base tables. This additional overhead to update the data in the clustered index of the view I believe is the reason that the clustered index must be unique for an Indexed view. It uses this unique value to quickly update the appropriate record in the Indexed View. Without a unique index, the processing for updating the Indexed View records could be unacceptably long.

What are the performance benefits?

As I indicated earlier, I experienced query times 3 times quicker using the Indexed Views over the same query not using the Indexed Views on the sample data in the NorthWind database. With a much bigger data set and with the same database objects defined, I got query times as much as 50 times faster. Microsoft has reported performance improvements of 10 to 100 times with applications that access indexed views instead of base tables. I also experimented with using other queries that did not directly reference the Indexed View and got similar performance gains when the optimizer selected the Indexed View over other possible execution plans.

Summary

As you can see, even with its restrictions, the View Index is a powerful new tool in the SQL Server Developer’s toolbox. Because the optimizer can use a View Index, you won’t even have to change your existing T-SQL to take advantage of the performance benefits of the View Index. So take into consideration the information above when evaluating whether a View Index is right for your application.


Posted By: Mr. Palash Paul

1 comment: