Tuesday, 15 November 2011

The Yield and Recursion Statement in C#

Another often-overlooked C# feature, introduced in .NET 2.0, is the yield statement. It is used to return items from a loop within a method while retaining the state of the method across multiple calls. That is a bit hard to wrap your head around, so as always, an example will help:

public static IEnumerable<int> Range( int min, int max )
{
   for ( int i = min; i < max; i++ )
      yield return i;
}

Each time Range is called, it will return one value in the given range. Of interest is that it maintains the state between calls. The best way to think of it is that for the first call to the method, execution starts at the first line and continues until it hits a yield statement at which time it returns the value. The subsequent runs through the method continue execution at the statement after the yield, continuing to yield values or exiting at the end of the method.
Using this, you can write the following code:

foreach ( int i in Range( 10, 20 ) )
   Console.Write( i.ToString() + " " );

which will print the following:
10 11 12 13 14 15 16 17 18 19

The above example doesn’t really show you the power, so let’s try a more complex example, calculating prime numbers.

public static IEnumerable<int> Primes( int max )
{
   yield return 2;
   List<int> found = new List<int>();
   int candidate = 3;
   while ( candidate <= max )
   {
      bool isPrime = true;
      foreach ( int prime in found )
      {
         // no prime factor can exceed the square root of the candidate
         if ( prime * prime > candidate )
            break;
         if ( candidate % prime == 0 )
         {
            isPrime = false;
            break;
         }
      }
      if ( isPrime )
      {
         found.Add( candidate );
         yield return candidate;
      }
      candidate += 2;
   }
}
Notice that there are multiple yields within this method and that the state is maintained through each call. You can now print out all of the prime numbers below 50 with:

foreach ( int prime in Primes( 50 ) )
   Console.Write( prime.ToString() + " " );

There is also the yield break statement. If a yield break statement is hit within a method, execution of that method stops and no further values are returned. Using this, the first method could be rewritten like this:

public static IEnumerable<int> Range( int min, int max )
{
   while ( true )
   {
      if ( min >= max )
         yield break;
      yield return min++;
   }
}

It is not as useful in this case, but I'll leave it to you to find more interesting reasons to break out of a loop. Also, even though I used the generic IEnumerable<T> for all of my examples, you can also use the non-generic IEnumerable.

What is recursion?
Basically, in simple terms, recursion is a function that calls itself repeatedly until it reaches a stopping point. In other words, recursion is a common method of simplification that divides a problem into sub-problems of the same type. As a computer programming technique, this is called "divide and conquer" and is key to the design of many important algorithms, particularly in Artificial Intelligence programming.
The file system is a very good example of recursive programming. Let us say we want to iterate through all the files in all the folders under the C:\Program Files\Adobe\Acrobat 5.0 folder. The following definition is correct:
ENU has some files and a parent folder, HELP. Similarly, HELP has some files and a parent folder, Acrobat 5.0, and so on until we reach the target folder, C:\Program Files\Adobe\Acrobat 5.0. It is equally correct the other way around, i.e. C:\Program Files\Adobe\Acrobat 5.0 has some files and a child folder, that child has some files and a child folder of its own, and so on until we reach the final child, ENU.
Therefore, if we write down the code to iterate through one of these folders and its files and call it repeatedly, we have solved the problem. In other words, we divide the bigger problem of iterating through the Adobe folder into sub-problems, each of which is just iterating through a single folder, and by calling that code recursively the problem is resolved.

How does recursion work?
Basically, it is defining a problem in terms of itself and calling it repeatedly. The important point to notice, however, is that there must be a Stopping Point or 'End Condition'.
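As a minimal sketch of that idea (a hypothetical Factorial helper, not part of the file-system example that follows), note how the end condition stops the self-calls:

```csharp
public static long Factorial(int n)
{
    // Stopping Point / End Condition: without this,
    // the method would call itself forever.
    if (n <= 1)
        return 1;

    // The problem is defined in terms of a smaller
    // sub-problem of the same type.
    return n * Factorial(n - 1);
}
```

For example, Factorial(5) evaluates as 5 * 4 * 3 * 2 * 1 = 120.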

Where recursion is useful:
  • Game programming
  • Artificial Intelligence
  • Most mathematical functions can be expressed in the form of recursion
  • Some problems are much easier to understand when expressed recursively
Enough of the theory. Now let us have a look at some actual C# code.

using System;
using System.IO;

namespace Recursion
{
    class IterateFolders
    {
        public static void Main(string[] args)
        {
            //Create a Directory object using DirectoryInfo
            DirectoryInfo dir =
               new DirectoryInfo(@"C:\Program Files\Adobe\Acrobat 5.0");

            //Pass the directory for displaying the contents
            getDirsFiles(dir);
        }

        //this is the recursive function
        public static void getDirsFiles(DirectoryInfo d)
        {
            //get all files for the current directory
            FileInfo[] files = d.GetFiles("*.*");

            //iterate through the directory and print the files
            foreach (FileInfo file in files)
            {
                //get details of each file using the file object
                String fileName = file.FullName;
                String fileSize = file.Length.ToString();
                String fileExtension = file.Extension;
                String fileCreated = file.LastWriteTime.ToString();

                Console.WriteLine(fileName + " " + fileSize +
                   " " + fileExtension + " " + fileCreated);
            }

            //get sub-folders for the current directory
            DirectoryInfo[] dirs = d.GetDirectories("*.*");

            //This is the code that calls getDirsFiles
            //(calls itself recursively). The recursion stops
            //(End Condition) when a folder has no sub-folders.
            foreach (DirectoryInfo dir in dirs)
            {
                Console.WriteLine("--------->> {0} ", dir.Name);
                getDirsFiles(dir);
            }
        }
    }
}

Posted By: Mr. Palash Paul

Thursday, 10 November 2011

Reclaiming Table Space

SQL Server will reuse space made available when rows are deleted from a table. This occurs automatically without any outside intervention on our part.

However, under certain circumstances, SQL Server does not automatically reclaim space once used. If a table definition is altered to drop one or more variable-length columns, the space consumed by those columns is not immediately made available for reuse by SQL Server. But that's not to say the space is forever lost.

To illustrate this behavior and the DBCC utility that reclaims the space, let's consider an example. Let's create a table with three columns and populate it with some test data as shown in the following code.

--create a test table
CREATE TABLE dbo.Test
        ( col1 INT,
          col2 VARCHAR(50),
          col3 VARCHAR(MAX)
        ) ;


--Inserting data; in SSMS, GO 1000 repeats the batch so the table
--ends up with a meaningful number of rows (5,000 here)
INSERT INTO dbo.Test (col1, col2, col3)
VALUES (1, 'AA', 'AAAA'),
       (1, 'BB', 'BBBB'),
       (1, 'CC', 'CCCC'),
       (1, 'DD', 'DDDD'),
       (1, 'EE', 'EEEE');
GO 1000

Now let's view the table to make sure we have what we think we have.


SELECT col1, col2, col3
FROM   dbo.Test

Using a Dynamic Management View, let's see how much space our newly created table is consuming.

SELECT alloc_unit_type_desc,
       page_count,
       avg_page_space_used_in_percent,
       record_count
FROM   sys.dm_db_index_physical_stats(DB_ID(),
                                      OBJECT_ID(N'dbo.Test'),
                                      NULL,
                                      NULL,
                                      'Detailed') ;

alloc_unit_type_desc           : IN_ROW_DATA
page_count                     : 84
avg_page_space_used_in_percent : 78.51964418087472
record_count                   : 5000


Now let's drop the third column, the one that consumes the most space, and check the space used once again.

ALTER TABLE dbo.Test DROP COLUMN col3;

We get the same results - 84 data pages, storing 5,000 rows, each about 78.5% full - even after dropping the column that consumed the most space.

Now, let's reclaim the space using the DBCC CLEANTABLE command and recheck the space consumed.



DBCC CLEANTABLE
      ( { 'database' | database_id | 0 }
         , { 'table' | table_id | 'view' | view_id }
            [ , batch_size ]
      )
      [ WITH NO_INFOMSGS ]

    batch_size   - The number of rows to process per transaction.
                   default (or if 0 is specified) = whole table
    NO_INFOMSGS  - Suppress all informational messages.

If 0 is specified for the database, the current database will be used.

DBCC CLEANTABLE('tempdb', 'dbo.Test')

This time, considerably less space is consumed; the average page is only 4.5% full!




Wednesday, 9 November 2011

Partitioning in SQL Server 2008



Partitioning is a feature designed to improve the performance of queries made against a very large table. It works by dividing the table into more than one subset of data: rows are not all stored directly in one structure, but are distributed among the different partitions of the table. When you query the table looking for data in a single partition, or just a few, then thanks to these subsets you should receive a quicker response from the server.

This feature has a long history of evolution in SQL Server. It started with partitioned views: a developer had to create different tables with the same schema and combine them through a UNION operation in a view. Since SQL Server 2005, tables and indexes can also be partitioned directly. The feature is further improved in SQL Server 2008.

SQL Server supports only one type of partitioning: Range Partitions, or more specifically 'Horizontal Range Partitions'. This is the partitioning strategy in which data is partitioned based on the range in which the value of a particular field falls. The other partitioning types (reference, hash, list, etc.) are not currently supported in SQL Server.

Simply put, when we partition a table, we define in which portion of the table a particular row will be stored. You may be wondering what criterion determines the partition a specific row is saved to. Rules are defined for the ranges of data that fill each partition. These ranges are based on a particular column, called the Partition Key, and they are non-overlapping. To achieve this, a Partition Function and a Partition Scheme are defined.

There might be many questions in your mind. Let's go step by step. We start with the terminologies used in Partitioning.


The Partition Function is the function that defines the number of partitions. This is the first step in the implementation of partitioning for your database object. One partition function can be used to partition a number of objects. The type of partition is also specified in the partition function, which currently can only be 'RANGE'.

Based on which partition a boundary value itself belongs to, partition functions can be divided into two types:

  1. Left: each boundary value belongs to the partition on its left, so the first boundary value is the maximum value of the first partition.
  2. Right: each boundary value belongs to the partition on its right, so the first boundary value is the minimum value of the second partition.

The syntax for the creation of a partition function is as follows:

CREATE PARTITION FUNCTION partition_function_name ( input_parameter_type )
AS RANGE [ LEFT | RIGHT ]
FOR VALUES ( [ boundary_value [ ,...n ] ] ) [ ; ]



Generally the input_parameter_type is numeric. SQL Server supports only certain input_parameter_types. The list of partition functions defined can be obtained by using the SYS.PARTITION_FUNCTIONS catalog view:

SELECT * FROM sys.partition_functions

The first thing that you will want to do is test whether your partition function is implemented as you intended. Specifically, we can check that it behaves correctly on boundary values. You can check it with the special function provided for this purpose: $PARTITION. Here we test the partition function MyPartitionFunc2 created earlier, verifying which partition a row with Partition Key = 100 would belong to.
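A quick sketch of such a check, assuming MyPartitionFunc2 was created with the boundary values used later in this article (1000, 2000, and so on):

```sql
-- returns the 1-based partition number that a key of 100 maps to
SELECT $PARTITION.MyPartitionFunc2(100) AS PartitionNumber;
```

With RANGE LEFT boundaries starting at 1000, a key of 100 falls into partition 1.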


To find out the boundary values defined by partition functions, the sys.partition_range_values catalog view is available.
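For example, joining it to sys.partition_functions lists each function's boundary values:

```sql
SELECT pf.name, prv.boundary_id, prv.value
FROM sys.partition_functions AS pf
JOIN sys.partition_range_values AS prv
  ON pf.function_id = prv.function_id
ORDER BY pf.name, prv.boundary_id;
```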


Though SQL Server does not directly support List Partitioning, you can simulate list partitions by having the partition function specify the list values as boundary values with the LEFT option. After that, put a CHECK constraint on the table so that no value outside the 'list' of values can be inserted into the Partition Key column.
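A sketch of the trick, using a hypothetical table MyListTable with a RegionCode column that may only hold the 'list' values 1, 2 and 3:

```sql
-- each listed value becomes its own partition (RANGE LEFT)
CREATE PARTITION FUNCTION ListPF (INT)
AS RANGE LEFT FOR VALUES (1, 2, 3);

-- the CHECK constraint keeps out everything not in the list
ALTER TABLE MyListTable
ADD CONSTRAINT CK_MyListTable_RegionCode
CHECK (RegionCode IN (1, 2, 3));
```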


This is the physical storage scheme that will be followed by the partition. To define a scheme, the file groups to be occupied by each partition are specified. It must be remembered that all partitions may also be defined on only one file group.

After the definition of a partition function, a partition scheme is defined. The partition scheme is like specifying an alignment for data, i.e. it specifies the file groups used when partitioning an object. Though it is possible to create all partitions on PRIMARY, it is best if the different partitions are stored in separate file groups. This gives some performance improvement even on single-core computers; better still, place those file groups on different discs of a multi-core machine.

The syntax for creating partition schema is as follows:

CREATE PARTITION SCHEME partition_scheme_name
AS PARTITION partition_function_name
[ ALL ] TO ( { file_group_name | [ PRIMARY ] } [ ,...n ] )
[ ; ]



The list of partition schemes can be obtained by querying the SYS.PARTITION_SCHEMES catalog view.


To get the list of all data spaces, the SYS.DATA_SPACES catalog view may be used.
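Both catalog views can be queried directly:

```sql
-- partition schemes defined in the current database
SELECT * FROM sys.partition_schemes;

-- all data spaces (filegroups and partition schemes)
SELECT * FROM sys.data_spaces;
```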



After creation of a partition scheme, a table may be defined to follow that scheme; in this case the table is called PARTITIONED. A partitioned table may have a partitioned index, and partition-aligned indexed views may also be created for it. The index and view may be based on a different partition strategy (partition function and partition scheme) than the table.

There may be a question in your mind whether it is possible to partition your table using multiple columns. The answer is both YES and NO. No, because there is no direct support for this in SQL Server. Yes, because you can still do it by using a persisted computed column based on any number of columns you want. It must be remembered that this is still a single-dimension partition.

Now you might be wondering whether your existing tables can be partitioned. They can: simply drop the clustered index on the table and recreate it on the required partition scheme.
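A sketch of this, assuming a hypothetical table MyExistingTable with a clustered index CIX_MyExistingTable on its ID column; DROP_EXISTING rebuilds the index in one step onto the partition scheme:

```sql
CREATE CLUSTERED INDEX CIX_MyExistingTable
ON MyExistingTable (ID)
WITH (DROP_EXISTING = ON)
ON MyPartitionScheme (ID);
```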

CREATE TABLE
    [ database_name . [ schema_name ] . | schema_name . ] table_name
    ( { <column_definition> | <computed_column_definition> }
    [ <table_constraint> ] [ ,...n ] )
    [ ON { partition_scheme_name ( partition_column_name ) | filegroup
    | "default" } ]
    [ TEXTIMAGE_ON { filegroup | "default" } ] [ ; ]



SQL Server 2008 introduced an exciting new feature, FILESTREAM. Remember that if any of your columns are based on filestream data, you have to follow some additional steps to get things working with partitioning.


We can also partition our indexes, and this should contribute to improved performance. If indexes are partitioned they serve as the local indexes for each partition. We can also go with Global indexes on a partitioned table without caring about the different partitions of the table.

It must be remembered that indexes can be partitioned using a different partition key than the table. The index can also have a different number of partitions than the table. We cannot, however, partition the clustered index differently from the table. To partition an index, the ON clause is used when creating the index, specifying the partition scheme along with the partitioning column:

CREATE INDEX MyIndex ON MyTable (MyID1)
ON MyPartitionScheme(MyID2);

You can see that MyTable is indexed on the MyID1 column, but the index is partitioned on the MyID2 column. If your table and index use the same partition function, they are called Aligned. If they also use the same partition scheme, they are called Storage Aligned. If the index uses the same partition function as the table, performance is generally improved.

Try this out

Before looking at an example implementation of partitioning, let us create a new database to test this feature, with two file groups, FG1 and FG2. Make sure that you create the PartitionPractice folder on the C: drive before running the following query:

USE master
GO
CREATE DATABASE MyPartitionPracticeDB
ON PRIMARY
( NAME = db_dat,
FILENAME = 'c:\PartitionPractice\db.mdf',
SIZE = 4MB),
FILEGROUP FG1
( NAME = FG1_dat,
FILENAME = 'c:\PartitionPractice\FG1.ndf',
SIZE = 2MB),
FILEGROUP FG2
( NAME = FG2_dat,
FILENAME = 'c:\PartitionPractice\FG2.ndf',
SIZE = 2MB)
LOG ON
( NAME = db_log,
FILENAME = 'c:\PartitionPractice\log.ndf',
SIZE = 2MB)
GO
USE MyPartitionPracticeDB
GO

1. STEP # 01 (Create partition function)

CREATE PARTITION FUNCTION MyPartitionFunc2 (INT)
AS RANGE LEFT FOR VALUES (1000, 2000, 3000, 4000, 5000);

2. STEP # 02 (Create partition scheme)

CREATE PARTITION SCHEME MyPartitionScheme
AS PARTITION MyPartitionFunc2
TO ([FG1], [FG2], [FG1], [FG2], [FG1], [FG2]);

Note that five boundary values produce six partitions, so six file group entries are listed, alternating between FG1 and FG2.

3. STEP # 03 (Create table or index based on the partition scheme)

The important clause related to partitioning is the ON clause of the CREATE TABLE statement. We create our table MyPartitionedTable as follows:

CREATE TABLE MyPartitionedTable
( ID INT,
  Name VARCHAR(50) )
ON MyPartitionScheme(ID)


SQL Server also supports other operations with partitions. These operations help in managing a partitioned object. They are as follows:

1. Split
2. Merge
3. Switch

Remember that SWITCH is a metadata-only operation involving no movement of data; SPLIT and MERGE are cheap only when the affected partitions are empty, otherwise they can physically move rows between file groups.


This operation is provided to accommodate continuous change, when a new partition needs to be added because the existing partitions have grown very large. This is technically called a Partition Split. To split a partition, the ALTER PARTITION FUNCTION statement is used.


But before splitting, a new file group must be created and marked as next used in the partition scheme that uses the partition function being modified; otherwise, the statement will fail when executed.
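A sketch of a split, assuming the partition function and scheme created earlier and a hypothetical new filegroup FG3 added for the new partition:

```sql
-- tell the scheme which filegroup the new partition will use
ALTER PARTITION SCHEME MyPartitionScheme NEXT USED [FG3];

-- add a new boundary value, splitting the last partition in two
ALTER PARTITION FUNCTION MyPartitionFunc2() SPLIT RANGE (6000);
```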


The Merge operation is used to remove an existing partition boundary. The syntax of the MERGE statement is nearly the same as that of the SPLIT statement; merging removes a partition created by an earlier SPLIT.
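For example, to remove a boundary value of 6000 that a previous split added (a sketch, assuming the MyPartitionFunc2 function from earlier):

```sql
ALTER PARTITION FUNCTION MyPartitionFunc2() MERGE RANGE (6000);
```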



This operation is used to switch a partition in or out of a partitioned table. It must be remembered that for switching in, the target partition in the destination table must be empty. In the following example, we are switching in a table MyNewPartTable as the 4th partition of MyPartitionedTable.

ALTER TABLE MyNewPartTable SWITCH TO MyPartitionedTable PARTITION 4

In the following example we are switching partition 4 out to the table MyNewPartTable:

ALTER TABLE MyPartitionedTable SWITCH PARTITION 4 TO MyNewPartTable

We partitioned our table because of its large amount of data. To speed up retrieval we have added several indexes. This certainly helps satisfy users' queries, but it adds time when updating or inserting records, given the amount of data in the table. So it is better to insert new records into an empty staging table that is exactly the same as the partitioned table, and then switch that table in as a partition of the partitioned table. We discuss this further below with the Sliding Window Technique.

The requirement for switching a table in or out as a partition is that both tables must have the same clustered and non-clustered indexes and, additionally, the same constraints.


If you have multiple processing cores on your server, partitioning your large tables can also lead to more optimized parallel execution plans. Remember that in SQL Server 2005, at most a single thread was created per partition, so if there were fewer partitions than processors, some cores sat idle. In SQL Server 2008 this restriction has been removed: all processors can now be used to satisfy the query.

When SQL Server uses a parallel execution plan for partitioned tables, you can see the Parallelism operator in the query's execution plan. Internally, multiple threads work on different partitions of the table. We can control the parallel plan of our queries by using the MAXDOP hint, either as part of the query or in a plan guide. First we see how to control the degree of parallelism using a query hint:

SELECT * FROM MyPartitionedTable
OPTION (MAXDOP 4)

The plan can also be created to support parallelism:

EXEC sp_create_plan_guide
 @name = N'Guide1',
 @stmt = N'SELECT * FROM MyPartitionedTable
 ORDER BY OrderDate DESC',
 @type = N'SQL',
 @module_or_batch = NULL,
 @params = NULL,
 @hints = N'OPTION (MAXDOP 4)';

Remember, if you are creating a custom plan guide for your query, you must not have specified MAXDOP = 1 (Maximum Degree of Parallelism) as an option. Additionally, the server-level 'max degree of parallelism' configuration option should be set appropriately:

sp_configure 'show advanced options', 1;
RECONFIGURE;
sp_configure 'max degree of parallelism', 0;
RECONFIGURE;


Threading support for partitions has also been improved in SQL Server 2008. In 2005, one thread was assigned to each partition, so if 40 threads were available and the table had only 2 partitions, only 2 threads could be used to access the data. In 2008, all partitions are accessed by all available threads, which access the table's partitions in a round-robin fashion.

You need to make sure that the 'max worker threads' configuration is properly set for your SQL Server instance.

SELECT * FROM sys.configurations
WHERE name = 'max worker threads'

This feature also matters for non-partitioned tables. In 2005, a large non-partitioned table could be served by only one thread; in 2008 all available threads can work on it, boosting the performance of data access.

Many operations can be done in parallel due to partitioning like loading data, backup and recovery and query processing.


SQL Server cannot have an unlimited number of partitions; the limit is 1,000 partitions per table. Given this limit, we cannot keep switching in new partitions forever. So how do we resolve this? The Sliding Window Technique is the answer.

With this technique, a table always has a fixed number of partitions. As a new partition comes in, the oldest partition is switched out of the table and archived. This process can go on forever, and it is very fast, since switching is a metadata operation.

This technique could give a drastic boost in performance based on the fact that so long as data is being loaded into the source staging table, your partitioned table is not locked at all. You can even improve that using bulk inserts. After it is loaded the switching in process is basically a metadata operation.

This process is so regular that you can automate it using dynamic SQL. The tasks are as follows:

  1. Managing staging tables for switching partitions in and out.
  2. Switching the new partition in and the old partition out of the table.
  3. Archiving the data from the switched-out staging table and removing both staging tables.
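The steps above can be sketched as one slide of the window, using hypothetical staging tables LoadStaging and ArchiveStaging and the boundary values from the earlier example:

```sql
-- 1. switch the oldest partition out into the archive staging table
ALTER TABLE MyPartitionedTable SWITCH PARTITION 1 TO ArchiveStaging;

-- 2. remove the now-empty oldest boundary
ALTER PARTITION FUNCTION MyPartitionFunc2() MERGE RANGE (1000);

-- 3. make room for the new range at the top
ALTER PARTITION SCHEME MyPartitionScheme NEXT USED [FG1];
ALTER PARTITION FUNCTION MyPartitionFunc2() SPLIT RANGE (6000);

-- 4. switch the freshly loaded staging table in as the new partition
ALTER TABLE LoadStaging SWITCH TO MyPartitionedTable PARTITION 6;
```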


Lock escalation is the process of converting many fine-grained locks into fewer coarse-grained locks, reducing system overhead. SQL Server 2005 supports lock escalation only at the table level. SQL Server 2008 also supports it at the partition level, which was introduced to improve the availability of data in partitioned tables. This is not the default behavior, but it can be changed for any table, and doing so can further improve the performance of queries against the table.

If you want to find out about lock escalation for any of your existing tables then you can check it from TABLES table in SYS schema.

SELECT lock_escalation_desc FROM sys.tables WHERE name = 'MyPartitionedTable'

You can change the new LOCK_ESCALATION property of any of your existing table. The possible options are as follows:

1. Auto: This option lets SQL Server decide: if the table is partitioned, escalation is at the partition level; otherwise it is at the table level.

2. Table: The lock escalation will always be on the table level no matter if the table is partitioned or not. This is the same behavior as in SQL Server 2005. This is also the default behavior in SQL Server 2008.

3. Disable: This prevents lock escalation in most cases. Table-level locks are not completely disallowed e.g. when you are scanning a table that has no clustered index under the serializable isolation level, the database engine must take a table lock to protect data integrity.

After you are done partitioning your table, set LOCK_ESCALATION on the table to AUTO so that the database engine may allow locks to escalate at the partition, i.e. HoBT (Heap or B-Tree), level for that table.
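For example:

```sql
ALTER TABLE MyPartitionedTable SET (LOCK_ESCALATION = AUTO);
```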


It must be remembered that if the partition level escalation option is not implemented correctly, this may create some deadlock problems for the clients expecting table level lock escalation. This is the reason this is not the default option.

You can also specify the lock escalation settings by changing the properties of your table.

You can view the locking details in your server by using the sp_lock stored procedure.


This is a strong form of alignment. In collocation, objects are joined with equijoin predicates on their partitioning columns. Collocation is useful for writing queries that run much faster than regular queries. It applies to tables that are partitioned on the same partition key using the same partition function.

To ease the creation of partitions in an un-partitioned table, collocation support is provided in the wizard to specify the collocation table. This collocation table is used to copy the definition of partition settings and we don't need to specify the partition settings again for each table having the same partition settings.

Though there is no T-SQL support for specifically specifying the collocation table, this option has been added to the wizard to make partitioning a table easier. On the performance side, if two tables are collocated, excellent join performance can be expected.


This is the process the optimizer uses to figure out which partitions need to be accessed for SQL Server to produce the results of a query. Partitions that cannot satisfy the query's criteria are not even checked. The optimizer determines this from the filters specified in the query, in the WHERE, IN or other filtering clauses.

If you want to know whether the database engine is using partition elimination while executing your queries, run the SET STATISTICS PROFILE ON command before running them. We can see partition elimination in the execution plan of the following query:

SELECT * FROM MyPartitionedTable
WHERE ID > 1001

To find out which partition are used to satisfy the required query result, we can look at <RunTimePartitionSummary> in the XML plan of execution.

<RunTimePartitionSummary>
  <PartitionsAccessed PartitionCount="1">
    <PartitionRange Start="2" End="2" />
  </PartitionsAccessed>
</RunTimePartitionSummary>

It is apparent that only the 2nd partition is used; the other partitions are not part of the search.

There are two types of partition elimination schemes that SQL Server may use. They are as follows:

  1. Static Partition Elimination
  2. Dynamic Partition Elimination

As the name suggests, with the Dynamic Partition Elimination scheme the decision about which partitions to access, to satisfy the query criteria, is made at query execution time.

You may be wondering how SQL Server decides which scheme to use: when the filter values are known at plan compilation time, static elimination is possible; when they are only known at run time (for example, variables or parameters), dynamic elimination is used. You will be glad to know that in SQL Server 2008, dynamic partition elimination is much more efficient than in SQL Server 2005.


After the long debate about CPU cost versus data size, your organization might have decided to include compression in your design. Don't forget to consider how your partitioning strategy will be affected by this decision.

The good thing is that you can employ compression in your partitions as well. The greatest part is that different partitions may have different compression settings.

Anyone who has played with compression knows that the first step in compressing any object is to check whether compression would actually benefit it. The sp_estimate_data_compression_savings stored procedure is available to help with this; its report contains the expected benefit of compression for each partition individually.
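The call takes the schema, object, optional index and partition, and the target compression setting; a sketch for estimating PAGE compression on the whole table:

```sql
EXEC sp_estimate_data_compression_savings
     @schema_name = 'dbo',
     @object_name = 'MyPartitionedTable',
     @index_id = NULL,
     @partition_number = NULL,
     @data_compression = 'PAGE';
```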

Let us check our table 'MyPartitionedTable' and see how compression could benefit us. Since we have no data in the table, first we insert some rows into the table.