Determining which data structure to use for storing data involves trade-offs between how much memory they require and how long different operations, such as insertions, deletions, or searches take. In C#, using a linear array is the fastest way to enumerate all of the objects in a collection. However, the use of an array or list can have multiple drawbacks. For example, Arrays and lists in C# can have frustrating size limits as objects in the CLR, regardless of system architecture, cannot be over 2 GB in size*. Similarly, while a good data-structure for enumerations, lists and arrays are slower than other alternatives for searching.
I recently had a dataset for which the abilities to search, enumerate and hold over 2 GB of data from the collection were desirable. These criteria naturally make a tree like data structure appealing. However, in a high level language like C# where performance can often heavily depend on which run-time optimizations the virtual machine utilizes, determining the actual performance hit incurred by using a tree as compared to a list during an enumeration is more accurately answered with empirical data than Big O notation.
So, I tested how enumeration performance varies between a balanced binary tree and a list. Since the tree was being enumerated by using the yield statement when a relevant node was found, I compared three methods. One where a class containing a binary tree is enumerated with a yield, one where a class containing a list is enumerated with a yield statement (that is the class enumerates its internal list and yields each element), and one where that same class is enumerated by simply passing out the original List object. In all cases enumerating 220 objects in total. Here is the time comparison.
Method | Time |
Balanced binary tree with yield statement | .110 seconds |
List with yield statement | .055 seconds |
List without yield statement. | .016 seconds |
These values do not vary substantially based on the order of operations or repeated trials. So, yowza, an order of magnitude difference in enumerating the objects, which seemed far more severe than I would have imagined (I was expecting about 3X). It appears much of this is due to the method call and overhead of the yield statement. The two methods using the list are logically identical in code, but it seems the compiler can’t quite figure that out, leading to a 5X drop in performance. Out of curiosity, I also compared to a dictionaries enumeration speed which took about twice as long as the fast list method, but which ran out of memory with a higher number of elements pretty readily.
My take home on this, is to only use arrays and lists in C# if the enumeration speed of a collection appears limiting. It appears traversal of trees in the language is very slow, so trees are not generally recommended when the operations per node are fast.
*This may change on newer version of the MS CLR
Code Below
using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Threading.Tasks; using System.Diagnostics; using System.Globalization; namespace TestTreeVersusList { class Program { public static int TreeDepth = 24; public static int ElementNumber=(int)(Math.Pow(2,TreeDepth)); static void Main(string[] args) { Stopwatch sw = new Stopwatch(); ulong val = 0; ////now for the list val = 0; ListEnumerator le = new ListEnumerator(); sw.Reset(); sw.Start(); foreach (node num in le.GetNodes()) { val += num.value; } sw.Stop(); Console.WriteLine("Time Taken For List with yield: " + sw.Elapsed.TotalSeconds.ToString() + " seconds"); sw.Reset(); sw.Start(); foreach (node num in le.curList) { val += num.value; } sw.Stop(); Console.WriteLine("Time Taken For List without yield: " + sw.Elapsed.TotalSeconds.ToString() + " seconds"); le = null; //create a tree and a list BinaryTreeEnumerator tree = new BinaryTreeEnumerator(); sw.Reset(); sw.Start(); val = 0; foreach (node num in tree.GetNodes()) { val += num.value; } sw.Stop(); Console.WriteLine("Time Taken For Tree: " + sw.Elapsed.TotalSeconds.ToString() + " seconds"); tree = null; DictionaryEnumerator dict = new DictionaryEnumerator(); sw.Reset(); sw.Start(); val = 0; foreach (var kv in dict.dict)//.dict) { val=kv; //val = kv.Key; } sw.Stop(); Console.WriteLine("Time Taken For Dictionary: " + sw.Elapsed.TotalSeconds.ToString() + " seconds"); dict = null; } } class node { public ulong value; public object o; public node Left; public node Right; } class BinaryTreeEnumerator { node root; int nodeCount; public IEnumerable<node> GetNodes() { Stack<node> traversalStack = new Stack<node>(2 * Program.TreeDepth); traversalStack.Push(this.root); while (traversalStack.Count > 0) { node current; current = traversalStack.Pop(); if (current != null) { if (current.value != 0) { traversalStack.Push(current.Right); traversalStack.Push(current.Left); } yield return current; } } //traversalStack.TrimExcess(); } //Just make a binary tree with a bunch of elements public BinaryTreeEnumerator() { this.root = new node() { value = 1, o = null }; node current; Queue<node> traversalStack = new Queue<node>(2*Program.TreeDepth); traversalStack.Enqueue(this.root); while (traversalStack.Count > 0 && nodeCount < Program.ElementNumber) { current = traversalStack.Dequeue(); if (current.value != 0) { current.Left = new node() { value = 1, o = null }; current.Right = new node() { value = 1, o = null }; nodeCount += 2; traversalStack.Enqueue(current.Right); traversalStack.Enqueue(current.Left); } } } } class ListEnumerator { public List<node> curList; public ListEnumerator() { curList = new List<node>(Program.ElementNumber); for(int i=0;i<Program.ElementNumber;i++) { curList.Add(new node(){value=1,o=null}); } } public IEnumerable<node> GetNodes() { foreach (node n in curList) { yield return n; } } } class DictionaryEnumerator { public Dictionary<ulong,node> dict = new Dictionary<ulong,node>(); public DictionaryEnumerator() { for (ulong i = 0; i < (ulong)Program.ElementNumber; i++) { dict[i] = new node() { value = i, o = null }; } } } }