{"id":35,"date":"2013-03-01T21:03:46","date_gmt":"2013-03-01T21:03:46","guid":{"rendered":"http:\/\/evolvedmicrobe.com\/blogs\/?p=35"},"modified":"2016-06-15T22:49:34","modified_gmt":"2016-06-15T22:49:34","slug":"comparing-data-structure-enumeration-speeds-in-c","status":"publish","type":"post","link":"http:\/\/evolvedmicrobe.com\/blogs\/?p=35","title":{"rendered":"Comparing data structure enumeration speeds in C#"},"content":{"rendered":"<p>Determining which data structure to use for storing data involves trade-offs between how much memory they require and how long different operations, such as insertions, deletions, or searches take.&nbsp; In C#, using a linear array is the fastest way to enumerate all of the objects in a collection.&nbsp; However, the use of an array or list can have multiple drawbacks.&nbsp; For example, Arrays and lists in C# can have frustrating size limits as objects in the CLR, regardless of system architecture, cannot be over 2 GB in size*.&nbsp; Similarly, while a good data-structure for enumerations, lists and arrays are slower than other alternatives for searching.<\/p> <p>I recently had a dataset for which the abilities to search, enumerate and hold over 2 GB of data from the collection were desirable.&nbsp; These criteria naturally make a tree like data structure appealing.&nbsp; However, in a high level language like C# where performance can often heavily depend on which run-time optimizations the virtual machine utilizes, determining the actual performance hit incurred by using a tree as compared to a list during an enumeration is more accurately answered with empirical data than Big O notation.<\/p> <p>So, I <strong>tested how enumeration performance varies between a balanced binary tree and a list.&nbsp; <\/strong>Since the tree was being enumerated by using the <font face=\"Courier New\">yield <\/font>statement when a relevant node was found, I compared three methods.&nbsp; One where a class containing a binary tree is enumerated with a yield, one where a class containing a list is enumerated with a yield statement (that is the class enumerates its internal list and yields each element), and one where that same class is enumerated by simply passing out the original List object.&nbsp; In all cases enumerating 2<sup>20 <\/sup>objects in total.&nbsp; Here is the time comparison.<\/p> <table border=\"5\" cellspacing=\"0\" cellpadding=\"0\" width=\"582\"> <tbody> <tr> <td valign=\"top\" width=\"337\"> <h3 align=\"center\"><font color=\"#000000\">Method<\/font><\/h3><\/td> <td valign=\"top\" width=\"235\"> <h3 align=\"center\"><font color=\"#000000\">Time<\/font><\/h3><\/td><\/tr> <tr> <td valign=\"top\" width=\"366\"><font color=\"#000000\" size=\"3\">Balanced binary tree&nbsp; with yield statement<\/font><\/td> <td valign=\"top\" width=\"247\"> <p align=\"center\"><font color=\"#000000\" size=\"3\">.110 seconds<\/font><\/p><\/td><\/tr> <tr> <td valign=\"top\" width=\"371\"> <p align=\"left\"><font color=\"#000000\" size=\"3\">List with yield statement<\/font><\/p><\/td> <td valign=\"top\" width=\"252\"> <p align=\"center\"><font color=\"#000000\" size=\"3\">.055 seconds<\/font><\/p><\/td><\/tr> <tr> <td valign=\"top\" width=\"370\"> <p align=\"left\"><font color=\"#000000\" size=\"3\">List&nbsp; without yield statement.<\/font><\/p><\/td> <td valign=\"top\" width=\"255\"> <p align=\"center\"><font color=\"#000000\" size=\"3\">.016 seconds<\/font><\/p><\/td><\/tr><\/tbody><\/table> <p>These values do not vary substantially based on the order of operations or repeated trials.&nbsp; So, yowza,&nbsp; an order of magnitude difference in enumerating the objects, which seemed far more severe than I would have imagined (I was expecting about 3X).&nbsp; It appears much of this is due to the method call and overhead of the yield statement.&nbsp; The two methods using the list are logically identical in code, but it seems the compiler can\u2019t quite figure that out, leading to a 5X drop in performance.&nbsp; Out of curiosity, I also compared to a dictionaries enumeration speed which took about twice as long as the fast list method, but which ran out of memory with a higher number of elements pretty readily.<\/p> <p>My take home on this, is to<strong><em> <\/em>only use arrays and lists in C# if the enumeration speed of a collection appears limiting<\/strong>.&nbsp; It appears traversal of trees in the language is very slow, so trees are not generally recommended when the operations per node are fast.<\/p> <p>&nbsp;<\/p> <p><font color=\"#444444\" size=\"3\">*This may change on newer version of the MS CLR<\/font><\/p> <p>Code Below<\/p><pre class=\"csharpcode\"><span class=\"kwrd\">using<\/span> System;\r\n<span class=\"kwrd\">using<\/span> System.Collections.Generic;\r\n<span class=\"kwrd\">using<\/span> System.Linq;\r\n<span class=\"kwrd\">using<\/span> System.Text;\r\n<span class=\"kwrd\">using<\/span> System.Threading.Tasks;\r\n<span class=\"kwrd\">using<\/span> System.Diagnostics;\r\n<span class=\"kwrd\">using<\/span> System.Globalization;\r\n\r\n<span class=\"kwrd\">namespace<\/span> TestTreeVersusList\r\n{\r\n    <span class=\"kwrd\">class<\/span> Program\r\n    {\r\n        <span class=\"kwrd\">public<\/span> <span class=\"kwrd\">static<\/span> <span class=\"kwrd\">int<\/span> TreeDepth = 24;\r\n        <span class=\"kwrd\">public<\/span> <span class=\"kwrd\">static<\/span> <span class=\"kwrd\">int<\/span> ElementNumber=(<span class=\"kwrd\">int<\/span>)(Math.Pow(2,TreeDepth));\r\n        <span class=\"kwrd\">static<\/span> <span class=\"kwrd\">void<\/span> Main(<span class=\"kwrd\">string<\/span>[] args)\r\n        {\r\n            Stopwatch sw = <span class=\"kwrd\">new<\/span> Stopwatch();\r\n            <span class=\"kwrd\">ulong<\/span> val = 0;\r\n            <span class=\"rem\">\/\/\/\/now for the list<\/span>\r\n            val = 0;\r\n            ListEnumerator le = <span class=\"kwrd\">new<\/span> ListEnumerator();\r\n            sw.Reset();\r\n            sw.Start();\r\n            <span class=\"kwrd\">foreach<\/span> (node num <span class=\"kwrd\">in<\/span> le.GetNodes())\r\n            {\r\n                val += num.<span class=\"kwrd\">value<\/span>;\r\n            }\r\n            sw.Stop();\r\n            Console.WriteLine(<span class=\"str\">\"Time Taken For List with yield: \"<\/span> + sw.Elapsed.TotalSeconds.ToString() + <span class=\"str\">\" seconds\"<\/span>);\r\n\r\n            sw.Reset();\r\n            sw.Start();\r\n            <span class=\"kwrd\">foreach<\/span> (node num <span class=\"kwrd\">in<\/span> le.curList)\r\n            {\r\n                val += num.<span class=\"kwrd\">value<\/span>;\r\n            }\r\n            sw.Stop();\r\n            Console.WriteLine(<span class=\"str\">\"Time Taken For List without yield: \"<\/span> + sw.Elapsed.TotalSeconds.ToString() + <span class=\"str\">\" seconds\"<\/span>);\r\n            le = <span class=\"kwrd\">null<\/span>;\r\n\r\n            <span class=\"rem\">\/\/create a tree and a list<\/span>\r\n            BinaryTreeEnumerator tree = <span class=\"kwrd\">new<\/span> BinaryTreeEnumerator();\r\n            sw.Reset();\r\n            sw.Start();\r\n            val = 0;\r\n            <span class=\"kwrd\">foreach<\/span> (node num <span class=\"kwrd\">in<\/span> tree.GetNodes())\r\n            {\r\n                val += num.<span class=\"kwrd\">value<\/span>;\r\n            }\r\n            sw.Stop();\r\n            Console.WriteLine(<span class=\"str\">\"Time Taken For Tree: \"<\/span> + sw.Elapsed.TotalSeconds.ToString() + <span class=\"str\">\" seconds\"<\/span>);\r\n            tree = <span class=\"kwrd\">null<\/span>;\r\n            DictionaryEnumerator dict = <span class=\"kwrd\">new<\/span> DictionaryEnumerator();\r\n            sw.Reset();\r\n            sw.Start();\r\n            val = 0;\r\n            <span class=\"kwrd\">foreach<\/span> (var kv <span class=\"kwrd\">in<\/span> dict.dict)<span class=\"rem\">\/\/.dict)<\/span>\r\n            {\r\n                val=kv;\r\n                <span class=\"rem\">\/\/val = kv.Key;<\/span>\r\n            }\r\n            sw.Stop();\r\n            Console.WriteLine(<span class=\"str\">\"Time Taken For Dictionary: \"<\/span> + sw.Elapsed.TotalSeconds.ToString() + <span class=\"str\">\" seconds\"<\/span>);\r\n            dict = <span class=\"kwrd\">null<\/span>;\r\n        }\r\n          }\r\n    <span class=\"kwrd\">class<\/span> node\r\n    {\r\n        <span class=\"kwrd\">public<\/span> <span class=\"kwrd\">ulong<\/span> <span class=\"kwrd\">value<\/span>;\r\n        <span class=\"kwrd\">public<\/span> <span class=\"kwrd\">object<\/span> o;\r\n        <span class=\"kwrd\">public<\/span> node Left;\r\n       <span class=\"kwrd\">public<\/span> node Right;\r\n    }\r\n    <span class=\"kwrd\">class<\/span> BinaryTreeEnumerator\r\n    {\r\n        node root;\r\n        <span class=\"kwrd\">int<\/span> nodeCount;\r\n        <span class=\"kwrd\">public<\/span> IEnumerable&lt;node&gt; GetNodes()\r\n        {\r\n            Stack&lt;node&gt; traversalStack = <span class=\"kwrd\">new<\/span> Stack&lt;node&gt;(2 * Program.TreeDepth);\r\n            traversalStack.Push(<span class=\"kwrd\">this<\/span>.root);\r\n            <span class=\"kwrd\">while<\/span> (traversalStack.Count &gt; 0)\r\n            {\r\n                node current;\r\n                current = traversalStack.Pop();\r\n                <span class=\"kwrd\">if<\/span> (current != <span class=\"kwrd\">null<\/span>)\r\n                {\r\n                    <span class=\"kwrd\">if<\/span> (current.<span class=\"kwrd\">value<\/span> != 0)\r\n                    {\r\n                        traversalStack.Push(current.Right);\r\n                        traversalStack.Push(current.Left);\r\n                    }\r\n                    <span class=\"kwrd\">yield<\/span> <span class=\"kwrd\">return<\/span> current;\r\n                }\r\n            }\r\n            <span class=\"rem\">\/\/traversalStack.TrimExcess();<\/span>\r\n        }\r\n        <span class=\"rem\">\/\/Just make a binary tree with a bunch of elements<\/span>\r\n        <span class=\"kwrd\">public<\/span> BinaryTreeEnumerator()\r\n        {\r\n            <span class=\"kwrd\">this<\/span>.root = <span class=\"kwrd\">new<\/span> node() { <span class=\"kwrd\">value<\/span> = 1, o = <span class=\"kwrd\">null<\/span> };\r\n            node current;\r\n            Queue&lt;node&gt; traversalStack = <span class=\"kwrd\">new<\/span> Queue&lt;node&gt;(2*Program.TreeDepth);\r\n            traversalStack.Enqueue(<span class=\"kwrd\">this<\/span>.root);\r\n            <span class=\"kwrd\">while<\/span> (traversalStack.Count &gt; 0 &amp;&amp; nodeCount &lt; Program.ElementNumber)\r\n            {                \r\n                current = traversalStack.Dequeue();\r\n                <span class=\"kwrd\">if<\/span> (current.<span class=\"kwrd\">value<\/span> != 0)\r\n                {\r\n                    current.Left = <span class=\"kwrd\">new<\/span> node() { <span class=\"kwrd\">value<\/span> = 1, o = <span class=\"kwrd\">null<\/span> };\r\n                    current.Right = <span class=\"kwrd\">new<\/span> node() { <span class=\"kwrd\">value<\/span> = 1, o = <span class=\"kwrd\">null<\/span> };\r\n                    nodeCount += 2;\r\n                    traversalStack.Enqueue(current.Right);\r\n                    traversalStack.Enqueue(current.Left);\r\n                }\r\n            }\r\n        }\r\n    }\r\n    <span class=\"kwrd\">class<\/span> ListEnumerator\r\n    {\r\n        <span class=\"kwrd\">public<\/span> List&lt;node&gt; curList;\r\n        <span class=\"kwrd\">public<\/span> ListEnumerator()\r\n        {\r\n            curList = <span class=\"kwrd\">new<\/span> List&lt;node&gt;(Program.ElementNumber);\r\n            <span class=\"kwrd\">for<\/span>(<span class=\"kwrd\">int<\/span> i=0;i&lt;Program.ElementNumber;i++)\r\n            {\r\n                 curList.Add(<span class=\"kwrd\">new<\/span> node(){<span class=\"kwrd\">value<\/span>=1,o=<span class=\"kwrd\">null<\/span>});\r\n            }\r\n        }\r\n        <span class=\"kwrd\">public<\/span> IEnumerable&lt;node&gt; GetNodes()\r\n        {\r\n            <span class=\"kwrd\">foreach<\/span> (node n <span class=\"kwrd\">in<\/span> curList)\r\n            {\r\n                 <span class=\"kwrd\">yield<\/span> <span class=\"kwrd\">return<\/span> n;\r\n            }\r\n        }\r\n    }\r\n    <span class=\"kwrd\">class<\/span> DictionaryEnumerator\r\n    {\r\n        <span class=\"kwrd\">public<\/span> Dictionary&lt;<span class=\"kwrd\">ulong<\/span>,node&gt; dict = <span class=\"kwrd\">new<\/span> Dictionary&lt;<span class=\"kwrd\">ulong<\/span>,node&gt;();\r\n        <span class=\"kwrd\">public<\/span> DictionaryEnumerator()\r\n        {\r\n            <span class=\"kwrd\">for<\/span> (<span class=\"kwrd\">ulong<\/span> i = 0; i &lt; (<span class=\"kwrd\">ulong<\/span>)Program.ElementNumber; i++)\r\n            {\r\n                dict[i] = <span class=\"kwrd\">new<\/span> node() { <span class=\"kwrd\">value<\/span> = i, o = <span class=\"kwrd\">null<\/span> };\r\n            }\r\n        }\r\n    }\r\n}\r\n<\/pre>\r\n<style type=\"text\/css\">.csharpcode, .csharpcode pre\r\n{\r\n\tfont-size: small;\r\n\tcolor: black;\r\n\tfont-family: consolas, \"Courier New\", courier, monospace;\r\n\tbackground-color: #ffffff;\r\n\t\/*white-space: pre;*\/\r\n}\r\n.csharpcode pre { margin: 0em; }\r\n.csharpcode .rem { color: #008000; }\r\n.csharpcode .kwrd { color: #0000ff; }\r\n.csharpcode .str { color: #006080; }\r\n.csharpcode .op { color: #0000c0; }\r\n.csharpcode .preproc { color: #cc6633; }\r\n.csharpcode .asp { background-color: #ffff00; }\r\n.csharpcode .html { color: #800000; }\r\n.csharpcode .attr { color: #ff0000; }\r\n.csharpcode .alt \r\n{\r\n\tbackground-color: #f4f4f4;\r\n\twidth: 100%;\r\n\tmargin: 0em;\r\n}\r\n.csharpcode .lnum { color: #606060; }\r\n<\/style>\r\n\r\n<div style=\"padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; float: none; padding-top: 0px\" id=\"scid:0767317B-992E-4b12-91E0-4F059A8CECA8:1e0e07c9-fcf3-4a2e-901e-501916f4b482\" class=\"wlWriterEditableSmartContent\">Technorati Tags: <a href=\"http:\/\/technorati.com\/tags\/C%23\" rel=\"tag\">C#<\/a>,<a href=\"http:\/\/technorati.com\/tags\/.NET\" rel=\"tag\">.NET<\/a>,<a href=\"http:\/\/technorati.com\/tags\/Performance\" rel=\"tag\">Performance<\/a><\/div>","protected":false},"excerpt":{"rendered":"Determining which data structure to use for storing data involves trade-offs between how much memory they require and how long different operations, such as insertions, deletions, or searches take.&nbsp; In C#, using a linear array is the fastest way to enumerate all of the objects in a collection.&nbsp; However, the use of an array or [&hellip;]","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[1],"tags":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[{"id":188,"url":"http:\/\/evolvedmicrobe.com\/blogs\/?p=188","url_meta":{"origin":35,"position":0},"title":"The .NET Bio BAM Parser is Smoking Fast","date":"October 12, 2013","format":false,"excerpt":"The .NET Bio library has an improved version of it's BAM file\u00a0parser, which makes it significantly faster and easily competitive with the\u00a0current standard C coded SAMTools for obtaining\u00a0sequencing data and working with it. The chart below compares the time it\u00a0takes in seconds for the old version of the parser and\u2026","rel":"","context":"In &quot;.NET Bio&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/10\/img5.gif?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":398,"url":"http:\/\/evolvedmicrobe.com\/blogs\/?p=398","url_meta":{"origin":35,"position":1},"title":".NET Bio is Significantly Faster on .Net Core 2.0","date":"November 5, 2017","format":false,"excerpt":"Summary: With the release of .NET Core 2.0, .NET Bio is able to run significantly faster (~2X) on Mac OSX due to better compilation and memory mangement. The .NET Bio\u00a0library contains libraries for genomic data processing tasks like parsing, alignment, etc. that are too computationally intense to be\u00a0undertaken with interpreted\u2026","rel":"","context":"In \".NET Bio\"","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/Benchmark-1.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":71,"url":"http:\/\/evolvedmicrobe.com\/blogs\/?p=71","url_meta":{"origin":35,"position":2},"title":"Java vs. C# Performance Comparison for Parsing VCF Files","date":"May 26, 2013","format":false,"excerpt":"Making a comparison with a reasonably complex program ported between the two languages. Update 3\/10\/2014: After writing this post I changed the C# parser to remove an extra List<> allocation in the C# code that was not in the Java code.\u00a0\u00a0After this, the Java\/C# versions are indistinguishable on speed, but\u2026","rel":"","context":"In &quot;Algorithms&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/05\/image_thumb1.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":153,"url":"http:\/\/evolvedmicrobe.com\/blogs\/?p=153","url_meta":{"origin":35,"position":3},"title":"Using Selectome with .NET Bio, F# and R","date":"September 16, 2013","format":false,"excerpt":"The Bio.Selectome namespace has features to query\u00a0Selectome.Selectome is a database that merges data from Ensembl\u00a0and the programs in PAML used to compute the ratio of non-synonymous to synonymous (dN\/dS)\u00a0mutations along various branches of the phylogenetic tree. A low dN\/dS ratio\u00a0indicates that the protein sequence is under strong selective constraint, while\u2026","rel":"","context":"In &quot;.NET Bio&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":53,"url":"http:\/\/evolvedmicrobe.com\/blogs\/?p=53","url_meta":{"origin":35,"position":4},"title":"How to remove the &ldquo;Trial Edition&rdquo; banner from the VisiFire open source chart kit","date":"April 13, 2013","format":false,"excerpt":"Visifire is a very good graphing component for making silverlight or WPF applications.\u00a0 The component was first released as an open source library on GoogleCode, but since then has been made a closed source proprietary and for profit project.\u00a0 The newer version contains several enhancements, but the open source version\u2026","rel":"","context":"In &quot;C#&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/04\/image_thumb.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":91,"url":"http:\/\/evolvedmicrobe.com\/blogs\/?p=91","url_meta":{"origin":35,"position":5},"title":"Accessing dbSNP with C# and the .NET Platform","date":"August 22, 2013","format":false,"excerpt":"NCBI Entrez can be accessed with many different platforms (python, R, etc.) , but I find .NET one of the best because the static typing makes it easy to infer what all the datafields mean, and navigate the data with much greater ease. Documentation is sparse for this task, but\u2026","rel":"","context":"In &quot;Bioinformatics&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=\/wp\/v2\/posts\/35"}],"collection":[{"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=35"}],"version-history":[{"count":13,"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=\/wp\/v2\/posts\/35\/revisions"}],"predecessor-version":[{"id":356,"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=\/wp\/v2\/posts\/35\/revisions\/356"}],"wp:attachment":[{"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=35"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=35"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=35"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}