{"id":398,"date":"2017-11-05T17:53:11","date_gmt":"2017-11-05T17:53:11","guid":{"rendered":"http:\/\/evolvedmicrobe.com\/blogs\/?p=398"},"modified":"2017-11-05T20:34:35","modified_gmt":"2017-11-05T20:34:35","slug":"net-bio-is-significantly-faster-on-net-core-2-0","status":"publish","type":"post","link":"http:\/\/evolvedmicrobe.com\/blogs\/?p=398","title":{"rendered":".NET Bio is Significantly Faster on .NET Core 2.0"},"content":{"rendered":"<strong>Summary: With the release of .NET Core 2.0, .NET Bio is able to run significantly faster (~2X) on Mac OSX due to better compilation and memory management.<\/strong>\r\n<p>\r\nThe <a href=\"https:\/\/github.com\/dotnetbio\/bio\">.NET Bio<\/a>\u00a0library contains code for genomic data processing tasks such as parsing and alignment that are too computationally intense to be\u00a0undertaken with interpreted languages like Python or R, but can be efficiently performed in statically typed, compiled languages like Java and C#, without suffering the productivity burden of coding in C\/C++.\r\n\r\n<p>\r\n\r\nHistorically, .NET Bio ran on two different frameworks.\u00a0 On Windows, one could leverage all the substantial engineering Microsoft invested in their <a href=\"https:\/\/en.wikipedia.org\/wiki\/Common_Language_Runtime\">CLR<\/a>\u00a0to create fast and stable programs.\u00a0 On Mac or Linux, one could use the independently implemented <a href=\"http:\/\/www.mono-project.com\/\">Mono Framework<\/a>, a\u00a0solid effort by a significantly less well-resourced team that performed well but never really came close to the quality of the Microsoft implementation.\u00a0 In practice, as bioinformaticians are weaned on the Linux\/GCC toolchain and infrastructure, Mono was the more frequently used option.\u00a0 This led to performance and reliability issues.\r\n<p>\r\nAll of this changed when Microsoft decided to <a href=\"http:\/\/www.zdnet.com\/article\/microsofts-open-sourcing-of-net-the-back-story\/\">open source 
.NET<\/a>, releasing a version of their framework for Mac, Linux and Windows called <a href=\"https:\/\/github.com\/dotnet\/core\">.NET Core<\/a>.\u00a0 This was a watershed moment, as it allows open source developers to take advantage of all the advanced memory management and compilation techniques Microsoft developed.\u00a0 Better garbage collection, advanced pre-compilation and vector instructions are all allowing higher-level languages like C# to perform nearly as well as aggressively optimized C\/C++ for many practical applications.\r\n<p>\r\nTo test out the\u00a0new possibilities, I <strong>compared the time it takes to parse all the reads in a 1.2 GB BAM<\/strong>\u00a0file using either the old Mono runtime or the new .NET Core runtime on my MacBook computer.\u00a0 The test code simply counted the reads after they were deserialized from the compressed format and converted into objects.\r\n<pre class=\"tab-convert:true lang:c# decode:true\" title=\"Profiling Code\">\/\/ fname holds the path to the BAM file\r\nvar p = new BAMParser();\r\nint count = 0;\r\nforeach (var read in p.Parse(fname))\r\n{ count += 1;}\r\nConsole.WriteLine(count);\r\n<\/pre>\r\nThe test case was timed on each platform as follows.\u00a0 Timings were taken in triplicate and in random order, and the results are shown below.\u00a0 I also benchmarked an equivalent task using the industry-standard <strong>samtools<\/strong>, which is built on <a href=\"https:\/\/github.com\/samtools\/htslib\"><code>htslib<\/code><\/a>, a C library for the same task that uses tons of bit operations and is about as performant as one can get <sup id=\"footnote_plugin_tooltip_1\" class=\"footnote_plugin_tooltip_text\" onclick=\"footnote_moveToAnchor('footnote_plugin_reference_1');\">1)<\/sup><span class=\"footnote_tooltip\" id=\"footnote_plugin_tooltip_text_1\">This is for methods that fully parse the BAM read. PySAM, which <a 
href=\"https:\/\/github.com\/pysam-developers\/pysam\/blob\/0e08158ab43757064a6541be461e456bd5176f8c\/pysam\/libcalignedsegment.pyx#L757\">lazily decodes the data<\/a>, can finish a simple counting task in about 15 seconds, but this isn&#8217;t representative of a workflow that requires full decoding.<\/span><script type=\"text\/javascript\">\tjQuery(\"#footnote_plugin_tooltip_1\").tooltip({\t\ttip: \"#footnote_plugin_tooltip_text_1\",\t\ttipClass: \"footnote_tooltip\",\t\teffect: \"fade\",\t\tfadeOutSpeed: 100,\t\tpredelay: 400,\t\tposition: \"top right\",\t\trelative: true,\t\toffset: [10, 10]\t});<\/script>.\r\n<pre class=\"lang:sh decode:true\" title=\"Timing commands\"># Run example on .NET Core 2.0\r\ntime dotnet test.dll\r\n# On Mono\r\ntime mono test.dll\r\n# Using samtools\r\ntime samtools view test.bam | wc -l<\/pre>\r\n<h2><a href=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/Benchmark-1.png\"><img data-attachment-id=\"414\" data-permalink=\"http:\/\/evolvedmicrobe.com\/blogs\/?attachment_id=414\" data-orig-file=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/Benchmark-1.png?fit=480%2C250\" data-orig-size=\"480,250\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Benchmark\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/Benchmark-1.png?fit=300%2C156\" data-large-file=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/Benchmark-1.png?fit=480%2C250\" 
loading=\"lazy\" class=\"size-full wp-image-414 aligncenter\" src=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/Benchmark-1.png?resize=480%2C250\" alt=\"\" width=\"480\" height=\"250\" srcset=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/Benchmark-1.png?w=480 480w, https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/Benchmark-1.png?resize=300%2C156 300w\" sizes=\"(max-width: 480px) 100vw, 480px\" data-recalc-dims=\"1\" \/><\/a><\/h2>\r\n<p>\r\n<h2>Profiling Each Platform<\/h2>\r\n<strong>Mono:\u00a0<\/strong>The Mono platform allows us to easily <a href=\"http:\/\/tirania.org\/monomac\/\/archive\/2013\/Jan.html\">profile the code using Instruments on Mac OSX<\/a>.\u00a0 Examining the profile of the running code, we can see where the bottlenecks are.\u00a0 A portion of the time is spent in <code>libz<\/code> doing decompression of the underlying gzipped data, but it appears that much is also spent on events associated with garbage collection, e.g. 
preparing for memory allocation with\u00a0<code>mono_handle_new<\/code>.\r\n\r\n<a href=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/mono_profile.png\"><img data-attachment-id=\"410\" data-permalink=\"http:\/\/evolvedmicrobe.com\/blogs\/?attachment_id=410\" data-orig-file=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/mono_profile.png?fit=1474%2C370\" data-orig-size=\"1474,370\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"mono_profile\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/mono_profile.png?fit=300%2C75\" data-large-file=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/mono_profile.png?fit=625%2C157\" loading=\"lazy\" class=\"size-full wp-image-410 aligncenter\" src=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/mono_profile.png?resize=625%2C157\" alt=\"\" width=\"625\" height=\"157\" srcset=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/mono_profile.png?w=1474 1474w, https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/mono_profile.png?resize=300%2C75 300w, https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/mono_profile.png?resize=768%2C193 768w, https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/mono_profile.png?resize=1024%2C257 1024w, 
https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/mono_profile.png?resize=624%2C157 624w, https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/mono_profile.png?w=1250 1250w\" sizes=\"(max-width: 625px) 100vw, 625px\" data-recalc-dims=\"1\" \/><\/a>\r\n<p>\r\n<strong>.NET Core 2.0:\u00a0<\/strong> Because .NET Core is newly available on multiple platforms, there is <a href=\"https:\/\/stackoverflow.com\/questions\/47103572\/profile-a-net-core-application-on-mac\/47105230#47105230\">no good way to profile .NET Core 2.0 on Mac OSX<\/a>.\u00a0 Although typical profilers cannot understand the C# <a href=\"https:\/\/en.wikipedia.org\/wiki\/Just-in-time_compilation\">JITted<\/a> code, we can look at the natively compiled C\/C++ code, and here we see that .NET Core spends twice as large a percentage of its time in <code>libz<\/code> as Mono does.\u00a0 Since much of this task is simply decompressing data, a larger share of time in the decompression library means the rest of the code is running faster, which is a good sign.\u00a0 \u00a0<a href=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/dotnetcore.png\"><img data-attachment-id=\"413\" data-permalink=\"http:\/\/evolvedmicrobe.com\/blogs\/?attachment_id=413\" data-orig-file=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/dotnetcore.png?fit=1950%2C676\" data-orig-size=\"1950,676\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"dotnetcore\" data-image-description=\"\" data-image-caption=\"\" 
data-medium-file=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/dotnetcore.png?fit=300%2C104\" data-large-file=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/dotnetcore.png?fit=625%2C217\" loading=\"lazy\" class=\"size-full wp-image-413 aligncenter\" src=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/dotnetcore.png?resize=625%2C217\" alt=\"\" width=\"625\" height=\"217\" srcset=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/dotnetcore.png?w=1950 1950w, https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/dotnetcore.png?resize=300%2C104 300w, https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/dotnetcore.png?resize=768%2C266 768w, https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/dotnetcore.png?resize=1024%2C355 1024w, https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/dotnetcore.png?resize=624%2C216 624w, https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/dotnetcore.png?w=1250 1250w, https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/dotnetcore.png?w=1875 1875w\" sizes=\"(max-width: 625px) 100vw, 625px\" data-recalc-dims=\"1\" \/><\/a>\r\n<p>\r\nTo further profile the managed code, I\u00a0wrote a <a href=\"https:\/\/github.com\/evolvedmicrobe\/clr-samples\">custom MacOS Profiler<\/a>\u00a0that changed how each method was compiled, inserting a hook that made a P\/Invoke call to C++ code at the entrance and exit of each managed method frame.\u00a0 This let me track the enter and exit time of each function and so calculate the inclusive time spent in each method.\u00a0 Although the addition of this instrumentation is expected to greatly skew the results, we saw that the majority of the time in managed code is spent parsing the data, and much of that is spent doing 
validations!\r\n\r\n\n<table id=\"tablepress-1\" class=\"tablepress tablepress-id-1\">\n<thead>\n<tr class=\"row-1 odd\">\n\t<th class=\"column-1\">Method Name<\/th><th class=\"column-2\">Inclusive Time (s)<\/th><th class=\"column-3\">Call Count<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"row-hover\">\n<tr class=\"row-2 even\">\n\t<td class=\"column-1\">Bio.IO.BAM.BAMParser.GetAlignedSequence<\/td><td class=\"column-2\">2.01235<\/td><td class=\"column-3\">65255<\/td>\n<\/tr>\n<tr class=\"row-3 odd\">\n\t<td class=\"column-1\">Bio.Util.Helper.StringContainsIllegalCharacters<\/td><td class=\"column-2\">0.67041<\/td><td class=\"column-3\">1487805<\/td>\n<\/tr>\n<tr class=\"row-4 even\">\n\t<td class=\"column-1\">Bio.IO.SAM.SAMOptionalField.set_Tag<\/td><td class=\"column-2\">0.551263<\/td><td class=\"column-3\">711275<\/td>\n<\/tr>\n<tr class=\"row-5 odd\">\n\t<td class=\"column-1\">Bio.IO.SAM.SAMOptionalField.set_VType<\/td><td class=\"column-2\">0.549131<\/td><td class=\"column-3\">711275<\/td>\n<\/tr>\n<tr class=\"row-6 even\">\n\t<td class=\"column-1\">Bio.IO.SAM.SAMOptionalField.IsValidTag<\/td><td class=\"column-2\">0.547594<\/td><td class=\"column-3\">711275<\/td>\n<\/tr>\n<tr class=\"row-7 odd\">\n\t<td class=\"column-1\">Bio.IO.SAM.SAMOptionalField.IsValidVType<\/td><td class=\"column-2\">0.54615<\/td><td class=\"column-3\">711275<\/td>\n<\/tr>\n<tr class=\"row-8 even\">\n\t<td class=\"column-1\">Bio.IO.SAM.SAMOptionalField.set_Value<\/td><td class=\"column-2\">0.543942<\/td><td class=\"column-3\">711275<\/td>\n<\/tr>\n<tr class=\"row-9 odd\">\n\t<td class=\"column-1\">Bio.IO.SAM.SAMOptionalField.IsValidValue<\/td><td class=\"column-2\">0.543379<\/td><td class=\"column-3\">711275<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<!-- #tablepress-1 from cache -->\r\n\r\n<strong>SAMTools<\/strong>:\u00a0Continuing the trend that the more performant library spends more time in <code>libz<\/code>, samtools spends half of its time there, and the 
other half doing the decoding work that converts the binary data into usable data (in <code>sam_format1<\/code>). SAMTools is optimized C code with minimal error checking, so it is a rough upper bound on the speed that could be obtained.\r\n\r\n<a href=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/samtools.png\"><img data-attachment-id=\"411\" data-permalink=\"http:\/\/evolvedmicrobe.com\/blogs\/?attachment_id=411\" data-orig-file=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/samtools.png?fit=1658%2C620\" data-orig-size=\"1658,620\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"samtools profiling\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/samtools.png?fit=300%2C112\" data-large-file=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/samtools.png?fit=625%2C234\" loading=\"lazy\" class=\"size-full wp-image-411 aligncenter\" src=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/samtools.png?resize=625%2C234\" alt=\"\" width=\"625\" height=\"234\" srcset=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/samtools.png?w=1658 1658w, https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/samtools.png?resize=300%2C112 300w, https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/samtools.png?resize=768%2C287 768w, 
https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/samtools.png?resize=1024%2C383 1024w, https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/samtools.png?resize=624%2C233 624w, https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/samtools.png?w=1250 1250w\" sizes=\"(max-width: 625px) 100vw, 625px\" data-recalc-dims=\"1\" \/><\/a>\r\n<p>\r\n<h2>Conclusions<\/h2>\r\nThe C# code was written without performance in mind and has tons of validation checks and unnecessary copies.  Yet .NET Core performed only 2X slower than high-performance C code.\u00a0 This is because, thanks to the engineering in .NET Core, the code can now run twice as fast on Mac OSX as it used to.\r\n<p>\r\nAlthough C# is still slower than C, it is\u00a0the clear winner here.\u00a0 The SAMTools code bottlenecks at a <a href=\"https:\/\/github.com\/samtools\/htslib\/blob\/b36740b6f82a74d8ccec022f0f641a966339fa70\/sam.c#L1290-L1423\">method<\/a>\u00a0that, although heavily optimized, is horrifically difficult to read.\u00a0 In contrast, the .NET code bottlenecks at a <a href=\"https:\/\/github.com\/dotnetbio\/bio\/blob\/5fe52bfc4fa94ff6a55b9f49d913f08b75abc593\/Source\/Bio.Core\/IO\/BAM\/BAMParser.cs#L949-L1171\">method<\/a> that is filled with nearly gratuitous memory allocations and a plethora of checks where useful exceptions could be thrown and meaningful error messages delivered if assumptions aren&#8217;t met.\u00a0 Plus, it has all the advantages of memory safety and\u00a0quality tooling we&#8217;ve come to expect in 2017.\u00a0 It&#8217;s\u00a0intriguing to\u00a0ponder how the benchmark would look if the\u00a0C# code had been written in a style as performance-oriented as the C code, by avoiding unnecessary allocations, using pointers, or using the new <a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/csharp\/programming-guide\/classes-and-structs\/ref-returns\">ref returns available in C# 
7.0<\/a>. I&#8217;ve shown previously that such changes could significantly increase the parser&#8217;s performance.\r\n<p>\r\nThe .NET ecosystem is moving along well for meaningful scientific computing. Importantly, although carefully optimized C\/C++ code will always be more performant, .NET Core is now open source, so we can embed any time-critical functionality in the runtime itself, allowing us to have things like ludicrously performant suffix arrays interacting with parsers.\u00a0 <strong>.NET Core 2.0 is a big step forward.<\/strong><div class=\"footnote_container_prepare\">\t<p><span onclick=\"footnote_expand_reference_container();\">References<\/span><span style=\"display: none;\">&nbsp;&nbsp;&nbsp;[ <a id=\"footnote_reference_container_collapse_button\" style=\"cursor:pointer;\" onclick=\"footnote_expand_collapse_reference_container();\">+<\/a> ]<\/span><\/p><\/div><div id=\"footnote_references_container\" style=\"\">\t<table class=\"footnote-reference-container\">\t\t<tbody>\t\t<tr>\t<td class=\"footnote_plugin_index\"><span id=\"footnote_plugin_reference_1\">1.<\/span><\/td>\t<td class=\"footnote_plugin_link\"><span onclick=\"footnote_moveToAnchor('footnote_plugin_tooltip_1');\">&#8593;<\/span><\/td>\t<td class=\"footnote_plugin_text\">This is for methods that fully parse the BAM read. PySAM, which <a href=\"https:\/\/github.com\/pysam-developers\/pysam\/blob\/0e08158ab43757064a6541be461e456bd5176f8c\/pysam\/libcalignedsegment.pyx#L757\">lazily decodes the data<\/a>, can finish a simple counting task in about 15 seconds, but this isn&#8217;t representative of a workflow that requires full decoding.<\/td><\/tr>\t\t<\/tbody>\t<\/table><\/div><script type=\"text\/javascript\">\tfunction footnote_expand_reference_container() {\t\tjQuery(\"#footnote_references_container\").show();        jQuery(\"#footnote_reference_container_collapse_button\").text(\"-\");\t}    function footnote_collapse_reference_container() {        
jQuery(\"#footnote_references_container\").hide();        jQuery(\"#footnote_reference_container_collapse_button\").text(\"+\");    }\tfunction footnote_expand_collapse_reference_container() {\t\tif (jQuery(\"#footnote_references_container\").is(\":hidden\")) {            footnote_expand_reference_container();\t\t} else {            footnote_collapse_reference_container();\t\t}\t}    function footnote_moveToAnchor(p_str_TargetID) {        footnote_expand_reference_container();        var l_obj_Target = jQuery(\"#\" + p_str_TargetID);        if(l_obj_Target.length) {            jQuery('html, body').animate({                scrollTop: l_obj_Target.offset().top - window.innerHeight\/2            }, 1000);        }    }<\/script>","protected":false},"excerpt":{"rendered":"Summary: With the release of .NET Core 2.0, .NET Bio is able to run significantly faster (~2X) on Mac OSX due to better compilation and memory management. The .NET Bio\u00a0library contains libraries for genomic data processing tasks like parsing, alignment, etc. 
that are too computationally intense to be\u00a0undertaken with interpreted languages like Python or R, [&hellip;]","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[1],"tags":[25,20],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[{"id":188,"url":"http:\/\/evolvedmicrobe.com\/blogs\/?p=188","url_meta":{"origin":398,"position":0},"title":"The .NET Bio BAM Parser is Smoking Fast","date":"October 12, 2013","format":false,"excerpt":"The .NET Bio library has an improved version of it's BAM file\u00a0parser, which makes it significantly faster and easily competitive with the\u00a0current standard C coded SAMTools for obtaining\u00a0sequencing data and working with it. The chart below compares the time it\u00a0takes in seconds for the old version of the parser and\u2026","rel":"","context":"In &quot;.NET Bio&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/10\/img5.gif?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":153,"url":"http:\/\/evolvedmicrobe.com\/blogs\/?p=153","url_meta":{"origin":398,"position":1},"title":"Using Selectome with .NET Bio, F# and R","date":"September 16, 2013","format":false,"excerpt":"The Bio.Selectome namespace has features to query\u00a0Selectome.Selectome is a database that merges data from Ensembl\u00a0and the programs in PAML used to compute the ratio of non-synonymous to synonymous (dN\/dS)\u00a0mutations along various branches of the phylogenetic tree. 
A low dN\/dS ratio\u00a0indicates that the protein sequence is under strong selective constraint, while\u2026","rel":"","context":"In &quot;.NET Bio&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":299,"url":"http:\/\/evolvedmicrobe.com\/blogs\/?p=299","url_meta":{"origin":398,"position":2},"title":"C# vs. Java, Xamarin vs. Oracle, Performance Comparison version 2.0","date":"June 14, 2014","format":false,"excerpt":"Today I noticed the SIMD implementation of the Mandelbrot set algorithm I blogged about last year was successfully submitted to the language shootout webpage. However, I was a bit disappointed to see the C# version was still slower than the Java version, despite my use of the special SIMD instructions\u2026","rel":"","context":"Similar post","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":71,"url":"http:\/\/evolvedmicrobe.com\/blogs\/?p=71","url_meta":{"origin":398,"position":3},"title":"Java vs. C# Performance Comparison for Parsing VCF Files","date":"May 26, 2013","format":false,"excerpt":"Making a comparison with a reasonably complex program ported between the two languages. Update 3\/10\/2014: After writing this post I changed the C# parser to remove an extra List<> allocation in the C# code that was not in the Java code.\u00a0\u00a0After this, the Java\/C# versions are indistinguishable on speed, but\u2026","rel":"","context":"In &quot;Algorithms&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/05\/image_thumb1.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":91,"url":"http:\/\/evolvedmicrobe.com\/blogs\/?p=91","url_meta":{"origin":398,"position":4},"title":"Accessing dbSNP with C# and the .NET Platform","date":"August 22, 2013","format":false,"excerpt":"NCBI Entrez can be accessed with many different platforms (python, R, etc.) 
, but I find .NET one of the best because the static typing makes it easy to infer what all the datafields mean, and navigate the data with much greater ease. Documentation is sparse for this task, but\u2026","rel":"","context":"In &quot;Bioinformatics&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":112,"url":"http:\/\/evolvedmicrobe.com\/blogs\/?p=112","url_meta":{"origin":398,"position":5},"title":"Mono.Simd and the Mandlebrot Set.","date":"September 10, 2013","format":false,"excerpt":"C# and .NET are some of the fastest high level languages, but still cannot truly compete with C\/C++ for low level speed, and C# code can be anywhere from 20%-300% slower. This is despite the fact that the C# compiler often gets as much information about a method as the\u2026","rel":"","context":"In &quot;Algorithms&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/09\/img2_thumb.gif?resize=350%2C200","width":350,"height":200},"classes":[]}],"_links":{"self":[{"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=\/wp\/v2\/posts\/398"}],"collection":[{"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=398"}],"version-history":[{"count":21,"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=\/wp\/v2\/posts\/398\/revisions"}],"predecessor-version":[{"id":427,"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=\/wp\/v2\/posts\/398\/revisions\/427"}],"wp:attachment":[{"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=398"}],"wp:term":[{"taxonomy":"category"
,"embeddable":true,"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=398"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=398"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}