{"id":234,"date":"2013-10-20T18:51:42","date_gmt":"2013-10-20T18:51:42","guid":{"rendered":"http:\/\/evolvedmicrobe.com\/blogs\/?p=234"},"modified":"2014-07-18T21:05:43","modified_gmt":"2014-07-18T21:05:43","slug":"numts-mtdna-sequencing-and-aligners","status":"publish","type":"post","link":"http:\/\/evolvedmicrobe.com\/blogs\/?p=234","title":{"rendered":"NuMTs, mtDNA sequencing and Aligners"},"content":{"rendered":"There are a lot of NuMTs (nuclear-encoded mitochondrial sequences) in the genome, so when the mtDNA is sequenced, some reads may align to the nuclear genome instead of the mtDNA. But how many reads wind up in the nuclear DNA and <span style=\"line-height: 1.714285714; font-size: 1rem;\">where do they go? To answer this, I simulated reads from a diverse collection <\/span>of mitochondria, and tracked where they landed when aligned with bwa mem.\r\n\r\n<span style=\"line-height: 1.714285714; font-size: 1rem;\">The reads were simulated from the whole collection of mtDNA molecules available from phylotree. <\/span><span style=\"line-height: 1.714285714; font-size: 1rem;\">The simulated reads were 100 bp in length, had a 1% error rate, and had an insert size normally distributed around a mean of 150 bp <\/span><span style=\"line-height: 1.714285714; font-size: 1rem;\">with a std. dev. of 30 bp (bounded at a minimum insert size of 40 bp and a maximum of 700 bp).<\/span>\r\n\r\n<span style=\"line-height: 1.714285714; font-size: 1rem;\">After simulating, I aligned the reads with:<\/span>\r\n<pre>bwa mem hg19.fna f.fq r.fq > simulatedData.sam<\/pre>\r\nI found that almost all reads aligned to the mtDNA; only 3% of reads aligned elsewhere. As a result, the distribution of coverage depth across the whole genome is very bimodal. 
Histograms of the coverage depth distribution at sites with data are shown below.\r\n\r\n<a href=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/10\/img51.gif\"><img data-attachment-id=\"207\" data-permalink=\"http:\/\/evolvedmicrobe.com\/blogs\/?attachment_id=207\" data-orig-file=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/10\/img51.gif?fit=1114%2C547\" data-orig-size=\"1114,547\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;}\" data-image-title=\"img5\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/10\/img51.gif?fit=300%2C147\" data-large-file=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/10\/img51.gif?fit=625%2C306\" loading=\"lazy\" class=\"aligncenter size-full wp-image-207\" alt=\"img5\" src=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/10\/img51.gif?resize=625%2C307\" width=\"625\" height=\"307\" data-recalc-dims=\"1\" \/><\/a>\r\n\r\nFor reads that did align to the nuclear genome, the MAPQ was typically 0, but could be as high as 60, and the distribution had an unexplained peak at 27.\r\n\r\n<a href=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/10\/img15.gif\"><img data-attachment-id=\"210\" data-permalink=\"http:\/\/evolvedmicrobe.com\/blogs\/?attachment_id=210\" data-orig-file=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/10\/img15.gif?fit=618%2C398\" data-orig-size=\"618,398\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;}\" data-image-title=\"img15\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/10\/img15.gif?fit=300%2C193\" data-large-file=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/10\/img15.gif?fit=618%2C398\" loading=\"lazy\" class=\"aligncenter size-full wp-image-210\" alt=\"img15\" src=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/10\/img15.gif?resize=618%2C398\" width=\"618\" height=\"398\" data-recalc-dims=\"1\" \/><\/a>\r\n\r\nThe plot below shows the normalized coverage by position across the mtDNA; clearly some regions are more affected by NuMTs than others.\r\n\r\n<a href=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/10\/imgD.gif\"><img data-attachment-id=\"213\" data-permalink=\"http:\/\/evolvedmicrobe.com\/blogs\/?attachment_id=213\" data-orig-file=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/10\/imgD.gif?fit=909%2C436\" data-orig-size=\"909,436\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;}\" data-image-title=\"imgD\" data-image-description=\"\" data-image-caption=\"\" 
data-medium-file=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/10\/imgD.gif?fit=300%2C143\" data-large-file=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/10\/imgD.gif?fit=625%2C300\" loading=\"lazy\" class=\"aligncenter size-full wp-image-213\" alt=\"imgD\" src=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/10\/imgD.gif?resize=625%2C300\" width=\"625\" height=\"300\" data-recalc-dims=\"1\" \/><\/a>\r\n\r\nReads from the first and last 500 bp of the mtDNA are poorly aligned by bwa. It appears most go to chromosome 17, but their true location is revealed by their mate pair. In fact, only 0.6% of reads from this region that map to the nuclear DNA do not have their paired read map to the mtDNA.\r\n\r\n<a href=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/10\/img19.gif\"><img data-attachment-id=\"211\" data-permalink=\"http:\/\/evolvedmicrobe.com\/blogs\/?attachment_id=211\" data-orig-file=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/10\/img19.gif?fit=705%2C467\" data-orig-size=\"705,467\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;}\" data-image-title=\"img19\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/10\/img19.gif?fit=300%2C198\" data-large-file=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/10\/img19.gif?fit=625%2C414\" loading=\"lazy\" class=\"aligncenter size-full wp-image-211\" alt=\"img19\" 
src=\"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/10\/img19.gif?resize=625%2C414\" width=\"625\" height=\"414\" data-recalc-dims=\"1\" \/><\/a>\r\n\r\nI also wanted to see how reads that represent a heteroplasmic deletion would be handled. I simulated reads that either spanned or included a deletion placed at a random position in the mtDNA. Again, virtually all mapped to the mitochondrial genome, and the coverage profile looked similar to the simulation with complete reads. Perhaps most reassuringly, almost all reads were mapped. Checking for unmapped reads with the command:\r\n<pre>samtools view -f 4 Simulated.bam<\/pre>\r\nshowed only one unaligned read out of the millions simulated, and this read had many errors compared with the original sequence it was simulated from. The result of all of this is one large BED file giving every location outside the mtDNA where simulated mtDNA reads aligned.","protected":false},"excerpt":{"rendered":"There are a lot of NuMTs (nuclear-encoded mitochondrial sequences) in the genome, so when the mtDNA is sequenced, some reads may align to the nuclear genome instead of the mtDNA. But how many reads wind up in the nuclear DNA and where do they go? 
To answer this, I simulated reads from a diverse collection\u00a0of [&hellip;]","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[1],"tags":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[{"id":12,"url":"http:\/\/evolvedmicrobe.com\/blogs\/?p=12","url_meta":{"origin":234,"position":0},"title":"Compile Bowtie2 on Windows 64 bit.","date":"January 30, 2013","format":false,"excerpt":"Bowtie 2 is a program that efficiently aligns next generation sequence data to a reference genome. However, the version distributed by the authors only compiles on POSIX platforms. These instructions will allow you to compile it on windows by downloading the Mingw64 tools and editing the make file before building\u2026","rel":"","context":"In &quot;Computing&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/01\/Capture.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":398,"url":"http:\/\/evolvedmicrobe.com\/blogs\/?p=398","url_meta":{"origin":234,"position":1},"title":".NET Bio is Significantly Faster on .Net Core 2.0","date":"November 5, 2017","format":false,"excerpt":"Summary: With the release of .NET Core 2.0, .NET Bio is able to run significantly faster (~2X) on Mac OSX due to better compilation and memory mangement. The .NET Bio\u00a0library contains libraries for genomic data processing tasks like parsing, alignment, etc. 
that are too computationally intense to be\u00a0undertaken with interpreted\u2026","rel":"","context":"In \".NET Bio\"","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2017\/11\/Benchmark-1.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":359,"url":"http:\/\/evolvedmicrobe.com\/blogs\/?p=359","url_meta":{"origin":234,"position":2},"title":"Profiling Rcpp package code on Windows","date":"September 3, 2016","format":false,"excerpt":"Profiling Rcpp code on Unix\/Mac is easy, but is difficult on Windows because R uses a compilation toolchain (MinGW) that produces files that are not understood by common Windows profiling programs.\u00a0 Additionally, the R build process often removes\u00a0symbols which allow profilers to produce sensible interpretations of their data. The following\u2026","rel":"","context":"In \"Optimization\"","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2016\/09\/assembly.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":188,"url":"http:\/\/evolvedmicrobe.com\/blogs\/?p=188","url_meta":{"origin":234,"position":3},"title":"The .NET Bio BAM Parser is Smoking Fast","date":"October 12, 2013","format":false,"excerpt":"The .NET Bio library has an improved version of it's BAM file\u00a0parser, which makes it significantly faster and easily competitive with the\u00a0current standard C coded SAMTools for obtaining\u00a0sequencing data and working with it. The chart below compares the time it\u00a0takes in seconds for the old version of the parser and\u2026","rel":"","context":"In &quot;.NET Bio&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/10\/img5.gif?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":71,"url":"http:\/\/evolvedmicrobe.com\/blogs\/?p=71","url_meta":{"origin":234,"position":4},"title":"Java vs. 
C# Performance Comparison for Parsing VCF Files","date":"May 26, 2013","format":false,"excerpt":"Making a comparison with a reasonably complex program ported between the two languages. Update 3\/10\/2014: After writing this post I changed the C# parser to remove an extra List<> allocation in the C# code that was not in the Java code.\u00a0\u00a0After this, the Java\/C# versions are indistinguishable on speed, but\u2026","rel":"","context":"In &quot;Algorithms&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/evolvedmicrobe.com\/blogs\/wp-content\/uploads\/2013\/05\/image_thumb1.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":35,"url":"http:\/\/evolvedmicrobe.com\/blogs\/?p=35","url_meta":{"origin":234,"position":5},"title":"Comparing data structure enumeration speeds in C#","date":"March 1, 2013","format":false,"excerpt":"Determining which data structure to use for storing data involves trade-offs between how much memory they require and how long different operations, such as insertions, deletions, or searches take.\u00a0 In C#, using a linear array is the fastest way to enumerate all of the objects in a collection.\u00a0 However, the\u2026","rel":"","context":"Similar 
post","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=\/wp\/v2\/posts\/234"}],"collection":[{"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=234"}],"version-history":[{"count":5,"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=\/wp\/v2\/posts\/234\/revisions"}],"predecessor-version":[{"id":349,"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=\/wp\/v2\/posts\/234\/revisions\/349"}],"wp:attachment":[{"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=234"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=234"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/evolvedmicrobe.com\/blogs\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=234"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}