{"id":217821,"date":"2014-04-01T14:06:02","date_gmt":"2014-04-01T10:06:02","guid":{"rendered":"http:\/\/savepearlharbor.com\/?p=217821"},"modified":"-0001-11-30T00:00:00","modified_gmt":"-0001-11-29T21:00:00","slug":"","status":"publish","type":"post","link":"https:\/\/savepearlharbor.com\/?p=217821","title":{"rendered":"<span class=\"post_title\">A Small Hadoop\/MapReduce Performance Test<\/span>"},"content":{"rendered":"<div class=\"content html_format\">   \tA long time ago I asked myself: \u201cHow efficient is MapReduce, really?\u201d<\/p>\n<p>  When the opportunity came up, I decided to test it on a 4-node cluster with the following configuration:<br \/>  \u2014 3 nodes: Intel\u00ae Xeon\u00ae CPU W3530 @ 2.80GHz, 12GB RAM<br \/>  \u2014 1 node: Intel\u00ae Xeon\u00ae CPU X5450 @ 3.00GHz, 
8GB RAM<\/p>\n<p>  OS: Debian, Hadoop 1.2 (from the official website), Java 7 (Oracle).<\/p>\n<p>  Source data:<br \/>  \u2014 XML file: <a href=\"http:\/\/dumps.wikimedia.org\/enwiki\/20130904\/enwiki-20130904-stub-meta-current.xml.gz\">dumps.wikimedia.org\/enwiki\/20130904\/enwiki-20130904-stub-meta-current.xml.gz<\/a><br \/>  \u2014 uncompressed, the file takes up 18 GB;<br \/>  \u2014 31M Wikipedia page records;<br \/>  \u2014 bzip2 compresses the file to 2 GB;<br \/>  \u2014 593,045,627 lines in the file.<\/p>\n<p>  A sample record:<\/p>\n<pre><code class=\"xml\">&lt;page&gt;\n  &lt;title&gt;AfghanistanHistory&lt;\/title&gt;\n  &lt;ns&gt;0&lt;\/ns&gt;\n  &lt;id&gt;13&lt;\/id&gt;\n  &lt;redirect title=&quot;History of Afghanistan&quot; \/&gt;\n  &lt;revision&gt;\n    &lt;id&gt;74466652&lt;\/id&gt;\n    &lt;parentid&gt;15898948&lt;\/parentid&gt;\n    &lt;timestamp&gt;2006-09-08T04:15:52Z&lt;\/timestamp&gt;\n    &lt;contributor&gt;\n      &lt;username&gt;Rory096&lt;\/username&gt;\n      &lt;id&gt;750223&lt;\/id&gt;\n    &lt;\/contributor&gt;\n    &lt;comment&gt;cat rd&lt;\/comment&gt;\n    &lt;text id=&quot;74089594&quot; bytes=&quot;57&quot; \/&gt;\n    &lt;sha1&gt;d4tdz2eojqzamnuockahzcbrgd1t9oi&lt;\/sha1&gt;\n    &lt;model&gt;wikitext&lt;\/model&gt;\n    &lt;format&gt;text\/x-wiki&lt;\/format&gt;\n  &lt;\/revision&gt;\n&lt;\/page&gt;<\/code><\/pre>\n<p>  For the test I picked a simple task that can be solved either in the console with traditional tools or with MapReduce. In short, the task looks like this:<\/p>\n<pre><code class=\"bash\">time bunzip2 -c \/mnt\/hadoop\/data_hadoop\/test.xml.bz2 | grep &quot;&lt;title&gt;&quot; | wc\n31127663 84114856 1382659030\n\nreal\t9m32.953s\nuser\t10m16.779s\nsys\t0m12.737s<\/code><\/pre>\n<p>  The same task was solved on the full Hadoop cluster in 3 minutes 40 seconds. 
(Yes, with parallel decompression; the decompression was done in Java rather than by native code.)<\/p>\n<p>  When the file was stored uncompressed (18 GB), processing on the Hadoop cluster finished in 2 min 30 s (the fastest run took 2 min 12 s), and in this case the disks were loaded at 100%.<\/p>\n<p>  And some food for thought: the file was first recompressed with pbzip2 (which splits its output into independently compressed chunks, so it can decompress them in parallel) and then processed on a single Intel\u00ae Xeon\u00ae CPU W3530 @ 2.80GHz:<\/p>\n<pre><code class=\"bash\">time pbzip2 -d -c -p8 \/mnt\/hadoop\/data_hadoop\/testpbzip.xml.bz2 | grep &quot;&lt;title&gt;&quot; | wc\n31127663 84114856 1382659030\n\nreal\t2m44.507s\nuser\t21m28.493s\nsys\t0m19.833s<\/code><\/pre>\n<p>  I am not going to draw any conclusions, but I have read somewhere online that a Hadoop cluster only starts to show its strengths at 4 nodes and up; presumably they had their reasons for saying so.    
\t<\/p>\n<div class=\"clear\"><\/div>\n<\/p><\/div>\n<p> Link to the original article: <a href=\"http:\/\/habrahabr.ru\/post\/217821\/\"> http:\/\/habrahabr.ru\/post\/217821\/<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[],"tags":[],"class_list":["post-217821","post","type-post","status-publish","format-standard","hentry"]}
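The post does not include the code of the MapReduce job that ran on the cluster. As a hedged illustration only, the console task (count the lines that contain `<title>`) can be recast in map/reduce form. Each mapper counts the matching lines in its own input split and emits a single number, and one reducer sums those numbers. This sketch assumes Hadoop Streaming with awk. The helper names `map_titles` and `reduce_sum`, the sample records, the streaming jar location and the HDFS paths are all hypothetical and are not the author's setup.

```bash
#!/usr/bin/env bash
# Sketch: the "bunzip2 -c | grep '<title>' | wc -l" count in map/reduce form.
# map_titles and reduce_sum are hypothetical helper names for this example.
set -euo pipefail

# Mapper: count the lines of one input split that contain <title>, print one number.
map_titles() {
  awk '/<title>/ { n++ } END { print n + 0 }'
}

# Reducer: add up the per-split counts emitted by all mappers.
reduce_sum() {
  awk '{ s += $1 } END { print s + 0 }'
}

# Local dry run: two made-up "splits", each fed to its own mapper;
# the mapper outputs are then sorted (as the shuffle would do) and reduced.
{
  printf '%s\n' '<page>' '  <title>AfghanistanHistory</title>' '</page>' | map_titles
  printf '%s\n' '<page>' '  <title>Anarchism</title>' '</page>' \
                '<page>' '  <title>Autism</title>' '</page>' | map_titles
} | sort | reduce_sum

# On a Hadoop 1.x cluster the same two awk programs could be handed to
# Hadoop Streaming (jar location and HDFS paths are assumptions):
#
# hadoop jar "$HADOOP_HOME"/contrib/streaming/hadoop-streaming-*.jar \
#   -D mapred.reduce.tasks=1 \
#   -input /data/test.xml.bz2 \
#   -output /data/title-count \
#   -mapper "awk '/<title>/ { n++ } END { print n + 0 }'" \
#   -reducer "awk '{ s += \$1 } END { print s + 0 }'"
```

Run locally, the dry run prints the total for the two fake splits. The mapper emits one count per split rather than one record per matching line. That keeps the shuffle down to a handful of lines instead of tens of millions.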
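For a rough sense of scale, the wall-clock times reported in the post can be turned into ratios. This is only arithmetic on the published numbers (a single reported run each, with 18 GB treated as 18,000 MB), not an additional measurement. Dividing the pbzip2 pipeline's `user` time by its `real` time estimates how many hardware threads the whole pipeline kept busy on average.

```bash
#!/usr/bin/env bash
# Ratios derived from the timings quoted in the post; no new measurements.
set -euo pipefail

single_bunzip2=572.953  # bunzip2 -c | grep | wc on one node: real 9m32.953s
cluster_bz2=220         # Hadoop cluster, bzip2-compressed input: 3m40s
cluster_plain=150       # Hadoop cluster, uncompressed 18 GB input: 2m30s
single_pbzip2=164.507   # pbzip2 -d -c -p8 | grep | wc on one W3530: real 2m44.507s
pbzip2_user=1288.493    # user CPU time of that whole pipeline: 21m28.493s

awk -v a="$single_bunzip2" -v b="$cluster_bz2" -v c="$cluster_plain" \
    -v d="$single_pbzip2" -v u="$pbzip2_user" 'BEGIN {
  printf "cluster (bz2) vs single bunzip2:   %.2fx\n", a / b
  printf "single pbzip2 vs single bunzip2:   %.2fx\n", a / d
  printf "avg busy threads, pbzip2 pipeline: %.1f\n", u / d
  printf "cluster scan rate, uncompressed:   %.0f MB/s\n", 18000 / c
}'
```

With these inputs the script reports about 2.60x, 3.48x, 7.8 busy threads and 120 MB/s aggregate across the cluster.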