{"id":322208,"date":"2021-04-27T21:00:24","date_gmt":"2021-04-27T21:00:24","guid":{"rendered":"http:\/\/savepearlharbor.com\/?p=322208"},"modified":"-0001-11-30T00:00:00","modified_gmt":"-0001-11-29T21:00:00","slug":"","status":"publish","type":"post","link":"https:\/\/savepearlharbor.com\/?p=322208","title":{"rendered":"On commutativity of addition"},"content":{"rendered":"\n<div class=\"post__text post__text-html post__text_v1\" id=\"post-content-body\">Does the generated assembly change if we write (b + a) instead of (a + b)?<br \/>  Let&#8217;s check.<\/p>\n<p>  Let&#8217;s write:  <\/p>\n<pre><code class=\"cpp\">__int128 add1(__int128 a, __int128 b) {     return b + a; }<\/code><\/pre>\n<p>  and compile it with RISC-V gcc 8.2.0:<br \/>  <a name=\"habracut\"><\/a><br \/>  <code>add1(__int128, __int128):<br \/>  .LFB0:<br \/>   .cfi_startproc<br \/>   add a0,a2,a0<br \/>   sltu a2,a0,a2<br \/>   add a1,a3,a1<br \/>   add a1,a2,a1<br \/>   ret<\/code><\/p>\n<p>  Now write the following:<\/p>\n<pre><code class=\"cpp\">__int128 add1(__int128 a, __int128 b) {     return a + b; }<\/code><\/pre>\n<p>  And get:<\/p>\n<p>  <code>add1(__int128, __int128):<br \/>  .LFB0:<br \/>   .cfi_startproc<br \/>   mv a5,a0<br \/>   add a0,a0,a2<br \/>   sltu a5,a0,a5<br \/>   add a1,a1,a3<br \/>   add a1,a5,a1<br \/>   ret<br \/>  <\/code><br \/>  The difference is obvious: the second version needs an extra mv instruction to save a0 before it is overwritten.<\/p>\n<p>  Now do the same with Clang (rv64gc trunk). Both versions produce the same code:<br \/>  <code>add1(__int128, __int128): # @add1(__int128, __int128)<br \/>   add a1, a1, a3<br \/>   add a0, a0, a2<br \/>   sltu a2, a0, a2<br \/>   add a1, a1, a2<br \/>   ret<\/code><br \/>  This is equivalent to what gcc produced in the first case. Compilers are smart now, but not that smart yet.<\/p>\n<p>  Let&#8217;s find out what happened here and why. 
The arguments of the function __int128 add1(__int128 a, __int128 b) are passed in registers a0-a3 in the following order: a0 holds the low word of operand \u00aba\u00bb, a1 the high word of \u00aba\u00bb, a2 the low word of \u00abb\u00bb, and a3 the high word of \u00abb\u00bb. The result is returned in the same order, with the low word in a0 and the high word in a1.<\/p>\n<p>  First the high words of the two arguments are added, with the sum placed in a1, and then the low words, with the sum placed in a0. The low sum is then compared against a2, i.e. the low word of operand \u00abb\u00bb, to find out whether the addition of the low words overflowed. If it did, the unsigned sum is less than either operand. Because the original value of a0 has been overwritten by the sum, the a2 register is used for the comparison. If a0 &lt; a2, an overflow has happened and a2 is set to \u00ab1\u00bb; otherwise it is set to \u00ab0\u00bb. This bit is then added to the high word of the result. The result is now in (a1, a0). This also explains the extra mv in the second gcc listing: there the sum is compared against the low word of \u00aba\u00bb, which lives in a0 and is overwritten by the add, so it has to be copied to a5 first.<\/p>\n<p>  Clang (rv32gc trunk) generates exactly the same code for a 32-bit core when the function takes 64-bit arguments and returns a 64-bit result:<\/p>\n<pre><code class=\"cpp\">long long add1(long long a, long long b) {     return a + b; }<\/code><\/pre>\n<p>  The assembly:<br \/>  <code>add1(long long, long long): # @add1(long long, long long)<br \/>   add a1, a1, a3<br \/>   add a0, a0, a2<br \/>   sltu a2, a0, a2<br \/>   add a1, a1, a2<br \/>   ret<\/code><br \/>  The code is exactly the same. Unfortunately, the __int128 type is not supported by compilers for 32-bit architectures.<\/p>\n<p>  There is a small opportunity here for a microarchitectural optimization. According to the RISC-V architecture standard, a microarchitecture can (but does not have to) detect the instruction pairs (MULH[[S]U] rdh, rs1, rs2; MUL rdl, rs1, rs2) and (DIV[U] rdq, rs1, rs2; REM[U] rdr, rs1, rs2) and process each pair as one instruction. 
Similarly, it could detect the pair (add rdl, rs1, rs2; sltu rdh, rdl, rs1\/rs2) and set the overflow bit in the rdh register right away.<\/p><\/div>\n<p> Link to the original article <a href=\"https:\/\/habr.com\/ru\/post\/554760\/\"> https:\/\/habr.com\/ru\/post\/554760\/<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"\n<div class=\"post__text post__text-html post__text_v1\" id=\"post-content-body\">Does the generated assembly change if we write (b + a) instead of (a + b)?<br \/>  Let&#8217;s check.<\/p>\n<p>  Let&#8217;s write:  <\/p>\n<pre><code class=\"cpp\">__int128 add1(__int128 a, __int128 b) {     return b + a; }<\/code><\/pre>\n<p>  and compile it with RISC-V gcc 8.2.0:  <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[],"tags":[],"class_list":["post-322208","post","type-post","status-publish","format-standard","hentry"],"_links":{"self":[{"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=\/wp\/v2\/posts\/322208","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=322208"}],"version-history":[{"count":0,"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=\/wp\/v2\/posts\/322208\/revisions"}],"wp:attachment":[{"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=322208"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/savepearlharbor.com\/in
dex.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=322208"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=322208"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}