{"id":465763,"date":"2025-07-03T09:07:33","date_gmt":"2025-07-03T09:07:33","guid":{"rendered":"http:\/\/savepearlharbor.com\/?p=465763"},"modified":"-0001-11-30T00:00:00","modified_gmt":"-0001-11-29T21:00:00","slug":"","status":"publish","type":"post","link":"https:\/\/savepearlharbor.com\/?p=465763","title":{"rendered":"<span>Whose feature is better, or how to compare the efficiency of SQL query plans<\/span>"},"content":{"rendered":"<div><!--[--><!--]--><\/div>\n<div id=\"post-content-body\">\n<div>\n<div class=\"article-formatted-body article-formatted-body article-formatted-body_version-2\">\n<div xmlns=\"http:\/\/www.w3.org\/1999\/xhtml\">\n<figure class=\"full-width\"><img decoding=\"async\" src=\"https:\/\/habrastorage.org\/r\/w1560\/getpro\/habr\/upload_files\/089\/a2d\/a48\/089a2da484b1256fcd340210045134b4.png\" width=\"1280\" height=\"722\" sizes=\"auto, (max-width: 780px) 100vw, 50vw\" srcset=\"https:\/\/habrastorage.org\/r\/w780\/getpro\/habr\/upload_files\/089\/a2d\/a48\/089a2da484b1256fcd340210045134b4.png 780w,&#10;       https:\/\/habrastorage.org\/r\/w1560\/getpro\/habr\/upload_files\/089\/a2d\/a48\/089a2da484b1256fcd340210045134b4.png 781w\" loading=\"lazy\" decode=\"async\"\/><\/figure>\n<p>The most problematic part of academic papers on various aspects of database system technologies, for me, is the\u00a0performance evaluation\u00a0section. It hardly needs saying that this part of the research is often key for the developer, as it justifies the very reason we\u2019re spending time reading the entire preceding text. Equally important is that this section must ensure reproducibility and allow for independent analysis.<\/p>\n<p>In fields like fluid dynamics and heat engineering, for instance, the theory of similarity has been developed. 
It lets researchers express experimental results in dimensionless quantities (e.g., the Nusselt, Prandtl, and Reynolds numbers), so that an experiment repeated under slightly different conditions can still be compared meaningfully.<\/p>\n<p>I haven\u2019t seen anything like that in the field of database systems. The testing section usually includes a brief description of the hardware\/software setup and a set of graphs, where the main evaluated parameter is either execution time or TPS (transactions per second).<\/p>\n<p>This approach seems to be the only viable one when comparing different DBMSs and deciding which configuration to use in production. However, execution time depends on many factors, including server settings, caching effects, the chosen query plan, and the use of parallelism.<\/p>\n<p>Now imagine we&#8217;re creating a new query optimization method and want to compare it with one already published. Having only execution-time graphs labeled &#8220;before&#8221; and &#8220;after&#8221; with a brief description of the test platform \u2014 like <a href=\"https:\/\/arxiv.org\/abs\/1902.08291\" rel=\"noopener noreferrer nofollow\">here<\/a> or <a href=\"https:\/\/www.vldb.org\/pvldb\/vol16\/p2962-zhang.pdf\" rel=\"noopener noreferrer nofollow\">here<\/a> \u2014 we hit a wall: our figures don&#8217;t align with the published ones due to the large number of unknowns. In this case, we need a measurable parameter that excludes the influence of other DBMS subsystems, is more portable, and is easier to analyze. 
I&#8217;m sure developers of, for example, a new storage engine would also appreciate removing the optimizer\u2019s impact from their benchmarks.<\/p>\n<p>Trying to reproduce experiments from papers or compare my method to the one proposed by an author, I keep running into the same problem: the uncertainty of the commonly used\u00a0execution time\u00a0metric is too high to draw solid conclusions \u2014 it more accurately reflects the efficiency of code in a specific runtime environment than the quality of the query plan itself.<\/p>\n<p>Execution time is a noisy metric: even running the same benchmark test ten times on the same machine and the same instance can yield significant variation in results. For example, the <a href=\"https:\/\/github.com\/danolivo\/conf\/blob\/main\/Benches\/pages-fetched-criteria\/job_stats1.sh\" rel=\"noopener noreferrer nofollow\">ten consecutive runs<\/a> of all 113 queries from the <a href=\"https:\/\/github.com\/danolivo\/jo-bench\" rel=\"noopener noreferrer nofollow\">Join Order Benchmark<\/a> (JOB) on my desktop showed up to 50% variance in execution time \u2014 and that\u2019s under ideal, fully controlled conditions. What kind of variance will a third-party researcher get, and how are they supposed to analyze it?<\/p>\n<figure class=\"full-width\"><img decoding=\"async\" src=\"https:\/\/habrastorage.org\/r\/w1560\/getpro\/habr\/upload_files\/268\/088\/fbd\/268088fbd4ba6e4d5ce19a6e565048e2.png\" width=\"662\" height=\"410\" sizes=\"auto, (max-width: 780px) 100vw, 50vw\" srcset=\"https:\/\/habrastorage.org\/r\/w780\/getpro\/habr\/upload_files\/268\/088\/fbd\/268088fbd4ba6e4d5ce19a6e565048e2.png 780w,&#10;       https:\/\/habrastorage.org\/r\/w1560\/getpro\/habr\/upload_files\/268\/088\/fbd\/268088fbd4ba6e4d5ce19a6e565048e2.png 781w\" loading=\"lazy\" decode=\"async\"\/><\/figure>\n<p>How are we supposed to compare query plans executed with different numbers of parallel workers? 
On the test machine, parallelism gives a speed boost, but in production, where hundreds of backends compete, it may become a bottleneck. So maybe we need a better metric?<\/p>\n<p>In the specific area of query optimization, execution time is too broad a measure. For comparing different optimization approaches, or for evaluating the effect of a new transformation in the same PostgreSQL optimizer, a more focused characteristic works better.<\/p>\n<p>Since the core work of a DBMS is operating on data, a natural choice would be the number of operations performed on table rows during query execution, weighted by the number of attributes per row. Minimizing this parameter would indicate a more efficient query plan.<\/p>\n<p>However, collecting such statistics is quite involved, so let\u2019s look for something slightly less precise but easier to obtain. For instance, DBAs often use the number of pages read. A page here means a buffer cache page, which is also a table data block on disk.<\/p>\n<p>It doesn\u2019t make much sense to account separately for pages in RAM versus on disk: that tells us more about the eviction strategy and disk behavior than about the efficiency of the query plan. For our purpose, it\u2019s sufficient to simply sum these values. We should also include temporary disk cache pages used by sorting, hashing, and other algorithms to spill rows that didn\u2019t fit in memory.<\/p>\n<p>Note that the same page sometimes needs to be counted more than once. During a sequential scan, we access a page once to read its tuples; during rescan operations (like the inner side of a nested loop join), we read the page again, and should count it again.<\/p>\n<p>Postgres already provides the infrastructure to measure the number of pages read via the <code>pg_stat_statements<\/code> extension. 
I use the following approach: before executing each benchmark query, I run <code>SELECT pg_stat_statements_reset();<\/code><\/p>\n<p>After execution, I retrieve statistics with:<\/p>\n<pre><code class=\"sql\">SELECT shared_blks_hit + shared_blks_read +\n       local_blks_hit + local_blks_read +\n       temp_blks_read AS blocks,\n       total_exec_time::integer AS exec_time\nFROM pg_stat_statements\nWHERE query NOT LIKE '%pg_stat_statements_reset%';\n<\/code><\/pre>\n<p>How good is this metric? In the same experiment shown earlier, all ten JOB test runs showed negligible variance in the number of pages read per query:<\/p>\n<figure class=\"full-width\"><img decoding=\"async\" src=\"https:\/\/habrastorage.org\/r\/w1560\/getpro\/habr\/upload_files\/d96\/c91\/677\/d96c91677cf1b785103ab092db23c56e.png\" width=\"600\" height=\"371\" sizes=\"auto, (max-width: 780px) 100vw, 50vw\" srcset=\"https:\/\/habrastorage.org\/r\/w780\/getpro\/habr\/upload_files\/d96\/c91\/677\/d96c91677cf1b785103ab092db23c56e.png 780w,&#10;       https:\/\/habrastorage.org\/r\/w1560\/getpro\/habr\/upload_files\/d96\/c91\/677\/d96c91677cf1b785103ab092db23c56e.png 781w\" loading=\"lazy\" decode=\"async\"\/><\/figure>\n<p>Just a few pages of difference. Arguably even this minimal variance deserves study, but it\u2019s likely caused by internal operations such as statistics queries or competition among parallel workers.<\/p>\n<p>What does this metric give us? Let\u2019s run a simple experiment. Take one query from the JOB test set (10a.sql) and gradually <a href=\"https:\/\/github.com\/danolivo\/conf\/blob\/main\/Benches\/pages-fetched-criteria\/workers_explain.sh\" rel=\"noopener noreferrer nofollow\">increase the number of workers<\/a> involved. 
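<\/p>\n<p>One way to script such a sweep is a short psql session per worker setting; this is only a sketch (the chosen worker count and the per-run reporting query are illustrative assumptions, not the published harness):<\/p>\n<pre><code class=\"sql\">-- cap the planner at a given number of parallel workers (hypothetical value)\nSET max_parallel_workers_per_gather = 2;\nSELECT pg_stat_statements_reset();\n\\i 10a.sql\n-- pages read and execution time accumulated for this run\nSELECT shared_blks_hit + shared_blks_read +\n       local_blks_hit + local_blks_read +\n       temp_blks_read AS blocks,\n       total_exec_time::integer AS exec_time\nFROM pg_stat_statements\nWHERE query NOT LIKE '%pg_stat_statements_reset%';<\/code><\/pre>\n<p>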
Watch how the execution time and the number of pages read change.<\/p>\n<figure class=\"full-width\"><img decoding=\"async\" src=\"https:\/\/habrastorage.org\/r\/w1560\/getpro\/habr\/upload_files\/2c3\/5c7\/b68\/2c35c7b68d30fad6cfdf1f913b431993.png\" width=\"600\" height=\"371\" sizes=\"auto, (max-width: 780px) 100vw, 50vw\" srcset=\"https:\/\/habrastorage.org\/r\/w780\/getpro\/habr\/upload_files\/2c3\/5c7\/b68\/2c35c7b68d30fad6cfdf1f913b431993.png 780w,&#10;       https:\/\/habrastorage.org\/r\/w1560\/getpro\/habr\/upload_files\/2c3\/5c7\/b68\/2c35c7b68d30fad6cfdf1f913b431993.png 781w\" loading=\"lazy\" decode=\"async\"\/><\/figure>\n<p>We\u2019ll see that while the execution time changes, the number of pages read remains largely the same. It only jumps once \u2014 when increasing the number of workers from 1 to 2 \u2014 doubling the pages read. Looking at\u00a0<code>EXPLAIN<\/code>, we see why: with 0 or 1 workers, three out of six joins are nested loops and three are hash joins. With 2 or more workers, one more nested loop appears and one fewer hash join \u2014 meaning the page read metric detects a plan change that execution time alone doesn\u2019t make obvious.<\/p>\n<p>Now let\u2019s use the\u00a0pages read\u00a0metric to measure the effect of PostgreSQL\u2019s <a href=\"https:\/\/github.com\/postgrespro\/aqo\/tree\/stable17\" rel=\"noopener noreferrer nofollow\">AQO<\/a> extension on JOB queries.<\/p>\n<p><a href=\"https:\/\/github.com\/danolivo\/conf\/blob\/main\/Benches\/pages-fetched-criteria\/job_stats_aqo.sh\" rel=\"noopener noreferrer nofollow\">Run<\/a> each JOB query with AQO in\u00a0learn\u00a0mode ten times. In this mode, AQO stores cardinality estimates for each plan node at the end of execution and uses them during planning, allowing the optimizer to reject plans with overly optimistic estimates. 
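<\/p>\n<p>For reference, switching the extension into that mode looks roughly like this (a sketch; it assumes AQO is already built and listed in <code>shared_preload_libraries<\/code>):<\/p>\n<pre><code class=\"sql\">CREATE EXTENSION aqo;\n-- remember observed cardinalities after each run and reuse them at planning time\nSET aqo.mode = 'learn';<\/code><\/pre>\n<p>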
Given PostgreSQL\u2019s tendency to\u00a0underestimate\u00a0join cardinalities, this approach seems sound.<\/p>\n<p>The chart below (log scale) shows how the number of pages read changes compared to the first iteration (when the optimizer has no knowledge about node cardinalities or column distinct values). By the tenth iteration, nearly all queries either improve or remain stable \u2014 possibly because the optimizer already chose a good plan early on, or the method didn\u2019t work here.<\/p>\n<figure class=\"full-width\"><img decoding=\"async\" src=\"https:\/\/habrastorage.org\/r\/w1560\/getpro\/habr\/upload_files\/b8d\/6ef\/42d\/b8d6ef42d3429a45c901de8e4999430d.png\" width=\"675\" height=\"417\" sizes=\"auto, (max-width: 780px) 100vw, 50vw\" srcset=\"https:\/\/habrastorage.org\/r\/w780\/getpro\/habr\/upload_files\/b8d\/6ef\/42d\/b8d6ef42d3429a45c901de8e4999430d.png 780w,&#10;       https:\/\/habrastorage.org\/r\/w1560\/getpro\/habr\/upload_files\/b8d\/6ef\/42d\/b8d6ef42d3429a45c901de8e4999430d.png 781w\" loading=\"lazy\" decode=\"async\"\/><\/figure>\n<p>However, six queries show a page count increase compared to the first iteration. Maybe ten iterations weren\u2019t enough to rule out suboptimal plans? Let\u2019s increase the count to 30 and see what happens:<\/p>\n<figure class=\"full-width\"><img decoding=\"async\" src=\"https:\/\/habrastorage.org\/r\/w1560\/getpro\/habr\/upload_files\/815\/fe2\/337\/815fe2337d60e5487ea55e3a4a803ca7.png\" width=\"600\" height=\"371\" sizes=\"auto, (max-width: 780px) 100vw, 50vw\" srcset=\"https:\/\/habrastorage.org\/r\/w780\/getpro\/habr\/upload_files\/815\/fe2\/337\/815fe2337d60e5487ea55e3a4a803ca7.png 780w,&#10;       https:\/\/habrastorage.org\/r\/w1560\/getpro\/habr\/upload_files\/815\/fe2\/337\/815fe2337d60e5487ea55e3a4a803ca7.png 781w\" loading=\"lazy\" decode=\"async\"\/><\/figure>\n<p>The chart shows that the plans converge to an optimal solution. 
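<\/p>\n<p>When a query\u2019s page count shifts between iterations, the plan change can be confirmed directly: <code>EXPLAIN<\/code> with the <code>BUFFERS<\/code> option ties each plan node to the pages it touched (a sketch; the ellipsis stands in for the body of the benchmark query under study):<\/p>\n<pre><code class=\"sql\">EXPLAIN (ANALYZE, BUFFERS, COSTS OFF)\nSELECT ...;  -- body of the query under study, e.g. 26b.sql<\/code><\/pre>\n<p>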
Two queries (26b and 33b) converge to a higher page count than in the first iteration. But their execution time improved by 15\u201320%. <code>EXPLAIN<\/code> shows why: one nested loop was replaced by a hash join. Since hash joins read their input tables in full, they increase page reads, but they are more efficient time-wise, especially in parallel mode.<\/p>\n<p>So, page count isn\u2019t a universal metric for plan efficiency. Within a single DBMS, however, it provides a stable baseline: it lets us repeat experiments across platforms, compare optimization strategies, and detect changes hidden by noisy execution time measurements. We shouldn\u2019t stop publishing execution time in benchmarks, but maybe we should add pages read alongside?<\/p>\n<p>With a page count chart, the benchmark script (see above), and a <a href=\"https:\/\/docs.google.com\/spreadsheets\/d\/1-7aI2JHZ7famDsKRgxxKqZ8ZPKYf_poCHUU7wNs3Ays\/edit?usp=sharing\" rel=\"noopener noreferrer nofollow\">link<\/a> to raw data, others can replicate the experiment, align it to the published results, run further studies, or compare with their own method. Isn\u2019t that handy?<\/p>\n<p>That\u2019s it. The main goal of this post is to draw attention to the issues of reproducibility and objective analysis in DBMS research. Should we look for additional benchmark metrics? How useful is the pages read metric for this purpose? Can we adapt it to compare plans across different but architecturally similar DBMSs? 
Could we normalize it by average tuple count per page?<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>Link to the original article: <a href=\"https:\/\/habr.com\/ru\/articles\/914698\/\">https:\/\/habr.com\/ru\/articles\/914698\/<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The most problematic part of academic papers on various aspects of database system technologies, for me, is the performance evaluation section. It hardly needs saying that this part of the research is often key for the developer, as it justifies the very reason we\u2019re spending time reading the entire preceding text. Equally important is that this section must ensure reproducibility and allow for independent analysis.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[],"tags":[],"class_list":["post-465763","post","type-post","status-publish","format-standard","hentry"],"_links":{"self":[{"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=\/wp\/v2\/posts\/465763","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=465763"}],"version-history":[{"count":0,"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=\/wp\/v2\/posts\/465763\/revisions"}],"wp:attachment":[{"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=465763"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=465763"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=465763"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}