Попытка создать аналог ASH для PostgreSQL

от автора

Постановка задачи

Для оптимизации запросов PostgreSQL, очень требуется возможность анализировать историю активности, в частности – ожидания, блокировки, статистика таблиц.

Имеющиеся возможности

Инструмент анализа исторической нагрузки или «AWR для Postgres»: очень интересное решение, но, нет истории pg_stat_activity и pg_locks.

Расширение pgsentinel :
«Вся накопленная информация хранится только в оперативной памяти, а потребляемый объём памяти регулируется количеством последних хранимых записей.
Добавляется поле queryid — тот самый queryid из расширения pg_stat_statements (требуется предварительная установка).
«
Это конечно сильно бы помогло, но самая неприятность именно первый пункт “Вся накопленная информация хранится только в оперативной памяти ”, т.е. имеем место импакт на целевую базу. К тому, же нет истории блокировок и статистики таблиц. Т.е. решение вообще говоря неполное: “Готового пакета для установки пока нет. Предлагается скачать исходники и собрать библиотеку самостоятельно. Предварительно требуется установить «devel»-пакет для своего сервера и в переменную PATH прописать путь до pg_config.”.

В общем – возни много, а в случае серьезных продакшн баз, может быть, и не будет возможности что-то делать с сервером. Нужно опять, придумывать, что-то свое.

Предупреждение.
В силу довольно большого объема и в связи с незавершением периода тестирования, статья носит в основном ознакомительный характер, скорее как набор тезисов и результатов.
Более подробный материал, будет подготовлен позже, по частям

Эскизные требования к решению

Необходимо разработать инструмент позволяющий хранить:
Историю представления pg_stat_activity
Историю блокировок сессий, используя представление pg_locks

Требование к решению–минимизировать влияние на целевую базу данных.

Общая идея–агент сбора данных запускается не в целевой базе, а в базе данных мониторинга как сервис systemd. Да, возможны некоторые потери данных, но это не критично, для отчетности, зато нет импакта на целевую базу по памяти и месту на диске. А в случае использования пула соединений, импакт по пользовательским процессам минимален.

Этапы реализации

1.Сервисные таблицы

Для хранения таблиц используется отдельная схема, что бы не затруднять анализ основных использованных таблиц.
DROP SCHEMA IF EXISTS activity_hist;
CREATE SCHEMA activity_hist AUTHORIZATION monitor ;

Важно: схема создается не в целевой базе данных, а в базе данных мониторинга.
История представления pg_stat_activity
Для хранения текущих снимков представления pg_stat_activity используется таблица

activity_hist.history_pg_stat_activity :

—ACTIVITY_HIST.HISTORY_PG_STAT_ACTIVITY
DROP TABLE IF EXISTS activity_hist.history_pg_stat_activity;
CREATE TABLE activity_hist.history_pg_stat_activity
(
timepoint timestamp without time zone,
datid oid,
datname name,
pid integer,
usesysid oid,
usename name,
application_name text,
client_addr inet,
client_hostname text,
client_port integer,
backend_start timestamp with time zone,
xact_start timestamp with time zone,
query_start timestamp with time zone,
state_change timestamp with time zone,
wait_event_type text,
wait_event text,
state text,
backend_xid xid,
backend_xmin xid,
query text,
backend_type text,
query_md5hash text,
queryid bigint
);

Для ускорения вставки – никаких индексов и ограничений.

Для хранения непосредственно истории используется секционированная таблица

activity_hist.archive_pg_stat_activity :

DROP TABLE IF EXISTS activity_hist.archive_pg_stat_activity;
CREATE TABLE activity_hist.archive_pg_stat_activity
(
timepoint timestamp without time zone,
datid oid,
datname name,
pid integer,
usesysid oid,
usename name,
application_name text,
client_addr inet,
client_hostname text,
client_port integer,
backend_start timestamp with time zone,
xact_start timestamp with time zone,
query_start timestamp with time zone,
state_change timestamp with time zone,
wait_event_type text,
wait_event text,
state text,
backend_xid xid,
backend_xmin xid,
query text,
backend_type text,
query_md5hash text,
queryid bigint
)
PARTITION BY RANGE (timepoint);

Поскольку в данном случае нет требований по скорости вставки, созданы некоторые индексы, для ускорения создания отчетов.

История блокировок сессий
Для хранения текущих снимков блокировок сессий используется таблица

activity_hist.history_locking :

—ACTIVITY_HIST.HISTORY_LOCKING
DROP TABLE IF EXISTS activity_hist.history_locking;
CREATE TABLE activity_hist.history_locking
(
timepoint timestamp without time zone,
locktype text,
relation oid,
mode text,
tid xid,
vtid text,
pid integer,
blocking_pids integer[],
granted boolean
);

Также, для ускорения вставки – никаких индексов и ограничений.

Для хранения непосредственно истории используется секционированная таблица

activity_hist.archive_locking:

DROP TABLE IF EXISTS activity_hist.archive_locking;
CREATE TABLE activity_hist.archive_locking
(
timepoint timestamp without time zone,
locktype text,
relation oid,
mode text,
tid xid,
vtid text,
pid integer,
blocking_pids integer[],
granted boolean
)
PARTITION BY RANGE (timepoint);

Поскольку в данном случае нет требований по скорости вставки, созданы некоторые индексы, для ускорения создания отчетов.

2.Заполнение текущей истории.

Для непосредственного сбора снимков представления используется bash скрипт, запускающий plpgsql функцию.

get_current_activity.sh

#!/bin/bash ######################################################### #get_current_activity.sh  ERROR_FILE='/home/demon/get_current_activity'$(date +%Y%m%d-)'T'$(date +%H)$(date +%M)$(date +%S) host=$1 s_name=$2 s_pass=$3  psql  -A -t -q -v ON_ERROR_STOP=1 -c "SELECT activity_hist.get_current_activity( '$host' , '$s_name' , '$s_pass' )" >/dev/null 2>$ERROR_FILE  line_count=`cat $ERROR_FILE | wc -l` if [[ $line_count != '0' ]]; then     rm -f /home/demon/*.err >/dev/null 2>/dev/null 	cp $ERROR_FILE $ERROR_FILE'.err' >/dev/null 2>/dev/null   fi rm $ERROR_FILE >/dev/null 2>/dev/null exit 0

plpgsql функция по dblink обращается к представлениям в целевой базе данных и вставляет строки в сервисные таблицы в базе данных мониторинга.

get_current_activity.sql

CREATE OR REPLACE FUNCTION activity_hist.get_current_activity( current_host text , current_s_name text , current_s_pass text ) RETURNS BOOLEAN AS $$ DECLARE    database_rec record;   dblink_str text ; BEGIN     	EXECUTE 'SELECT dblink_connect(''LINK1'',''host='||current_host||' port=5432 dbname=postgres'|| 	                                         ' user='||current_s_name||' password='||current_s_pass|| ' '')';    -------------------------------------------------------------------- --GET pg_stat_activity stats 	INSERT INTO activity_hist.history_pg_stat_activity 	( 		SELECT * FROM dblink('LINK1', 			'SELECT  			now() ,  			datid             ,  			datname           , 			pid               , 			usesysid              , 			usename              , 			application_name     , 			client_addr          , 			client_hostname      , 			client_port       , 			backend_start         , 			xact_start            , 			query_start           , 			state_change          , 			wait_event_type    ,                      			wait_event         ,                    			state              ,                   			backend_xid         ,                  			backend_xmin        ,                 			query              ,                			backend_type   			 		FROM pg_stat_activity 		')  		AS t ( 		    timepoint 		  timestamp without time zone ,			 			datid             oid  ,  			datname           name , 			pid               integer, 			usesysid          oid    , 			usename           name   , 			application_name  text   , 			client_addr       inet   , 			client_hostname   text   , 			client_port       integer, 			backend_start     timestamp with time zone , 			xact_start        timestamp with time zone , 			query_start       timestamp with time zone , 			state_change      timestamp with time zone , 			wait_event_type   text ,                      			wait_event        text ,                    			state             text ,                   			backend_xid       xid  ,                  			backend_xmin      xid  ,                 			query             text ,                			backend_type      text 			 		) 	);  ---------------------------------------	 --ACTIVITY_HIST.HISTORY_LOCKING	 	INSERT INTO activity_hist.history_locking 	( 		SELECT * FROM dblink('LINK1', 			'SELECT  			now() ,  			lock.locktype, 			lock.relation, 			lock.mode, 			lock.transactionid as tid, 			lock.virtualtransaction as vtid, 			lock.pid, 			pg_blocking_pids(lock.pid),  			lock.granted 			FROM 	pg_catalog.pg_locks lock LEFT JOIN pg_catalog.pg_database db ON db.oid = lock.database 			WHERE NOT lock.pid = pg_backend_pid()	 		')  		AS t ( 			timepoint timestamp without time zone , 			locktype text , 			relation oid ,  			mode text , 			tid xid , 			vtid text , 			pid integer , 			blocking_pids integer[] , 			granted boolean 		) 	); 	PERFORM dblink_disconnect('LINK1'); 	 	RETURN TRUE ; END $$ LANGUAGE plpgsql;

Для сбора снимков представления используется systemd service, и два скрипта:

pg_current_activity.service

# /etc/systemd/system/pg_current_activity.service [Unit] Description=Collect history of pg_stat_activity , pg_locks  Wants=pg_current_activity.timer  [Service] Type=forking StartLimitIntervalSec=0 ExecStart=/home/postgres/pgutils/demon/get_current_activity.sh 10.124.70.40 postgres postgres  [Install] WantedBy=multi-user.target
pg_current_activity.timer

# /etc/systemd/system/pg_current_activity.timer [Unit] Description=Run pg_current_activity.sh every 1 second Requires=pg_current_activity.service  [Timer] Unit=pg_current_activity.service OnCalendar=*:*:0/1 AccuracySec=1  [Install] WantedBy=timers.target

Назначаем права скриптам:
# chmod 755 pg_current_activity.timer
# chmod 755 pg_current_activity.service

Запускаем сервис:
# systemctl daemon-reload
# systemctl start pg_current_activity.service

Таким образом, история представлений собирается в виде ежесекундных снимков. Конечно, если все оставить как есть, таблицы очень быстро увеличатся в размерах и более менее продуктивная работа станет невозможна.

Нужно организовать архивирование данных.

3.Архивирование истории

Для архивация используется секционированные таблицы archive*.
Новые секции создаются каждый час, при этом старые данные из таблиц history* удаляются, таким образом, размер таблиц history* сильно не меняется и скорость вставки не деградируется со временем.
Создание новых секций выполняется plpgsql функцией activity_hist.archive_current_activity. Алгоритм работы очень прост( на примере секции для таблицы archive_pg_stat_activity ).

Создаём и заполняем новую секцию

EXECUTE format( 'CREATE TABLE ' || partition_name ||  ' PARTITION OF activity_hist.archive_pg_stat_activity FOR VALUES FROM ( %L ) TO ( %L ) ' ,  to_char(date_trunc('year', partition_min_range ),'YYYY')||'-'|| to_char(date_trunc('month', partition_min_range ),'MM')||'-'|| to_char(date_trunc('day', partition_min_range ),'DD')||' '|| to_char(date_trunc('hour', partition_min_range ),'HH24')||':00',  to_char(date_trunc('year', partition_max_range ),'YYYY')||'-'|| to_char(date_trunc('month', partition_max_range ),'MM')||'-'|| to_char(date_trunc('day', partition_max_range ),'DD')||' '|| to_char(date_trunc('hour', partition_max_range ),'HH24')||':00' );  INSERT INTO activity_hist.archive_pg_stat_activity ( 	SELECT 	*  	FROM 	activity_hist.history_pg_stat_activity 	WHERE 	timepoint BETWEEN partition_min_range AND partition_max_range 		 );

Создаем индексы

EXECUTE format	( 'CREATE INDEX '||index_name|| ' ON '||partition_name||' ( wait_event_type , backend_type , timepoint )'  );  EXECUTE format	('CREATE INDEX '||index_name|| ' ON '||partition_name||' ( wait_event_type , backend_type , timepoint , queryid )'  );

Удаляем старые данные из таблицы history_pg_stat_activity

DELETE  FROM 	activity_hist.history_pg_stat_activity WHERE 	timepoint < partition_max_range;

Разумеется периодически, старые секции, за ненадобностью удаляются.

Базовые отчеты.

Собственно для чего все это делается. Для получения отчетов очень отдаленно, приближенно напоминающих AWR Oracle.

Важно добавить, для получения отчетов необходимо построить связь между представлениями pg_stat_activity и pg_stat_statements. Таблицы связываются путем добавления в таблицы ‘history_pg_stat_activity’, ‘archive_pg_stat_activity’ столбца ‘queryid’. Способ добавления значения столбца, выходит за рамки данной статьи и будет подробно описан чуть позже.

TOTAL CPU TIME FOR QUERIES

Запрос :

WITH hist AS ( SELECT  	aa.query ,aa.queryid ,			 	count(*) * interval '1 second' AS duration  FROM 	activity_hist.archive_pg_stat_activity aa WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND  pg_stat_history_end+(current_hour_diff * interval '1 hour')  AND backend_type = 'client backend' AND datname != 'postgres' AND	( aa.wait_event_type IS NULL  ) ANDaa.state = 'active' GROUP BY aa.wait_event_type , aa.wait_event , aa.query ,aa.queryid		 UNION  SELECT  	ha.query ,ha.queryid, 	count(*) * interval '1 second' AS duration  FROM 	activity_hist.history_pg_stat_activity_for_reports ha WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour')  AND 	backend_type = 'client backend' AND datname != 'postgres' AND ( ha.wait_event_type IS NULL  )AND ha.state = 'active' GROUP BY ha.wait_event_type , ha.wait_event , ha.query ,ha.queryid		 ) SELECT 	query , queryid , SUM( duration ) as duration  FROM hist GROUP BY  query , queryid  ORDER BY 3 DESC

Пример:

------------------------------------------------------------------- | TOTAL CPU TIME FOR QUERIES : 07:47:36 +----+----------------------------------------+-------------------- |   #|                                 queryid|            duration +----+----------------------------------------+-------------------- |   1|                      389015618226997618|            04:28:58 |   2|                                        |            01:07:29 |   3|                     1237430309438971376|            00:59:38 |   4|                     4710212362688288619|            00:50:48 |   5|                       28942442626229688|            00:15:50 |   6|                     9150846928388977274|            00:04:46 |   7|                    -6572922443698419129|            00:00:06 |   8|                                        |            00:00:01 +----+----------------------------------------+-------------------- 

TOTAL WAITINGS TIME FOR QUERIES

Запрос :

WITH hist AS ( SELECT  	aa.query ,aa.queryid ,			 	count(*) * interval '1 second' AS duration  FROM 	activity_hist.archive_pg_stat_activity aa WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour')  AND  	backend_type = 'client backend' AND datname != 'postgres' AND 	( aa.wait_event_type IS NOT NULL  )  GROUP BY aa.wait_event_type , aa.wait_event , aa.query ,aa.queryid		 UNION  SELECT  	ha.query ,ha.queryid, 	count(*) * interval '1 second' AS duration  FROM 	activity_hist.history_pg_stat_activity_for_reports ha WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour')  AND  	backend_type = 'client backend' AND datname != 'postgres' AND				 	( ha.wait_event_type IS NOT NULL  ) GROUP BY ha.wait_event_type , ha.wait_event , ha.query ,ha.queryid		 ) SELECT 	query , queryid , SUM( duration ) as duration  FROM hist GROUP BY  query , queryid  ORDER BY 3 DESC 

Пример :

------------------------------------------------------------------- | TOTAL WAITINGS TIME FOR QUERIES : 21:55:04 +----+----------------------------------------+-------------------- |   #|                                 queryid|            duration +----+----------------------------------------+-------------------- |   1|                      389015618226997618|            16:19:05 |   2|                                        |            03:47:04 |   3|                     8085340880788646241|            00:40:20 |   4|                     4710212362688288619|            00:13:35 |   5|                     9150846928388977274|            00:12:25 |   6|                       28942442626229688|            00:11:32 |   7|                     1237430309438971376|            00:09:45 |   8|                     2649515222348904837|            00:09:37 |   9|                                        |            00:03:45 |  10|                     3167065002719415275|            00:02:20 |  11|                     5731212217001535134|            00:02:13 |  12|                     8304755792398128062|            00:01:31 |  13|                     2649515222348904837|            00:00:59 |  14|                     2649515222348904837|            00:00:22 |  15|                                        |            00:00:12 |  16|                     3422818749220588372|            00:00:08 |  17|                    -5730801771815999400|            00:00:03 |  18|                    -1473395109729441239|            00:00:02 |  19|                     2404820632950544954|            00:00:02 |  20|                    -6572922443698419129|            00:00:02 |  21|                     2369289265278398647|            00:00:01 |  22|                      180077086776069052|            00:00:01 +----+----------------------------------------+-------------------- 

WAITINGS FOR QUERIES

Запросы:

WITH hist AS ( SELECT  	aa.wait_event_type , aa.wait_event  FROM 	activity_hist.archive_pg_stat_activity aa WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND  	backend_type = 'client backend' AND datname != 'postgres' AND 	aa.wait_event IS NOT NULL  GROUP BY aa.wait_event_type , aa.wait_event UNION  SELECT  	ha.wait_event_type , ha.wait_event  FROM 	activity_hist.history_pg_stat_activity_for_reports ha WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND  	backend_type = 'client backend' AND datname != 'postgres' AND 	ha.wait_event IS NOT NULL  GROUP BY ha.wait_event_type , ha.wait_event		 ) SELECT 	wait_event_type , wait_event  FROM hist GROUP BY wait_event_type , wait_event ORDER BY 1 ASC,2 ASC  ----------------------------------------------------------------------  WITH hist AS ( SELECT  	aa.wait_event_type , aa.wait_event , aa.query ,aa.queryid ,			 	count(*) * interval '1 second' AS duration  FROM 	activity_hist.archive_pg_stat_activity aa WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND  	backend_type = 'client backend' AND datname != 'postgres' AND 	( aa.wait_event_type = waitings_stat_rec.wait_event_type AND aa.wait_event = waitings_stat_rec.wait_event ) GROUP BY aa.wait_event_type , aa.wait_event , aa.query ,aa.queryid		 UNION  SELECT  	ha.wait_event_type , ha.wait_event , ha.query ,ha.queryid, 	count(*) * interval '1 second' AS duration  FROM 	activity_hist.history_pg_stat_activity_for_reports ha WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND  	backend_type = 'client backend' AND datname != 'postgres' AND				 	( ha.wait_event_type = waitings_stat_rec.wait_event_type AND ha.wait_event = waitings_stat_rec.wait_event ) GROUP BY ha.wait_event_type , ha.wait_event , ha.query ,ha.queryid		 ) SELECT 	query , queryid , SUM( duration ) as duration  FROM hist GROUP BY  query , queryid  ORDER BY 3 DESC

Пример:

------------------------------------------------ | WAITINGS FOR QUERIES +----------------------------------------------- |                      wait_event_type = Client| |                       wait_event = ClientRead| |                        Total time  = 00:46:56| ------------------------------------------------ |    #|             queryid|            duration +-----+--------------------+-------------------- |    1| 8085340880788646241|            00:40:20 |    2|                    |            00:03:45 |    3| 5731212217001535134|            00:01:53 |    4|                    |            00:00:12 |    5| 9150846928388977274|            00:00:09 |    6| 3422818749220588372|            00:00:08 |    7| 1237430309438971376|            00:00:06 |    8|   28942442626229688|            00:00:05 |    9| 4710212362688288619|            00:00:05 |   10|-5730801771815999400|            00:00:03 |   11| 8304755792398128062|            00:00:02 |   12|-6572922443698419129|            00:00:02 |   13|-1473395109729441239|            00:00:02 |   14| 2404820632950544954|            00:00:02 |   15|  180077086776069052|            00:00:01 |   16| 2369289265278398647|            00:00:01  +----------------------------------------------- |                          wait_event_type = IO| |                      wait_event = BufFileRead| |                        Total time  = 00:00:38| ------------------------------------------------ |    #|             queryid|            duration +-----+--------------------+-------------------- |    1|   28942442626229688|            00:00:38  +----------------------------------------------- 

LOCKED PROCESSES HISTORY

Запрос:

SELECT  MIN(date_trunc('second',timepoint)) AS started ,  	count(*) * interval '1 second' as duration , 	pid , blocking_pids , relation , mode , locktype 	  FROM  	activity_hist.archive_locking al  WHERE  	timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND 	NOT granted AND  	locktype = 'relation'  GROUP BY pid , blocking_pids , relation , mode , locktype			 UNION SELECT  	MIN(date_trunc('second',timepoint)) AS started ,  	count(*) * interval '1 second' as duration , 	pid , blocking_pids , relation , mode , locktype FROM  	activity_hist.history_locking  WHERE  	timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND 	NOT granted AND  	locktype = 'relation'  GROUP BY pid , blocking_pids , relation , mode , locktype			 ORDER BY 1

Пример:

------------------------------------------------------------------------------------------------------------------------------------- | LOCKED PROCESSES HISTORY +-----+----------+--------------------+----------+--------------------+--------------------+--------------------+-------------------- |    #|       pid|             started|  duration|       blocking_pids|            relation|                mode|            locktype +-----+----------+--------------------+----------+--------------------+--------------------+--------------------+-------------------- |    1|     26224| 2019-09-02 19:32:16|  00:01:45|             {26211}|               16541|     AccessShareLock|            relation |    2|     26390| 2019-09-02 19:34:03|  00:00:53|             {26211}|               16541|     AccessShareLock|            relation |    3|     26391| 2019-09-02 19:34:03|  00:00:53|             {26211}|               16541|     AccessShareLock|            relation |    4|     26531| 2019-09-02 19:35:27|  00:00:12|             {26211}|               16541|     AccessShareLock|            relation |    5|     27284| 2019-09-02 19:44:02|  00:00:19|             {27276}|               16541|     AccessShareLock|            relation |    6|     27283| 2019-09-02 19:44:02|  00:00:19|             {27276}|               16541|     AccessShareLock|            relation |    7|     27286| 2019-09-02 19:44:02|  00:00:19|             {27276}|               16541|     AccessShareLock|            relation |    8|     27423| 2019-09-02 19:45:24|  00:00:12|             {27394}|               16541|     AccessShareLock|            relation |    9|     27648| 2019-09-02 19:48:06|  00:00:20|             {27647}|               16541|     AccessShareLock|            relation |   10|     27650| 2019-09-02 19:48:06|  00:00:20|             {27647}|               16541|     AccessShareLock|            relation |   11|     27735| 2019-09-02 19:49:08|  00:00:06|             {27650}|               16541| AccessExclusiveLock|            relation |   12|     28380| 2019-09-02 19:56:03|  00:01:56|             {28379}|               16541|     AccessShareLock|            relation |   13|     28379| 2019-09-02 19:56:03|  00:00:01|               28377|               16541| AccessExclusiveLock|            relation |     |          |                    |          |               28376|                    |  

BLOCKING PROCESSES HISTORY

Запросы:

SELECT  blocking_pids  FROM  	activity_hist.archive_locking al  WHERE  	timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND 	NOT granted AND  	locktype = 'relation'  GROUP BY blocking_pids 		 UNION SELECT  	blocking_pids  FROM  	activity_hist.history_locking  WHERE  	timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND 	NOT granted AND  	locktype = 'relation'  GROUP BY blocking_pids 		 ORDER BY 1  ---------------------------------------------------------------  SELECT  	pid , usename , application_name , datname , 	MIN(date_trunc('second',timepoint)) as started ,  	count(*) * interval '1 second' as duration ,		  	state ,  	query 				FROM  	activity_hist.archive_pg_stat_activity 				WHERE 	pid= current_pid AND  						timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') 						  				GROUP BY pid , usename , application_name ,  						datname ,  						state_change,  						state ,  						query  				UNION 				SELECT  					pid , usename , application_name , datname , 					MIN(date_trunc('second',timepoint)) as started ,  					count(*) * interval '1 second' as duration ,		  					state ,  					query 				FROM  	activity_hist.history_pg_stat_activity_for_reports 				WHERE 	pid= current_pid AND  						timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') 						  				GROUP BY pid , usename , application_name ,  						datname ,  						state_change,  						state ,  						query  				ORDER BY 5 , 1

Пример:

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- BLOCKING PROCESSES HISTORY +----+----------+----------+--------------------+----------+--------------------+--------------------+------------------------------+---------------------------------------- |   #|       pid|   usename|    application_name|   datname|             started|            duration|                         state|                                   query +----+----------+----------+--------------------+----------+--------------------+--------------------+------------------------------+---------------------------------------- |   1|     26211|     tuser|                psql|      tdb1| 2019-09-02 19:31:54|            00:00:04|                          idle| |   2|     26211|     tuser|                psql|      tdb1| 2019-09-02 19:31:58|            00:00:06|           idle in transaction|                                  begin; |   3|     26211|     tuser|                psql|      tdb1| 2019-09-02 19:32:16|            00:01:45|           idle in transaction|                  lock table wafer_data; |   4|     26211|     tuser|                psql|      tdb1| 2019-09-02 19:35:54|            00:01:23|                          idle|                                 commit; |   5|     26211|     tuser|                psql|      tdb1| 2019-09-02 19:38:46|            00:00:02|           idle in transaction|                                  begin; |   6|     26211|     tuser|                psql|      tdb1| 2019-09-02 19:38:54|            00:00:08|           idle in transaction|                  lock table wafer_data; |   7|     26211|     tuser|                psql|      tdb1| 2019-09-02 19:39:08|            00:42:42|                          idle|                                 commit; |   8|     26211|     tuser|                psql|      tdb1| 2019-09-03 07:12:07|            00:00:52|                        active|                     select test_del (); 

Развитие.

Показанные базовые запросы и получаемые отчеты, уже сильно облегчают жизнь при анализе инцидентов производительности.
На основе базовых запросов, можно получить отчет, отдаленно приближенно напоминающий AWR Oracle.

Пример сводного отчета

+------------------------------------------------------------------------------------ | CONSOLIDATED REPORT FOR ACTIVITY AND WAITINGS . DATETIME : 03.09.2019 14:08 |------------------------------------------------------------------------------------ | HOST :10.124.70.40 | BEGIN_SNAPSHOT :02.09.2019 14:08 END_SNAPSHOT : 03.09.2019 14:00 |------------------------------------------------------------------------------------ | CURRENT DATABASES SIZE  : | DATABASE :monitor | SIZE (MB) :1370.00 |------------------------------------------------------------------------------------ | CLUSTER CPU TIME : 19:44:22 | CLUSTER WAITINGS TIME : 78:49:16 | | SQL DBTIME : 65:53:09 | SQL CPU TIME : 19:05:21 | SQL WAITINGS TIME : 21:50:46 | SQL IOTIME : 20:53:00 | SQL READ TIME : 20:52:55 | SQL WRITE TIME : 00:00:05 | | SQL CALLS : 311293 ------------------------------------------------------------- | SQL SHARED BLOCKS READS : 13351563334 | SQL SHARED BLOCKS HITS : 2775427045 | SQL SHARED BLOCKS HITS/READS % : 20.79 | SQL SHARED BLOCKS DIRTED : 21105 | SQL SHARED BLOCKS WRITTEN : 3656 | | SQL TEMPORARY BLOCKS READS : 7464932 | SQL TEMPORARY BLOCKS WRITTEN : 10176024 ------------------------------------------------------------- | | WAITINGS STATICTICS | +------------------------------------------------------------------------------------ | TOP 10 WAITINGS BY TOTAL WAIT TIME FOR SYSTEM PROCESSES +-----+------------------------------+--------------------+-------------------- |    #|               wait_event_type|          wait_event|            duration +-----+------------------------------+--------------------+-------------------- |    1|                      Activity| LogicalLauncherMain|            11:21:01 |    2|                      Activity|    CheckpointerMain|            11:20:35 |    3|                      Activity|      AutoVacuumMain|            11:20:31 |    4|                      Activity|       WalWriterMain|            11:19:35 |    5|                      Activity|        BgWriterMain|            10:14:19 |    6|                      Activity|   BgWriterHibernate|            01:06:04 |    7|                      Activity|       WalSenderMain|            00:04:05 |    8|                        Client|         ClientWrite|            00:04:00 |    9|                            IO|        BufFileWrite|            00:02:45 |   10|                        LWLock|      buffer_mapping|            00:02:14 +-----+------------------------------+--------------------+-------------------- | TOP 10 WAITINGS BY TOTAL WAIT TIME FOR CLIENTS PROCESSES +-----+------------------------------+--------------------+--------------------+---------- |    #|               wait_event_type|          wait_event|            duration|  % dbtime +-----+------------------------------+--------------------+--------------------+---------- |    1|                          Lock|       transactionid|            11:55:37|      18.1 |    2|                            IO|        DataFileRead|            07:19:43|     11.12 |    3|                        Client|          ClientRead|            00:46:54|      1.19 |    4|                          Lock|            relation|            00:40:37|      1.03 |    5|                        LWLock|      buffer_mapping|            00:31:08|      0.79 |    6|                        LWLock|           buffer_io|            00:22:12|      0.56 |    7|                       Timeout|             PgSleep|            00:10:58|      0.28 |    8|                          Lock|               tuple|            00:01:30|      0.04 |    9|                            IO|        BufFileWrite|            00:01:16|      0.03 |   10|                            IO|         BufFileRead|            00:00:37|      0.02 +-----+------------------------------+--------------------+--------------------+---------- | WAITINGS TYPES BY TOTAL WAIT TIME, FOR SYSTEM PROCESSES +-----+------------------------------+-------------------- |    #|               wait_event_type|            duration +-----+------------------------------+-------------------- |    1|                      Activity|            56:46:10 |    2|                            IO|            00:05:13 |    3|                        Client|            00:04:00 |    4|                        LWLock|            00:03:07 +-----+------------------------------+-------------------- | WAITINGS TYPES BY TOTAL WAIT TIME, FOR CLIENTS PROCESSES +-----+------------------------------+--------------------+-------------------- |    #|               wait_event_type|            duration|            % dbtime +-----+------------------------------+--------------------+-------------------- |    1|                          Lock|            12:37:44|               19.17 |    2|                            IO|            07:21:40|               11.17 |    3|                        LWLock|            00:53:26|                1.35 |    4|                        Client|            00:46:54|                1.19 |    5|                       Timeout|            00:10:58|                0.28 |    6|                           IPC|            00:00:04|                   0 +-----+------------------------------+--------------------+-------------------- | WAITINGS FOR SYSTEM PROCESSES +-----+-----------------------------+----------+--------------------+----------------------+-------------------- |    #|                 backend_type|    dbname|     wait_event_type|            wait_event|            duration +-----+-----------------------------+----------+--------------------+----------------------+-------------------- |    1| logical replication launcher|          |            Activity|   LogicalLauncherMain|            11:21:01 |    2|                 checkpointer|          |            Activity|      CheckpointerMain|            11:20:35 |    3|          autovacuum launcher|          |            Activity|        AutoVacuumMain|            11:20:31 |    4|                    walwriter|          |            Activity|         WalWriterMain|            11:19:35 |    5|            background writer|          |            Activity|          BgWriterMain|            10:14:19 |    6|            background writer|          |            Activity|     BgWriterHibernate|            01:06:04 |    7|                    walsender|          |            Activity|         WalSenderMain|            00:04:05 |    8|                    walsender|          |              Client|           ClientWrite|            00:04:00 |    9|              parallel worker|      tdb1|                  IO|          BufFileWrite|            00:02:45 |   10|              parallel worker|      tdb1|              LWLock|        buffer_mapping|            00:02:05 |   11|              parallel worker|      tdb1|                  IO|          DataFileRead|            00:01:10 |   12|              parallel worker|      tdb1|                  IO|           BufFileRead|            00:01:05 |   13|              parallel worker|      tdb1|              LWLock|             buffer_io|            00:00:45 |   14|            autovacuum worker|      tdb1|              LWLock|        buffer_mapping|            00:00:09 |   15|                    walwriter|          |                  IO|              WALWrite|            00:00:08 |   16|                    walwriter|          |              LWLock|          WALWriteLock|            00:00:04 |   17|            background writer|          |              LWLock|          WALWriteLock|            00:00:03 |   18|            background writer|          |                  IO|              WALWrite|            00:00:02 |   19|            background writer|          |                  IO|         DataFileWrite|            00:00:02 |   20|                 checkpointer|          |                  IO| ControlFileSyncUpdate|            00:00:01 |   21|            autovacuum worker|      tdb1|              LWLock|             buffer_io|            00:00:01 +-----+-----------------------------+----------+--------------------+----------------------+-------------------- | WAITINGS FOR SQL +-----+-------------------------+----------+--------------------+--------------------+--------------------+---------- |    #|                  queryid|    dbname|     wait_event_type|          wait_event|            duration|  % dbtime +-----+-------------------------+----------+--------------------+--------------------+--------------------+---------- |    1|       389015618226997618|      tdb1|                Lock|       transactionid|            09:47:43|     14.87 |    2|       389015618226997618|      tdb1|                  IO|        DataFileRead|            05:47:07|      8.78 |    3|                         |      tdb1|                Lock|       transactionid|            02:07:54|      3.24 |    4|                         |      tdb1|                  IO|        DataFileRead|            01:30:24|      2.29 |    5|      8085340880788646241|      tdb1|              Client|          ClientRead|            00:40:20|      1.02 |    6|       389015618226997618|      tdb1|              LWLock|      buffer_mapping|            00:20:41|      0.52 |    7|       389015618226997618|      tdb1|              LWLock|           buffer_io|            00:17:30|      0.44 |    8|      2649515222348904837|      tdb1|             Timeout|             PgSleep|            00:10:58|      0.28 |    9|      4710212362688288619|      tdb1|                Lock|            relation|            00:10:44|      0.27 |   10|      9150846928388977274|      tdb1|                Lock|            relation|            00:10:24|      0.26 |   11|        28942442626229688|      tdb1|                Lock|            relation|            00:07:48|       0.2 |   12|      1237430309438971376|      tdb1|                Lock|            relation|            00:07:32|      0.19 |   13|                         |      tdb1|              LWLock|      buffer_mapping|            00:04:32|      0.11 |   14|                         |      tdb1|              LWLock|           buffer_io|            00:04:13|      0.11 |   15|                         |      tdb1|              Client|          ClientRead|            00:03:57|       0.1 |   16|      4710212362688288619|      tdb1|              LWLock|      buffer_mapping|            00:02:26|      0.06 |   17|      3167065002719415275|      tdb1|                Lock|            relation|            00:02:20|      0.06 |   18|      5731212217001535134|      tdb1|              Client|          ClientRead|            00:01:53|      0.05 |   19|      1237430309438971376|      tdb1|              LWLock|      buffer_mapping|            00:01:42|      0.04 |   20|       389015618226997618|      tdb1|                Lock|               tuple|            00:01:30|      0.04 |   21|      8304755792398128062|      tdb1|                Lock|            relation|            00:01:29|      0.04 |   22|        28942442626229688|      tdb1|                  IO|        BufFileWrite|            00:01:16|      0.03 |   23|      9150846928388977274|      tdb1|                  IO|        DataFileRead|            00:01:07|      0.03 |   24|        28942442626229688|      tdb1|              LWLock|      buffer_mapping|            00:01:03|      0.03 |   25|      9150846928388977274|      tdb1|              LWLock|      buffer_mapping|            00:00:44|      0.02 |   26|        28942442626229688|      tdb1|                  IO|         BufFileRead|            00:00:37|      0.02 |   27|        28942442626229688|      tdb1|              LWLock|           buffer_io|            00:00:25|      0.01 |   28|      1237430309438971376|      tdb1|                  IO|        DataFileRead|            00:00:24|      0.01 |   29|        28942442626229688|      tdb1|                  IO|        DataFileRead|            00:00:22|      0.01 |   30|      5731212217001535134|      tdb1|                Lock|            relation|            00:00:20|      0.01 |   31|      4710212362688288619|      tdb1|                  IO|        DataFileRead|            00:00:19|      0.01 |   32|      9150846928388977274|      tdb1|              Client|          ClientRead|            00:00:09|         0 |   33|      3422818749220588372|      tdb1|              Client|          ClientRead|            00:00:08|         0 |   34|      1237430309438971376|      tdb1|              Client|          ClientRead|            00:00:06|         0 |   35|       389015618226997618|      tdb1|              LWLock|      buffer_content|            00:00:05|         0 |   36|      4710212362688288619|      tdb1|              Client|          ClientRead|            00:00:05|         0 |   37|      4710212362688288619|      tdb1|              LWLock|           buffer_io|            00:00:04|         0 |   38|        28942442626229688|      tdb1|              Client|          ClientRead|            00:00:04|         0 |   39|        28942442626229688|      tdb1|                 IPC|      ParallelFinish|            00:00:03|         0 |   40|       389015618226997618|      tdb1|                  IO|       DataFileWrite|            00:00:02|         0 |   41|     -5730801771815999400|      tdb1|              Client|          ClientRead|            00:00:02|         0 |   42|      2404820632950544954|      tdb1|              Client|          ClientRead|            00:00:02|         0 |   43|     -6572922443698419129|      tdb1|              Client|          ClientRead|            00:00:02|         0 |   44|      8304755792398128062|      tdb1|              Client|          ClientRead|            00:00:02|         0 |   45|     -1473395109729441239|      tdb1|              Client|          ClientRead|            00:00:02|         0 |   46|                         |      tdb1|              LWLock|      buffer_content|            00:00:01|         0 |   47|       180077086776069052|      tdb1|              Client|          ClientRead|            00:00:01|         0 |   48|                         |      tdb1|                  IO|       DataFileWrite|            00:00:01|         0 |   49|        28942442626229688|      tdb1|                 IPC| MessageQueueReceive|            00:00:01|         0 |   50|      2369289265278398647|      tdb1|              Client|          ClientRead|            00:00:01|         0 |   51|      9150846928388977274|      tdb1|                  IO|       DataFileWrite|            00:00:01|         0 +-----+-------------------------+----------+--------------------+--------------------+--------------------+---------- | | CLIENT SQL STATICTICS | +------------------------------------------------------------------------------------ | CLIENT SQL ordered by Elapsed Time +--------------------+----------+----------+----------+----------+----------+-------------------- |        elapsed time|     calls|  % dbtime|     % CPU|      % IO|    dbname|             queryid +--------------------+----------+----------+----------+----------+----------+-------------------- |            06:43:19|        36|      10.2|     85.09|     17.38|      tdb1|  389015618226997618 |            02:06:53|       715|      3.21|      0.85|      0.06|      tdb1| 1237430309438971376 |            01:52:07|       720|      2.84|      1.19|      0.08|      tdb1| 4710212362688288619 |            00:39:03|       357|      0.99|      1.02|      0.33|      tdb1|   28942442626229688 |            00:22:00|         8|      0.56|      0.96|         0|      tdb1| 2649515222348904837 +--------------------+----------+----------+----------+----------+----------+-------------------- | CLIENT SQL ordered by CPU Time +--------------------+----------+----------+----------+----------+----------+----------+-------------------- |            cpu time|     calls|  % dbtime|total_time|     % CPU|      % IO|    dbname|             queryid +--------------------+----------+----------+----------+----------+----------+----------+-------------------- |            16:14:38|        36|      10.2|  06:43:19|     85.09|     17.38|      tdb1|  389015618226997618 |            00:13:38|       720|      2.84|  01:52:07|      1.19|      0.08|      tdb1| 4710212362688288619 |            00:11:39|       357|      0.99|  00:39:03|      1.02|      0.33|      tdb1|   28942442626229688 |            00:10:58|         8|      0.56|  00:22:00|      0.96|         0|      tdb1| 2649515222348904837 |            00:09:44|       715|      3.21|  02:06:53|      0.85|      0.06|      tdb1| 1237430309438971376 +--------------------+----------+----------+----------+----------+----------+----------+-------------------- | CLIENT SQL ordered by User I/O Wait Time +--------------------+----------+----------+----------+----------+----------+----------+-------------------- |        io_wait time|     calls|  % dbtime|total_time|     % CPU|      % IO|    dbname|             queryid +--------------------+----------+----------+----------+----------+----------+----------+-------------------- |            05:47:09|        36|      10.2|  06:43:19|     85.09|     17.38|      tdb1|  389015618226997618 |            00:02:15|       357|      0.99|  00:39:03|      1.02|      0.33|      tdb1|   28942442626229688 |            00:00:24|       715|      3.21|  02:06:53|      0.85|      0.06|      tdb1| 1237430309438971376 |            00:00:19|       720|      2.84|  01:52:07|      1.19|      0.08|      tdb1| 4710212362688288619 |            00:00:00|         8|      0.56|  00:22:00|      0.96|         0|      tdb1| 2649515222348904837 +--------------------+----------+----------+----------+----------+----------+----------+-------------------- | CLIENT SQL ordered by Shared Buffers Reads +--------------------+----------+----------+----------+----------+----------+----------+-------------------- |       buffers reads|     calls|  % dbtime|total_time|     % CPU|      % IO|    dbname|             queryid +--------------------+----------+----------+----------+----------+----------+----------+-------------------- |          2562353244|        36|      10.2|  06:43:19|     85.09|     17.38|      tdb1|  389015618226997618 |            11041689|       357|      0.99|  00:39:03|      1.02|      0.33|      tdb1|   28942442626229688 |             3303551|       715|      3.21|  02:06:53|      0.85|      0.06|      tdb1| 1237430309438971376 |             3242892|       720|      2.84|  01:52:07|      1.19|      0.08|      tdb1| 4710212362688288619 |                   0|         8|      0.56|  00:22:00|      0.96|         0|      tdb1| 2649515222348904837 +--------------------+----------+----------+----------+----------+----------+----------+-------------------- | CLIENT SQL ordered by Disk Reads Time +--------------------+----------+----------+----------+----------+----------+----------+-------------------- |           read time|     calls|  % dbtime|total_time|     % CPU|      % IO|    dbname|             queryid +--------------------+----------+----------+----------+----------+----------+----------+-------------------- |            03:37:46|        36|      10.2|  06:43:19|     85.09|     17.38|      tdb1|  389015618226997618 |            00:04:07|       357|      0.99|  00:39:03|      1.02|      0.33|      tdb1|   28942442626229688 |            00:00:59|       720|      2.84|  01:52:07|      1.19|      0.08|      tdb1| 4710212362688288619 |            00:00:42|       715|      3.21|  02:06:53|      0.85|      0.06|      tdb1| 1237430309438971376 |            00:00:00|         8|      0.56|  00:22:00|      0.96|         0|      tdb1| 2649515222348904837 +--------------------+----------+----------+----------+----------+----------+----------+-------------------- | CLIENT SQL ordered by Executions +--------------------+----------+----------+----------+----------+----------+----------+-------------------- |               calls|      rows|  % dbtime|total_time|     % CPU|      % IO|    dbname|             queryid +--------------------+----------+----------+----------+----------+----------+----------+-------------------- |                 720|       720|      2.84|  01:52:07|      1.19|      0.08|      tdb1| 4710212362688288619 |                 715|       715|      3.21|  02:06:53|      0.85|      0.06|      tdb1| 1237430309438971376 |                 357|         0|      0.99|  00:39:03|      1.02|      0.33|      tdb1|   28942442626229688 |                  36|        36|      10.2|  06:43:19|     85.09|     17.38|      tdb1|  389015618226997618 |                   8|         8|      0.56|  00:22:00|      0.96|         0|      tdb1| 2649515222348904837 +--------------------+----------+----------+----------+----------+----------+----------+-------------------- | Complete List of SQL Text ---------------------------------------------- … 

Продолжение – следует. На очереди формирование истории блокировок (pg_stat_locks), более подробное описание процесса заполнения таблиц.


ссылка на оригинал статьи https://habr.com/ru/post/467181/


Комментарии

Добавить комментарий

Ваш адрес email не будет опубликован. Обязательные поля помечены *