PostgreSQL 事务ID回卷问题分析
概述
PG的 MVCC 机制通过保存数据行多个历史版本并通过记录上的事务 ID 元组(xmin,xmax) 来控制不同版本记录的可见性。在 PG里,事务 ID(XID)是 32 位无符号,顺序生成,单调递增。当到达大值(2^32-1) 后又开始从 3 开始使用。这种现象一般称为事务 ID 回卷(Wraparound)。单纯看这个机制可能看出历史事务修改的记录的可见性可能发生反转,一个记录从“历史记录”变成“未来记录”,数据就莫名其妙的“丢失”了。PG 数据库在碰到这种情形下会直接拒绝服务(报错如下)。
ERROR: database is not accepting commands to avoid wraparound data loss in database "devops"HINT: Stop the postmaster and vacuum that database in single-user mode.PG 事务特性
事务 ID 类型
pgtest=# select locktype, database,transactionid,virtualtransaction,pid,mode,granted from pg_locks where transactionid <> '' ; locktype | database | transactionid | virtualtransaction | pid | mode | granted---------------+----------+---------------+--------------------+------+---------------+--------- transactionid | | 2146496507 | 5/12667 | 2600 | ExclusiveLock | t(1 row)事务 ID 比较
/* * TransactionIdPrecedes --- is id1 logically < id2? */bool TransactionIdPrecedes(TransactionId id1, TransactionId id2) // 结果返回一个bool值{ /* * If either ID is a permanent XID then we can just do unsigned * comparison. If both are normal, do a modulo-2^32 comparison. */ int32 diff; if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2)) //若其中一个不是普通id,则其一定较新(较大) return (id1 < id2); diff = (int32) (id1 - id2); return (diff < );}
PG 表的数据更新
xmin:事务 ID,记录插入时的事务 ID。 xmax:事务 ID,记录删除或更新时的事务 ID。更新操作对应的删除和插入两笔记录。如果为 0,表示还没有删除或者删除事务还没有提交或者已回滚。删除也不会真的删除元组。 ctid:表示元组在表中的物理位置。
pgtest=# create table t1(id bigint not null primary key, c1 varchar(50));CREATE TABLEpgtest=# insert into t1 values(1,'A') returning xmin; xmin------ 1645(1 row)
INSERT 1pgtest=# delete from t1 returning xmax; xmax------ 1646(1 row)
DELETE 1pgtest=# insert into t1 values(2,'B') returning xmin; xmin------ 1647(1 row)
INSERT 1pgtest=# update t1 set c1=c1||'*' where id=2 returning xmax; xmax------ (1 row)
UPDATE 1pgtest=# update t1 set c1=c1||'*' where id=2 returning xmax; xmax------ (1 row)
UPDATE 1查看当前会话的事务 ID,方法如下。数据库里所有会话的事务 ID 都是单调递增不重复的。
pgtest=# SELECT pg_current_xact_id(); pg_current_xact_id-------------------- 1664(1 row)pgtest=# SELECT pg_current_xact_id(); pg_current_xact_id-------------------- 1665(1 row)
pgtest=# CREATE EXTENSION pageinspect;CREATE EXTENSIONSELECT lp, t_ctid AS ctid, t_xmin AS xmin, t_xmax AS xmax, to_hex(t_infomask) AS infomask, to_hex(t_infomask2) AS infomask2, t_attrs AS attrs FROM heap_page_item_attrs(get_raw_page('t1', ), 't1') LIMIT 10; lp | ctid | xmin | xmax | infomask | infomask2 | attrs----+-------+------+------+----------+-----------+--------------------------------------- 1 | (0,1) | 1645 | 1646 | 502 | 2002 | {"\\x0100000000000000","\\x0541"} 2 | (0,3) | 1647 | 1662 | 502 | 4002 | {"\\x0200000000000000","\\x0542"} 3 | (0,4) | 1662 | 1663 | 2102 | c002 | {"\\x0200000000000000","\\x07422a"} 4 | (0,4) | 1663 | 0 | 2802 | 8002 | {"\\x0200000000000000","\\x09422a2a"}(4 rows)冻结事务 ID
pgtest=# select oid, relname, relfrozenxid ,age(relfrozenxid) age, pg_current_xact_id() from pg_class c where c.relkind in ('r', 'm') and relfrozenxid <> and relname in ('t1'); oid | relname | relfrozenxid | age | pg_current_xact_id-------+---------+--------------+-------+-------------------- 18163 | t1 | 1644 | 11895 | 13539(1 row)
pgtest=# select oid, relname, relfrozenxid ,age(relfrozenxid) age, pg_current_xact_id() from pg_class c where c.relkind in ('r', 'm') and relfrozenxid <> and relname in ('t1'); oid | relname | relfrozenxid | age | pg_current_xact_id-------+---------+--------------+-------+-------------------- 18163 | t1 | 1644 | 11905 | 13549(1 row)SELECT c.oid::regclass as table_name, greatest(age(c.relfrozenxid), age(t.relfrozenxid)) as agefrom pg_class c left join pg_class t on c.reltoastrelid = t.oidwhere c.relkind in ('r', 'm') and c.relname like '%t1%'order by age desc ;VACUUM 操作
select name, setting, short_desc from pg_settings where name in ('autovacuum','autovacuum_freeze_max_age','log_autovacuum_min_duration','vacuum_failsafe_age','vacuum_freeze_min_age','vacuum_freeze_table_age');pgtest=# select name, setting, short_desc from pg_settings where name in ('autovacuum','autovacuum_freeze_max_age','log_autovacuum_min_duration','vacuum_failsafe_age','vacuum_freeze_min_age','vacuum_freeze_table_age'); name | setting | short_desc-----------------------------+------------+-------------------------------------------------------------------------------- autovacuum | off | Starts the autovacuum subprocess. autovacuum_freeze_max_age | 200000000 | Age at which to autovacuum a table to prevent transaction ID wraparound. log_autovacuum_min_duration | | Sets the minimum execution time above which autovacuum actions will be logged. vacuum_failsafe_age | 1600000000 | Age at which VACUUM should trigger failsafe to avoid a wraparound outage. vacuum_freeze_min_age | 50000000 | Minimum age at which VACUUM should freeze a table row. vacuum_freeze_table_age | 150000000 | Age at which VACUUM should scan whole table to freeze tuples.(6 rows)autovacuum:表示是否开启自动 vacuum操作。注意,即使是设置为off,为了避免事务 ID回卷带来大问题,PG依然会在合适的时机开启autovacuum。vacuum_freeze_min_age:PG 认为事务年龄超过这个的就应该针对部分记录做 freeze操作。vacuum_freeze_table_age:PG 认为事务年龄超过这个的就应该扫描全表做 freeze操作。autovacuum_freeze_max_age:PG 认为事务年龄超过这个(默认2亿)就自动对表发起 vacuum操作。这是强制的。PG 还有别的参数会尽可能降低自动vacuum操作对数据库读写的性能影响。vacuum_failsafe_age:PG 认为事务年龄超过这个(默认值16亿)就强制对表发起不计成本(不考虑对数据库读写的性能影响,以及减少不必要的索引清理任务)的 vacuum操作。因为此时再不及时做vacuum,后面就危险了。这是 PG 14 增加的参数,是后一道保障。log_autovacuum_min_duration:是否在 PG 日志里记录 vacuum信息。默认是-1不记录,这里为了展示设置为 0(记录所有的vacuum信息)。
PG 事务 ID 回卷测试
autovacuum
postgres@MQBOOK:~$ pg_resetwal -x 2097483648 -D 15/mainWrite-ahead log resetpostgres@MQBOOK:~$ sudo systemctl restart postgresql@15-mainJob for postgresql@15-main.service failed because the service did not take the steps required by its unit configuration.See "systemctl status postgresql@15-main.service" and "journalctl -xeu postgresql@15-main.service" for details.
postgres@MQBOOK:~/15/main/log$ tail -n 50 -f postgresql-2023-04-17_152517.log2023-04-17 15:25:17.823 CST [35990] LOG: starting PG15.2 (Ubuntu 15.2-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit2023-04-17 15:25:17.823 CST [35990] LOG: listening on IPv4 address "127.0.0.1", port 254322023-04-17 15:25:17.832 CST [35990] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.25432"2023-04-17 15:25:17.852 CST [35994] LOG: database system was shut down at 2023-04-17 15:24:56 CST2023-04-17 15:25:17.853 CST [35994] FATAL: could not access status of transaction 20974836482023-04-17 15:25:17.853 CST [35994] DETAIL: Could not open file "pg_xact/07D0": No such file or directory.2023-04-17 15:25:17.855 CST [35990] LOG: startup process (PID 35994) exited with exit code 12023-04-17 15:25:17.855 CST [35990] LOG: aborting startup due to startup process failure2023-04-17 15:25:17.857 CST [35990] LOG: database system is shut down
查看 PG的 pg_xact 目录,确实没有那个日志。这个是正常的, 因为事务 ID 是直接修改过来的,不是靠事务推进的。
postgres@MQBOOK:~/15/main/pg_xact$ ls -lrthtotal 8.K-rw------- 1 postgres postgres 8.0K Apr 17 15:21 0000postgres@MQBOOK:~/15/main/pg_xact$ file 00000000: ASCII text, with very long lines (3406), with no line terminatorspostgres@MQBOOK:~/15/main/pg_xact$
postgres@MQBOOK:~$ dd if=/dev/zero of=15/main/pg_xact/07D bs=8192 count=2020+ records in20+ records out163840 bytes (164 kB, 160 KiB) copied, .000231961 s, 706 MB/s
postgres@MQBOOK:~$ sudo systemctl start postgresql@15-mainpgtest=# SELECT lp, t_ctid AS ctid, t_xmin AS xmin, t_xmax AS xmax, to_hex(t_infomask) AS infomask, to_hex(t_infomask2) AS infomask2, t_attrs AS attrsFROM heap_page_item_attrs(get_raw_page('t1', ), 't1')LIMIT 10; lp | ctid | xmin | xmax | infomask | infomask2 | attrs----+-------+------+------+----------+-----------+--------------------------------------- 1 | | | | | | 2 | | | | | | 3 | | | | | | 4 | (0,4) | 1663 | 0 | 2b02 | 8002 | {"\\x0200000000000000","\\x09422a2a"}(4 rows)
pgtest=# select oid, relname, relfrozenxid ,age(relfrozenxid) age, pg_current_xact_id() from pg_class c where c.relkind in ('r', 'm') and relfrozenxid <> and relname in ('t1'); oid | relname | relfrozenxid | age | pg_current_xact_id-------+---------+--------------+-----+-------------------- 18163 | t1 | 2097483657 | 0 | 2097483657(1 row)pgtest=#
2023-04-17 15:43:44.470 CST [36099] WARNING: bypassing nonessential maintenance of table "pgtest.public.t1" as a failsafe after 0 index scans2023-04-17 15:43:44.470 CST [36099] DETAIL: The table's relfrozenxid or relminmxid is too far in the past.2023-04-17 15:43:44.470 CST [36099] HINT: Consider increasing configuration parameter "maintenance_work_mem" or "autovacuum_work_mem". You might also need to consider other ways for VACUUM to keep up with the allocation of transaction IDs.2023-04-17 15:43:44.471 CST [36099] LOG: automatic aggressive vacuum to prevent wraparound of table "pgtest.public.t1": index scans: pages: removed, 1 remain, 1 scanned (100.00% of total) tuples: 3 removed, 1 remain, are dead but not yet removable removable cutoff: 2097483657, which was XIDs old when operation ended new relfrozenxid: 2097483657, which is 2097482013 XIDs ahead of previous value new relminmxid: 193, which is 30 MXIDs ahead of previous value index scan bypassed by failsafe: 1 pages from table (100.00% of total) have 1 dead item identifiers I/O timings: read: 0.169 ms, write: 0.000 ms avg read rate: 52.301 MB/s, avg write rate: 32.688 MB/s buffer usage: 30 hits, 8 misses, 5 dirtied WAL usage: 3 records, 2 full page images, 1604 bytes system usage: CPU: user: .00 s, system: .00 s, elapsed: .00 s
pgtest=# insert into t1 values(3,'C');INSERT 1pgtest=# insert into t1 values(1,'A*') returning xmin; xmin------------ 2097483705(1 row)
INSERT 1pgtest=# select xmin, xmax, cmin, cmax, ctid from t1; xmin | xmax | cmin | cmax | ctid------------+------+------+------+------- 2097483696 | 0 | | 0 | (,3) 1663 | 0 | | 0 | (,4) 2097483705 | 0 | | 0 | (,5)(3 rows)
lp | ctid | xmin | xmax | infomask | infomask2 | attrs----+-------+------------+------+----------+-----------+--------------------------------------- 1 | | | | | | 2 | | | | | | 3 | (0,3) | 2097483696 | 0 | 902 | 2 | {"\\x0300000000000000","\\x0543"} 4 | (0,4) | 1663 | 0 | 2b02 | 8002 | {"\\x0200000000000000","\\x09422a2a"} 5 | (0,5) | 2097483705 | 0 | 902 | 2 | {"\\x0100000000000000","\\x07412a"}(5 rows)
postgres@MQBOOK:~$ sudo systemctl stop postgresql@15-mainpostgres@MQBOOK:~$postgres@MQBOOK:~$ pg_resetwal -x 2146483648 -D 15/main/Write-ahead log reset
postgres@MQBOOK:~$ sudo systemctl restart postgresql@15-main
postgres@MQBOOK:~/15/main/log$ tail -f postgresql-2023-04-17_161411.log2023-04-17 16:14:11.591 CST [36159] LOG: starting PG15.2 (Ubuntu 15.2-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.-1ubuntu1~22.04) 11.3., 64-bit2023-04-17 16:14:11.592 CST [36159] LOG: listening on IPv4 address "127.0.0.1", port 254322023-04-17 16:14:11.601 CST [36159] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.25432"2023-04-17 16:14:11.623 CST [36163] LOG: database system was shut down at 2023-04-17 16:13:39 CST2023-04-17 16:14:11.624 CST [36163] FATAL: could not access status of transaction 21464836482023-04-17 16:14:11.624 CST [36163] DETAIL: Could not open file "pg_xact/07FF": No such file or directory.2023-04-17 16:14:11.625 CST [36159] LOG: startup process (PID 36163) exited with exit code 12023-04-17 16:14:11.625 CST [36159] LOG: aborting startup due to startup process failure2023-04-17 16:14:11.627 CST [36159] LOG: database system is shut down
postgres@MQBOOK:~/15/main/pg_xact$ dd if=/dev/zero of=~/15/main/pg_xact/07FF bs=8192 count=1010+ records in10+ records out81920 bytes (82 kB, 80 KiB) copied, .000203643 s, 402 MB/s
查看表的页记录。
lp | ctid | xmin | xmax | infomask | infomask2 | attrs----+-------+------------+------+----------+-----------+--------------------------------------- 1 | | | | | | 2 | | | | | | 3 | (0,3) | 2097483696 | 0 | 902 | 2 | {"\\x0300000000000000","\\x0543"} 4 | (0,4) | 1663 | 0 | 2b02 | 8002 | {"\\x0200000000000000","\\x09422a2a"} 5 | (0,5) | 2097483705 | 0 | 902 | 2 | {"\\x0100000000000000","\\x07412a"}(5 rows)pgtest=# select oid, relname, relfrozenxid ,age(relfrozenxid) age, pg_current_xact_id() from pg_class c where c.relkind in ('r', 'm') and relfrozenxid <> and relname in ('t1'); oid | relname | relfrozenxid | age | pg_current_xact_id-------+---------+--------------+----------+-------------------- 18163 | t1 | 2097483657 | 49000000 | 2146483657(1 row)
观察发现,除了表的年龄变大了,接近 vacuum_freeze_min_age 值,不够出发 vacuum freeze 条件,所以并没有变化。
继续推进事务 ID 到 4092967296 ,比上次事务 ID 前进了 接近 20亿,离 2^32 还差 2亿以上 。重启实例。
postgres@MQBOOK:~$ pg_resetwal -x 4092967296 -D 15/main/Write-ahead log reset
postgres@MQBOOK:~$ dd if=/dev/zero of=15/main/pg_xact/0F3F bs=8192 count=2020+ records in20+ records out163840 bytes (164 kB, 160 KiB) copied, .000228365 s, 717 MB/s
postgres@MQBOOK:~$ sudo systemctl start postgresql@15-main
此时表的年龄已经达到 autovacuum_freeze_max_age 和 vacuum_failsafe_age 阈值,表也是立即进行 aggresive vacuum 。
10216 2023-04-17 17:13:34.725 CST [729] WARNING: bypassing nonessential maintenance of table "pgtest.public.t1" as a failsafe after 0 index scans10217 2023-04-17 17:13:34.725 CST [729] DETAIL: The table's relfrozenxid or relminmxid is too far in the past.10218 2023-04-17 17:13:34.725 CST [729] HINT: Consider increasing configuration parameter "maintenance_work_mem" or "autovacuum_work_mem".10219 You might also need to consider other ways for VACUUM to keep up with the allocation of transaction IDs.10220 2023-04-17 17:13:34.730 CST [729] LOG: automatic aggressive vacuum to prevent wraparound of table "pgtest.public.t1": index scans: 10221 pages: removed, 1 remain, 1 scanned (100.00% of total)10222 tuples: removed, 3 remain, are dead but not yet removable10223 removable cutoff: 4092967303, which was XIDs old when operation ended10224 new relfrozenxid: 4092967303, which is 1995483646 XIDs ahead of previous value10225 new relminmxid: 211, which is 18 MXIDs ahead of previous value10226 index scan bypassed by failsafe: 1 pages from table (100.00% of total) have 1 dead item identifiers10227 I/O timings: read: 7.039 ms, write: 0.000 ms10228 avg read rate: 8.431 MB/s, avg write rate: 5.269 MB/s10229 buffer usage: 30 hits, 8 misses, 5 dirtied10230 WAL usage: 2 records, 2 full page images, 1270 bytes10231 system usage: CPU: user: .00 s, system: .00 s, elapsed: .00 s
查看表的页和年龄,发生了变化。
lp | ctid | xmin | xmax | infomask | infomask2 | attrs----+-------+------------+------+----------+-----------+--------------------------------------- 1 | | | | | | 2 | | | | | | 3 | (0,3) | 2097483696 | 0 | b02 | 2 | {"\\x0300000000000000","\\x0543"} 4 | (0,4) | 1663 | 0 | 2b02 | 8002 | {"\\x0200000000000000","\\x09422a2a"} 5 | (0,5) | 2097483705 | 0 | b02 | 2 | {"\\x0100000000000000","\\x07412a"}(5 rows)pgtest=# select oid, relname, relfrozenxid ,age(relfrozenxid) age, pg_current_xact_id() from pg_class c where c.relkind in ('r', 'm') and relfrozenxid <> and relname in ('t1'); oid | relname | relfrozenxid | age | pg_current_xact_id-------+---------+--------------+-----+-------------------- 18163 | t1 | 4092967303 | 0 | 4092967303(1 row)
pgtest=# select xmin, xmax, cmin, cmax, ctid from t1; xmin | xmax | cmin | cmax | ctid------------+------+------+------+------- 2097483696 | 0 | | 0 | (,3) 1663 | 0 | | 0 | (,4) 2097483705 | 0 | | 0 | (,5)(3 rows)
事务 ID 回卷
接下来模拟事务 ID 回卷,将事务 ID 推进到 3 。相比上次事务 ID 大概前进了 2亿多。表的年龄达到参数 autovacuum_freeze_max_age 值,触发自动的 vacuum 。
postgres@MQBOOK:~$ pg_resetwal -x 3 -D 15/main/Write-ahead log reset
postgres@MQBOOK:~$ dd if=/dev/zero of=15/main/pg_xact/0000 bs=8192 count=2020+ records in20+ records out163840 bytes (164 kB, 160 KiB) copied, .000218186 s, 751 MB/s
postgres@MQBOOK:~$ sudo systemctl start postgresql@15-main
7714 2023-04-17 17:18:27.632 CST [833] LOG: automatic aggressive vacuum to prevent wraparound of table "pgtest.public.t1": index scans: 1 7715 pages: removed, 1 remain, 1 scanned (100.00% of total)7716 tuples: removed, 3 remain, are dead but not yet removable7717 removable cutoff: 10, which was XIDs old when operation ended7718 new relfrozenxid: 10, which is 202000003 XIDs ahead of previous value7719 new relminmxid: 213, which is 2 MXIDs ahead of previous value7720 index scan needed: 1 pages from table (100.00% of total) had 1 dead item identifiers removed7721 index "t1_pkey": pages: 2 in total, newly deleted, currently deleted, reusable7722 I/O timings: read: 0.040 ms, write: 0.000 ms7723 avg read rate: 431.630 MB/s, avg write rate: 172.652 MB/s7724 buffer usage: 36 hits, 10 misses, 4 dirtied7725 WAL usage: 5 records, 4 full page images, 9865 bytes7726 system usage: CPU: user: .00 s, system: .00 s, elapsed: .00 s
查看表的页和年龄,以及继续更新表,即使事务 ID 回卷了,表都能正常使用。
pgtest=# select xmin, xmax, cmin, cmax, ctid from t1; xmin | xmax | cmin | cmax | ctid------------+------+------+------+------- 2097483696 | 0 | | 0 | (,3) 1663 | 0 | | 0 | (,4) 2097483705 | 0 | | 0 | (,5)(3 rows)
pgtest=# select oid, relname, relfrozenxid ,age(relfrozenxid) age, pg_current_xact_id() from pg_class c where c.relkind in ('r', 'm') and relfrozenxid <> and relname in ('t1'); oid | relname | relfrozenxid | age | pg_current_xact_id-------+---------+--------------+-----+-------------------- 18163 | t1 | 10 | 0 | 10(1 row)
lp | ctid | xmin | xmax | infomask | infomask2 | attrs----+-------+------------+------+----------+-----------+--------------------------------------- 1 | | | | | | 2 | | | | | | 3 | (0,3) | 2097483696 | 0 | b02 | 2 | {"\\x0300000000000000","\\x0543"} 4 | (0,4) | 1663 | 0 | 2b02 | 8002 | {"\\x0200000000000000","\\x09422a2a"} 5 | (0,5) | 2097483705 | 0 | b02 | 2 | {"\\x0100000000000000","\\x07412a"}(5 rows)
pgtest=# insert into t1 values(4,'D');INSERT 1pgtest=# update t1 set c1=c1||'*' where id=3;UPDATE 1pgtest=# select oid, relname, relfrozenxid ,age(relfrozenxid) age, pg_current_xact_id() from pg_class c where c.relkind in ('r', 'm') and relfrozenxid <> and relname in ('t1'); oid | relname | relfrozenxid | age | pg_current_xact_id-------+---------+--------------+-----+-------------------- 18163 | t1 | 10 | 12 | 22(1 row)
lp | ctid | xmin | xmax | infomask | infomask2 | attrs----+-------+------------+------+----------+-----------+--------------------------------------- 1 | (0,1) | 12 | 0 | 902 | 2 | {"\\x0400000000000000","\\x0544"} 2 | | | | | | 3 | (0,6) | 2097483696 | 21 | 302 | 4002 | {"\\x0300000000000000","\\x0543"} 4 | (0,4) | 1663 | 0 | 2b02 | 8002 | {"\\x0200000000000000","\\x09422a2a"} 5 | (0,5) | 2097483705 | 0 | b02 | 2 | {"\\x0100000000000000","\\x07412a"} 6 | (0,6) | 21 | 0 | 2802 | 8002 | {"\\x0300000000000000","\\x07432a"}(6 rows)
事务 ID 冻结失败
整个 PG 的事务 ID 随着业务读写不断推进,同时 PG 内部的 vacuum 机制也会在适当的时候被触发 对老的事务 ID 进行冻结回收。正常情况下,PG 是不会碰到文章开头说的那种报错。
如果碰到了,可能是短事务的 TPS 和只读查询的 QPS 非常高,事务 ID 消耗速度非常快,同时表的数据量非常大, vacuum 速度非常慢并且由于未知原因报错中断了。可能性都出在这个未知原因上。如:日志文件损坏、有未决的 prepared transaction 或者有长事务一直锁表未提交等等。
下面就通过模拟存在 prepared transaction 来阻碍 autovacuum 。
pgtest=# BEGIN;BEGINpgtest=*# select pg_current_xact_id(); pg_current_xact_id-------------------- 55(1 row)pgtest=*# lock t1 in share update exclusive mode;LOCK TABLEpgtest=*# PREPARE TRANSACTION 'trx01';PREPARE TRANSACTION
然后关闭数据库。一般的事务就自动回滚了,但这个 prepared transaction 会一直存在。这个在 PG 里可能会是一个很大的风险。以后碰到再专门讨论。
postgres@MQBOOK:~$ sudo systemctl stop postgresql@15-mainpostgres@MQBOOK:~$postgres@MQBOOK:~$ pg_resetwal -x 2137483648 -D 15/main/Write-ahead log resetpostgres@MQBOOK:~$ dd if=/dev/zero of=15/main/pg_xact/07F6 bs=8192 count=1515+ records in15+ records out122880 bytes (123 kB, 120 KiB) copied, .00023577 s, 521 MB/s
此时 PG 依然启动失败。查看 PG 的日志看到了文章开头的那个报错了。
postgres@MQBOOK:~$ sudo systemctl start postgresql@15-main
10787 2023-04-17 17:39:27.560 CST [972] WARNING: database "pgtest" must be vacuumed within 9999999 transactions10788 2023-04-17 17:39:27.560 CST [972] HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.10789 You might also need to commit or roll back old prepared transactions, or drop stale replication slots. 此时因为事务 ID 还有部分可用,数据库还可以读写。只是读写过程都会提示要尽快做冻结操作。按道理,PG 会自动做 vacuum , 但是这个表由于前面的 prepared transaction 影响,应该是 vacuum 失败了。
pgtest=# select pg_current_xact_id();WARNING: database "pgtest" must be vacuumed within 9999999 transactionsHINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.You might also need to commit or roll back old prepared transactions, or drop stale replication slots. pg_current_xact_id-------------------- 2137483658(1 row)pgtest=# insert into t1 values(7,'E');WARNING: database "pgtest" must be vacuumed within 9999997 transactionsHINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.You might also need to commit or roll back old prepared transactions, or drop stale replication slots.INSERT 1pgtest=# SELECT lp, t_ctid AS ctid, t_xmin AS xmin, t_xmax AS xmax, to_hex(t_infomask) AS infomask, to_hex(t_infomask2) AS infomask2, t_attrs AS attrsFROM heap_page_item_attrs(get_raw_page('t1', ), 't1')LIMIT 10; lp | ctid | xmin | xmax | infomask | infomask2 | attrs----+-------+------------+------+----------+-----------+--------------------------------------- 1 | (0,1) | 12 | 0 | 902 | 2 | {"\\x0400000000000000","\\x0544"} 2 | | | | | | 3 | (0,6) | 2097483696 | 21 | 702 | 4002 | {"\\x0300000000000000","\\x0543"} 4 | (0,4) | 1663 | 0 | 2b02 | 8002 | {"\\x0200000000000000","\\x09422a2a"} 5 | (0,5) | 2097483705 | 0 | b02 | 2 | {"\\x0100000000000000","\\x07412a"} 6 | (0,6) | 21 | 0 | 2902 | 8002 | {"\\x0300000000000000","\\x07432a"} 7 | (0,7) | 2137483660 | 0 | 802 | 2 | {"\\x0700000000000000","\\x0545"}(7 rows)
再继续推进事务 ID 到 2^31 。
postgres@MQBOOK:~$ sudo systemctl stop postgresql@15-mainpostgres@MQBOOK:~$postgres@MQBOOK:~$ pg_resetwal -x 2146483648 -D 15/main/Write-ahead log resetpostgres@MQBOOK:~$ dd if=/dev/zero of=15/main/pg_xact/07FF bs=8192 count=1515+ records in15+ records out122880 bytes (123 kB, 120 KiB) copied, .000276528 s, 444 MB/s
再次启动 PG 实例,果然 PG 就直接罢工了。
postgres@MQBOOK:~$ sudo systemctl start postgresql@15-main
2023-04-17 17:44:38.270 CST [1027] LOG: POWA connected to database powa2023-04-17 17:44:38.280 CST [1027] ERROR: Failed to lock the powa_snapshot_metas record: state : 54000 message: database is not accepting commands to avoid wraparound data loss in database "pgtest" detail : hint : Stop the postmaster and vacuum that database in single-user mod
查看一下数据库此时的库表的年龄。
WITH max_age AS (SELECT 2000000000 as max_old_xid, setting AS autovacuum_freeze_max_ageFROM pg_catalog.pg_settingsWHERE name = 'autovacuum_freeze_max_age' ), per_database_stats AS (SELECT datname, m.max_old_xid::int, m.autovacuum_freeze_max_age::int, age(d.datfrozenxid) AS oldest_current_xidFROM pg_catalog.pg_database dJOIN max_age m ON (true)WHERE d.datallowconn )select max(oldest_current_xid) AS oldest_current_xid, max(ROUND(100*(oldest_current_xid/max_old_xid::float))) AS percent_towards_wraparound, max(ROUND(100*(oldest_current_xid/autovacuum_freeze_max_age::float))) AS percent_towards_emergency_autovacFROM per_database_stats;
oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac--------------------+----------------------------+----------------------------------- 2146483638 | 107 | 1073pgtest=# select oid, relname, relfrozenxid ,age(relfrozenxid) age from pg_class c where c.relkind in ('r', 'm') and relfrozenxid <> and relname in ('t1'); oid | relname | relfrozenxid | age-------+---------+--------------+------------ 18163 | t1 | 10 | 2146483638(1 row)
lp | ctid | xmin | xmax | infomask | infomask2 | attrs----+-------+------------+------+----------+-----------+--------------------------------------- 1 | (0,1) | 12 | 0 | 902 | 2 | {"\\x0400000000000000","\\x0544"} 2 | | | | | | 3 | (0,6) | 2097483696 | 21 | 702 | 4002 | {"\\x0300000000000000","\\x0543"} 4 | (0,4) | 1663 | 0 | 2b02 | 8002 | {"\\x0200000000000000","\\x09422a2a"} 5 | (0,5) | 2097483705 | 0 | b02 | 2 | {"\\x0100000000000000","\\x07412a"} 6 | (0,6) | 21 | 0 | 2902 | 8002 | {"\\x0300000000000000","\\x07432a"} 7 | (0,7) | 2137483660 | 0 | 902 | 2 | {"\\x0700000000000000","\\x0545"}(7 rows)
pgtest=# insert into t1 values(8,'G') returning xmin;ERROR: database is not accepting commands to avoid wraparound data loss in database "pgtest"HINT: Stop the postmaster and vacuum that database in single-user mode.You might also need to commit or roll back old prepared transactions, or drop stale replication slots.pgtest=#pgtest=# vacuum freeze t1;
表的年龄达到了 2^31 了。不允许再开启新的事务。
此时手动对表发起 vacuum freeze 会被阻塞,查看一下 PG 的阻塞情况。
SELECT blocked_locks.pid AS blocked_pid, blocked_activity.usename AS blocked_user, blocking_locks.pid AS blocking_pid, blocking_activity.usename AS blocking_user, blocked_activity.query AS blocked_statement, blocking_activity.query AS current_statement_in_blocking_process FROM pg_catalog.pg_locks blocked_locks JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid JOIN pg_catalog.pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid AND blocking_locks.pid != blocked_locks.pid JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid WHERE NOT blocked_locks.granted;
blocked_pid | blocked_user | blocking_pid | blocking_user | blocked_statement | current_statement_in_blocking_process-------------+--------------+--------------+---------------+------------------------------------------------------+------------------------------------------------------ 1030 | | 1075 | postgres | autovacuum: VACUUM public.t1 (to prevent wraparound) | vacuum freeze t1; 1075 | postgres | 1030 | | vacuum freeze t1; | autovacuum: VACUUM public.t1 (to prevent wraparound)(2 rows) select locktype, database , virtualtransaction , pid, mode, waitstart from pg_locks where database = (select oid from pg_database where datname = 'pgtest') and mode like '%Exclusive%'; locktype | database | virtualtransaction | pid | mode | waitstart----------+----------+--------------------+------+--------------------------+------------------------------- relation | 18153 | -1/55 | | ShareUpdateExclusiveLock | relation | 18153 | 4/4 | 1030 | ShareUpdateExclusiveLock | 2023-04-17 17:44:38.280852+08(2 rows)发现 PG 已经开始对表开始 autovacuum ,但锁一直不释放,明显是遇到问题了。这里看不到阻塞的根本原因(前面模拟的 prepared transaction )。有关这种事务特点以后再分析。
pgtest=# rollback prepared 'trx01';ROLLBACK PREPAREDpgtest=# select locktype, database , virtualtransaction , pid, mode, waitstart from pg_locks where database = (select oid from pg_database where datname = 'pgtest') and mode like '%Exclusive%'; locktype | database | virtualtransaction | pid | mode | waitstart----------+----------+--------------------+-----+------+-----------( rows)
总结
本文分析解释了 PG 的事务 ID 的使用和回卷现象特点以及实际问题特征和解决方法。PG 14/15 相比 PG 9/10/11/12 在处理这个问题的 vacuum 策略上更加丰富灵活,所以出现事务 ID 用尽的概率是非常低的。一旦出现,结果很严重,排查也比较麻烦。首先看文件是否损坏、然后看是否有未提交或未决事务(如 prepared transaction ) 存在等等。如果平时表特别大,建议适当调整 vacuum 参数改变 vacuum 的性能和效果。有关 vacuum 的详细调优方法以后碰到再分享。
相关文章