btrfs
by btrfs
CVEs (21)
| CVE | Sev | Risk | CVSS | EPSS | KEV | Published | Description |
|---|---|---|---|---|---|---|---|
| CVE-2026-43117 | Cri | 0.59 | 9.1 | 0.00 | May 6, 2026 | In the Linux kernel, the following vulnerability has been resolved: btrfs: tracepoints: get correct superblock from dentry in event btrfs_sync_file() If overlay is used on top of btrfs, dentry->d_sb translates to overlay's super block and fsid assignment will lead to a crash. Use file_inode(file)->i_sb to always get btrfs_sb. | |
| CVE-2026-43361 | Med | 0.36 | 5.5 | 0.00 | May 8, 2026 | In the Linux kernel, the following vulnerability has been resolved: btrfs: fix transaction abort when snapshotting received subvolumes Currently a user can trigger a transaction abort by snapshotting a previously received snapshot a bunch of times until we reach a BTRFS_UUID_KEY_RECEIVED_SUBVOL item overflow (the maximum item size we can store in a leaf). This is very likely not common in practice, but if it happens, it turns the filesystem into RO mode. The snapshot, send and set_received_subvol and subvol_setflags (used by receive) don't require CAP_SYS_ADMIN, just inode_owner_or_capable(). A malicious user could use this to turn a filesystem into RO mode and disrupt a system. Reproducer script: $ cat test.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi # Use smallest node size to make the test faster. mkfs.btrfs -f --nodesize 4K $DEV mount $DEV $MNT # Create a subvolume and set it to RO so that it can be used for send. btrfs subvolume create $MNT/sv touch $MNT/sv/foo btrfs property set $MNT/sv ro true # Send and receive the subvolume into snaps/sv. mkdir $MNT/snaps btrfs send $MNT/sv | btrfs receive $MNT/snaps # Now snapshot the received subvolume, which has a received_uuid, a # lot of times to trigger the leaf overflow. total=500 for ((i = 1; i <= $total; i++)); do echo -ne "\rCreating snapshot $i/$total" btrfs subvolume snapshot -r $MNT/snaps/sv $MNT/snaps/sv_$i > /dev/null done echo umount $MNT When running the test: $ ./test.sh (...) Create subvolume '/mnt/sdi/sv' At subvol /mnt/sdi/sv At subvol sv Creating snapshot 496/500ERROR: Could not create subvolume: Value too large for defined data type Creating snapshot 497/500ERROR: Could not create subvolume: Read-only file system Creating snapshot 498/500ERROR: Could not create subvolume: Read-only file system Creating snapshot 499/500ERROR: Could not create subvolume: Read-only file system Creating snapshot 500/500ERROR: Could not create subvolume: Read-only file system And in dmesg/syslog: $ dmesg (...) [251067.627338] BTRFS warning (device sdi): insert uuid item failed -75 (0x4628b21c4ac8d898, 0x2598bee2b1515c91) type 252! [251067.629212] ------------[ cut here ]------------ [251067.630033] BTRFS: Transaction aborted (error -75) [251067.630871] WARNING: fs/btrfs/transaction.c:1907 at create_pending_snapshot.cold+0x52/0x465 [btrfs], CPU#10: btrfs/615235 [251067.632851] Modules linked in: btrfs dm_zero (...) [251067.644071] CPU: 10 UID: 0 PID: 615235 Comm: btrfs Tainted: G W 6.19.0-rc8-btrfs-next-225+ #1 PREEMPT(full) [251067.646165] Tainted: [W]=WARN [251067.646733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 [251067.648735] RIP: 0010:create_pending_snapshot.cold+0x55/0x465 [btrfs] [251067.649984] Code: f0 48 0f (...) [251067.653313] RSP: 0018:ffffce644908fae8 EFLAGS: 00010292 [251067.653987] RAX: 00000000ffffff01 RBX: ffff8e5639e63a80 RCX: 00000000ffffffd3 [251067.655042] RDX: ffff8e53faa76b00 RSI: 00000000ffffffb5 RDI: ffffffffc0919750 [251067.656077] RBP: ffffce644908fbd8 R08: 0000000000000000 R09: ffffce644908f820 [251067.657068] R10: ffff8e5adc1fffa8 R11: 0000000000000003 R12: ffff8e53c0431bd0 [251067.658050] R13: ffff8e5414593600 R14: ffff8e55efafd000 R15: 00000000ffffffb5 [251067.659019] FS: 00007f2a4944b3c0(0000) GS:ffff8e5b27dae000(0000) knlGS:0000000000000000 [251067.660115] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [251067.660943] CR2: 00007ffc5aa57898 CR3: 00000005813a2003 CR4: 0000000000370ef0 [251067.661972] Call Trace: [251067.662292] <TASK> [251067.662653] create_pending_snapshots+0x97/0xc0 [btrfs] [251067.663413] btrfs_commit_transaction+0x26e/0xc00 [btrfs] [251067.664257] ? btrfs_qgroup_convert_reserved_meta+0x35/0x390 [btrfs] [251067.665238] ? _raw_spin_unlock+0x15/0x30 [251067.665837] ? record_root_ ---truncated--- | |
| CVE-2026-43360 | Med | 0.36 | 5.5 | 0.00 | May 8, 2026 | In the Linux kernel, the following vulnerability has been resolved: btrfs: fix transaction abort on file creation due to name hash collision If we attempt to create several files with names that result in the same hash, we have to pack them in same dir item and that has a limit inherent to the leaf size. However if we reach that limit, we trigger a transaction abort and turns the filesystem into RO mode. This allows for a malicious user to disrupt a system, without the need to have administration privileges/capabilities. Reproducer: $ cat exploit-hash-collisions.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi # Use smallest node size to make the test faster and require fewer file # names that result in hash collision. mkfs.btrfs -f --nodesize 4K $DEV mount $DEV $MNT # List of names that result in the same crc32c hash for btrfs. declare -a names=( 'foobar' '%a8tYkxfGMLWRGr55QSeQc4PBNH9PCLIvR6jZnkDtUUru1t@RouaUe_L:@xGkbO3nCwvLNYeK9vhE628gss:T$yZjZ5l-Nbd6CbC$M=hqE-ujhJICXyIxBvYrIU9-TDC' 'AQci3EUB%shMsg-N%frgU:02ByLs=IPJU0OpgiWit5nexSyxZDncY6WB:=zKZuk5Zy0DD$Ua78%MelgBuMqaHGyKsJUFf9s=UW80PcJmKctb46KveLSiUtNmqrMiL9-Y0I_l5Fnam04CGIg=8@U:Z' 'CvVqJpJzueKcuA$wqwePfyu7VxuWNN3ho$p0zi2H8QFYK$7YlEqOhhb%:hHgjhIjW5vnqWHKNP4' 'ET:vk@rFU4tsvMB0$C_p=xQHaYZjvoF%-BTc%wkFW8yaDAPcCYoR%x$FH5O:' 'HwTon%v7SGSP4FE08jBwwiu5aot2CFKXHTeEAa@38fUcNGOWvE@Mz6WBeDH_VooaZ6AgsXPkVGwy9l@@ZbNXabUU9csiWrrOp0MWUdfi$EZ3w9GkIqtz7I_eOsByOkBOO' 'Ij%2VlFGXSuPvxJGf5UWy6O@1svxGha%b@=%wjkq:CIgE6u7eJOjmQY5qTtxE2Rjbis9@us' 'KBkjG5%9R8K9sOG8UTnAYjxLNAvBmvV5vz3IiZaPmKuLYO03-6asI9lJ_j4@6Xo$KZicaLWJ3Pv8XEwVeUPMwbHYWwbx0pYvNlGMO9F:ZhHAwyctnGy%_eujl%WPd4U2BI7qooOSr85J-C2V$LfY' 'NcRfDfuUQ2=zP8K3CCF5dFcpfiOm6mwenShsAb_F%n6GAGC7fT2JFFn:c35X-3aYwoq7jNX5$ZJ6hI3wnZs$7KgGi7wjulffhHNUxAT0fRRLF39vJ@NvaEMxsMO' 'Oj42AQAEzRoTxa5OuSKIr=A_lwGMy132v4g3Pdq1GvUG9874YseIFQ6QU' 'Ono7avN5GjC:_6dBJ_' 'WHmN2gnmaN-9dVDy4aWo:yNGFzz8qsJyJhWEWcud7$QzN2D9R0efIWWEdu5kwWr73NZm4=@CoCDxrrZnRITr-kGtU_cfW2:%2_am' 'WiFnuTEhAG9FEC6zopQmj-A-$LDQ0T3WULz%ox3UZAPybSV6v1Z$b4L_XBi4M4BMBtJZpz93r9xafpB77r:lbwvitWRyo$odnAUYlYMmU4RvgnNd--e=I5hiEjGLETTtaScWlQp8mYsBovZwM2k' 'XKyH=OsOAF3p%uziGF_ZVr$ivrvhVgD@1u%5RtrV-gl_vqAwHkK@x7YwlxX3qT6WKKQ%PR56NrUBU2dOAOAdzr2=5nJuKPM-T-$ZpQfCL7phxQbUcb:BZOTPaFExc-qK-gDRCDW2' 'd3uUR6OFEwZr%ns1XH_@tbxA@cCPmbBRLdyh7p6V45H$P2$F%w0RqrD3M0g8aGvWpoTFMiBdOTJXjD:JF7=h9a_43xBywYAP%r$SPZi%zDg%ql-KvkdUCtF9OLaQlxmd' 'ePTpbnit%hyNm@WELlpKzNZYOzOTf8EQ$sEfkMy1VOfIUu3coyvIr13-Y7Sv5v-Ivax2Go_GQRFMU1b3362nktT9WOJf3SpT%z8sZmM3gvYQBDgmKI%%RM-G7hyrhgYflOw%z::ZRcv5O:lDCFm' 'evqk743Y@dvZAiG5J05L_ROFV@$2%rVWJ2%3nxV72-W7$e$-SK3tuSHA2mBt$qloC5jwNx33GmQUjD%akhBPu=VJ5g$xhlZiaFtTrjeeM5x7dt4cHpX0cZkmfImndYzGmvwQG:$euFYmXn$_2rA9mKZ' 'gkgUtnihWXsZQTEkrMAWIxir09k3t7jk_IK25t1:cy1XWN0GGqC%FrySdcmU7M8MuPO_ppkLw3=Dfr0UuBAL4%GFk2$Ma10V1jDRGJje%Xx9EV2ERaWKtjpwiZwh0gCSJsj5UL7CR8RtW5opCVFKGGy8Cky' 'hNgsG_8lNRik3PvphqPm0yEH3P%%fYG:kQLY=6O-61Wa6nrV_WVGR6TLB09vHOv%g4VQRP8Gzx7VXUY1qvZyS' 'isA7JVzN12xCxVPJZ_qoLm-pTBuhjjHMvV7o=F:EaClfYNyFGlsfw-Kf%uxdqW-kwk1sPl2vhbjyHU1A6$hz' 'kiJ_fgcdZFDiOptjgH5PN9-PSyLO4fbk_:u5_2tz35lV_iXiJ6cx7pwjTtKy-XGaQ5IefmpJ4N_ZqGsqCsKuqOOBgf9LkUdffHet@Wu' 'lvwtxyhE9:%Q3UxeHiViUyNzJsy:fm38pg_b6s25JvdhOAT=1s0$pG25x=LZ2rlHTszj=gN6M4zHZYr_qrB49i=pA--@WqWLIuX7o1S_SfS@2FSiUZN' 'rC24cw3UBDZ=5qJBUMs9e$=S4Y94ni%Z8639vnrGp=0Hv4z3dNFL0fBLmQ40=EYIY:Z=SLc@QLMSt2zsss2ZXrP7j4=' 'uwGl2s-fFrf@GqS=DQqq2I0LJSsOmM%xzTjS:lzXguE3wChdMoHYtLRKPvfaPOZF2fER@j53evbKa7R%A7r4%YEkD=kicJe@SFiGtXHbKe4gCgPAYbnVn' 'UG37U6KKua2bgc:IHzRs7BnB6FD:2Mt5Cc5NdlsW%$1tyvnfz7S27FvNkroXwAW:mBZLA1@qa9WnDbHCDmQmfPMC9z-Eq6QT0jhhPpqyymaD:R02ghwYo%yx7SAaaq-:x33LYpei$5g8DMl3C' 'y2vjek0FE1PDJC0qpfnN:x8k2wCFZ9xiUF2ege=JnP98R%wxjKkdfEiLWvQzmnW' '8-HCSgH5B%K7P8_jaVtQhBXpBk:pE-$P7ts58U0J@iR9YZntMPl7j$s62yAJO@_9eanFPS54b=UTw$94C-t=HLxT8n6o9P=QnIxq-f1=Ne2dvhe6WbjEQtc' 'YPPh:IFt2mtR6XWSmjHptXL_hbSYu8bMw-JP8@PNyaFkdNFsk$M=xfL6LDKCDM-mSyGA_2MBwZ8Dr4=R1D%7-mC ---truncated--- | |
| CVE-2026-43358 | Med | 0.36 | 5.5 | 0.00 | May 8, 2026 | In the Linux kernel, the following vulnerability has been resolved: btrfs: add missing RCU unlock in error path in try_release_subpage_extent_buffer() Call rcu_read_lock() before exiting the loop in try_release_subpage_extent_buffer() because there is a rcu_read_unlock() call past the loop. This has been detected by the Clang thread-safety analyzer. | |
| CVE-2026-43118 | Med | 0.36 | 5.5 | 0.00 | May 6, 2026 | In the Linux kernel, the following vulnerability has been resolved: btrfs: fix zero size inode with non-zero size after log replay When logging that an inode exists, as part of logging a new name or logging new dir entries for a directory, we always set the generation of the logged inode item to 0. This is to signal during log replay (in overwrite_item()), that we should not set the i_size since we only logged that an inode exists, so the i_size of the inode in the subvolume tree must be preserved (as when we log new names or that an inode exists, we don't log extents). This works fine except when we have already logged an inode in full mode or it's the first time we are logging an inode created in a past transaction, that inode has a new i_size of 0 and then we log a new name for the inode (due to a new hardlink or a rename), in which case we log an i_size of 0 for the inode and a generation of 0, which causes the log replay code to not update the inode's i_size to 0 (in overwrite_item()). An example scenario: mkdir /mnt/dir xfs_io -f -c "pwrite 0 64K" /mnt/dir/foo sync xfs_io -c "truncate 0" -c "fsync" /mnt/dir/foo ln /mnt/dir/foo /mnt/dir/bar xfs_io -c "fsync" /mnt/dir <power fail> After log replay the file remains with a size of 64K. This is because when we first log the inode, when we fsync file foo, we log its current i_size of 0, and then when we create a hard link we log again the inode in exists mode (LOG_INODE_EXISTS) but we set a generation of 0 for the inode item we add to the log tree, so during log replay overwrite_item() sees that the generation is 0 and i_size is 0 so we skip updating the inode's i_size from 64K to 0. Fix this by making sure at fill_inode_item() we always log the real generation of the inode if it was logged in the current transaction with the i_size we logged before. Also if an inode created in a previous transaction is logged in exists mode only, make sure we log the i_size stored in the inode item located from the commit root, so that if we log multiple times that the inode exists we get the correct i_size. A test case for fstests will follow soon. | |
| CVE-2026-43046 | Med | 0.36 | 5.5 | 0.00 | May 1, 2026 | In the Linux kernel, the following vulnerability has been resolved: btrfs: reject root items with drop_progress and zero drop_level [BUG] When recovering relocation at mount time, merge_reloc_root() and btrfs_drop_snapshot() both use BUG_ON(level == 0) to guard against an impossible state: a non-zero drop_progress combined with a zero drop_level in a root_item, which can be triggered: ------------[ cut here ]------------ kernel BUG at fs/btrfs/relocation.c:1545! Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI CPU: 1 UID: 0 PID: 283 ... Tainted: 6.18.0+ #16 PREEMPT(voluntary) Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: QEMU Ubuntu 24.04 PC v2, BIOS 1.16.3-debian-1.16.3-2 RIP: 0010:merge_reloc_root+0x1266/0x1650 fs/btrfs/relocation.c:1545 Code: ffff0000 00004589 d7e9acfa ffffe8a1 79bafebe 02000000 Call Trace: merge_reloc_roots+0x295/0x890 fs/btrfs/relocation.c:1861 btrfs_recover_relocation+0xd6e/0x11d0 fs/btrfs/relocation.c:4195 btrfs_start_pre_rw_mount+0xa4d/0x1810 fs/btrfs/disk-io.c:3130 open_ctree+0x5824/0x5fe0 fs/btrfs/disk-io.c:3640 btrfs_fill_super fs/btrfs/super.c:987 [inline] btrfs_get_tree_super fs/btrfs/super.c:1951 [inline] btrfs_get_tree_subvol fs/btrfs/super.c:2094 [inline] btrfs_get_tree+0x111c/0x2190 fs/btrfs/super.c:2128 vfs_get_tree+0x9a/0x370 fs/super.c:1758 fc_mount fs/namespace.c:1199 [inline] do_new_mount_fc fs/namespace.c:3642 [inline] do_new_mount fs/namespace.c:3718 [inline] path_mount+0x5b8/0x1ea0 fs/namespace.c:4028 do_mount fs/namespace.c:4041 [inline] __do_sys_mount fs/namespace.c:4229 [inline] __se_sys_mount fs/namespace.c:4206 [inline] __x64_sys_mount+0x282/0x320 fs/namespace.c:4206 ... RIP: 0033:0x7f969c9a8fde Code: 0f1f4000 48c7c2b0 fffffff7 d8648902 b8ffffff ffc3660f ---[ end trace 0000000000000000 ]--- The bug is reproducible on 7.0.0-rc2-next-20260310 with our dynamic metadata fuzzing tool that corrupts btrfs metadata at runtime. [CAUSE] A non-zero drop_progress.objectid means an interrupted btrfs_drop_snapshot() left a resume point on disk, and in that case drop_level must be greater than 0 because the checkpoint is only saved at internal node levels. Although this invariant is enforced when the kernel writes the root item, it is not validated when the root item is read back from disk. That allows on-disk corruption to provide an invalid state with drop_progress.objectid != 0 and drop_level == 0. When relocation recovery later processes such a root item, merge_reloc_root() reads drop_level and hits BUG_ON(level == 0). The same invalid metadata can also trigger the corresponding BUG_ON() in btrfs_drop_snapshot(). [FIX] Fix this by validating the root_item invariant in tree-checker when reading root items from disk: if drop_progress.objectid is non-zero, drop_level must also be non-zero. Reject such malformed metadata with -EUCLEAN before it reaches merge_reloc_root() or btrfs_drop_snapshot() and triggers the BUG_ON. After the fix, the same corruption is correctly rejected by tree-checker and the BUG_ON is no longer triggered. | |
| CVE-2026-31519 | Med | 0.36 | 5.5 | 0.00 | Apr 22, 2026 | In the Linux kernel, the following vulnerability has been resolved: btrfs: set BTRFS_ROOT_ORPHAN_CLEANUP during subvol create We have recently observed a number of subvolumes with broken dentries. ls-ing the parent dir looks like: drwxrwxrwt 1 root root 16 Jan 23 16:49 . drwxr-xr-x 1 root root 24 Jan 23 16:48 .. d????????? ? ? ? ? ? broken_subvol and similarly stat-ing the file fails. In this state, deleting the subvol fails with ENOENT, but attempting to create a new file or subvol over it errors out with EEXIST and even aborts the fs. Which leaves us a bit stuck. dmesg contains a single notable error message reading: "could not do orphan cleanup -2" 2 is ENOENT and the error comes from the failure handling path of btrfs_orphan_cleanup(), with the stack leading back up to btrfs_lookup(). btrfs_lookup btrfs_lookup_dentry btrfs_orphan_cleanup // prints that message and returns -ENOENT After some detailed inspection of the internal state, it became clear that: - there are no orphan items for the subvol - the subvol is otherwise healthy looking, it is not half-deleted or anything, there is no drop progress, etc. - the subvol was created a while ago and does the meaningful first btrfs_orphan_cleanup() call that sets BTRFS_ROOT_ORPHAN_CLEANUP much later. - after btrfs_orphan_cleanup() fails, btrfs_lookup_dentry() returns -ENOENT, which results in a negative dentry for the subvolume via d_splice_alias(NULL, dentry), leading to the observed behavior. The bug can be mitigated by dropping the dentry cache, at which point we can successfully delete the subvolume if we want. i.e., btrfs_lookup() btrfs_lookup_dentry() if (!sb_rdonly(inode->vfs_inode)->vfs_inode) btrfs_orphan_cleanup(sub_root) test_and_set_bit(BTRFS_ROOT_ORPHAN_CLEANUP) btrfs_search_slot() // finds orphan item for inode N ... prints "could not do orphan cleanup -2" if (inode == ERR_PTR(-ENOENT)) inode = NULL; return d_splice_alias(NULL, dentry) // NEGATIVE DENTRY for valid subvolume btrfs_orphan_cleanup() does test_and_set_bit(BTRFS_ROOT_ORPHAN_CLEANUP) on the root when it runs, so it cannot run more than once on a given root, so something else must run concurrently. However, the obvious routes to deleting an orphan when nlinks goes to 0 should not be able to run without first doing a lookup into the subvolume, which should run btrfs_orphan_cleanup() and set the bit. The final important observation is that create_subvol() calls d_instantiate_new() but does not set BTRFS_ROOT_ORPHAN_CLEANUP, so if the dentry cache gets dropped, the next lookup into the subvolume will make a real call into btrfs_orphan_cleanup() for the first time. This opens up the possibility of concurrently deleting the inode/orphan items but most typical evict() paths will be holding a reference on the parent dentry (child dentry holds parent->d_lockref.count via dget in d_alloc(), released in __dentry_kill()) and prevent the parent from being removed from the dentry cache. The one exception is delayed iputs. Ordered extent creation calls igrab() on the inode. If the file is unlinked and closed while those refs are held, iput() in __dentry_kill() decrements i_count but does not trigger eviction (i_count > 0). The child dentry is freed and the subvol dentry's d_lockref.count drops to 0, making it evictable while the inode is still alive. Since there are two races (the race between writeback and unlink and the race between lookup and delayed iputs), and there are too many moving parts, the following three diagrams show the complete picture. (Only the second and third are races) Phase 1: Create Subvol in dentry cache without BTRFS_ROOT_ORPHAN_CLEANUP set btrfs_mksubvol() lookup_one_len() __lookup_slow() d_alloc_parallel() __d_alloc() // d_lockref.count = 1 create_subvol(dentry) // doesn't touch the bit.. d_instantiate_new(dentry, inode) // dentry in cache with d_lockref.c ---truncated--- | |
| CVE-2026-43393 | 0.00 | — | 0.00 | May 8, 2026 | In the Linux kernel, the following vulnerability has been resolved: btrfs: fix chunk map leak in btrfs_map_block() after btrfs_chunk_map_num_copies() Fix a chunk map leak in btrfs_map_block(): if we return early with -EINVAL, we're not freeing the chunk map that we've just looked up. | ||
| CVE-2026-31434 | 0.00 | — | 0.00 | Apr 22, 2026 | In the Linux kernel, the following vulnerability has been resolved: btrfs: fix leak of kobject name for sub-group space_info When create_space_info_sub_group() allocates elements of space_info->sub_group[], kobject_init_and_add() is called for each element via btrfs_sysfs_add_space_info_type(). However, when check_removing_space_info() frees these elements, it does not call btrfs_sysfs_remove_space_info() on them. As a result, kobject_put() is not called and the associated kobj->name objects are leaked. This memory leak is reproduced by running the blktests test case zbd/009 on kernels built with CONFIG_DEBUG_KMEMLEAK. The kmemleak feature reports the following error: unreferenced object 0xffff888112877d40 (size 16): comm "mount", pid 1244, jiffies 4294996972 hex dump (first 16 bytes): 64 61 74 61 2d 72 65 6c 6f 63 00 c4 c6 a7 cb 7f data-reloc...... backtrace (crc 53ffde4d): __kmalloc_node_track_caller_noprof+0x619/0x870 kstrdup+0x42/0xc0 kobject_set_name_vargs+0x44/0x110 kobject_init_and_add+0xcf/0x150 btrfs_sysfs_add_space_info_type+0xfc/0x210 [btrfs] create_space_info_sub_group.constprop.0+0xfb/0x1b0 [btrfs] create_space_info+0x211/0x320 [btrfs] btrfs_init_space_info+0x15a/0x1b0 [btrfs] open_ctree+0x33c7/0x4a50 [btrfs] btrfs_get_tree.cold+0x9f/0x1ee [btrfs] vfs_get_tree+0x87/0x2f0 vfs_cmd_create+0xbd/0x280 __do_sys_fsconfig+0x3df/0x990 do_syscall_64+0x136/0x1540 entry_SYSCALL_64_after_hwframe+0x76/0x7e To avoid the leak, call btrfs_sysfs_remove_space_info() instead of kfree() for the elements. | ||
| CVE-2026-23181 | 0.00 | — | 0.00 | Feb 14, 2026 | In the Linux kernel, the following vulnerability has been resolved: btrfs: sync read disk super and set block size When the user performs a btrfs mount, the block device is not set correctly. The user sets the block size of the block device to 0x4000 by executing the BLKBSZSET command. Since the block size change also changes the mapping->flags value, this further affects the result of the mapping_min_folio_order() calculation. Let's analyze the following two scenarios: Scenario 1: Without executing the BLKBSZSET command, the block size is 0x1000, and mapping_min_folio_order() returns 0; Scenario 2: After executing the BLKBSZSET command, the block size is 0x4000, and mapping_min_folio_order() returns 2. do_read_cache_folio() allocates a folio before the BLKBSZSET command is executed. This results in the allocated folio having an order value of 0. Later, after BLKBSZSET is executed, the block size increases to 0x4000, and the mapping_min_folio_order() calculation result becomes 2. This leads to two undesirable consequences: 1. filemap_add_folio() triggers a VM_BUG_ON_FOLIO(folio_order(folio) < mapping_min_folio_order(mapping)) assertion. 2. The syzbot report [1] shows a null pointer dereference in create_empty_buffers() due to a buffer head allocation failure. Synchronization should be established based on the inode between the BLKBSZSET command and read cache page to prevent inconsistencies in block size or mapping flags before and after folio allocation. [1] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] RIP: 0010:create_empty_buffers+0x4d/0x480 fs/buffer.c:1694 Call Trace: folio_create_buffers+0x109/0x150 fs/buffer.c:1802 block_read_full_folio+0x14c/0x850 fs/buffer.c:2403 filemap_read_folio+0xc8/0x2a0 mm/filemap.c:2496 do_read_cache_folio+0x266/0x5c0 mm/filemap.c:4096 do_read_cache_page mm/filemap.c:4162 [inline] read_cache_page_gfp+0x29/0x120 mm/filemap.c:4195 btrfs_read_disk_super+0x192/0x500 fs/btrfs/volumes.c:1367 | ||
| CVE-2025-71194 | 0.00 | — | 0.00 | Feb 4, 2026 | In the Linux kernel, the following vulnerability has been resolved: btrfs: fix deadlock in wait_current_trans() due to ignored transaction type When wait_current_trans() is called during start_transaction(), it currently waits for a blocked transaction without considering whether the given transaction type actually needs to wait for that particular transaction state. The btrfs_blocked_trans_types[] array already defines which transaction types should wait for which transaction states, but this check was missing in wait_current_trans(). This can lead to a deadlock scenario involving two transactions and pending ordered extents: 1. Transaction A is in TRANS_STATE_COMMIT_DOING state 2. A worker processing an ordered extent calls start_transaction() with TRANS_JOIN 3. join_transaction() returns -EBUSY because Transaction A is in TRANS_STATE_COMMIT_DOING 4. Transaction A moves to TRANS_STATE_UNBLOCKED and completes 5. A new Transaction B is created (TRANS_STATE_RUNNING) 6. The ordered extent from step 2 is added to Transaction B's pending ordered extents 7. Transaction B immediately starts commit by another task and enters TRANS_STATE_COMMIT_START 8. The worker finally reaches wait_current_trans(), sees Transaction B in TRANS_STATE_COMMIT_START (a blocked state), and waits unconditionally 9. However, TRANS_JOIN should NOT wait for TRANS_STATE_COMMIT_START according to btrfs_blocked_trans_types[] 10. Transaction B is waiting for pending ordered extents to complete 11. Deadlock: Transaction B waits for ordered extent, ordered extent waits for Transaction B This can be illustrated by the following call stacks: CPU0 CPU1 btrfs_finish_ordered_io() start_transaction(TRANS_JOIN) join_transaction() # -EBUSY (Transaction A is # TRANS_STATE_COMMIT_DOING) # Transaction A completes # Transaction B created # ordered extent added to # Transaction B's pending list btrfs_commit_transaction() # Transaction B enters # TRANS_STATE_COMMIT_START # waiting for pending ordered # extents wait_current_trans() # waits for Transaction B # (should not wait!) Task bstore_kv_sync in btrfs_commit_transaction waiting for ordered extents: __schedule+0x2e7/0x8a0 schedule+0x64/0xe0 btrfs_commit_transaction+0xbf7/0xda0 [btrfs] btrfs_sync_file+0x342/0x4d0 [btrfs] __x64_sys_fdatasync+0x4b/0x80 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Task kworker in wait_current_trans waiting for transaction commit: Workqueue: btrfs-syno_nocow btrfs_work_helper [btrfs] __schedule+0x2e7/0x8a0 schedule+0x64/0xe0 wait_current_trans+0xb0/0x110 [btrfs] start_transaction+0x346/0x5b0 [btrfs] btrfs_finish_ordered_io.isra.0+0x49b/0x9c0 [btrfs] btrfs_work_helper+0xe8/0x350 [btrfs] process_one_work+0x1d3/0x3c0 worker_thread+0x4d/0x3e0 kthread+0x12d/0x150 ret_from_fork+0x1f/0x30 Fix this by passing the transaction type to wait_current_trans() and checking btrfs_blocked_trans_types[cur_trans->state] against the given type before deciding to wait. This ensures that transaction types which are allowed to join during certain blocked states will not unnecessarily wait and cause deadlocks. | ||
| CVE-2026-23043 | 0.00 | — | 0.00 | Feb 4, 2026 | In the Linux kernel, the following vulnerability has been resolved: btrfs: fix NULL pointer dereference in do_abort_log_replay() Coverity reported a NULL pointer dereference issue (CID 1666756) in do_abort_log_replay(). When btrfs_alloc_path() fails in replay_one_buffer(), wc->subvol_path is NULL, but btrfs_abort_log_replay() calls do_abort_log_replay() which unconditionally dereferences wc->subvol_path when attempting to print debug information. Fix this by adding a NULL check before dereferencing wc->subvol_path in do_abort_log_replay(). | ||
| CVE-2023-54297 | 0.00 | — | 0.00 | Dec 30, 2025 | In the Linux kernel, the following vulnerability has been resolved: btrfs: zoned: fix memory leak after finding block group with super blocks At exclude_super_stripes(), if we happen to find a block group that has super blocks mapped to it and we are on a zoned filesystem, we error out as this is not supposed to happen, indicating either a bug or maybe some memory corruption for example. However we are exiting the function without freeing the memory allocated for the logical address of the super blocks. Fix this by freeing the logical address. | ||
| CVE-2023-54224 | 0.00 | — | 0.00 | Dec 30, 2025 | In the Linux kernel, the following vulnerability has been resolved: btrfs: fix lockdep splat and potential deadlock after failure running delayed items When running delayed items we are holding a delayed node's mutex and then we will attempt to modify a subvolume btree to insert/update/delete the delayed items. However if have an error during the insertions for example, btrfs_insert_delayed_items() may return with a path that has locked extent buffers (a leaf at the very least), and then we attempt to release the delayed node at __btrfs_run_delayed_items(), which requires taking the delayed node's mutex, causing an ABBA type of deadlock. This was reported by syzbot and the lockdep splat is the following: WARNING: possible circular locking dependency detected 6.5.0-rc7-syzkaller-00024-g93f5de5f648d #0 Not tainted ------------------------------------------------------ syz-executor.2/13257 is trying to acquire lock: ffff88801835c0c0 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node+0x9a/0xaa0 fs/btrfs/delayed-inode.c:256 but task is already holding lock: ffff88802a5ab8e8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_lock+0x3c/0x2a0 fs/btrfs/locking.c:198 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (btrfs-tree-00){++++}-{3:3}: __lock_release kernel/locking/lockdep.c:5475 [inline] lock_release+0x36f/0x9d0 kernel/locking/lockdep.c:5781 up_write+0x79/0x580 kernel/locking/rwsem.c:1625 btrfs_tree_unlock_rw fs/btrfs/locking.h:189 [inline] btrfs_unlock_up_safe+0x179/0x3b0 fs/btrfs/locking.c:239 search_leaf fs/btrfs/ctree.c:1986 [inline] btrfs_search_slot+0x2511/0x2f80 fs/btrfs/ctree.c:2230 btrfs_insert_empty_items+0x9c/0x180 fs/btrfs/ctree.c:4376 btrfs_insert_delayed_item fs/btrfs/delayed-inode.c:746 [inline] btrfs_insert_delayed_items fs/btrfs/delayed-inode.c:824 [inline] __btrfs_commit_inode_delayed_items+0xd24/0x2410 fs/btrfs/delayed-inode.c:1111 __btrfs_run_delayed_items+0x1db/0x430 fs/btrfs/delayed-inode.c:1153 flush_space+0x269/0xe70 fs/btrfs/space-info.c:723 btrfs_async_reclaim_metadata_space+0x106/0x350 fs/btrfs/space-info.c:1078 process_one_work+0x92c/0x12c0 kernel/workqueue.c:2600 worker_thread+0xa63/0x1210 kernel/workqueue.c:2751 kthread+0x2b8/0x350 kernel/kthread.c:389 ret_from_fork+0x2e/0x60 arch/x86/kernel/process.c:145 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 -> #0 (&delayed_node->mutex){+.+.}-{3:3}: check_prev_add kernel/locking/lockdep.c:3142 [inline] check_prevs_add kernel/locking/lockdep.c:3261 [inline] validate_chain kernel/locking/lockdep.c:3876 [inline] __lock_acquire+0x39ff/0x7f70 kernel/locking/lockdep.c:5144 lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761 __mutex_lock_common+0x1d8/0x2530 kernel/locking/mutex.c:603 __mutex_lock kernel/locking/mutex.c:747 [inline] mutex_lock_nested+0x1b/0x20 kernel/locking/mutex.c:799 __btrfs_release_delayed_node+0x9a/0xaa0 fs/btrfs/delayed-inode.c:256 btrfs_release_delayed_node fs/btrfs/delayed-inode.c:281 [inline] __btrfs_run_delayed_items+0x2b5/0x430 fs/btrfs/delayed-inode.c:1156 btrfs_commit_transaction+0x859/0x2ff0 fs/btrfs/transaction.c:2276 btrfs_sync_file+0xf56/0x1330 fs/btrfs/file.c:1988 vfs_fsync_range fs/sync.c:188 [inline] vfs_fsync fs/sync.c:202 [inline] do_fsync fs/sync.c:212 [inline] __do_sys_fsync fs/sync.c:220 [inline] __se_sys_fsync fs/sync.c:218 [inline] __x64_sys_fsync+0x196/0x1e0 fs/sync.c:218 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd other info that ---truncated--- | ||
| CVE-2023-54158 | 0.00 | — | 0.00 | Dec 24, 2025 | In the Linux kernel, the following vulnerability has been resolved: btrfs: don't free qgroup space unless specified Boris noticed in his simple quotas testing that he was getting a leak with Sweet Tea's change to subvol create that stopped doing a transaction commit. This was just a side effect of that change. In the delayed inode code we have an optimization that will free extra reservations if we think we can pack a dir item into an already modified leaf. Previously this wouldn't be triggered in the subvolume create case because we'd commit the transaction, it was still possible but much harder to trigger. It could actually be triggered if we did a mkdir && subvol create with qgroups enabled. This occurs because in btrfs_insert_delayed_dir_index(), which gets called when we're adding the dir item, we do the following: btrfs_block_rsv_release(fs_info, trans->block_rsv, bytes, NULL); if we're able to skip reserving space. The problem here is that trans->block_rsv points at the temporary block rsv for the subvolume create, which has qgroup reservations in the block rsv. This is a problem because btrfs_block_rsv_release() will do the following: if (block_rsv->qgroup_rsv_reserved >= block_rsv->qgroup_rsv_size) { qgroup_to_release = block_rsv->qgroup_rsv_reserved - block_rsv->qgroup_rsv_size; block_rsv->qgroup_rsv_reserved = block_rsv->qgroup_rsv_size; } The temporary block rsv just has ->qgroup_rsv_reserved set, ->qgroup_rsv_size == 0. The optimization in btrfs_insert_delayed_dir_index() sets ->qgroup_rsv_reserved = 0. Then later on when we call btrfs_subvolume_release_metadata() which has btrfs_block_rsv_release(fs_info, rsv, (u64)-1, &qgroup_to_release); btrfs_qgroup_convert_reserved_meta(root, qgroup_to_release); qgroup_to_release is set to 0, and we do not convert the reserved metadata space. The problem here is that the block rsv code has been unconditionally messing with ->qgroup_rsv_reserved, because the main place this is used is delalloc, and any time we call btrfs_block_rsv_release() we do it with qgroup_to_release set, and thus do the proper accounting. The subvolume code is the only other code that uses the qgroup reservation stuff, but it's intermingled with the above optimization, and thus was getting its reservation freed out from underneath it and thus leaking the reserved space. The solution is to simply not mess with the qgroup reservations if we don't have qgroup_to_release set. This works with the existing code as anything that messes with the delalloc reservations always have qgroup_to_release set. This fixes the leak that Boris was observing. | ||
| CVE-2023-54121 | 0.00 | — | 0.00 | Dec 24, 2025 | In the Linux kernel, the following vulnerability has been resolved: btrfs: fix incorrect splitting in btrfs_drop_extent_map_range In production we were seeing a variety of WARN_ON()'s in the extent_map code, specifically in btrfs_drop_extent_map_range() when we have to call add_extent_mapping() for our second split. Consider the following extent map layout PINNED [0 16K) [32K, 48K) and then we call btrfs_drop_extent_map_range for [0, 36K), with skip_pinned == true. The initial loop will have start = 0 end = 36K len = 36K we will find the [0, 16k) extent, but since we are pinned we will skip it, which has this code start = em_end; if (end != (u64)-1) len = start + len - em_end; em_end here is 16K, so now the values are start = 16K len = 16K + 36K - 16K = 36K len should instead be 20K. This is a problem when we find the next extent at [32K, 48K), we need to split this extent to leave [36K, 48k), however the code for the split looks like this split->start = start + len; split->len = em_end - (start + len); In this case we have em_end = 48K split->start = 16K + 36K // this should be 16K + 20K split->len = 48K - (16K + 36K) // this overflows as 16K + 36K is 52K and now we have an invalid extent_map in the tree that potentially overlaps other entries in the extent map. Even in the non-overlapping case we will have split->start set improperly, which will cause problems with any block related calculations. We don't actually need len in this loop, we can simply use end as our end point, and only adjust start up when we find a pinned extent we need to skip. Adjust the logic to do this, which keeps us from inserting an invalid extent map. We only skip_pinned in the relocation case, so this is relatively rare, except in the case where you are running relocation a lot, which can happen with auto relocation on. | ||
| CVE-2023-54080 | 0.00 | — | 0.00 | Dec 24, 2025 | In the Linux kernel, the following vulnerability has been resolved: btrfs: zoned: skip splitting and logical rewriting on pre-alloc write When doing a relocation, there is a chance that at the time of btrfs_reloc_clone_csums(), there is no checksum for the corresponding region. In this case, btrfs_finish_ordered_zoned()'s sum points to an invalid item and so ordered_extent's logical is set to some invalid value. Then, btrfs_lookup_block_group() in btrfs_zone_finish_endio() failed to find a block group and will hit an assert or a null pointer dereference as following. This can be reprodcued by running btrfs/028 several times (e.g, 4 to 16 times) with a null_blk setup. The device's zone size and capacity is set to 32 MB and the storage size is set to 5 GB on my setup. KASAN: null-ptr-deref in range [0x0000000000000088-0x000000000000008f] CPU: 6 PID: 3105720 Comm: kworker/u16:13 Tainted: G W 6.5.0-rc6-kts+ #1 Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0 12/17/2015 Workqueue: btrfs-endio-write btrfs_work_helper [btrfs] RIP: 0010:btrfs_zone_finish_endio.part.0+0x34/0x160 [btrfs] Code: 41 54 49 89 fc 55 48 89 f5 53 e8 57 7d fc ff 48 8d b8 88 00 00 00 48 89 c3 48 b8 00 00 00 00 00 > 3c 02 00 0f 85 02 01 00 00 f6 83 88 00 00 00 01 0f 84 a8 00 00 RSP: 0018:ffff88833cf87b08 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000011 RSI: 0000000000000004 RDI: 0000000000000088 RBP: 0000000000000002 R08: 0000000000000001 R09: ffffed102877b827 R10: ffff888143bdc13b R11: ffff888125b1cbc0 R12: ffff888143bdc000 R13: 0000000000007000 R14: ffff888125b1cba8 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88881e500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3ed85223d5 CR3: 00000001519b4005 CR4: 00000000001706e0 Call Trace: <TASK> ? die_addr+0x3c/0xa0 ? exc_general_protection+0x148/0x220 ? asm_exc_general_protection+0x22/0x30 ? btrfs_zone_finish_endio.part.0+0x34/0x160 [btrfs] ? btrfs_zone_finish_endio.part.0+0x19/0x160 [btrfs] btrfs_finish_one_ordered+0x7b8/0x1de0 [btrfs] ? rcu_is_watching+0x11/0xb0 ? lock_release+0x47a/0x620 ? btrfs_finish_ordered_zoned+0x59b/0x800 [btrfs] ? __pfx_btrfs_finish_one_ordered+0x10/0x10 [btrfs] ? btrfs_finish_ordered_zoned+0x358/0x800 [btrfs] ? __smp_call_single_queue+0x124/0x350 ? rcu_is_watching+0x11/0xb0 btrfs_work_helper+0x19f/0xc60 [btrfs] ? __pfx_try_to_wake_up+0x10/0x10 ? _raw_spin_unlock_irq+0x24/0x50 ? rcu_is_watching+0x11/0xb0 process_one_work+0x8c1/0x1430 ? __pfx_lock_acquire+0x10/0x10 ? __pfx_process_one_work+0x10/0x10 ? __pfx_do_raw_spin_lock+0x10/0x10 ? _raw_spin_lock_irq+0x52/0x60 worker_thread+0x100/0x12c0 ? __kthread_parkme+0xc1/0x1f0 ? __pfx_worker_thread+0x10/0x10 kthread+0x2ea/0x3c0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x30/0x70 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> On the zoned mode, writing to pre-allocated region means data relocation write. Such write always uses WRITE command so there is no need of splitting and rewriting logical address. Thus, we can just skip the function for the case. | ||
| CVE-2023-54067 | 0.00 | — | 0.00 | Dec 24, 2025 | In the Linux kernel, the following vulnerability has been resolved: btrfs: fix race when deleting free space root from the dirty cow roots list When deleting the free space tree we are deleting the free space root from the list fs_info->dirty_cowonly_roots without taking the lock that protects it, which is struct btrfs_fs_info::trans_lock. This unsynchronized list manipulation may cause chaos if there's another concurrent manipulation of this list, such as when adding a root to it with ctree.c:add_root_to_dirty_list(). This can result in all sorts of weird failures caused by a race, such as the following crash: [337571.278245] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] PREEMPT SMP PTI [337571.278933] CPU: 1 PID: 115447 Comm: btrfs Tainted: G W 6.4.0-rc6-btrfs-next-134+ #1 [337571.279153] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [337571.279572] RIP: 0010:commit_cowonly_roots+0x11f/0x250 [btrfs] [337571.279928] Code: 85 38 06 00 (...) [337571.280363] RSP: 0018:ffff9f63446efba0 EFLAGS: 00010206 [337571.280582] RAX: ffff942d98ec2638 RBX: ffff9430b82b4c30 RCX: 0000000449e1c000 [337571.280798] RDX: dead000000000100 RSI: ffff9430021e4900 RDI: 0000000000036070 [337571.281015] RBP: ffff942d98ec2000 R08: ffff942d98ec2000 R09: 000000000000015b [337571.281254] R10: 0000000000000009 R11: 0000000000000001 R12: ffff942fe8fbf600 [337571.281476] R13: ffff942dabe23040 R14: ffff942dabe20800 R15: ffff942d92cf3b48 [337571.281723] FS: 00007f478adb7340(0000) GS:ffff94349fa40000(0000) knlGS:0000000000000000 [337571.281950] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [337571.282184] CR2: 00007f478ab9a3d5 CR3: 000000001e02c001 CR4: 0000000000370ee0 [337571.282416] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [337571.282647] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [337571.282874] Call Trace: [337571.283101] <TASK> [337571.283327] ? __die_body+0x1b/0x60 [337571.283570] ? die_addr+0x39/0x60 [337571.283796] ? exc_general_protection+0x22e/0x430 [337571.284022] ? asm_exc_general_protection+0x22/0x30 [337571.284251] ? commit_cowonly_roots+0x11f/0x250 [btrfs] [337571.284531] btrfs_commit_transaction+0x42e/0xf90 [btrfs] [337571.284803] ? _raw_spin_unlock+0x15/0x30 [337571.285031] ? release_extent_buffer+0x103/0x130 [btrfs] [337571.285305] reset_balance_state+0x152/0x1b0 [btrfs] [337571.285578] btrfs_balance+0xa50/0x11e0 [btrfs] [337571.285864] ? __kmem_cache_alloc_node+0x14a/0x410 [337571.286086] btrfs_ioctl+0x249a/0x3320 [btrfs] [337571.286358] ? mod_objcg_state+0xd2/0x360 [337571.286577] ? refill_obj_stock+0xb0/0x160 [337571.286798] ? seq_release+0x25/0x30 [337571.287016] ? __rseq_handle_notify_resume+0x3ba/0x4b0 [337571.287235] ? percpu_counter_add_batch+0x2e/0xa0 [337571.287455] ? __x64_sys_ioctl+0x88/0xc0 [337571.287675] __x64_sys_ioctl+0x88/0xc0 [337571.287901] do_syscall_64+0x38/0x90 [337571.288126] entry_SYSCALL_64_after_hwframe+0x72/0xdc [337571.288352] RIP: 0033:0x7f478aaffe9b So fix this by locking struct btrfs_fs_info::trans_lock before deleting the free space root from that list. | ||
| CVE-2025-68359 | 0.00 | — | 0.00 | Dec 24, 2025 | In the Linux kernel, the following vulnerability has been resolved: btrfs: fix double free of qgroup record after failure to add delayed ref head In the previous code it was possible to incur into a double kfree() scenario when calling add_delayed_ref_head(). This could happen if the record was reported to already exist in the btrfs_qgroup_trace_extent_nolock() call, but then there was an error later on add_delayed_ref_head(). In this case, since add_delayed_ref_head() returned an error, the caller went to free the record. Since add_delayed_ref_head() couldn't set this kfree'd pointer to NULL, then kfree() would have acted on a non-NULL 'record' object which was pointing to memory already freed by the callee. The problem comes from the fact that the responsibility to kfree the object is on both the caller and the callee at the same time. Hence, the fix for this is to shift the ownership of the 'qrecord' object out of the add_delayed_ref_head(). That is, we will never attempt to kfree() the given object inside of this function, and will expect the caller to act on the 'qrecord' object on its own. The only exception where the 'qrecord' object cannot be kfree'd is if it was inserted into the tracing logic, for which we already have the 'qrecord_inserted_ret' boolean to account for this. Hence, the caller has to kfree the object only if add_delayed_ref_head() reports not to have inserted it on the tracing logic. As a side-effect of the above, we must guarantee that 'qrecord_inserted_ret' is properly initialized at the start of the function, not at the end, and then set when an actual insert happens. This way we avoid 'qrecord_inserted_ret' having an invalid value on an early exit. The documentation from the add_delayed_ref_head() has also been updated to reflect on the exact ownership of the 'qrecord' object. | ||
| CVE-2023-54032 | 0.00 | — | 0.00 | Dec 24, 2025 | In the Linux kernel, the following vulnerability has been resolved: btrfs: fix race when deleting quota root from the dirty cow roots list When disabling quotas we are deleting the quota root from the list fs_info->dirty_cowonly_roots without taking the lock that protects it, which is struct btrfs_fs_info::trans_lock. This unsynchronized list manipulation may cause chaos if there's another concurrent manipulation of this list, such as when adding a root to it with ctree.c:add_root_to_dirty_list(). This can result in all sorts of weird failures caused by a race, such as the following crash: [337571.278245] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] PREEMPT SMP PTI [337571.278933] CPU: 1 PID: 115447 Comm: btrfs Tainted: G W 6.4.0-rc6-btrfs-next-134+ #1 [337571.279153] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [337571.279572] RIP: 0010:commit_cowonly_roots+0x11f/0x250 [btrfs] [337571.279928] Code: 85 38 06 00 (...) [337571.280363] RSP: 0018:ffff9f63446efba0 EFLAGS: 00010206 [337571.280582] RAX: ffff942d98ec2638 RBX: ffff9430b82b4c30 RCX: 0000000449e1c000 [337571.280798] RDX: dead000000000100 RSI: ffff9430021e4900 RDI: 0000000000036070 [337571.281015] RBP: ffff942d98ec2000 R08: ffff942d98ec2000 R09: 000000000000015b [337571.281254] R10: 0000000000000009 R11: 0000000000000001 R12: ffff942fe8fbf600 [337571.281476] R13: ffff942dabe23040 R14: ffff942dabe20800 R15: ffff942d92cf3b48 [337571.281723] FS: 00007f478adb7340(0000) GS:ffff94349fa40000(0000) knlGS:0000000000000000 [337571.281950] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [337571.282184] CR2: 00007f478ab9a3d5 CR3: 000000001e02c001 CR4: 0000000000370ee0 [337571.282416] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [337571.282647] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [337571.282874] Call Trace: [337571.283101] <TASK> [337571.283327] ? __die_body+0x1b/0x60 [337571.283570] ? die_addr+0x39/0x60 [337571.283796] ? exc_general_protection+0x22e/0x430 [337571.284022] ? asm_exc_general_protection+0x22/0x30 [337571.284251] ? commit_cowonly_roots+0x11f/0x250 [btrfs] [337571.284531] btrfs_commit_transaction+0x42e/0xf90 [btrfs] [337571.284803] ? _raw_spin_unlock+0x15/0x30 [337571.285031] ? release_extent_buffer+0x103/0x130 [btrfs] [337571.285305] reset_balance_state+0x152/0x1b0 [btrfs] [337571.285578] btrfs_balance+0xa50/0x11e0 [btrfs] [337571.285864] ? __kmem_cache_alloc_node+0x14a/0x410 [337571.286086] btrfs_ioctl+0x249a/0x3320 [btrfs] [337571.286358] ? mod_objcg_state+0xd2/0x360 [337571.286577] ? refill_obj_stock+0xb0/0x160 [337571.286798] ? seq_release+0x25/0x30 [337571.287016] ? __rseq_handle_notify_resume+0x3ba/0x4b0 [337571.287235] ? percpu_counter_add_batch+0x2e/0xa0 [337571.287455] ? __x64_sys_ioctl+0x88/0xc0 [337571.287675] __x64_sys_ioctl+0x88/0xc0 [337571.287901] do_syscall_64+0x38/0x90 [337571.288126] entry_SYSCALL_64_after_hwframe+0x72/0xdc [337571.288352] RIP: 0033:0x7f478aaffe9b So fix this by locking struct btrfs_fs_info::trans_lock before deleting the quota root from that list. |
- risk 0.59cvss 9.1epss 0.00
In the Linux kernel, the following vulnerability has been resolved: btrfs: tracepoints: get correct superblock from dentry in event btrfs_sync_file() If overlay is used on top of btrfs, dentry->d_sb translates to overlay's super block and fsid assignment will lead to a crash. Use file_inode(file)->i_sb to always get btrfs_sb.
- risk 0.36cvss 5.5epss 0.00
In the Linux kernel, the following vulnerability has been resolved: btrfs: fix transaction abort when snapshotting received subvolumes Currently a user can trigger a transaction abort by snapshotting a previously received snapshot a bunch of times until we reach a BTRFS_UUID_KEY_RECEIVED_SUBVOL item overflow (the maximum item size we can store in a leaf). This is very likely not common in practice, but if it happens, it turns the filesystem into RO mode. The snapshot, send and set_received_subvol and subvol_setflags (used by receive) don't require CAP_SYS_ADMIN, just inode_owner_or_capable(). A malicious user could use this to turn a filesystem into RO mode and disrupt a system. Reproducer script: $ cat test.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi # Use smallest node size to make the test faster. mkfs.btrfs -f --nodesize 4K $DEV mount $DEV $MNT # Create a subvolume and set it to RO so that it can be used for send. btrfs subvolume create $MNT/sv touch $MNT/sv/foo btrfs property set $MNT/sv ro true # Send and receive the subvolume into snaps/sv. mkdir $MNT/snaps btrfs send $MNT/sv | btrfs receive $MNT/snaps # Now snapshot the received subvolume, which has a received_uuid, a # lot of times to trigger the leaf overflow. total=500 for ((i = 1; i <= $total; i++)); do echo -ne "\rCreating snapshot $i/$total" btrfs subvolume snapshot -r $MNT/snaps/sv $MNT/snaps/sv_$i > /dev/null done echo umount $MNT When running the test: $ ./test.sh (...) Create subvolume '/mnt/sdi/sv' At subvol /mnt/sdi/sv At subvol sv Creating snapshot 496/500ERROR: Could not create subvolume: Value too large for defined data type Creating snapshot 497/500ERROR: Could not create subvolume: Read-only file system Creating snapshot 498/500ERROR: Could not create subvolume: Read-only file system Creating snapshot 499/500ERROR: Could not create subvolume: Read-only file system Creating snapshot 500/500ERROR: Could not create subvolume: Read-only file system And in dmesg/syslog: $ dmesg (...) [251067.627338] BTRFS warning (device sdi): insert uuid item failed -75 (0x4628b21c4ac8d898, 0x2598bee2b1515c91) type 252! [251067.629212] ------------[ cut here ]------------ [251067.630033] BTRFS: Transaction aborted (error -75) [251067.630871] WARNING: fs/btrfs/transaction.c:1907 at create_pending_snapshot.cold+0x52/0x465 [btrfs], CPU#10: btrfs/615235 [251067.632851] Modules linked in: btrfs dm_zero (...) [251067.644071] CPU: 10 UID: 0 PID: 615235 Comm: btrfs Tainted: G W 6.19.0-rc8-btrfs-next-225+ #1 PREEMPT(full) [251067.646165] Tainted: [W]=WARN [251067.646733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 [251067.648735] RIP: 0010:create_pending_snapshot.cold+0x55/0x465 [btrfs] [251067.649984] Code: f0 48 0f (...) [251067.653313] RSP: 0018:ffffce644908fae8 EFLAGS: 00010292 [251067.653987] RAX: 00000000ffffff01 RBX: ffff8e5639e63a80 RCX: 00000000ffffffd3 [251067.655042] RDX: ffff8e53faa76b00 RSI: 00000000ffffffb5 RDI: ffffffffc0919750 [251067.656077] RBP: ffffce644908fbd8 R08: 0000000000000000 R09: ffffce644908f820 [251067.657068] R10: ffff8e5adc1fffa8 R11: 0000000000000003 R12: ffff8e53c0431bd0 [251067.658050] R13: ffff8e5414593600 R14: ffff8e55efafd000 R15: 00000000ffffffb5 [251067.659019] FS: 00007f2a4944b3c0(0000) GS:ffff8e5b27dae000(0000) knlGS:0000000000000000 [251067.660115] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [251067.660943] CR2: 00007ffc5aa57898 CR3: 00000005813a2003 CR4: 0000000000370ef0 [251067.661972] Call Trace: [251067.662292] <TASK> [251067.662653] create_pending_snapshots+0x97/0xc0 [btrfs] [251067.663413] btrfs_commit_transaction+0x26e/0xc00 [btrfs] [251067.664257] ? btrfs_qgroup_convert_reserved_meta+0x35/0x390 [btrfs] [251067.665238] ? _raw_spin_unlock+0x15/0x30 [251067.665837] ? record_root_ ---truncated---
- risk 0.36cvss 5.5epss 0.00
In the Linux kernel, the following vulnerability has been resolved: btrfs: fix transaction abort on file creation due to name hash collision If we attempt to create several files with names that result in the same hash, we have to pack them in same dir item and that has a limit inherent to the leaf size. However if we reach that limit, we trigger a transaction abort and turns the filesystem into RO mode. This allows for a malicious user to disrupt a system, without the need to have administration privileges/capabilities. Reproducer: $ cat exploit-hash-collisions.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi # Use smallest node size to make the test faster and require fewer file # names that result in hash collision. mkfs.btrfs -f --nodesize 4K $DEV mount $DEV $MNT # List of names that result in the same crc32c hash for btrfs. declare -a names=( 'foobar' '%a8tYkxfGMLWRGr55QSeQc4PBNH9PCLIvR6jZnkDtUUru1t@RouaUe_L:@xGkbO3nCwvLNYeK9vhE628gss:T$yZjZ5l-Nbd6CbC$M=hqE-ujhJICXyIxBvYrIU9-TDC' 'AQci3EUB%shMsg-N%frgU:02ByLs=IPJU0OpgiWit5nexSyxZDncY6WB:=zKZuk5Zy0DD$Ua78%MelgBuMqaHGyKsJUFf9s=UW80PcJmKctb46KveLSiUtNmqrMiL9-Y0I_l5Fnam04CGIg=8@U:Z' 'CvVqJpJzueKcuA$wqwePfyu7VxuWNN3ho$p0zi2H8QFYK$7YlEqOhhb%:hHgjhIjW5vnqWHKNP4' 'ET:vk@rFU4tsvMB0$C_p=xQHaYZjvoF%-BTc%wkFW8yaDAPcCYoR%x$FH5O:' 'HwTon%v7SGSP4FE08jBwwiu5aot2CFKXHTeEAa@38fUcNGOWvE@Mz6WBeDH_VooaZ6AgsXPkVGwy9l@@ZbNXabUU9csiWrrOp0MWUdfi$EZ3w9GkIqtz7I_eOsByOkBOO' 'Ij%2VlFGXSuPvxJGf5UWy6O@1svxGha%b@=%wjkq:CIgE6u7eJOjmQY5qTtxE2Rjbis9@us' 'KBkjG5%9R8K9sOG8UTnAYjxLNAvBmvV5vz3IiZaPmKuLYO03-6asI9lJ_j4@6Xo$KZicaLWJ3Pv8XEwVeUPMwbHYWwbx0pYvNlGMO9F:ZhHAwyctnGy%_eujl%WPd4U2BI7qooOSr85J-C2V$LfY' 'NcRfDfuUQ2=zP8K3CCF5dFcpfiOm6mwenShsAb_F%n6GAGC7fT2JFFn:c35X-3aYwoq7jNX5$ZJ6hI3wnZs$7KgGi7wjulffhHNUxAT0fRRLF39vJ@NvaEMxsMO' 'Oj42AQAEzRoTxa5OuSKIr=A_lwGMy132v4g3Pdq1GvUG9874YseIFQ6QU' 'Ono7avN5GjC:_6dBJ_' 'WHmN2gnmaN-9dVDy4aWo:yNGFzz8qsJyJhWEWcud7$QzN2D9R0efIWWEdu5kwWr73NZm4=@CoCDxrrZnRITr-kGtU_cfW2:%2_am' 'WiFnuTEhAG9FEC6zopQmj-A-$LDQ0T3WULz%ox3UZAPybSV6v1Z$b4L_XBi4M4BMBtJZpz93r9xafpB77r:lbwvitWRyo$odnAUYlYMmU4RvgnNd--e=I5hiEjGLETTtaScWlQp8mYsBovZwM2k' 'XKyH=OsOAF3p%uziGF_ZVr$ivrvhVgD@1u%5RtrV-gl_vqAwHkK@x7YwlxX3qT6WKKQ%PR56NrUBU2dOAOAdzr2=5nJuKPM-T-$ZpQfCL7phxQbUcb:BZOTPaFExc-qK-gDRCDW2' 'd3uUR6OFEwZr%ns1XH_@tbxA@cCPmbBRLdyh7p6V45H$P2$F%w0RqrD3M0g8aGvWpoTFMiBdOTJXjD:JF7=h9a_43xBywYAP%r$SPZi%zDg%ql-KvkdUCtF9OLaQlxmd' 'ePTpbnit%hyNm@WELlpKzNZYOzOTf8EQ$sEfkMy1VOfIUu3coyvIr13-Y7Sv5v-Ivax2Go_GQRFMU1b3362nktT9WOJf3SpT%z8sZmM3gvYQBDgmKI%%RM-G7hyrhgYflOw%z::ZRcv5O:lDCFm' 'evqk743Y@dvZAiG5J05L_ROFV@$2%rVWJ2%3nxV72-W7$e$-SK3tuSHA2mBt$qloC5jwNx33GmQUjD%akhBPu=VJ5g$xhlZiaFtTrjeeM5x7dt4cHpX0cZkmfImndYzGmvwQG:$euFYmXn$_2rA9mKZ' 'gkgUtnihWXsZQTEkrMAWIxir09k3t7jk_IK25t1:cy1XWN0GGqC%FrySdcmU7M8MuPO_ppkLw3=Dfr0UuBAL4%GFk2$Ma10V1jDRGJje%Xx9EV2ERaWKtjpwiZwh0gCSJsj5UL7CR8RtW5opCVFKGGy8Cky' 'hNgsG_8lNRik3PvphqPm0yEH3P%%fYG:kQLY=6O-61Wa6nrV_WVGR6TLB09vHOv%g4VQRP8Gzx7VXUY1qvZyS' 'isA7JVzN12xCxVPJZ_qoLm-pTBuhjjHMvV7o=F:EaClfYNyFGlsfw-Kf%uxdqW-kwk1sPl2vhbjyHU1A6$hz' 'kiJ_fgcdZFDiOptjgH5PN9-PSyLO4fbk_:u5_2tz35lV_iXiJ6cx7pwjTtKy-XGaQ5IefmpJ4N_ZqGsqCsKuqOOBgf9LkUdffHet@Wu' 'lvwtxyhE9:%Q3UxeHiViUyNzJsy:fm38pg_b6s25JvdhOAT=1s0$pG25x=LZ2rlHTszj=gN6M4zHZYr_qrB49i=pA--@WqWLIuX7o1S_SfS@2FSiUZN' 'rC24cw3UBDZ=5qJBUMs9e$=S4Y94ni%Z8639vnrGp=0Hv4z3dNFL0fBLmQ40=EYIY:Z=SLc@QLMSt2zsss2ZXrP7j4=' 'uwGl2s-fFrf@GqS=DQqq2I0LJSsOmM%xzTjS:lzXguE3wChdMoHYtLRKPvfaPOZF2fER@j53evbKa7R%A7r4%YEkD=kicJe@SFiGtXHbKe4gCgPAYbnVn' 'UG37U6KKua2bgc:IHzRs7BnB6FD:2Mt5Cc5NdlsW%$1tyvnfz7S27FvNkroXwAW:mBZLA1@qa9WnDbHCDmQmfPMC9z-Eq6QT0jhhPpqyymaD:R02ghwYo%yx7SAaaq-:x33LYpei$5g8DMl3C' 'y2vjek0FE1PDJC0qpfnN:x8k2wCFZ9xiUF2ege=JnP98R%wxjKkdfEiLWvQzmnW' '8-HCSgH5B%K7P8_jaVtQhBXpBk:pE-$P7ts58U0J@iR9YZntMPl7j$s62yAJO@_9eanFPS54b=UTw$94C-t=HLxT8n6o9P=QnIxq-f1=Ne2dvhe6WbjEQtc' 'YPPh:IFt2mtR6XWSmjHptXL_hbSYu8bMw-JP8@PNyaFkdNFsk$M=xfL6LDKCDM-mSyGA_2MBwZ8Dr4=R1D%7-mC ---truncated---
- risk 0.36cvss 5.5epss 0.00
In the Linux kernel, the following vulnerability has been resolved: btrfs: add missing RCU unlock in error path in try_release_subpage_extent_buffer() Call rcu_read_lock() before exiting the loop in try_release_subpage_extent_buffer() because there is a rcu_read_unlock() call past the loop. This has been detected by the Clang thread-safety analyzer.
- risk 0.36cvss 5.5epss 0.00
In the Linux kernel, the following vulnerability has been resolved: btrfs: fix zero size inode with non-zero size after log replay When logging that an inode exists, as part of logging a new name or logging new dir entries for a directory, we always set the generation of the logged inode item to 0. This is to signal during log replay (in overwrite_item()), that we should not set the i_size since we only logged that an inode exists, so the i_size of the inode in the subvolume tree must be preserved (as when we log new names or that an inode exists, we don't log extents). This works fine except when we have already logged an inode in full mode or it's the first time we are logging an inode created in a past transaction, that inode has a new i_size of 0 and then we log a new name for the inode (due to a new hardlink or a rename), in which case we log an i_size of 0 for the inode and a generation of 0, which causes the log replay code to not update the inode's i_size to 0 (in overwrite_item()). An example scenario: mkdir /mnt/dir xfs_io -f -c "pwrite 0 64K" /mnt/dir/foo sync xfs_io -c "truncate 0" -c "fsync" /mnt/dir/foo ln /mnt/dir/foo /mnt/dir/bar xfs_io -c "fsync" /mnt/dir <power fail> After log replay the file remains with a size of 64K. This is because when we first log the inode, when we fsync file foo, we log its current i_size of 0, and then when we create a hard link we log again the inode in exists mode (LOG_INODE_EXISTS) but we set a generation of 0 for the inode item we add to the log tree, so during log replay overwrite_item() sees that the generation is 0 and i_size is 0 so we skip updating the inode's i_size from 64K to 0. Fix this by making sure at fill_inode_item() we always log the real generation of the inode if it was logged in the current transaction with the i_size we logged before. Also if an inode created in a previous transaction is logged in exists mode only, make sure we log the i_size stored in the inode item located from the commit root, so that if we log multiple times that the inode exists we get the correct i_size. A test case for fstests will follow soon.
- risk 0.36cvss 5.5epss 0.00
In the Linux kernel, the following vulnerability has been resolved: btrfs: reject root items with drop_progress and zero drop_level [BUG] When recovering relocation at mount time, merge_reloc_root() and btrfs_drop_snapshot() both use BUG_ON(level == 0) to guard against an impossible state: a non-zero drop_progress combined with a zero drop_level in a root_item, which can be triggered: ------------[ cut here ]------------ kernel BUG at fs/btrfs/relocation.c:1545! Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI CPU: 1 UID: 0 PID: 283 ... Tainted: 6.18.0+ #16 PREEMPT(voluntary) Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: QEMU Ubuntu 24.04 PC v2, BIOS 1.16.3-debian-1.16.3-2 RIP: 0010:merge_reloc_root+0x1266/0x1650 fs/btrfs/relocation.c:1545 Code: ffff0000 00004589 d7e9acfa ffffe8a1 79bafebe 02000000 Call Trace: merge_reloc_roots+0x295/0x890 fs/btrfs/relocation.c:1861 btrfs_recover_relocation+0xd6e/0x11d0 fs/btrfs/relocation.c:4195 btrfs_start_pre_rw_mount+0xa4d/0x1810 fs/btrfs/disk-io.c:3130 open_ctree+0x5824/0x5fe0 fs/btrfs/disk-io.c:3640 btrfs_fill_super fs/btrfs/super.c:987 [inline] btrfs_get_tree_super fs/btrfs/super.c:1951 [inline] btrfs_get_tree_subvol fs/btrfs/super.c:2094 [inline] btrfs_get_tree+0x111c/0x2190 fs/btrfs/super.c:2128 vfs_get_tree+0x9a/0x370 fs/super.c:1758 fc_mount fs/namespace.c:1199 [inline] do_new_mount_fc fs/namespace.c:3642 [inline] do_new_mount fs/namespace.c:3718 [inline] path_mount+0x5b8/0x1ea0 fs/namespace.c:4028 do_mount fs/namespace.c:4041 [inline] __do_sys_mount fs/namespace.c:4229 [inline] __se_sys_mount fs/namespace.c:4206 [inline] __x64_sys_mount+0x282/0x320 fs/namespace.c:4206 ... RIP: 0033:0x7f969c9a8fde Code: 0f1f4000 48c7c2b0 fffffff7 d8648902 b8ffffff ffc3660f ---[ end trace 0000000000000000 ]--- The bug is reproducible on 7.0.0-rc2-next-20260310 with our dynamic metadata fuzzing tool that corrupts btrfs metadata at runtime. [CAUSE] A non-zero drop_progress.objectid means an interrupted btrfs_drop_snapshot() left a resume point on disk, and in that case drop_level must be greater than 0 because the checkpoint is only saved at internal node levels. Although this invariant is enforced when the kernel writes the root item, it is not validated when the root item is read back from disk. That allows on-disk corruption to provide an invalid state with drop_progress.objectid != 0 and drop_level == 0. When relocation recovery later processes such a root item, merge_reloc_root() reads drop_level and hits BUG_ON(level == 0). The same invalid metadata can also trigger the corresponding BUG_ON() in btrfs_drop_snapshot(). [FIX] Fix this by validating the root_item invariant in tree-checker when reading root items from disk: if drop_progress.objectid is non-zero, drop_level must also be non-zero. Reject such malformed metadata with -EUCLEAN before it reaches merge_reloc_root() or btrfs_drop_snapshot() and triggers the BUG_ON. After the fix, the same corruption is correctly rejected by tree-checker and the BUG_ON is no longer triggered.
- risk 0.36cvss 5.5epss 0.00
In the Linux kernel, the following vulnerability has been resolved: btrfs: set BTRFS_ROOT_ORPHAN_CLEANUP during subvol create We have recently observed a number of subvolumes with broken dentries. ls-ing the parent dir looks like: drwxrwxrwt 1 root root 16 Jan 23 16:49 . drwxr-xr-x 1 root root 24 Jan 23 16:48 .. d????????? ? ? ? ? ? broken_subvol and similarly stat-ing the file fails. In this state, deleting the subvol fails with ENOENT, but attempting to create a new file or subvol over it errors out with EEXIST and even aborts the fs. Which leaves us a bit stuck. dmesg contains a single notable error message reading: "could not do orphan cleanup -2" 2 is ENOENT and the error comes from the failure handling path of btrfs_orphan_cleanup(), with the stack leading back up to btrfs_lookup(). btrfs_lookup btrfs_lookup_dentry btrfs_orphan_cleanup // prints that message and returns -ENOENT After some detailed inspection of the internal state, it became clear that: - there are no orphan items for the subvol - the subvol is otherwise healthy looking, it is not half-deleted or anything, there is no drop progress, etc. - the subvol was created a while ago and does the meaningful first btrfs_orphan_cleanup() call that sets BTRFS_ROOT_ORPHAN_CLEANUP much later. - after btrfs_orphan_cleanup() fails, btrfs_lookup_dentry() returns -ENOENT, which results in a negative dentry for the subvolume via d_splice_alias(NULL, dentry), leading to the observed behavior. The bug can be mitigated by dropping the dentry cache, at which point we can successfully delete the subvolume if we want. i.e., btrfs_lookup() btrfs_lookup_dentry() if (!sb_rdonly(inode->vfs_inode)->vfs_inode) btrfs_orphan_cleanup(sub_root) test_and_set_bit(BTRFS_ROOT_ORPHAN_CLEANUP) btrfs_search_slot() // finds orphan item for inode N ... prints "could not do orphan cleanup -2" if (inode == ERR_PTR(-ENOENT)) inode = NULL; return d_splice_alias(NULL, dentry) // NEGATIVE DENTRY for valid subvolume btrfs_orphan_cleanup() does test_and_set_bit(BTRFS_ROOT_ORPHAN_CLEANUP) on the root when it runs, so it cannot run more than once on a given root, so something else must run concurrently. However, the obvious routes to deleting an orphan when nlinks goes to 0 should not be able to run without first doing a lookup into the subvolume, which should run btrfs_orphan_cleanup() and set the bit. The final important observation is that create_subvol() calls d_instantiate_new() but does not set BTRFS_ROOT_ORPHAN_CLEANUP, so if the dentry cache gets dropped, the next lookup into the subvolume will make a real call into btrfs_orphan_cleanup() for the first time. This opens up the possibility of concurrently deleting the inode/orphan items but most typical evict() paths will be holding a reference on the parent dentry (child dentry holds parent->d_lockref.count via dget in d_alloc(), released in __dentry_kill()) and prevent the parent from being removed from the dentry cache. The one exception is delayed iputs. Ordered extent creation calls igrab() on the inode. If the file is unlinked and closed while those refs are held, iput() in __dentry_kill() decrements i_count but does not trigger eviction (i_count > 0). The child dentry is freed and the subvol dentry's d_lockref.count drops to 0, making it evictable while the inode is still alive. Since there are two races (the race between writeback and unlink and the race between lookup and delayed iputs), and there are too many moving parts, the following three diagrams show the complete picture. (Only the second and third are races) Phase 1: Create Subvol in dentry cache without BTRFS_ROOT_ORPHAN_CLEANUP set btrfs_mksubvol() lookup_one_len() __lookup_slow() d_alloc_parallel() __d_alloc() // d_lockref.count = 1 create_subvol(dentry) // doesn't touch the bit.. d_instantiate_new(dentry, inode) // dentry in cache with d_lockref.c ---truncated---
- CVE-2026-43393May 8, 2026risk 0.00cvss —epss 0.00
In the Linux kernel, the following vulnerability has been resolved: btrfs: fix chunk map leak in btrfs_map_block() after btrfs_chunk_map_num_copies() Fix a chunk map leak in btrfs_map_block(): if we return early with -EINVAL, we're not freeing the chunk map that we've just looked up.
- CVE-2026-31434Apr 22, 2026risk 0.00cvss —epss 0.00
In the Linux kernel, the following vulnerability has been resolved: btrfs: fix leak of kobject name for sub-group space_info When create_space_info_sub_group() allocates elements of space_info->sub_group[], kobject_init_and_add() is called for each element via btrfs_sysfs_add_space_info_type(). However, when check_removing_space_info() frees these elements, it does not call btrfs_sysfs_remove_space_info() on them. As a result, kobject_put() is not called and the associated kobj->name objects are leaked. This memory leak is reproduced by running the blktests test case zbd/009 on kernels built with CONFIG_DEBUG_KMEMLEAK. The kmemleak feature reports the following error: unreferenced object 0xffff888112877d40 (size 16): comm "mount", pid 1244, jiffies 4294996972 hex dump (first 16 bytes): 64 61 74 61 2d 72 65 6c 6f 63 00 c4 c6 a7 cb 7f data-reloc...... backtrace (crc 53ffde4d): __kmalloc_node_track_caller_noprof+0x619/0x870 kstrdup+0x42/0xc0 kobject_set_name_vargs+0x44/0x110 kobject_init_and_add+0xcf/0x150 btrfs_sysfs_add_space_info_type+0xfc/0x210 [btrfs] create_space_info_sub_group.constprop.0+0xfb/0x1b0 [btrfs] create_space_info+0x211/0x320 [btrfs] btrfs_init_space_info+0x15a/0x1b0 [btrfs] open_ctree+0x33c7/0x4a50 [btrfs] btrfs_get_tree.cold+0x9f/0x1ee [btrfs] vfs_get_tree+0x87/0x2f0 vfs_cmd_create+0xbd/0x280 __do_sys_fsconfig+0x3df/0x990 do_syscall_64+0x136/0x1540 entry_SYSCALL_64_after_hwframe+0x76/0x7e To avoid the leak, call btrfs_sysfs_remove_space_info() instead of kfree() for the elements.
- CVE-2026-23181Feb 14, 2026risk 0.00cvss —epss 0.00
In the Linux kernel, the following vulnerability has been resolved: btrfs: sync read disk super and set block size When the user performs a btrfs mount, the block device is not set correctly. The user sets the block size of the block device to 0x4000 by executing the BLKBSZSET command. Since the block size change also changes the mapping->flags value, this further affects the result of the mapping_min_folio_order() calculation. Let's analyze the following two scenarios: Scenario 1: Without executing the BLKBSZSET command, the block size is 0x1000, and mapping_min_folio_order() returns 0; Scenario 2: After executing the BLKBSZSET command, the block size is 0x4000, and mapping_min_folio_order() returns 2. do_read_cache_folio() allocates a folio before the BLKBSZSET command is executed. This results in the allocated folio having an order value of 0. Later, after BLKBSZSET is executed, the block size increases to 0x4000, and the mapping_min_folio_order() calculation result becomes 2. This leads to two undesirable consequences: 1. filemap_add_folio() triggers a VM_BUG_ON_FOLIO(folio_order(folio) < mapping_min_folio_order(mapping)) assertion. 2. The syzbot report [1] shows a null pointer dereference in create_empty_buffers() due to a buffer head allocation failure. Synchronization should be established based on the inode between the BLKBSZSET command and read cache page to prevent inconsistencies in block size or mapping flags before and after folio allocation. [1] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] RIP: 0010:create_empty_buffers+0x4d/0x480 fs/buffer.c:1694 Call Trace: folio_create_buffers+0x109/0x150 fs/buffer.c:1802 block_read_full_folio+0x14c/0x850 fs/buffer.c:2403 filemap_read_folio+0xc8/0x2a0 mm/filemap.c:2496 do_read_cache_folio+0x266/0x5c0 mm/filemap.c:4096 do_read_cache_page mm/filemap.c:4162 [inline] read_cache_page_gfp+0x29/0x120 mm/filemap.c:4195 btrfs_read_disk_super+0x192/0x500 fs/btrfs/volumes.c:1367
- CVE-2025-71194Feb 4, 2026risk 0.00cvss —epss 0.00
In the Linux kernel, the following vulnerability has been resolved: btrfs: fix deadlock in wait_current_trans() due to ignored transaction type When wait_current_trans() is called during start_transaction(), it currently waits for a blocked transaction without considering whether the given transaction type actually needs to wait for that particular transaction state. The btrfs_blocked_trans_types[] array already defines which transaction types should wait for which transaction states, but this check was missing in wait_current_trans(). This can lead to a deadlock scenario involving two transactions and pending ordered extents: 1. Transaction A is in TRANS_STATE_COMMIT_DOING state 2. A worker processing an ordered extent calls start_transaction() with TRANS_JOIN 3. join_transaction() returns -EBUSY because Transaction A is in TRANS_STATE_COMMIT_DOING 4. Transaction A moves to TRANS_STATE_UNBLOCKED and completes 5. A new Transaction B is created (TRANS_STATE_RUNNING) 6. The ordered extent from step 2 is added to Transaction B's pending ordered extents 7. Transaction B immediately starts commit by another task and enters TRANS_STATE_COMMIT_START 8. The worker finally reaches wait_current_trans(), sees Transaction B in TRANS_STATE_COMMIT_START (a blocked state), and waits unconditionally 9. However, TRANS_JOIN should NOT wait for TRANS_STATE_COMMIT_START according to btrfs_blocked_trans_types[] 10. Transaction B is waiting for pending ordered extents to complete 11. Deadlock: Transaction B waits for ordered extent, ordered extent waits for Transaction B This can be illustrated by the following call stacks: CPU0 CPU1 btrfs_finish_ordered_io() start_transaction(TRANS_JOIN) join_transaction() # -EBUSY (Transaction A is # TRANS_STATE_COMMIT_DOING) # Transaction A completes # Transaction B created # ordered extent added to # Transaction B's pending list btrfs_commit_transaction() # Transaction B enters # TRANS_STATE_COMMIT_START # waiting for pending ordered # extents wait_current_trans() # waits for Transaction B # (should not wait!) Task bstore_kv_sync in btrfs_commit_transaction waiting for ordered extents: __schedule+0x2e7/0x8a0 schedule+0x64/0xe0 btrfs_commit_transaction+0xbf7/0xda0 [btrfs] btrfs_sync_file+0x342/0x4d0 [btrfs] __x64_sys_fdatasync+0x4b/0x80 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Task kworker in wait_current_trans waiting for transaction commit: Workqueue: btrfs-syno_nocow btrfs_work_helper [btrfs] __schedule+0x2e7/0x8a0 schedule+0x64/0xe0 wait_current_trans+0xb0/0x110 [btrfs] start_transaction+0x346/0x5b0 [btrfs] btrfs_finish_ordered_io.isra.0+0x49b/0x9c0 [btrfs] btrfs_work_helper+0xe8/0x350 [btrfs] process_one_work+0x1d3/0x3c0 worker_thread+0x4d/0x3e0 kthread+0x12d/0x150 ret_from_fork+0x1f/0x30 Fix this by passing the transaction type to wait_current_trans() and checking btrfs_blocked_trans_types[cur_trans->state] against the given type before deciding to wait. This ensures that transaction types which are allowed to join during certain blocked states will not unnecessarily wait and cause deadlocks.
- CVE-2026-23043Feb 4, 2026risk 0.00cvss —epss 0.00
In the Linux kernel, the following vulnerability has been resolved: btrfs: fix NULL pointer dereference in do_abort_log_replay() Coverity reported a NULL pointer dereference issue (CID 1666756) in do_abort_log_replay(). When btrfs_alloc_path() fails in replay_one_buffer(), wc->subvol_path is NULL, but btrfs_abort_log_replay() calls do_abort_log_replay() which unconditionally dereferences wc->subvol_path when attempting to print debug information. Fix this by adding a NULL check before dereferencing wc->subvol_path in do_abort_log_replay().
- CVE-2023-54297Dec 30, 2025risk 0.00cvss —epss 0.00
In the Linux kernel, the following vulnerability has been resolved: btrfs: zoned: fix memory leak after finding block group with super blocks At exclude_super_stripes(), if we happen to find a block group that has super blocks mapped to it and we are on a zoned filesystem, we error out as this is not supposed to happen, indicating either a bug or maybe some memory corruption for example. However we are exiting the function without freeing the memory allocated for the logical address of the super blocks. Fix this by freeing the logical address.
- CVE-2023-54224Dec 30, 2025risk 0.00cvss —epss 0.00
In the Linux kernel, the following vulnerability has been resolved: btrfs: fix lockdep splat and potential deadlock after failure running delayed items When running delayed items we are holding a delayed node's mutex and then we will attempt to modify a subvolume btree to insert/update/delete the delayed items. However if have an error during the insertions for example, btrfs_insert_delayed_items() may return with a path that has locked extent buffers (a leaf at the very least), and then we attempt to release the delayed node at __btrfs_run_delayed_items(), which requires taking the delayed node's mutex, causing an ABBA type of deadlock. This was reported by syzbot and the lockdep splat is the following: WARNING: possible circular locking dependency detected 6.5.0-rc7-syzkaller-00024-g93f5de5f648d #0 Not tainted ------------------------------------------------------ syz-executor.2/13257 is trying to acquire lock: ffff88801835c0c0 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node+0x9a/0xaa0 fs/btrfs/delayed-inode.c:256 but task is already holding lock: ffff88802a5ab8e8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_lock+0x3c/0x2a0 fs/btrfs/locking.c:198 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (btrfs-tree-00){++++}-{3:3}: __lock_release kernel/locking/lockdep.c:5475 [inline] lock_release+0x36f/0x9d0 kernel/locking/lockdep.c:5781 up_write+0x79/0x580 kernel/locking/rwsem.c:1625 btrfs_tree_unlock_rw fs/btrfs/locking.h:189 [inline] btrfs_unlock_up_safe+0x179/0x3b0 fs/btrfs/locking.c:239 search_leaf fs/btrfs/ctree.c:1986 [inline] btrfs_search_slot+0x2511/0x2f80 fs/btrfs/ctree.c:2230 btrfs_insert_empty_items+0x9c/0x180 fs/btrfs/ctree.c:4376 btrfs_insert_delayed_item fs/btrfs/delayed-inode.c:746 [inline] btrfs_insert_delayed_items fs/btrfs/delayed-inode.c:824 [inline] __btrfs_commit_inode_delayed_items+0xd24/0x2410 fs/btrfs/delayed-inode.c:1111 __btrfs_run_delayed_items+0x1db/0x430 fs/btrfs/delayed-inode.c:1153 flush_space+0x269/0xe70 fs/btrfs/space-info.c:723 btrfs_async_reclaim_metadata_space+0x106/0x350 fs/btrfs/space-info.c:1078 process_one_work+0x92c/0x12c0 kernel/workqueue.c:2600 worker_thread+0xa63/0x1210 kernel/workqueue.c:2751 kthread+0x2b8/0x350 kernel/kthread.c:389 ret_from_fork+0x2e/0x60 arch/x86/kernel/process.c:145 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 -> #0 (&delayed_node->mutex){+.+.}-{3:3}: check_prev_add kernel/locking/lockdep.c:3142 [inline] check_prevs_add kernel/locking/lockdep.c:3261 [inline] validate_chain kernel/locking/lockdep.c:3876 [inline] __lock_acquire+0x39ff/0x7f70 kernel/locking/lockdep.c:5144 lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761 __mutex_lock_common+0x1d8/0x2530 kernel/locking/mutex.c:603 __mutex_lock kernel/locking/mutex.c:747 [inline] mutex_lock_nested+0x1b/0x20 kernel/locking/mutex.c:799 __btrfs_release_delayed_node+0x9a/0xaa0 fs/btrfs/delayed-inode.c:256 btrfs_release_delayed_node fs/btrfs/delayed-inode.c:281 [inline] __btrfs_run_delayed_items+0x2b5/0x430 fs/btrfs/delayed-inode.c:1156 btrfs_commit_transaction+0x859/0x2ff0 fs/btrfs/transaction.c:2276 btrfs_sync_file+0xf56/0x1330 fs/btrfs/file.c:1988 vfs_fsync_range fs/sync.c:188 [inline] vfs_fsync fs/sync.c:202 [inline] do_fsync fs/sync.c:212 [inline] __do_sys_fsync fs/sync.c:220 [inline] __se_sys_fsync fs/sync.c:218 [inline] __x64_sys_fsync+0x196/0x1e0 fs/sync.c:218 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd other info that ---truncated---
- CVE-2023-54158Dec 24, 2025risk 0.00cvss —epss 0.00
In the Linux kernel, the following vulnerability has been resolved: btrfs: don't free qgroup space unless specified Boris noticed in his simple quotas testing that he was getting a leak with Sweet Tea's change to subvol create that stopped doing a transaction commit. This was just a side effect of that change. In the delayed inode code we have an optimization that will free extra reservations if we think we can pack a dir item into an already modified leaf. Previously this wouldn't be triggered in the subvolume create case because we'd commit the transaction, it was still possible but much harder to trigger. It could actually be triggered if we did a mkdir && subvol create with qgroups enabled. This occurs because in btrfs_insert_delayed_dir_index(), which gets called when we're adding the dir item, we do the following: btrfs_block_rsv_release(fs_info, trans->block_rsv, bytes, NULL); if we're able to skip reserving space. The problem here is that trans->block_rsv points at the temporary block rsv for the subvolume create, which has qgroup reservations in the block rsv. This is a problem because btrfs_block_rsv_release() will do the following: if (block_rsv->qgroup_rsv_reserved >= block_rsv->qgroup_rsv_size) { qgroup_to_release = block_rsv->qgroup_rsv_reserved - block_rsv->qgroup_rsv_size; block_rsv->qgroup_rsv_reserved = block_rsv->qgroup_rsv_size; } The temporary block rsv just has ->qgroup_rsv_reserved set, ->qgroup_rsv_size == 0. The optimization in btrfs_insert_delayed_dir_index() sets ->qgroup_rsv_reserved = 0. Then later on when we call btrfs_subvolume_release_metadata() which has btrfs_block_rsv_release(fs_info, rsv, (u64)-1, &qgroup_to_release); btrfs_qgroup_convert_reserved_meta(root, qgroup_to_release); qgroup_to_release is set to 0, and we do not convert the reserved metadata space. The problem here is that the block rsv code has been unconditionally messing with ->qgroup_rsv_reserved, because the main place this is used is delalloc, and any time we call btrfs_block_rsv_release() we do it with qgroup_to_release set, and thus do the proper accounting. The subvolume code is the only other code that uses the qgroup reservation stuff, but it's intermingled with the above optimization, and thus was getting its reservation freed out from underneath it and thus leaking the reserved space. The solution is to simply not mess with the qgroup reservations if we don't have qgroup_to_release set. This works with the existing code as anything that messes with the delalloc reservations always have qgroup_to_release set. This fixes the leak that Boris was observing.
- CVE-2023-54121Dec 24, 2025risk 0.00cvss —epss 0.00
In the Linux kernel, the following vulnerability has been resolved: btrfs: fix incorrect splitting in btrfs_drop_extent_map_range In production we were seeing a variety of WARN_ON()'s in the extent_map code, specifically in btrfs_drop_extent_map_range() when we have to call add_extent_mapping() for our second split. Consider the following extent map layout PINNED [0 16K) [32K, 48K) and then we call btrfs_drop_extent_map_range for [0, 36K), with skip_pinned == true. The initial loop will have start = 0 end = 36K len = 36K we will find the [0, 16k) extent, but since we are pinned we will skip it, which has this code start = em_end; if (end != (u64)-1) len = start + len - em_end; em_end here is 16K, so now the values are start = 16K len = 16K + 36K - 16K = 36K len should instead be 20K. This is a problem when we find the next extent at [32K, 48K), we need to split this extent to leave [36K, 48k), however the code for the split looks like this split->start = start + len; split->len = em_end - (start + len); In this case we have em_end = 48K split->start = 16K + 36K // this should be 16K + 20K split->len = 48K - (16K + 36K) // this overflows as 16K + 36K is 52K and now we have an invalid extent_map in the tree that potentially overlaps other entries in the extent map. Even in the non-overlapping case we will have split->start set improperly, which will cause problems with any block related calculations. We don't actually need len in this loop, we can simply use end as our end point, and only adjust start up when we find a pinned extent we need to skip. Adjust the logic to do this, which keeps us from inserting an invalid extent map. We only skip_pinned in the relocation case, so this is relatively rare, except in the case where you are running relocation a lot, which can happen with auto relocation on.
- CVE-2023-54080Dec 24, 2025risk 0.00cvss —epss 0.00
In the Linux kernel, the following vulnerability has been resolved: btrfs: zoned: skip splitting and logical rewriting on pre-alloc write When doing a relocation, there is a chance that at the time of btrfs_reloc_clone_csums(), there is no checksum for the corresponding region. In this case, btrfs_finish_ordered_zoned()'s sum points to an invalid item and so ordered_extent's logical is set to some invalid value. Then, btrfs_lookup_block_group() in btrfs_zone_finish_endio() failed to find a block group and will hit an assert or a null pointer dereference as following. This can be reprodcued by running btrfs/028 several times (e.g, 4 to 16 times) with a null_blk setup. The device's zone size and capacity is set to 32 MB and the storage size is set to 5 GB on my setup. KASAN: null-ptr-deref in range [0x0000000000000088-0x000000000000008f] CPU: 6 PID: 3105720 Comm: kworker/u16:13 Tainted: G W 6.5.0-rc6-kts+ #1 Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0 12/17/2015 Workqueue: btrfs-endio-write btrfs_work_helper [btrfs] RIP: 0010:btrfs_zone_finish_endio.part.0+0x34/0x160 [btrfs] Code: 41 54 49 89 fc 55 48 89 f5 53 e8 57 7d fc ff 48 8d b8 88 00 00 00 48 89 c3 48 b8 00 00 00 00 00 > 3c 02 00 0f 85 02 01 00 00 f6 83 88 00 00 00 01 0f 84 a8 00 00 RSP: 0018:ffff88833cf87b08 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000011 RSI: 0000000000000004 RDI: 0000000000000088 RBP: 0000000000000002 R08: 0000000000000001 R09: ffffed102877b827 R10: ffff888143bdc13b R11: ffff888125b1cbc0 R12: ffff888143bdc000 R13: 0000000000007000 R14: ffff888125b1cba8 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88881e500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3ed85223d5 CR3: 00000001519b4005 CR4: 00000000001706e0 Call Trace: <TASK> ? die_addr+0x3c/0xa0 ? exc_general_protection+0x148/0x220 ? asm_exc_general_protection+0x22/0x30 ? btrfs_zone_finish_endio.part.0+0x34/0x160 [btrfs] ? btrfs_zone_finish_endio.part.0+0x19/0x160 [btrfs] btrfs_finish_one_ordered+0x7b8/0x1de0 [btrfs] ? rcu_is_watching+0x11/0xb0 ? lock_release+0x47a/0x620 ? btrfs_finish_ordered_zoned+0x59b/0x800 [btrfs] ? __pfx_btrfs_finish_one_ordered+0x10/0x10 [btrfs] ? btrfs_finish_ordered_zoned+0x358/0x800 [btrfs] ? __smp_call_single_queue+0x124/0x350 ? rcu_is_watching+0x11/0xb0 btrfs_work_helper+0x19f/0xc60 [btrfs] ? __pfx_try_to_wake_up+0x10/0x10 ? _raw_spin_unlock_irq+0x24/0x50 ? rcu_is_watching+0x11/0xb0 process_one_work+0x8c1/0x1430 ? __pfx_lock_acquire+0x10/0x10 ? __pfx_process_one_work+0x10/0x10 ? __pfx_do_raw_spin_lock+0x10/0x10 ? _raw_spin_lock_irq+0x52/0x60 worker_thread+0x100/0x12c0 ? __kthread_parkme+0xc1/0x1f0 ? __pfx_worker_thread+0x10/0x10 kthread+0x2ea/0x3c0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x30/0x70 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> On the zoned mode, writing to pre-allocated region means data relocation write. Such write always uses WRITE command so there is no need of splitting and rewriting logical address. Thus, we can just skip the function for the case.
- CVE-2023-54067Dec 24, 2025risk 0.00cvss —epss 0.00
In the Linux kernel, the following vulnerability has been resolved: btrfs: fix race when deleting free space root from the dirty cow roots list When deleting the free space tree we are deleting the free space root from the list fs_info->dirty_cowonly_roots without taking the lock that protects it, which is struct btrfs_fs_info::trans_lock. This unsynchronized list manipulation may cause chaos if there's another concurrent manipulation of this list, such as when adding a root to it with ctree.c:add_root_to_dirty_list(). This can result in all sorts of weird failures caused by a race, such as the following crash: [337571.278245] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] PREEMPT SMP PTI [337571.278933] CPU: 1 PID: 115447 Comm: btrfs Tainted: G W 6.4.0-rc6-btrfs-next-134+ #1 [337571.279153] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [337571.279572] RIP: 0010:commit_cowonly_roots+0x11f/0x250 [btrfs] [337571.279928] Code: 85 38 06 00 (...) [337571.280363] RSP: 0018:ffff9f63446efba0 EFLAGS: 00010206 [337571.280582] RAX: ffff942d98ec2638 RBX: ffff9430b82b4c30 RCX: 0000000449e1c000 [337571.280798] RDX: dead000000000100 RSI: ffff9430021e4900 RDI: 0000000000036070 [337571.281015] RBP: ffff942d98ec2000 R08: ffff942d98ec2000 R09: 000000000000015b [337571.281254] R10: 0000000000000009 R11: 0000000000000001 R12: ffff942fe8fbf600 [337571.281476] R13: ffff942dabe23040 R14: ffff942dabe20800 R15: ffff942d92cf3b48 [337571.281723] FS: 00007f478adb7340(0000) GS:ffff94349fa40000(0000) knlGS:0000000000000000 [337571.281950] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [337571.282184] CR2: 00007f478ab9a3d5 CR3: 000000001e02c001 CR4: 0000000000370ee0 [337571.282416] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [337571.282647] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [337571.282874] Call Trace: [337571.283101] <TASK> [337571.283327] ? __die_body+0x1b/0x60 [337571.283570] ? die_addr+0x39/0x60 [337571.283796] ? exc_general_protection+0x22e/0x430 [337571.284022] ? asm_exc_general_protection+0x22/0x30 [337571.284251] ? commit_cowonly_roots+0x11f/0x250 [btrfs] [337571.284531] btrfs_commit_transaction+0x42e/0xf90 [btrfs] [337571.284803] ? _raw_spin_unlock+0x15/0x30 [337571.285031] ? release_extent_buffer+0x103/0x130 [btrfs] [337571.285305] reset_balance_state+0x152/0x1b0 [btrfs] [337571.285578] btrfs_balance+0xa50/0x11e0 [btrfs] [337571.285864] ? __kmem_cache_alloc_node+0x14a/0x410 [337571.286086] btrfs_ioctl+0x249a/0x3320 [btrfs] [337571.286358] ? mod_objcg_state+0xd2/0x360 [337571.286577] ? refill_obj_stock+0xb0/0x160 [337571.286798] ? seq_release+0x25/0x30 [337571.287016] ? __rseq_handle_notify_resume+0x3ba/0x4b0 [337571.287235] ? percpu_counter_add_batch+0x2e/0xa0 [337571.287455] ? __x64_sys_ioctl+0x88/0xc0 [337571.287675] __x64_sys_ioctl+0x88/0xc0 [337571.287901] do_syscall_64+0x38/0x90 [337571.288126] entry_SYSCALL_64_after_hwframe+0x72/0xdc [337571.288352] RIP: 0033:0x7f478aaffe9b So fix this by locking struct btrfs_fs_info::trans_lock before deleting the free space root from that list.
- CVE-2025-68359Dec 24, 2025risk 0.00cvss —epss 0.00
In the Linux kernel, the following vulnerability has been resolved: btrfs: fix double free of qgroup record after failure to add delayed ref head In the previous code it was possible to incur into a double kfree() scenario when calling add_delayed_ref_head(). This could happen if the record was reported to already exist in the btrfs_qgroup_trace_extent_nolock() call, but then there was an error later on add_delayed_ref_head(). In this case, since add_delayed_ref_head() returned an error, the caller went to free the record. Since add_delayed_ref_head() couldn't set this kfree'd pointer to NULL, then kfree() would have acted on a non-NULL 'record' object which was pointing to memory already freed by the callee. The problem comes from the fact that the responsibility to kfree the object is on both the caller and the callee at the same time. Hence, the fix for this is to shift the ownership of the 'qrecord' object out of the add_delayed_ref_head(). That is, we will never attempt to kfree() the given object inside of this function, and will expect the caller to act on the 'qrecord' object on its own. The only exception where the 'qrecord' object cannot be kfree'd is if it was inserted into the tracing logic, for which we already have the 'qrecord_inserted_ret' boolean to account for this. Hence, the caller has to kfree the object only if add_delayed_ref_head() reports not to have inserted it on the tracing logic. As a side-effect of the above, we must guarantee that 'qrecord_inserted_ret' is properly initialized at the start of the function, not at the end, and then set when an actual insert happens. This way we avoid 'qrecord_inserted_ret' having an invalid value on an early exit. The documentation from the add_delayed_ref_head() has also been updated to reflect on the exact ownership of the 'qrecord' object.
- CVE-2023-54032Dec 24, 2025risk 0.00cvss —epss 0.00
In the Linux kernel, the following vulnerability has been resolved: btrfs: fix race when deleting quota root from the dirty cow roots list When disabling quotas we are deleting the quota root from the list fs_info->dirty_cowonly_roots without taking the lock that protects it, which is struct btrfs_fs_info::trans_lock. This unsynchronized list manipulation may cause chaos if there's another concurrent manipulation of this list, such as when adding a root to it with ctree.c:add_root_to_dirty_list(). This can result in all sorts of weird failures caused by a race, such as the following crash: [337571.278245] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] PREEMPT SMP PTI [337571.278933] CPU: 1 PID: 115447 Comm: btrfs Tainted: G W 6.4.0-rc6-btrfs-next-134+ #1 [337571.279153] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [337571.279572] RIP: 0010:commit_cowonly_roots+0x11f/0x250 [btrfs] [337571.279928] Code: 85 38 06 00 (...) [337571.280363] RSP: 0018:ffff9f63446efba0 EFLAGS: 00010206 [337571.280582] RAX: ffff942d98ec2638 RBX: ffff9430b82b4c30 RCX: 0000000449e1c000 [337571.280798] RDX: dead000000000100 RSI: ffff9430021e4900 RDI: 0000000000036070 [337571.281015] RBP: ffff942d98ec2000 R08: ffff942d98ec2000 R09: 000000000000015b [337571.281254] R10: 0000000000000009 R11: 0000000000000001 R12: ffff942fe8fbf600 [337571.281476] R13: ffff942dabe23040 R14: ffff942dabe20800 R15: ffff942d92cf3b48 [337571.281723] FS: 00007f478adb7340(0000) GS:ffff94349fa40000(0000) knlGS:0000000000000000 [337571.281950] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [337571.282184] CR2: 00007f478ab9a3d5 CR3: 000000001e02c001 CR4: 0000000000370ee0 [337571.282416] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [337571.282647] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [337571.282874] Call Trace: [337571.283101] <TASK> [337571.283327] ? __die_body+0x1b/0x60 [337571.283570] ? die_addr+0x39/0x60 [337571.283796] ? exc_general_protection+0x22e/0x430 [337571.284022] ? asm_exc_general_protection+0x22/0x30 [337571.284251] ? commit_cowonly_roots+0x11f/0x250 [btrfs] [337571.284531] btrfs_commit_transaction+0x42e/0xf90 [btrfs] [337571.284803] ? _raw_spin_unlock+0x15/0x30 [337571.285031] ? release_extent_buffer+0x103/0x130 [btrfs] [337571.285305] reset_balance_state+0x152/0x1b0 [btrfs] [337571.285578] btrfs_balance+0xa50/0x11e0 [btrfs] [337571.285864] ? __kmem_cache_alloc_node+0x14a/0x410 [337571.286086] btrfs_ioctl+0x249a/0x3320 [btrfs] [337571.286358] ? mod_objcg_state+0xd2/0x360 [337571.286577] ? refill_obj_stock+0xb0/0x160 [337571.286798] ? seq_release+0x25/0x30 [337571.287016] ? __rseq_handle_notify_resume+0x3ba/0x4b0 [337571.287235] ? percpu_counter_add_batch+0x2e/0xa0 [337571.287455] ? __x64_sys_ioctl+0x88/0xc0 [337571.287675] __x64_sys_ioctl+0x88/0xc0 [337571.287901] do_syscall_64+0x38/0x90 [337571.288126] entry_SYSCALL_64_after_hwframe+0x72/0xdc [337571.288352] RIP: 0033:0x7f478aaffe9b So fix this by locking struct btrfs_fs_info::trans_lock before deleting the quota root from that list.
Page 1 of 2