VYPR
Unrated severity · NVD Advisory · Published May 27, 2026 · Updated May 27, 2026

CVE-2026-45934

Description

In the Linux kernel, the following vulnerability has been resolved:

btrfs: fix EEXIST abort due to non-consecutive gaps in chunk allocation

I have been observing a number of systems aborting at insert_dev_extents() in btrfs_create_pending_block_groups(). The following is a sample stack trace of such an abort coming from forced chunk allocation (typically behind CONFIG_BTRFS_EXPERIMENTAL) but this can theoretically happen to any DUP chunk allocation.

    [81.801] ------------[ cut here ]------------
    [81.801] BTRFS: Transaction aborted (error -17)
    [81.801] WARNING: fs/btrfs/block-group.c:2876 at btrfs_create_pending_block_groups+0x721/0x770 [btrfs], CPU#1: bash/319
    [81.802] Modules linked in: virtio_net btrfs xor zstd_compress raid6_pq null_blk
    [81.803] CPU: 1 UID: 0 PID: 319 Comm: bash Kdump: loaded Not tainted 6.19.0-rc6+ #319 NONE
    [81.803] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.17.0-2-2 04/01/2014
    [81.804] RIP: 0010:btrfs_create_pending_block_groups+0x723/0x770 [btrfs]
    [81.806] RSP: 0018:ffffa36241a6bce8 EFLAGS: 00010282
    [81.806] RAX: 000000000000000d RBX: ffff8e699921e400 RCX: 0000000000000000
    [81.807] RDX: 0000000002040001 RSI: 00000000ffffffef RDI: ffffffffc0608bf0
    [81.807] RBP: 00000000ffffffef R08: ffff8e69830f6000 R09: 0000000000000007
    [81.808] R10: ffff8e699921e5e8 R11: 0000000000000000 R12: ffff8e6999228000
    [81.808] R13: ffff8e6984d82000 R14: ffff8e69966a69c0 R15: ffff8e69aa47b000
    [81.809] FS: 00007fec6bdd9740(0000) GS:ffff8e6b1b379000(0000) knlGS:0000000000000000
    [81.809] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [81.810] CR2: 00005604833670f0 CR3: 0000000116679000 CR4: 00000000000006f0
    [81.810] Call Trace:
    [81.810]
    [81.810] __btrfs_end_transaction+0x3e/0x2b0 [btrfs]
    [81.811] btrfs_force_chunk_alloc_store+0xcd/0x140 [btrfs]
    [81.811] kernfs_fop_write_iter+0x15f/0x240
    [81.812] vfs_write+0x264/0x500
    [81.812] ksys_write+0x6c/0xe0
    [81.812] do_syscall_64+0x66/0x770
    [81.812] entry_SYSCALL_64_after_hwframe+0x76/0x7e
    [81.813] RIP: 0033:0x7fec6be66197
    [81.814] RSP: 002b:00007fffb159dd30 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
    [81.815] RAX: ffffffffffffffda RBX: 00007fec6bdd9740 RCX: 00007fec6be66197
    [81.815] RDX: 0000000000000002 RSI: 0000560483374f80 RDI: 0000000000000001
    [81.816] RBP: 0000560483374f80 R08: 0000000000000000 R09: 0000000000000000
    [81.816] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
    [81.817] R13: 00007fec6bfb85c0 R14: 00007fec6bfb5ee0 R15: 00005604833729c0
    [81.817]
    [81.817] irq event stamp: 20039
    [81.818] hardirqs last enabled at (20047): [] __up_console_sem+0x52/0x60
    [81.818] hardirqs last disabled at (20056): [] __up_console_sem+0x37/0x60
    [81.819] softirqs last enabled at (19470): [] __irq_exit_rcu+0x96/0xc0
    [81.819] softirqs last disabled at (19463): [] __irq_exit_rcu+0x96/0xc0
    [81.820] ---[ end trace 0000000000000000 ]---
    [81.820] BTRFS: error (device dm-7 state A) in btrfs_create_pending_block_groups:2876: errno=-17 Object already exists

Inspecting these aborts with drgn, I observed a pattern of overlapping chunk_maps. Note how stripe 1 of the first chunk overlaps in physical address with stripe 0 of the second chunk.

    Physical Start      Physical End        Length  Logical             Type      Stripe
    ------------------------------------------------------------------------------------
    0x0000000102500000  0x0000000142500000  1.0G    0x0000000641d00000  META|DUP  0/2
    0x0000000142500000  0x0000000182500000  1.0G    0x0000000641d00000  META|DUP  1/2
    0x0000000142500000  0x0000000182500000  1.0G    0x0000000601d00000  META|DUP  0/2
    0x0000000182500000  0x00000001c2500000  1.0G    0x0000000601d00000  META|DUP  1/2
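
The overlap in the dump can be checked mechanically. The following is a minimal user-space sketch, not kernel code: `struct stripe_range`, `ranges_overlap()` and `count_cross_chunk_overlaps()` are hypothetical names introduced here for illustration, and the sketch assumes the "Physical End" column is exclusive (each row spans exactly 1.0G, which is consistent with that reading).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One stripe's physical placement as a half-open range [start, end). */
struct stripe_range {
	uint64_t start;   /* first physical byte of the stripe */
	uint64_t end;     /* one past the last physical byte */
	uint64_t logical; /* logical address of the owning chunk */
};

/* The four rows of the dump in the description. */
static const struct stripe_range dump[] = {
	{ 0x0000000102500000ULL, 0x0000000142500000ULL, 0x0000000641d00000ULL },
	{ 0x0000000142500000ULL, 0x0000000182500000ULL, 0x0000000641d00000ULL },
	{ 0x0000000142500000ULL, 0x0000000182500000ULL, 0x0000000601d00000ULL },
	{ 0x0000000182500000ULL, 0x00000001c2500000ULL, 0x0000000601d00000ULL },
};

/* Two half-open ranges intersect when each starts before the other ends. */
static bool ranges_overlap(const struct stripe_range *a,
			   const struct stripe_range *b)
{
	assert(a->start < a->end && b->start < b->end);
	return a->start < b->end && b->start < a->end;
}

/*
 * Count pairs of stripes that belong to different chunks (different
 * logical addresses) yet share physical bytes. On a consistent device
 * this would be 0.
 */
static size_t count_cross_chunk_overlaps(const struct stripe_range *r, size_t n)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++)
		for (size_t j = i + 1; j < n; j++)
			if (r[i].logical != r[j].logical &&
			    ranges_overlap(&r[i], &r[j]))
				hits++;
	return hits;
}
```

Calling `count_cross_chunk_overlaps(dump, 4)` reports one offending pair: stripe 1 of the chunk at logical 0x641d00000 and stripe 0 of the chunk at logical 0x601d00000 occupy the same physical range.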

Now how could this possibly happen? All chunk allocation is ---truncated---

Affected products

2

Patches

6
7d4eadee7042

btrfs: fix EEXIST abort due to non-consecutive gaps in chunk allocation

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git · Boris Burkov · Jan 30, 2026 · Fixed in 6.18.14 · via kernel-cna
1 file changed · +183 -61
  • fs/btrfs/volumes.c · +183 -61 · modified
    diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
    index 8e7dcb12af4c42..645bf98a9571b5 100644
    --- a/fs/btrfs/volumes.c
    +++ b/fs/btrfs/volumes.c
    @@ -1506,30 +1506,158 @@ error_bdev_put:
     }
     
     /*
    - * Try to find a chunk that intersects [start, start + len] range and when one
    - * such is found, record the end of it in *start
    + * Find the first pending extent intersecting a range.
    + *
    + * @device:         the device to search
    + * @start:          start of the range to check
    + * @len:            length of the range to check
    + * @pending_start:  output pointer for the start of the found pending extent
    + * @pending_end:    output pointer for the end of the found pending extent (inclusive)
    + *
    + * Search for a pending chunk allocation that intersects the half-open range
    + * [start, start + len).
    + *
    + * Return: true if a pending extent was found, false otherwise.
    + * If the return value is true, store the first pending extent in
    + * [*pending_start, *pending_end]. Otherwise, the two output variables
    + * may still be modified, to something outside the range and should not
    + * be used.
      */
    -static bool contains_pending_extent(struct btrfs_device *device, u64 *start,
    -				    u64 len)
    +static bool first_pending_extent(struct btrfs_device *device, u64 start, u64 len,
    +				 u64 *pending_start, u64 *pending_end)
     {
    -	u64 physical_start, physical_end;
    -
     	lockdep_assert_held(&device->fs_info->chunk_mutex);
     
    -	if (btrfs_find_first_extent_bit(&device->alloc_state, *start,
    -					&physical_start, &physical_end,
    +	if (btrfs_find_first_extent_bit(&device->alloc_state, start,
    +					pending_start, pending_end,
     					CHUNK_ALLOCATED, NULL)) {
     
    -		if (in_range(physical_start, *start, len) ||
    -		    in_range(*start, physical_start,
    -			     physical_end + 1 - physical_start)) {
    -			*start = physical_end + 1;
    +		if (in_range(*pending_start, start, len) ||
    +		    in_range(start, *pending_start, *pending_end + 1 - *pending_start)) {
     			return true;
     		}
     	}
     	return false;
     }
     
    +/*
    + * Find the first real hole accounting for pending extents.
    + *
    + * @device:         the device containing the candidate hole
    + * @start:          input/output pointer for the hole start position
    + * @len:            input/output pointer for the hole length
    + * @min_hole_size:  the size of hole we are looking for
    + *
    + * Given a potential hole specified by [*start, *start + *len), check for pending
    + * chunk allocations within that range. If pending extents are found, the hole is
    + * adjusted to represent the first true free space that is large enough when
    + * accounting for pending chunks.
    + *
    + * Note that this function must handle various cases involving non consecutive
    + * pending extents.
    + *
    + * Returns: true if a suitable hole was found and false otherwise.
    + * If the return value is true, then *start and *len are set to represent the hole.
    + * If the return value is false, then *start is set to the largest hole we
    + * found and *len is set to its length.
    + * If there are no holes at all, then *start is set to the end of the range and
    + * *len is set to 0.
    + */
    +static bool find_hole_in_pending_extents(struct btrfs_device *device, u64 *start,
    +					 u64 *len, u64 min_hole_size)
    +{
    +	u64 pending_start, pending_end;
    +	u64 end;
    +	u64 max_hole_start = 0;
    +	u64 max_hole_len = 0;
    +
    +	lockdep_assert_held(&device->fs_info->chunk_mutex);
    +
    +	if (*len == 0)
    +		return false;
    +
    +	end = *start + *len - 1;
    +
    +	/*
    +	 * Loop until we either see a large enough hole or check every pending
    +	 * extent overlapping the candidate hole.
    +	 * At every hole that we observe, record it if it is the new max.
    +	 * At the end of the iteration, set the output variables to the max hole.
    +	 */
    +	while (true) {
    +		if (first_pending_extent(device, *start, *len, &pending_start, &pending_end)) {
    +			/*
    +			 * Case 1: the pending extent overlaps the start of
    +			 * candidate hole. That means the true hole is after the
    +			 * pending extent, but we need to find the next pending
    +			 * extent to properly size the hole. In the next loop,
    +			 * we will reduce to case 2 or 3.
    +			 * e.g.,
    +			 *
    +			 *   |----pending A----|    real hole     |----pending B----|
    +			 *            |           candidate hole        |
    +			 *         *start                              end
    +			 */
    +			if (pending_start <= *start) {
    +				*start = pending_end + 1;
    +				goto next;
    +			}
    +			/*
    +			 * Case 2: The pending extent starts after *start (and overlaps
    +			 * [*start, end), so the first hole just goes up to the start
    +			 * of the pending extent.
    +			 * e.g.,
    +			 *
    +			 *   |    real hole    |----pending A----|
    +			 *   |       candidate hole     |
    +			 * *start                      end
    +			 */
    +			*len = pending_start - *start;
    +			if (*len > max_hole_len) {
    +				max_hole_start = *start;
    +				max_hole_len = *len;
    +			}
    +			if (*len >= min_hole_size)
    +				break;
    +			/*
    +			 * If the hole wasn't big enough, then we advance past
    +			 * the pending extent and keep looking.
    +			 */
    +			*start = pending_end + 1;
    +			goto next;
    +		} else {
    +			/*
    +			 * Case 3: There is no pending extent overlapping the
    +			 * range [*start, *start + *len - 1], so the only remaining
    +			 * hole is the remaining range.
    +			 * e.g.,
    +			 *
    +			 *   |       candidate hole           |
    +			 *   |          real hole             |
    +			 * *start                            end
    +			 */
    +
    +			if (*len > max_hole_len) {
    +				max_hole_start = *start;
    +				max_hole_len = *len;
    +			}
    +			break;
    +		}
    +next:
    +		if (*start > end)
    +			break;
    +		*len = end - *start + 1;
    +	}
    +	if (max_hole_len) {
    +		*start = max_hole_start;
    +		*len = max_hole_len;
    +	} else {
    +		*start = end + 1;
    +		*len = 0;
    +	}
    +	return max_hole_len >= min_hole_size;
    +}
    +
     static u64 dev_extent_search_start(struct btrfs_device *device)
     {
     	switch (device->fs_devices->chunk_alloc_policy) {
    @@ -1594,59 +1722,57 @@ static bool dev_extent_hole_check_zoned(struct btrfs_device *device,
     }
     
     /*
    - * Check if specified hole is suitable for allocation.
    + * Validate and adjust a hole for chunk allocation
    + *
    + * @device:      the device containing the candidate hole
    + * @hole_start:  input/output pointer for the hole start position
    + * @hole_size:   input/output pointer for the hole size
    + * @num_bytes:   minimum allocation size required
      *
    - * @device:	the device which we have the hole
    - * @hole_start: starting position of the hole
    - * @hole_size:	the size of the hole
    - * @num_bytes:	the size of the free space that we need
    + * Check if the specified hole is suitable for allocation and adjust it if
    + * necessary. The hole may be modified to skip over pending chunk allocations
    + * and to satisfy stricter zoned requirements on zoned filesystems.
      *
    - * This function may modify @hole_start and @hole_size to reflect the suitable
    - * position for allocation. Returns 1 if hole position is updated, 0 otherwise.
    + * For regular (non-zoned) allocation, if the hole after adjustment is smaller
    + * than @num_bytes, the search continues past additional pending extents until
    + * either a sufficiently large hole is found or no more pending extents exist.
    + *
    + * Return: true if a suitable hole was found and false otherwise.
    + * If the return value is true, then *hole_start and *hole_size are set to
    + * represent the hole we found.
    + * If the return value is false, then *hole_start is set to the largest
    + * hole we found and *hole_size is set to its length.
    + * If there are no holes at all, then *hole_start is set to the end of the range
    + * and *hole_size is set to 0.
      */
     static bool dev_extent_hole_check(struct btrfs_device *device, u64 *hole_start,
     				  u64 *hole_size, u64 num_bytes)
     {
    -	bool changed = false;
    -	u64 hole_end = *hole_start + *hole_size;
    +	bool found = false;
    +	const u64 hole_end = *hole_start + *hole_size - 1;
     
    -	for (;;) {
    -		/*
    -		 * Check before we set max_hole_start, otherwise we could end up
    -		 * sending back this offset anyway.
    -		 */
    -		if (contains_pending_extent(device, hole_start, *hole_size)) {
    -			if (hole_end >= *hole_start)
    -				*hole_size = hole_end - *hole_start;
    -			else
    -				*hole_size = 0;
    -			changed = true;
    -		}
    +	ASSERT(*hole_size > 0);
     
    -		switch (device->fs_devices->chunk_alloc_policy) {
    -		default:
    -			btrfs_warn_unknown_chunk_allocation(device->fs_devices->chunk_alloc_policy);
    -			fallthrough;
    -		case BTRFS_CHUNK_ALLOC_REGULAR:
    -			/* No extra check */
    -			break;
    -		case BTRFS_CHUNK_ALLOC_ZONED:
    -			if (dev_extent_hole_check_zoned(device, hole_start,
    -							hole_size, num_bytes)) {
    -				changed = true;
    -				/*
    -				 * The changed hole can contain pending extent.
    -				 * Loop again to check that.
    -				 */
    -				continue;
    -			}
    -			break;
    -		}
    +again:
    +	*hole_size = hole_end - *hole_start + 1;
    +	found = find_hole_in_pending_extents(device, hole_start, hole_size, num_bytes);
    +	if (!found)
    +		return found;
    +	ASSERT(*hole_size >= num_bytes);
     
    +	switch (device->fs_devices->chunk_alloc_policy) {
    +	default:
    +		btrfs_warn_unknown_chunk_allocation(device->fs_devices->chunk_alloc_policy);
    +		fallthrough;
    +	case BTRFS_CHUNK_ALLOC_REGULAR:
    +		return found;
    +	case BTRFS_CHUNK_ALLOC_ZONED:
    +		if (dev_extent_hole_check_zoned(device, hole_start, hole_size, num_bytes))
    +			goto again;
     		break;
     	}
     
    -	return changed;
    +	return found;
     }
     
     /*
    @@ -1705,7 +1831,7 @@ static int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
     		ret = -ENOMEM;
     		goto out;
     	}
    -again:
    +
     	if (search_start >= search_end ||
     		test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state)) {
     		ret = -ENOSPC;
    @@ -1792,11 +1918,7 @@ next:
     	 */
     	if (search_end > search_start) {
     		hole_size = search_end - search_start;
    -		if (dev_extent_hole_check(device, &search_start, &hole_size,
    -					  num_bytes)) {
    -			btrfs_release_path(path);
    -			goto again;
    -		}
    +		dev_extent_hole_check(device, &search_start, &hole_size, num_bytes);
     
     		if (hole_size > max_hole_size) {
     			max_hole_start = search_start;
    @@ -4882,6 +5004,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
     	u64 diff;
     	u64 start;
     	u64 free_diff = 0;
    +	u64 pending_start, pending_end;
     
     	new_size = round_down(new_size, fs_info->sectorsize);
     	start = new_size;
    @@ -4927,7 +5050,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
     	 * in-memory chunks are synced to disk so that the loop below sees them
     	 * and relocates them accordingly.
     	 */
    -	if (contains_pending_extent(device, &start, diff)) {
    +	if (first_pending_extent(device, start, diff, &pending_start, &pending_end)) {
     		mutex_unlock(&fs_info->chunk_mutex);
     		ret = btrfs_commit_transaction(trans);
     		if (ret)
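
The three cases handled by `find_hole_in_pending_extents()` may be easier to follow in a small user-space model. The sketch below is an illustration under stated assumptions, not the kernel implementation: the extent-state tree is replaced by a sorted, non-overlapping array, and `struct pending_extent`, `first_pending()` and `find_hole()` are hypothetical stand-ins for the kernel's data structure and the two functions in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one CHUNK_ALLOCATED range; end is inclusive. */
struct pending_extent {
	uint64_t start;
	uint64_t end;
};

/*
 * Model of first_pending_extent(): return the first extent of the sorted,
 * non-overlapping array p[] that intersects [start, start + len).
 */
static const struct pending_extent *first_pending(const struct pending_extent *p,
						  size_t n, uint64_t start,
						  uint64_t len)
{
	assert(len > 0);
	for (size_t i = 0; i < n; i++)
		if (p[i].end >= start && p[i].start < start + len)
			return &p[i];
	return NULL;
}

/* Model of find_hole_in_pending_extents() as it appears in the patch. */
static bool find_hole(const struct pending_extent *p, size_t n,
		      uint64_t *start, uint64_t *len, uint64_t min_hole_size)
{
	uint64_t max_start = 0, max_len = 0;
	uint64_t end;

	if (*len == 0)
		return false;
	end = *start + *len - 1;

	while (true) {
		const struct pending_extent *pe = first_pending(p, n, *start, *len);

		if (!pe) {
			/* Case 3: the rest of the candidate is free. */
			if (*len > max_len) {
				max_start = *start;
				max_len = *len;
			}
			break;
		}
		if (pe->start > *start) {
			/* Case 2: free space runs up to the pending extent. */
			*len = pe->start - *start;
			if (*len > max_len) {
				max_start = *start;
				max_len = *len;
			}
			if (*len >= min_hole_size)
				break;
		}
		/* Case 1, or a case 2 hole that was too small: skip the extent. */
		*start = pe->end + 1;
		if (*start > end)
			break;
		*len = end - *start + 1;
	}

	if (max_len) {
		*start = max_start;
		*len = max_len;
	} else {
		*start = end + 1;
		*len = 0;
	}
	return max_len >= min_hole_size;
}
```

With pending extents [0, 9] and [30, 39] inside a candidate hole [0, 100), a request for 20 bytes yields the hole at offset 10 with length 20, while a request for 50 bytes skips that gap and yields offset 40 with length 60. A request for 70 bytes fails and reports the largest hole seen (offset 40, length 60).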
156cac365e27

btrfs: fix EEXIST abort due to non-consecutive gaps in chunk allocation

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git · Boris Burkov · Jan 30, 2026 · Fixed in 6.19.4 · via kernel-cna
1 file changed · +183 -61
  • fs/btrfs/volumes.c · +183 -61 · modified
    diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
    index 8a08412f3529a1..99e167a697ba8e 100644
    --- a/fs/btrfs/volumes.c
    +++ b/fs/btrfs/volumes.c
    @@ -1505,30 +1505,158 @@ error_bdev_put:
     }
     
     /*
    - * Try to find a chunk that intersects [start, start + len] range and when one
    - * such is found, record the end of it in *start
    + * Find the first pending extent intersecting a range.
    + *
    + * @device:         the device to search
    + * @start:          start of the range to check
    + * @len:            length of the range to check
    + * @pending_start:  output pointer for the start of the found pending extent
    + * @pending_end:    output pointer for the end of the found pending extent (inclusive)
    + *
    + * Search for a pending chunk allocation that intersects the half-open range
    + * [start, start + len).
    + *
    + * Return: true if a pending extent was found, false otherwise.
    + * If the return value is true, store the first pending extent in
    + * [*pending_start, *pending_end]. Otherwise, the two output variables
    + * may still be modified, to something outside the range and should not
    + * be used.
      */
    -static bool contains_pending_extent(struct btrfs_device *device, u64 *start,
    -				    u64 len)
    +static bool first_pending_extent(struct btrfs_device *device, u64 start, u64 len,
    +				 u64 *pending_start, u64 *pending_end)
     {
    -	u64 physical_start, physical_end;
    -
     	lockdep_assert_held(&device->fs_info->chunk_mutex);
     
    -	if (btrfs_find_first_extent_bit(&device->alloc_state, *start,
    -					&physical_start, &physical_end,
    +	if (btrfs_find_first_extent_bit(&device->alloc_state, start,
    +					pending_start, pending_end,
     					CHUNK_ALLOCATED, NULL)) {
     
    -		if (in_range(physical_start, *start, len) ||
    -		    in_range(*start, physical_start,
    -			     physical_end + 1 - physical_start)) {
    -			*start = physical_end + 1;
    +		if (in_range(*pending_start, start, len) ||
    +		    in_range(start, *pending_start, *pending_end + 1 - *pending_start)) {
     			return true;
     		}
     	}
     	return false;
     }
     
    +/*
    + * Find the first real hole accounting for pending extents.
    + *
    + * @device:         the device containing the candidate hole
    + * @start:          input/output pointer for the hole start position
    + * @len:            input/output pointer for the hole length
    + * @min_hole_size:  the size of hole we are looking for
    + *
    + * Given a potential hole specified by [*start, *start + *len), check for pending
    + * chunk allocations within that range. If pending extents are found, the hole is
    + * adjusted to represent the first true free space that is large enough when
    + * accounting for pending chunks.
    + *
    + * Note that this function must handle various cases involving non consecutive
    + * pending extents.
    + *
    + * Returns: true if a suitable hole was found and false otherwise.
    + * If the return value is true, then *start and *len are set to represent the hole.
    + * If the return value is false, then *start is set to the largest hole we
    + * found and *len is set to its length.
    + * If there are no holes at all, then *start is set to the end of the range and
    + * *len is set to 0.
    + */
    +static bool find_hole_in_pending_extents(struct btrfs_device *device, u64 *start,
    +					 u64 *len, u64 min_hole_size)
    +{
    +	u64 pending_start, pending_end;
    +	u64 end;
    +	u64 max_hole_start = 0;
    +	u64 max_hole_len = 0;
    +
    +	lockdep_assert_held(&device->fs_info->chunk_mutex);
    +
    +	if (*len == 0)
    +		return false;
    +
    +	end = *start + *len - 1;
    +
    +	/*
    +	 * Loop until we either see a large enough hole or check every pending
    +	 * extent overlapping the candidate hole.
    +	 * At every hole that we observe, record it if it is the new max.
    +	 * At the end of the iteration, set the output variables to the max hole.
    +	 */
    +	while (true) {
    +		if (first_pending_extent(device, *start, *len, &pending_start, &pending_end)) {
    +			/*
    +			 * Case 1: the pending extent overlaps the start of
    +			 * candidate hole. That means the true hole is after the
    +			 * pending extent, but we need to find the next pending
    +			 * extent to properly size the hole. In the next loop,
    +			 * we will reduce to case 2 or 3.
    +			 * e.g.,
    +			 *
    +			 *   |----pending A----|    real hole     |----pending B----|
    +			 *            |           candidate hole        |
    +			 *         *start                              end
    +			 */
    +			if (pending_start <= *start) {
    +				*start = pending_end + 1;
    +				goto next;
    +			}
    +			/*
    +			 * Case 2: The pending extent starts after *start (and overlaps
    +			 * [*start, end), so the first hole just goes up to the start
    +			 * of the pending extent.
    +			 * e.g.,
    +			 *
    +			 *   |    real hole    |----pending A----|
    +			 *   |       candidate hole     |
    +			 * *start                      end
    +			 */
    +			*len = pending_start - *start;
    +			if (*len > max_hole_len) {
    +				max_hole_start = *start;
    +				max_hole_len = *len;
    +			}
    +			if (*len >= min_hole_size)
    +				break;
    +			/*
    +			 * If the hole wasn't big enough, then we advance past
    +			 * the pending extent and keep looking.
    +			 */
    +			*start = pending_end + 1;
    +			goto next;
    +		} else {
    +			/*
    +			 * Case 3: There is no pending extent overlapping the
    +			 * range [*start, *start + *len - 1], so the only remaining
    +			 * hole is the remaining range.
    +			 * e.g.,
    +			 *
    +			 *   |       candidate hole           |
    +			 *   |          real hole             |
    +			 * *start                            end
    +			 */
    +
    +			if (*len > max_hole_len) {
    +				max_hole_start = *start;
    +				max_hole_len = *len;
    +			}
    +			break;
    +		}
    +next:
    +		if (*start > end)
    +			break;
    +		*len = end - *start + 1;
    +	}
    +	if (max_hole_len) {
    +		*start = max_hole_start;
    +		*len = max_hole_len;
    +	} else {
    +		*start = end + 1;
    +		*len = 0;
    +	}
    +	return max_hole_len >= min_hole_size;
    +}
    +
     static u64 dev_extent_search_start(struct btrfs_device *device)
     {
     	switch (device->fs_devices->chunk_alloc_policy) {
    @@ -1593,59 +1721,57 @@ static bool dev_extent_hole_check_zoned(struct btrfs_device *device,
     }
     
     /*
    - * Check if specified hole is suitable for allocation.
    + * Validate and adjust a hole for chunk allocation
    + *
    + * @device:      the device containing the candidate hole
    + * @hole_start:  input/output pointer for the hole start position
    + * @hole_size:   input/output pointer for the hole size
    + * @num_bytes:   minimum allocation size required
      *
    - * @device:	the device which we have the hole
    - * @hole_start: starting position of the hole
    - * @hole_size:	the size of the hole
    - * @num_bytes:	the size of the free space that we need
    + * Check if the specified hole is suitable for allocation and adjust it if
    + * necessary. The hole may be modified to skip over pending chunk allocations
    + * and to satisfy stricter zoned requirements on zoned filesystems.
      *
    - * This function may modify @hole_start and @hole_size to reflect the suitable
    - * position for allocation. Returns 1 if hole position is updated, 0 otherwise.
    + * For regular (non-zoned) allocation, if the hole after adjustment is smaller
    + * than @num_bytes, the search continues past additional pending extents until
    + * either a sufficiently large hole is found or no more pending extents exist.
    + *
    + * Return: true if a suitable hole was found and false otherwise.
    + * If the return value is true, then *hole_start and *hole_size are set to
    + * represent the hole we found.
    + * If the return value is false, then *hole_start is set to the largest
    + * hole we found and *hole_size is set to its length.
    + * If there are no holes at all, then *hole_start is set to the end of the range
    + * and *hole_size is set to 0.
      */
     static bool dev_extent_hole_check(struct btrfs_device *device, u64 *hole_start,
     				  u64 *hole_size, u64 num_bytes)
     {
    -	bool changed = false;
    -	u64 hole_end = *hole_start + *hole_size;
    +	bool found = false;
    +	const u64 hole_end = *hole_start + *hole_size - 1;
     
    -	for (;;) {
    -		/*
    -		 * Check before we set max_hole_start, otherwise we could end up
    -		 * sending back this offset anyway.
    -		 */
    -		if (contains_pending_extent(device, hole_start, *hole_size)) {
    -			if (hole_end >= *hole_start)
    -				*hole_size = hole_end - *hole_start;
    -			else
    -				*hole_size = 0;
    -			changed = true;
    -		}
    +	ASSERT(*hole_size > 0);
     
    -		switch (device->fs_devices->chunk_alloc_policy) {
    -		default:
    -			btrfs_warn_unknown_chunk_allocation(device->fs_devices->chunk_alloc_policy);
    -			fallthrough;
    -		case BTRFS_CHUNK_ALLOC_REGULAR:
    -			/* No extra check */
    -			break;
    -		case BTRFS_CHUNK_ALLOC_ZONED:
    -			if (dev_extent_hole_check_zoned(device, hole_start,
    -							hole_size, num_bytes)) {
    -				changed = true;
    -				/*
    -				 * The changed hole can contain pending extent.
    -				 * Loop again to check that.
    -				 */
    -				continue;
    -			}
    -			break;
    -		}
    +again:
    +	*hole_size = hole_end - *hole_start + 1;
    +	found = find_hole_in_pending_extents(device, hole_start, hole_size, num_bytes);
    +	if (!found)
    +		return found;
    +	ASSERT(*hole_size >= num_bytes);
     
    +	switch (device->fs_devices->chunk_alloc_policy) {
    +	default:
    +		btrfs_warn_unknown_chunk_allocation(device->fs_devices->chunk_alloc_policy);
    +		fallthrough;
    +	case BTRFS_CHUNK_ALLOC_REGULAR:
    +		return found;
    +	case BTRFS_CHUNK_ALLOC_ZONED:
    +		if (dev_extent_hole_check_zoned(device, hole_start, hole_size, num_bytes))
    +			goto again;
     		break;
     	}
     
    -	return changed;
    +	return found;
     }
     
     /*
    @@ -1704,7 +1830,7 @@ static int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
     		ret = -ENOMEM;
     		goto out;
     	}
    -again:
    +
     	if (search_start >= search_end ||
     		test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state)) {
     		ret = -ENOSPC;
    @@ -1791,11 +1917,7 @@ next:
     	 */
     	if (search_end > search_start) {
     		hole_size = search_end - search_start;
    -		if (dev_extent_hole_check(device, &search_start, &hole_size,
    -					  num_bytes)) {
    -			btrfs_release_path(path);
    -			goto again;
    -		}
    +		dev_extent_hole_check(device, &search_start, &hole_size, num_bytes);
     
     		if (hole_size > max_hole_size) {
     			max_hole_start = search_start;
    @@ -4844,6 +4966,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
     	u64 diff;
     	u64 start;
     	u64 free_diff = 0;
    +	u64 pending_start, pending_end;
     
     	new_size = round_down(new_size, fs_info->sectorsize);
     	start = new_size;
    @@ -4889,7 +5012,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
     	 * in-memory chunks are synced to disk so that the loop below sees them
     	 * and relocates them accordingly.
     	 */
    -	if (contains_pending_extent(device, &start, diff)) {
    +	if (first_pending_extent(device, start, diff, &pending_start, &pending_end)) {
     		mutex_unlock(&fs_info->chunk_mutex);
     		ret = btrfs_commit_transaction(trans);
     		if (ret)
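
For contrast, the lines removed by this patch can be modelled the same way. The sketch below is one reading of the removed `contains_pending_extent()` and `dev_extent_hole_check()` logic for the regular (non-zoned) policy, written as a user-space illustration: `struct pending_extent`, `first_pending()`, `old_hole_check()` and `hole_has_pending()` are hypothetical names, and the extent-state tree is again assumed to be a sorted array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one CHUNK_ALLOCATED range; end is inclusive. */
struct pending_extent {
	uint64_t start;
	uint64_t end;
};

/* First extent of the sorted array p[] intersecting [start, start + len). */
static const struct pending_extent *first_pending(const struct pending_extent *p,
						  size_t n, uint64_t start,
						  uint64_t len)
{
	assert(len > 0);
	for (size_t i = 0; i < n; i++)
		if (p[i].end >= start && p[i].start < start + len)
			return &p[i];
	return NULL;
}

/*
 * Model of the removed logic: a single pass that jumps past the first
 * pending extent and keeps everything up to the original hole end,
 * without looking for further pending extents.
 */
static bool old_hole_check(const struct pending_extent *p, size_t n,
			   uint64_t *hole_start, uint64_t *hole_size)
{
	const uint64_t hole_end = *hole_start + *hole_size; /* exclusive */
	const struct pending_extent *pe;

	pe = first_pending(p, n, *hole_start, *hole_size);
	if (!pe)
		return false;
	*hole_start = pe->end + 1;
	*hole_size = hole_end >= *hole_start ? hole_end - *hole_start : 0;
	return true;
}

/* True if some pending extent still lies inside [start, start + len). */
static bool hole_has_pending(const struct pending_extent *p, size_t n,
			     uint64_t start, uint64_t len)
{
	return len > 0 && first_pending(p, n, start, len) != NULL;
}
```

With pending extents [0, 9] and [30, 39] and a candidate hole [0, 100), this model returns a "hole" at offset 10 with length 90, which still contains the second pending extent. An allocation larger than 20 bytes placed there would overlap it, which matches the overlapping stripes seen in the description.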
b14c5e04bd0f

btrfs: fix EEXIST abort due to non-consecutive gaps in chunk allocation

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git · Boris Burkov · Jan 30, 2026 · Fixed in 7.0 · via kernel-cna
1 file changed · +183 -61
  • fs/btrfs/volumes.c · +183 -61 · modified
    diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
    index d33780082b8da2..329a922893b4fe 100644
    --- a/fs/btrfs/volumes.c
    +++ b/fs/btrfs/volumes.c
    @@ -1509,30 +1509,158 @@ error_bdev_put:
     }
     
     /*
    - * Try to find a chunk that intersects [start, start + len] range and when one
    - * such is found, record the end of it in *start
    + * Find the first pending extent intersecting a range.
    + *
    + * @device:         the device to search
    + * @start:          start of the range to check
    + * @len:            length of the range to check
    + * @pending_start:  output pointer for the start of the found pending extent
    + * @pending_end:    output pointer for the end of the found pending extent (inclusive)
    + *
    + * Search for a pending chunk allocation that intersects the half-open range
    + * [start, start + len).
    + *
    + * Return: true if a pending extent was found, false otherwise.
    + * If the return value is true, store the first pending extent in
    + * [*pending_start, *pending_end]. Otherwise, the two output variables
    + * may still be modified, to something outside the range and should not
    + * be used.
      */
    -static bool contains_pending_extent(struct btrfs_device *device, u64 *start,
    -				    u64 len)
    +static bool first_pending_extent(struct btrfs_device *device, u64 start, u64 len,
    +				 u64 *pending_start, u64 *pending_end)
     {
    -	u64 physical_start, physical_end;
    -
     	lockdep_assert_held(&device->fs_info->chunk_mutex);
     
    -	if (btrfs_find_first_extent_bit(&device->alloc_state, *start,
    -					&physical_start, &physical_end,
    +	if (btrfs_find_first_extent_bit(&device->alloc_state, start,
    +					pending_start, pending_end,
     					CHUNK_ALLOCATED, NULL)) {
     
    -		if (in_range(physical_start, *start, len) ||
    -		    in_range(*start, physical_start,
    -			     physical_end + 1 - physical_start)) {
    -			*start = physical_end + 1;
    +		if (in_range(*pending_start, start, len) ||
    +		    in_range(start, *pending_start, *pending_end + 1 - *pending_start)) {
     			return true;
     		}
     	}
     	return false;
     }
     
    +/*
    + * Find the first real hole accounting for pending extents.
    + *
    + * @device:         the device containing the candidate hole
    + * @start:          input/output pointer for the hole start position
    + * @len:            input/output pointer for the hole length
    + * @min_hole_size:  the size of hole we are looking for
    + *
    + * Given a potential hole specified by [*start, *start + *len), check for pending
    + * chunk allocations within that range. If pending extents are found, the hole is
    + * adjusted to represent the first true free space that is large enough when
    + * accounting for pending chunks.
    + *
    + * Note that this function must handle various cases involving non consecutive
    + * pending extents.
    + *
    + * Returns: true if a suitable hole was found and false otherwise.
    + * If the return value is true, then *start and *len are set to represent the hole.
    + * If the return value is false, then *start is set to the largest hole we
    + * found and *len is set to its length.
    + * If there are no holes at all, then *start is set to the end of the range and
    + * *len is set to 0.
    + */
    +static bool find_hole_in_pending_extents(struct btrfs_device *device, u64 *start,
    +					 u64 *len, u64 min_hole_size)
    +{
    +	u64 pending_start, pending_end;
    +	u64 end;
    +	u64 max_hole_start = 0;
    +	u64 max_hole_len = 0;
    +
    +	lockdep_assert_held(&device->fs_info->chunk_mutex);
    +
    +	if (*len == 0)
    +		return false;
    +
    +	end = *start + *len - 1;
    +
    +	/*
    +	 * Loop until we either see a large enough hole or check every pending
    +	 * extent overlapping the candidate hole.
    +	 * At every hole that we observe, record it if it is the new max.
    +	 * At the end of the iteration, set the output variables to the max hole.
    +	 */
    +	while (true) {
    +		if (first_pending_extent(device, *start, *len, &pending_start, &pending_end)) {
    +			/*
    +			 * Case 1: the pending extent overlaps the start of
    +			 * candidate hole. That means the true hole is after the
    +			 * pending extent, but we need to find the next pending
    +			 * extent to properly size the hole. In the next loop,
    +			 * we will reduce to case 2 or 3.
    +			 * e.g.,
    +			 *
    +			 *   |----pending A----|    real hole     |----pending B----|
    +			 *            |           candidate hole        |
    +			 *         *start                              end
    +			 */
    +			if (pending_start <= *start) {
    +				*start = pending_end + 1;
    +				goto next;
    +			}
    +			/*
    +			 * Case 2: The pending extent starts after *start (and overlaps
    +			 * [*start, end), so the first hole just goes up to the start
    +			 * of the pending extent.
    +			 * e.g.,
    +			 *
    +			 *   |    real hole    |----pending A----|
    +			 *   |       candidate hole     |
    +			 * *start                      end
    +			 */
    +			*len = pending_start - *start;
    +			if (*len > max_hole_len) {
    +				max_hole_start = *start;
    +				max_hole_len = *len;
    +			}
    +			if (*len >= min_hole_size)
    +				break;
    +			/*
    +			 * If the hole wasn't big enough, then we advance past
    +			 * the pending extent and keep looking.
    +			 */
    +			*start = pending_end + 1;
    +			goto next;
    +		} else {
    +			/*
    +			 * Case 3: There is no pending extent overlapping the
    +			 * range [*start, *start + *len - 1], so the only remaining
    +			 * hole is the remaining range.
    +			 * e.g.,
    +			 *
    +			 *   |       candidate hole           |
    +			 *   |          real hole             |
    +			 * *start                            end
    +			 */
    +
    +			if (*len > max_hole_len) {
    +				max_hole_start = *start;
    +				max_hole_len = *len;
    +			}
    +			break;
    +		}
    +next:
    +		if (*start > end)
    +			break;
    +		*len = end - *start + 1;
    +	}
    +	if (max_hole_len) {
    +		*start = max_hole_start;
    +		*len = max_hole_len;
    +	} else {
    +		*start = end + 1;
    +		*len = 0;
    +	}
    +	return max_hole_len >= min_hole_size;
    +}
    +
     static u64 dev_extent_search_start(struct btrfs_device *device)
     {
     	switch (device->fs_devices->chunk_alloc_policy) {
    @@ -1597,59 +1725,57 @@ static bool dev_extent_hole_check_zoned(struct btrfs_device *device,
     }
     
     /*
    - * Check if specified hole is suitable for allocation.
    + * Validate and adjust a hole for chunk allocation
    + *
    + * @device:      the device containing the candidate hole
    + * @hole_start:  input/output pointer for the hole start position
    + * @hole_size:   input/output pointer for the hole size
    + * @num_bytes:   minimum allocation size required
      *
    - * @device:	the device which we have the hole
    - * @hole_start: starting position of the hole
    - * @hole_size:	the size of the hole
    - * @num_bytes:	the size of the free space that we need
    + * Check if the specified hole is suitable for allocation and adjust it if
    + * necessary. The hole may be modified to skip over pending chunk allocations
    + * and to satisfy stricter zoned requirements on zoned filesystems.
      *
    - * This function may modify @hole_start and @hole_size to reflect the suitable
    - * position for allocation. Returns 1 if hole position is updated, 0 otherwise.
    + * For regular (non-zoned) allocation, if the hole after adjustment is smaller
    + * than @num_bytes, the search continues past additional pending extents until
    + * either a sufficiently large hole is found or no more pending extents exist.
    + *
    + * Return: true if a suitable hole was found and false otherwise.
    + * If the return value is true, then *hole_start and *hole_size are set to
    + * represent the hole we found.
    + * If the return value is false, then *hole_start is set to the largest
    + * hole we found and *hole_size is set to its length.
    + * If there are no holes at all, then *hole_start is set to the end of the range
    + * and *hole_size is set to 0.
      */
     static bool dev_extent_hole_check(struct btrfs_device *device, u64 *hole_start,
     				  u64 *hole_size, u64 num_bytes)
     {
    -	bool changed = false;
    -	u64 hole_end = *hole_start + *hole_size;
    +	bool found = false;
    +	const u64 hole_end = *hole_start + *hole_size - 1;
     
    -	for (;;) {
    -		/*
    -		 * Check before we set max_hole_start, otherwise we could end up
    -		 * sending back this offset anyway.
    -		 */
    -		if (contains_pending_extent(device, hole_start, *hole_size)) {
    -			if (hole_end >= *hole_start)
    -				*hole_size = hole_end - *hole_start;
    -			else
    -				*hole_size = 0;
    -			changed = true;
    -		}
    +	ASSERT(*hole_size > 0);
     
    -		switch (device->fs_devices->chunk_alloc_policy) {
    -		default:
    -			btrfs_warn_unknown_chunk_allocation(device->fs_devices->chunk_alloc_policy);
    -			fallthrough;
    -		case BTRFS_CHUNK_ALLOC_REGULAR:
    -			/* No extra check */
    -			break;
    -		case BTRFS_CHUNK_ALLOC_ZONED:
    -			if (dev_extent_hole_check_zoned(device, hole_start,
    -							hole_size, num_bytes)) {
    -				changed = true;
    -				/*
    -				 * The changed hole can contain pending extent.
    -				 * Loop again to check that.
    -				 */
    -				continue;
    -			}
    -			break;
    -		}
    +again:
    +	*hole_size = hole_end - *hole_start + 1;
    +	found = find_hole_in_pending_extents(device, hole_start, hole_size, num_bytes);
    +	if (!found)
    +		return found;
    +	ASSERT(*hole_size >= num_bytes);
     
    +	switch (device->fs_devices->chunk_alloc_policy) {
    +	default:
    +		btrfs_warn_unknown_chunk_allocation(device->fs_devices->chunk_alloc_policy);
    +		fallthrough;
    +	case BTRFS_CHUNK_ALLOC_REGULAR:
    +		return found;
    +	case BTRFS_CHUNK_ALLOC_ZONED:
    +		if (dev_extent_hole_check_zoned(device, hole_start, hole_size, num_bytes))
    +			goto again;
     		break;
     	}
     
    -	return changed;
    +	return found;
     }
     
     /*
    @@ -1708,7 +1834,7 @@ static int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
     		ret = -ENOMEM;
     		goto out;
     	}
    -again:
    +
     	if (search_start >= search_end ||
     		test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state)) {
     		ret = -ENOSPC;
    @@ -1795,11 +1921,7 @@ next:
     	 */
     	if (search_end > search_start) {
     		hole_size = search_end - search_start;
    -		if (dev_extent_hole_check(device, &search_start, &hole_size,
    -					  num_bytes)) {
    -			btrfs_release_path(path);
    -			goto again;
    -		}
    +		dev_extent_hole_check(device, &search_start, &hole_size, num_bytes);
     
     		if (hole_size > max_hole_size) {
     			max_hole_start = search_start;
    @@ -5022,6 +5144,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
     	u64 diff;
     	u64 start;
     	u64 free_diff = 0;
    +	u64 pending_start, pending_end;
     
     	new_size = round_down(new_size, fs_info->sectorsize);
     	start = new_size;
    @@ -5067,7 +5190,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
     	 * in-memory chunks are synced to disk so that the loop below sees them
     	 * and relocates them accordingly.
     	 */
    -	if (contains_pending_extent(device, &start, diff)) {
    +	if (first_pending_extent(device, start, diff, &pending_start, &pending_end)) {
     		mutex_unlock(&fs_info->chunk_mutex);
     		ret = btrfs_commit_transaction(trans);
     		if (ret)
    -- 
    cgit 1.3-korg
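The hole search that the patch above adds in `find_hole_in_pending_extents()` can be modelled in user space. The following is a minimal sketch, not the kernel implementation. It assumes the pending extents are available as a sorted, non-overlapping array, which stands in for the `device->alloc_state` extent-io tree. The names `struct pending_extent`, `first_pending()` and `find_hole()` are hypothetical and exist only for this illustration. Locking (`chunk_mutex`) and the zoned policy are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one CHUNK_ALLOCATED range; 'end' is inclusive. */
struct pending_extent {
	uint64_t start;
	uint64_t end;
};

/*
 * Model of first_pending_extent(): report the first extent that intersects
 * the half-open range [start, start + len). The array 'p' is assumed to be
 * sorted by start offset and free of overlaps.
 */
bool first_pending(const struct pending_extent *p, size_t n,
		   uint64_t start, uint64_t len,
		   uint64_t *pending_start, uint64_t *pending_end)
{
	for (size_t i = 0; i < n; i++) {
		if (p[i].end < start)
			continue;	/* entirely before the range */
		if (p[i].start >= start + len)
			break;		/* entirely after the range */
		*pending_start = p[i].start;
		*pending_end = p[i].end;
		return true;
	}
	return false;
}

/*
 * Model of find_hole_in_pending_extents(): walk the candidate hole
 * [*start, *start + *len), stop at the first gap of at least min_hole_size,
 * and otherwise report the largest gap seen (or an empty hole at the end of
 * the range if there is no gap at all).
 */
bool find_hole(const struct pending_extent *p, size_t n,
	       uint64_t *start, uint64_t *len, uint64_t min_hole_size)
{
	uint64_t pending_start, pending_end, end;
	uint64_t max_hole_start = 0, max_hole_len = 0;

	assert(start != NULL && len != NULL);
	if (*len == 0)
		return false;
	end = *start + *len - 1;

	while (true) {
		if (!first_pending(p, n, *start, *len,
				   &pending_start, &pending_end)) {
			/* Case 3: the rest of the range is free. */
			if (*len > max_hole_len) {
				max_hole_start = *start;
				max_hole_len = *len;
			}
			break;
		}
		if (pending_start > *start) {
			/* Case 2: a gap ends where the pending extent begins. */
			*len = pending_start - *start;
			if (*len > max_hole_len) {
				max_hole_start = *start;
				max_hole_len = *len;
			}
			if (*len >= min_hole_size)
				break;
		}
		/* Case 1, or a gap that was too small: skip the pending extent. */
		*start = pending_end + 1;
		if (*start > end)
			break;
		*len = end - *start + 1;
	}

	if (max_hole_len) {
		*start = max_hole_start;
		*len = max_hole_len;
	} else {
		*start = end + 1;
		*len = 0;
	}
	return max_hole_len >= min_hole_size;
}
```

With pending extents at [10, 19] and [30, 39] and a candidate hole of [0, 100), a request for 15 bytes skips both 10-byte gaps and settles on the free range that starts at offset 40. The point of the loop is that every pending extent inside the candidate range is examined, not only the first one.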
    
    
    
7d4eadee7042

btrfs: fix EEXIST abort due to non-consecutive gaps in chunk allocation

1 file changed · +183 -61
  • fs/btrfs/volumes.c · +183 -61 · modified
    diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
    index 8e7dcb12af4c42..645bf98a9571b5 100644
    --- a/fs/btrfs/volumes.c
    +++ b/fs/btrfs/volumes.c
    @@ -1506,30 +1506,158 @@ error_bdev_put:
     }
     
     /*
    - * Try to find a chunk that intersects [start, start + len] range and when one
    - * such is found, record the end of it in *start
    + * Find the first pending extent intersecting a range.
    + *
    + * @device:         the device to search
    + * @start:          start of the range to check
    + * @len:            length of the range to check
    + * @pending_start:  output pointer for the start of the found pending extent
    + * @pending_end:    output pointer for the end of the found pending extent (inclusive)
    + *
    + * Search for a pending chunk allocation that intersects the half-open range
    + * [start, start + len).
    + *
    + * Return: true if a pending extent was found, false otherwise.
    + * If the return value is true, store the first pending extent in
    + * [*pending_start, *pending_end]. Otherwise, the two output variables
    + * may still be modified, to something outside the range and should not
    + * be used.
      */
    -static bool contains_pending_extent(struct btrfs_device *device, u64 *start,
    -				    u64 len)
    +static bool first_pending_extent(struct btrfs_device *device, u64 start, u64 len,
    +				 u64 *pending_start, u64 *pending_end)
     {
    -	u64 physical_start, physical_end;
    -
     	lockdep_assert_held(&device->fs_info->chunk_mutex);
     
    -	if (btrfs_find_first_extent_bit(&device->alloc_state, *start,
    -					&physical_start, &physical_end,
    +	if (btrfs_find_first_extent_bit(&device->alloc_state, start,
    +					pending_start, pending_end,
     					CHUNK_ALLOCATED, NULL)) {
     
    -		if (in_range(physical_start, *start, len) ||
    -		    in_range(*start, physical_start,
    -			     physical_end + 1 - physical_start)) {
    -			*start = physical_end + 1;
    +		if (in_range(*pending_start, start, len) ||
    +		    in_range(start, *pending_start, *pending_end + 1 - *pending_start)) {
     			return true;
     		}
     	}
     	return false;
     }
     
    +/*
    + * Find the first real hole accounting for pending extents.
    + *
    + * @device:         the device containing the candidate hole
    + * @start:          input/output pointer for the hole start position
    + * @len:            input/output pointer for the hole length
    + * @min_hole_size:  the size of hole we are looking for
    + *
    + * Given a potential hole specified by [*start, *start + *len), check for pending
    + * chunk allocations within that range. If pending extents are found, the hole is
    + * adjusted to represent the first true free space that is large enough when
    + * accounting for pending chunks.
    + *
    + * Note that this function must handle various cases involving non consecutive
    + * pending extents.
    + *
    + * Returns: true if a suitable hole was found and false otherwise.
    + * If the return value is true, then *start and *len are set to represent the hole.
    + * If the return value is false, then *start is set to the largest hole we
    + * found and *len is set to its length.
    + * If there are no holes at all, then *start is set to the end of the range and
    + * *len is set to 0.
    + */
    +static bool find_hole_in_pending_extents(struct btrfs_device *device, u64 *start,
    +					 u64 *len, u64 min_hole_size)
    +{
    +	u64 pending_start, pending_end;
    +	u64 end;
    +	u64 max_hole_start = 0;
    +	u64 max_hole_len = 0;
    +
    +	lockdep_assert_held(&device->fs_info->chunk_mutex);
    +
    +	if (*len == 0)
    +		return false;
    +
    +	end = *start + *len - 1;
    +
    +	/*
    +	 * Loop until we either see a large enough hole or check every pending
    +	 * extent overlapping the candidate hole.
    +	 * At every hole that we observe, record it if it is the new max.
    +	 * At the end of the iteration, set the output variables to the max hole.
    +	 */
    +	while (true) {
    +		if (first_pending_extent(device, *start, *len, &pending_start, &pending_end)) {
    +			/*
    +			 * Case 1: the pending extent overlaps the start of
    +			 * candidate hole. That means the true hole is after the
    +			 * pending extent, but we need to find the next pending
    +			 * extent to properly size the hole. In the next loop,
    +			 * we will reduce to case 2 or 3.
    +			 * e.g.,
    +			 *
    +			 *   |----pending A----|    real hole     |----pending B----|
    +			 *            |           candidate hole        |
    +			 *         *start                              end
    +			 */
    +			if (pending_start <= *start) {
    +				*start = pending_end + 1;
    +				goto next;
    +			}
    +			/*
    +			 * Case 2: The pending extent starts after *start (and overlaps
    +			 * [*start, end), so the first hole just goes up to the start
    +			 * of the pending extent.
    +			 * e.g.,
    +			 *
    +			 *   |    real hole    |----pending A----|
    +			 *   |       candidate hole     |
    +			 * *start                      end
    +			 */
    +			*len = pending_start - *start;
    +			if (*len > max_hole_len) {
    +				max_hole_start = *start;
    +				max_hole_len = *len;
    +			}
    +			if (*len >= min_hole_size)
    +				break;
    +			/*
    +			 * If the hole wasn't big enough, then we advance past
    +			 * the pending extent and keep looking.
    +			 */
    +			*start = pending_end + 1;
    +			goto next;
    +		} else {
    +			/*
    +			 * Case 3: There is no pending extent overlapping the
    +			 * range [*start, *start + *len - 1], so the only remaining
    +			 * hole is the remaining range.
    +			 * e.g.,
    +			 *
    +			 *   |       candidate hole           |
    +			 *   |          real hole             |
    +			 * *start                            end
    +			 */
    +
    +			if (*len > max_hole_len) {
    +				max_hole_start = *start;
    +				max_hole_len = *len;
    +			}
    +			break;
    +		}
    +next:
    +		if (*start > end)
    +			break;
    +		*len = end - *start + 1;
    +	}
    +	if (max_hole_len) {
    +		*start = max_hole_start;
    +		*len = max_hole_len;
    +	} else {
    +		*start = end + 1;
    +		*len = 0;
    +	}
    +	return max_hole_len >= min_hole_size;
    +}
    +
     static u64 dev_extent_search_start(struct btrfs_device *device)
     {
     	switch (device->fs_devices->chunk_alloc_policy) {
    @@ -1594,59 +1722,57 @@ static bool dev_extent_hole_check_zoned(struct btrfs_device *device,
     }
     
     /*
    - * Check if specified hole is suitable for allocation.
    + * Validate and adjust a hole for chunk allocation
    + *
    + * @device:      the device containing the candidate hole
    + * @hole_start:  input/output pointer for the hole start position
    + * @hole_size:   input/output pointer for the hole size
    + * @num_bytes:   minimum allocation size required
      *
    - * @device:	the device which we have the hole
    - * @hole_start: starting position of the hole
    - * @hole_size:	the size of the hole
    - * @num_bytes:	the size of the free space that we need
    + * Check if the specified hole is suitable for allocation and adjust it if
    + * necessary. The hole may be modified to skip over pending chunk allocations
    + * and to satisfy stricter zoned requirements on zoned filesystems.
      *
    - * This function may modify @hole_start and @hole_size to reflect the suitable
    - * position for allocation. Returns 1 if hole position is updated, 0 otherwise.
    + * For regular (non-zoned) allocation, if the hole after adjustment is smaller
    + * than @num_bytes, the search continues past additional pending extents until
    + * either a sufficiently large hole is found or no more pending extents exist.
    + *
    + * Return: true if a suitable hole was found and false otherwise.
    + * If the return value is true, then *hole_start and *hole_size are set to
    + * represent the hole we found.
    + * If the return value is false, then *hole_start is set to the largest
    + * hole we found and *hole_size is set to its length.
    + * If there are no holes at all, then *hole_start is set to the end of the range
    + * and *hole_size is set to 0.
      */
     static bool dev_extent_hole_check(struct btrfs_device *device, u64 *hole_start,
     				  u64 *hole_size, u64 num_bytes)
     {
    -	bool changed = false;
    -	u64 hole_end = *hole_start + *hole_size;
    +	bool found = false;
    +	const u64 hole_end = *hole_start + *hole_size - 1;
     
    -	for (;;) {
    -		/*
    -		 * Check before we set max_hole_start, otherwise we could end up
    -		 * sending back this offset anyway.
    -		 */
    -		if (contains_pending_extent(device, hole_start, *hole_size)) {
    -			if (hole_end >= *hole_start)
    -				*hole_size = hole_end - *hole_start;
    -			else
    -				*hole_size = 0;
    -			changed = true;
    -		}
    +	ASSERT(*hole_size > 0);
     
    -		switch (device->fs_devices->chunk_alloc_policy) {
    -		default:
    -			btrfs_warn_unknown_chunk_allocation(device->fs_devices->chunk_alloc_policy);
    -			fallthrough;
    -		case BTRFS_CHUNK_ALLOC_REGULAR:
    -			/* No extra check */
    -			break;
    -		case BTRFS_CHUNK_ALLOC_ZONED:
    -			if (dev_extent_hole_check_zoned(device, hole_start,
    -							hole_size, num_bytes)) {
    -				changed = true;
    -				/*
    -				 * The changed hole can contain pending extent.
    -				 * Loop again to check that.
    -				 */
    -				continue;
    -			}
    -			break;
    -		}
    +again:
    +	*hole_size = hole_end - *hole_start + 1;
    +	found = find_hole_in_pending_extents(device, hole_start, hole_size, num_bytes);
    +	if (!found)
    +		return found;
    +	ASSERT(*hole_size >= num_bytes);
     
    +	switch (device->fs_devices->chunk_alloc_policy) {
    +	default:
    +		btrfs_warn_unknown_chunk_allocation(device->fs_devices->chunk_alloc_policy);
    +		fallthrough;
    +	case BTRFS_CHUNK_ALLOC_REGULAR:
    +		return found;
    +	case BTRFS_CHUNK_ALLOC_ZONED:
    +		if (dev_extent_hole_check_zoned(device, hole_start, hole_size, num_bytes))
    +			goto again;
     		break;
     	}
     
    -	return changed;
    +	return found;
     }
     
     /*
    @@ -1705,7 +1831,7 @@ static int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
     		ret = -ENOMEM;
     		goto out;
     	}
    -again:
    +
     	if (search_start >= search_end ||
     		test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state)) {
     		ret = -ENOSPC;
    @@ -1792,11 +1918,7 @@ next:
     	 */
     	if (search_end > search_start) {
     		hole_size = search_end - search_start;
    -		if (dev_extent_hole_check(device, &search_start, &hole_size,
    -					  num_bytes)) {
    -			btrfs_release_path(path);
    -			goto again;
    -		}
    +		dev_extent_hole_check(device, &search_start, &hole_size, num_bytes);
     
     		if (hole_size > max_hole_size) {
     			max_hole_start = search_start;
    @@ -4882,6 +5004,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
     	u64 diff;
     	u64 start;
     	u64 free_diff = 0;
    +	u64 pending_start, pending_end;
     
     	new_size = round_down(new_size, fs_info->sectorsize);
     	start = new_size;
    @@ -4927,7 +5050,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
     	 * in-memory chunks are synced to disk so that the loop below sees them
     	 * and relocates them accordingly.
     	 */
    -	if (contains_pending_extent(device, &start, diff)) {
    +	if (first_pending_extent(device, start, diff, &pending_start, &pending_end)) {
     		mutex_unlock(&fs_info->chunk_mutex);
     		ret = btrfs_commit_transaction(trans);
     		if (ret)
    
    
    
156cac365e27

btrfs: fix EEXIST abort due to non-consecutive gaps in chunk allocation

1 file changed · +183 -61
  • fs/btrfs/volumes.c · +183 -61 · modified
    diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
    index 8a08412f3529a1..99e167a697ba8e 100644
    --- a/fs/btrfs/volumes.c
    +++ b/fs/btrfs/volumes.c
    @@ -1505,30 +1505,158 @@ error_bdev_put:
     }
     
     /*
    - * Try to find a chunk that intersects [start, start + len] range and when one
    - * such is found, record the end of it in *start
    + * Find the first pending extent intersecting a range.
    + *
    + * @device:         the device to search
    + * @start:          start of the range to check
    + * @len:            length of the range to check
    + * @pending_start:  output pointer for the start of the found pending extent
    + * @pending_end:    output pointer for the end of the found pending extent (inclusive)
    + *
    + * Search for a pending chunk allocation that intersects the half-open range
    + * [start, start + len).
    + *
    + * Return: true if a pending extent was found, false otherwise.
    + * If the return value is true, store the first pending extent in
    + * [*pending_start, *pending_end]. Otherwise, the two output variables
    + * may still be modified, to something outside the range and should not
    + * be used.
      */
    -static bool contains_pending_extent(struct btrfs_device *device, u64 *start,
    -				    u64 len)
    +static bool first_pending_extent(struct btrfs_device *device, u64 start, u64 len,
    +				 u64 *pending_start, u64 *pending_end)
     {
    -	u64 physical_start, physical_end;
    -
     	lockdep_assert_held(&device->fs_info->chunk_mutex);
     
    -	if (btrfs_find_first_extent_bit(&device->alloc_state, *start,
    -					&physical_start, &physical_end,
    +	if (btrfs_find_first_extent_bit(&device->alloc_state, start,
    +					pending_start, pending_end,
     					CHUNK_ALLOCATED, NULL)) {
     
    -		if (in_range(physical_start, *start, len) ||
    -		    in_range(*start, physical_start,
    -			     physical_end + 1 - physical_start)) {
    -			*start = physical_end + 1;
    +		if (in_range(*pending_start, start, len) ||
    +		    in_range(start, *pending_start, *pending_end + 1 - *pending_start)) {
     			return true;
     		}
     	}
     	return false;
     }
     
    +/*
    + * Find the first real hole accounting for pending extents.
    + *
    + * @device:         the device containing the candidate hole
    + * @start:          input/output pointer for the hole start position
    + * @len:            input/output pointer for the hole length
    + * @min_hole_size:  the size of hole we are looking for
    + *
    + * Given a potential hole specified by [*start, *start + *len), check for pending
    + * chunk allocations within that range. If pending extents are found, the hole is
    + * adjusted to represent the first true free space that is large enough when
    + * accounting for pending chunks.
    + *
    + * Note that this function must handle various cases involving non consecutive
    + * pending extents.
    + *
    + * Returns: true if a suitable hole was found and false otherwise.
    + * If the return value is true, then *start and *len are set to represent the hole.
    + * If the return value is false, then *start is set to the largest hole we
    + * found and *len is set to its length.
    + * If there are no holes at all, then *start is set to the end of the range and
    + * *len is set to 0.
    + */
    +static bool find_hole_in_pending_extents(struct btrfs_device *device, u64 *start,
    +					 u64 *len, u64 min_hole_size)
    +{
    +	u64 pending_start, pending_end;
    +	u64 end;
    +	u64 max_hole_start = 0;
    +	u64 max_hole_len = 0;
    +
    +	lockdep_assert_held(&device->fs_info->chunk_mutex);
    +
    +	if (*len == 0)
    +		return false;
    +
    +	end = *start + *len - 1;
    +
    +	/*
    +	 * Loop until we either see a large enough hole or check every pending
    +	 * extent overlapping the candidate hole.
    +	 * At every hole that we observe, record it if it is the new max.
    +	 * At the end of the iteration, set the output variables to the max hole.
    +	 */
    +	while (true) {
    +		if (first_pending_extent(device, *start, *len, &pending_start, &pending_end)) {
    +			/*
    +			 * Case 1: the pending extent overlaps the start of
    +			 * candidate hole. That means the true hole is after the
    +			 * pending extent, but we need to find the next pending
    +			 * extent to properly size the hole. In the next loop,
    +			 * we will reduce to case 2 or 3.
    +			 * e.g.,
    +			 *
    +			 *   |----pending A----|    real hole     |----pending B----|
    +			 *            |           candidate hole        |
    +			 *         *start                              end
    +			 */
    +			if (pending_start <= *start) {
    +				*start = pending_end + 1;
    +				goto next;
    +			}
    +			/*
    +			 * Case 2: The pending extent starts after *start (and overlaps
    +			 * [*start, end), so the first hole just goes up to the start
    +			 * of the pending extent.
    +			 * e.g.,
    +			 *
    +			 *   |    real hole    |----pending A----|
    +			 *   |       candidate hole     |
    +			 * *start                      end
    +			 */
    +			*len = pending_start - *start;
    +			if (*len > max_hole_len) {
    +				max_hole_start = *start;
    +				max_hole_len = *len;
    +			}
    +			if (*len >= min_hole_size)
    +				break;
    +			/*
    +			 * If the hole wasn't big enough, then we advance past
    +			 * the pending extent and keep looking.
    +			 */
    +			*start = pending_end + 1;
    +			goto next;
    +		} else {
    +			/*
    +			 * Case 3: There is no pending extent overlapping the
    +			 * range [*start, *start + *len - 1], so the only remaining
    +			 * hole is the remaining range.
    +			 * e.g.,
    +			 *
    +			 *   |       candidate hole           |
    +			 *   |          real hole             |
    +			 * *start                            end
    +			 */
    +
    +			if (*len > max_hole_len) {
    +				max_hole_start = *start;
    +				max_hole_len = *len;
    +			}
    +			break;
    +		}
    +next:
    +		if (*start > end)
    +			break;
    +		*len = end - *start + 1;
    +	}
    +	if (max_hole_len) {
    +		*start = max_hole_start;
    +		*len = max_hole_len;
    +	} else {
    +		*start = end + 1;
    +		*len = 0;
    +	}
    +	return max_hole_len >= min_hole_size;
    +}
    +
     static u64 dev_extent_search_start(struct btrfs_device *device)
     {
     	switch (device->fs_devices->chunk_alloc_policy) {
    @@ -1593,59 +1721,57 @@ static bool dev_extent_hole_check_zoned(struct btrfs_device *device,
     }
     
     /*
    - * Check if specified hole is suitable for allocation.
    + * Validate and adjust a hole for chunk allocation
    + *
    + * @device:      the device containing the candidate hole
    + * @hole_start:  input/output pointer for the hole start position
    + * @hole_size:   input/output pointer for the hole size
    + * @num_bytes:   minimum allocation size required
      *
    - * @device:	the device which we have the hole
    - * @hole_start: starting position of the hole
    - * @hole_size:	the size of the hole
    - * @num_bytes:	the size of the free space that we need
    + * Check if the specified hole is suitable for allocation and adjust it if
    + * necessary. The hole may be modified to skip over pending chunk allocations
    + * and to satisfy stricter zoned requirements on zoned filesystems.
      *
    - * This function may modify @hole_start and @hole_size to reflect the suitable
    - * position for allocation. Returns 1 if hole position is updated, 0 otherwise.
    + * For regular (non-zoned) allocation, if the hole after adjustment is smaller
    + * than @num_bytes, the search continues past additional pending extents until
    + * either a sufficiently large hole is found or no more pending extents exist.
    + *
    + * Return: true if a suitable hole was found and false otherwise.
    + * If the return value is true, then *hole_start and *hole_size are set to
    + * represent the hole we found.
    + * If the return value is false, then *hole_start is set to the largest
    + * hole we found and *hole_size is set to its length.
    + * If there are no holes at all, then *hole_start is set to the end of the range
    + * and *hole_size is set to 0.
      */
     static bool dev_extent_hole_check(struct btrfs_device *device, u64 *hole_start,
     				  u64 *hole_size, u64 num_bytes)
     {
    -	bool changed = false;
    -	u64 hole_end = *hole_start + *hole_size;
    +	bool found = false;
    +	const u64 hole_end = *hole_start + *hole_size - 1;
     
    -	for (;;) {
    -		/*
    -		 * Check before we set max_hole_start, otherwise we could end up
    -		 * sending back this offset anyway.
    -		 */
    -		if (contains_pending_extent(device, hole_start, *hole_size)) {
    -			if (hole_end >= *hole_start)
    -				*hole_size = hole_end - *hole_start;
    -			else
    -				*hole_size = 0;
    -			changed = true;
    -		}
    +	ASSERT(*hole_size > 0);
     
    -		switch (device->fs_devices->chunk_alloc_policy) {
    -		default:
    -			btrfs_warn_unknown_chunk_allocation(device->fs_devices->chunk_alloc_policy);
    -			fallthrough;
    -		case BTRFS_CHUNK_ALLOC_REGULAR:
    -			/* No extra check */
    -			break;
    -		case BTRFS_CHUNK_ALLOC_ZONED:
    -			if (dev_extent_hole_check_zoned(device, hole_start,
    -							hole_size, num_bytes)) {
    -				changed = true;
    -				/*
    -				 * The changed hole can contain pending extent.
    -				 * Loop again to check that.
    -				 */
    -				continue;
    -			}
    -			break;
    -		}
    +again:
    +	*hole_size = hole_end - *hole_start + 1;
    +	found = find_hole_in_pending_extents(device, hole_start, hole_size, num_bytes);
    +	if (!found)
    +		return found;
    +	ASSERT(*hole_size >= num_bytes);
     
    +	switch (device->fs_devices->chunk_alloc_policy) {
    +	default:
    +		btrfs_warn_unknown_chunk_allocation(device->fs_devices->chunk_alloc_policy);
    +		fallthrough;
    +	case BTRFS_CHUNK_ALLOC_REGULAR:
    +		return found;
    +	case BTRFS_CHUNK_ALLOC_ZONED:
    +		if (dev_extent_hole_check_zoned(device, hole_start, hole_size, num_bytes))
    +			goto again;
     		break;
     	}
     
    -	return changed;
    +	return found;
     }
     
     /*
    @@ -1704,7 +1830,7 @@ static int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
     		ret = -ENOMEM;
     		goto out;
     	}
    -again:
    +
     	if (search_start >= search_end ||
     		test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state)) {
     		ret = -ENOSPC;
    @@ -1791,11 +1917,7 @@ next:
     	 */
     	if (search_end > search_start) {
     		hole_size = search_end - search_start;
    -		if (dev_extent_hole_check(device, &search_start, &hole_size,
    -					  num_bytes)) {
    -			btrfs_release_path(path);
    -			goto again;
    -		}
    +		dev_extent_hole_check(device, &search_start, &hole_size, num_bytes);
     
     		if (hole_size > max_hole_size) {
     			max_hole_start = search_start;
    @@ -4844,6 +4966,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
     	u64 diff;
     	u64 start;
     	u64 free_diff = 0;
    +	u64 pending_start, pending_end;
     
     	new_size = round_down(new_size, fs_info->sectorsize);
     	start = new_size;
    @@ -4889,7 +5012,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
     	 * in-memory chunks are synced to disk so that the loop below sees them
     	 * and relocates them accordingly.
     	 */
    -	if (contains_pending_extent(device, &start, diff)) {
    +	if (first_pending_extent(device, start, diff, &pending_start, &pending_end)) {
     		mutex_unlock(&fs_info->chunk_mutex);
     		ret = btrfs_commit_transaction(trans);
     		if (ret)
    -- 
    cgit 1.3-korg
    
    
    
b14c5e04bd0f

btrfs: fix EEXIST abort due to non-consecutive gaps in chunk allocation

1 file changed · +183 -61
  • fs/btrfs/volumes.c · +183 -61 · modified
    diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
    index d33780082b8da2..329a922893b4fe 100644
    --- a/fs/btrfs/volumes.c
    +++ b/fs/btrfs/volumes.c
    @@ -1509,30 +1509,158 @@ error_bdev_put:
     }
     
     /*
    - * Try to find a chunk that intersects [start, start + len] range and when one
    - * such is found, record the end of it in *start
    + * Find the first pending extent intersecting a range.
    + *
    + * @device:         the device to search
    + * @start:          start of the range to check
    + * @len:            length of the range to check
    + * @pending_start:  output pointer for the start of the found pending extent
    + * @pending_end:    output pointer for the end of the found pending extent (inclusive)
    + *
    + * Search for a pending chunk allocation that intersects the half-open range
    + * [start, start + len).
    + *
    + * Return: true if a pending extent was found, false otherwise.
    + * If the return value is true, store the first pending extent in
    + * [*pending_start, *pending_end]. Otherwise, the two output variables
    + * may still be modified, to something outside the range and should not
    + * be used.
      */
    -static bool contains_pending_extent(struct btrfs_device *device, u64 *start,
    -				    u64 len)
    +static bool first_pending_extent(struct btrfs_device *device, u64 start, u64 len,
    +				 u64 *pending_start, u64 *pending_end)
     {
    -	u64 physical_start, physical_end;
    -
     	lockdep_assert_held(&device->fs_info->chunk_mutex);
     
    -	if (btrfs_find_first_extent_bit(&device->alloc_state, *start,
    -					&physical_start, &physical_end,
    +	if (btrfs_find_first_extent_bit(&device->alloc_state, start,
    +					pending_start, pending_end,
     					CHUNK_ALLOCATED, NULL)) {
     
    -		if (in_range(physical_start, *start, len) ||
    -		    in_range(*start, physical_start,
    -			     physical_end + 1 - physical_start)) {
    -			*start = physical_end + 1;
    +		if (in_range(*pending_start, start, len) ||
    +		    in_range(start, *pending_start, *pending_end + 1 - *pending_start)) {
     			return true;
     		}
     	}
     	return false;
     }
     
    +/*
    + * Find the first real hole accounting for pending extents.
    + *
    + * @device:         the device containing the candidate hole
    + * @start:          input/output pointer for the hole start position
    + * @len:            input/output pointer for the hole length
    + * @min_hole_size:  the size of hole we are looking for
    + *
    + * Given a potential hole specified by [*start, *start + *len), check for pending
    + * chunk allocations within that range. If pending extents are found, the hole is
    + * adjusted to represent the first true free space that is large enough when
    + * accounting for pending chunks.
    + *
    + * Note that this function must handle various cases involving non consecutive
    + * pending extents.
    + *
    + * Returns: true if a suitable hole was found and false otherwise.
    + * If the return value is true, then *start and *len are set to represent the hole.
    + * If the return value is false, then *start is set to the largest hole we
    + * found and *len is set to its length.
    + * If there are no holes at all, then *start is set to the end of the range and
    + * *len is set to 0.
    + */
    +static bool find_hole_in_pending_extents(struct btrfs_device *device, u64 *start,
    +					 u64 *len, u64 min_hole_size)
    +{
    +	u64 pending_start, pending_end;
    +	u64 end;
    +	u64 max_hole_start = 0;
    +	u64 max_hole_len = 0;
    +
    +	lockdep_assert_held(&device->fs_info->chunk_mutex);
    +
    +	if (*len == 0)
    +		return false;
    +
    +	end = *start + *len - 1;
    +
    +	/*
    +	 * Loop until we either see a large enough hole or check every pending
    +	 * extent overlapping the candidate hole.
    +	 * At every hole that we observe, record it if it is the new max.
    +	 * At the end of the iteration, set the output variables to the max hole.
    +	 */
    +	while (true) {
    +		if (first_pending_extent(device, *start, *len, &pending_start, &pending_end)) {
    +			/*
    +			 * Case 1: the pending extent overlaps the start of
    +			 * candidate hole. That means the true hole is after the
    +			 * pending extent, but we need to find the next pending
    +			 * extent to properly size the hole. In the next loop,
    +			 * we will reduce to case 2 or 3.
    +			 * e.g.,
    +			 *
    +			 *   |----pending A----|    real hole     |----pending B----|
    +			 *            |           candidate hole        |
    +			 *         *start                              end
    +			 */
    +			if (pending_start <= *start) {
    +				*start = pending_end + 1;
    +				goto next;
    +			}
    +			/*
    +			 * Case 2: The pending extent starts after *start (and overlaps
    +			 * [*start, end), so the first hole just goes up to the start
    +			 * of the pending extent.
    +			 * e.g.,
    +			 *
    +			 *   |    real hole    |----pending A----|
    +			 *   |       candidate hole     |
    +			 * *start                      end
    +			 */
    +			*len = pending_start - *start;
    +			if (*len > max_hole_len) {
    +				max_hole_start = *start;
    +				max_hole_len = *len;
    +			}
    +			if (*len >= min_hole_size)
    +				break;
    +			/*
    +			 * If the hole wasn't big enough, then we advance past
    +			 * the pending extent and keep looking.
    +			 */
    +			*start = pending_end + 1;
    +			goto next;
    +		} else {
    +			/*
    +			 * Case 3: There is no pending extent overlapping the
    +			 * range [*start, *start + *len - 1], so the only remaining
    +			 * hole is the remaining range.
    +			 * e.g.,
    +			 *
    +			 *   |       candidate hole           |
    +			 *   |          real hole             |
    +			 * *start                            end
    +			 */
    +
    +			if (*len > max_hole_len) {
    +				max_hole_start = *start;
    +				max_hole_len = *len;
    +			}
    +			break;
    +		}
    +next:
    +		if (*start > end)
    +			break;
    +		*len = end - *start + 1;
    +	}
    +	if (max_hole_len) {
    +		*start = max_hole_start;
    +		*len = max_hole_len;
    +	} else {
    +		*start = end + 1;
    +		*len = 0;
    +	}
    +	return max_hole_len >= min_hole_size;
    +}
    +
     static u64 dev_extent_search_start(struct btrfs_device *device)
     {
     	switch (device->fs_devices->chunk_alloc_policy) {
    @@ -1597,59 +1725,57 @@ static bool dev_extent_hole_check_zoned(struct btrfs_device *device,
     }
     
     /*
    - * Check if specified hole is suitable for allocation.
    + * Validate and adjust a hole for chunk allocation
    + *
    + * @device:      the device containing the candidate hole
    + * @hole_start:  input/output pointer for the hole start position
    + * @hole_size:   input/output pointer for the hole size
    + * @num_bytes:   minimum allocation size required
      *
    - * @device:	the device which we have the hole
    - * @hole_start: starting position of the hole
    - * @hole_size:	the size of the hole
    - * @num_bytes:	the size of the free space that we need
    + * Check if the specified hole is suitable for allocation and adjust it if
    + * necessary. The hole may be modified to skip over pending chunk allocations
    + * and to satisfy stricter zoned requirements on zoned filesystems.
      *
    - * This function may modify @hole_start and @hole_size to reflect the suitable
    - * position for allocation. Returns 1 if hole position is updated, 0 otherwise.
    + * For regular (non-zoned) allocation, if the hole after adjustment is smaller
    + * than @num_bytes, the search continues past additional pending extents until
    + * either a sufficiently large hole is found or no more pending extents exist.
    + *
    + * Return: true if a suitable hole was found and false otherwise.
    + * If the return value is true, then *hole_start and *hole_size are set to
    + * represent the hole we found.
    + * If the return value is false, then *hole_start is set to the largest
    + * hole we found and *hole_size is set to its length.
    + * If there are no holes at all, then *hole_start is set to the end of the range
    + * and *hole_size is set to 0.
      */
     static bool dev_extent_hole_check(struct btrfs_device *device, u64 *hole_start,
     				  u64 *hole_size, u64 num_bytes)
     {
    -	bool changed = false;
    -	u64 hole_end = *hole_start + *hole_size;
    +	bool found = false;
    +	const u64 hole_end = *hole_start + *hole_size - 1;
     
    -	for (;;) {
    -		/*
    -		 * Check before we set max_hole_start, otherwise we could end up
    -		 * sending back this offset anyway.
    -		 */
    -		if (contains_pending_extent(device, hole_start, *hole_size)) {
    -			if (hole_end >= *hole_start)
    -				*hole_size = hole_end - *hole_start;
    -			else
    -				*hole_size = 0;
    -			changed = true;
    -		}
    +	ASSERT(*hole_size > 0);
     
    -		switch (device->fs_devices->chunk_alloc_policy) {
    -		default:
    -			btrfs_warn_unknown_chunk_allocation(device->fs_devices->chunk_alloc_policy);
    -			fallthrough;
    -		case BTRFS_CHUNK_ALLOC_REGULAR:
    -			/* No extra check */
    -			break;
    -		case BTRFS_CHUNK_ALLOC_ZONED:
    -			if (dev_extent_hole_check_zoned(device, hole_start,
    -							hole_size, num_bytes)) {
    -				changed = true;
    -				/*
    -				 * The changed hole can contain pending extent.
    -				 * Loop again to check that.
    -				 */
    -				continue;
    -			}
    -			break;
    -		}
    +again:
    +	*hole_size = hole_end - *hole_start + 1;
    +	found = find_hole_in_pending_extents(device, hole_start, hole_size, num_bytes);
    +	if (!found)
    +		return found;
    +	ASSERT(*hole_size >= num_bytes);
     
    +	switch (device->fs_devices->chunk_alloc_policy) {
    +	default:
    +		btrfs_warn_unknown_chunk_allocation(device->fs_devices->chunk_alloc_policy);
    +		fallthrough;
    +	case BTRFS_CHUNK_ALLOC_REGULAR:
    +		return found;
    +	case BTRFS_CHUNK_ALLOC_ZONED:
    +		if (dev_extent_hole_check_zoned(device, hole_start, hole_size, num_bytes))
    +			goto again;
     		break;
     	}
     
    -	return changed;
    +	return found;
     }
     
     /*
    @@ -1708,7 +1834,7 @@ static int find_free_dev_extent(struct btrfs_device *device, u64 num_bytes,
     		ret = -ENOMEM;
     		goto out;
     	}
    -again:
    +
     	if (search_start >= search_end ||
     		test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state)) {
     		ret = -ENOSPC;
    @@ -1795,11 +1921,7 @@ next:
     	 */
     	if (search_end > search_start) {
     		hole_size = search_end - search_start;
    -		if (dev_extent_hole_check(device, &search_start, &hole_size,
    -					  num_bytes)) {
    -			btrfs_release_path(path);
    -			goto again;
    -		}
    +		dev_extent_hole_check(device, &search_start, &hole_size, num_bytes);
     
     		if (hole_size > max_hole_size) {
     			max_hole_start = search_start;
    @@ -5022,6 +5144,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
     	u64 diff;
     	u64 start;
     	u64 free_diff = 0;
    +	u64 pending_start, pending_end;
     
     	new_size = round_down(new_size, fs_info->sectorsize);
     	start = new_size;
    @@ -5067,7 +5190,7 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
     	 * in-memory chunks are synced to disk so that the loop below sees them
     	 * and relocates them accordingly.
     	 */
    -	if (contains_pending_extent(device, &start, diff)) {
    +	if (first_pending_extent(device, start, diff, &pending_start, &pending_end)) {
     		mutex_unlock(&fs_info->chunk_mutex);
     		ret = btrfs_commit_transaction(trans);
     		if (ret)
    -- 
    cgit 1.3-korg
    
    
    

Vulnerability mechanics

Root cause

"Incorrect hole-scanning logic in `contains_pending_extent()` only skipped the first pending extent, allowing overlapping chunk allocations when multiple non-consecutive pending extents existed in the candidate free range."

Attack vector

An attacker with local access, or anyone able to trigger filesystem operations that cause chunk allocation, can hit this bug, for example by writing data to a btrfs filesystem under space pressure or by forcing chunk allocation through sysfs. When the chunk allocator searched for a free device extent, the old `contains_pending_extent()` skipped only the first pending extent and treated the rest of the range as a hole. If the range held multiple non-consecutive pending extents, that "hole" could still overlap a second pending extent. The allocator then assigned already-reserved physical space to the new chunk, and the transaction aborted with EEXIST (-17) at `insert_dev_extents()` in `btrfs_create_pending_block_groups()` [patch_id=2661217]. The commit message notes that this can theoretically happen to any DUP chunk allocation, though it was observed through forced chunk allocation, which is typically behind `CONFIG_BTRFS_EXPERIMENTAL`.

Affected code

The bug is in `fs/btrfs/volumes.c`, in `contains_pending_extent()` and its caller `dev_extent_hole_check()`. The old `contains_pending_extent()` moved past only the first pending extent it found by setting `*start = physical_end + 1`; it did not check for further non-consecutive pending extents inside the candidate hole [patch_id=2661217]. The old `dev_extent_hole_check()` then recomputed `*hole_size` from the new start up to the original hole end. For regular (non-zoned) allocation it made this adjustment only once, so the hole it reported could still overlap a later pending extent.
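The single-skip behavior described above can be reproduced in a small user-space model. This is a minimal sketch, not kernel code: `struct pending_extent`, `overlaps()`, `old_contains_pending_extent()` and `old_hole_check()` are hypothetical names, a sorted array stands in for the device's `alloc_state` tree, and the lookup is assumed to return the first pending range ending at or after the search start.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one pending (CHUNK_ALLOCATED) range; end is inclusive. */
struct pending_extent {
	uint64_t start;
	uint64_t end;
};

/* True if the inclusive extent p intersects the half-open range [start, start + len). */
static bool overlaps(const struct pending_extent *p, uint64_t start, uint64_t len)
{
	return p->start < start + len && p->end >= start;
}

/*
 * Model of the removed contains_pending_extent(): look at the first pending
 * extent ending at or after *start and, if it intersects the range, move
 * *start just past it. Later pending extents are never examined.
 * pending[] is assumed sorted by start and non-overlapping.
 */
static bool old_contains_pending_extent(const struct pending_extent *pending,
					size_t n, uint64_t *start, uint64_t len)
{
	for (size_t i = 0; i < n; i++) {
		if (pending[i].end < *start)
			continue;
		if (!overlaps(&pending[i], *start, len))
			return false;
		*start = pending[i].end + 1;
		return true;
	}
	return false;
}

/* Model of the old one-shot adjustment for the regular allocation policy. */
static void old_hole_check(const struct pending_extent *pending, size_t n,
			   uint64_t *hole_start, uint64_t *hole_size)
{
	const uint64_t hole_end = *hole_start + *hole_size; /* exclusive */

	assert(*hole_size > 0);
	if (old_contains_pending_extent(pending, n, hole_start, *hole_size))
		*hole_size = hole_end >= *hole_start ? hole_end - *hole_start : 0;
}
```

With pending extents at [0, 9] and [30, 39] inside a candidate hole of 100 bytes starting at 0, the model moves the hole start to 10 and reports 90 free bytes, so a 50-byte allocation placed at offset 10 would land on top of the second pending extent.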

What the fix does

The patch replaces `contains_pending_extent()` with two new functions [patch_id=2661217]. `first_pending_extent()` finds the first pending extent intersecting a range and reports it through output parameters instead of modifying the caller's start. `find_hole_in_pending_extents()` walks the pending extents in the candidate hole, tracks the largest true free gap, and stops as soon as it finds a gap of at least the requested size or has accounted for every pending extent. `dev_extent_hole_check()` now calls `find_hole_in_pending_extents()` and, for zoned allocation, repeats the check whenever `dev_extent_hole_check_zoned()` adjusts the hole; this replaces the old `for (;;)` loop. The `again` label in `find_free_dev_extent()` is removed, and the final hole check there no longer restarts the search when the hole is adjusted. `btrfs_shrink_device()` is updated to use the new `first_pending_extent()` API, which no longer mutates its start argument [patch_id=2661217].
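The three cases handled by `find_hole_in_pending_extents()` can be illustrated with a user-space model. This is a sketch under stated assumptions rather than the kernel implementation: `struct pending_extent`, `first_pending()` and `find_hole()` are hypothetical names, a sorted array replaces the `alloc_state` lookup, and the `goto next` control flow of the patch is folded into a plain `while` loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one pending (CHUNK_ALLOCATED) range; end is inclusive. */
struct pending_extent {
	uint64_t start;
	uint64_t end;
};

/*
 * Model of first_pending_extent(): return the first pending extent that ends
 * at or after start, provided it intersects [start, start + len); else NULL.
 * pending[] is assumed sorted by start and non-overlapping.
 */
static const struct pending_extent *
first_pending(const struct pending_extent *pending, size_t n,
	      uint64_t start, uint64_t len)
{
	for (size_t i = 0; i < n; i++) {
		if (pending[i].end < start)
			continue;
		return pending[i].start < start + len ? &pending[i] : NULL;
	}
	return NULL;
}

/* Model of find_hole_in_pending_extents(); same output contract as the patch. */
static bool find_hole(const struct pending_extent *pending, size_t n,
		      uint64_t *start, uint64_t *len, uint64_t min_hole_size)
{
	uint64_t max_start = 0, max_len = 0;
	uint64_t end;

	assert(start && len);
	if (*len == 0)
		return false;
	end = *start + *len - 1; /* inclusive end of the candidate hole */

	while (*start <= end) {
		const struct pending_extent *p;

		*len = end - *start + 1;
		p = first_pending(pending, n, *start, *len);
		if (!p) {
			/* Case 3: nothing pending in the rest of the range. */
			if (*len > max_len) {
				max_start = *start;
				max_len = *len;
			}
			break;
		}
		if (p->start > *start) {
			/* Case 2: a real gap runs up to the pending extent. */
			*len = p->start - *start;
			if (*len > max_len) {
				max_start = *start;
				max_len = *len;
			}
			if (*len >= min_hole_size)
				break;
		}
		/* Case 1, or a gap that was too small: skip the pending extent. */
		*start = p->end + 1;
	}

	if (max_len) {
		*start = max_start;
		*len = max_len;
	} else {
		*start = end + 1;
		*len = 0;
	}
	return max_len >= min_hole_size;
}
```

For pending extents at [0, 9] and [30, 39] in a 100-byte candidate hole starting at 0, a request for 50 bytes skips the 20-byte gap at offset 10 and settles on the 60-byte gap at offset 40. When no gap is large enough, the model still reports the largest gap it saw, mirroring the documented return contract.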

Preconditions

  • input: The attacker must be able to trigger btrfs chunk allocation (e.g., by writing data to a btrfs filesystem under space pressure or via forced chunk allocation through sysfs).
  • config: The filesystem must have non-consecutive pending chunk allocations on the same device (a natural consequence of concurrent or sequential allocation operations).

Generated on May 27, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
