VYPR
Unrated severity · NVD Advisory · Published Jan 5, 2026 · Updated Apr 15, 2026

CVE-2025-68756

Description

In the Linux kernel, the following vulnerability has been resolved:

block: Use RCU in blk_mq_[un]quiesce_tagset() instead of set->tag_list_lock

blk_mq_{add,del}_queue_tag_set() add queues to and remove queues from a tagset, and they make sure that the tagset and its queues are marked as shared when two or more queues are attached to the same tagset. A tagset starts out unshared. When the number of attached queues reaches two, blk_mq_add_queue_tag_set() marks it as shared along with all the queues attached to it. When the number of attached queues drops back to one, blk_mq_del_queue_tag_set() needs to mark both the tagset and the remaining queue as unshared.
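
The shared-flag rule above can be sketched as a small userspace C model. This is a hypothetical illustration and not kernel code. The names model_tagset, model_queue, model_add_queue() and model_del_queue() are invented, and a plain bool stands in for BLK_MQ_F_TAG_QUEUE_SHARED. The sketch also leaves out the queue freeze that the real functions perform before flipping the flag. It assumes that a queue joining an already-shared tagset inherits the shared state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model (not kernel code) of the shared-flag rule:
 * a tagset and all of its queues are "shared" exactly when two or more
 * queues are attached. The real functions also freeze the queues before
 * flipping the flag; that step is omitted here. */

#define MAX_QUEUES 8

struct model_queue {
    bool shared;              /* stands in for BLK_MQ_F_TAG_QUEUE_SHARED */
};

struct model_tagset {
    struct model_queue *queues[MAX_QUEUES];
    size_t nr_queues;
    bool shared;
};

/* Set or clear the shared flag on the tagset and every attached queue. */
static void model_update_shared(struct model_tagset *set, bool shared)
{
    set->shared = shared;
    for (size_t i = 0; i < set->nr_queues; i++)
        set->queues[i]->shared = shared;
}

/* Attach a queue; the second attachment flips everything to shared. */
static bool model_add_queue(struct model_tagset *set, struct model_queue *q)
{
    if (set->nr_queues == MAX_QUEUES)
        return false;
    set->queues[set->nr_queues++] = q;
    q->shared = set->shared;  /* assumption: a late joiner inherits the state */
    if (set->nr_queues == 2)
        model_update_shared(set, true);
    return true;
}

/* Detach a queue; dropping back to one queue flips everything to unshared. */
static void model_del_queue(struct model_tagset *set, struct model_queue *q)
{
    bool found = false;
    for (size_t i = 0; i < set->nr_queues; i++) {
        if (set->queues[i] == q) {
            /* swap-remove: move the last entry into the freed slot */
            set->queues[i] = set->queues[--set->nr_queues];
            found = true;
            break;
        }
    }
    assert(found);            /* deleting an unattached queue is a caller bug */
    q->shared = false;
    if (set->nr_queues == 1)
        model_update_shared(set, false);
}
```

Attaching a second queue flips the tagset and both queues to shared. Detaching back down to one queue flips them to unshared. In the kernel, each of these transitions is where the queues are frozen while set->tag_list_lock is held.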

Both functions need to freeze the queues currently in the tagset before setting or unsetting the BLK_MQ_F_TAG_QUEUE_SHARED flag. While doing so, both functions hold the set->tag_list_lock mutex. This makes sense, because queues must not be added or deleted in the process. It worked fine until commit 98d81f0df70c ("nvme: use blk_mq_[un]quiesce_tagset") made the nvme driver quiesce the whole tagset instead of quiescing individual queues. blk_mq_quiesce_tagset() does this by quiescing every queue in set->tag_list, and it also holds set->tag_list_lock while doing so.

This results in a deadlock between two threads with these stack traces:

__schedule+0x47c/0xbb0
? timerqueue_add+0x66/0xb0
schedule+0x1c/0xa0
schedule_preempt_disabled+0xa/0x10
__mutex_lock.constprop.0+0x271/0x600
blk_mq_quiesce_tagset+0x25/0xc0
nvme_dev_disable+0x9c/0x250
nvme_timeout+0x1fc/0x520
blk_mq_handle_expired+0x5c/0x90
bt_iter+0x7e/0x90
blk_mq_queue_tag_busy_iter+0x27e/0x550
? __blk_mq_complete_request_remote+0x10/0x10
? __blk_mq_complete_request_remote+0x10/0x10
? __call_rcu_common.constprop.0+0x1c0/0x210
blk_mq_timeout_work+0x12d/0x170
process_one_work+0x12e/0x2d0
worker_thread+0x288/0x3a0
? rescuer_thread+0x480/0x480
kthread+0xb8/0xe0
? kthread_park+0x80/0x80
ret_from_fork+0x2d/0x50
? kthread_park+0x80/0x80
ret_from_fork_asm+0x11/0x20

__schedule+0x47c/0xbb0
? xas_find+0x161/0x1a0
schedule+0x1c/0xa0
blk_mq_freeze_queue_wait+0x3d/0x70
? destroy_sched_domains_rcu+0x30/0x30
blk_mq_update_tag_set_shared+0x44/0x80
blk_mq_exit_queue+0x141/0x150
del_gendisk+0x25a/0x2d0
nvme_ns_remove+0xc9/0x170
nvme_remove_namespaces+0xc7/0x100
nvme_remove+0x62/0x150
pci_device_remove+0x23/0x60
device_release_driver_internal+0x159/0x200
unbind_store+0x99/0xa0
kernfs_fop_write_iter+0x112/0x1e0
vfs_write+0x2b1/0x3d0
ksys_write+0x4e/0xb0
do_syscall_64+0x5b/0x160
entry_SYSCALL_64_after_hwframe+0x4b/0x53

The top stack trace shows nvme_timeout() being called to handle an NVMe command timeout. The timeout handler is trying to disable the controller. As a first step, it calls blk_mq_quiesce_tagset() to tell blk-mq not to call the queue callback handlers. The thread is stuck waiting for set->tag_list_lock, which it needs in order to walk the queues in set->tag_list.

The lock is held by the second thread (the bottom stack trace), which is waiting for one of the queues to be frozen. That queue's usage counter will drop to zero only after nvme_timeout() finishes. This never happens, because the timeout thread waits for the mutex forever.
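
The circular wait can be reproduced in miniature with POSIX threads. In this hypothetical sketch, the calling thread plays the removal path. It holds a mutex standing in for set->tag_list_lock and waits for a counter, standing in for the queue usage counter, to reach zero. A second thread plays the timeout path: it must take the same mutex before it can finish the request. Every wait is bounded with pthread_mutex_timedlock() and pthread_cond_timedwait(), so the demo reports ETIMEDOUT instead of hanging. The names and timeout values are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical userspace model (not kernel code) of the circular wait.
 * tag_list_lock stands in for set->tag_list_lock; usage stands in for the
 * queue usage counter that the freeze waits on. Every wait is bounded so
 * the demo reports the deadlock instead of hanging. */

static pthread_mutex_t tag_list_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t usage_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t usage_zero = PTHREAD_COND_INITIALIZER;
static int usage = 1;            /* one in-flight request that timed out */
static int timeout_lock_rc;      /* written by the timeout thread */

struct demo_result {
    int timeout_lock_rc;         /* timeout thread's attempt to take the lock */
    int freeze_wait_rc;          /* removal thread's wait for usage == 0 */
};

/* Absolute CLOCK_REALTIME deadline `ms` milliseconds from now. */
static struct timespec deadline_ms(long ms)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec++;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}

/* Timeout path: nvme_timeout() -> nvme_dev_disable() -> quiesce the tagset. */
static void *timeout_handler(void *arg)
{
    (void)arg;
    struct timespec ts = deadline_ms(200);
    timeout_lock_rc = pthread_mutex_timedlock(&tag_list_lock, &ts);
    if (timeout_lock_rc == 0) {
        /* Would quiesce every queue here, then finish the expired request. */
        pthread_mutex_unlock(&tag_list_lock);
        pthread_mutex_lock(&usage_lock);
        usage--;
        pthread_cond_signal(&usage_zero);
        pthread_mutex_unlock(&usage_lock);
    }
    return NULL;
}

/* Removal path: holds the lock while waiting for the queue to freeze. */
static struct demo_result run_deadlock_demo(void)
{
    struct demo_result r = {0};
    pthread_t t;

    pthread_mutex_lock(&tag_list_lock);          /* taken before the thread starts */
    pthread_create(&t, NULL, timeout_handler, NULL);

    struct timespec ts = deadline_ms(500);
    pthread_mutex_lock(&usage_lock);
    while (usage != 0 && r.freeze_wait_rc == 0)
        r.freeze_wait_rc = pthread_cond_timedwait(&usage_zero, &usage_lock, &ts);
    pthread_mutex_unlock(&usage_lock);

    pthread_join(t, NULL);
    pthread_mutex_unlock(&tag_list_lock);
    r.timeout_lock_rc = timeout_lock_rc;
    return r;
}
```

Both waits end in ETIMEDOUT. The timeout thread never gets the mutex, so the counter never reaches zero. This is the same cycle shown in the two stack traces.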

Quiescing or unquiescing a queue does not need to sleep. Given that, update blk_mq_[un]quiesce_tagset() to use RCU instead of taking set->tag_list_lock, and update blk_mq_{add,del}_queue_tag_set() to use RCU-safe list operations. Also, delete INIT_LIST_HEAD(&q->tag_set_list) in blk_mq_del_queue_tag_set(), because the entry cannot be re-initialized while the list is being traversed under RCU. The deleted queue will not be added to or deleted from a tagset again, and it will be freed in blk_free_queue() after the end of the RCU grace period.
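
The reason the INIT_LIST_HEAD() call had to go can be shown with a single-threaded model of a circular doubly linked list shaped like the kernel's list_head. In this hypothetical sketch, a reader that is already standing on a node is simulated by keeping a pointer to that node while it is deleted. Deleting in the style of list_del_rcu() keeps the node's next pointer, so the reader can continue to the list head. Re-initializing the node makes next point to the node itself, so a reader standing on it would never reach the head. The helper names are invented for illustration, and the step limit exists only so the demo terminates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-threaded userspace sketch (not kernel code) of a
 * circular doubly linked list modeled on list_head. A reader that is
 * "standing on" a node is simulated by holding a pointer to that node
 * while the node is deleted. */

struct node {
    struct node *next, *prev;
    int id;
};

static void list_init(struct node *head) { head->next = head->prev = head; }

static void add_tail(struct node *n, struct node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Like list_del_rcu(): unlink, but keep n->next so a reader currently on
 * n can still walk forward. (The kernel poisons n->prev; NULL stands in.) */
static void del_rcu_style(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = NULL;
}

/* Unlink and then re-initialize, as INIT_LIST_HEAD() would: n->next == n. */
static void del_then_reinit(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Continue a traversal from `pos` until the head is reached. Returns the
 * number of steps taken, or -1 if the head was not reached within
 * max_steps (a real reader would spin forever). */
static int walk_to_head(const struct node *head, const struct node *pos,
                        int max_steps)
{
    for (int steps = 0; steps < max_steps; steps++) {
        pos = pos->next;
        if (pos == head)
            return steps + 1;
    }
    return -1;
}
```

After the list_del_rcu()-style deletion, a reader on the deleted node reaches the head in two steps, passing through the next surviving node. After re-initialization, it never reaches the head. That is the situation a blk_mq_quiesce_tagset() walker could hit if blk_mq_del_queue_tag_set() kept the INIT_LIST_HEAD() call.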

AI Insight

LLM-synthesized narrative grounded in this CVE's description and references.

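A deadlock occurs in the Linux kernel's block layer when an NVMe timeout handler calls blk_mq_quiesce_tagset(), which waits for set->tag_list_lock, while another thread holds that lock and waits for a queue freeze that cannot complete until the timeout handler finishes. The fix switches the quiesce path to RCU.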

Vulnerability Description

In the Linux kernel's block multi-queue (blk-mq) subsystem, a deadlock can occur when a tagset is quiesced while a queue is being removed from it. blk_mq_add_queue_tag_set() and blk_mq_del_queue_tag_set() hold the set->tag_list_lock mutex while freezing queues to toggle the BLK_MQ_F_TAG_QUEUE_SHARED flag. This was safe as long as nothing that a queue freeze depends on also needed that lock. Commit 98d81f0df70c changed the nvme driver to quiesce the whole tagset with blk_mq_quiesce_tagset(), which also takes set->tag_list_lock, instead of quiescing individual queues. When an NVMe command times out, the block layer's timeout work calls nvme_timeout(), which calls nvme_dev_disable() and then blk_mq_quiesce_tagset(). Suppose another thread is removing a namespace at that moment. It holds set->tag_list_lock and waits for the queue to freeze. The freeze cannot finish until the timed-out request is handled, and the timeout handler cannot proceed until it gets the lock, so neither thread can make progress.

Attack Surface & Exploitation

The deadlock is triggered when an NVMe command times out while a queue is being removed from a shared tag set. The stack traces show the two conflicting paths, outermost call first.

The timeout path blocks on set->tag_list_lock: blk_mq_timeout_work -> blk_mq_queue_tag_busy_iter -> bt_iter -> blk_mq_handle_expired -> nvme_timeout -> nvme_dev_disable -> blk_mq_quiesce_tagset.

The removal path blocks while holding set->tag_list_lock: unbind_store -> device_release_driver_internal -> pci_device_remove -> nvme_remove -> nvme_remove_namespaces -> nvme_ns_remove -> del_gendisk -> blk_mq_exit_queue -> blk_mq_update_tag_set_shared -> blk_mq_freeze_queue_wait.

In the captured trace, the removal was started by unbinding the device through sysfs, which is normally a root-only operation. The timeout depends on device or I/O conditions. A local user able to cause both events on a system using the nvme driver could induce the deadlock, leading to a denial of service (hung tasks or a system hang) [1][2].

Impact

Successful exploitation results in a kernel deadlock. The timeout worker and the thread removing the namespace both block indefinitely, so the timed-out request is never handled and I/O to the affected NVMe controller can stall. The hung-task detector may report both tasks. This is a denial-of-service vulnerability with high availability impact. No privilege escalation or data corruption is described. The CVSS base score of 5.5 (Medium) cited here is consistent with a local attack vector and an availability-only impact, although the advisory header lists the severity as unrated.

Mitigation

The fix changes blk_mq_quiesce_tagset() and blk_mq_unquiesce_tagset() to walk set->tag_list under Read-Copy-Update (RCU) instead of taking set->tag_list_lock. It also switches blk_mq_{add,del}_queue_tag_set() to RCU-safe list operations, so that those lockless readers always see a consistent list. Because blk_mq_quiesce_tagset() no longer waits on the mutex, the circular wait is eliminated. The fix has been merged upstream and into the stable branches. Users should update to a kernel containing commit 59e25ef2b413 or 3baeec23a82e (or later) [1][2]. No workaround other than applying the kernel update is available. According to the provided information, this vulnerability is not listed in CISA's Known Exploited Vulnerabilities catalog.
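
The effect of the fix on the circular wait can be sketched with a userspace model in which the quiesce path never takes the mutex. In this hypothetical sketch, the queue list is published through C11 atomic pointers and walked with acquire loads, loosely standing in for list_for_each_entry_rcu(). It does not implement RCU grace periods or deferred freeing, and all names are invented. The removal thread still holds the stand-in for set->tag_list_lock while it waits. The wait still completes, because the timeout path no longer needs that lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model (not kernel code) of the fixed ordering.
 * The queue list is published through atomic pointers so the quiesce
 * path can walk it without tag_list_lock. An acquire load loosely stands
 * in for an RCU-protected read; grace periods and deferred freeing are
 * not modeled because this list is never modified during the walk. */

struct queue {
    _Atomic(struct queue *) next;
    atomic_bool quiesced;
};

static _Atomic(struct queue *) tag_list;          /* first queue, or NULL */
static pthread_mutex_t tag_list_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t usage_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t usage_zero = PTHREAD_COND_INITIALIZER;
static int usage = 1;               /* the in-flight, timed-out request */
static int quiesced_count;          /* written by the timeout thread */

/* Fixed quiesce: a lockless read-side walk that never touches the mutex. */
static int quiesce_tagset_lockless(void)
{
    int n = 0;
    for (struct queue *q = atomic_load_explicit(&tag_list, memory_order_acquire);
         q != NULL;
         q = atomic_load_explicit(&q->next, memory_order_acquire)) {
        atomic_store(&q->quiesced, true);
        n++;
    }
    return n;
}

/* Timeout path: quiesce all queues, then finish the expired request. */
static void *timeout_handler(void *arg)
{
    (void)arg;
    quiesced_count = quiesce_tagset_lockless();
    pthread_mutex_lock(&usage_lock);
    usage--;
    pthread_cond_signal(&usage_zero);
    pthread_mutex_unlock(&usage_lock);
    return NULL;
}

/* Removal path: still holds tag_list_lock while waiting for the freeze,
 * but the wait now completes because the timeout path does not need it. */
static int run_fixed_demo(struct queue *qs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        atomic_init(&qs[i].quiesced, false);
        atomic_init(&qs[i].next, i + 1 < n ? &qs[i + 1] : NULL);
    }
    atomic_store_explicit(&tag_list, n ? &qs[0] : NULL, memory_order_release);

    pthread_t t;
    pthread_mutex_lock(&tag_list_lock);
    pthread_create(&t, NULL, timeout_handler, NULL);

    pthread_mutex_lock(&usage_lock);
    while (usage != 0)
        pthread_cond_wait(&usage_zero, &usage_lock);
    pthread_mutex_unlock(&usage_lock);

    pthread_mutex_unlock(&tag_list_lock);
    pthread_join(t, NULL);
    return quiesced_count;
}
```

Only the read side changes here. That is enough to remove the timeout path's dependency on the mutex, which was one half of the cycle.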

AI Insight generated on May 19, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.

Affected products: 1

References: 5