xgrammar vulnerable to denial of service by huge enum grammar
Description
xgrammar is an open-source library for efficient, flexible, and portable structured generation. A grammar optimizer introduced in 0.1.23 processes large grammars (>100k characters) at very low rates, and can be used for DOS of model providers. This issue is fixed in version 0.1.24.
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
xgrammar 0.1.23's grammar optimizer is too slow on large grammars (>100k chars), enabling denial-of-service attacks against model providers; fixed in 0.1.24.
Vulnerability
Overview
CVE-2025-58446 describes a denial-of-service (DoS) vulnerability in the xgrammar library, an open-source tool for efficient structured generation. The issue was introduced in version 0.1.23 with a new grammar optimizer that processes large grammars (over 100,000 characters) at very low rates [1][4]. This optimizer, part of the Earley parser, can take minutes to process a grammar that would otherwise fit within a model's context window, causing significant delays [4].
Exploitation
An attacker can craft a large grammar, such as one containing a huge enum with thousands of string options, and submit it to trigger the slow processing [4]. The attack requires no special privileges; the attacker simply submits the oversized grammar to a model provider using xgrammar 0.1.23. The grammar parsing itself becomes significantly longer than the LLM processing time, effectively blocking or delaying service for other users [4].
Impact
Successful exploitation allows an attacker to cause a denial of service against model providers that rely on xgrammar for structured generation. The slow processing of the grammar optimizer can tie up server resources, making the service unavailable or severely degraded for legitimate users [1][4].
Mitigation
The vulnerability is fixed in xgrammar version 0.1.24, which optimizes the speed of the grammar optimizer and disables some slow optimizations for large grammars [1][4]. Users should upgrade to version 0.1.24 or later. The fix is implemented in commit ced69c3ad2f8f61b516cc278a342e7c644383e27 [2]. No workarounds are mentioned for versions prior to the fix.
AI Insight generated on May 19, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.
Affected packages
Versions sourced from the GitHub Security Advisory.
| Package | Affected versions | Patched versions |
|---|---|---|
xgrammarPyPI | >= 0.1.23, < 0.1.24 | 0.1.24 |
Affected products
2Patches
1ced69c3ad2f8[Feature] Add a new expression to represent repetition to speed up. (#368)
15 files changed · +382 −148
cpp/earley_parser.cc+90 −26 modified@@ -5,6 +5,7 @@ #include "earley_parser.h" +#include <algorithm> #include <cassert> #include <cctype> #include <cstdint> @@ -55,18 +56,55 @@ void EarleyParser::Complete(const ParserState& state, const GrammarExpr& grammar const auto& parent_state = parent_state_iter->second; const auto& parent_expr = grammar_->GetGrammarExpr(parent_state.sequence_id); if (parent_state.rule_id == -1 || !grammar_->per_rule_fsms[parent_state.rule_id].has_value()) { + const auto& element_expr = grammar_->GetGrammarExpr(parent_expr[parent_state.element_id]); // The new rule is not referenced by a fsm. XGRAMMAR_DCHECK( - grammar_->GetGrammarExpr(parent_expr[parent_state.element_id]).type == - GrammarExprType::kRuleRef + element_expr.type == GrammarExprType::kRuleRef || + element_expr.type == GrammarExprType::kRepeat ); - Enqueue(ParserState{ - parent_state.rule_id, - parent_state.sequence_id, - parent_state.element_id + 1, - parent_state.rule_start_pos, - 0 - }); + if (element_expr.type == GrammarExprType::kRuleRef) { + Enqueue(ParserState{ + parent_state.rule_id, + parent_state.sequence_id, + parent_state.element_id + 1, + parent_state.rule_start_pos, + 0 + }); + continue; + } + XGRAMMAR_DCHECK(element_expr.type == GrammarExprType::kRepeat); + if (state.rule_start_pos == + static_cast<int32_t>(rule_id_to_completeable_states_.size() - 1) && + std::binary_search( + grammar_->allow_empty_rule_ids.begin(), + grammar_->allow_empty_rule_ids.end(), + element_expr[0] + )) { + // It means that the subrule of the repeat is empty, and we have already detected it. + // We shouldn't add it into the queue. + continue; + } + // The parent state is a repeat, we need to increase the repeat count. + auto new_state = parent_state; + const int32_t& min_repeat_count = element_expr[1]; + const int32_t& max_repeat_count = element_expr[2]; + new_state.repeat_count++; + // The repeat rule can be completed, and we advance the state. Don't forget to + // reset the repeat count. + if (new_state.repeat_count >= min_repeat_count) { + Enqueue(ParserState{ + parent_state.rule_id, + parent_state.sequence_id, + parent_state.element_id + 1, + parent_state.rule_start_pos, + 0 + }); + } + // If the repeat count is less than the max repeat count, we can continue to + // visit the repeat state for another round. + if (new_state.repeat_count < max_repeat_count) { + Enqueue(new_state); + } continue; } // If the rule is referenced by a fsm, we need to advance the fsm. @@ -105,16 +143,37 @@ std::pair</* scanable */ bool, /* completable */ bool> EarleyParser::Predict( return std::make_pair(false, true); } const auto& element_expr = grammar_->GetGrammarExpr(grammar_expr[state.element_id]); - if (element_expr.type == GrammarExprType::kRuleRef) { - ExpandNextRuleRefElement(state, grammar_expr, &element_expr); - return std::make_pair(false, false); - } - if (element_expr.type == GrammarExprType::kCharacterClassStar && state.sub_element_id == 0) { - Enqueue( - ParserState{state.rule_id, state.sequence_id, state.element_id + 1, state.rule_start_pos, 0} - ); + switch (element_expr.type) { + case GrammarExprType::kRuleRef: { + ExpandNextRuleRefElement(state, grammar_expr, &element_expr); + return std::make_pair(false, false); + } + case GrammarExprType::kCharacterClassStar: { + if (state.sub_element_id == 0) { + Enqueue(ParserState{ + state.rule_id, state.sequence_id, state.element_id + 1, state.rule_start_pos, 0 + }); + } + return std::make_pair(true, false); + } + case GrammarExprType::kRepeat: { + const int32_t& min_repeat_count = element_expr[1]; + const int32_t& max_repeat_count = element_expr[2]; + // If the current repeat count is less than the max repeat count, + // we can expand the next rule reference element. + XGRAMMAR_DCHECK(state.repeat_count <= max_repeat_count); + ExpandNextRuleRefElement(state, grammar_expr, &element_expr); + if (state.repeat_count >= min_repeat_count) { + Enqueue(ParserState{ + state.rule_id, state.sequence_id, state.element_id + 1, state.rule_start_pos, 0 + }); + } + return std::make_pair(false, false); + } + default: { + return std::make_pair(true, false); + } } - return std::make_pair(true, false); } void EarleyParser::Scan(const ParserState& state, const uint8_t ch) { @@ -165,7 +224,6 @@ bool EarleyParser::Advance(const uint8_t ch) { tmp_states_to_be_added_.clear(); tmp_accept_stop_token_ = false; const auto& latest_states = scanable_state_history_[scanable_state_history_.size() - 1]; - // Scan all the scanable states. for (const auto& state : latest_states) { Scan(state, ch); @@ -322,14 +380,18 @@ void EarleyParser::ExpandNextRuleRefElement( } } else { XGRAMMAR_DCHECK(grammar_expr.type == GrammarExprType::kSequence); - XGRAMMAR_DCHECK(sub_grammar_expr->type == GrammarExprType::kRuleRef); + XGRAMMAR_DCHECK( + sub_grammar_expr->type == GrammarExprType::kRuleRef || + sub_grammar_expr->type == GrammarExprType::kRepeat + ); ref_rule_ids.push_back((*sub_grammar_expr)[0]); } for (const auto& ref_rule_id : ref_rule_ids) { { // Add the reference rule to map. if ((state.element_id != grammar_expr.size() - 1) || state.rule_start_pos == ParserState::kNoPrevInputPos || - (state.rule_id != -1 && grammar_->per_rule_fsms[state.rule_id].has_value())) { + (state.rule_id != -1 && grammar_->per_rule_fsms[state.rule_id].has_value()) || + sub_grammar_expr->type == GrammarExprType::kRepeat) { // It's not the right recursion, or it's the root rule. auto& states_map = rule_id_to_completeable_states_.back(); states_map.insert({ref_rule_id, state}); @@ -374,11 +436,11 @@ void EarleyParser::ExpandNextRuleRefElement( // Check if the reference rule is already visited. if (IsStateVisitedInQueue({ref_rule_id, -1, -1, -1, -1})) { - if (std::find( + if (std::binary_search( grammar_->allow_empty_rule_ids.begin(), grammar_->allow_empty_rule_ids.end(), ref_rule_id - ) != grammar_->allow_empty_rule_ids.end()) { + )) { if (state.rule_id != -1 && grammar_->per_rule_fsms[state.rule_id].has_value()) { const auto& current_fsm = grammar_->per_rule_fsms[state.rule_id].value(); for (const auto& edge : current_fsm->GetEdges(state.element_id)) { @@ -391,9 +453,11 @@ void EarleyParser::ExpandNextRuleRefElement( continue; } XGRAMMAR_DCHECK(grammar_expr.type == GrammarExprType::kSequence); - Enqueue(ParserState{ - state.rule_id, state.sequence_id, state.element_id + 1, state.rule_start_pos, 0 - }); + if (sub_grammar_expr->type == GrammarExprType::kRuleRef) { + Enqueue(ParserState{ + state.rule_id, state.sequence_id, state.element_id + 1, state.rule_start_pos, 0 + }); + } } continue; }
cpp/earley_parser.h+19 −15 modified@@ -43,22 +43,19 @@ struct ParserState { constexpr ParserState() = default; constexpr ParserState( - int32_t rule_id, - int32_t sequence_id, - int32_t element_id, - int32_t rule_start_pos, - int32_t sub_element_id + const int32_t& rule_id, + const int32_t& sequence_id, + const int32_t& element_id, + const int32_t& rule_start_pos, + const int32_t& sub_element_id, + const int32_t& repeat_count = 0 ) : rule_id(rule_id), sequence_id(sequence_id), element_id(element_id), rule_start_pos(rule_start_pos), - sub_element_id(sub_element_id) {} - - constexpr ParserState(const ParserState&) = default; - constexpr ParserState(ParserState&&) = default; - ParserState& operator=(const ParserState&) = default; - ParserState& operator=(ParserState&&) = default; + sub_element_id(sub_element_id), + repeat_count(repeat_count) {} /*! * \brief A sequence_id value of kUnexpandedRuleStartSequenceId means a rule hasn't been @@ -92,6 +89,9 @@ struct ParserState { /*! \brief The id of the sub element in the current selement of the sequence. */ int32_t sub_element_id = 0; + /*! \brief The number of times the element is repeated. It will be used in kRepeat.*/ + int32_t repeat_count = 0; + /*! \brief The element is invalid when sequence_id is -1. */ bool IsInvalid() const { return sequence_id == -1; } @@ -107,13 +107,15 @@ struct ParserState { if (sequence_id != other.sequence_id) return sequence_id < other.sequence_id; if (element_id != other.element_id) return element_id < other.element_id; if (rule_start_pos != other.rule_start_pos) return rule_start_pos < other.rule_start_pos; - return sub_element_id < other.sub_element_id; + if (sub_element_id != other.sub_element_id) return sub_element_id < other.sub_element_id; + return repeat_count < other.repeat_count; } friend std::ostream& operator<<(std::ostream& os, const ParserState& state) { os << "ParserState(rule_id=" << state.rule_id << ", sequence_id=" << state.sequence_id << ", element_id=" << state.element_id << ", rule_start_pos=" << state.rule_start_pos - << ", sub_element_id=" << state.sub_element_id << ")"; + << ", sub_element_id=" << state.sub_element_id << ", repeat_count=" << state.repeat_count + << ")"; return os; } @@ -132,7 +134,8 @@ XGRAMMAR_MEMBER_ARRAY( &ParserState::sequence_id, &ParserState::element_id, &ParserState::rule_start_pos, - &ParserState::sub_element_id + &ParserState::sub_element_id, + &ParserState::repeat_count ); /*! @@ -170,7 +173,8 @@ class StateHashForParsing { state.sequence_id, state.element_id, state.rule_start_pos, - state.sub_element_id + state.sub_element_id, + state.repeat_count ); } };
cpp/grammar_builder.h+6 −0 modified@@ -199,6 +199,12 @@ class GrammarBuilder { ); } + int32_t AddRepeat(const int32_t ref_rule_id, int32_t min_repeat_count, int32_t max_repeat_count) { + std::vector<int32_t> data({ref_rule_id, min_repeat_count, max_repeat_count}); + return AddGrammarExpr({GrammarExprType::kRepeat, data.data(), static_cast<int32_t>(data.size())} + ); + } + /*! \brief Get the number of grammar_exprs. */ int32_t NumGrammarExprs() const { return grammar_->NumGrammarExprs(); }
cpp/grammar_compiler.cc+40 −21 modified@@ -17,6 +17,8 @@ #include "compiled_grammar_impl.h" #include "earley_parser.h" #include "fsm.h" +#include "fsm_builder.h" +#include "grammar_builder.h" #include "grammar_functor.h" #include "grammar_impl.h" #include "support/logging.h" @@ -83,7 +85,7 @@ class GrammarMatcherForTokenMaskCache : public EarleyParser { private: /*! \brief Check if a token can pass the lookahead assertion. */ - bool IsTokenPassLookaheadAssertion( + std::pair</*acceptable*/ bool, /*can reach end*/ bool> IsTokenPassLookaheadAssertion( const std::string& token, const std::vector<bool>& can_reach_end_stack ); @@ -107,12 +109,12 @@ class GrammarMatcherForTokenMaskCache : public EarleyParser { std::vector<bool> tmp_can_reach_end_prefix_or_stack_; }; -bool GrammarMatcherForTokenMaskCache::IsTokenPassLookaheadAssertion( +std::pair<bool, bool> GrammarMatcherForTokenMaskCache::IsTokenPassLookaheadAssertion( const std::string& token, const std::vector<bool>& can_reach_end_stack ) { auto lookahead_assertion_id = grammar_->GetRule(init_rule_id).lookahead_assertion_id; if (lookahead_assertion_id == -1) { - return true; + return {true, true}; } auto lookahead_state = ParserState(/*rule_id*/ -1, lookahead_assertion_id, 0, ParserState::kNoPrevInputPos, 0); @@ -136,20 +138,20 @@ bool GrammarMatcherForTokenMaskCache::IsTokenPassLookaheadAssertion( // accepted chars: pos - i + 1 // we need to rollback the pushed initial state as well PopLastStates(pos - i + 2); - return true; + return {true, true}; } } // Case 2. The whole token is accepted if (last_accept_pos == token_len - 1) { PopLastStates(last_accept_pos - i + 2); - return true; + return {true, false}; } // Case 3. The token is not accepted. Check the next position. PopLastStates(last_accept_pos - i + 1); } PopLastStates(1); - return false; + return {false, false}; } // Comparator for std::pair<int32_t, std::string> based on the string value. @@ -359,18 +361,34 @@ bool GrammarMatcherForTokenMaskCache::GetTokenMaskWithFirstCharacterCheck( if (accepted) { tmp_accepted_indices_.push_back(i); - } else if (can_reach_end && !is_root_rule && - IsTokenPassLookaheadAssertion(token, tmp_can_reach_end_stack_) && - prev_matched_size > 0) { - // 1. If the current rule is the root rule (is_root_rule=true), there are no - // uncertain tokens. Not accepted tokens are just rejected. - // 2. If a token cannot pass the lookahead assertion, it is rejected. - tmp_uncertain_indices_.push_back(i); } else { - tmp_rejected_indices_.push_back(i); - last_rejected_range = subtree_nodes_range[i]; - fill_reject_indices = - tmp_rejected_indices_.size() < AdaptiveTokenMask::USE_BITSET_THRESHOLD; + auto lookahead_result_pair = IsTokenPassLookaheadAssertion(token, tmp_can_reach_end_stack_); + if (can_reach_end && !is_root_rule && lookahead_result_pair.first && + prev_matched_size > 0) { + // 1. If the current rule is the root rule (is_root_rule=true), there are no + // uncertain tokens. Not accepted tokens are just rejected. + // 2. If a token cannot pass the lookahead assertion, it is rejected. + if ((!lookahead_result_pair.second) && + (std::binary_search( + grammar_->exact_lookahead.begin(), grammar_->exact_lookahead.end(), init_rule_id + ))) { + tmp_accepted_indices_.push_back(i); + } else { + tmp_uncertain_indices_.push_back(i); + // On the subtree, they are all uncertain tokens. + if (lookahead_result_pair.second) { + for (int j = i + 1; j < subtree_nodes_range[i]; ++j) { + tmp_uncertain_indices_.push_back(j); + } + i = subtree_nodes_range[i] - 1; // Skip the subtree nodes. + } + } + } else { + tmp_rejected_indices_.push_back(i); + last_rejected_range = subtree_nodes_range[i]; + fill_reject_indices = + tmp_rejected_indices_.size() < AdaptiveTokenMask::USE_BITSET_THRESHOLD; + } } } if (interval_idx != possible_intervals.size() - 1 && fill_reject_indices) { @@ -573,7 +591,10 @@ CompiledGrammar GrammarCompiler::Impl::MultiThreadCompileGrammar(Grammar grammar // Step 1. Compute the ids of rules that can be empty compiled_grammar_impl->grammar->allow_empty_rule_ids = AllowEmptyRuleAnalyzer::Apply(grammar); - // Step 2. Build the fsm for each rule + // Step 2. Normalize the repeat expressions in the grammar. + RepetitionNormalizer::Apply(&compiled_grammar_impl->grammar); + + // Step 3. Build the fsm for each rule GrammarFSMBuilder::Apply(&compiled_grammar_impl->grammar); if (tokenizer_info_.GetVocabSize() == 0) { @@ -630,7 +651,6 @@ CompiledGrammar GrammarCompiler::Impl::MultiThreadCompileGrammar(Grammar grammar auto rule = grammar->GetRule(rule_id); auto rule_body = grammar->GetGrammarExpr(rule.body_expr_id); const auto& rule_fsm = grammar->per_rule_fsms[rule_id]; - if (rule_fsm.has_value()) { auto cur_stack_element = ParserState(rule_id, rule.body_expr_id, 0, ParserState::kNoPrevInputPos, 0); @@ -642,7 +662,6 @@ CompiledGrammar GrammarCompiler::Impl::MultiThreadCompileGrammar(Grammar grammar } continue; } - XGRAMMAR_DCHECK(rule_body.type == GrammarExprType::kChoices); for (auto sequence_id : rule_body) { const auto& sequence = grammar->GetGrammarExpr(sequence_id); @@ -654,7 +673,7 @@ CompiledGrammar GrammarCompiler::Impl::MultiThreadCompileGrammar(Grammar grammar for (int element_id = 0; element_id < sequence.size(); ++element_id) { state.element_id = element_id; auto element = grammar->GetGrammarExpr(sequence[element_id]); - if (element.type == GrammarExprType::kRuleRef) { + if (element.type == GrammarExprType::kRuleRef || element.type == GrammarExprType::kRepeat) { continue; } if (element.type == GrammarExprType::kByteString) {
cpp/grammar_functor.cc+52 −1 modified@@ -14,8 +14,10 @@ #include <vector> #include "fsm_builder.h" +#include "grammar_builder.h" #include "grammar_impl.h" #include "support/encoding.h" +#include "support/logging.h" #include "xgrammar/grammar.h" namespace xgrammar { @@ -135,6 +137,7 @@ class StructureNormalizerSub : public GrammarMutator { case GrammarExprType::kCharacterClass: case GrammarExprType::kCharacterClassStar: case GrammarExprType::kRuleRef: + case GrammarExprType::kRepeat: return builder_->AddSequence({builder_->AddGrammarExpr(assertion_expr)}); default: XGRAMMAR_LOG(FATAL) << "Unexpected lookahead assertion type: " @@ -156,6 +159,7 @@ class StructureNormalizerSub : public GrammarMutator { case GrammarExprType::kCharacterClass: case GrammarExprType::kCharacterClassStar: case GrammarExprType::kRuleRef: + case GrammarExprType::kRepeat: return builder_->AddChoices({builder_->AddSequence({builder_->AddGrammarExpr(grammar_expr)}) }); case GrammarExprType::kTagDispatch: @@ -189,6 +193,7 @@ class StructureNormalizerSub : public GrammarMutator { case GrammarExprType::kCharacterClass: case GrammarExprType::kCharacterClassStar: case GrammarExprType::kRuleRef: + case GrammarExprType::kRepeat: VisitElementInChoices(choice_expr, &new_choice_ids); break; case GrammarExprType::kTagDispatch: { @@ -267,6 +272,7 @@ class StructureNormalizerSub : public GrammarMutator { case GrammarExprType::kCharacterClass: case GrammarExprType::kCharacterClassStar: case GrammarExprType::kRuleRef: + case GrammarExprType::kRepeat: VisitElementInSequence(element_expr, &new_sequence_ids); break; case GrammarExprType::kTagDispatch: { @@ -492,6 +498,8 @@ class UsedRulesAnalyzer : public GrammarVisitor<std::vector<int32_t>> { void VisitRuleRef(const GrammarExpr& grammar_expr) { visit_queue_.push(grammar_expr[0]); } + void VisitRepeat(const GrammarExpr& grammar_expr) { visit_queue_.push(grammar_expr[0]); } + private: std::queue<int32_t> visit_queue_; }; @@ -537,6 +545,12 @@ class DeadCodeEliminatorImpl : public GrammarMutator { return builder_->AddRuleRef(new_rule_id); } + int32_t VisitRepeat(const GrammarExpr& grammar_expr) final { + XGRAMMAR_DCHECK(rule_id_map_.count(grammar_expr[0]) > 0); + auto new_rule_id = rule_id_map_[grammar_expr[0]]; + return builder_->AddRepeat(new_rule_id, grammar_expr[1], grammar_expr[2]); + } + private: std::unordered_map<int32_t, int32_t> rule_id_map_; }; @@ -689,6 +703,12 @@ class SubGrammarAdderImpl : public GrammarMutator { return builder_->AddRuleRef(new_rule_ids_names[grammar_expr[0]].first); } + int32_t VisitRepeat(const GrammarExpr& grammar_expr) final { + return builder_->AddRepeat( + new_rule_ids_names[grammar_expr[0]].first, grammar_expr[1], grammar_expr[2] + ); + } + std::vector<std::pair<int32_t, std::string>> new_rule_ids_names; }; @@ -799,6 +819,10 @@ class RuleRefGraphFinder : public GrammarVisitor<std::vector<std::vector<int32_t rule_visit_graph_[grammar_expr[0]].push_back(cur_rule_id_); } + void VisitRepeat(const GrammarExpr& grammar_expr) { + rule_visit_graph_[grammar_expr[0]].push_back(cur_rule_id_); + } + void VisitTagDispatch(const GrammarExpr& grammar_expr) { for (int i = 1; i < grammar_expr.size() - 3; i += 2) { rule_visit_graph_[grammar_expr[i]].push_back(cur_rule_id_); @@ -872,7 +896,9 @@ class AllowEmptyRuleAnalyzerImpl : public GrammarVisitor<std::vector<int32_t>> { auto element_expr = base_grammar_->GetGrammarExpr(i); return (element_expr.type == GrammarExprType::kRuleRef && empty_rule_id_set.count(element_expr[0])) || - element_expr.type == GrammarExprType::kCharacterClassStar; + element_expr.type == GrammarExprType::kCharacterClassStar || + (element_expr.type == GrammarExprType::kRepeat && + (empty_rule_id_set.count(element_expr[0]) || element_expr[1] == 0)); }); } @@ -1023,6 +1049,29 @@ class GrammarFSMBuilderImpl { } }; +class RepetitionNormalizerImpl { + public: + void Apply(Grammar* grammar) { + for (int i = 0; i < (*grammar)->NumGrammarExprs(); ++i) { + auto expr = (*grammar)->GetGrammarExpr(i); + if (expr.type != Grammar::Impl::GrammarExprType::kRepeat) { + continue; + } + int repeat_rule_id = expr[0]; + (*grammar)->exact_lookahead.push_back(repeat_rule_id); + if (std::binary_search( + (*grammar)->allow_empty_rule_ids.begin(), + (*grammar)->allow_empty_rule_ids.end(), + repeat_rule_id + )) { + // The repeated rule can be empty, so we need to normalize it. + expr.SetData(1, 0); // Set min repeat to 0 + } + } + std::sort((*grammar)->exact_lookahead.begin(), (*grammar)->exact_lookahead.end()); + } +}; + /*************************** Forward grammar functors to their impl ***************************/ Grammar GrammarNormalizer::Apply(const Grammar& grammar) { @@ -1072,4 +1121,6 @@ int32_t SubGrammarAdder::Apply(GrammarBuilder* builder, const Grammar& sub_gramm void GrammarFSMBuilder::Apply(Grammar* grammar) { GrammarFSMBuilderImpl().Apply(grammar); } +void RepetitionNormalizer::Apply(Grammar* grammar) { RepetitionNormalizerImpl().Apply(grammar); } + } // namespace xgrammar
cpp/grammar_functor.h+10 −1 modified@@ -131,6 +131,8 @@ class GrammarFunctor { return VisitRuleRef(grammar_expr); case GrammarExprType::kTagDispatch: return VisitTagDispatch(grammar_expr); + case GrammarExprType::kRepeat: + return VisitRepeat(grammar_expr); default: XGRAMMAR_LOG(FATAL) << "Unexpected sequence type: " << static_cast<int>(grammar_expr.type); XGRAMMAR_UNREACHABLE(); @@ -212,6 +214,9 @@ class GrammarFunctor { /*! \brief Visit a rule reference GrammarExpr. */ virtual T VisitRuleRef(const GrammarExpr& grammar_expr) { return VisitElement(grammar_expr); } + /*! \brief Visit a repeat GrammarExpr. */ + virtual T VisitRepeat(const GrammarExpr& grammar_expr) { return VisitElement(grammar_expr); } + /*! \brief The grammar to visit or mutate. */ Grammar base_grammar_{NullObj{}}; @@ -340,12 +345,16 @@ class GrammarFSMBuilder { public: static void Apply(Grammar* grammar); }; - class SubGrammarAdder { public: static int32_t Apply(GrammarBuilder* builder, const Grammar& sub_grammar); }; +class RepetitionNormalizer { + public: + static void Apply(Grammar* grammar); +}; + } // namespace xgrammar #endif // XGRAMMAR_GRAMMAR_FUNCTOR_H_
cpp/grammar_impl.h+9 −1 modified@@ -118,6 +118,8 @@ class Grammar::Impl { // tag_expr should be a byte string, and rule_id should be a rule id. // loop_after_dispatch is a bool. kTagDispatch, + // data format: [grammar_expr_id, min_repeat_count, max_repeat_count] + kRepeat, }; /*! \brief The object representing a grammar expr. */ @@ -138,6 +140,7 @@ class Grammar::Impl { } const int32_t* begin() const { return data; } const int32_t* end() const { return data + data_len; } + void SetData(int index, int value) { const_cast<int32_t*>(data)[index] = value; } }; /*! \brief Get the number of grammar_exprs. */ @@ -247,6 +250,9 @@ class Grammar::Impl { /*! \brief The ids of the rules that are allowed to be empty. */ std::vector<int32_t> allow_empty_rule_ids; + /*! \brief Store the lookahead which are exact, used to reduce uncertainty.*/ + std::vector<int32_t> exact_lookahead; + friend class GrammarBuilder; friend class GrammarCompiler; @@ -276,7 +282,9 @@ XGRAMMAR_MEMBER_TABLE( "per_rule_fsms", &Grammar::Impl::per_rule_fsms, "allow_empty_rule_ids", - &Grammar::Impl::allow_empty_rule_ids + &Grammar::Impl::allow_empty_rule_ids, + "exact_lookahead", + &Grammar::Impl::exact_lookahead ); } // namespace xgrammar
cpp/grammar_matcher.cc+7 −13 modified@@ -366,18 +366,14 @@ bool GrammarMatcher::Impl::IsStopTokenAccepted() const { return stop_token_is_ac // TODO(yixin): Polish verbose logging bool GrammarMatcher::Impl::AcceptToken(int32_t token_id, bool debug_print) { if (IsStopTokenAccepted()) { - if (debug_print) { - XGRAMMAR_LOG(WARNING) << "The matcher has terminated after accepting the stop token, but is " - << "trying to accept new token with id " << token_id << "."; - } + XGRAMMAR_LOG(WARNING) << "The matcher has terminated after accepting the stop token, but is " + << "trying to accept new token with id " << token_id << "."; return false; } if (token_id < 0 || token_id >= tokenizer_info_.GetVocabSize()) { - if (debug_print) { - XGRAMMAR_LOG(WARNING) << "The token id " << token_id << " is out of range [0, " - << tokenizer_info_.GetVocabSize() << "). Rejecting the token."; - } + XGRAMMAR_LOG(WARNING) << "The token id " << token_id << " is out of range [0, " + << tokenizer_info_.GetVocabSize() << "). Rejecting the token."; return false; } @@ -404,11 +400,9 @@ bool GrammarMatcher::Impl::AcceptToken(int32_t token_id, bool debug_print) { const auto& special_token_ids = tokenizer_info_.GetSpecialTokenIds(); if (std::find(special_token_ids.begin(), special_token_ids.end(), token_id) != special_token_ids.end()) { - if (debug_print) { - XGRAMMAR_LOG(WARNING) << "GrammarMatcher cannot accept special token id " << token_id << ": " - << tokenizer_info_.GetDecodedVocab()[token_id] - << ". Rejecting the token."; - } + XGRAMMAR_LOG(WARNING) << "GrammarMatcher cannot accept special token id " << token_id << ": " + << tokenizer_info_.GetDecodedVocab()[token_id] + << ". Rejecting the token."; return false; }
cpp/grammar_parser.cc+46 −41 modified@@ -7,12 +7,15 @@ #include <picojson.h> +#include <cstdint> #include <variant> +#include <vector> #include "grammar_builder.h" #include "grammar_impl.h" #include "support/encoding.h" #include "support/logging.h" +#include "xgrammar/grammar.h" namespace xgrammar { @@ -757,57 +760,59 @@ int32_t EBNFParser::HandleQuestionQuantifier(int32_t grammar_expr_id) { return builder_.AddRuleRef(new_rule_id); } -int32_t EBNFParser::HandleRepetitionRange(int32_t grammar_expr_id, int64_t lower, int64_t upper) { - // Construct expr expr ... expr (l times) +int32_t EBNFParser::HandleRepetitionRange( + const int32_t grammar_expr_id, int64_t lower, int64_t upper +) { + if (upper == -1) { + // The repeation is unbounded, e.g. {2,} + upper = 0x7FFFFFFF; // Use a large number to represent unbounded + } + const auto repeat_name = builder_.GetNewRuleName(cur_rule_name_) + "_xgrammar_repetition_context"; std::vector<int32_t> elements; - for (int64_t i = 0; i < lower; ++i) { - elements.push_back(grammar_expr_id); + int splited_count = lower >= 4 ? 4 : lower; + int nullable_splited_count = 0; + if (splited_count != 4) { + nullable_splited_count = + (upper - lower) >= (4 - splited_count) ? 4 - splited_count : upper - lower; } - - // Case 1: {l}: - // expr expr ... expr (l times) - if (upper == lower) { - return builder_.AddSequence(elements); + // The repetition sentence. + if (upper != (splited_count + nullable_splited_count)) { + auto new_rule_name = builder_.GetNewRuleName(repeat_name); + auto new_grammar_expr_id = builder_.AddChoices({builder_.AddSequence({grammar_expr_id})}); + auto new_rule_id = builder_.AddRule(new_rule_name, new_grammar_expr_id); + elements.push_back(builder_.AddRepeat( + new_rule_id, lower - splited_count, upper - splited_count - nullable_splited_count + )); } - - // Case 2: {l,}: - // expr expr ... expr (l times) rest - // rest ::= "" | expr rest - if (upper == -1) { - auto new_rule_name = builder_.GetNewRuleName(cur_rule_name_); - auto new_rule_id = builder_.AddEmptyRule(new_rule_name); - auto ref_to_new_rule = builder_.AddRuleRef(new_rule_id); - auto new_grammar_expr_id = builder_.AddChoices( - {builder_.AddEmptyStr(), builder_.AddSequence({grammar_expr_id, ref_to_new_rule})} - ); - builder_.UpdateRuleBody(new_rule_id, new_grammar_expr_id); + // The last split_count exprs. + + // The nullable exprs. + for (int i = 0; i < nullable_splited_count; i++) { + auto new_rule_name = builder_.GetNewRuleName(repeat_name); + auto new_grammar_expr_id = + builder_.AddChoices({builder_.AddEmptyStr(), builder_.AddSequence({grammar_expr_id})}); + auto new_rule_id = builder_.AddRule(new_rule_name, new_grammar_expr_id); elements.push_back(builder_.AddRuleRef(new_rule_id)); - return builder_.AddSequence(elements); } - // Case 3: {l, r} (r - l >= 1) - // expr expr ... expr (l times) rest1 - // rest1 ::= "" | expr rest2 - // rest2 ::= "" | expr rest3 - // ... - // rest(r - l) ::= "" | expr - std::vector<int32_t> rest_rule_ids; + for (int i = 0; i < splited_count; i++) { + auto new_rule_name = builder_.GetNewRuleName(repeat_name); + auto new_grammar_expr_id = builder_.AddChoices({builder_.AddSequence({grammar_expr_id})}); + auto new_rule_id = builder_.AddRule(new_rule_name, new_grammar_expr_id); + elements.push_back(builder_.AddRuleRef(new_rule_id)); + } - for (int64_t i = 0; i < upper - lower; ++i) { - auto new_rule_name = builder_.GetNewRuleName(cur_rule_name_); - rest_rule_ids.push_back(builder_.AddEmptyRule(new_rule_name)); + // Add the lookahead elements + std::vector<int32_t> lookahead_elements = elements; + if (elements.empty()) { + return builder_.AddEmptyStr(); } - for (int64_t i = 0; i < upper - lower - 1; ++i) { - auto ref_to_next_rule = builder_.AddRuleRef(rest_rule_ids[i + 1]); - auto new_grammar_expr_id = builder_.AddChoices( - {builder_.AddEmptyStr(), builder_.AddSequence({grammar_expr_id, ref_to_next_rule})} + for (int64_t i = 0; i < static_cast<int64_t>(elements.size() - 1); i++) { + lookahead_elements.erase(lookahead_elements.begin()); + builder_.UpdateLookaheadAssertion( + builder_.GetGrammarExpr(elements[i])[0], builder_.AddSequence(lookahead_elements) ); - builder_.UpdateRuleBody(rest_rule_ids[i], new_grammar_expr_id); } - auto last_grammar_expr_id = builder_.AddChoices({builder_.AddEmptyStr(), grammar_expr_id}); - builder_.UpdateRuleBody(rest_rule_ids.back(), last_grammar_expr_id); - - elements.push_back(builder_.AddRuleRef(rest_rule_ids[0])); return builder_.AddSequence(elements); }
cpp/grammar_printer.cc+13 −0 modified@@ -42,6 +42,8 @@ std::string GrammarPrinter::PrintGrammarExpr(const GrammarExpr& grammar_expr) { return PrintChoices(grammar_expr); case GrammarExprType::kTagDispatch: return PrintTagDispatch(grammar_expr); + case GrammarExprType::kRepeat: + return PrintRepeat(grammar_expr); default: XGRAMMAR_LOG(FATAL) << "Unexpected GrammarExpr type: " << static_cast<int>(grammar_expr.type); XGRAMMAR_UNREACHABLE(); @@ -146,6 +148,17 @@ std::string GrammarPrinter::PrintTagDispatch(const GrammarExpr& grammar_expr) { return result; } +std::string GrammarPrinter::PrintRepeat(const GrammarExpr& grammar_expr) { + int32_t lower_bound = grammar_expr[1]; + int32_t upper_bound = grammar_expr[2]; + std::string result = grammar_->GetRule(grammar_expr[0]).name + "{"; + result += std::to_string(lower_bound); + result += ", "; + result += std::to_string(upper_bound); + result += "}"; + return result; +} + std::string GrammarPrinter::ToString() { std::string result; int num_rules = grammar_->NumRules();
cpp/grammar_printer.h+2 −0 modified@@ -60,6 +60,8 @@ class GrammarPrinter { std::string PrintChoices(const GrammarExpr& grammar_expr); /*! \brief Print a GrammarExpr for tag dispatch. */ std::string PrintTagDispatch(const GrammarExpr& grammar_expr); + /*! \brief Print a GrammarExpr for repeat. */ + std::string PrintRepeat(const GrammarExpr& grammar_expr); /*! \brief Print a string. */ std::string PrintString(const std::string& str); /*! \brief Print a boolean. */
tests/python/test_grammar_matcher_ebnf.py+22 −0 modified@@ -51,6 +51,28 @@ def test_repetition(input: str, accepted: bool): assert _is_grammar_accept_string(grammar, input) == accepted +input_accepted_test_repetition_with_empty = ( + ("aaa", True), + ("abcbc", True), + ("bcbcbcbcbc", True), + ("bcbcbcbcbcbcbcb", True), + ("aaaa", False), + ("", True), + ("a", True), + ("d", True), +) + + +@pytest.mark.parametrize("input, accepted", input_accepted_test_repetition_with_empty) +def test_repetition_with_empty(input: str, accepted: bool): + grammar_str = """ + root ::= rule {2, 3} "d"? + rule ::= ("a" | [bc] {4,}) | "" + """ + grammar = xgr.Grammar.from_ebnf(grammar_str) + assert _is_grammar_accept_string(grammar, input) == accepted + + def test_utf8(): # Test utf8-encoded string with EBNF grammar ebnf_grammar_str = "root ::= [,]+"
tests/python/test_grammar_matcher_regex.py+14 −0 modified@@ -159,5 +159,19 @@ def test_fill_next_token_bitmask(regex: str, input_str: str): assert tokenizer.eos_token_id not in rejected_token_ids +@pytest.mark.hf_token_required +def test_regex_with_large_range_compilation(): + regex_with_large_range = r"[a-z]{100,20000}" + tokenizer_path = "meta-llama/Meta-Llama-3-8B-Instruct" + tokenizer = AutoTokenizer.from_pretrained(tokenizer_path, use_fast=True, trust_remote_code=True) + tokenizer_info = xgr.TokenizerInfo.from_huggingface(tokenizer) + compiler = xgr.GrammarCompiler(tokenizer_info) + + time_start = time.monotonic_ns() + _ = compiler.compile_regex(regex_with_large_range) + time_end = time.monotonic_ns() + print(f"Time to compile regex with large range: {(time_end - time_start) / 1e3} us") + + if __name__ == "__main__": pytest.main(sys.argv)
tests/python/test_grammar_parser.py+48 −29 modified@@ -146,7 +146,10 @@ def test_repetition_range_exact(): """Test repetition range with exact count {n}.""" before = """root ::= "a"{3} """ - expected = """root ::= ((("a" "a" "a"))) + expected = """root ::= (((root_1_xgrammar_repetition_context root_1_xgrammar_repetition_context_1 root_1_xgrammar_repetition_context_2))) +root_1_xgrammar_repetition_context ::= (("a")) (=(root_1_xgrammar_repetition_context_1 root_1_xgrammar_repetition_context_2)) +root_1_xgrammar_repetition_context_1 ::= (("a")) (=(root_1_xgrammar_repetition_context_2)) +root_1_xgrammar_repetition_context_2 ::= (("a")) """ grammar = _ebnf_to_grammar_no_normalization(before) after = str(grammar) @@ -157,9 +160,11 @@ def test_repetition_range_min_max(): """Test repetition range with min and max {n,m}.""" before = """root ::= "a"{2,4} """ - expected = """root ::= ((("a" "a" root_1))) -root_1 ::= ("" | ("a" root_2)) -root_2 ::= ("" | "a") + expected = """root ::= (((root_1_xgrammar_repetition_context root_1_xgrammar_repetition_context_1 root_1_xgrammar_repetition_context_2 root_1_xgrammar_repetition_context_3))) +root_1_xgrammar_repetition_context ::= ("" | ("a")) (=(root_1_xgrammar_repetition_context_1 root_1_xgrammar_repetition_context_2 root_1_xgrammar_repetition_context_3)) +root_1_xgrammar_repetition_context_1 ::= ("" | ("a")) (=(root_1_xgrammar_repetition_context_2 root_1_xgrammar_repetition_context_3)) +root_1_xgrammar_repetition_context_2 ::= (("a")) (=(root_1_xgrammar_repetition_context_3)) +root_1_xgrammar_repetition_context_3 ::= (("a")) """ grammar = _ebnf_to_grammar_no_normalization(before) after = str(grammar) @@ -170,8 +175,12 @@ def test_repetition_range_min_only(): """Test repetition range with only min {n,}.""" before = """root ::= "a"{2,} """ - expected = """root ::= ((("a" "a" root_1))) -root_1 ::= ("" | ("a" root_1)) + expected = """root ::= (((root_1_xgrammar_repetition_context{0, 2147483643} root_1_xgrammar_repetition_context_1 root_1_xgrammar_repetition_context_2 root_1_xgrammar_repetition_context_3 root_1_xgrammar_repetition_context_4))) +root_1_xgrammar_repetition_context ::= (("a")) (=(root_1_xgrammar_repetition_context_1 root_1_xgrammar_repetition_context_2 root_1_xgrammar_repetition_context_3 root_1_xgrammar_repetition_context_4)) +root_1_xgrammar_repetition_context_1 ::= ("" | ("a")) (=(root_1_xgrammar_repetition_context_2 root_1_xgrammar_repetition_context_3 root_1_xgrammar_repetition_context_4)) +root_1_xgrammar_repetition_context_2 ::= ("" | ("a")) (=(root_1_xgrammar_repetition_context_3 root_1_xgrammar_repetition_context_4)) +root_1_xgrammar_repetition_context_3 ::= (("a")) (=(root_1_xgrammar_repetition_context_4)) +root_1_xgrammar_repetition_context_4 ::= (("a")) """ grammar = _ebnf_to_grammar_no_normalization(before) after = str(grammar) @@ -266,11 +275,12 @@ def test_combined_features(): rule2 ::= [0-9]+ "." [0-9]* """ expected = """root ::= (("start" root_1 "end")) -rule1 ::= ((([a-z] rule1_1))) (=((":"))) +rule1 ::= (((rule1_1_xgrammar_repetition_context rule1_1_xgrammar_repetition_context_1 rule1_1_xgrammar_repetition_context_2))) (=((":"))) rule2 ::= ((rule2_1 "." [0-9]*)) root_1 ::= ((((rule1) | (rule2)) root_1) | ((rule1) | (rule2))) -rule1_1 ::= ("" | ([a-z] rule1_2)) -rule1_2 ::= ("" | [a-z]) +rule1_1_xgrammar_repetition_context ::= ("" | ([a-z])) (=(rule1_1_xgrammar_repetition_context_1 rule1_1_xgrammar_repetition_context_2)) +rule1_1_xgrammar_repetition_context_1 ::= ("" | ([a-z])) (=(rule1_1_xgrammar_repetition_context_2)) +rule1_1_xgrammar_repetition_context_2 ::= (([a-z])) rule2_1 ::= (([0-9] rule2_1) | [0-9]) """ grammar = _ebnf_to_grammar_no_normalization(before) @@ -344,26 +354,35 @@ def test_repetition_range(): """ expected = """root ::= ((a b c d e f g)) -a ::= (("a" a_1)) -b ::= ((b_5 b_1)) -c ::= ((c_1)) -d ::= ((d_1)) -e ::= (("ee" e_1)) -f ::= (("fff")) -g ::= (()) -a_1 ::= ("" | ("a")) -b_1 ::= ("" | (b_1_1 b_2)) -b_2 ::= ("" | (b_2_1 b_3)) -b_3 ::= ("" | (b_3_1 b_4)) -b_4 ::= ("" | (a) | ("b")) -c_1 ::= ("" | ("c" c_2)) -c_2 ::= ("" | ("c")) -d_1 ::= ("" | ("d" d_1)) -e_1 ::= ("" | ("e" e_1)) -b_5 ::= ((a) | ("b")) -b_1_1 ::= ((a) | ("b")) -b_2_1 ::= ((a) | ("b")) -b_3_1 ::= ((a) | ("b")) +a ::= ((a_1_xgrammar_repetition_context a_1_xgrammar_repetition_context_1)) +b ::= ((b_1_xgrammar_repetition_context{0, 1} b_1_xgrammar_repetition_context_1 b_1_xgrammar_repetition_context_2 b_1_xgrammar_repetition_context_3 b_1_xgrammar_repetition_context_4)) +c ::= ((c_1_xgrammar_repetition_context c_1_xgrammar_repetition_context_1)) +d ::= ((d_1_xgrammar_repetition_context{0, 2147483643} d_1_xgrammar_repetition_context_1 d_1_xgrammar_repetition_context_2 d_1_xgrammar_repetition_context_3 d_1_xgrammar_repetition_context_4)) +e ::= ((e_1_xgrammar_repetition_context{0, 2147483643} e_1_xgrammar_repetition_context_1 e_1_xgrammar_repetition_context_2 e_1_xgrammar_repetition_context_3 e_1_xgrammar_repetition_context_4)) +f ::= ((f_1_xgrammar_repetition_context f_1_xgrammar_repetition_context_1 f_1_xgrammar_repetition_context_2)) +g ::= ("") +a_1_xgrammar_repetition_context ::= ("" | ("a")) (=(a_1_xgrammar_repetition_context_1)) +a_1_xgrammar_repetition_context_1 ::= (("a")) +b_1_xgrammar_repetition_context ::= ((a) | ("b")) (=(b_1_xgrammar_repetition_context_1 b_1_xgrammar_repetition_context_2 b_1_xgrammar_repetition_context_3 b_1_xgrammar_repetition_context_4)) +b_1_xgrammar_repetition_context_1 ::= ("" | (a) | ("b")) (=(b_1_xgrammar_repetition_context_2 b_1_xgrammar_repetition_context_3 b_1_xgrammar_repetition_context_4)) +b_1_xgrammar_repetition_context_2 ::= ("" | (a) | ("b")) (=(b_1_xgrammar_repetition_context_3 b_1_xgrammar_repetition_context_4)) +b_1_xgrammar_repetition_context_3 ::= ("" | (a) | ("b")) (=(b_1_xgrammar_repetition_context_4)) +b_1_xgrammar_repetition_context_4 ::= ((a) | ("b")) +c_1_xgrammar_repetition_context ::= ("" | ("c")) (=(c_1_xgrammar_repetition_context_1)) +c_1_xgrammar_repetition_context_1 ::= ("" | ("c")) +d_1_xgrammar_repetition_context ::= (("d")) (=(d_1_xgrammar_repetition_context_1 d_1_xgrammar_repetition_context_2 d_1_xgrammar_repetition_context_3 d_1_xgrammar_repetition_context_4)) +d_1_xgrammar_repetition_context_1 ::= ("" | ("d")) (=(d_1_xgrammar_repetition_context_2 d_1_xgrammar_repetition_context_3 d_1_xgrammar_repetition_context_4)) +d_1_xgrammar_repetition_context_2 ::= ("" | ("d")) (=(d_1_xgrammar_repetition_context_3 d_1_xgrammar_repetition_context_4)) +d_1_xgrammar_repetition_context_3 ::= ("" | ("d")) (=(d_1_xgrammar_repetition_context_4)) +d_1_xgrammar_repetition_context_4 ::= ("" | ("d")) +e_1_xgrammar_repetition_context ::= (("e")) (=(e_1_xgrammar_repetition_context_1 e_1_xgrammar_repetition_context_2 e_1_xgrammar_repetition_context_3 e_1_xgrammar_repetition_context_4)) +e_1_xgrammar_repetition_context_1 ::= ("" | ("e")) (=(e_1_xgrammar_repetition_context_2 e_1_xgrammar_repetition_context_3 e_1_xgrammar_repetition_context_4)) +e_1_xgrammar_repetition_context_2 ::= ("" | ("e")) (=(e_1_xgrammar_repetition_context_3 e_1_xgrammar_repetition_context_4)) +e_1_xgrammar_repetition_context_3 ::= (("e")) (=(e_1_xgrammar_repetition_context_4)) +e_1_xgrammar_repetition_context_4 ::= (("e")) +f_1_xgrammar_repetition_context ::= (("f")) (=(f_1_xgrammar_repetition_context_1 f_1_xgrammar_repetition_context_2)) +f_1_xgrammar_repetition_context_1 ::= (("f")) (=(f_1_xgrammar_repetition_context_2)) +f_1_xgrammar_repetition_context_2 ::= (("f")) """ grammar = _ebnf_to_grammar_no_normalization(before)
tests/python/test_serialization.py+4 −0 modified@@ -60,8 +60,10 @@ def test_serialize_grammar(): "allow_empty_rule_ids": [], "complete_fsm": None, "per_rule_fsms": [], + "exact_lookahead": [], "__VERSION__": "v4", } + print(json.loads(serialized)) # For debugging purposes assert json.loads(serialized) == expected_json @@ -79,6 +81,7 @@ def test_serialize_grammar_exception(): "allow_empty_rule_ids": [], "complete_fsm": None, "per_rule_fsms": [], + "exact_lookahead": [], "__VERSION__": "v4", } @@ -202,6 +205,7 @@ def test_serialize_compiled_grammar(): "root_rule_id": 1, "allow_empty_rule_ids": [0], "complete_fsm": {"data_": [], "indptr_": [0]}, + "exact_lookahead": [], "per_rule_fsms": [None, None], }, "tokenizer_metadata": {
Vulnerability mechanics
Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References
4- github.com/advisories/GHSA-9q5r-wfvf-rr7fghsaADVISORY
- nvd.nist.gov/vuln/detail/CVE-2025-58446ghsaADVISORY
- github.com/mlc-ai/xgrammar/commit/ced69c3ad2f8f61b516cc278a342e7c644383e27ghsax_refsource_MISCWEB
- github.com/mlc-ai/xgrammar/security/advisories/GHSA-9q5r-wfvf-rr7fghsax_refsource_CONFIRMWEB
News mentions
0No linked articles in our index yet.