VYPR
Moderate severity · NVD Advisory · Published Mar 28, 2023 · Updated Feb 18, 2025

Quadratic runtime when parsing Markdown in comrak

CVE-2023-28626

Description

Comrak is a CommonMark and GFM compatible Markdown parser and renderer written in Rust. A range of quadratic parsing issues is present in Comrak. These can be used to craft denial-of-service attacks on services that use Comrak to parse Markdown. This issue has been addressed in version 0.17.0. Users are advised to upgrade. There are no known workarounds for this vulnerability. This issue is also tracked as GHSL-2023-047.

AI Insight

LLM-synthesized narrative grounded in this CVE's description and references.

The Comrak Markdown parser contains multiple quadratic parsing issues that can lead to denial of service.

Overview

Comrak, a CommonMark and GitHub Flavored Markdown parser and renderer written in Rust [1], contains a range of quadratic parsing issues that can be exploited to cause a denial of service (DoS). These vulnerabilities are tracked as GHSL-2023-047 [2] and stem from inefficient algorithms in the parser, whose processing time grows quadratically with the length of crafted input.
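To make "quadratic" concrete, the following is a toy sketch, not Comrak's code. It assumes a made-up matcher that, for every `*`, searches backwards for a `[` opener; the function names `naive_steps` and `bounded_steps` are hypothetical. The first version rescans everything it has already seen, while the second remembers a lower bound below which no opener exists, which is the same general idea the `openers_bottom` comment in the patch below describes.

```rust
/// Toy matcher: for every `*`, walk backwards over all earlier bytes looking
/// for a `[` opener. Returns the number of bytes inspected.
fn naive_steps(input: &[u8]) -> u64 {
    let mut steps = 0;
    for (i, &byte) in input.iter().enumerate() {
        if byte != b'*' {
            continue;
        }
        // Rescan the whole prefix every time: O(n) work per closer.
        for &prev in input[..i].iter().rev() {
            steps += 1;
            if prev == b'[' {
                break;
            }
        }
    }
    steps
}

/// Same search, but it records the lowest index still worth scanning.
fn bounded_steps(input: &[u8]) -> u64 {
    let mut steps = 0;
    // Invariant: no opener exists at any index below `bottom`.
    let mut bottom = 0;
    for (i, &byte) in input.iter().enumerate() {
        if byte != b'*' {
            continue;
        }
        let mut found = false;
        for &prev in input[bottom..i].iter().rev() {
            steps += 1;
            if prev == b'[' {
                found = true;
                break;
            }
        }
        if !found {
            // Nothing in bottom..i matched, so later searches can stop here.
            bottom = i;
        }
    }
    steps
}
```

On an input of 1,000 asterisks the naive version inspects 499,500 bytes and the bounded version 999; doubling the input roughly quadruples the first count and only doubles the second.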

Exploitation

An attacker can deliver a specially crafted Markdown document that triggers the quadratic behavior. The exact input vectors are not detailed in the public advisories, but the issue is similar to known problems in other Markdown parsers, such as quadratic behavior when parsing smart quotes [3]. No authentication or specific network position is required beyond the ability to submit Markdown content to a service using Comrak.
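One way a service operator might check whether a parser version scales badly is a doubling test: measure the cost at sizes n and 2n and take the base-2 logarithm of the ratio. The sketch below is a generic harness under that assumption; `growth_exponent` and `time_parse` are hypothetical names, and the input generator is supplied by the caller because the advisories do not publish the triggering inputs.

```rust
use std::time::Instant;

/// Estimates k in cost(n) ~ n^k by comparing the cost at n and at 2n.
/// A result near 1 suggests linear growth, near 2 quadratic growth.
fn growth_exponent(mut cost: impl FnMut(usize) -> f64, n: usize) -> f64 {
    let small = cost(n);
    let large = cost(2 * n);
    (large / small).log2()
}

/// Wall-clock seconds `parse` spends on the input produced by `make_input(n)`.
/// Both closures are placeholders chosen by the caller.
fn time_parse(
    parse: impl Fn(&str),
    make_input: impl Fn(usize) -> String,
    n: usize,
) -> f64 {
    let input = make_input(n);
    let start = Instant::now();
    parse(&input);
    start.elapsed().as_secs_f64()
}
```

With Comrak as a dependency, `parse` could be a closure that calls `comrak::markdown_to_html` with default options, and `growth_exponent` could be given `|n| time_parse(&parse, &make_input, n)`. Timings are noisy, so repeated runs at sizes large enough to dominate startup cost give a more trustworthy estimate.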

Impact

Successful exploitation allows an attacker to consume excessive CPU resources on the server, leading to a denial-of-service condition. This can degrade or completely interrupt service availability for legitimate users. The vulnerability does not lead to code execution or data disclosure [2].

Mitigation

The vulnerability has been fixed in Comrak version 0.17.0 [2]. Users are advised to upgrade to this version or later. There are no known workarounds. As of the publication date, the CVE is not listed on CISA's Known Exploited Vulnerabilities (KEV) catalog [2].

AI Insight generated on May 20, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.

Affected packages

Versions sourced from the GitHub Security Advisory.

Package              Affected versions   Patched versions
comrak (crates.io)   < 0.17.0            0.17.0

Affected products

2

Patches

1
ce795b7f471b

Merge pull request from GHSA-8hqf-xjwp-p67v

https://github.com/kivikakk/comrak · Asherah Connor · Mar 27, 2023 · via ghsa
19 files changed · +23345 -627
  • benches/progit.rs · +2 -0 · modified
    @@ -1,5 +1,7 @@
     #![feature(test)]
     
    +extern crate test;
    +
     use comrak::{format_html, parse_document, Arena, ComrakOptions};
     use test::Bencher;
     
    
  • Cargo.lock · +0 -117 · modified
    @@ -59,15 +59,6 @@ version = "1.3.2"
     source = "registry+https://github.com/rust-lang/crates.io-index"
     checksum = "bef38d45163c2f1dde094a7dfd33ccf595c92905c8f8f4fdc18d06fb1037718a"
     
    -[[package]]
    -name = "block-buffer"
    -version = "0.10.3"
    -source = "registry+https://github.com/rust-lang/crates.io-index"
    -checksum = "69cce20737498f97b993470a6e536b8523f0af7892a4f928cceb1ac5e52ebe7e"
    -dependencies = [
    - "generic-array",
    -]
    -
     [[package]]
     name = "byteorder"
     version = "1.4.3"
    @@ -134,8 +125,6 @@ dependencies = [
      "memchr",
      "ntest",
      "once_cell",
    - "pest",
    - "pest_derive",
      "propfuzz",
      "regex",
      "shell-words",
    @@ -146,15 +135,6 @@ dependencies = [
      "xdg",
     ]
     
    -[[package]]
    -name = "cpufeatures"
    -version = "0.2.5"
    -source = "registry+https://github.com/rust-lang/crates.io-index"
    -checksum = "28d997bd5e24a5928dd43e46dc529867e207907fe0b239c3477d924f7f2ca320"
    -dependencies = [
    - "libc",
    -]
    -
     [[package]]
     name = "crc32fast"
     version = "1.3.2"
    @@ -164,32 +144,12 @@ dependencies = [
      "cfg-if",
     ]
     
    -[[package]]
    -name = "crypto-common"
    -version = "0.1.6"
    -source = "registry+https://github.com/rust-lang/crates.io-index"
    -checksum = "1bfb12502f3fc46cca1bb51ac28df9d618d813cdc3d2f25b9fe775a34af26bb3"
    -dependencies = [
    - "generic-array",
    - "typenum",
    -]
    -
     [[package]]
     name = "deunicode"
     version = "0.4.3"
     source = "registry+https://github.com/rust-lang/crates.io-index"
     checksum = "850878694b7933ca4c9569d30a34b55031b9b139ee1fc7b94a527c4ef960d690"
     
    -[[package]]
    -name = "digest"
    -version = "0.10.6"
    -source = "registry+https://github.com/rust-lang/crates.io-index"
    -checksum = "8168378f4e5023e7218c89c891c0fd8ecdb5e5e4f18cb78f38cf245dd021e76f"
    -dependencies = [
    - "block-buffer",
    - "crypto-common",
    -]
    -
     [[package]]
     name = "dirs"
     version = "4.0.0"
    @@ -281,16 +241,6 @@ version = "1.0.7"
     source = "registry+https://github.com/rust-lang/crates.io-index"
     checksum = "3f9eec918d3f24069decb9af1554cad7c880e2da24a9afd88aca000531ab82c1"
     
    -[[package]]
    -name = "generic-array"
    -version = "0.14.6"
    -source = "registry+https://github.com/rust-lang/crates.io-index"
    -checksum = "bff49e947297f3312447abdca79f45f4738097cc82b06e72054d2223f601f1b9"
    -dependencies = [
    - "typenum",
    - "version_check",
    -]
    -
     [[package]]
     name = "getrandom"
     version = "0.1.16"
    @@ -505,50 +455,6 @@ version = "6.4.1"
     source = "registry+https://github.com/rust-lang/crates.io-index"
     checksum = "9b7820b9daea5457c9f21c69448905d723fbd21136ccf521748f23fd49e723ee"
     
    -[[package]]
    -name = "pest"
    -version = "2.5.2"
    -source = "registry+https://github.com/rust-lang/crates.io-index"
    -checksum = "0f6e86fb9e7026527a0d46bc308b841d73170ef8f443e1807f6ef88526a816d4"
    -dependencies = [
    - "thiserror",
    - "ucd-trie",
    -]
    -
    -[[package]]
    -name = "pest_derive"
    -version = "2.5.2"
    -source = "registry+https://github.com/rust-lang/crates.io-index"
    -checksum = "96504449aa860c8dcde14f9fba5c58dc6658688ca1fe363589d6327b8662c603"
    -dependencies = [
    - "pest",
    - "pest_generator",
    -]
    -
    -[[package]]
    -name = "pest_generator"
    -version = "2.5.2"
    -source = "registry+https://github.com/rust-lang/crates.io-index"
    -checksum = "798e0220d1111ae63d66cb66a5dcb3fc2d986d520b98e49e1852bfdb11d7c5e7"
    -dependencies = [
    - "pest",
    - "pest_meta",
    - "proc-macro2",
    - "quote",
    - "syn",
    -]
    -
    -[[package]]
    -name = "pest_meta"
    -version = "2.5.2"
    -source = "registry+https://github.com/rust-lang/crates.io-index"
    -checksum = "984298b75898e30a843e278a9f2452c31e349a073a0ce6fd950a12a74464e065"
    -dependencies = [
    - "once_cell",
    - "pest",
    - "sha1",
    -]
    -
     [[package]]
     name = "phf"
     version = "0.11.1"
    @@ -864,17 +770,6 @@ dependencies = [
      "serde",
     ]
     
    -[[package]]
    -name = "sha1"
    -version = "0.10.5"
    -source = "registry+https://github.com/rust-lang/crates.io-index"
    -checksum = "f04293dc80c3993519f2d7f6f511707ee7094fe0c6d3406feb330cdb3540eba3"
    -dependencies = [
    - "cfg-if",
    - "cpufeatures",
    - "digest",
    -]
    -
     [[package]]
     name = "shell-words"
     version = "1.1.0"
    @@ -1032,18 +927,6 @@ version = "2.0.1"
     source = "registry+https://github.com/rust-lang/crates.io-index"
     checksum = "0685c84d5d54d1c26f7d3eb96cd41550adb97baed141a761cf335d3d33bcd0ae"
     
    -[[package]]
    -name = "typenum"
    -version = "1.16.0"
    -source = "registry+https://github.com/rust-lang/crates.io-index"
    -checksum = "497961ef93d974e23eb6f433eb5fe1b7930b659f06d12dec6fc44a8f554c0bba"
    -
    -[[package]]
    -name = "ucd-trie"
    -version = "0.1.5"
    -source = "registry+https://github.com/rust-lang/crates.io-index"
    -checksum = "9e79c4d996edb816c91e4308506774452e55e95c3c9de07b6729e17e15a5ef81"
    -
     [[package]]
     name = "unicode-ident"
     version = "1.0.6"
    
  • Cargo.toml · +0 -2 · modified
    @@ -36,8 +36,6 @@ once_cell = "1.13.0"
     entities = "1.0.1"
     unicode_categories = "0.1.1"
     memchr = "2"
    -pest = "2"
    -pest_derive = "2"
     shell-words = { version = "1.0", optional = true }
     slug = "0.1.4"
     emojis = { version = "0.5.2", optional = true }
    
  • .gitattributes · +2 -0 · added
    @@ -0,0 +1,2 @@
    +src/scanners.rs linguist-generated
    +src/scanners.re linguist-language=Rust
    
  • Makefile · +2 -3 · modified
    @@ -1,6 +1,5 @@
    -docker:
    -	docker build -t comrak $(CURDIR)/script
    -	docker run --privileged -t -i -v $(CURDIR):/src/comrak -v $(HOME)/.cargo/registry:/root/.cargo/registry -w /src/comrak comrak /bin/bash
    +src/scanners.rs: src/scanners.re
    +	re2rust -W -Werror -i --no-generation-date -o $@ $<
     
     bench:
     	cargo build --release
    
  • script/cibuild · +20 -7 · modified
    @@ -6,17 +6,30 @@ if command -v apt-get &>/dev/null; then
     	sudo apt-get install python3
     fi
     
    -cargo build --verbose
    -cargo build --verbose --examples
     
     if [ x"$SPEC" = "xtrue" ]; then
    +	cargo build --verbose --release
    +
     	cd vendor/cmark-gfm/test
    -	python3 spec_tests.py --program='../../../target/debug/comrak --syntax-highlighting none'
    -	python3 spec_tests.py --spec extensions.txt --program='../../../target/debug/comrak --syntax-highlighting none' --extensions "table strikethrough autolink tagfilter footnotes tasklist"
    -	python3 roundtrip_tests.py --program='../../../target/debug/comrak --syntax-highlighting none'
    -	python3 spec_tests.py --no-normalize --spec regression.txt --program='../../../target/debug/comrak --syntax-highlighting none'
    -	python3 entity_tests.py --program='../../../target/debug/comrak --syntax-highlighting none'
    +
    +	PROGRAM_ARG="--program=../../../target/release/comrak --syntax-highlighting none"
    +
    +	python3 spec_tests.py --no-normalize --spec spec.txt "$PROGRAM_ARG"
    +	python3 pathological_tests.py "$PROGRAM_ARG"
    +	python3 roundtrip_tests.py --spec spec.txt "$PROGRAM_ARG"
    +	python3 entity_tests.py "$PROGRAM_ARG"
    +	python3 spec_tests.py --no-normalize --spec smart_punct.txt "$PROGRAM_ARG --smart"
    +
    +	python3 spec_tests.py --no-normalize --spec extensions.txt "$PROGRAM_ARG" --extensions "table strikethrough autolink tagfilter footnotes tasklist"
    +	python3 roundtrip_tests.py --spec extensions.txt "$PROGRAM_ARG" --extensions "table strikethrough autolink tagfilter footnotes tasklist"
    +	# python3 roundtrip_tests.py --spec extensions-table-prefer-style-attributes.txt "$PROGRAM_ARG --table-prefer-style-attributes" --extensions "table strikethrough autolink tagfilter footnotes tasklist"
    +	python3 roundtrip_tests.py --spec extensions-full-info-string.txt "$PROGRAM_ARG --full-info-string"
    +
    +	python3 spec_tests.py --no-normalize --spec regression.txt "$PROGRAM_ARG"
     else
    +	cargo build --verbose
    +	cargo build --verbose --examples
    +
     	cargo test --verbose
     	cargo run --example sample
     fi
    
  • src/html.rs · +35 -18 · modified
    @@ -2,7 +2,6 @@ use crate::ctype::isspace;
     use crate::nodes::{AstNode, ListType, NodeCode, NodeValue, TableAlignment};
     use crate::parser::{ComrakOptions, ComrakPlugins};
     use crate::scanners;
    -use crate::strings::build_opening_tag;
     use once_cell::sync::Lazy;
     use regex::Regex;
     use std::borrow::Cow;
    @@ -243,6 +242,40 @@ fn dangerous_url(input: &[u8]) -> bool {
         scanners::dangerous_url(input).is_some()
     }
     
    +fn escape(output: &mut dyn Write, buffer: &[u8]) -> io::Result<()> {
    +    let mut offset = 0;
    +    for (i, &byte) in buffer.iter().enumerate() {
    +        if NEEDS_ESCAPED[byte as usize] {
    +            let esc: &[u8] = match byte {
    +                b'"' => b"&quot;",
    +                b'&' => b"&amp;",
    +                b'<' => b"&lt;",
    +                b'>' => b"&gt;",
    +                _ => unreachable!(),
    +            };
    +            output.write_all(&buffer[offset..i])?;
    +            output.write_all(esc)?;
    +            offset = i + 1;
    +        }
    +    }
    +    output.write_all(&buffer[offset..])?;
    +    Ok(())
    +}
    +
    +pub fn build_opening_tag(tag: &str, attributes: &HashMap<String, String>) -> String {
    +    let mut out = Vec::with_capacity(80);
    +    write!(out, "<{}", tag).unwrap();
    +
    +    for (attr, val) in attributes {
    +        write!(out, " {}=\"", attr).unwrap();
    +        escape(&mut out, val.as_bytes()).unwrap();
    +        write!(out, "\"").unwrap()
    +    }
    +
    +    write!(out, ">").unwrap();
    +    unsafe { String::from_utf8_unchecked(out) }
    +}
    +
     impl<'o> HtmlFormatter<'o> {
         fn new(
             options: &'o ComrakOptions,
    @@ -267,23 +300,7 @@ impl<'o> HtmlFormatter<'o> {
         }
     
         fn escape(&mut self, buffer: &[u8]) -> io::Result<()> {
    -        let mut offset = 0;
    -        for (i, &byte) in buffer.iter().enumerate() {
    -            if NEEDS_ESCAPED[byte as usize] {
    -                let esc: &[u8] = match byte {
    -                    b'"' => b"&quot;",
    -                    b'&' => b"&amp;",
    -                    b'<' => b"&lt;",
    -                    b'>' => b"&gt;",
    -                    _ => unreachable!(),
    -                };
    -                self.output.write_all(&buffer[offset..i])?;
    -                self.output.write_all(esc)?;
    -                offset = i + 1;
    -            }
    -        }
    -        self.output.write_all(&buffer[offset..])?;
    -        Ok(())
    +        escape(&mut self.output, buffer)
         }
     
         fn escape_href(&mut self, buffer: &[u8]) -> io::Result<()> {
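The hunk above hoists `escape` out of `HtmlFormatter` into a free function that takes `&mut dyn Write`, so the same routine can serve both the formatter and `build_opening_tag`. The following standalone sketch illustrates that pattern under simplifying assumptions: it matches on the byte directly instead of consulting the `NEEDS_ESCAPED` lookup table, and `escape_to_string` is a hypothetical helper added only for demonstration.

```rust
use std::io::{self, Write};

/// Writes `buffer` to `output`, replacing the four HTML-special bytes.
/// Simplified: matches on the byte instead of using a lookup table.
fn escape(output: &mut dyn Write, buffer: &[u8]) -> io::Result<()> {
    let mut offset = 0;
    for (i, &byte) in buffer.iter().enumerate() {
        let esc: &[u8] = match byte {
            b'"' => b"&quot;",
            b'&' => b"&amp;",
            b'<' => b"&lt;",
            b'>' => b"&gt;",
            _ => continue,
        };
        // Flush the unescaped run before this byte, then the entity.
        output.write_all(&buffer[offset..i])?;
        output.write_all(esc)?;
        offset = i + 1;
    }
    output.write_all(&buffer[offset..])
}

/// Hypothetical convenience wrapper: escape into an in-memory buffer.
fn escape_to_string(text: &str) -> String {
    let mut out = Vec::new();
    escape(&mut out, text.as_bytes()).expect("writing to a Vec cannot fail");
    // Only ASCII bytes are replaced, and only with ASCII, so UTF-8 stays valid.
    String::from_utf8(out).expect("escaping preserves UTF-8 validity")
}
```

Because the sink is any `Write` implementor, the same function can target a `Vec<u8>`, a file, or a formatter's output stream without duplication.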
    
  • src/lexer.pest · +0 -75 · removed
    @@ -1,75 +0,0 @@
    -atx_heading_start = { "#"{1, 6} ~ (" " | "\t" | "\r" | "\n") }
    -
    -open_code_fence_backtick = _{ open_code_fence_backtick_match ~ (!("`" | "\r" | "\n" | "\x00") ~ ANY)* }
    -open_code_fence_backtick_match = { "`"{3,} }
    -open_code_fence_tilde = _{ open_code_fence_tilde_match ~ (!("\r" | "\n" | "\x00") ~ ANY)* }
    -open_code_fence_tilde_match = { "~"{3,} }
    -open_code_fence = _{ (open_code_fence_backtick | open_code_fence_tilde) ~ ("\r" | "\n") }
    -
    -close_code_fence = _{ close_code_fence_match ~ ("\t" | " ")* ~ ("\r" | "\n") }
    -close_code_fence_match = { "`"{3,} | "~"{3,} }
    -
    -html_block_start_1 = { "<" ~ ("script" | "pre" | "style") ~ (" " | "\t" | "\x0b" | "\x0c" | "\r" | "\n" | ">") }
    -html_block_start_4 = { "<!" ~ 'A'..'Z' }
    -html_block_start_6 = { "<" ~ "/"? ~ ("address" | "article" | "aside" | "base" | "basefont" | "blockquote" | "body" | "caption" | "center" | "col" | "colgroup" | "dd" | "details" | "dialog" | "dir" | "div" | "dl" | "dt" | "fieldset" | "figcaption" | "figure" | "footer" | "form" | "frame" | "frameset" | "h1" | "h2" | "h3" | "h4" | "h5" | "h6" | "head" | "header" | "hr" | "html" | "iframe" | "legend" | "li" | "link" | "main" | "menu" | "menuitem" | "nav" | "noframes" | "ol" | "optgroup" | "option" | "p" | "param" | "section" | "source" | "title" | "summary" | "table" | "tbody" | "td" | "tfoot" | "th" | "thead" | "title" | "tr" | "track" | "ul") ~ (" " | "\t" | "\x0b" | "\x0c" | "\r" | "\n" | "/>" | ">") }
    -
    -space_char = _{ " " | "\t" | "\x0b" | "\x0c" | "\r" | "\n" }
    -tag_name = _{ ('A'..'Z' | 'a'..'z') ~ ('A'..'Z' | 'a'..'z' | '0'..'9' | "-")* }
    -close_tag = _{ "/" ~ tag_name ~ space_char* ~ ">" }
    -attribute_name = _{ ('a'..'z' | 'A'..'Z' | "_" | ":") ~ ('a'..'z' | 'A'..'Z' | '0'..'9' | ":" | "." | "_" | "-")* }
    -attribute_value = _{ (!(" " | "\t" | "\r" | "\n" | "\x0b" | "\x0c" | "\"" | "'" | "=" | "<" | ">" | "`" | "\x00") ~ ANY)+ | ("'" ~ (!("'" | "\x00") ~ ANY)* ~ "'") | ("\"" ~ (!("\"" | "\x00") ~ ANY)* ~ "\"") }
    -attribute_value_spec = _{ space_char* ~ "=" ~ space_char* ~ attribute_value }
    -attribute = _{ space_char+ ~ attribute_name ~ attribute_value_spec? }
    -open_tag = _{ tag_name ~ attribute* ~ space_char* ~ "/"? ~ ">" }
    -html_comment = _{
    -    "!--" ~ (
    -        ">" |
    -        "->" |
    -        (
    -            (
    -                (!("\x00" | "-") ~ ANY)+
    -                |
    -                ("-" ~ !("\x00" | "-") ~ ANY)
    -                |
    -                ("--" ~ !("\x00" | ">") ~ ANY)
    -            )* ~
    -            "-->"
    -        )
    -    )
    -}
    -processing_instruction = _{ "?" ~ ((!("?" | ">" | "\x00") ~ ANY)+ | "?" ~ !(">" | "\x00") ~ ANY | ">")* ~ "?>" }
    -declaration = _{ "!" ~ 'A'..'Z'+ ~ space_char+ ~ (!(">" | "\x00") ~ ANY)* ~ ">" }
    -cdata = _{ "![CDATA[" ~ ((!("]" | "\x00") ~ ANY)+ | "]" ~ (!("]" | "\x00") ~ ANY) | "]]" ~ (!(">" | "\x00") ~ ANY))* ~ "]]>" }
    -html_tag = { open_tag | close_tag | html_comment | processing_instruction | declaration | cdata }
    -
    -html_block_start_7 = { "<" ~ (open_tag | close_tag) ~ ("\t" | "\x0c" | " ")* ~ ("\r" | "\n") }
    -
    -setext_heading_line = { ("="+ | "-"+) ~ (" " | "\t")* ~ ("\r" | "\n") }
    -thematic_break = { (("*" ~ (" " | "\t")*){3,} | ("_" ~ (" " | "\t")*){3,} | ("-" ~ (" " | "\t")*){3,}) ~ (" " | "\t")* ~ ("\r" | "\n") }
    -
    -footnote_definition = { "[^" ~ (!("]" | "\r" | "\n" | "\x00" | "\t") ~ ANY)+ ~ "]:" ~ (" " | "\t")* }
    -
    -scheme = _{ ('A'..'Z' | 'a'..'z') ~ ('A'..'Z' | 'a'..'z' | '0'..'9' | "." | "+" | "-"){1,31} }
    -
    -scheme_rule = { scheme ~ ":" }
    -
    -autolink_uri = { scheme ~ ":" ~ (!('\x00'..'\x20' | "<" | ">") ~ ANY)* ~ ">" }
    -autolink_email = { ('a'..'z' | 'A'..'Z' | '0'..'9' | "." | "!" | "#" | "$" | "%" | "&" | "'" | "*" | "+" | "/" | "=" | "?" | "^" | "_" | "`" | "{" | "|" | "}" | "~" | "-")+ ~ "@" ~ ('a'..'z' | 'A'..'Z' | '0'..'9') ~ (('a'..'z' | 'A'..'Z' | '0'..'9' | "-"){0,61} ~ ('a'..'z' | 'A'..'Z' | '0'..'9')?)? ~ ("." ~ (('a'..'z' | 'A'..'Z' | '0'..'9' | "-"){0,61} ~ ('a'..'z' | 'A'..'Z' | '0'..'9')?)?)* ~ ">" }
    -
    -shortcode_rule = { ":" ~ ('A'..'Z' | 'a'..'z' | "-" | "_")+ ~ ":" }
    -
    -spacechars = { space_char+ }
    -
    -escaped_char = _{ "\\" ~ ANY }
    -link_title = { "\"" ~ (escaped_char | (!("\"" | "\x00") ~ ANY))* ~ "\"" | "'" ~ (escaped_char | (!("'" | "\x00") ~ ANY))* ~ "'" | "(" ~ (escaped_char | (!("(" | ")" | "\x00") ~ ANY))* ~ ")" }
    -
    -table_spacechar = _{ " " | "\t" | "\x0b" | "\x0c" }
    -table_newline = _{ "\r"? ~ "\n" }
    -table_marker = _{ table_spacechar* ~ ":"? ~ "-"+ ~ ":"? ~ table_spacechar* }
    -table_cell = { ( escaped_char | !("|" | "\r" | "\n") ~ ANY)* }
    -
    -table_start = { "|"? ~ table_marker ~ ("|" ~ table_marker)* ~ "|"? ~ table_spacechar* ~ table_newline }
    -table_cell_end = { "|" ~ table_spacechar* ~ table_newline? }
    -table_row_end = { table_spacechar* ~ table_newline }
    -
    -dangerous_url = { ^"data:" ~ !(^"image/" ~ (^"png" | ^"gif" | ^"jpeg" | ^"webp")) | ^"javascript:" | ^"vbscript:" | ^"file:" }
    
  • src/nodes.rs · +2 -0 · modified
    @@ -387,6 +387,7 @@ pub struct Ast {
         pub(crate) content: Vec<u8>,
         pub(crate) open: bool,
         pub(crate) last_line_blank: bool,
    +    pub(crate) table_visited: bool,
     }
     
     impl Ast {
    @@ -398,6 +399,7 @@ impl Ast {
                 start_line: 0,
                 open: true,
                 last_line_blank: false,
    +            table_visited: false,
             }
         }
     }
    
  • src/parser/autolink.rs · +7 -11 · modified
    @@ -120,7 +120,7 @@ fn check_domain(data: &[u8], allow_short: bool) -> Option<usize> {
             }
         }
     
    -    if uscore1 > 0 || uscore2 > 0 {
    +    if (uscore1 > 0 || uscore2 > 0) && np <= 10 {
             None
         } else if allow_short || np > 0 {
             Some(data.len())
    @@ -255,7 +255,6 @@ fn email_match<'a>(
         let size = contents.len();
     
         let mut rewind = 0;
    -    let mut ns = 0;
     
         while rewind < i {
             let c = contents[i - rewind - 1];
    @@ -265,19 +264,14 @@ fn email_match<'a>(
                 continue;
             }
     
    -        if c == b'/' {
    -            ns += 1;
    -        }
    -
             break;
         }
     
    -    if rewind == 0 || ns > 0 {
    +    if rewind == 0 {
             return None;
         }
     
    -    let mut link_end = 0;
    -    let mut nb = 0;
    +    let mut link_end = 1;
         let mut np = 0;
     
         while link_end < size - i {
    @@ -286,7 +280,7 @@ fn email_match<'a>(
             if isalnum(c) {
                 // empty
             } else if c == b'@' {
    -            nb += 1;
    +            return None;
             } else if c == b'.' && link_end < size - i - 1 && isalnum(contents[i + link_end + 1]) {
                 np += 1;
             } else if c != b'-' && c != b'_' {
    @@ -297,14 +291,16 @@ fn email_match<'a>(
         }
     
         if link_end < 2
    -        || nb != 1
             || np == 0
             || (!isalpha(contents[i + link_end - 1]) && contents[i + link_end - 1] != b'.')
         {
             return None;
         }
     
         link_end = autolink_delim(&contents[i..], link_end);
    +    if link_end == 0 {
    +        return None;
    +    }
     
         let mut url = b"mailto:".to_vec();
         url.extend_from_slice(&contents[i - rewind..link_end + i]);
    
  • src/parser/inlines.rs · +151 -101 · modified
    @@ -24,13 +24,15 @@ pub struct Subject<'a: 'd, 'r, 'o, 'd, 'i, 'c: 'subj, 'subj> {
         options: &'o ComrakOptions,
         pub input: &'i [u8],
         pub pos: usize,
    +    flags: Flags,
         pub refmap: &'r mut HashMap<Vec<u8>, Reference>,
         delimiter_arena: &'d Arena<Delimiter<'a, 'd>>,
         last_delimiter: Option<&'d Delimiter<'a, 'd>>,
    -    brackets: Vec<Bracket<'a, 'd>>,
    +    brackets: Vec<Bracket<'a>>,
         within_brackets: bool,
         pub backticks: [usize; MAXBACKTICKS + 1],
         pub scanned_for_backticks: bool,
    +    no_link_openers: bool,
         special_chars: [bool; 256],
         skip_chars: [bool; 256],
         smart_chars: [bool; 256],
    @@ -40,8 +42,17 @@ pub struct Subject<'a: 'd, 'r, 'o, 'd, 'i, 'c: 'subj, 'subj> {
         callback: Option<&'subj mut Callback<'c>>,
     }
     
    +#[derive(Default)]
    +struct Flags {
    +    skip_html_cdata: bool,
    +    skip_html_declaration: bool,
    +    skip_html_pi: bool,
    +    skip_html_comment: bool,
    +}
    +
     pub struct Delimiter<'a: 'd, 'd> {
         inl: &'a AstNode<'a>,
    +    position: usize,
         length: usize,
         delim_char: u8,
         can_open: bool,
    @@ -50,12 +61,10 @@ pub struct Delimiter<'a: 'd, 'd> {
         next: Cell<Option<&'d Delimiter<'a, 'd>>>,
     }
     
    -struct Bracket<'a: 'd, 'd> {
    -    previous_delimiter: Option<&'d Delimiter<'a, 'd>>,
    +struct Bracket<'a> {
         inl_text: &'a AstNode<'a>,
         position: usize,
         image: bool,
    -    active: bool,
         bracket_after: bool,
     }
     
    @@ -73,13 +82,15 @@ impl<'a, 'r, 'o, 'd, 'i, 'c, 'subj> Subject<'a, 'r, 'o, 'd, 'i, 'c, 'subj> {
                 options,
                 input,
                 pos: 0,
    +            flags: Flags::default(),
                 refmap,
                 delimiter_arena,
                 last_delimiter: None,
                 brackets: vec![],
                 within_brackets: false,
                 backticks: [0; MAXBACKTICKS + 1],
                 scanned_for_backticks: false,
    +            no_link_openers: true,
                 special_chars: [false; 256],
                 skip_chars: [false; 256],
                 smart_chars: [false; 256],
    @@ -242,42 +253,40 @@ impl<'a, 'r, 'o, 'd, 'i, 'c, 'subj> Subject<'a, 'r, 'o, 'd, 'i, 'c, 'subj> {
         // different emphasis. Note also that "_"- and "*"-delimited regions have
         // complex rules for which can be opening and/or closing delimiters,
         // determined in `scan_delims`.
    -    pub fn process_emphasis(&mut self, stack_bottom: Option<&'d Delimiter<'a, 'd>>) {
    -        let mut closer = self.last_delimiter;
    -
    +    pub fn process_emphasis(&mut self, stack_bottom: usize) {
             // This array is an important optimization that prevents searching down
             // the stack for openers we've previously searched for and know don't
             // exist, preventing exponential blowup on pathological cases.
    -        let mut openers_bottom: [Option<&'d Delimiter<'a, 'd>>; 11] = [stack_bottom; 11];
    +        let mut openers_bottom: [usize; 11] = [stack_bottom; 11];
     
             // This is traversing the stack from the top to the bottom, setting `closer` to
             // the delimiter directly above `stack_bottom`. In the case where we are processing
             // emphasis on an entire block, `stack_bottom` is `None`, so `closer` references
             // the very bottom of the stack.
    -        while closer.is_some() && !Self::del_ref_eq(closer.unwrap().prev.get(), stack_bottom) {
    -            closer = closer.unwrap().prev.get();
    +        let mut candidate = self.last_delimiter;
    +        let mut closer: Option<&Delimiter> = None;
    +        while candidate.map_or(false, |c| c.position >= stack_bottom) {
    +            closer = candidate;
    +            candidate = candidate.unwrap().prev.get();
             }
     
    -        while closer.is_some() {
    -            if closer.unwrap().can_close {
    +        while let Some(c) = closer {
    +            if c.can_close {
                     // Each time through the outer `closer` loop we reset the opener
                     // to the element below the closer, and search down the stack
                     // for a matching opener.
     
    -                let mut opener = closer.unwrap().prev.get();
    +                let mut opener = c.prev.get();
                     let mut opener_found = false;
                     let mut mod_three_rule_invoked = false;
     
    -                let ix = match closer.unwrap().delim_char {
    +                let ix = match c.delim_char {
                         b'~' => 0,
                         b'^' => 1,
                         b'"' => 2,
                         b'\'' => 3,
                         b'_' => 4,
    -                    b'*' => {
    -                        5 + (if closer.unwrap().can_open { 3 } else { 0 })
    -                            + (closer.unwrap().length % 3)
    -                    }
    +                    b'*' => 5 + (if c.can_open { 3 } else { 0 }) + (c.length % 3),
                         _ => unreachable!(),
                     };
     
    @@ -292,10 +301,9 @@ impl<'a, 'r, 'o, 'd, 'i, 'c, 'subj> Subject<'a, 'r, 'o, 'd, 'i, 'c, 'subj> {
                     // This search short-circuits for openers we've previously
                     // failed to find, avoiding repeatedly rescanning the bottom of
                     // the stack, using the openers_bottom array.
    -                while opener.is_some() && !Self::del_ref_eq(opener, openers_bottom[ix]) {
    -                    if opener.unwrap().can_open
    -                        && opener.unwrap().delim_char == closer.unwrap().delim_char
    -                    {
    +                while opener.map_or(false, |o| o.position >= openers_bottom[ix]) {
    +                    let o = opener.unwrap();
    +                    if o.can_open && o.delim_char == c.delim_char {
                             // This is a bit convoluted; see points 9 and 10 here:
                             // http://spec.commonmark.org/0.28/#can-open-emphasis.
                             // This is to aid processing of runs like this:
    @@ -305,29 +313,28 @@ impl<'a, 'r, 'o, 'd, 'i, 'c, 'subj> Subject<'a, 'r, 'o, 'd, 'i, 'c, 'subj> {
                             // that matches the last ** or *, we need to skip it,
                             // and this algorithm ensures we do. (The sum of the
                             // lengths are a multiple of 3.)
    -                        let odd_match = (closer.unwrap().can_open || opener.unwrap().can_close)
    -                            && ((opener.unwrap().length + closer.unwrap().length) % 3 == 0)
    -                            && !(opener.unwrap().length % 3 == 0
    -                                && closer.unwrap().length % 3 == 0);
    +                        let odd_match = (c.can_open || o.can_close)
    +                            && ((o.length + c.length) % 3 == 0)
    +                            && !(o.length % 3 == 0 && c.length % 3 == 0);
                             if !odd_match {
                                 opener_found = true;
                                 break;
                             } else {
                                 mod_three_rule_invoked = true;
                             }
                         }
    -                    opener = opener.unwrap().prev.get();
    +                    opener = o.prev.get();
                     }
     
    -                let old_closer = closer;
    +                let old_c = c;
     
                     // There's a case here for every possible delimiter. If we found
                     // a matching opening delimiter for our closing delimiter, they
                     // both get passed.
    -                if closer.unwrap().delim_char == b'*'
    -                    || closer.unwrap().delim_char == b'_'
    -                    || (self.options.extension.strikethrough && closer.unwrap().delim_char == b'~')
    -                    || (self.options.extension.superscript && closer.unwrap().delim_char == b'^')
    +                if c.delim_char == b'*'
    +                    || c.delim_char == b'_'
    +                    || (self.options.extension.strikethrough && c.delim_char == b'~')
    +                    || (self.options.extension.superscript && c.delim_char == b'^')
                     {
                         if opener_found {
                             // Finally, here's the happy case where the delimiters
    @@ -343,42 +350,20 @@ impl<'a, 'r, 'o, 'd, 'i, 'c, 'subj> Subject<'a, 'r, 'o, 'd, 'i, 'c, 'subj> {
                             //
                             // In general though the closer will be the next
                             // delimiter up the stack.
    -                        closer = self.insert_emph(opener.unwrap(), closer.unwrap());
    +                        closer = self.insert_emph(opener.unwrap(), c);
                         } else {
                             // When no matching opener is found we move the closer
                             // up the stack, do some bookkeeping with old_closer
                             // (below), try again.
    -                        closer = closer.unwrap().next.get();
    +                        closer = c.next.get();
                         }
    -                } else if closer.unwrap().delim_char == b'\'' {
    -                    *closer
    -                        .unwrap()
    -                        .inl
    -                        .data
    -                        .borrow_mut()
    -                        .value
    -                        .text_mut()
    -                        .unwrap() = "’".to_string().into_bytes();
    -                    if opener_found {
    -                        *opener
    -                            .unwrap()
    -                            .inl
    -                            .data
    -                            .borrow_mut()
    -                            .value
    -                            .text_mut()
    -                            .unwrap() = "‘".to_string().into_bytes();
    -                    }
    -                    closer = closer.unwrap().next.get();
    -                } else if closer.unwrap().delim_char == b'"' {
    -                    *closer
    -                        .unwrap()
    -                        .inl
    -                        .data
    -                        .borrow_mut()
    -                        .value
    -                        .text_mut()
    -                        .unwrap() = "”".to_string().into_bytes();
    +                } else if c.delim_char == b'\'' || c.delim_char == b'"' {
    +                    *c.inl.data.borrow_mut().value.text_mut().unwrap() =
    +                        if c.delim_char == b'\'' { "’" } else { "”" }
    +                            .to_string()
    +                            .into_bytes();
    +                    closer = c.next.get();
    +
                         if opener_found {
                             *opener
                                 .unwrap()
    @@ -387,9 +372,16 @@ impl<'a, 'r, 'o, 'd, 'i, 'c, 'subj> Subject<'a, 'r, 'o, 'd, 'i, 'c, 'subj> {
                                 .borrow_mut()
                                 .value
                                 .text_mut()
    -                            .unwrap() = "“".to_string().into_bytes();
    +                            .unwrap() = if old_c.delim_char == b'\'' {
    +                            "‘"
    +                        } else {
    +                            "“"
    +                        }
    +                        .to_string()
    +                        .into_bytes();
    +                        self.remove_delimiter(opener.unwrap());
    +                        self.remove_delimiter(old_c);
                         }
    -                    closer = closer.unwrap().next.get();
                     }
     
                     // If the search for an opener was unsuccessful, then record
    @@ -398,31 +390,32 @@ impl<'a, 'r, 'o, 'd, 'i, 'c, 'subj> Subject<'a, 'r, 'o, 'd, 'i, 'c, 'subj> {
                     // same opener at the bottom of the stack later.
                     if !opener_found {
                         if !mod_three_rule_invoked {
    -                        openers_bottom[ix] = old_closer.unwrap().prev.get();
    +                        openers_bottom[ix] = old_c.position;
                         }
     
                         // Now that we've failed the `opener` search starting from
                         // `old_closer`, future opener searches will be searching it
                         // for openers - if `old_closer` can't be used as an opener
                         // then we know it's just text - remove it from the
                         // delimiter stack, leaving it in the AST as text
    -                    if !old_closer.unwrap().can_open {
    -                        self.remove_delimiter(old_closer.unwrap());
    +                    if !old_c.can_open {
    +                        self.remove_delimiter(old_c);
                         }
                     }
                 } else {
                     // Closer is !can_close. Move up the stack
    -                closer = closer.unwrap().next.get();
    +                closer = c.next.get();
                 }
             }
     
             // At this point the entire delimiter stack from `stack_bottom` up has
             // been scanned for matches, everything left is just text. Pop it all
             // off.
    -        while self.last_delimiter.is_some() && !Self::del_ref_eq(self.last_delimiter, stack_bottom)
    +        while self
    +            .last_delimiter
    +            .map_or(false, |d| d.position >= stack_bottom)
             {
    -            let last_del = self.last_delimiter.unwrap();
    -            self.remove_delimiter(last_del);
    +            self.remove_delimiter(self.last_delimiter.unwrap());
             }
         }
     
    @@ -712,7 +705,10 @@ impl<'a, 'r, 'o, 'd, 'i, 'c, 'subj> Subject<'a, 'r, 'o, 'd, 'i, 'c, 'subj> {
             } else if c == b'\'' || c == b'"' {
                 (
                     numdelims,
    -                left_flanking && !right_flanking && before_char != ']' && before_char != ')',
    +                left_flanking
    +                    && (!right_flanking || before_char == '(' || before_char == '[')
    +                    && before_char != ']'
    +                    && before_char != ')',
                     right_flanking,
                 )
             } else {
    @@ -725,6 +721,7 @@ impl<'a, 'r, 'o, 'd, 'i, 'c, 'subj> Subject<'a, 'r, 'o, 'd, 'i, 'c, 'subj> {
                 prev: Cell::new(self.last_delimiter),
                 next: Cell::new(None),
                 inl,
    +            position: self.pos,
                 length: inl.data.borrow().value.text().unwrap().len(),
                 delim_char: c,
                 can_open,
    @@ -909,7 +906,68 @@ impl<'a, 'r, 'o, 'd, 'i, 'c, 'subj> Subject<'a, 'r, 'o, 'd, 'i, 'c, 'subj> {
                 return inl;
             }
     
    -        if let Some(matchlen) = scanners::html_tag(&self.input[self.pos..]) {
    +        // Most comments below are verbatim from cmark upstream.
    +        let mut matchlen: Option<usize> = None;
    +
    +        if self.pos + 2 <= self.input.len() {
    +            let c = self.input[self.pos];
    +            if c == b'!' && !self.flags.skip_html_comment {
    +                let c = self.input[self.pos + 1];
    +                if c == b'-' && self.input[self.pos + 2] == b'-' {
    +                    if self.input[self.pos + 3] == b'>' {
    +                        matchlen = Some(4);
    +                    } else if self.input[self.pos + 3] == b'-' && self.input[self.pos + 4] == b'>' {
    +                        matchlen = Some(5);
    +                    } else {
    +                        if let Some(m) = scanners::html_comment(&self.input[self.pos + 1..]) {
    +                            matchlen = Some(m + 1);
    +                        } else {
    +                            self.flags.skip_html_comment = true;
    +                        }
    +                    }
    +                } else if c == b'[' {
    +                    if !self.flags.skip_html_cdata {
    +                        if let Some(m) = scanners::html_cdata(&self.input[self.pos + 2..]) {
    +                            // The regex doesn't require the final "]]>". But if we're not at
    +                            // the end of input, it must come after the match. Otherwise,
    +                            // disable subsequent scans to avoid quadratic behavior.
    +
    +                            // Adding 5 to matchlen for prefix "![", suffix "]]>"
    +                            if self.pos + m + 5 > self.input.len() {
    +                                self.flags.skip_html_cdata = true;
    +                            } else {
    +                                matchlen = Some(m + 5);
    +                            }
    +                        }
    +                    }
    +                } else if !self.flags.skip_html_declaration {
    +                    if let Some(m) = scanners::html_declaration(&self.input[self.pos + 1..]) {
    +                        // Adding 2 to matchlen for prefix "!", suffix ">"
    +                        if self.pos + m + 2 > self.input.len() {
    +                            self.flags.skip_html_declaration = true;
    +                        } else {
    +                            matchlen = Some(m + 2);
    +                        }
    +                    }
    +                }
    +            } else if c == b'?' {
    +                if !self.flags.skip_html_pi {
    +                    // Note that we allow an empty match.
    +                    let m = scanners::html_processing_instruction(&self.input[self.pos + 1..])
    +                        .unwrap_or(0);
    +                    // Adding 3 to matchlen for prefix "?", suffix "?>"
    +                    if self.pos + m + 3 > self.input.len() {
    +                        self.flags.skip_html_pi = true;
    +                    } else {
    +                        matchlen = Some(m + 3);
    +                    }
    +                }
    +            } else {
    +                matchlen = scanners::html_tag(&self.input[self.pos..]);
    +            }
    +        }
    +
    +        if let Some(matchlen) = matchlen {
                 let contents = &self.input[self.pos - 1..self.pos + matchlen];
                 let inl = make_inline(self.arena, NodeValue::HtmlInline(contents.to_vec()));
                 self.pos += matchlen;
    @@ -925,13 +983,14 @@ impl<'a, 'r, 'o, 'd, 'i, 'c, 'subj> Subject<'a, 'r, 'o, 'd, 'i, 'c, 'subj> {
                 self.brackets[len - 1].bracket_after = true;
             }
             self.brackets.push(Bracket {
    -            previous_delimiter: self.last_delimiter,
                 inl_text,
                 position: self.pos,
                 image,
    -            active: true,
                 bracket_after: false,
             });
    +        if !image {
    +            self.no_link_openers = false;
    +        }
         }
     
         pub fn handle_close_bracket(&mut self) -> Option<&'a AstNode<'a>> {
    @@ -943,12 +1002,13 @@ impl<'a, 'r, 'o, 'd, 'i, 'c, 'subj> Subject<'a, 'r, 'o, 'd, 'i, 'c, 'subj> {
                 return Some(make_inline(self.arena, NodeValue::Text(b"]".to_vec())));
             }
     
    -        if !self.brackets[brackets_len - 1].active {
    +        let is_image = self.brackets[brackets_len - 1].image;
    +
    +        if !is_image && self.no_link_openers {
                 self.brackets.pop();
                 return Some(make_inline(self.arena, NodeValue::Text(b"]".to_vec())));
             }
     
    -        let is_image = self.brackets[brackets_len - 1].image;
             let after_link_text_pos = self.pos;
     
             // Try to find a link destination within parenthesis
    @@ -958,11 +1018,13 @@ impl<'a, 'r, 'o, 'd, 'i, 'c, 'subj> Subject<'a, 'r, 'o, 'd, 'i, 'c, 'subj> {
             let mut n: usize = 0;
             if self.peek_char() == Some(&(b'(')) && {
                 sps = scanners::spacechars(&self.input[self.pos + 1..]).unwrap_or(0);
    -            unwrap_into_2(
    -                manual_scan_link_url(&self.input[self.pos + 1 + sps..]),
    -                &mut url,
    -                &mut n,
    -            )
    +            let offset = self.pos + 1 + sps;
    +            offset < self.input.len()
    +                && unwrap_into_2(
    +                    manual_scan_link_url(&self.input[offset..]),
    +                    &mut url,
    +                    &mut n,
    +                )
             } {
                 let starturl = self.pos + 1 + sps;
                 let endurl = starturl + n;
    @@ -1041,8 +1103,7 @@ impl<'a, 'r, 'o, 'd, 'i, 'c, 'subj> Subject<'a, 'r, 'o, 'd, 'i, 'c, 'subj> {
                         .unwrap()
                         .detach();
                     self.brackets[brackets_len - 1].inl_text.detach();
    -                let previous_delimiter = self.brackets[brackets_len - 1].previous_delimiter;
    -                self.process_emphasis(previous_delimiter);
    +                self.process_emphasis(self.brackets[brackets_len - 1].position);
                     self.brackets.pop();
                     return None;
                 }
    @@ -1064,31 +1125,19 @@ impl<'a, 'r, 'o, 'd, 'i, 'c, 'subj> Subject<'a, 'r, 'o, 'd, 'i, 'c, 'subj> {
                 },
             );
     
    -        let mut brackets_len = self.brackets.len();
    +        let brackets_len = self.brackets.len();
             self.brackets[brackets_len - 1].inl_text.insert_before(inl);
             let mut tmpch = self.brackets[brackets_len - 1].inl_text.next_sibling();
             while let Some(tmp) = tmpch {
                 tmpch = tmp.next_sibling();
                 inl.append(tmp);
             }
             self.brackets[brackets_len - 1].inl_text.detach();
    -        let previous_delimiter = self.brackets[brackets_len - 1].previous_delimiter;
    -        self.process_emphasis(previous_delimiter);
    +        self.process_emphasis(self.brackets[brackets_len - 1].position);
             self.brackets.pop();
    -        brackets_len -= 1;
     
             if !is_image {
    -            let mut i = brackets_len as i32 - 1;
    -            while i >= 0 {
    -                if !self.brackets[i as usize].image {
    -                    if !self.brackets[i as usize].active {
    -                        break;
    -                    } else {
    -                        self.brackets[i as usize].active = false;
    -                    }
    -                }
    -                i -= 1;
    -            }
    +            self.no_link_openers = true;
             }
         }
     
    @@ -1199,7 +1248,7 @@ pub fn manual_scan_link_url_2(input: &[u8]) -> Option<(&[u8], usize)> {
             }
         }
     
    -    if i >= len {
    +    if i >= len || nb_p != 0 {
             None
         } else {
             Some((&input[..i], i))
    @@ -1213,6 +1262,7 @@ pub fn make_inline<'a>(arena: &'a Arena<AstNode<'a>>, value: NodeValue) -> &'a A
             start_line: 0,
             open: false,
             last_line_blank: false,
    +        table_visited: false,
         };
         arena.alloc(Node::new(RefCell::new(ast)))
     }
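
The inline HTML hunk above guards each scanner with a flag (`skip_html_comment`, `skip_html_cdata`, `skip_html_declaration`, `skip_html_pi`) that is set once a scan fails. The following is a minimal standalone sketch of that idea, not comrak's code: `Scanner`, `find_comment_end`, and `examined_for_unclosed_openers` are hypothetical names, and the byte counter exists only to make the cost visible. It assumes that a search which reaches the end of input without finding a terminator cannot succeed from any later start position either.

```rust
/// Hypothetical scanner illustrating a "remember the failed scan" flag.
struct Scanner<'a> {
    input: &'a [u8],
    /// Set once a search for "-->" has run to end of input without a match.
    skip_comment: bool,
    /// Number of positions examined so far (instrumentation only).
    bytes_examined: usize,
}

impl<'a> Scanner<'a> {
    fn new(input: &'a [u8]) -> Self {
        Scanner { input, skip_comment: false, bytes_examined: 0 }
    }

    /// Returns the index just past the first "-->" at or after `start`.
    fn find_comment_end(&mut self, start: usize) -> Option<usize> {
        // A previous scan already proved there is no "-->" ahead.
        if self.skip_comment {
            return None;
        }
        let mut i = start;
        while i < self.input.len() {
            self.bytes_examined += 1;
            if self.input[i..].starts_with(b"-->") {
                return Some(i + 3);
            }
            i += 1;
        }
        // Reached end of input: later scans would fail too, so skip them.
        self.skip_comment = true;
        None
    }
}

/// Feeds `n` unterminated "<!--" openers and reports the total work done.
fn examined_for_unclosed_openers(n: usize) -> usize {
    let input = "<!--".repeat(n);
    let mut s = Scanner::new(input.as_bytes());
    for k in 0..n {
        // Try to close the comment opened at offset k * 4.
        s.find_comment_end(k * 4 + 4);
    }
    s.bytes_examined
}
```

Without the flag, each of the `n` openers would rescan the remainder of the input, giving work proportional to `n * n`. With it, only the first failed scan pays the full cost.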
    
  • src/parser/mod.rs+165 84 modified
    @@ -31,8 +31,11 @@ const TAB_STOP: usize = 4;
     const CODE_INDENT: usize = 4;
     
     macro_rules! node_matches {
    -    ($node:expr, $pat:pat) => {{
    -        matches!($node.data.borrow().value, $pat)
    +    ($node:expr, $( $pat:pat )|+) => {{
    +        matches!(
    +            $node.data.borrow().value,
    +            $( $pat )|+
    +        )
         }};
     }
     
    @@ -99,10 +102,12 @@ pub fn parse_document_with_broken_link_callback<'a, 'c>(
             start_line: 0,
             open: true,
             last_line_blank: false,
    +        table_visited: false,
         })));
         let mut parser = Parser::new(arena, root, options, callback);
    -    parser.feed(buffer);
    -    parser.finish()
    +    let mut linebuf = Vec::with_capacity(buffer.len());
    +    parser.feed(&mut linebuf, buffer, true);
    +    parser.finish(linebuf)
     }
     
     type Callback<'c> = &'c mut dyn FnMut(&[u8]) -> Option<(Vec<u8>, Vec<u8>)>;
    @@ -115,12 +120,14 @@ pub struct Parser<'a, 'o, 'c> {
         line_number: u32,
         offset: usize,
         column: usize,
    +    thematic_break_kill_pos: usize,
         first_nonspace: usize,
         first_nonspace_column: usize,
         indent: usize,
         blank: bool,
         partially_consumed_tab: bool,
         last_line_length: usize,
    +    last_buffer_ended_with_cr: bool,
         options: &'o ComrakOptions,
         callback: Option<Callback<'c>>,
     }
    @@ -585,21 +592,28 @@ impl<'a, 'o, 'c> Parser<'a, 'o, 'c> {
                 line_number: 0,
                 offset: 0,
                 column: 0,
    +            thematic_break_kill_pos: 0,
                 first_nonspace: 0,
                 first_nonspace_column: 0,
                 indent: 0,
                 blank: false,
                 partially_consumed_tab: false,
                 last_line_length: 0,
    +            last_buffer_ended_with_cr: false,
                 options,
                 callback,
             }
         }
     
    -    fn feed(&mut self, s: &str) {
    -        let mut i = 0;
    +    fn feed(&mut self, linebuf: &mut Vec<u8>, s: &str, eof: bool) {
    +        let mut buffer = 0;
             let s = s.as_bytes();
     
    +        if self.last_buffer_ended_with_cr && s.len() > 0 && s[0] == b'\n' {
    +            buffer += 1;
    +        }
    +        self.last_buffer_ended_with_cr = false;
    +
             if let Some(ref delimiter) = self.options.extension.front_matter_delimiter {
                 let front_matter_pattern = RegexBuilder::new(&format!(
                     "\\A(?:\u{feff})?{delim}\\r?\\n.*^{delim}\\r?\\n(?:\\r?\\n)?",
    @@ -609,81 +623,144 @@ impl<'a, 'o, 'c> Parser<'a, 'o, 'c> {
                 .dot_matches_new_line(true)
                 .build()
                 .unwrap();
    -            if let Some(front_matter_size) = front_matter_pattern.shortest_match(s) {
    -                i += front_matter_size;
    -                let node = self.add_child(self.root, NodeValue::FrontMatter(s[..i].to_vec()));
    +            if let Some(front_matter_size) = front_matter_pattern.shortest_match(&s[buffer..]) {
    +                let node = self.add_child(
    +                    self.root,
    +                    NodeValue::FrontMatter(s[buffer..buffer + front_matter_size].to_vec()),
    +                );
    +                buffer += front_matter_size;
                     self.finalize(node).unwrap();
                 }
             }
     
    -        let sz = s.len();
    -        let mut linebuf = vec![];
    +        let end = s.len();
     
    -        while i < sz {
    -            let mut process = true;
    -            let mut eol = i;
    -            while eol < sz {
    +        while buffer < end {
    +            let mut process = false;
    +            let mut eol = buffer;
    +            while eol < end {
                     if strings::is_line_end_char(s[eol]) {
    +                    process = true;
                         break;
                     }
                     if s[eol] == 0 {
    -                    process = false;
                         break;
                     }
                     eol += 1;
                 }
     
    +            if eol >= end && eof {
    +                process = true;
    +            }
    +
                 if process {
                     if !linebuf.is_empty() {
    -                    linebuf.extend_from_slice(&s[i..eol]);
    +                    linebuf.extend_from_slice(&s[buffer..eol]);
                         self.process_line(&linebuf);
                         linebuf.truncate(0);
    -                } else if sz > eol && s[eol] == b'\n' {
    -                    self.process_line(&s[i..eol + 1]);
                     } else {
    -                    self.process_line(&s[i..eol]);
    +                    self.process_line(&s[buffer..eol]);
                     }
    -
    -                i = eol;
    -                if i < sz && s[i] == b'\r' {
    -                    i += 1;
    +            } else {
    +                if eol < end && s[eol] == b'\0' {
    +                    linebuf.extend_from_slice(&s[buffer..eol]);
    +                    linebuf.extend_from_slice(&"\u{fffd}".to_string().into_bytes());
    +                } else {
    +                    linebuf.extend_from_slice(&s[buffer..eol]);
                     }
    -                if i < sz && s[i] == b'\n' {
    -                    i += 1;
    +            }
    +
    +            buffer = eol;
    +            if buffer < end {
    +                if s[buffer] == b'\0' {
    +                    buffer += 1;
    +                } else {
    +                    if s[buffer] == b'\r' {
    +                        buffer += 1;
    +                        if buffer == end {
    +                            self.last_buffer_ended_with_cr = true;
    +                        }
    +                    }
    +                    if buffer < end && s[buffer] == b'\n' {
    +                        buffer += 1;
    +                    }
                     }
    -            } else {
    -                debug_assert!(eol < sz && s[eol] == b'\0');
    -                linebuf.extend_from_slice(&s[i..eol]);
    -                linebuf.extend_from_slice(&"\u{fffd}".to_string().into_bytes());
    -                i = eol + 1;
                 }
             }
         }
     
    -    fn find_first_nonspace(&mut self, line: &[u8]) {
    -        self.first_nonspace = self.offset;
    -        self.first_nonspace_column = self.column;
    -        let mut chars_to_tab = TAB_STOP - (self.column % TAB_STOP);
    +    fn scan_thematic_break_inner(&mut self, line: &[u8]) -> (usize, bool) {
    +        let mut i = self.first_nonspace;
    +
    +        if i >= line.len() {
    +            return (i, false);
    +        }
     
    +        let c = line[i];
    +        if c != b'*' && c != b'_' && c != b'-' {
    +            return (i, false);
    +        }
    +
    +        let mut count = 1;
    +        let mut nextc;
             loop {
    -            if self.first_nonspace >= line.len() {
    +            i += 1;
    +            if i >= line.len() {
    +                return (i, false);
    +            }
    +            nextc = line[i];
    +
    +            if nextc == c {
    +                count += 1;
    +            } else if nextc != b' ' && nextc != b'\t' {
                     break;
                 }
    -            match line[self.first_nonspace] {
    -                32 => {
    -                    self.first_nonspace += 1;
    -                    self.first_nonspace_column += 1;
    -                    chars_to_tab -= 1;
    -                    if chars_to_tab == 0 {
    +        }
    +
    +        if count >= 3 && (nextc == b'\r' || nextc == b'\n') {
    +            ((i - self.first_nonspace) + 1, true)
    +        } else {
    +            (i, false)
    +        }
    +    }
    +
    +    fn scan_thematic_break(&mut self, line: &[u8]) -> Option<usize> {
    +        let (offset, found) = self.scan_thematic_break_inner(line);
    +        if !found {
    +            self.thematic_break_kill_pos = offset;
    +            None
    +        } else {
    +            Some(offset)
    +        }
    +    }
    +
    +    fn find_first_nonspace(&mut self, line: &[u8]) {
    +        let mut chars_to_tab = TAB_STOP - (self.column % TAB_STOP);
    +
    +        if self.first_nonspace <= self.offset {
    +            self.first_nonspace = self.offset;
    +            self.first_nonspace_column = self.column;
    +
    +            loop {
    +                if self.first_nonspace >= line.len() {
    +                    break;
    +                }
    +                match line[self.first_nonspace] {
    +                    32 => {
    +                        self.first_nonspace += 1;
    +                        self.first_nonspace_column += 1;
    +                        chars_to_tab -= 1;
    +                        if chars_to_tab == 0 {
    +                            chars_to_tab = TAB_STOP;
    +                        }
    +                    }
    +                    9 => {
    +                        self.first_nonspace += 1;
    +                        self.first_nonspace_column += chars_to_tab;
                             chars_to_tab = TAB_STOP;
                         }
    +                    _ => break,
                     }
    -                9 => {
    -                    self.first_nonspace += 1;
    -                    self.first_nonspace_column += chars_to_tab;
    -                    chars_to_tab = TAB_STOP;
    -                }
    -                _ => break,
                 }
             }
     
    @@ -704,6 +781,10 @@ impl<'a, 'o, 'c> Parser<'a, 'o, 'c> {
     
             self.offset = 0;
             self.column = 0;
    +        self.first_nonspace = 0;
    +        self.first_nonspace_column = 0;
    +        self.indent = 0;
    +        self.thematic_break_kill_pos = 0;
             self.blank = false;
             self.partially_consumed_tab = false;
     
    @@ -825,10 +906,10 @@ impl<'a, 'o, 'c> Parser<'a, 'o, 'c> {
             let mut matched: usize = 0;
             let mut nl: NodeList = NodeList::default();
             let mut sc: scanners::SetextChar = scanners::SetextChar::Equals;
    -        let mut maybe_lazy = matches!(self.current.data.borrow().value, NodeValue::Paragraph);
    +        let mut maybe_lazy = node_matches!(self.current, NodeValue::Paragraph);
     
    -        while !matches!(
    -            container.data.borrow().value,
    +        while !node_matches!(
    +            container,
                 NodeValue::CodeBlock(..) | NodeValue::HtmlBlock(..)
             ) {
                 self.find_first_nonspace(line);
    @@ -889,13 +970,11 @@ impl<'a, 'o, 'c> Parser<'a, 'o, 'c> {
                     && (unwrap_into(
                         scanners::html_block_start(&line[self.first_nonspace..]),
                         &mut matched,
    -                ) || match container.data.borrow().value {
    -                    NodeValue::Paragraph => false,
    -                    _ => unwrap_into(
    +                ) || (!node_matches!(container, NodeValue::Paragraph)
    +                    && unwrap_into(
                             scanners::html_block_start_7(&line[self.first_nonspace..]),
                             &mut matched,
    -                    ),
    -                })
    +                    )))
                 {
                     let nhb = NodeHtmlBlock {
                         block_type: matched as u8,
    @@ -904,13 +983,11 @@ impl<'a, 'o, 'c> Parser<'a, 'o, 'c> {
     
                     *container = self.add_child(*container, NodeValue::HtmlBlock(nhb));
                 } else if !indented
    -                && match container.data.borrow().value {
    -                    NodeValue::Paragraph => unwrap_into(
    -                        scanners::setext_heading_line(&line[self.first_nonspace..]),
    -                        &mut sc,
    -                    ),
    -                    _ => false,
    -                }
    +                && node_matches!(container, NodeValue::Paragraph)
    +                && unwrap_into(
    +                    scanners::setext_heading_line(&line[self.first_nonspace..]),
    +                    &mut sc,
    +                )
                 {
                     let has_content = {
                         let mut ast = container.data.borrow_mut();
    @@ -928,13 +1005,12 @@ impl<'a, 'o, 'c> Parser<'a, 'o, 'c> {
                         self.advance_offset(line, adv, false);
                     }
                 } else if !indented
    -                && match (&container.data.borrow().value, all_matched) {
    -                    (&NodeValue::Paragraph, false) => false,
    -                    _ => unwrap_into(
    -                        scanners::thematic_break(&line[self.first_nonspace..]),
    -                        &mut matched,
    -                    ),
    -                }
    +                && !matches!(
    +                    (&container.data.borrow().value, all_matched),
    +                    (&NodeValue::Paragraph, false)
    +                )
    +                && self.thematic_break_kill_pos <= self.first_nonspace
    +                && unwrap_into(self.scan_thematic_break(line), &mut matched)
                 {
                     *container = self.add_child(*container, NodeValue::ThematicBreak);
                     let adv = line.len() - 1 - self.offset;
    @@ -961,13 +1037,13 @@ impl<'a, 'o, 'c> Parser<'a, 'o, 'c> {
                     if strings::is_space_or_tab(line[self.offset]) {
                         self.advance_offset(line, 1, true);
                     }
    -            } else if (!indented || matches!(container.data.borrow().value, NodeValue::List(..)))
    +            } else if (!indented || node_matches!(container, NodeValue::List(..)))
                     && self.indent < 4
                     && unwrap_into_2(
                         parse_list_marker(
                             line,
                             self.first_nonspace,
    -                        matches!(container.data.borrow().value, NodeValue::Paragraph),
    +                        node_matches!(container, NodeValue::Paragraph),
                         ),
                         &mut matched,
                         &mut nl,
    @@ -1025,14 +1101,17 @@ impl<'a, 'o, 'c> Parser<'a, 'o, 'c> {
                     };
     
                     match new_container {
    -                    Some((new_container, replace)) => {
    +                    Some((new_container, replace, mark_visited)) => {
                             if replace {
                                 container.insert_after(new_container);
                                 container.detach();
                                 *container = new_container;
                             } else {
                                 *container = new_container;
                             }
    +                        if mark_visited {
    +                            container.data.borrow_mut().table_visited = true;
    +                        }
                         }
                         _ => break,
                     }
    @@ -1281,7 +1360,7 @@ impl<'a, 'o, 'c> Parser<'a, 'o, 'c> {
             if !self.current.same_node(last_matched_container)
                 && container.same_node(last_matched_container)
                 && !self.blank
    -            && matches!(self.current.data.borrow().value, NodeValue::Paragraph)
    +            && node_matches!(self.current, NodeValue::Paragraph)
             {
                 self.add_line(self.current, line);
             } else {
    @@ -1373,7 +1452,11 @@ impl<'a, 'o, 'c> Parser<'a, 'o, 'c> {
             }
         }
     
    -    fn finish(&mut self) -> &'a AstNode<'a> {
    +    fn finish(&mut self, remaining: Vec<u8>) -> &'a AstNode<'a> {
    +        if !remaining.is_empty() {
    +            self.process_line(&remaining);
    +        }
    +
             self.finalize_document();
             self.postprocess_text_nodes(self.root);
             self.root
    @@ -1488,8 +1571,8 @@ impl<'a, 'o, 'c> Parser<'a, 'o, 'c> {
     
                         let mut subch = item.first_child();
                         while let Some(subitem) = subch {
    -                        if nodes::ends_with_blank_line(subitem)
    -                            && (item.next_sibling().is_some() || subitem.next_sibling().is_some())
    +                        if (item.next_sibling().is_some() || subitem.next_sibling().is_some())
    +                            && nodes::ends_with_blank_line(subitem)
                             {
                                 nl.tight = false;
                                 break;
    @@ -1537,7 +1620,7 @@ impl<'a, 'o, 'c> Parser<'a, 'o, 'c> {
     
             while subj.parse_inline(node) {}
     
    -        subj.process_emphasis(None);
    +        subj.process_emphasis(0);
     
             while subj.pop_bracket() {}
         }
    @@ -1704,14 +1787,12 @@ impl<'a, 'o, 'c> Parser<'a, 'o, 'c> {
                 return;
             }
     
    -        match parent.data.borrow().value {
    -            NodeValue::Paragraph => (),
    -            _ => return,
    +        if !node_matches!(parent, NodeValue::Paragraph) {
    +            return;
             }
     
    -        match parent.parent().unwrap().data.borrow().value {
    -            NodeValue::Item(..) => (),
    -            _ => return,
    +        if !node_matches!(parent.parent().unwrap(), NodeValue::Item(..)) {
    +            return;
             }
     
             *text = text[end..].to_vec();
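
The `thematic_break_kill_pos` field added above records where a failed thematic-break scan stopped, and the new guard `self.thematic_break_kill_pos <= self.first_nonspace` skips scans that start before that point. The sketch below restates the mechanism outside the parser, under the assumption that its purpose is to avoid rescanning the same run of marker characters on one line. `Line`, `try_break`, and the `scans` counter are hypothetical; `scan_thematic_break` mirrors the shape of `scan_thematic_break_inner` in the diff but takes the start offset as a parameter.

```rust
/// Scans for a thematic break starting at `start`.
/// On success returns (length consumed including the line ending, true);
/// on failure returns (position where scanning stopped, false).
fn scan_thematic_break(line: &[u8], start: usize) -> (usize, bool) {
    let mut i = start;
    if i >= line.len() {
        return (i, false);
    }
    let c = line[i];
    if c != b'*' && c != b'_' && c != b'-' {
        return (i, false);
    }
    let mut count = 1;
    // Walk over marker characters and blanks until something else appears.
    let stop = loop {
        i += 1;
        if i >= line.len() {
            return (i, false);
        }
        let b = line[i];
        if b == c {
            count += 1;
        } else if b != b' ' && b != b'\t' {
            break b;
        }
    };
    if count >= 3 && (stop == b'\r' || stop == b'\n') {
        (i - start + 1, true)
    } else {
        (i, false)
    }
}

/// Hypothetical per-line state holding the kill position.
struct Line<'a> {
    bytes: &'a [u8],
    kill_pos: usize,
    scans: usize, // instrumentation only
}

impl<'a> Line<'a> {
    fn new(bytes: &'a [u8]) -> Self {
        Line { bytes, kill_pos: 0, scans: 0 }
    }

    fn try_break(&mut self, first_nonspace: usize) -> Option<usize> {
        // A scan that failed at kill_pos already covered this start offset.
        if self.kill_pos > first_nonspace {
            return None;
        }
        self.scans += 1;
        let (offset, found) = scan_thematic_break(self.bytes, first_nonspace);
        if found {
            Some(offset)
        } else {
            self.kill_pos = offset;
            None
        }
    }
}
```

For a line such as `* * * * a`, where each `*` may open a nested list item and trigger another attempt, only the first attempt walks the whole marker run; attempts from offsets 2, 4, and 6 return immediately.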
    
  • src/parser/table.rs+36 35 modified
    @@ -11,7 +11,7 @@ pub fn try_opening_block<'a, 'o, 'c>(
         parser: &mut Parser<'a, 'o, 'c>,
         container: &'a AstNode<'a>,
         line: &[u8],
    -) -> Option<(&'a AstNode<'a>, bool)> {
    +) -> Option<(&'a AstNode<'a>, bool, bool)> {
         let aligns = match container.data.borrow().value {
             NodeValue::Paragraph => None,
             NodeValue::Table(ref aligns) => Some(aligns.clone()),
    @@ -28,20 +28,24 @@ fn try_opening_header<'a, 'o, 'c>(
         parser: &mut Parser<'a, 'o, 'c>,
         container: &'a AstNode<'a>,
         line: &[u8],
    -) -> Option<(&'a AstNode<'a>, bool)> {
    +) -> Option<(&'a AstNode<'a>, bool, bool)> {
    +    if container.data.borrow().table_visited {
    +        return Some((container, false, false));
    +    }
    +
         if scanners::table_start(&line[parser.first_nonspace..]).is_none() {
    -        return Some((container, false));
    +        return Some((container, false, false));
         }
     
    +    let marker_row = row(&line[parser.first_nonspace..]).unwrap();
    +
         let header_row = match row(&container.data.borrow().content) {
             Some(header_row) => header_row,
    -        None => return Some((container, false)),
    +        None => return Some((container, false, true)),
         };
     
    -    let marker_row = row(&line[parser.first_nonspace..]).unwrap();
    -
         if header_row.cells.len() != marker_row.cells.len() {
    -        return Some((container, false));
    +        return Some((container, false, true));
         }
     
         if header_row.paragraph_offset > 0 {
    @@ -82,15 +86,15 @@ fn try_opening_header<'a, 'o, 'c>(
         let offset = line.len() - 1 - parser.offset;
         parser.advance_offset(line, offset, false);
     
    -    Some((table, true))
    +    Some((table, true, false))
     }
     
     fn try_opening_row<'a, 'o, 'c>(
         parser: &mut Parser<'a, 'o, 'c>,
         container: &'a AstNode<'a>,
         alignments: &[TableAlignment],
         line: &[u8],
    -) -> Option<(&'a AstNode<'a>, bool)> {
    +) -> Option<(&'a AstNode<'a>, bool, bool)> {
         if parser.blank {
             return None;
         }
    @@ -112,7 +116,7 @@ fn try_opening_row<'a, 'o, 'c>(
         let offset = line.len() - 1 - parser.offset;
         parser.advance_offset(line, offset, false);
     
    -    Some((new_row, false))
    +    Some((new_row, false, false))
     }
     
     struct Row {
    @@ -123,50 +127,47 @@ struct Row {
     fn row(string: &[u8]) -> Option<Row> {
         let len = string.len();
         let mut cells = vec![];
    -    let mut offset = 0;
     
    -    if len > 0 && string[0] == b'|' {
    -        offset += 1;
    -    }
    +    let mut offset = scanners::table_cell_end(string).unwrap_or(0);
     
         let mut paragraph_offset: usize = 0;
    +    let mut expect_more_cells = true;
     
    -    loop {
    +    while offset < len && expect_more_cells {
             let cell_matched = scanners::table_cell(&string[offset..]).unwrap_or(0);
    -        let mut pipe_matched =
    -            scanners::table_cell_end(&string[offset + cell_matched..]).unwrap_or(0);
    +        let pipe_matched = scanners::table_cell_end(&string[offset + cell_matched..]).unwrap_or(0);
     
             if cell_matched > 0 || pipe_matched > 0 {
    -            let cell_end_offset = offset + cell_matched - 1;
    -
    -            if string[cell_end_offset] == b'\n' || string[cell_end_offset] == b'\r' {
    -                paragraph_offset = cell_end_offset;
    -                cells.clear();
    -            } else {
    -                let mut cell = unescape_pipes(&string[offset..offset + cell_matched]);
    -                trim(&mut cell);
    -                cells.push(cell);
    -            }
    +            let mut cell = unescape_pipes(&string[offset..offset + cell_matched]);
    +            trim(&mut cell);
    +            cells.push(cell);
             }
     
             offset += cell_matched + pipe_matched;
     
    -        if pipe_matched == 0 {
    -            pipe_matched = scanners::table_row_end(&string[offset..]).unwrap_or(0);
    -            offset += pipe_matched;
    -        }
    +        if pipe_matched > 0 {
    +            expect_more_cells = true;
    +        } else {
    +            let row_end_offset = scanners::table_row_end(&string[offset..]).unwrap_or(0);
    +            offset += row_end_offset;
     
    -        if !((cell_matched > 0 || pipe_matched > 0) && offset < len) {
    -            break;
    +            if row_end_offset > 0 && offset != len {
    +                paragraph_offset = offset;
    +                cells.clear();
    +                offset += scanners::table_cell_end(&string[offset..]).unwrap_or(0);
    +                expect_more_cells = true;
    +            } else {
    +                expect_more_cells = false;
    +            }
             }
         }
     
         if offset != len || cells.is_empty() {
             None
         } else {
             Some(Row {
    -            paragraph_offset: paragraph_offset,
    -            cells: cells,
    +            paragraph_offset,
    +            cells,
             })
         }
     }
    
  • src/plugins/syntect.rs+2 1 modified
    @@ -1,7 +1,8 @@
     //! Adapter for the Syntect syntax highlighter plugin.
     
     use crate::adapters::SyntaxHighlighterAdapter;
    -use crate::strings::{build_opening_tag, extract_attributes_from_tag};
    +use crate::html::build_opening_tag;
    +use crate::strings::extract_attributes_from_tag;
     use std::collections::HashMap;
     use syntect::easy::HighlightLines;
     use syntect::highlighting::{Color, ThemeSet};
    
  • src/scanners.re+382 0 added
    @@ -0,0 +1,382 @@
    +// TODO: consider dropping all the #[inline(always)], we probably don't know
    +// better than rustc.
    +
    +/*!re2c
    +    re2c:case-insensitive    = 1;
    +    re2c:encoding:utf8       = 1;
    +    re2c:encoding-policy     = substitute;
    +
    +    re2c:define:YYCTYPE      = u8;
    +    re2c:define:YYPEEK       = "if cursor < len { *s.get_unchecked(cursor) } else { 0 }";
    +    re2c:define:YYSKIP       = "cursor += 1;";
    +    re2c:define:YYBACKUP     = "marker = cursor;";
    +    re2c:define:YYRESTORE    = "cursor = marker;";
    +    re2c:define:YYBACKUPCTX  = "ctxmarker = cursor;";
    +    re2c:define:YYRESTORECTX = "cursor = ctxmarker;";
    +    re2c:yyfill:enable       = 0;
    +    re2c:indent:string       = '    ';
    +    re2c:indent:top          = 1;
    +
    +    wordchar = [^\x00-\x20];
    +
    +    spacechar = [ \t\v\f\r\n];
    +
    +    reg_char     = [^\\()\x00-\x20];
    +
    +    escaped_char = [\\][!"#$%&'()*+,./:;<=>?@[\\\]^_`{|}~-];
    +
    +    tagname = [A-Za-z][A-Za-z0-9-]*;
    +
    +    blocktagname = 'address'|'article'|'aside'|'base'|'basefont'|'blockquote'|'body'|'caption'|'center'|'col'|'colgroup'|'dd'|'details'|'dialog'|'dir'|'div'|'dl'|'dt'|'fieldset'|'figcaption'|'figure'|'footer'|'form'|'frame'|'frameset'|'h1'|'h2'|'h3'|'h4'|'h5'|'h6'|'head'|'header'|'hr'|'html'|'iframe'|'legend'|'li'|'link'|'main'|'menu'|'menuitem'|'nav'|'noframes'|'ol'|'optgroup'|'option'|'p'|'param'|'section'|'source'|'title'|'summary'|'table'|'tbody'|'td'|'tfoot'|'th'|'thead'|'title'|'tr'|'track'|'ul';
    +
    +    attributename = [a-zA-Z_:][a-zA-Z0-9:._-]*;
    +
    +    unquotedvalue = [^ \t\r\n\v\f"'=<>`\x00]+;
    +    singlequotedvalue = ['][^'\x00]*['];
    +    doublequotedvalue = ["][^"\x00]*["];
    +
    +    attributevalue = unquotedvalue | singlequotedvalue | doublequotedvalue;
    +
    +    attributevaluespec = spacechar* [=] spacechar* attributevalue;
    +
    +    attribute = spacechar+ attributename attributevaluespec?;
    +
    +    opentag = tagname attribute* spacechar* [/]? [>];
    +    closetag = [/] tagname spacechar* [>];
    +
    +    htmlcomment = "--" ([^\x00-]+ | "-" [^\x00-] | "--" [^\x00>])* "-->";
    +
    +    processinginstruction = ([^?>\x00]+ | [?][^>\x00] | [>])+;
    +
    +    declaration = [A-Z]+ spacechar+ [^>\x00]*;
    +
    +    cdata = "CDATA[" ([^\]\x00]+ | "]" [^\]\x00] | "]]" [^>\x00])*;
    +
    +    htmltag = opentag | closetag;
    +
    +    in_parens_nosp   = [(] (reg_char|escaped_char|[\\])* [)];
    +
    +    in_double_quotes = ["] (escaped_char|[^"\x00])* ["];
    +    in_single_quotes = ['] (escaped_char|[^'\x00])* ['];
    +    in_parens        = [(] (escaped_char|[^)\x00])* [)];
    +
    +    scheme           = [A-Za-z][A-Za-z0-9.+-]{1,31};
    +*/
    +
    +pub fn atx_heading_start(s: &[u8]) -> Option<usize> {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let len = s.len();
    +/*!re2c
    +    [#]{1,6} ([ \t]+|[\r\n])  { return Some(cursor); }
    +    * { return None; }
    +*/
    +}
    +
    +pub fn html_block_end_1(s: &[u8]) -> bool {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let len = s.len();
    +/*!re2c
    +    [^\n\x00]* [<] [/] ('script'|'pre'|'textarea'|'style') [>] { return true; }
    +    * { return false; }
    +*/
    +}
    +
    +pub fn html_block_end_2(s: &[u8]) -> bool {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let len = s.len();
    +/*!re2c
    +    [^\n\x00]* '-->' { return true; }
    +    * { return false; }
    +*/
    +}
    +
    +pub fn html_block_end_3(s: &[u8]) -> bool {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let len = s.len();
    +/*!re2c
    +    [^\n\x00]* '?>' { return true; }
    +    * { return false; }
    +*/
    +}
    +
    +pub fn html_block_end_4(s: &[u8]) -> bool {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let len = s.len();
    +/*!re2c
    +    [^\n\x00]* '>' { return true; }
    +    * { return false; }
    +*/
    +}
    +
    +pub fn html_block_end_5(s: &[u8]) -> bool {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let len = s.len();
    +/*!re2c
    +    [^\n\x00]* ']]>' { return true; }
    +    * { return false; }
    +*/
    +}
    +
    +pub fn open_code_fence(s: &[u8]) -> Option<usize> {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let mut ctxmarker = 0;
    +    let len = s.len();
    +/*!re2c
    +    [`]{3,} / [^`\r\n\x00]*[\r\n] { return Some(cursor); }
    +    [~]{3,} / [^\r\n\x00]*[\r\n] { return Some(cursor); }
    +    * { return None; }
    +*/
    +}
    +
    +pub fn close_code_fence(s: &[u8]) -> Option<usize> {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let mut ctxmarker = 0;
    +    let len = s.len();
    +/*!re2c
    +    [`]{3,} / [ \t]*[\r\n] { return Some(cursor); }
    +    [~]{3,} / [ \t]*[\r\n] { return Some(cursor); }
    +    * { return None; }
    +*/
    +}
    +
    +pub fn html_block_start(s: &[u8]) -> Option<usize> {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let len = s.len();
    +/*!re2c
    +    [<] ('script'|'pre'|'textarea'|'style') (spacechar | [>]) { return Some(1); }
    +    '<!--' { return Some(2); }
    +    '<?' { return Some(3); }
    +    '<!' [A-Z] { return Some(4); }
    +    '<![CDATA[' { return Some(5); }
    +    [<] [/]? blocktagname (spacechar | [/]? [>])  { return Some(6); }
    +    * { return None; }
    +*/
    +}
    +
    +pub fn html_block_start_7(s: &[u8]) -> Option<usize> {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let len = s.len();
    +/*!re2c
    +    [<] (opentag | closetag) [\t\n\f ]* [\r\n] { return Some(7); }
    +    * { return None; }
    +*/
    +}
    +
    +pub enum SetextChar {
    +    Equals,
    +    Hyphen,
    +}
    +
    +pub fn setext_heading_line(s: &[u8]) -> Option<SetextChar> {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let len = s.len();
    +/*!re2c
    +    [=]+ [ \t]* [\r\n] { return Some(SetextChar::Equals); }
    +    [-]+ [ \t]* [\r\n] { return Some(SetextChar::Hyphen); }
    +    * { return None; }
    +*/
    +}
    +
    +pub fn footnote_definition(s: &[u8]) -> Option<usize> {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let len = s.len();
    +/*!re2c
    +    '[^' ([^\] \r\n\x00\t]+) ']:' [ \t]* { return Some(cursor); }
    +    * { return None; }
    +*/
    +}
    +
    +pub fn scheme(s: &[u8]) -> Option<usize> {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let len = s.len();
    +/*!re2c
    +    scheme [:] { return Some(cursor); }
    +    * { return None; }
    +*/
    +}
    +
    +pub fn autolink_uri(s: &[u8]) -> Option<usize> {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let len = s.len();
    +/*!re2c
    +    scheme [:][^\x00-\x20<>]*[>]  { return Some(cursor); }
    +    * { return None; }
    +*/
    +}
    +
    +pub fn autolink_email(s: &[u8]) -> Option<usize> {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let len = s.len();
    +/*!re2c
    +    [a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+
    +        [@]
    +        [a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?
    +        ([.][a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*
    +        [>] { return Some(cursor); }
    +    * { return None; }
    +*/
    +}
    +
    +pub fn html_tag(s: &[u8]) -> Option<usize> {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let len = s.len();
    +/*!re2c
    +    htmltag { return Some(cursor); }
    +    * { return None; }
    +*/
    +}
    +
    +pub fn html_comment(s: &[u8]) -> Option<usize> {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let len = s.len();
    +/*!re2c
    +    htmlcomment { return Some(cursor); }
    +    * { return None; }
    +*/
    +}
    +
    +pub fn html_processing_instruction(s: &[u8]) -> Option<usize> {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let len = s.len();
    +/*!re2c
    +    processinginstruction { return Some(cursor); }
    +    * { return None; }
    +*/
    +}
    +
    +pub fn html_declaration(s: &[u8]) -> Option<usize> {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let len = s.len();
    +/*!re2c
    +    declaration { return Some(cursor); }
    +    * { return None; }
    +*/
    +}
    +
    +pub fn html_cdata(s: &[u8]) -> Option<usize> {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let len = s.len();
    +/*!re2c
    +    cdata { return Some(cursor); }
    +    * { return None; }
    +*/
    +}
    +
    +pub fn spacechars(s: &[u8]) -> Option<usize> {
    +    let mut cursor = 0;
    +    let len = s.len();
    +/*!re2c
    +    [ \t\v\f\r\n]+ { return Some(cursor); }
    +    * { return None; }
    +*/
    +}
    +
    +pub fn link_title(s: &[u8]) -> Option<usize> {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let len = s.len();
    +/*!re2c
    +    ["] (escaped_char|[^"\x00])* ["]   { return Some(cursor); }
    +    ['] (escaped_char|[^'\x00])* ['] { return Some(cursor); }
    +    [(] (escaped_char|[^()\x00])* [)]  { return Some(cursor); }
    +    * { return None; }
    +*/
    +}
    +
    +pub fn dangerous_url(s: &[u8]) -> Option<usize> {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let len = s.len();
    +/*!re2c
    +    'data:image/' ('png'|'gif'|'jpeg'|'webp') { return None; }
    +    'javascript:' | 'vbscript:' | 'file:' | 'data:' { return Some(cursor); }
    +    * { return None; }
    +*/
    +}
    +
    +/*!re2c
    +
    +    table_spacechar = [ \t\v\f];
    +    table_newline = [\r]?[\n];
    +
    +    table_marker = (table_spacechar*[:]?[-]+[:]?table_spacechar*);
    +    table_cell = (escaped_char|[^\x00|\r\n])+;
    +
    +*/
    +
    +pub fn table_start(s: &[u8]) -> Option<usize> {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let len = s.len();
    +/*!re2c
    +    [|]? table_marker ([|] table_marker)* [|]? table_spacechar* table_newline {
    +        return Some(cursor);
    +    }
    +    * { return None; }
    +*/
    +}
    +
    +pub fn table_cell(s: &[u8]) -> Option<usize> {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let len = s.len();
    +/*!re2c
    +    // In fact, `table_cell` matches non-empty table cells only. The empty
    +    // string is also a valid table cell, but is handled by the default rule.
    +    // This approach prevents re2c's match-empty-string warning.
    +    table_cell { return Some(cursor); }
    +    * { return None; }
    +*/
    +}
    +
    +pub fn table_cell_end(s: &[u8]) -> Option<usize> {
    +    let mut cursor = 0;
    +    let len = s.len();
    +/*!re2c
    +    [|] table_spacechar* { return Some(cursor); }
    +    * { return None; }
    +*/
    +}
    +
    +pub fn table_row_end(s: &[u8]) -> Option<usize> {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let len = s.len();
    +/*!re2c
    +    table_spacechar* table_newline { return Some(cursor); }
    +    * { return None; }
    +*/
    +}
    +
    +#[cfg(feature = "shortcodes")]
    +pub fn shortcode(s: &[u8]) -> Option<usize> {
    +    let mut cursor = 0;
    +    let mut marker = 0;
    +    let len = s.len();
    +/*!re2c
    +    [:] [A-Za-z_-]+ [:] { return Some(cursor); }
    +    * { return None; }
    +*/
    +}
    +
    +// vim: set ft=rust:
    
  • src/scanners.rs+22479 146 modified
  • src/strings.rs+18 21 modified
    @@ -154,22 +154,31 @@ pub fn trim(line: &mut Vec<u8>) {
         rtrim(line);
     }
     
    +pub fn ltrim_slice(mut i: &[u8]) -> &[u8] {
    +    while let [first, rest @ ..] = i {
    +        if isspace(*first) {
    +            i = rest;
    +        } else {
    +            break;
    +        }
    +    }
    +    i
    +}
    +
     pub fn rtrim_slice(mut i: &[u8]) -> &[u8] {
    -    let mut len = i.len();
    -    while len > 0 && isspace(i[len - 1]) {
    -        i = &i[..len - 1];
    -        len -= 1;
    +    while let [rest @ .., last] = i {
    +        if isspace(*last) {
    +            i = rest;
    +        } else {
    +            break;
    +        }
         }
         i
     }
     
     pub fn trim_slice(mut i: &[u8]) -> &[u8] {
    +    i = ltrim_slice(i);
         i = rtrim_slice(i);
    -    let mut len = i.len();
    -    while len > 0 && isspace(i[0]) {
    -        i = &i[1..];
    -        len -= 1;
    -    }
         i
     }
     
    @@ -250,18 +259,6 @@ pub fn normalize_label(i: &[u8]) -> Vec<u8> {
         v.into_bytes()
     }
     
    -pub fn build_opening_tag(tag: &str, attributes: &HashMap<String, String>) -> String {
    -    let mut tag_parts = vec![format!("<{}", tag)];
    -
    -    for (attr, val) in attributes {
    -        tag_parts.push(format!(" {}=\"{}\"", attr, val));
    -    }
    -
    -    tag_parts.push(String::from(">"));
    -
    -    tag_parts.join("")
    -}
    -
     #[cfg(feature = "syntect")]
     pub fn extract_attributes_from_tag(html_tag: &str) -> HashMap<String, String> {
         let re = regex::Regex::new("([a-zA-Z_:][-a-zA-Z0-9_:.]+)=([\"'])(.*?)([\"'])").unwrap();
    
  • src/tests.rs+41 5 modified
    @@ -3,12 +3,13 @@ use crate::plugins::syntect::SyntectAdapter;
     use crate::{
         adapters::SyntaxHighlighterAdapter,
         adapters::{HeadingAdapter, HeadingMeta},
    -    cm, format_commonmark, format_html, format_html_with_plugins, html, markdown_to_html, nodes,
    +    cm, format_commonmark, format_html, format_html_with_plugins, html,
    +    html::build_opening_tag,
    +    markdown_to_html, nodes,
         nodes::{AstNode, NodeCode, NodeValue},
    -    parse_document, parse_document_with_broken_link_callback,
    -    strings::build_opening_tag,
    -    Anchorizer, Arena, ComrakExtensionOptions, ComrakOptions, ComrakParseOptions, ComrakPlugins,
    -    ComrakRenderOptions, ComrakRenderPlugins, ListStyleType,
    +    parse_document, parse_document_with_broken_link_callback, Anchorizer, Arena,
    +    ComrakExtensionOptions, ComrakOptions, ComrakParseOptions, ComrakPlugins, ComrakRenderOptions,
    +    ComrakRenderPlugins, ListStyleType,
     };
     use ntest::timeout;
     use std::collections::HashMap;
    @@ -1579,3 +1580,38 @@ fn exercise_full_api<'a>() {
             }
         }
     }
    +
    +#[test]
    +fn regression_424() {
    +    html(
    +        "*text* [link](#section)",
    +        "<p><em>text</em> <a href=\"#section\">link</a></p>\n",
    +    );
    +}
    +
    +#[test]
    +fn example_61() {
    +    html(
    +        r##"
    +`Foo
    +----
    +`
    +
    +<a title="a lot
    +---
    +of dashes"/>
    +"##,
    +        r##"<h2>`Foo</h2>
    +<p>`</p>
    +<h2>&lt;a title=&quot;a lot</h2>
    +<p>of dashes&quot;/&gt;</p>
    +"##,
    +    );
    +}
    +
    +#[test]
    +fn nul_at_eof() {
    +    html("foo\0", "<p>foo\u{fffd}</p>\n");
    +    html("foo\0ba", "<p>foo\u{fffd}ba</p>\n");
    +    html("foo\0ba\0", "<p>foo\u{fffd}ba\u{fffd}</p>\n");
    +}
    
  • vendor/cmark-gfm+1 1 modified
    @@ -1 +1 @@
    -Subproject commit 14212c6fd13eb163e2ac8fc8eb8321add3462601
    +Subproject commit 397d03683b555d55b166d823cde14ef42ce86ad1
    

Vulnerability mechanics
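
One pattern visible in the `src/parser/table.rs` hunk is the early return on `table_visited` plus a third tuple element that is `true` when the header row fails to parse or its cell count does not match. A plausible reading, stated here as an assumption because the caller is not shown in this part of the diff, is that the caller records that flag on the container so the same growing paragraph is not re-scanned on every later line. The sketch below models only that idea with hypothetical types and a simplified cell count; it is not comrak's parser.

```rust
/// Hypothetical paragraph container. Only the field name `table_visited`
/// is taken from the diff; everything else is illustrative.
struct Paragraph {
    content: Vec<u8>,
    table_visited: bool,
}

/// Stand-in for parsing the header row: the cost is proportional to the
/// paragraph length, which is what makes repeated calls add up.
fn count_cells(content: &[u8], scanned: &mut usize) -> usize {
    *scanned += content.len();
    content.split(|&b| b == b'|').count()
}

/// Try to open a table; returns `true` only if the header matches.
fn try_open_table(p: &mut Paragraph, marker_cells: usize, scanned: &mut usize) -> bool {
    if p.table_visited {
        return false; // already known not to be a table header
    }
    if count_cells(&p.content, scanned) != marker_cells {
        // Assumption: in the real parser the caller sets this flag when
        // the third tuple element is `true`.
        p.table_visited = true;
        return false;
    }
    true
}

/// Feed `n` marker-like lines to a paragraph that never matches and
/// report how many bytes were scanned in total.
fn bytes_scanned(n: usize, use_guard: bool) -> usize {
    let mut p = Paragraph { content: b"a|b|c\n".to_vec(), table_visited: false };
    let mut scanned = 0;
    for _ in 0..n {
        let opened = try_open_table(&mut p, 2, &mut scanned);
        assert!(!opened);
        if !use_guard {
            p.table_visited = false; // simulate the pre-fix behaviour
        }
        p.content.extend_from_slice(b"-|-\n"); // the line joins the paragraph
    }
    scanned
}
```

In this model the guarded version scans the paragraph once, while the unguarded version re-scans an ever longer paragraph for each line, so the total work grows quadratically with the number of lines.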

Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
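
The tight-list hunk in the diff swaps the operands of `&&` so that the sibling checks run before `nodes::ends_with_blank_line(subitem)`. Because `&&` short-circuits in Rust, the right-hand operand is skipped whenever the left one is `false`. The commit's rationale is not stated here, so treating the blank-line check as the costlier operand is an assumption. The sketch uses hypothetical helpers that only count calls.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a potentially expensive tree walk; it only
/// records that it was called.
fn expensive_check(calls: &Cell<u32>) -> bool {
    calls.set(calls.get() + 1);
    true
}

/// Pre-fix operand order: the expensive check always runs.
fn old_order(has_sibling: bool, calls: &Cell<u32>) -> bool {
    expensive_check(calls) && has_sibling
}

/// Post-fix operand order: the cheap test gates the expensive one.
fn new_order(has_sibling: bool, calls: &Cell<u32>) -> bool {
    has_sibling && expensive_check(calls)
}

/// Run one of the two orderings and report how often the expensive
/// check was evaluated.
fn count_calls(f: fn(bool, &Cell<u32>) -> bool, has_sibling: bool) -> u32 {
    let calls = Cell::new(0);
    f(has_sibling, &calls);
    calls.get()
}
```

Both orderings return the same boolean; the difference is that `new_order` never evaluates the expensive check when there is no following sibling.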

References

21
