VYPR
High severity · NVD Advisory · Published May 15, 2023 · Updated Jan 23, 2025

CVE-2023-32758

Description

giturlparse (aka git-url-parse) through 1.2.2, as used in Semgrep 1.5.2 through 1.24.1, is vulnerable to ReDoS (Regular Expression Denial of Service) if parsing untrusted URLs. This might be relevant if Semgrep is analyzing an untrusted package (for example, to check whether it accesses any Git repository at an http:// URL), and that package's author placed a ReDoS attack payload in a URL used by the package.

AI Insight

LLM-synthesized narrative grounded in this CVE's description and references.

giturlparse through 1.2.2 has a ReDoS vulnerability when parsing untrusted URLs. It affects Semgrep 1.5.2 through 1.24.1 and enables denial of service via crafted URLs.

Vulnerability

The giturlparse library (git-url-parse) through version 1.2.2 contains a Regular Expression Denial of Service (ReDoS) vulnerability. This library is used by Semgrep versions 1.5.2 through 1.24.1 to parse Git URLs. The issue stems from a regular expression that can exhibit catastrophic backtracking when processing specially crafted inputs, leading to excessive CPU consumption [1][4].

Exploitation

An attacker can exploit this vulnerability by getting Semgrep to parse a malicious URL, for example when Semgrep analyzes an untrusted package that contains a crafted http:// URL. The attack does not require authentication; simply parsing the malicious URL triggers the ReDoS [1]. The exploit payload is a long string of repeated characters that causes the regex engine to hang.

Impact

The impact is a denial of service. When Semgrep attempts to parse a crafted URL, the process may become unresponsive or time out, disrupting analysis workflows. This could be particularly harmful in automated CI/CD pipelines where Semgrep scans untrusted code [1][2][3].

Mitigation

The vulnerability has been patched in Semgrep 1.24.2 and later by fixing the problematic regexes and introducing a maximum URL length of 1024 characters to bound the remaining backtracking cost [2][3]. Users are advised to update Semgrep to the latest version. The advisory lists no patched release of git-url-parse itself, so code that uses the library directly should reject overly long untrusted URLs before parsing them.
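
For callers that cannot upgrade, the length cap can be applied before the URL reaches the parser. The following is a minimal sketch, not Semgrep's code: `guard_url_length` and `UrlTooLongError` are hypothetical names, and only the 1024-character limit comes from the patch.

```python
MAX_URL_LENGTH = 1024  # same limit the Semgrep patch enforces inside parse()


class UrlTooLongError(ValueError):
    """Hypothetical error type raised when a URL exceeds the length cap."""


def guard_url_length(url: str, limit: int = MAX_URL_LENGTH) -> str:
    """Return url unchanged if it is short enough, otherwise raise.

    Call this before handing an untrusted URL to any regex-based parser.
    """
    if len(url) > limit:
        # Report only the length, so an oversized payload is not copied into logs.
        raise UrlTooLongError(
            f"URL exceeds maximum supported length of {limit} "
            f"({len(url)} characters)"
        )
    return url


if __name__ == "__main__":
    print(guard_url_length("https://gitlab.com/example/test-case.git"))
```

Unlike the patched `parse()`, this sketch leaves the URL out of the error message. That is a design choice of the example, not something the patch does.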

AI Insight generated on May 20, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.

Affected packages

Versions sourced from the GitHub Security Advisory.

Package | Affected versions | Patched versions
git-url-parse (PyPI) | <= 1.2.2 | (none listed)

Patches

3
7a232efa57ec

Fix other source of slowness in git URL parser + limit URL length to 1024 (#7955)

https://github.com/returntocorp/semgrep · Martin Jambon · Jun 6, 2023 · via ghsa-ref
3 files changed · +77 -74
  • changelog.d/gh-7943.fixed · +4 -2 · modified
    @@ -1,2 +1,4 @@
    -Fix regexp potentially vulnerable to ReDoS attacks in Python code for parsing
    -git URLs. Reported by Sebastian Chnelik, PyUp.io.
    +Fix regexps potentially vulnerable to ReDoS attacks in Python code for parsing
    +git URLs. Sets maximum length of git URLs to 1024 characters since parsing is
    +still perceptibly slow on 5000-byte input. Reported by Sebastian Chnelik,
    +PyUp.io.
    
  • cli/src/semgrep/external/git_url_parser.py · +12 -6 · modified
    @@ -23,6 +23,7 @@
     
     #
     # 2023-06-02 patched by Martin Jambon to avoid potential ReDoS attacks
    +# 2023-06-05 patched further by Martin Jambon to avoid potential ReDoS attacks
     #
     
     import collections
    @@ -46,7 +47,7 @@
                    r'(?:(?P<user>[^\n@]+)@)*'
                    r'(?P<resource>[a-z0-9_.-]*)'
                    r'[:/]*'
    -               r'(?P<port>[\d]+){0,1}'
    +               r'(?P<port>(?<=:)[\d]+){0,1}'
                    r'(?P<pathname>\/((?P<owner>[\w\-\/]+)\/)?'
                    r'((?P<name>[\w\-\.]+?)(\.git|\/)?)?)$'),
         re.compile(r'(git\+)?'
    @@ -58,13 +59,13 @@
                    r'(\/?(?P<name>[\w\-]+)(\.git|\/)?)?)$'),
         re.compile(r'^(?:(?P<user>[^\n@]+)@)*'
                    r'(?P<resource>[a-z0-9_.-]*)[:]*'
    -               r'(?P<port>[\d]+){0,1}'
    +               r'(?P<port>(?<=:)[\d]+){0,1}'
                    r'(?P<pathname>\/?(?P<owner>.+)/(?P<name>.+).git)$'),
         re.compile(r'((?P<user>\w+)@)?'
    -                r'((?P<resource>[\w\.\-]+))'
    -                r'[\:\/]{1,2}'
    -                r'(?P<pathname>((?P<owner>([\w\-]+\/)?\w+)/)?'
    -                r'((?P<name>[\w\-]+)(\.git|\/)?)?)$'),
    +               r'((?P<resource>[\w\.\-]+))'
    +               r'[\:\/]{1,2}'
    +               r'(?P<pathname>((?P<owner>([\w\-]+\/)?\w+)/)?'
    +               r'((?P<name>[\w\-]+)(\.git|\/)?)?)$'),
         re.compile(r'((?P<user>\w+)@)?'
                    r'((?P<resource>[\w\.\-]+))'
                    r'[\:\/]{1,2}'
    @@ -107,6 +108,11 @@ def parse(self) -> Parsed:
                 'name': None,
                 'owner': None,
             }
    +        # Parsing is super slow even after fixing obvious problems in regexps.
    +        # This mitigates the damage of quadratic behavior.
    +        if len(self._url) > 1024:
    +            msg = f"URL exceeds maximum supported length of 1024: {self._url}"
    +            raise ParserError(msg)
             for regex in POSSIBLE_REGEXES:
                 match = regex.search(self._url)
                 if match:
    
  • src/osemgrep/TOPORT/external/git_url_parser.py · +61 -66 · modified
    @@ -1,5 +1,6 @@
     # This file is forked from https://github.com/coala/git-url-parse/blob/master/giturlparse/parser.py
     # MIT license here: https://github.com/coala/git-url-parse/blob/master/LICENSE
    +
     # Copyright (c) 2017 John Dewey
     #
     #  Permission is hereby granted, free of charge, to any person obtaining a copy
    @@ -22,71 +23,60 @@
     
     #
     # 2023-06-02 patched by Martin Jambon to avoid potential ReDoS attacks
    +# 2023-06-05 patched further by Martin Jambon to avoid potential ReDoS attacks
     #
     
     import collections
     import re
     from typing import List
     
    -Parsed = collections.namedtuple(
    -    "Parsed",
    -    [
    -        "pathname",
    -        "protocols",
    -        "protocol",
    -        "href",
    -        "resource",
    -        "user",
    -        "port",
    -        "name",
    -        "owner",
    -    ],
    -)
    +Parsed = collections.namedtuple('Parsed', [
    +    'pathname',
    +    'protocols',
    +    'protocol',
    +    'href',
    +    'resource',
    +    'user',
    +    'port',
    +    'name',
    +    'owner',
    +])
     
     POSSIBLE_REGEXES = (
    -    re.compile(
    -        r"^(?P<protocol>https?|git|ssh|rsync)\://"
    -        r"(?:(?P<user>[^\n@]+)@)*"
    -        r"(?P<resource>[a-z0-9_.-]*)"
    -        r"[:/]*"
    -        r"(?P<port>[\d]+){0,1}"
    -        r"(?P<pathname>\/((?P<owner>[\w\-]+)\/)?"
    -        r"((?P<name>[\w\-\.]+?)(\.git|\/)?)?)$"
    -    ),
    -    re.compile(
    -        r"(git\+)?"
    -        r"((?P<protocol>\w+)://)"
    -        r"((?P<user>\w+)@)?"
    -        r"((?P<resource>[\w\.\-]+))"
    -        r"(:(?P<port>\d+))?"
    -        r"(?P<pathname>(\/(?P<owner>\w+)/)?"
    -        r"(\/?(?P<name>[\w\-]+)(\.git|\/)?)?)$"
    -    ),
    -    re.compile(
    -        r"^(?:(?P<user>[^\n@]+)@)*"
    -        r"(?P<resource>[a-z0-9_.-]*)[:]*"
    -        r"(?P<port>[\d]+){0,1}"
    -        r"(?P<pathname>\/?(?P<owner>.+)/(?P<name>.+).git)$"
    -    ),
    -    re.compile(
    -        r"((?P<user>\w+)@)?"
    -        r"((?P<resource>[\w\.\-]+))"
    -        r"[\:\/]{1,2}"
    -        r"(?P<pathname>((?P<owner>([\w\-]+\/)?\w+)/)?"
    -        r"((?P<name>[\w\-]+)(\.git|\/)?)?)$"
    -    ),
    -    re.compile(
    -        r"((?P<user>\w+)@)?"
    -        r"((?P<resource>[\w\.\-]+))"
    -        r"[\:\/]{1,2}"
    -        r"(?P<pathname>((?P<owner>\w+)/)?"
    -        r"((?P<name>[\w\-]+)(\.git|\/)?)?)$"
    -    ),
    +    re.compile(r'^(?P<protocol>https?|git|ssh|rsync)\://'
    +               r'(?:(?P<user>[^\n@]+)@)*'
    +               r'(?P<resource>[a-z0-9_.-]*)'
    +               r'[:/]*'
    +               r'(?P<port>(?<=:)[\d]+){0,1}'
    +               r'(?P<pathname>\/((?P<owner>[\w\-\/]+)\/)?'
    +               r'((?P<name>[\w\-\.]+?)(\.git|\/)?)?)$'),
    +    re.compile(r'(git\+)?'
    +               r'((?P<protocol>\w+)://)'
    +               r'((?P<user>\w+)@)?'
    +               r'((?P<resource>[\w\.\-]+))'
    +               r'(:(?P<port>\d+))?'
    +               r'(?P<pathname>(\/(?P<owner>\w+)/)?'
    +               r'(\/?(?P<name>[\w\-]+)(\.git|\/)?)?)$'),
    +    re.compile(r'^(?:(?P<user>[^\n@]+)@)*'
    +               r'(?P<resource>[a-z0-9_.-]*)[:]*'
    +               r'(?P<port>(?<=:)[\d]+){0,1}'
    +               r'(?P<pathname>\/?(?P<owner>.+)/(?P<name>.+).git)$'),
    +    re.compile(r'((?P<user>\w+)@)?'
    +               r'((?P<resource>[\w\.\-]+))'
    +               r'[\:\/]{1,2}'
    +               r'(?P<pathname>((?P<owner>([\w\-]+\/)?\w+)/)?'
    +               r'((?P<name>[\w\-]+)(\.git|\/)?)?)$'),
    +    re.compile(r'((?P<user>\w+)@)?'
    +               r'((?P<resource>[\w\.\-]+))'
    +               r'[\:\/]{1,2}'
    +               r'(?P<pathname>((?P<owner>\w+)/)?'
    +               r'((?P<name>[\w\-\.]+)(\.git|\/)?)?)$'),
     )
     
     
     class ParserError(Exception):
    -    """Error raised when a URL can't be parsed."""
    +    """ Error raised when a URL can't be parsed. """
    +    pass
     
     
     class Parser(str):
    @@ -98,7 +88,7 @@ def __init__(self, url: str):
             # to fix an open bug with trailing slashes: https://github.com/coala/git-url-parse/issues/46
             self._url: str = url
             if url[-1] == "/":
    -            self._url = url[:-1]
    +          self._url = url[:-1]
     
         def parse(self) -> Parsed:
             """
    @@ -108,31 +98,36 @@ def parse(self) -> Parsed:
             :raise: :class:`.ParserError`
             """
             d = {
    -            "pathname": None,
    -            "protocols": self._get_protocols(),
    -            "protocol": "ssh",
    -            "href": self._url,
    -            "resource": None,
    -            "user": None,
    -            "port": None,
    -            "name": None,
    -            "owner": None,
    +            'pathname': None,
    +            'protocols': self._get_protocols(),
    +            'protocol': 'ssh',
    +            'href': self._url,
    +            'resource': None,
    +            'user': None,
    +            'port': None,
    +            'name': None,
    +            'owner': None,
             }
    +        # Parsing is super slow even after fixing obvious problems in regexps.
    +        # This mitigates the damage of quadratic behavior.
    +        if len(self._url) > 1024:
    +            msg = f"URL exceeds maximum supported length of 1024: {self._url}"
    +            raise ParserError(msg)
             for regex in POSSIBLE_REGEXES:
                 match = regex.search(self._url)
                 if match:
                     d.update(match.groupdict())
                     break
             else:
    -            msg = f"Invalid URL '{self._url}'"
    +            msg = "Invalid URL '{}'".format(self._url)
                 raise ParserError(msg)
     
             return Parsed(**d)
     
         def _get_protocols(self) -> List[str]:
             try:
    -            index = self._url.index("://")
    +            index = self._url.index('://')
             except ValueError:
                 return []
     
    -        return self._url[:index].split("+")
    +        return self._url[:index].split('+')
    
55cafa9bdefb

Fix for ReDoS vulnerability (#7943)

https://github.com/returntocorp/semgrep · Martin Jambon · Jun 3, 2023 · via ghsa-ref
3 files changed · +16 -5
  • changelog.d/gh-7943.fixed · +2 -0 · added
    @@ -0,0 +1,2 @@
    +Fix regexp potentially vulnerable to ReDoS attacks in Python code for parsing
    +git URLs. Reported by Sebastian Chnelik, PyUp.io.
    
  • cli/src/semgrep/external/git_url_parser.py · +7 -3 · modified
    @@ -21,6 +21,10 @@
     #  FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
     #  DEALINGS IN THE SOFTWARE.
     
    +#
    +# 2023-06-02 patched by Martin Jambon to avoid potential ReDoS attacks
    +#
    +
     import collections
     import re
     from typing import List
    @@ -39,7 +43,7 @@
     
     POSSIBLE_REGEXES = (
         re.compile(r'^(?P<protocol>https?|git|ssh|rsync)\://'
    -               r'(?:(?P<user>.+)@)*'
    +               r'(?:(?P<user>[^\n@]+)@)*'
                    r'(?P<resource>[a-z0-9_.-]*)'
                    r'[:/]*'
                    r'(?P<port>[\d]+){0,1}'
    @@ -52,7 +56,7 @@
                    r'(:(?P<port>\d+))?'
                    r'(?P<pathname>(\/(?P<owner>\w+)/)?'
                    r'(\/?(?P<name>[\w\-]+)(\.git|\/)?)?)$'),
    -    re.compile(r'^(?:(?P<user>.+)@)*'
    +    re.compile(r'^(?:(?P<user>[^\n@]+)@)*'
                    r'(?P<resource>[a-z0-9_.-]*)[:]*'
                    r'(?P<port>[\d]+){0,1}'
                    r'(?P<pathname>\/?(?P<owner>.+)/(?P<name>.+).git)$'),
    @@ -120,4 +124,4 @@ def _get_protocols(self) -> List[str]:
             except ValueError:
                 return []
     
    -        return self._url[:index].split('+')
    \ No newline at end of file
    +        return self._url[:index].split('+')
    
  • src/osemgrep/TOPORT/external/git_url_parser.py · +7 -2 · modified
    @@ -19,6 +19,11 @@
     #  LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
     #  FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
     #  DEALINGS IN THE SOFTWARE.
    +
    +#
    +# 2023-06-02 patched by Martin Jambon to avoid potential ReDoS attacks
    +#
    +
     import collections
     import re
     from typing import List
    @@ -41,7 +46,7 @@
     POSSIBLE_REGEXES = (
         re.compile(
             r"^(?P<protocol>https?|git|ssh|rsync)\://"
    -        r"(?:(?P<user>.+)@)*"
    +        r"(?:(?P<user>[^\n@]+)@)*"
             r"(?P<resource>[a-z0-9_.-]*)"
             r"[:/]*"
             r"(?P<port>[\d]+){0,1}"
    @@ -58,7 +63,7 @@
             r"(\/?(?P<name>[\w\-]+)(\.git|\/)?)?)$"
         ),
         re.compile(
    -        r"^(?:(?P<user>.+)@)*"
    +        r"^(?:(?P<user>[^\n@]+)@)*"
             r"(?P<resource>[a-z0-9_.-]*)[:]*"
             r"(?P<port>[\d]+){0,1}"
             r"(?P<pathname>\/?(?P<owner>.+)/(?P<name>.+).git)$"
    
52d6328f1e42

fix(cli): git URL parsing for subgroups (#7611)

https://github.com/returntocorp/semgrep · Brandon Wu · Apr 25, 2023 · via ghsa-ref
4 files changed · +27 -7
  • changelog.d/pa-2669.fixed · +3 -0 · added
    @@ -0,0 +1,3 @@
    +CLI: Fixed a bug where Git projects with URLs with subgroups would not parse correctly,
    +and produce non-clickable links in Semgrep App. These are such as:
    +https://gitlab.com/example/group2/group3/test-case.git
    
  • cli/src/semgrep/external/git_url_parser.py · +1 -1 · modified
    @@ -43,7 +43,7 @@
                    r'(?P<resource>[a-z0-9_.-]*)'
                    r'[:/]*'
                    r'(?P<port>[\d]+){0,1}'
    -               r'(?P<pathname>\/((?P<owner>[\w\-]+)\/)?'
    +               r'(?P<pathname>\/((?P<owner>[\w\-\/]+)\/)?'
                    r'((?P<name>[\w\-\.]+?)(\.git|\/)?)?)$'),
         re.compile(r'(git\+)?'
                    r'((?P<protocol>\w+)://)'
    
  • cli/tests/e2e/test_ci.py · +0 -6 · modified
    @@ -282,11 +282,6 @@ def mock_autofix(request, mocker):
                 "SEMGREP_APP_TOKEN": "dummy",
                 "SEMGREP_REPO_URL": REMOTE_REPO_URL,
             },
    -        {  # Same as above, but with a repo URL that has a dot in the name.
    -            # This used to cause the URL parser to crash.
    -            "SEMGREP_APP_TOKEN": "dummy",
    -            "SEMGREP_REPO_URL": "https://test@dev.azure.com/test/TestName/_git/Core.Thing",
    -        },
             {  # Github full scan
                 "CI": "true",
                 "GITHUB_ACTIONS": "true",
    @@ -557,7 +552,6 @@ def mock_autofix(request, mocker):
         ],
         ids=[
             "local",
    -        "local_with_dot_url",
             "github-push",
             "github-push-special-env-vars",
             "github-enterprise",
    
  • cli/tests/e2e/test_meta.py · +23 -0 · added
    @@ -0,0 +1,23 @@
    +from semgrep.meta import get_url_from_sstp_url
    +
    +
    +def test_git_url_parser():
    +    tests = [
    +        # This used to cause the URL parser to crash.
    +        (
    +            "https://test@dev.azure.com/test/TestName/_git/Core.Thing",
    +            "https://dev.azure.com/test/TestName/_git/Core.Thing",
    +        ),
    +        # This one has a "subgroup" structure, which we should be able to parse.
    +        (
    +            "https://gitlab.com/example/group2/group3/test-case.git",
    +            "https://gitlab.com/example/group2/group3/test-case",
    +        ),
    +        (
    +            "https://gitlab.com/example/test-case.git",
    +            "https://gitlab.com/example/test-case",
    +        ),
    +    ]
    +
    +    for url, expected in tests:
    +        assert get_url_from_sstp_url(url) == expected
    

Vulnerability mechanics

Root cause

"Catastrophic backtracking in regular expressions used to parse Git URLs, triggered by crafted inputs with repeated characters."

Attack vector

An attacker crafts a malicious Git URL containing a ReDoS payload—for example, a long string of `@` characters (e.g., `git://` + `'@' * 10000`) or a long uniform string like `'0' * 5000`. If Semgrep analyzes an untrusted package that references such a URL (e.g., to check whether the package accesses a Git repository at an http:// URL), the parser enters catastrophic backtracking, consuming excessive CPU time. No authentication or special network position is required; the attacker only needs to supply the malicious URL within a package that Semgrep processes [patch_id=1640779].
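
The effect of a uniform payload can be observed with a small timing harness. The sketch below copies the third pattern of `POSSIBLE_REGEXES` from the diffs, once before and once after the port fix, and times `search()` on runs of `'0'`. The helper names `time_search` and `measure` are hypothetical, and the payload sizes are deliberately small so the script finishes quickly; absolute timings depend on the machine and Python version.

```python
import re
import time

# Third pattern from POSSIBLE_REGEXES before the port fix (from the diff).
BEFORE = re.compile(
    r'^(?:(?P<user>[^\n@]+)@)*'
    r'(?P<resource>[a-z0-9_.-]*)[:]*'
    r'(?P<port>[\d]+){0,1}'
    r'(?P<pathname>\/?(?P<owner>.+)/(?P<name>.+).git)$'
)

# Same pattern after the fix: port digits must directly follow a colon.
AFTER = re.compile(
    r'^(?:(?P<user>[^\n@]+)@)*'
    r'(?P<resource>[a-z0-9_.-]*)[:]*'
    r'(?P<port>(?<=:)[\d]+){0,1}'
    r'(?P<pathname>\/?(?P<owner>.+)/(?P<name>.+).git)$'
)


def time_search(pattern, payload):
    """Return (match, seconds) for a single pattern.search() call."""
    start = time.perf_counter()
    match = pattern.search(payload)
    return match, time.perf_counter() - start


def measure(sizes=(50, 100, 200)):
    """Time both patterns on a run of '0' characters of each size."""
    rows = []
    for n in sizes:
        payload = '0' * n  # no '/' or '.git', so neither pattern can match
        _, before_s = time_search(BEFORE, payload)
        _, after_s = time_search(AFTER, payload)
        rows.append((n, before_s, after_s))
    return rows


if __name__ == '__main__':
    for n, before_s, after_s in measure():
        print(f'{n:>5} chars  before={before_s:.6f}s  after={after_s:.6f}s')
```

If the payload size is raised, the pre-fix pattern should be increased with care, since its run time grows much faster than the input length.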

Affected code

The vulnerability resides in the git URL parser at `cli/src/semgrep/external/git_url_parser.py` (and its copy at `src/osemgrep/TOPORT/external/git_url_parser.py`). The `POSSIBLE_REGEXES` tuple contains several regular expressions that backtrack excessively on crafted inputs, such as a long sequence of `@` characters or a long string of zeros. The `parse()` method tries each regex in turn via `regex.search(self._url)` without any input-length guard, allowing an attacker to cause a denial of service.

What the fix does

Patch [patch_id=1640779] replaces the greedy `.+` in the user-capture groups with `[^\n@]+`, so a run of `@` characters can no longer be divided between iterations of the group in many different ways. Patch [patch_id=1640787] tightens the port group from `[\d]+` to `(?<=:)[\d]+`, so digits count as a port only when they directly follow a colon, and adds a hard limit of 1024 characters in `parse()` that rejects longer inputs before any regex matching occurs. The patch's own comment notes that parsing remains quadratic after the regex fixes, so the length cap bounds that cost rather than removing it.
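
The patched pattern can be checked against ordinary URLs to confirm that the tightened user and port groups still capture what they should. This sketch uses the first pattern of `POSSIBLE_REGEXES` as it appears after both patches; the `fields` helper and the example URLs are illustrative assumptions, not part of Semgrep.

```python
import re

# First pattern from POSSIBLE_REGEXES after both patches (copied from the diff).
PATCHED = re.compile(
    r'^(?P<protocol>https?|git|ssh|rsync)\://'
    r'(?:(?P<user>[^\n@]+)@)*'        # user may no longer contain '@' or newline
    r'(?P<resource>[a-z0-9_.-]*)'
    r'[:/]*'
    r'(?P<port>(?<=:)[\d]+){0,1}'     # digits are a port only right after ':'
    r'(?P<pathname>\/((?P<owner>[\w\-\/]+)\/)?'
    r'((?P<name>[\w\-\.]+?)(\.git|\/)?)?)$'
)


def fields(url):
    """Return the captured groups of interest, or None if the URL does not match."""
    match = PATCHED.search(url)
    if match is None:
        return None
    keys = ('user', 'resource', 'port', 'owner', 'name')
    return {key: match.group(key) for key in keys}


if __name__ == '__main__':
    print(fields('https://example.com:8080/owner/repo.git'))
    print(fields('ssh://git@example.com/owner/repo.git'))
    # A run of '@' characters is rejected quickly instead of being retried.
    print(fields('git://' + '@' * 10000))
```

The last call shows the intent of the first patch: with `[^\n@]+`, the user group cannot consume `@` characters, so the anchored pattern fails on the payload without exploring alternative splits.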

Preconditions

  • input: Semgrep must analyze an untrusted package that contains a crafted Git URL
  • config: The crafted URL must be passed to the git URL parser (e.g., via get_repo_name_from_repo_url)

Generated on May 23, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
