CVE-2023-32758
Description
giturlparse (aka git-url-parse) through 1.2.2, as used in Semgrep 1.5.2 through 1.24.1, is vulnerable to ReDoS (Regular Expression Denial of Service) if parsing untrusted URLs. This might be relevant if Semgrep is analyzing an untrusted package (for example, to check whether it accesses any Git repository at an http:// URL), and that package's author placed a ReDoS attack payload in a URL used by the package.
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
giturlparse through 1.2.2 has a ReDoS vulnerability when parsing untrusted URLs. It affects Semgrep 1.5.2–1.24.1 and enables denial of service via crafted URLs.
Vulnerability
The giturlparse library (git-url-parse) through version 1.2.2 contains a Regular Expression Denial of Service (ReDoS) vulnerability. This library is used by Semgrep versions 1.5.2 through 1.24.1 to parse Git URLs. The issue stems from a regular expression that can exhibit catastrophic backtracking when processing specially crafted inputs, leading to excessive CPU consumption [1][4].
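The backtracking can be observed on a reduced pattern. The sketch below is an illustration, not the library's full regex: it keeps only the user-capture group as it appears in the fix-commit diffs (`.+` before, `[^\n@]+` after) and appends a literal `/` and `$` so that an all-`@` input has to fail. The helper name `time_failed_match` and the input sizes are assumptions, chosen to keep the run short.

```python
import re
import time

# Reduced patterns (assumption: everything after the user group is replaced
# by "/$"). Only the user-capture group is copied from the fix-commit diffs.
VULNERABLE = re.compile(r'^git\://(?:(?P<user>.+)@)*/$')
PATCHED = re.compile(r'^git\://(?:(?P<user>[^\n@]+)@)*/$')


def time_failed_match(pattern: re.Pattern, n: int) -> float:
    """Return the seconds spent rejecting 'git://' followed by n '@' characters."""
    payload = "git://" + "@" * n
    start = time.perf_counter()
    match = pattern.search(payload)
    elapsed = time.perf_counter() - start
    if match is not None:
        raise AssertionError("payload unexpectedly matched")
    return elapsed


if __name__ == "__main__":
    # Keep n small: the time for the vulnerable pattern grows very quickly.
    for n in (12, 16, 20, 24):
        slow = time_failed_match(VULNERABLE, n)
        fast = time_failed_match(PATCHED, n)
        print(f"n={n:2d}  .+ : {slow:.6f}s   [^\\n@]+ : {fast:.6f}s")
```

Both patterns still accept a well-formed input such as `git://user@/`; the difference only shows up in how much work is done before a non-matching input is rejected.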
Exploitation
An attacker exploits this vulnerability by getting a malicious URL in front of Semgrep's Git URL parser, for example by placing a crafted URL in an untrusted package that Semgrep analyzes. The attack does not require authentication; simply parsing the malicious URL triggers the ReDoS [1]. The payload is a long run of repeated characters that sends the regex engine into excessive backtracking.
Impact
The impact is a denial of service. When Semgrep attempts to parse a crafted URL, the process may become unresponsive or time out, disrupting analysis workflows. This is particularly harmful in automated CI/CD pipelines where Semgrep scans untrusted code [1][2][3].
Mitigation
The vulnerability has been patched in Semgrep releases after 1.24.1. The fix hardens the problematic regexes and introduces a maximum URL length of 1024 characters, which bounds the cost of any remaining backtracking [2][3]. Users are advised to update Semgrep to the latest version. For direct users of the giturlparse library, the advisory data lists no patched release of git-url-parse (all versions through 1.2.2 are affected), so untrusted URLs should not be passed to it without a length limit or similar guard.
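For code that calls a Git URL parser directly, the same length cap can be applied before parsing. The following is a minimal sketch under stated assumptions: `parse_with_length_cap`, `UrlTooLongError` and `MAX_GIT_URL_LENGTH` are hypothetical names, and the parser is passed in as a plain callable so that no particular library API is assumed.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Same numeric limit as the Semgrep patch; the constant name is an assumption.
MAX_GIT_URL_LENGTH = 1024


class UrlTooLongError(ValueError):
    """Raised by this sketch when a URL exceeds the configured cap."""


def parse_with_length_cap(
    url: str,
    parser: Callable[[str], T],
    max_length: int = MAX_GIT_URL_LENGTH,
) -> T:
    """Reject over-long URLs before handing them to a regex-based parser."""
    if len(url) > max_length:
        # Report only the length, so a huge payload is not echoed into logs.
        raise UrlTooLongError(
            f"URL length {len(url)} exceeds maximum supported length of {max_length}"
        )
    return parser(url)
```

The cap does not make the regexes themselves faster; it only limits how large an input they can be asked to process.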
AI Insight generated on May 20, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.
Affected packages
Versions sourced from the GitHub Security Advisory.
| Package | Affected versions | Patched versions |
|---|---|---|
| git-url-parse (PyPI) | <= 1.2.2 | — |
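The affected range in the table can be checked against an installed environment. The sketch below is an assumption-laden helper rather than an official tool: the function names are hypothetical, version parsing is simplified to the leading numeric components, and a version above 1.2.2 is only "outside the listed range", since the table names no patched release.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Upper bound of the affected range from the advisory table (<= 1.2.2).
LAST_AFFECTED: Tuple[int, int, int] = (1, 2, 2)


def release_tuple(version: str) -> Tuple[int, int, int]:
    """Extract up to three leading numeric components, padding with zeros."""
    m = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(g) if g is not None else 0 for g in m.groups())
    return (major, minor, patch)


def is_affected(version: str) -> bool:
    """True if the version falls inside the listed range (<= 1.2.2)."""
    return release_tuple(version) <= LAST_AFFECTED


def installed_git_url_parse_version() -> Optional[str]:
    """Return the installed git-url-parse version, or None if not installed."""
    try:
        return metadata.version("git-url-parse")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_git_url_parse_version()
    if found is None:
        print("git-url-parse is not installed")
    else:
        print(f"git-url-parse {found}: in affected range = {is_affected(found)}")
```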
Affected products
- giturlparse/git-url-parse (source: description)
- Range: <=1.2.2
Patches
37a232efa57ec Fix other source of slowness in git URL parser + limit URL length to 1024 (#7955)
3 files changed · +77 −74
changelog.d/gh-7943.fixed · +4 −2 · modified

    @@ -1,2 +1,4 @@
    -Fix regexp potentially vulnerable to ReDoS attacks in Python code for parsing
    -git URLs. Reported by Sebastian Chnelik, PyUp.io.
    +Fix regexps potentially vulnerable to ReDoS attacks in Python code for parsing
    +git URLs. Sets maximum length of git URLs to 1024 characters since parsing is
    +still perceptibly slow on 5000-byte input. Reported by Sebastian Chnelik,
    +PyUp.io.
cli/src/semgrep/external/git_url_parser.py · +12 −6 · modified

    @@ -23,6 +23,7 @@
     #
     # 2023-06-02 patched by Martin Jambon to avoid potential ReDoS attacks
    +# 2023-06-05 patched further by Martin Jambon to avoid potential ReDoS attacks
     #
     import collections
    @@ -46,7 +47,7 @@
                    r'(?:(?P<user>[^\n@]+)@)*'
                    r'(?P<resource>[a-z0-9_.-]*)'
                    r'[:/]*'
    -               r'(?P<port>[\d]+){0,1}'
    +               r'(?P<port>(?<=:)[\d]+){0,1}'
                    r'(?P<pathname>\/((?P<owner>[\w\-\/]+)\/)?'
                    r'((?P<name>[\w\-\.]+?)(\.git|\/)?)?)$'),
         re.compile(r'(git\+)?'
    @@ -58,13 +59,13 @@
                    r'(\/?(?P<name>[\w\-]+)(\.git|\/)?)?)$'),
         re.compile(r'^(?:(?P<user>[^\n@]+)@)*'
                    r'(?P<resource>[a-z0-9_.-]*)[:]*'
    -               r'(?P<port>[\d]+){0,1}'
    +               r'(?P<port>(?<=:)[\d]+){0,1}'
                    r'(?P<pathname>\/?(?P<owner>.+)/(?P<name>.+).git)$'),
         re.compile(r'((?P<user>\w+)@)?'
    -               r'((?P<resource>[\w\.\-]+))'
    -               r'[\:\/]{1,2}'
    -               r'(?P<pathname>((?P<owner>([\w\-]+\/)?\w+)/)?'
    -               r'((?P<name>[\w\-]+)(\.git|\/)?)?)$'),
    +               r'((?P<resource>[\w\.\-]+))'
    +               r'[\:\/]{1,2}'
    +               r'(?P<pathname>((?P<owner>([\w\-]+\/)?\w+)/)?'
    +               r'((?P<name>[\w\-]+)(\.git|\/)?)?)$'),
         re.compile(r'((?P<user>\w+)@)?'
                    r'((?P<resource>[\w\.\-]+))'
                    r'[\:\/]{1,2}'
    @@ -107,6 +108,11 @@ def parse(self) -> Parsed:
                 'name': None,
                 'owner': None,
             }
    +        # Parsing is super slow even after fixing obvious problems in regexps.
    +        # This mitigates the damage of quadratic behavior.
    +        if len(self._url) > 1024:
    +            msg = f"URL exceeds maximum supported length of 1024: {self._url}"
    +            raise ParserError(msg)
             for regex in POSSIBLE_REGEXES:
                 match = regex.search(self._url)
                 if match:
src/osemgrep/TOPORT/external/git_url_parser.py · +61 −66 · modified

    @@ -1,5 +1,6 @@
     # This file is forked from https://github.com/coala/git-url-parse/blob/master/giturlparse/parser.py
     # MIT license here: https://github.com/coala/git-url-parse/blob/master/LICENSE
    +
     # Copyright (c) 2017 John Dewey
     #
     # Permission is hereby granted, free of charge, to any person obtaining a copy
    @@ -22,71 +23,60 @@
     #
     # 2023-06-02 patched by Martin Jambon to avoid potential ReDoS attacks
    +# 2023-06-05 patched further by Martin Jambon to avoid potential ReDoS attacks
     #
     import collections
     import re
     from typing import List
    -Parsed = collections.namedtuple(
    -    "Parsed",
    -    [
    -        "pathname",
    -        "protocols",
    -        "protocol",
    -        "href",
    -        "resource",
    -        "user",
    -        "port",
    -        "name",
    -        "owner",
    -    ],
    -)
    +Parsed = collections.namedtuple('Parsed', [
    +    'pathname',
    +    'protocols',
    +    'protocol',
    +    'href',
    +    'resource',
    +    'user',
    +    'port',
    +    'name',
    +    'owner',
    +])
     POSSIBLE_REGEXES = (
    -    re.compile(
    -        r"^(?P<protocol>https?|git|ssh|rsync)\://"
    -        r"(?:(?P<user>[^\n@]+)@)*"
    -        r"(?P<resource>[a-z0-9_.-]*)"
    -        r"[:/]*"
    -        r"(?P<port>[\d]+){0,1}"
    -        r"(?P<pathname>\/((?P<owner>[\w\-]+)\/)?"
    -        r"((?P<name>[\w\-\.]+?)(\.git|\/)?)?)$"
    -    ),
    -    re.compile(
    -        r"(git\+)?"
    -        r"((?P<protocol>\w+)://)"
    -        r"((?P<user>\w+)@)?"
    -        r"((?P<resource>[\w\.\-]+))"
    -        r"(:(?P<port>\d+))?"
    -        r"(?P<pathname>(\/(?P<owner>\w+)/)?"
    -        r"(\/?(?P<name>[\w\-]+)(\.git|\/)?)?)$"
    -    ),
    -    re.compile(
    -        r"^(?:(?P<user>[^\n@]+)@)*"
    -        r"(?P<resource>[a-z0-9_.-]*)[:]*"
    -        r"(?P<port>[\d]+){0,1}"
    -        r"(?P<pathname>\/?(?P<owner>.+)/(?P<name>.+).git)$"
    -    ),
    -    re.compile(
    -        r"((?P<user>\w+)@)?"
    -        r"((?P<resource>[\w\.\-]+))"
    -        r"[\:\/]{1,2}"
    -        r"(?P<pathname>((?P<owner>([\w\-]+\/)?\w+)/)?"
    -        r"((?P<name>[\w\-]+)(\.git|\/)?)?)$"
    -    ),
    -    re.compile(
    -        r"((?P<user>\w+)@)?"
    -        r"((?P<resource>[\w\.\-]+))"
    -        r"[\:\/]{1,2}"
    -        r"(?P<pathname>((?P<owner>\w+)/)?"
    -        r"((?P<name>[\w\-]+)(\.git|\/)?)?)$"
    -    ),
    +    re.compile(r'^(?P<protocol>https?|git|ssh|rsync)\://'
    +               r'(?:(?P<user>[^\n@]+)@)*'
    +               r'(?P<resource>[a-z0-9_.-]*)'
    +               r'[:/]*'
    +               r'(?P<port>(?<=:)[\d]+){0,1}'
    +               r'(?P<pathname>\/((?P<owner>[\w\-\/]+)\/)?'
    +               r'((?P<name>[\w\-\.]+?)(\.git|\/)?)?)$'),
    +    re.compile(r'(git\+)?'
    +               r'((?P<protocol>\w+)://)'
    +               r'((?P<user>\w+)@)?'
    +               r'((?P<resource>[\w\.\-]+))'
    +               r'(:(?P<port>\d+))?'
    +               r'(?P<pathname>(\/(?P<owner>\w+)/)?'
    +               r'(\/?(?P<name>[\w\-]+)(\.git|\/)?)?)$'),
    +    re.compile(r'^(?:(?P<user>[^\n@]+)@)*'
    +               r'(?P<resource>[a-z0-9_.-]*)[:]*'
    +               r'(?P<port>(?<=:)[\d]+){0,1}'
    +               r'(?P<pathname>\/?(?P<owner>.+)/(?P<name>.+).git)$'),
    +    re.compile(r'((?P<user>\w+)@)?'
    +               r'((?P<resource>[\w\.\-]+))'
    +               r'[\:\/]{1,2}'
    +               r'(?P<pathname>((?P<owner>([\w\-]+\/)?\w+)/)?'
    +               r'((?P<name>[\w\-]+)(\.git|\/)?)?)$'),
    +    re.compile(r'((?P<user>\w+)@)?'
    +               r'((?P<resource>[\w\.\-]+))'
    +               r'[\:\/]{1,2}'
    +               r'(?P<pathname>((?P<owner>\w+)/)?'
    +               r'((?P<name>[\w\-\.]+)(\.git|\/)?)?)$'),
     )
     class ParserError(Exception):
    -    """Error raised when a URL can't be parsed."""
    +    """ Error raised when a URL can't be parsed. """
    +    pass
     class Parser(str):
    @@ -98,7 +88,7 @@ def __init__(self, url: str):
             # to fix an open bug with trailing slashes: https://github.com/coala/git-url-parse/issues/46
             self._url: str = url
             if url[-1] == "/":
    -            self._url = url[:-1]
    +            self._url = url[:-1]
         def parse(self) -> Parsed:
             """
    @@ -108,31 +98,36 @@ def parse(self) -> Parsed:
             :raise: :class:`.ParserError`
             """
             d = {
    -            "pathname": None,
    -            "protocols": self._get_protocols(),
    -            "protocol": "ssh",
    -            "href": self._url,
    -            "resource": None,
    -            "user": None,
    -            "port": None,
    -            "name": None,
    -            "owner": None,
    +            'pathname': None,
    +            'protocols': self._get_protocols(),
    +            'protocol': 'ssh',
    +            'href': self._url,
    +            'resource': None,
    +            'user': None,
    +            'port': None,
    +            'name': None,
    +            'owner': None,
             }
    +        # Parsing is super slow even after fixing obvious problems in regexps.
    +        # This mitigates the damage of quadratic behavior.
    +        if len(self._url) > 1024:
    +            msg = f"URL exceeds maximum supported length of 1024: {self._url}"
    +            raise ParserError(msg)
             for regex in POSSIBLE_REGEXES:
                 match = regex.search(self._url)
                 if match:
                     d.update(match.groupdict())
                     break
             else:
    -            msg = f"Invalid URL '{self._url}'"
    +            msg = "Invalid URL '{}'".format(self._url)
                 raise ParserError(msg)
             return Parsed(**d)
         def _get_protocols(self) -> List[str]:
             try:
    -            index = self._url.index("://")
    +            index = self._url.index('://')
             except ValueError:
                 return []
    -        return self._url[:index].split("+")
    +        return self._url[:index].split('+')
55cafa9bdefb Fix for ReDoS vulnerability (#7943)
3 files changed · +16 −5
changelog.d/gh-7943.fixed · +2 −0 · added

    @@ -0,0 +1,2 @@
    +Fix regexp potentially vulnerable to ReDoS attacks in Python code for parsing
    +git URLs. Reported by Sebastian Chnelik, PyUp.io.
cli/src/semgrep/external/git_url_parser.py · +7 −3 · modified

    @@ -21,6 +21,10 @@
     # FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
     # DEALINGS IN THE SOFTWARE.
    +#
    +# 2023-06-02 patched by Martin Jambon to avoid potential ReDoS attacks
    +#
    +
     import collections
     import re
     from typing import List
    @@ -39,7 +43,7 @@
     POSSIBLE_REGEXES = (
         re.compile(r'^(?P<protocol>https?|git|ssh|rsync)\://'
    -               r'(?:(?P<user>.+)@)*'
    +               r'(?:(?P<user>[^\n@]+)@)*'
                    r'(?P<resource>[a-z0-9_.-]*)'
                    r'[:/]*'
                    r'(?P<port>[\d]+){0,1}'
    @@ -52,7 +56,7 @@
                    r'(:(?P<port>\d+))?'
                    r'(?P<pathname>(\/(?P<owner>\w+)/)?'
                    r'(\/?(?P<name>[\w\-]+)(\.git|\/)?)?)$'),
    -    re.compile(r'^(?:(?P<user>.+)@)*'
    +    re.compile(r'^(?:(?P<user>[^\n@]+)@)*'
                    r'(?P<resource>[a-z0-9_.-]*)[:]*'
                    r'(?P<port>[\d]+){0,1}'
                    r'(?P<pathname>\/?(?P<owner>.+)/(?P<name>.+).git)$'),
    @@ -120,4 +124,4 @@ def _get_protocols(self) -> List[str]:
             except ValueError:
                 return []
    -        return self._url[:index].split('+')
    \ No newline at end of file
    +        return self._url[:index].split('+')
src/osemgrep/TOPORT/external/git_url_parser.py · +7 −2 · modified

    @@ -19,6 +19,11 @@
     # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
     # FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
     # DEALINGS IN THE SOFTWARE.
    +
    +#
    +# 2023-06-02 patched by Martin Jambon to avoid potential ReDoS attacks
    +#
    +
     import collections
     import re
     from typing import List
    @@ -41,7 +46,7 @@
     POSSIBLE_REGEXES = (
         re.compile(
             r"^(?P<protocol>https?|git|ssh|rsync)\://"
    -        r"(?:(?P<user>.+)@)*"
    +        r"(?:(?P<user>[^\n@]+)@)*"
             r"(?P<resource>[a-z0-9_.-]*)"
             r"[:/]*"
             r"(?P<port>[\d]+){0,1}"
    @@ -58,7 +63,7 @@
             r"(\/?(?P<name>[\w\-]+)(\.git|\/)?)?)$"
         ),
         re.compile(
    -        r"^(?:(?P<user>.+)@)*"
    +        r"^(?:(?P<user>[^\n@]+)@)*"
             r"(?P<resource>[a-z0-9_.-]*)[:]*"
             r"(?P<port>[\d]+){0,1}"
             r"(?P<pathname>\/?(?P<owner>.+)/(?P<name>.+).git)$"
52d6328f1e42 fix(cli): git URL parsing for subgroups (#7611)
4 files changed · +27 −7
changelog.d/pa-2669.fixed · +3 −0 · added

    @@ -0,0 +1,3 @@
    +CLI: Fixed a bug where Git projects with URLs with subgroups would not parse correctly,
    +and produce non-clickable links in Semgrep App. These are such as:
    +https://gitlab.com/example/group2/group3/test-case.git
cli/src/semgrep/external/git_url_parser.py · +1 −1 · modified

    @@ -43,7 +43,7 @@
                    r'(?P<resource>[a-z0-9_.-]*)'
                    r'[:/]*'
                    r'(?P<port>[\d]+){0,1}'
    -               r'(?P<pathname>\/((?P<owner>[\w\-]+)\/)?'
    +               r'(?P<pathname>\/((?P<owner>[\w\-\/]+)\/)?'
                    r'((?P<name>[\w\-\.]+?)(\.git|\/)?)?)$'),
         re.compile(r'(git\+)?'
                    r'((?P<protocol>\w+)://)'
cli/tests/e2e/test_ci.py · +0 −6 · modified

    @@ -282,11 +282,6 @@ def mock_autofix(request, mocker):
                 "SEMGREP_APP_TOKEN": "dummy",
                 "SEMGREP_REPO_URL": REMOTE_REPO_URL,
             },
    -        {  # Same as above, but with a repo URL that has a dot in the name.
    -            # This used to cause the URL parser to crash.
    -            "SEMGREP_APP_TOKEN": "dummy",
    -            "SEMGREP_REPO_URL": "https://test@dev.azure.com/test/TestName/_git/Core.Thing",
    -        },
             {  # Github full scan
                 "CI": "true",
                 "GITHUB_ACTIONS": "true",
    @@ -557,7 +552,6 @@ def mock_autofix(request, mocker):
         ],
         ids=[
             "local",
    -        "local_with_dot_url",
             "github-push",
             "github-push-special-env-vars",
             "github-enterprise",
cli/tests/e2e/test_meta.py · +23 −0 · added

    @@ -0,0 +1,23 @@
    +from semgrep.meta import get_url_from_sstp_url
    +
    +
    +def test_git_url_parser():
    +    tests = [
    +        # This used to cause the URL parser to crash.
    +        (
    +            "https://test@dev.azure.com/test/TestName/_git/Core.Thing",
    +            "https://dev.azure.com/test/TestName/_git/Core.Thing",
    +        ),
    +        # This one has a "subgroup" structure, which we should be able to parse.
    +        (
    +            "https://gitlab.com/example/group2/group3/test-case.git",
    +            "https://gitlab.com/example/group2/group3/test-case",
    +        ),
    +        (
    +            "https://gitlab.com/example/test-case.git",
    +            "https://gitlab.com/example/test-case",
    +        ),
    +    ]
    +
    +    for url, expected in tests:
    +        assert get_url_from_sstp_url(url) == expected
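The effect of the subgroup change (#7611) on the `owner` group can be reproduced in isolation. This is a sketch, not Semgrep's test suite: `first_pattern` is a hypothetical helper that rebuilds only the first entry of `POSSIBLE_REGEXES`, with every group other than `owner` taken from the final patched form shown in the diffs. The real parser tries further patterns when the first one fails, which this example does not model.

```python
import re


def first_pattern(owner_class: str) -> re.Pattern:
    """Build the first URL pattern with the given character class for <owner>."""
    return re.compile(
        r'^(?P<protocol>https?|git|ssh|rsync)\://'
        r'(?:(?P<user>[^\n@]+)@)*'
        r'(?P<resource>[a-z0-9_.-]*)'
        r'[:/]*'
        r'(?P<port>(?<=:)[\d]+){0,1}'
        r'(?P<pathname>\/((?P<owner>' + owner_class + r'+)\/)?'
        r'((?P<name>[\w\-\.]+?)(\.git|\/)?)?)$'
    )


OWNER_BEFORE = first_pattern(r'[\w\-]')    # owner may not contain "/"
OWNER_AFTER = first_pattern(r'[\w\-\/]')   # owner may span several path segments

SUBGROUP_URL = "https://gitlab.com/example/group2/group3/test-case.git"

if __name__ == "__main__":
    for label, pattern in (("before", OWNER_BEFORE), ("after", OWNER_AFTER)):
        match = pattern.search(SUBGROUP_URL)
        if match is None:
            print(f"{label}: no match")
        else:
            print(f"{label}: owner={match.group('owner')!r} name={match.group('name')!r}")
```

With the narrower class the first pattern rejects the subgroup URL outright; with `\/` allowed in the class, `owner` captures the whole group path and `name` captures the repository.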
Vulnerability mechanics
Root cause
"Catastrophic backtracking in regular expressions used to parse Git URLs, triggered by crafted inputs with repeated characters."
Attack vector
An attacker crafts a malicious Git URL containing a ReDoS payload—for example, a long string of `@` characters (e.g., `git://` + `'@' * 10000`) or a long uniform string like `'0' * 5000`. If Semgrep analyzes an untrusted package that references such a URL (e.g., to check whether the package accesses a Git repository at an http:// URL), the parser enters catastrophic backtracking, consuming excessive CPU time. No authentication or special network position is required; the attacker only needs to supply the malicious URL within a package that Semgrep processes [patch_id=1640779].
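The uniform-digit payload can be timed against the third pattern as it appears in the diffs, before and after the port change. This is a rough measurement sketch: `seconds_to_reject` and the input sizes are assumptions, and the sizes are kept far below the 5000-byte input mentioned in the changelog so the script finishes quickly. A plausible reading is that the unguarded port group lets the engine try every split of a digit run between `resource` and `port`, which the lookbehind rules out.

```python
import re
import time

# Third entry of POSSIBLE_REGEXES, copied from the diffs: before (#7943 state)
# and after (#7955 state) the port group gained its lookbehind.
THIRD_BEFORE = re.compile(r'^(?:(?P<user>[^\n@]+)@)*'
                          r'(?P<resource>[a-z0-9_.-]*)[:]*'
                          r'(?P<port>[\d]+){0,1}'
                          r'(?P<pathname>\/?(?P<owner>.+)/(?P<name>.+).git)$')
THIRD_AFTER = re.compile(r'^(?:(?P<user>[^\n@]+)@)*'
                         r'(?P<resource>[a-z0-9_.-]*)[:]*'
                         r'(?P<port>(?<=:)[\d]+){0,1}'
                         r'(?P<pathname>\/?(?P<owner>.+)/(?P<name>.+).git)$')


def seconds_to_reject(pattern: re.Pattern, payload: str) -> float:
    """Time a search that is expected to fail; raise if it matches."""
    start = time.perf_counter()
    matched = pattern.search(payload) is not None
    elapsed = time.perf_counter() - start
    if matched:
        raise ValueError("payload was expected not to match")
    return elapsed


if __name__ == "__main__":
    for n in (100, 200, 400):
        payload = "0" * n
        before = seconds_to_reject(THIRD_BEFORE, payload)
        after = seconds_to_reject(THIRD_AFTER, payload)
        print(f"n={n:3d}  before: {before:.6f}s  after: {after:.6f}s")
```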
Affected code
The vulnerability resides in the git URL parser at `cli/src/semgrep/external/git_url_parser.py` (and its copy at `src/osemgrep/TOPORT/external/git_url_parser.py`). The `POSSIBLE_REGEXES` tuple contains several regular expressions that backtrack excessively on crafted inputs, such as a long sequence of `@` characters or a long string of zeros. The `parse()` method tries these regexes in order via `regex.search(self._url)` without any input-length guard, allowing an attacker to cause a denial of service.
What the fix does
Patch [patch_id=1640779] replaces the greedy `.+` in the user-capture groups with `[^\n@]+`, preventing catastrophic backtracking when many `@` characters are present. Patch [patch_id=1640787] tightens the port group from `[\d]+` to `(?<=:)[\d]+`, so digits are treated as a port only when they directly follow a colon, and adds a hard URL-length limit of 1024 characters in `parse()`, rejecting overly long inputs before any regex matching occurs. The patch's own comment notes that parsing remains slow ("quadratic behavior") even after the regex fixes, so the length cap bounds the remaining cost rather than removing it.
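The lookbehind in the port group can be examined on its own. The sketch below isolates just that group from the diffs; `find_port` and the sample strings are assumptions for illustration.

```python
import re
from typing import Optional

# The port group in isolation: before and after the lookbehind was added.
PORT_BEFORE = re.compile(r'(?P<port>[\d]+)')
PORT_AFTER = re.compile(r'(?P<port>(?<=:)[\d]+)')


def find_port(pattern: re.Pattern, text: str) -> Optional[str]:
    """Return the first substring the port group matches, or None."""
    match = pattern.search(text)
    return match.group("port") if match else None


if __name__ == "__main__":
    for text in ("example.com:8080/repo", "example8080/repo"):
        print(text, find_port(PORT_BEFORE, text), find_port(PORT_AFTER, text))
```

`(?<=:)` is a zero-width assertion: it consumes nothing and only requires that the character immediately before the digits is a colon, so a run of digits elsewhere in the URL is no longer a candidate port.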
Preconditions
- input: Semgrep must analyze an untrusted package that contains a crafted Git URL
- config: The crafted URL must be passed to the git URL parser (e.g., via `get_repo_name_from_repo_url`)
Generated on May 23, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References
- github.com/advisories/GHSA-4xqq-73wg-5mjp (ghsa, ADVISORY)
- nvd.nist.gov/vuln/detail/CVE-2023-32758 (ghsa, ADVISORY)
- github.com/coala/git-url-parse/blob/master/giturlparse/parser.py (ghsa, WEB)
- github.com/returntocorp/semgrep/pull/7611 (ghsa, WEB)
- github.com/returntocorp/semgrep/pull/7943 (ghsa, WEB)
- github.com/returntocorp/semgrep/pull/7955 (ghsa, WEB)
- pypi.org/project/git-url-parse (ghsa, WEB)