VYPR
Research · Published Jun 3, 2026 · 2 sources

Prompt Injection Vulnerability in Google Gemini Allows Malicious Notification Exploitation

A novel prompt injection technique has been discovered in Google Gemini's voice assistant, enabling attackers to embed malicious commands within message notifications, potentially leading to social engineering attacks.

Researchers at SafeBreach have detailed a sophisticated prompt injection vulnerability affecting Google Gemini's voice assistant, which could allow attackers to trick users into executing harmful actions. The research, titled "Gemini's Secret Affair: Exploiting Gemini Voice Assistant Through Instant Messaging Apps," builds upon previous work in which calendar invitations were used to manipulate Gemini. This latest technique exploits Gemini's ability to summarize message notifications, a feature that can inadvertently hide malicious instructions from the user.

Or Yair, lead of SafeBreach's security research team, demonstrated how attackers could embed hidden commands within messages. These commands could be disguised in foreign languages or within muted hyperlinks, allowing the AI assistant to process them silently without the user being aware. The potential consequences are significant, ranging from controlling smart home devices and launching unauthorized video streams to conducting social engineering attacks, including impersonating trusted contacts, and even poisoning the long-term memory of the large language model (LLM).

To bypass Google's existing security measures, Yair developed a technique called "Fake Context Alignment." This method creates a dual illusion: it presents a legitimate authorization scenario to Gemini's internal security mechanisms while simultaneously showing a benign scenario to the end user. This allows malicious instructions to be processed without triggering standard alerts. There is currently no evidence of this vulnerability being exploited in the wild. SafeBreach responsibly disclosed the issue to Google, which has since implemented content classifier updates to address the flaw.

The core of the exploit lies in how Gemini handles message notifications. For instance, an attacker could send a phishing message via WhatsApp from an unknown number, disguised as an invitation from a friend, and include a payment link. Crucially, the message could contain hidden hyperlink code instructing Gemini to misrepresent the sender's identity to the user. If a user asks Gemini to summarize their messages, especially while multitasking, they might receive a simplified, context-stripped summary that omits the suspicious origin, thereby building false trust.
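The risk described above comes from a summary that drops where a message actually came from. As a minimal sketch of one possible countermeasure (not Google's implementation), the sender label can be built by trusted code from the notification's metadata and prepended to whatever summary the model produces. The `Notification` class, `sender_label`, `render_summary`, and the contacts dictionary are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Notification:
    """Hypothetical notification envelope; fields are assumptions."""
    app: str        # e.g. "WhatsApp"
    sender_id: str  # phone number or account ID reported by the app
    body: str       # untrusted message text


def sender_label(note, contacts):
    """Describe the sender using only metadata, never the message body."""
    name = contacts.get(note.sender_id)
    if name is None:
        return f"unknown sender {note.sender_id}"
    return f"{name} (saved contact)"


def render_summary(note, model_summary, contacts):
    """Prefix the model's summary with a provenance label it cannot alter."""
    return f"{note.app} message from {sender_label(note, contacts)}: {model_summary}"


contacts = {"+15550100": "Dana"}
note = Notification("WhatsApp", "+15550199", "Hi, it's Dana! Pay here: ...")
print(render_summary(note, "Dana invites you to dinner and asks for payment.", contacts))
```

Even if injected text persuades the model to name a friend in its summary, the prefix still states that the message came from an unknown number.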

Attackers can also append invisible foreign-language text to the end of a message. Gemini may interpret these characters and execute the hidden instructions even though they are never read aloud to the user. Yair found that combining foreign characters with muted hyperlinks yielded the most reliable and stealthy results: the malicious command is hidden inside a hyperlink that Gemini processes, while the user hears a normal-sounding prompt, responds affirmatively, and inadvertently triggers the malicious action.
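The article does not specify exactly how the hidden text is encoded. As a rough sketch, assuming it relies on zero-width or other Unicode format characters, or on a script that differs from the rest of the message, a pre-filter could flag such messages before they reach the assistant. The function names here are hypothetical, and the script guess (the first word of each letter's Unicode name) is only a heuristic.

```python
import unicodedata


def find_format_chars(text):
    """Return (index, name) for each Unicode format (Cf) character,
    a category that includes zero-width spaces and joiners."""
    return [
        (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


def script_hints(text):
    """Rough script guess: the first word of each letter's Unicode name,
    such as LATIN or CYRILLIC."""
    hints = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                hints.add(name.split()[0])
    return hints


def looks_suspicious(text):
    """Flag text containing format characters or more than one script."""
    return bool(find_format_chars(text)) or len(script_hints(text)) > 1


print(looks_suspicious("Dinner at 8?"))        # plain Latin text
print(looks_suspicious("Dinner at 8?\u200b"))  # trailing zero-width space
```

A check like this would also flag legitimate multilingual messages, so it is better suited to triggering extra scrutiny than to blocking outright.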

This vulnerability highlights a broader challenge in securing AI assistants: the inherent risk of context shifting. Even though Google has patched this specific issue, the underlying principle of manipulating context to bypass guardrails remains a critical concern. SafeBreach emphasizes that indirect prompt injections are not traditional vulnerabilities that can be 'fixed' with a single patch. Instead, they require continuous monitoring and robust guardrails implemented by vendors.

While Gemini users do not need to take specific action because fixes are already in place, the research serves as a stark reminder of the evolving threat landscape for AI systems. Organizations deploying LLMs must pay close attention to access controls and treat all external input, including notifications, as potentially untrusted. The research underscores the need for active security controls and classifiers that monitor AI behavior and mitigate the risks of prompt injection attacks.
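One way to read "treat all external input as untrusted" is as a taint rule: if anything in the assistant's context came from an external source, any resulting action needs a confirmation whose wording is generated by trusted code from the real action, not by the model. The following is a minimal sketch of that idea under those assumptions. It is not a description of Gemini's internals, and every name in it (`ContextItem`, `ToolCall`, `authorize`, the source labels) is hypothetical.

```python
from dataclasses import dataclass

# Assumption: only direct user speech counts as trusted input.
TRUSTED_SOURCES = {"user_voice"}


@dataclass(frozen=True)
class ContextItem:
    source: str  # e.g. "user_voice" or "notification"
    text: str


@dataclass(frozen=True)
class ToolCall:
    name: str      # e.g. "open_window"
    argument: str  # e.g. "living room"


def is_tainted(context):
    """True if any context item came from an untrusted source."""
    return any(item.source not in TRUSTED_SOURCES for item in context)


def confirmation_prompt(call):
    """Built from the actual call, so injected text cannot reword it."""
    return f"Allow action '{call.name}' with '{call.argument}'?"


def authorize(call, context, confirm):
    """Allow untainted calls; otherwise require explicit confirmation."""
    if not is_tainted(context):
        return True
    return confirm(confirmation_prompt(call)) is True


context = [
    ContextItem("user_voice", "Summarize my messages"),
    ContextItem("notification", "Hi! <hidden instruction>"),
]
call = ToolCall("open_window", "living room")
print(authorize(call, context, confirm=lambda prompt: False))
```

Because the prompt names the real action, a user who hears a benign-sounding question from the model would still be asked separately about the action that is about to run.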

Researchers have detailed a novel bypass technique called "Fake Context Alignment," which allows attackers to circumvent Google's previous mitigations for prompt injection attacks against Gemini. This new method employs either obfuscated foreign language prompts or muted text-to-speech elements to trick the AI into executing malicious commands, even when Delayed Tool Invocation is re-enabled. The updated research demonstrates a wider attack surface, leveraging any app capable of sending notifications, including WhatsApp and Slack, to achieve context poisoning, covert control of smart home devices, and persistent memory poisoning across a user's Google Workspace.

Synthesized by Vypr AI