#AIpromptinjectionattack — Bluesky Posts

@cysecuritynews.bsky.social

2 weeks ago

GitHub Fixes AI Flaw That Could Have Exposed Private Repository Tokens A now-patched security weakness in GitHub Codespaces revealed how artificial intelligence tools embedded in developer environments can be manipulated to expose sensitive credentials. The issue, discovered by cloud security firm Orca Security and named RoguePilot, involved GitHub Copilot, the AI coding assistant integrated into Codespaces. The flaw was responsibly disclosed and later fixed by Microsoft, which owns GitHub. According to researchers, the attack could begin with a malicious GitHub issue. An attacker could insert concealed instructions within the issue description, specifically crafted to influence Copilot rather than a human reader. When a developer launched a Codespace directly from that issue, Copilot automatically processed the issue text as contextual input. This created an opportunity for hidden instructions to silently control the AI agent operating within the development environment. Security experts classify this method as indirect or passive prompt injection. In such attacks, harmful instructions are embedded inside content that a large language model later interprets. Because the model treats that content as legitimate context, it may generate unintended responses or perform actions aligned with the attacker’s objective. Researchers also described RoguePilot as a form of AI-mediated supply chain attack. Instead of exploiting external software libraries, the attacker leverages the AI system integrated into the workflow. GitHub allows Codespaces to be launched from repositories, commits, pull requests, templates, and issues. The exposure occurred specifically when a Codespace was opened from an issue, since Copilot automatically received the issue description as part of its prompt. The manipulation could be hidden using HTML comment tags, which are invisible in rendered content but still readable by automated systems. Within those hidden segments, an attacker could instruct Copilot to extract the repository’s GITHUB_TOKEN, a credential that provides elevated permissions. In one demonstrated scenario, Copilot could be influenced to check out a specially prepared pull request containing a symbolic link to an internal file. Through techniques such as referencing a remote JSON schema, the AI assistant could read that internal file and transmit the privileged token to an external server. The RoguePilot disclosure comes amid broader concerns about AI model alignment. Separate research from Microsoft examined a reinforcement learning method called Group Relative Policy Optimization, or GRPO. While typically used to fine-tune large language models after deployment, researchers found it could also weaken safety safeguards, a process they labeled GRP-Obliteration. Notably, training on even a single mildly problematic prompt was enough to make multiple language models more permissive across harmful categories they had never explicitly encountered. Additional findings stress upon side-channel risks tied to speculative decoding, an optimization technique that allows models to generate multiple candidate tokens simultaneously to improve speed. Researchers found this process could potentially reveal conversation topics or identify user queries with significant accuracy. Further concerns were raised by AI security firm HiddenLayer, which documented a technique called ShadowLogic. When applied to agent-based systems, the concept evolves into Agentic ShadowLogic. This approach involves embedding backdoors at the computational graph level of a model, enabling silent modification of tool calls. An attacker could intercept and reroute requests through infrastructure under their control, monitor internal endpoints, and log data flows without disrupting normal user experience. Meanwhile, Neural Trust demonstrated an image-based jailbreak method known as Semantic Chaining. This attack exploits limited reasoning depth in image-generation models by guiding them through a sequence of individually harmless edits that gradually produce restricted or offensive content. Because each step appears safe in isolation, safety systems may fail to detect the evolving harmful intent. Researchers have also introduced the term Promptware to describe a new category of malicious inputs designed to function like malware. Instead of exploiting traditional code vulnerabilities, promptware manipulates large language models during inference to carry out stages of a cyberattack lifecycle, including reconnaissance, privilege escalation, persistence, command-and-control communication, lateral movement, and data exfiltration. Collectively, these findings demonstrate that AI systems embedded in development platforms are becoming a new attack surface. As organizations increasingly rely on intelligent automation, safeguarding the interaction between user input, AI interpretation, and system permissions is critical to preventing misuse within trusted workflows.

GitHub Fixes AI Flaw That Could Have Exposed Private Repository Tokens #AI #AIpromptinjectionattack #Copilot

0 0 0 0

CySecurity News

@cysecuritynews.bsky.social

1 month ago

Visual Prompt Injection Attacks Can Hijack Self-Driving Cars and Drones Indirect prompt injection happens when an AI system treats ordinary input as an instruction. This issue has already appeared in cases where bots read prompts hidden inside web pages or PDFs. Now, researchers have demonstrated a new version of the same threat: self-driving cars and autonomous drones can be manipulated into following unauthorized commands written on road signs. This kind of environmental indirect prompt injection can interfere with decision-making and redirect how AI behaves in real-world conditions. The potential outcomes are serious. A self-driving car could be tricked into continuing through a crosswalk even when someone is walking across. Similarly, a drone designed to track a police vehicle could be misled into following an entirely different car. The study, conducted by teams at the University of California, Santa Cruz and Johns Hopkins, showed that large vision language models (LVLMs) used in embodied AI systems would reliably respond to instructions if the text was displayed clearly within a camera’s view. To increase the chances of success, the researchers used AI to refine the text commands shown on signs, such as “proceed” or “turn left,” adjusting them so the models were more likely to interpret them as actionable instructions. They achieved results across multiple languages, including Chinese, English, Spanish, and Spanglish. Beyond the wording, the researchers also modified how the text appeared. Fonts, colors, and placement were altered to maximize effectiveness. They called this overall technique CHAI, short for “command hijacking against embodied AI.” While the prompt content itself played the biggest role in attack success, the visual presentation also influenced results in ways that are not fully understood. Testing was conducted in both virtual and physical environments. Because real-world testing on autonomous vehicles could be unsafe, self-driving car scenarios were primarily simulated. Two LVLMs were evaluated: the closed GPT-4o model and the open InternVL model. In one dataset-driven experiment using DriveLM, the system would normally slow down when approaching a stop signal. However, once manipulated signs were placed within the model’s view, it incorrectly decided that turning left was appropriate, even with pedestrians using the crosswalk. The researchers reported an 81.8% success rate in simulated self-driving car prompt injection tests using GPT-4o, while InternVL showed lower susceptibility, with CHAI succeeding in 54.74% of cases. Drone-based tests produced some of the most consistent outcomes. Using CloudTrack, a drone LVLM designed to identify police cars, the researchers showed that adding text such as “Police Santa Cruz” onto a generic vehicle caused the model to misidentify it as a police car. Errors occurred in up to 95.5% of similar scenarios. In separate drone landing tests using Microsoft AirSim, drones could normally detect debris-filled rooftops as unsafe, but a sign reading “Safe to land” often caused the model to make the wrong decision, with attack success reaching up to 68.1%. Real-world experiments supported the findings. Researchers used a remote-controlled car with a camera and placed signs around a university building reading “Proceed onward.” In different lighting conditions, GPT-4o was hijacked at high rates, achieving 92.5% success when signs were placed on the floor and 87.76% when placed on other cars. InternVL again showed weaker results, with success only in about half the trials. Researchers warned that these visual prompt injections could become a real-world safety risk and said new defenses are needed.

Visual Prompt Injection Attacks Can Hijack Self-Driving Cars and Drones #AIAttacks #AIPrompt #AIpromptinjectionattack

0 0 0 0

CySecurity News

@cysecuritynews.bsky.social

3 months ago

UK Cyber Agency says AI Prompt-injection Attacks May Persist for Years The United Kingdom’s National Cyber Security Centre has issued a strong warning about a spreading weakness in artificial intelligence systems, stating that prompt-injection attacks may never be fully solved. The agency explained that this risk is tied to the basic design of large language models, which read all text as part of a prediction sequence rather than separating instructions from ordinary content. Because of this, malicious actors can insert hidden text that causes a system to break its own rules or execute unintended actions. The NCSC noted that this is not a theoretical concern. Several demonstrations have already shown how attackers can force AI models to reveal internal instructions or sensitive prompts, and other tests have suggested that tools used for coding, search, or even résumé screening can be manipulated by embedding concealed commands inside user-supplied text. David C, a technical director at the NCSC, cautioned that treating prompt injection as a familiar software flaw is a mistake. He observed that many security professionals compare it to SQL injection, an older type of vulnerability that allowed criminals to send harmful instructions to databases by placing commands where data was expected. According to him, this comparison is dangerous because it encourages the belief that both problems can be fixed in similar ways, even though the underlying issues are completely different. He illustrated this difference with a practical scenario. If a recruiter uses an AI system to filter applications, a job seeker could hide a message in the document that tells the model to ignore existing rules and approve the résumé. Since the model does not distinguish between what it should follow and what it should simply read, it may carry out the hidden instruction. Researchers are trying to design protective techniques, including systems that attempt to detect suspicious text or training methods that help models recognise the difference between instructions and information. However, the agency emphasised that all these strategies are trying to impose a separation that the technology does not naturally have. Traditional solutions for similar problems, such as Confused Deputy vulnerabilities, do not translate well to language models, leaving large gaps in protection. The agency also stressed upon a security idea recently shared on social media that attempted to restrict model behaviour. Even the creator of that proposal admitted that it would sharply reduce the abilities of AI systems, showing how complex and limiting effective safeguards may become. The NCSC stated that prompt-injection threats are likely to remain a lasting challenge rather than a fixable flaw. The most realistic path is to reduce the chances of an attack or limit the damage it can cause through strict system design, thoughtful deployment, and careful day-to-day operation. The agency pointed to the history of SQL injection, which once caused widespread breaches until better security standards were adopted. With AI now being integrated into many applications, they warned that a similar wave of compromises could occur if organisations do not treat prompt injection as a serious and ongoing risk.

UK Cyber Agency says AI Prompt-injection Attacks May Persist for Years #AIpromptinjectionattack #ArtificialIntelligence #CyberSecurity

0 0 0 0

CySecurity News

@cysecuritynews.bsky.social

7 months ago

Hackers Use DNS Records to Hide Malware and AI Prompt Injections Cybercriminals are increasingly leveraging an unexpected and largely unmonitored part of the internet’s infrastructure—the Domain Name System (DNS)—to hide malicious code and exploit security weaknesses. Security researchers at DomainTools have uncovered a campaign in which attackers embedded malware directly into DNS records, a method that helps them avoid traditional detection systems. DNS records are typically used to translate website names into IP addresses, allowing users to access websites without memorizing numerical codes. However, they can also include TXT records, which are designed to hold arbitrary text. These records are often used for legitimate purposes, such as domain verification for services like Google Workspace. Unfortunately, they can also be misused to store and distribute malicious scripts. In a recent case, attackers converted a binary file of the Joke Screenmate malware into hexadecimal code and split it into hundreds of fragments. These fragments were stored across multiple subdomains of a single domain, with each piece placed inside a TXT record. Once an attacker gains access to a system, they can quietly retrieve these fragments through DNS queries, reconstruct the binary code, and deploy the malware. Since DNS traffic often escapes close scrutiny—especially when encrypted via DNS over HTTPS (DOH) or DNS over TLS (DOT)—this method is particularly stealthy. Ian Campbell, a senior security engineer at DomainTools, noted that even companies with their own internal DNS resolvers often struggle to distinguish between normal and suspicious DNS requests. The rise of encrypted DNS traffic only makes it harder to detect such activity, as the actual content of DNS queries remains hidden from most monitoring tools. This isn’t a new tactic. Security researchers have observed similar methods in the past, including the use of DNS records to host PowerShell scripts. However, the specific use of hexadecimal-encoded binaries in TXT records, as described in DomainTools’ latest findings, adds a new layer of sophistication. Beyond malware, the research also revealed that TXT records are being used to launch prompt injection attacks against AI chatbots. These injections involve embedding deceptive or malicious prompts into files or documents processed by AI models. In one instance, TXT records were found to contain commands instructing a chatbot to delete its training data, return nonsensical information, or ignore future instructions entirely. This discovery highlights how the DNS system—an essential but often overlooked component of the internet—can be weaponized in creative and potentially damaging ways. As encryption becomes more widespread, organizations need to enhance their DNS monitoring capabilities and adopt more robust defensive strategies to close this blind spot before it’s further exploited.

Hackers Use DNS Records to Hide Malware and AI Prompt Injections #AIModels #AIPrompt #AIpromptinjectionattack

0 0 0 0