• Menu
  • Skip to main content
  • Skip to primary sidebar

The Cyber Security News

Latest Cyber Security News

Header Right

  • Latest News
  • Vulnerabilities
  • Cloud Services
meta launches llamafirewall framework to stop ai jailbreaks, injections, and

Meta Launches LlamaFirewall Framework to Stop AI Jailbreaks, Injections, and Insecure Code

You are here: Home / General Cyber Security News / Meta Launches LlamaFirewall Framework to Stop AI Jailbreaks, Injections, and Insecure Code
April 30, 2025

Meta on Tuesday announced LlamaFirewall, an open-source framework designed to secure artificial intelligence (AI) systems against emerging cyber risks such as prompt injection, jailbreaks, and insecure code, among others.

The framework, the company said, incorporates three guardrails, including PromptGuard 2, Agent Alignment Checks, and CodeShield.

PromptGuard 2 is designed to detect direct jailbreak and prompt injection attempts in real-time, while Agent Alignment Checks is capable of inspecting agent reasoning for possible goal hijacking and indirect prompt injection scenarios.

✔ Approved Seller From Our Partners
Mullvad VPN Discount

Protect your privacy by Mullvad VPN. Mullvad VPN is one of the famous brands in the security and privacy world. With Mullvad VPN you will not even be asked for your email address. No log policy, no data from you will be saved. Get your license key now from the official distributor of Mullvad with discount: SerialCart® (Limited Offer).

➤ Get Mullvad VPN with 12% Discount


Cybersecurity

CodeShield refers to an online static analysis engine that seeks to prevent the generation of insecure or dangerous code by AI agents.

“LlamaFirewall is built to serve as a flexible, real-time guardrail framework for securing LLM-powered applications,” the company said in a GitHub description of the project.

“Its architecture is modular, enabling security teams and developers to compose layered defenses that span from raw input ingestion to final output actions – across simple chat models and complex autonomous agents.”

Alongside LlamaFirewall, Meta has made available updated versions of LlamaGuard and CyberSecEval to better detect various common types of violating content and measure the defensive cybersecurity capabilities of AI systems, respectively.

CyberSecEval 4 also includes a new benchmark called AutoPatchBench, which is engineered to evaluate the ability of a large language model (LLM) agent to automatically repair a wide range of C/C++ vulnerabilities identified through fuzzing, an approach known as AI-powered patching.

“AutoPatchBench provides a standardized evaluation framework for assessing the effectiveness of AI-assisted vulnerability repair tools,” the company said. “This benchmark aims to facilitate a comprehensive understanding of the capabilities and limitations of various AI-driven approaches to repairing fuzzing-found bugs.”

Cybersecurity

Lastly, Meta has launched a new program dubbed Llama for Defenders to help partner organizations and AI developers access open, early-access, and closed AI solutions to address specific security challenges, such as detecting AI-generated content used in scams, fraud, and phishing attacks.

The announcements come as WhatsApp previewed a new technology called Private Processing to allow users to harness AI features without compromising their privacy by offloading the requests to a secure, confidential environment.

“We’re working with the security community to audit and improve our architecture and will continue to build and strengthen Private Processing in the open, in collaboration with researchers, before we launch it in product,” Meta said.

Found this article interesting? Follow us on Twitter  and LinkedIn to read more exclusive content we post.


Some parts of this article are sourced from:
thehackernews.com

Previous Post: «indian court orders action to block proton mail over ai Indian Court Orders Action to Block Proton Mail Over AI Deepfake Abuse Allegations
Next Post: RansomHub Went Dark April 1; Affiliates Fled to Qilin, DragonForce Claimed Control ransomhub went dark april 1; affiliates fled to qilin, dragonforce»

Reader Interactions

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Primary Sidebar

Report This Article

Recent Posts

  • Zero-Click Agentic Browser Attack Can Delete Entire Google Drive Using Crafted Emails
  • Critical XXE Bug CVE-2025-66516 (CVSS 10.0) Hits Apache Tika, Requires Urgent Patch
  • Chinese Hackers Have Started Exploiting the Newly Disclosed React2Shell Vulnerability
  • Intellexa Leaks Reveal Zero-Days and Ads-Based Vector for Predator Spyware Delivery
  • “Getting to Yes”: An Anti-Sales Guide for MSPs
  • CISA Reports PRC Hackers Using BRICKSTORM for Long-Term Access in U.S. Systems
  • JPCERT Confirms Active Command Injection Attacks on Array AG Gateways
  • Silver Fox Uses Fake Microsoft Teams Installer to Spread ValleyRAT Malware in China
  • ThreatsDay Bulletin: Wi-Fi Hack, npm Worm, DeFi Theft, Phishing Blasts— and 15 More Stories
  • 5 Threats That Reshaped Web Security This Year [2025]

Copyright © TheCyberSecurity.News, All Rights Reserved.