{"id":58125,"date":"2025-02-06T15:54:00","date_gmt":"2025-02-06T15:54:00","guid":{"rendered":"http:\/\/60401cbe-fce7-41d8-8ad2-f7ce427421de"},"modified":"2025-02-06T15:54:00","modified_gmt":"2025-02-06T15:54:00","slug":"anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system","status":"publish","type":"post","link":"https:\/\/www.threatshub.org\/blog\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\/","title":{"rendered":"Anthropic offers $20,000 to whoever can jailbreak its new AI safety system"},"content":{"rendered":"<figure class=\"c-shortcodeImage u-clearfix c-shortcodeImage-large\">\n<div class=\"c-shortcodeImage_imageContainer\">\n<div class=\"c-shortcodeImage_image\"><picture class=\"c-cmsImage c-cmsImage_loaded\"><source media=\"(max-width: 767px)\" srcset=\"https:\/\/www.zdnet.com\/a\/img\/resize\/ea22eb71daa05a8b674a128c98957235b7dc01ba\/2025\/02\/04\/b5eeff24-2621-486e-a288-ce687dbd5a71\/gettyimages-1210077086.jpg?auto=webp&amp;precrop=2120,1191,x0,y66&amp;width=768\" alt=\"gettyimages-1210077086\"><source media=\"(max-width: 1023px)\" srcset=\"https:\/\/www.zdnet.com\/a\/img\/resize\/aa352af12015298b58857e833f50eccb4844a3e8\/2025\/02\/04\/b5eeff24-2621-486e-a288-ce687dbd5a71\/gettyimages-1210077086.jpg?auto=webp&amp;precrop=2120,1191,x0,y66&amp;width=1024\" alt=\"gettyimages-1210077086\"><source media=\"(max-width: 1440px)\" srcset=\"https:\/\/www.zdnet.com\/a\/img\/resize\/11778e3f6b4e21ba08719473c7fa5b7168bd7d22\/2025\/02\/04\/b5eeff24-2621-486e-a288-ce687dbd5a71\/gettyimages-1210077086.jpg?auto=webp&amp;precrop=2120,1191,x0,y66&amp;width=1280\" alt=\"gettyimages-1210077086\"><img decoding=\"async\" src=\"https:\/\/www.zdnet.com\/a\/img\/resize\/11778e3f6b4e21ba08719473c7fa5b7168bd7d22\/2025\/02\/04\/b5eeff24-2621-486e-a288-ce687dbd5a71\/gettyimages-1210077086.jpg?auto=webp&amp;precrop=2120,1191,x0,y66&amp;width=1280\" alt=\"gettyimages-1210077086\" width=\"1280\" height=\"719.0943396226414\" 
fetchpriority=\"low\"><\/picture><\/div>\n<p> <!----><\/div><figcaption> <span class=\"c-shortcodeImage_credit g-outer-spacing-top-xsmall u-block\">MirageC\/Getty Images<\/span><\/figcaption><\/figure>\n<p>Can you jailbreak Anthropic&#8217;s latest <a href=\"https:\/\/www.zdnet.com\/article\/what-is-ai-heres-everything-you-need-to-know-about-artificial-intelligence\/\">AI<\/a> safety measure? Researchers want you to try &#8212; and are offering up to $20,000 if you succeed.<\/p>\n<p>On Monday, the company <a href=\"https:\/\/arxiv.org\/abs\/2501.18837\" target=\"_blank\" rel=\"noopener nofollow\" class=\"c-regularLink\">released<\/a> a new paper outlining an AI safety system called Constitutional Classifiers. The process is based on&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2212.08073\" target=\"_blank\" rel=\"noopener nofollow\" class=\"c-regularLink\">Constitutional AI<\/a>, a system Anthropic used to make Claude &#8220;harmless,&#8221; in which one AI helps monitor and improve another. 
Each technique is guided by a constitution, or &#8220;list of principles&#8221; that a model must abide by, Anthropic explained in a&nbsp;<a href=\"https:\/\/www.anthropic.com\/research\/constitutional-classifiers?utm_source=www.therundown.ai&amp;utm_medium=newsletter&amp;utm_campaign=softbank-and-openai-announce-cristal-intelligence&amp;_bhlid=b3ade57d44c16d7d8e04e7a18b153611ceacb6d3\" target=\"_blank\" rel=\"noopener nofollow\" class=\"c-regularLink\">blog<\/a>.&nbsp;<\/p>\n<p><strong>Also: <a href=\"https:\/\/www.zdnet.com\/article\/deepseeks-ai-model-proves-easy-to-jailbreak-and-worse\/\">Deepseek&#8217;s AI model proves easy to jailbreak &#8211; and worse<\/a><\/strong><\/p>\n<p>Trained on <a href=\"https:\/\/www.zdnet.com\/article\/can-synthetic-data-solve-ais-privacy-concerns-this-company-is-betting-on-it\/\">synthetic data<\/a>, these &#8220;classifiers&#8221; were able to filter the &#8220;overwhelming majority&#8221; of jailbreak attempts without excessive over-refusals (incorrect flags of harmless content as harmful), according to Anthropic.&nbsp;<\/p>\n<p>&#8220;The principles define the classes of content that are allowed and disallowed (for example, recipes for mustard are allowed, but recipes for mustard gas are not),&#8221; Anthropic noted. 
Researchers ensured prompts accounted for jailbreaking attempts in different languages and styles.&nbsp;<\/p>\n<figure class=\"c-shortcodeImage u-clearfix c-shortcodeImage-large c-shortcodeImage-hasCaption\">\n<div class=\"c-shortcodeImage_imageContainer\">\n<div class=\"c-shortcodeImage_image\"><picture class=\"c-cmsImage\"><!----> <img decoding=\"async\" src=\"https:\/\/www.zdnet.com\/article\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\/\" alt=\"2e997f9fca176fd82966ea5e9bf000873337cfd1-1650x1077\" width=\"1280\" height=\"835.4909090909091\" fetchpriority=\"low\"><\/picture><\/div>\n<p> <!----><\/div><figcaption>\n<div class=\"c-shortcodeImage_caption g-inner-spacing-right-small g-color-black\" readability=\"7\">\n<div class=\"c-ShortcodeContent\" readability=\"34\">\n<p>Constitutional Classifiers define harmless and harmful content categories, on which Anthropic built a training set of prompts and completions.&nbsp;<\/p>\n<\/div>\n<\/div>\n<p> <span class=\"c-shortcodeImage_credit g-outer-spacing-top-xsmall u-block\">Anthropic<\/span><\/figcaption><\/figure>\n<p>In initial testing, 183 human red-teamers spent more than 3,000 hours over two months attempting to <a href=\"https:\/\/www.zdnet.com\/article\/how-many-shot-jailbreaking-can-be-used-to-fool-ai\/\">jailbreak<\/a> Claude 3.5 Sonnet guarded by a prototype of the system, which was trained not to share any information about &#8220;chemical, biological, radiological, and nuclear harms.&#8221; Jailbreakers were given 10 restricted queries to use as part of their attempts; a breach counted as successful only if it got the model to answer all 10 in detail.&nbsp;<\/p>\n<p>The Constitutional Classifiers system proved effective. 
&#8220;None of the participants were able to coerce the model to answer all 10 forbidden queries with a single jailbreak &#8212; that is, no universal jailbreak was discovered,&#8221; Anthropic explained, meaning no one won the company&#8217;s $15,000 reward, either.&nbsp;<\/p>\n<p><strong>Also: <a href=\"https:\/\/www.zdnet.com\/article\/chatgpts-deep-research-just-identified-20-jobs-it-will-replace-is-yours-on-the-list\/\">ChatGPT&#8217;s Deep Research just identified 20 jobs it will replace. Is yours on the list?<\/a><\/strong><\/p>\n<p>The prototype &#8220;refused too many harmless queries&#8221; and was resource-intensive to run, making it secure but impractical. After improving it, Anthropic tested 10,000 synthetic jailbreaking attempts, built from known successful attacks, against an October version of Claude 3.5 Sonnet with and without classifier protection. Claude alone blocked only 14% of attacks, while Claude with Constitutional Classifiers blocked over 95%.&nbsp;<\/p>\n<figure class=\"c-shortcodeImage u-clearfix c-shortcodeImage-large\">\n<div class=\"c-shortcodeImage_imageContainer\">\n<div class=\"c-shortcodeImage_image\"><picture class=\"c-cmsImage\"><!----> <img decoding=\"async\" src=\"https:\/\/www.zdnet.com\/article\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\/\" alt=\"cd6520ed645ade7f12ab336cd02ef5954211dfa8-1650x1077\" width=\"1280\" height=\"835.4909090909091\" fetchpriority=\"low\"><\/picture><\/div>\n<p> <!----><\/div><figcaption> <span class=\"c-shortcodeImage_credit g-outer-spacing-top-xsmall u-block\">Anthropic<\/span><\/figcaption><\/figure>\n<p>But Anthropic still wants you to try beating it. 
The company stated in an&nbsp;<a href=\"https:\/\/x.com\/AnthropicAI\/status\/1887227067156386027\" target=\"_blank\" rel=\"noopener nofollow\" class=\"c-regularLink\">X post<\/a> on Wednesday that it is &#8220;now offering $10K to the first person to pass all eight levels, and $20K to the first person to pass all eight levels with a universal jailbreak.&#8221;<\/p>\n<p>Have prior red-teaming experience? You can try your luck at the reward by&nbsp;<a href=\"https:\/\/link.mail.beehiiv.com\/ss\/c\/u001.a3gBHu6_kDRL6l3yEfNWAZC-ROnOkp_EzJSiXFEJ3DuL3Gf1-VxaQMiQlKUWXWDk3p-adTjkk5YCu5uQXOBVQYzhkLzjC6YNouJn-3rPfKOMj7NCqZJeVPG7kqnIITsclYKr5IbkYB2UmVAubB1t98-79t88mJtvdX_SMW4Tf99nEgebwNOzqhMXWhDD7VlacYrA1JWYAemz0rFTGs1AccVmnDuf13sevL47XejQ9Dwb0fcdBxvMBFqXARtlr6QaYDTM_qErrqPGVF5BX46PdwPX0xkH6cDYYnGuR92VU_U\/4dq\/e2xs16QOT2CcI95zc8xvzQ\/h14\/h001.7UPJnelChczwsPbaMw0svmQKlILmB72y7DvDaMjTtSY\" target=\"_blank\" rel=\"noopener nofollow\" class=\"c-regularLink\">testing the system<\/a>&nbsp;yourself (with only eight required questions instead of the original 10) until Feb. 10.&nbsp;<\/p>\n<p><strong>Also:&nbsp;<a href=\"https:\/\/www.zdnet.com\/article\/the-us-copyright-offices-new-ruling-on-ai-art-is-here-and-it-could-change-everything\/\">The US Copyright Office&#8217;s new ruling on AI art is here &#8211; and it could change everything<\/a><\/strong><\/p>\n<p>&#8220;Constitutional Classifiers may not prevent every universal jailbreak, though we believe that even the small proportion of jailbreaks that make it past our classifiers require far more effort to discover when the safeguards are in use,&#8221; Anthropic continued. 
&#8220;It&#8217;s also possible that new jailbreaking techniques might be developed in the future that are effective against the system; we therefore recommend using&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2411.07494\" target=\"_blank\" rel=\"noopener nofollow\" class=\"c-regularLink\">complementary<\/a>&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2411.17693\" target=\"_blank\" rel=\"noopener nofollow\" class=\"c-regularLink\">defenses<\/a>. Nevertheless, the constitution used to train the classifiers can rapidly be adapted to cover novel attacks as they&#8217;re discovered.&#8221;<\/p>\n<p>The company said it&#8217;s also working on reducing the compute cost of Constitutional Classifiers, which it notes is currently high.&nbsp;<\/p>\n<p>READ MORE <a href=\"https:\/\/www.zdnet.com\/article\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\/\">HERE<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The company has upped its reward for red-teaming Constitutional Classifiers. 
Here&#8217;s how to try.<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"colormag_page_layout":"default_layout","footnotes":""},"categories":[62],"tags":[],"class_list":["post-58125","post","type-post","status-publish","format-standard","hentry","category-zdnet-security"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Anthropic offers $20,000 to whoever can jailbreak its new AI safety system 2026 | ThreatsHub Cybersecurity News<\/title>\n<meta name=\"description\" content=\"ThreatsHub Cybersecurity News | ThreatsHub.org | Cloud Security &amp; Cyber Threats Analysis Hub. 100% Free OSINT Threat Intelligent and Cybersecurity News.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.threatshub.org\/blog\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Anthropic offers $20,000 to whoever can jailbreak its new AI safety system 2026 | ThreatsHub Cybersecurity News\" \/>\n<meta property=\"og:description\" content=\"ThreatsHub Cybersecurity News | ThreatsHub.org | Cloud Security &amp; Cyber Threats Analysis Hub. 
100% Free OSINT Threat Intelligent and Cybersecurity News.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.threatshub.org\/blog\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\/\" \/>\n<meta property=\"og:site_name\" content=\"ThreatsHub Cybersecurity News\" \/>\n<meta property=\"article:published_time\" content=\"2025-02-06T15:54:00+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.zdnet.com\/a\/img\/resize\/11778e3f6b4e21ba08719473c7fa5b7168bd7d22\/2025\/02\/04\/b5eeff24-2621-486e-a288-ce687dbd5a71\/gettyimages-1210077086.jpg?auto=webp&amp;precrop=2120,1191,x0,y66&amp;width=1280\" \/>\n<meta name=\"author\" content=\"TH Author\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@threatshub\" \/>\n<meta name=\"twitter:site\" content=\"@threatshub\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"TH Author\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\\\/\"},\"author\":{\"name\":\"TH Author\",\"@id\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/#\\\/schema\\\/person\\\/12e0a8671ff89a863584f193e7062476\"},\"headline\":\"Anthropic offers $20,000 to whoever can jailbreak its new AI safety system\",\"datePublished\":\"2025-02-06T15:54:00+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\\\/\"},\"wordCount\":608,\"publisher\":{\"@id\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.zdnet.com\\\/a\\\/img\\\/resize\\\/11778e3f6b4e21ba08719473c7fa5b7168bd7d22\\\/2025\\\/02\\\/04\\\/b5eeff24-2621-486e-a288-ce687dbd5a71\\\/gettyimages-1210077086.jpg?auto=webp&amp;precrop=2120,1191,x0,y66&amp;width=1280\",\"articleSection\":[\"ZDNet | Security\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\\\/\",\"url\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\\\/\",\"name\":\"Anthropic offers $20,000 to whoever can jailbreak its new AI safety system 2026 | ThreatsHub Cybersecurity 
News\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.zdnet.com\\\/a\\\/img\\\/resize\\\/11778e3f6b4e21ba08719473c7fa5b7168bd7d22\\\/2025\\\/02\\\/04\\\/b5eeff24-2621-486e-a288-ce687dbd5a71\\\/gettyimages-1210077086.jpg?auto=webp&amp;precrop=2120,1191,x0,y66&amp;width=1280\",\"datePublished\":\"2025-02-06T15:54:00+00:00\",\"description\":\"ThreatsHub Cybersecurity News | ThreatsHub.org | Cloud Security & Cyber Threats Analysis Hub. 100% Free OSINT Threat Intelligent and Cybersecurity News.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.zdnet.com\\\/a\\\/img\\\/resize\\\/11778e3f6b4e21ba08719473c7fa5b7168bd7d22\\\/2025\\\/02\\\/04\\\/b5eeff24-2621-486e-a288-ce687dbd5a71\\\/gettyimages-1210077086.jpg?auto=webp&amp;precrop=2120,1191,x0,y66&amp;width=1280\",\"contentUrl\":\"https:\\\/\\\/www.zdnet.com\\\/a\\\/img\\\/resize\\\/11778e3f6b4e21ba08719473c7fa5b7168bd7d22\\\/2025\\\/02\\\/04\\\/b5eeff24-2621-486e-a288-ce687dbd5a71\\\/gettyimages-1210077086.jpg?auto=webp&amp;precrop=2120,1191,x0,y66&amp;width=1280\"},{\"@type\":\"BreadcrumbL
ist\",\"@id\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Anthropic offers $20,000 to whoever can jailbreak its new AI safety system\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/\",\"name\":\"ThreatsHub Cybersecurity News\",\"description\":\"%%focuskw%% Threat Intel \u2013 Threat Intel Services \u2013 CyberIntelligence \u2013 Cyber Threat Intelligence - Threat Intelligence Feeds - Threat Intelligence Reports - CyberSecurity Report \u2013 Cyber Security PDF \u2013 Cybersecurity Trends - Cloud Sandbox \u2013- Threat IntelligencePortal \u2013 Incident Response \u2013 Threat Hunting \u2013 IOC - Yara - Security Operations Center \u2013 SecurityOperation Center \u2013 Security SOC \u2013 SOC Services - Advanced Threat - Threat Detection - TargetedAttack \u2013 APT \u2013 Anti-APT \u2013 Advanced Protection \u2013 Cyber Security Services \u2013 Cybersecurity Services -Threat Intelligence 
Platform\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/#organization\"},\"alternateName\":\"Threatshub.org\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/#organization\",\"name\":\"ThreatsHub.org\",\"alternateName\":\"Threatshub.org\",\"url\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/wp-content\\\/uploads\\\/2025\\\/05\\\/Threatshub_Favicon1.jpg\",\"contentUrl\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/wp-content\\\/uploads\\\/2025\\\/05\\\/Threatshub_Favicon1.jpg\",\"width\":432,\"height\":435,\"caption\":\"ThreatsHub.org\"},\"image\":{\"@id\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/x.com\\\/threatshub\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.threatshub.org\\\/blog\\\/#\\\/schema\\\/person\\\/12e0a8671ff89a863584f193e7062476\",\"name\":\"TH Author\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/066276f086d5155df79c850206a779ad368418a844da0182ce43f9cd5b506c3d?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/066276f086d5155df79c850206a779ad368418a844da0182ce43f9cd5b506c3d?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/066276f086d5155df79c850206a779ad368418a844da0182ce43f9cd5b506c3d?s=96&d=mm&r=g\",\"caption\":\"TH Author\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Anthropic offers $20,000 to whoever can jailbreak its new AI safety system 2026 | ThreatsHub Cybersecurity News","description":"ThreatsHub Cybersecurity News | ThreatsHub.org | Cloud Security & Cyber Threats Analysis Hub. 100% Free OSINT Threat Intelligent and Cybersecurity News.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.threatshub.org\/blog\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\/","og_locale":"en_US","og_type":"article","og_title":"Anthropic offers $20,000 to whoever can jailbreak its new AI safety system 2026 | ThreatsHub Cybersecurity News","og_description":"ThreatsHub Cybersecurity News | ThreatsHub.org | Cloud Security & Cyber Threats Analysis Hub. 100% Free OSINT Threat Intelligent and Cybersecurity News.","og_url":"https:\/\/www.threatshub.org\/blog\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\/","og_site_name":"ThreatsHub Cybersecurity News","article_published_time":"2025-02-06T15:54:00+00:00","og_image":[{"url":"https:\/\/www.zdnet.com\/a\/img\/resize\/11778e3f6b4e21ba08719473c7fa5b7168bd7d22\/2025\/02\/04\/b5eeff24-2621-486e-a288-ce687dbd5a71\/gettyimages-1210077086.jpg?auto=webp&amp;precrop=2120,1191,x0,y66&amp;width=1280","type":"","width":"","height":""}],"author":"TH Author","twitter_card":"summary_large_image","twitter_creator":"@threatshub","twitter_site":"@threatshub","twitter_misc":{"Written by":"TH Author","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.threatshub.org\/blog\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\/#article","isPartOf":{"@id":"https:\/\/www.threatshub.org\/blog\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\/"},"author":{"name":"TH Author","@id":"https:\/\/www.threatshub.org\/blog\/#\/schema\/person\/12e0a8671ff89a863584f193e7062476"},"headline":"Anthropic offers $20,000 to whoever can jailbreak its new AI safety system","datePublished":"2025-02-06T15:54:00+00:00","mainEntityOfPage":{"@id":"https:\/\/www.threatshub.org\/blog\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\/"},"wordCount":608,"publisher":{"@id":"https:\/\/www.threatshub.org\/blog\/#organization"},"image":{"@id":"https:\/\/www.threatshub.org\/blog\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\/#primaryimage"},"thumbnailUrl":"https:\/\/www.zdnet.com\/a\/img\/resize\/11778e3f6b4e21ba08719473c7fa5b7168bd7d22\/2025\/02\/04\/b5eeff24-2621-486e-a288-ce687dbd5a71\/gettyimages-1210077086.jpg?auto=webp&amp;precrop=2120,1191,x0,y66&amp;width=1280","articleSection":["ZDNet | Security"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.threatshub.org\/blog\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\/","url":"https:\/\/www.threatshub.org\/blog\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\/","name":"Anthropic offers $20,000 to whoever can jailbreak its new AI safety system 2026 | ThreatsHub Cybersecurity 
News","isPartOf":{"@id":"https:\/\/www.threatshub.org\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.threatshub.org\/blog\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\/#primaryimage"},"image":{"@id":"https:\/\/www.threatshub.org\/blog\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\/#primaryimage"},"thumbnailUrl":"https:\/\/www.zdnet.com\/a\/img\/resize\/11778e3f6b4e21ba08719473c7fa5b7168bd7d22\/2025\/02\/04\/b5eeff24-2621-486e-a288-ce687dbd5a71\/gettyimages-1210077086.jpg?auto=webp&amp;precrop=2120,1191,x0,y66&amp;width=1280","datePublished":"2025-02-06T15:54:00+00:00","description":"ThreatsHub Cybersecurity News | ThreatsHub.org | Cloud Security & Cyber Threats Analysis Hub. 100% Free OSINT Threat Intelligent and Cybersecurity News.","breadcrumb":{"@id":"https:\/\/www.threatshub.org\/blog\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.threatshub.org\/blog\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.threatshub.org\/blog\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\/#primaryimage","url":"https:\/\/www.zdnet.com\/a\/img\/resize\/11778e3f6b4e21ba08719473c7fa5b7168bd7d22\/2025\/02\/04\/b5eeff24-2621-486e-a288-ce687dbd5a71\/gettyimages-1210077086.jpg?auto=webp&amp;precrop=2120,1191,x0,y66&amp;width=1280","contentUrl":"https:\/\/www.zdnet.com\/a\/img\/resize\/11778e3f6b4e21ba08719473c7fa5b7168bd7d22\/2025\/02\/04\/b5eeff24-2621-486e-a288-ce687dbd5a71\/gettyimages-1210077086.jpg?auto=webp&amp;precrop=2120,1191,x0,y66&amp;width=1280"},{"@type":"BreadcrumbList","@id":"https:\/\/www.threatshub.org\/blog\/anthropic-offers-20000-to-whoever-can-jailbreak-its-new-ai-safety-system\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Hom
e","item":"https:\/\/www.threatshub.org\/blog\/"},{"@type":"ListItem","position":2,"name":"Anthropic offers $20,000 to whoever can jailbreak its new AI safety system"}]},{"@type":"WebSite","@id":"https:\/\/www.threatshub.org\/blog\/#website","url":"https:\/\/www.threatshub.org\/blog\/","name":"ThreatsHub Cybersecurity News","description":"%%focuskw%% Threat Intel \u2013 Threat Intel Services \u2013 CyberIntelligence \u2013 Cyber Threat Intelligence - Threat Intelligence Feeds - Threat Intelligence Reports - CyberSecurity Report \u2013 Cyber Security PDF \u2013 Cybersecurity Trends - Cloud Sandbox \u2013- Threat IntelligencePortal \u2013 Incident Response \u2013 Threat Hunting \u2013 IOC - Yara - Security Operations Center \u2013 SecurityOperation Center \u2013 Security SOC \u2013 SOC Services - Advanced Threat - Threat Detection - TargetedAttack \u2013 APT \u2013 Anti-APT \u2013 Advanced Protection \u2013 Cyber Security Services \u2013 Cybersecurity Services -Threat Intelligence Platform","publisher":{"@id":"https:\/\/www.threatshub.org\/blog\/#organization"},"alternateName":"Threatshub.org","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.threatshub.org\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.threatshub.org\/blog\/#organization","name":"ThreatsHub.org","alternateName":"Threatshub.org","url":"https:\/\/www.threatshub.org\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.threatshub.org\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.threatshub.org\/blog\/coredata\/uploads\/2025\/05\/Threatshub_Favicon1.jpg","contentUrl":"https:\/\/www.threatshub.org\/blog\/coredata\/uploads\/2025\/05\/Threatshub_Favicon1.jpg","width":432,"height":435,"caption":"ThreatsHub.org"},"image":{"@id":"https:\/\/www.threatshub.org\/blog\
/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/threatshub"]},{"@type":"Person","@id":"https:\/\/www.threatshub.org\/blog\/#\/schema\/person\/12e0a8671ff89a863584f193e7062476","name":"TH Author","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/066276f086d5155df79c850206a779ad368418a844da0182ce43f9cd5b506c3d?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/066276f086d5155df79c850206a779ad368418a844da0182ce43f9cd5b506c3d?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/066276f086d5155df79c850206a779ad368418a844da0182ce43f9cd5b506c3d?s=96&d=mm&r=g","caption":"TH Author"}}]}},"_links":{"self":[{"href":"https:\/\/www.threatshub.org\/blog\/wp-json\/wp\/v2\/posts\/58125","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.threatshub.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.threatshub.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.threatshub.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.threatshub.org\/blog\/wp-json\/wp\/v2\/comments?post=58125"}],"version-history":[{"count":0,"href":"https:\/\/www.threatshub.org\/blog\/wp-json\/wp\/v2\/posts\/58125\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.threatshub.org\/blog\/wp-json\/wp\/v2\/media?parent=58125"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.threatshub.org\/blog\/wp-json\/wp\/v2\/categories?post=58125"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.threatshub.org\/blog\/wp-json\/wp\/v2\/tags?post=58125"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}