How to deploy AI safely

May 29, 2025 TH Author

In this blog you will hear directly from Corporate Vice President and Deputy Chief Information Security Officer (CISO) for AI, Yonatan Zunger, about how to build a plan to deploy AI safely. This blog is part of a new ongoing series where our Deputy CISOs share their thoughts on what is most important in their respective domains. In this series you will get practical advice, forward-looking commentary on where the industry is going, things you should stop doing, and more.

How do you deploy AI safely?

As Microsoft’s Deputy CISO for AI, my day job is to think about all the things that can go wrong with AI. This, as you can imagine, is a pretty long list. But despite that, we’ve been able to successfully develop, deploy, and use a wide range of generative AI products in the past few years, and see significant real value from them. If you’re reading this, you’ve likely been asked to do something similar in your own organization—to develop AI systems for your own use, or deploy ones that you’ve acquired from others. I’d like to share some of the most important ways that we think about prospective deployments, ensure we understand the risks, and have confidence that we have the right management plan in place.

This is way more than can fit into a single blog, so this post is just the introduction to a much wider set of resources. In this post, I’ll articulate the basic principles we use in our thinking. These principles are meant to be applicable far beyond Microsoft, and indeed most of them scope far beyond AI—they’re really methods for safely adopting any new technology. Because principles on their own can be abstract, I’m releasing this with a companion video in which I work through a detailed example, taking a hypothetical new AI app (a tool to help loan officers do their jobs at a bank) through this entire analysis process to see how it works.

We have even deeper resources coming soon intended to help teams and decision makers innovate safely that will build on this content. Meanwhile, if you want to learn about how Microsoft applies these ideas to safe AI deployment in more detail, you can learn about the various policies, processes, frameworks, and toolboxes we built for our own use on our Responsible AI site.

Basic principles

What does “deploying safely” mean? It doesn’t mean that nothing can go wrong; things can always go wrong. In a safe deployment, you understand as many of the things that can go wrong as possible and have a plan for them that gives you confidence that a failure won’t turn into a major incident, and you know that if a completely unexpected problem arises, you’re ready to respond to that as well.

It also means that you haven’t limited yourself to very specific kinds of problems, like security breaches or network failures, but are just as prepared for privacy failures, or people using the system in an unexpected way, or organizational impacts. After all, there’s no surer guarantee of disaster than a security team saying “that sounds like a privacy problem” while the privacy team says “that sounds like a security problem” and neither team dealing with it. As builders of systems, we need to think about the ways in which our systems might fail, and plan for all of those, where “the systems” includes not just the individual bits of software, but the entire integrated system that they’re a part of—including the people who use them and how they’re used.

These ideas probably sound familiar, because they’re the basics we learned at the start of our careers, and are the same concepts that underlie everything from the National Institute of Standards and Technology (NIST) Risk Management Framework to Site Reliability Engineering. If I had to state them as briefly as possible, the basic rules would be:

Understand the things that might go wrong in your system, and for each of those things, have a plan. A “plan” could mean anything from changing how the system works to reduce the impact of a risk, to making the failure of some component no big deal because the larger system compensates for it, to simply knowing that you’ll be able to detect it and have the flexibility and tools to respond when it does.
Analyze the entire system, including the humans, for any type of thing that could go wrong. Your “system” means the entire business process that uses it, including the people, and “things that might go wrong” includes anything that could end up with you having to respond to it, whether it’s a security breach or your system ending up on the front page of the paper for all the wrong reasons.
- Tip: Whether you’re using AI software that you bought or building your own systems, you’re always the builder of your own business processes. Apply your safety thinking to the complete end-to-end process either way.
You think about what could go wrong from the day you get the idea for the project and do it continuously until the day it shuts down. Planning for failure isn’t an “exercise”; it’s the parallel partner to designing the features of your system. Just as you update your vision of how the system should work every time you find a new use case or see customer needs changing, you update your vision of how the system might fail whenever it or the situation changes.

You implement these three principles through a fourth one:

Make a written safety plan: a discussion of these various risks and your plan for each. Don’t forget to include a brief description of what the system is and what problem it’s meant to solve, or the plan will be illegible to future readers, including yourself.

If your role is to review systems and make sure they’re safe to deploy, that safety plan is the thing you should look at, and the question you need to ask is whether that plan covers all the things that might go wrong (including “how we’ll handle surprises”) and if the proposed solutions make sense. If you need to review many systems, as CISOs so often do, you’ll want your team to create standard forms, tools, and processes for these plans—that is, a governance standard, like Microsoft does for Responsible AI.

These first four rules aren’t specific to AI at all; these are general principles of safety engineering, and you could apply them to anything from ordinary cloud software deployments to planning a school field trip. The hard part that we’ll cover in later materials is how best to identify the way things could go wrong (including when radically new technologies are involved) and build mitigation plans for them. The second rule will repeatedly prove to be the most important, as problems in one component are very often solved by changes in another component—and that includes the people.

AI-specific principles

When it comes to building AI systems, we’ve uncovered a few rules that are exceptionally useful. The most important thing we’ve learned is that error is an intrinsic part of how AI works; problems like hallucination or prompt injection are inherent, and if you need a system that deterministically gives the right answer all the time, AI is probably not the right tool for the job. However, you already know how to build reliable systems out of components that routinely err: they’re called “people” and we’ve been building systems out of them for millennia.

The possible errors that can happen in any analysis, recommendation, or decision-making step (human or AI) are:

Garbage in, garbage out, also known as GIGO—if the input data is bad, the output will be, too.
Misinterpreted data—if the data provided doesn’t have exactly the same meaning as the analysis expects, situations where they differ can cause subtle but dangerous errors. For example, if analysis of a loan applicant received a number it thought was “mean duration of continuous employment over the past five years,” but was actually receiving “mean duration of each job over the past five years,” it would produce extremely wrong results for consultants and other people who stack short-term jobs.
Hallucination, also known as false positives—the analysis introduces information not supported by the grounding data.
Omission, also known as false negatives—the analysis leaves out some critical caveat or context that changes the meaning of the data.
Unexpected preferences—every summary or recommendation chooses some aspects of the input to emphasize and prioritize over others (that’s the whole point of a summary); are the factors it prioritizes the ones you wanted it to?

We can combine these to add some AI-specific rules:

Reason about the safety of AI components by imagining “what would happen if I replaced the AI with a room full of well-intentioned but inexperienced new hires?” Don’t think of the AI like a senior person—think of it like a new hire fresh out of school, enthusiastic, intelligent, ready to help, and occasionally dumb as a rock. Build safety into your process by considering what you’d do for humans in the same place—for example, having multiple sets of (AI or human) eyes on key decisions. This doesn’t mean “human in the loop” at every stage; instead, find the moments where it would make sense for more experienced eyes to step in and check before proceeding.
Expect testing to take much more of your time, and coding to take less of your time, than with traditional software. It’s very easy to build AI software that works right in the two cases that you thought of, but much harder to make sure it works right when real user inputs are involved. Build extensive libraries of test cases, including intended uses, things confused users might do, and things threat actors might do; the line between functionality and security testing will be fuzzy. In general, you should think of AI development as a “prototype-break-fix” cycle rather than a “develop-test-ship” cycle.

And some more rules that apply to any analysis, recommendation, or decision-making stage, whether it’s human or AI. (This similarity goes to the heart of Rule 5; it shouldn’t be surprising that humans and AI require similar mitigations!)

Accompany decision-making criteria with a suite of test cases, validated by having multiple people evaluate the test cases per the criteria and tweaking the criteria until they agree with each other and your intent. This is a good way to make sure that your written criteria (whether they be guidelines for human raters or AI metaprompts) are understood in line with your intentions. It’s a good idea for the policy writers to provide a bunch of the test cases themselves, because things get lost in translation even between them and the engineering team; you can also use AI to help extend a list of test cases, then manually decide what the expected outputs for each should be. Having multiple reviewers independently decide on expectations is a good way to detect when your intentions weren’t clear even to yourself.
Monitor and cross-check decision making. Send some random subset of decisions to multiple (human or AI) reviewers in parallel and monitor inter-rater agreement as a way of measuring if the stated criteria are clear enough to produce consistent answers. Automatically escalate disagreements, as well as “high impact” cases (for example, large-value bank loan decisions) to more experienced people. Simultaneously, log carefully and monitor for the “revealed preferences” of your decision system, to ensure that they align with your intended preferences.
Present information carefully. Whenever information transits a people boundary—whether this is AI outputs being presented to a human for a decision, or data collected by one team flowing to analysis run by another team—you have a high risk of misinterpretation. Invest heavily here in clarity: in very sharp and rigorous API definition if it’s machine-to-machine, or in extremely clear user experience if it’s machine-to-human. After all, if you’re running an expensive AI decision to help people and then the information is lost in translation, you aren’t getting any value out of it at all—and people will blame AI for the resulting human errors.

The deepest insight is: novel technologies like AI don’t fundamentally change the way we design for safety; they simply call on us to go back to basic principles and execute on them well.

Learn more

You can find a detailed worked example of how to apply these ideas in this video. You can learn more about our responsible AI practices at our Responsible AI site, and about our best practices for avoiding overreliance in our Overreliance Framework.

To learn more about Microsoft Security solutions, visit our website. Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us on LinkedIn (Microsoft Security) and X (@MSFTSecurity) for the latest news and updates on cybersecurity.