How one data scientist is pioneering techniques to detect security threats

Data science is an increasingly popular field of study that’s relevant to every industry. When Maria Puertas Calvo was a student, she never imagined that one day she would pioneer data science techniques to detect security threats. She started her Microsoft career on the Safety Platform team, developing algorithms to identify Microsoft accounts that send spam emails. She then worked on machine learning to detect account compromise in real time for Microsoft accounts.

Maria now leads the data science team for security in the identity division, working on several problems: protecting users from account compromise, protecting our own infrastructure from fraud and abuse, and making sure that spammers and bots don’t create accounts that will harm people or other organizations. Her work has been so critical that her team is doubling in size, expanding from Redmond to Dublin and Atlanta.

In honor of Women’s History Month, Alex Simons, Corporate Vice President of Identity Program Management, sat down with Maria to learn more about her groundbreaking work and inspiring story. The interview has been edited for clarity and length. 

Maria Puertas Calvo leaning against a wall next to an open window.

Alex: Maria, how did you get into engineering?

Maria: I am originally from Madrid, Spain. I was always interested in math and science. Since I was little, I was always the straight-A student—really in love with numbers. When I started studying physics in high school, I knew that I wanted to have a career in science, doing something innovative.

In Spain, you don’t really leave for college. You go to class during the day, and then you go back to your parents’ home at night. A really good university was only 10 minutes away from my house. It wasn’t really engineering-focused, but it did have one technical school that offered electrical engineering and computer science.

Alex: What prompted you to make the transition from pure electrical engineering into something more forensics-focused and then eventually data science-focused?

Maria: It was all pretty accidental, not something I had planned. I did really well in college—I was first in my class. And I finished in 2010, in the middle of the Great Recession, which hit the Spanish labor market horribly. At that time, the unemployment rate was 25 percent. The lucky ones, like people in engineering, were getting job offers. But when you’re in technology, the only options in Spain are to work for a consulting company or to do support or sales. There weren’t any entry-level jobs in research and development.

So, I started a master’s with a group doing research on biometrics. The master’s was also in computer science and very related to artificial intelligence and a lot of interconnected fields like multimedia signal processing, computer vision, and natural language processing. I did my thesis on statistics around forensic fingerprints, and the probability of a random match between a latent fingerprint found at a crime scene and a random person that could have been wrongly convicted of that crime.

Alex: So what is the likelihood of that happening?

Maria: It’s really, really low. But it’s not zero.

Alex: Okay, that’s totally fascinating! I recall that you actually did your graduate work here in the United States, right?

Maria: When I finished my master’s, I was dating my now-husband, who is American. And I did not want to finish the whole PhD. I wanted to go work in the industry because I had been a student for seven years already, plus the rest of my life before that. I also wanted to start a life, and not do long distance between Madrid and Washington, D.C. So, I took advantage of my scholarship to become a visiting researcher at the University of Maryland working on iris recognition biometrics. I ended up staying about nine months.

Alex: Oh, wow. My father actually got his PhD from the University of Maryland—I was born there. Small world! So then, tell us how you ended up at Microsoft.

Maria: I didn’t find academia very rewarding. So, I said, “Let’s go find an industry job.” I was living in the U.S. with my fiancé on a student visa. And in the D.C. area, most technology companies are government contractors that require security clearances, which only American citizens can get. I also didn’t have any industry experience whatsoever.

I spent a couple months applying for jobs and never getting called back. Then out of the blue, I got a LinkedIn message from a Microsoft recruiter saying, “Hey, I have a role for a data scientist on the Safety Platform team in Seattle.” And I was like, “Seattle, no, no way.” All I knew about Seattle is that it rains a lot, and it’s on the other side of the country. And it’s so far away from Spain. But my husband said, “Hey, you got an interview, go do it, see how it goes, you’ll get to practice.”

Alex: On the Safety Platform team, you did innovative work that has underpinned the success we’ve had so far. Tell us a little more.

Maria: I worked on machine learning to detect compromises in real time for Microsoft accounts. When a user signed into their Outlook.com account or Xbox account, or any service Microsoft offers, we would run a machine learning (ML) model to determine if that sign-in was legitimate, or if some hacker had gotten the user’s password. I was the data scientist working on training the model and improving its accuracy. Although I stopped working on it a while ago, we’re still monitoring it and it’s still working really well.

Alex: At the time, nobody else in the industry was succeeding at this kind of work. Today, there’s a whole industry around user entity behavioral analytics, but you were one of the first in the world to do it! I love the work your team does, Maria. I feel like you are superheroes defending the world, but I don’t know that our customers have a great view into the kind of magic you do.

Maria: Thanks, Alex. We use analytics and machine learning to detect bad activity in the identity ecosystem. Our goal is to protect our customers from fraud and account compromise using advanced AI techniques and data science. We come up with ways to detect malicious attacks, fraud, or account compromises among all the security data that we examine at Microsoft, which is hundreds of terabytes every day. It’s a complicated problem, but it’s really cool.

Alex: Some of your most innovative work recently was around password spray detection. Can you tell us how having data from multiple customers enables you to detect things maybe no one else could?

Maria: In a password spray attack, a person grabs a list of email addresses that they find or collect from a breach. And then they try a few common passwords against all those email addresses—against thousands of users from thousands of organizations.

We knew these attacks were happening to our customers, but we wanted to detect them really accurately. These attacks are spread across tons of different IP addresses and countries and proxies, so it’s not easy to isolate the sign-ins that are part of an actual attack. We created a detection rule that says, “Okay, we’re seeing IP addresses that have a lot of failed sign-ins that all are using the same password.” And then we see these attempts move to a different password.

Doing that, we can isolate a possible attack, but there’s also a lot of noise. So, one of the awesome data scientists on my team, Sergio, added a layer of artificial intelligence and trained a model with known attacks. He improved the accuracy of the initial heuristic by 50 percent.
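To make the detection rule Maria describes more concrete, here is a minimal Python sketch of that kind of heuristic. It is an illustration only, not Microsoft's implementation: the `FailedSignIn` record, the `flag_spray_ips` function, the idea of comparing a password fingerprint rather than the password itself, and both thresholds are assumptions made for this example. It also ignores time windows and the machine learning layer that Sergio added on top of the rule.

```python
from collections import defaultdict
from typing import NamedTuple


class FailedSignIn(NamedTuple):
    """Hypothetical record of one failed sign-in attempt."""
    ip: str
    account: str
    password_hash: str  # assumed fingerprint of the attempted password


def flag_spray_ips(events, min_accounts=3, min_passwords=2):
    """Flag IPs that tried the same password against many accounts,
    and did so for at least `min_passwords` different passwords.

    Both thresholds are illustrative values, not real tuning.
    """
    # Distinct accounts targeted by each (IP, password) pair.
    accounts_by_ip_pw = defaultdict(set)
    for event in events:
        accounts_by_ip_pw[(event.ip, event.password_hash)].add(event.account)

    # Count how many passwords each IP sprayed across enough accounts.
    sprayed_passwords = defaultdict(int)
    for (ip, _password_hash), accounts in accounts_by_ip_pw.items():
        if len(accounts) >= min_accounts:
            sprayed_passwords[ip] += 1

    return sorted(ip for ip, n in sprayed_passwords.items() if n >= min_passwords)


# One IP tries two passwords against three accounts each (spray-like).
events = [
    FailedSignIn("203.0.113.7", f"user{i}@example.com", pw)
    for pw in ("hash-a", "hash-b")
    for i in range(3)
]
# Another IP fails four times on a single account (more like a forgetful user).
events += [FailedSignIn("198.51.100.2", "alice@example.com", "hash-z")] * 4

flagged = flag_spray_ips(events)
print(flagged)  # ['203.0.113.7']
```

Counting distinct accounts per password, rather than raw failures, is what separates a spray pattern from one user repeatedly mistyping a password. As Maria notes, a rule like this is noisy on its own, which is why a trained model was layered on top of it.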

Alex: This is such an important set of capabilities for us. I’m really excited we’re growing your team. Switching gears, another thing that you and I share is the joy of being the parent of twins. My twins are obviously a lot older than yours, but tell us how you manage the challenge of, “Hey, I want to be a family person and I also want to be a real leader in the industry.”

Maria: I’m still figuring it out! It’s definitely a lot of work and you have to learn to better manage your time in order to be successful at both jobs. Luckily, Microsoft offers great parental leave that both my husband and I could enjoy (he’s an engineer in Azure). I have an amazing team and they kept things running really smoothly in my absence. Also, working remotely because of the pandemic has made things easier for me. I can check on my kids in between meetings, and not having a commute gives me more time to be with my family.

Alex: The pandemic has definitely accelerated the move to remote work. As a parent, I think it’s a beautiful thing to be able to manage your time and not have to worry about your location.

So, Maria, one last question. Let’s say I’m a college student, or maybe I’m working on my master’s or PhD in data science and AI. If I wanted to come work on your team, what would you suggest I work on or think about to prepare for that kind of job?

Maria: One good approach is to find an internship that connects data science with security and fraud, even if it’s just loosely related. Right now, there are so many people studying data science and machine learning, so specializing and really understanding the world of the cloud and cybersecurity is important. There are tons of resources to learn about it, such as specific courses in cybersecurity. An internship doesn’t have to be specific to hardcore security; it could be fraud, like finance fraud. Anything in an adversarial-type world where you’re trying to catch bad things happening would be useful. Internships are easier to get without any kind of specialization, but having done an internship gives you a foot in the door for a full-time position that specializes in cybersecurity. In addition, Microsoft Security sponsors many cybersecurity educational programs, and I would encourage women in high school to investigate these.

Learn more

To learn more about Microsoft Security solutions, visit our website. Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us at @MSFTSecurity for the latest news and updates on cybersecurity.


Listen to the Security Unlocked podcast

To learn more about applying data science to cybersecurity, listen to Maria’s guest appearance on the episode Identity Threats, Tokens, and Tacos.
