Nathalie Baracaldo leads the AI Security and Privacy Solutions team and is a Research Staff Member at IBM’s Almaden Research Center in San Jose, CA. We spoke about her career journey, current passions, and advice for prospective data science and AI practitioners. (This transcript has been edited for brevity and clarity.)
Q: For starters, can you tell us the highlights of your career story and how it has evolved over time?
I wasn't sure if I was going to be more into finance or more into information security. After a while, though, I definitely understood that security was one of those things that I feel passionate about.
I have a master's and two undergraduate degrees. For the master's, I decided to focus on security. After that, I joined a PhD program at the University of Pittsburgh, where my whole focus was on insider attack detection and prevention. This is basically core security with a lot of machine learning models and so forth.
Then there was an opportunity to do an internship at IBM's Almaden Research Center, so I worked here for a while as an intern. The project was initially on cryptography, and I liked it here so much that I came back for another summer internship on a different topic, this time cloud security, with the team I'm working with right now.
So I was jumping from topic to topic, but the overall scope has always been security. When I graduated, they offered me a position as a Research Staff Member. I started exploring different projects, and one of the projects I proposed was on adversarial machine learning. This is basically when a bad actor tries to manipulate your machine learning models for their nefarious purposes, for example to cause specific misclassifications or to reduce the accuracy of the models. The other project, which is also one of my main focuses right now, is federated learning.
That's how I became a manager. I started the project and the project kept growing and growing until I had my own group to help execute on the vision.
Q: Can you share a real world example of how someone might use adversarial machine learning?
Sure. To build machine learning models, you have to start with data. If you don't have data, you don't really have a good machine learning model. My primary focus is on poisoning attacks, in which an adversary manipulates the training data that the targeted person or organization is going to use to create a machine learning model.
There are two main ways an adversary may want to attack the model. One is what we call a backdoor: the training data is manipulated so that the model behaves normally when you test it with expected samples, but when the adversary presents an input containing a specific trigger, the model produces the misclassification the adversary wants. That one is particularly worrisome because you don't want to have a model deployed in the real world and then have somebody causing misclassifications at will. Think about critical infrastructure or critical applications, and what could happen there with a misclassification. One of the things we have been exploring is ways to defend against that type of attack.
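To make the backdoor idea concrete, here is a minimal sketch of how an attacker might tamper with a training set. It assumes samples are plain feature lists with integer labels, and the trigger (a few feature positions forced to a fixed value), the poisoning fraction, and the function names `add_trigger` and `poison_backdoor` are all hypothetical illustrations, not a specific attack or defense from Nathalie's work.

```python
import random
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (feature vector, label)

def add_trigger(x: List[float], trigger_idx: List[int], trigger_value: float) -> List[float]:
    """Return a copy of x with a hypothetical trigger pattern stamped into chosen features."""
    x = list(x)
    for i in trigger_idx:
        x[i] = trigger_value
    return x

def poison_backdoor(data: List[Sample], fraction: float, target_label: int,
                    trigger_idx: List[int], trigger_value: float = 1.0,
                    seed: int = 0) -> List[Sample]:
    """Stamp the trigger into a random fraction of samples and relabel them as target_label.

    A model trained on the result can still look accurate on clean test data,
    while inputs carrying the trigger are pushed toward target_label.
    """
    rng = random.Random(seed)
    n_poison = int(round(fraction * len(data)))
    chosen = set(rng.sample(range(len(data)), n_poison))
    poisoned: List[Sample] = []
    for i, (x, y) in enumerate(data):
        if i in chosen:
            poisoned.append((add_trigger(x, trigger_idx, trigger_value), target_label))
        else:
            poisoned.append((list(x), y))  # clean samples are copied unchanged
    return poisoned
```

Only a small fraction of samples is touched, which is part of why this kind of manipulation is hard to spot by inspecting accuracy on ordinary test data.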
The other one is what is called a performance degradation attack. For instance, in an IoT environment you are collecting data from many different places and sources, and you may not necessarily trust all of that data; somebody with bad intentions may manipulate the data you're collecting. The adversary in that case wants to reduce the model's performance (its accuracy or precision) so that you cannot deploy or use your model effectively in real life.
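One simple, well-known way to degrade a model is untargeted label flipping, sketched below. Swapping in label flipping is an assumption for illustration; the interview does not say which manipulation an IoT adversary would use. The function `flip_labels` and its parameters are hypothetical.

```python
import random
from typing import List

def flip_labels(labels: List[int], num_classes: int, fraction: float, seed: int = 0) -> List[int]:
    """Untargeted label flipping: reassign a random fraction of labels to a different class.

    Unlike a backdoor, there is no trigger and no target class; the goal is
    simply to corrupt enough of the training signal to lower overall accuracy.
    """
    rng = random.Random(seed)
    flipped = list(labels)
    n_flip = int(round(fraction * len(labels)))
    for i in rng.sample(range(len(labels)), n_flip):
        # Pick any class except the current one, so every chosen label actually changes.
        choices = [c for c in range(num_classes) if c != flipped[i]]
        flipped[i] = rng.choice(choices)
    return flipped
```

In the IoT scenario described above, an attacker would typically control only some of the data sources, so in practice the flipping would apply to the subset of samples arriving from those compromised sources.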
Q: What about your interest in federated learning? How did that come about?
With federated learning, you basically try to train machine learning models without getting direct access to the training data. Throughout my PhD thesis, I built a lot of machine learning models and risk-aware systems to detect insider attacks. One of the comments I got when I presented that work was, "What about privacy?" I think that's one of the reasons why I'm now working on privacy and trying to have the best of both worlds: a machine learning model you can use for good purposes, while still preventing privacy leakage.
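A common way to train without pooling data is federated averaging (FedAvg), sketched below for a one-parameter linear model y ≈ w·x. This is a generic textbook illustration, assumed here for clarity; it is not necessarily the algorithm used by Nathalie's team, and the function names, learning rate, and round counts are hypothetical. The key point is that the server only ever sees model weights, never the clients' raw (x, y) pairs.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by one client

def local_update(w: float, data: Dataset, lr: float = 0.1, epochs: int = 5) -> float:
    """Client side: run a few epochs of gradient descent on local data for y ≈ w * x."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def fed_avg(clients: List[Dataset], rounds: int = 20, w0: float = 0.0) -> float:
    """Server side: broadcast w, collect locally trained weights, and average them.

    Each client's weight is scaled by its number of samples, so larger
    clients contribute proportionally more to the global model.
    """
    w = w0
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        local_ws = [local_update(w, d) for d in clients]  # runs on each client
        w = sum(len(d) * wk for d, wk in zip(clients, local_ws)) / total
    return w
```

For example, with two clients whose private data both follow y = 2x, such as `[(1.0, 2.0), (2.0, 4.0)]` and `[(0.5, 1.0), (1.5, 3.0), (3.0, 6.0)]`, `fed_avg` converges toward w = 2 even though no client ever shares its data points.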
Q: What excites you about the fields in which you're working?
Endless possibilities. In machine learning in general, you see a lot of focus on methods and algorithms that assume you know the overall distribution of the data, which means having access to the entire data set. When you think about federated learning, that very basic assumption is completely broken, and that opens up a whole range of possibilities to create new algorithms and new approaches. We are at a point where we can create new technology, and that really excites me and makes me want to work in this area.
The same happens with the adversarial machine learning work: we are basically trying to break some of the underlying assumptions. The other interesting aspect is that, because of the nature of what we do, my team has people from very different backgrounds. We have experts in machine learning, others who are experts in differential privacy and security, and others who are really into systems. This intersection between topics makes it very, very interesting to work on. There are a lot of challenges that you're not going to see if you only look at a single aspect of the entire spectrum. I love working with people and collaborating, and I see a lot of possibilities for how this can change the landscape for machine learning applications.
Q: How would you describe the role IBM is playing in this space?
I think we have a huge opportunity here because IBM has been working a lot with companies, so I think we understand their requirements and what needs to be done differently. If you look at other companies that have been working in this area, in federated learning for instance, they have a huge focus on smartphone usage. That is really interesting, and I think they are doing great work, but once you start looking at companies at the enterprise level, some of the assumptions we need to make and some of the solutions are very, very different. I think IBM has the expertise to make sure this is done for the industry. We are working heavily in that area with four different companies to address those use cases, which are very different from smartphones and IoT devices. Also, IBM Research's expertise is something that we bring to the table as well.
Q: What advice, if any, would you offer to people at the start of their career journey in machine learning?
I have three pieces of advice. The first one is to follow your passion. If you don't believe in what you're doing, you're going to do a poor job. It needs to be something that motivates you, something you really want to go to work to do. And if you find what you're doing gratifying, you'll put in the hours, and it's not going to feel like work.
The second thing is, when in doubt, try it. As I mentioned, I have two bachelor's degrees, one in industrial engineering and the other in computer science. There were points while I was a student where I really wasn't sure where I was going. The thing that really made me change my mind was an internship I had at a bank that let me touch on both security and finance. After that, I understood that I really didn't want to work in the stock market; I just found the security part more interesting. So I think it's about looking for opportunities that allow you to validate whether you like something or not.
And the third one is to communicate to the people around you what you want. I think if I hadn't mentioned that I wanted to be a manager, I probably wouldn't have been given the opportunity.
Nathalie’s contributions exemplify the mission of IBM Community: to nurture users, practitioners and enthusiasts in a peer-to-peer fashion, with the guidance of mentors and experts from around the world. Nominate yourself or someone you know to be featured in our Data Science Community Spotlight!