This is the second of a two-part series discussing DevSecOps and Development. In Part 1 I covered how the global application security market is growing rapidly and how there is a lot of momentum around involving software developers in security efforts earlier in the software development lifecycle. In addition, developers are sharing more responsibility as the testing of production applications for quality, performance and outcomes increases. I then discussed how having a culture that supports these efforts is paramount to success and that leadership and trust are critical components of that success.
In this part I will take a closer look at specific ways to help build trust and better leverage developers as part of an effective application security strategy and I’ll start with how context makes all the difference.
Context is Everything
Have you ever noticed that the right small piece of information at just the right time can make all the difference in your experience? For example, recently I had to take an early flight that left the airport at 8 am. Some of you might already be cringing just thinking of that. For me, the real problem was that I live over an hour away from the airport in a part of the country where traffic is very unpredictable. Add to this that, while there have been some nice improvements at airports (see the TSA wait times app), you never really know how a TSA security line will go, so I figured I needed to be at the airport and in that line by 7:15 am. I had worked backwards and figured out when I needed to get up and leave to make that work, and even tried to account for possible traffic on the freeway. All worked ok – there was a little more traffic than expected but I got my car parked at the car lot and got on a shuttle. All good, right? Well, that’s when the fun started…
First, my shuttle driver informed me that he could not drop me off at my terminal because the drop-off place for shuttles had not been prepared yet (they recently moved spots from an upper level to a lower level). That meant I needed to walk from one terminal to the other, so add a few minutes to the travel time. Next, once I got through security and thought I was in the clear, I got to what I thought was my gate only to find out it was really just one of those access areas where you have to get on a bus and be taken to the other side of the airport to an express terminal. At this point I was really glad I had decided NOT to get in that line at a coffee shop.
The good news is I made my flight (you were all worried, right?) but imagine the difference in my planning, and in the state I arrived at the gate in, if I had known those two little details ahead of time instead of finding them out on the way. In fairness to the airline my gate WAS listed clearly, but just having a gate number and letter combination didn’t register with me that it was a satellite terminal. Most of us don’t really look to locate our gate until we are at the airport. Knowing about the shuttle drop-off change and the terminal location might have caused me to just get up and out a little earlier, and be able to get in that coffee line, instead of being rushed trying to get to the gate on time.
And in essence, software developers want the same thing. When they are coding up new capabilities and want to know if the code “works” and is “ok”, they want to have the right information in the right context at the right time. This means that when they commit changes, they would really like to get quick feedback on how those changes really perform – first in their local environment, and then in production – so that if there are problems, they are able to deal with them while things are still fresh in their mind.
Imagine the scenario where you are that developer. You spend a couple of hours working on a new feature to add payment processing using touch ID (thumbprint) authentication for a mobile application. You get to a place where you are pretty happy with the code and it seems to work. You run some tests against it and there are no failures. You verify the test results and decide to commit the change. Once that is done, you move on to your next task and start coding for that. A couple of weeks go by and one morning you get an email from your defect tracking system telling you that you have a security bug filed against the payment processing code you committed two weeks ago. Question: what is the first thing you do? If you are like many, the first thing you try to do is reproduce the defect because you need to know if it is a real problem or not – but what was your environment like two weeks ago? Is your machine even the same? Has it been patched? Upgraded? Modified? New libraries installed? And so on… the sheer number of variables can make this effort complicated – and time consuming.
Now imagine that same scenario but this time, before you commit the change, you are able to run a simple source code scan and you uncover a security vulnerability. You are able to quickly look up how to remedy it, make the change, scan again and see clean results. You then commit the change, and it is immediately put into an integration build and deployed into an environment that exactly mirrors your production environment. And as soon as it is deployed, full test suites are run against it – including security scans. The next morning when you arrive at work, you have an email waiting in your inbox with the results from that build, the tests run, and the scan results. Imagine how much easier it is to resolve issues in this model versus the first.
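To make the "scan before you commit" step concrete, here is a minimal sketch of a commit gate. It is only a toy stand-in for a real source code scanner: the two rules, the function names (scan_source, commit_allowed) and the sample file contents are all assumptions made up for illustration, and a real scanner checks far more than two regular expressions.

```python
import re

# Hypothetical, deliberately tiny rule set. A real scanner covers far more.
RULES = {
    "hardcoded-secret": re.compile(
        r"(password|api_key|secret)\s*=\s*['\"][^'\"]+['\"]", re.IGNORECASE
    ),
    "dynamic-eval": re.compile(r"\beval\s*\("),
}


def scan_source(text):
    """Return (line_number, rule_name) for every line that matches a rule."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((number, rule_name))
    return findings


def commit_allowed(changed_files):
    """Decide whether a commit may proceed.

    changed_files maps a file name to its new contents. Returns a tuple of
    (allowed, report), where report lists findings only for files with issues.
    """
    report = {name: scan_source(body) for name, body in changed_files.items()}
    report = {name: hits for name, hits in report.items() if hits}
    return (not report), report


if __name__ == "__main__":
    allowed, report = commit_allowed({"pay.py": 'api_key = "abc123"\n'})
    print("commit allowed:", allowed)
    for name, hits in report.items():
        for line_number, rule in hits:
            print(f"{name}:{line_number}: {rule}")
```

The point of the sketch is the timing rather than the rules: the developer gets the finding, with file and line number, while the change is still on their screen instead of weeks later.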
The moral of the story? Get the right information in the right context at the right time and to the right people who can do something about it. Now let’s look at one other big idea – in two parts.
Learn to Fail Fast …
It wasn’t that long ago when software development release cycles were measured in years for a major release, and the releases themselves were a major undertaking. I can recall days past where you would have to get an entire suite of offerings spread across multiple CDs so that you could actually load the one or two that you needed. Everything was tied together and had to be released together. What that meant on the development side was that teams were very dependent on one another and changes had to be tightly coordinated. And for that reason, many teams worked in isolation so that they could focus on their offering and their changes and NOT be adversely impacted by unplanned behavior resulting from another team’s changes. It sounded good in theory but – and those of you that developed like that know – there was one small problem. It made integrating changes from these teams together a real challenge. If even just one team was using a different set of libraries, a different platform, etc… it meant a lot of last minute re-work, which meant real risk to the release schedule.
And so we started to learn that waiting until the end of a release cycle to try and incorporate a lot of changes at once was not as great an idea as first thought. So now in the world of Agile and DevOps, we better understand the notion of incorporating smaller change sets with greater frequency, which has the benefit of actually lowering our risk. If you want to see this expressed more mathematically, check out this video from Maciej Zawadzki.
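As a rough illustration of why smaller change sets carry less risk, here is a back-of-the-envelope sketch. It is not taken from the video; it simply assumes that every change independently has the same chance of introducing a defect, which is a simplification, and the function name and the 5% figure are made up for the example.

```python
def batch_failure_probability(p_defect, batch_size):
    """Chance that a release bundling batch_size changes has at least one defect.

    Assumes each change independently introduces a defect with probability
    p_defect (a simplifying assumption made for illustration only).
    """
    return 1.0 - (1.0 - p_defect) ** batch_size


if __name__ == "__main__":
    for size in (1, 5, 20):
        risk = batch_failure_probability(0.05, size)
        print(f"{size:>2} changes per release: {risk:.1%} chance of a broken release")
```

Under these assumptions a release with a single change breaks 5% of the time, while one that bundles 20 changes breaks about 64% of the time, and when it does break there are 20 candidate changes to sift through instead of one.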
So we have learned to incorporate change faster, but to really get ahead today we have to take it one step further. We need to learn how to fail faster.
It’s been said by many over the years, in different ways, that we always learn more from our failures than our successes – as evidenced by these quotes gathered together by Forbes. This notion is true for business as well. We must be able to rigorously test, re-test and continuously validate our applications to elicit the kind of meaningful feedback we need to make the kind of changes that make a real impact. For instance, recall a few years back when Samsung introduced the concept of tapping your phone against someone else’s to share something? It looked great in commercials – but in the marketplace, there just aren’t that many people interested in smacking their phone against someone else’s, and that feature has since been replaced by one called NFC (Near Field Communication) that uses more of a back-to-back model for the devices. Think of the time, energy, money and manpower invested to originally bring that capability to market – and the subsequent efforts to modify it to where it is today. Now think of what it would have meant to know in advance that the first model wasn’t going to work. That is the difference between failing FAST and just failing.
So learn to run meaningful tests sooner in your development lifecycle – and especially include security tests in this because the concern over cybersecurity is real and people want applications they can trust. Fail fast by finding your vulnerabilities before your consumers do.
… But Don’t Play the Blame Game
We have all been in that situation. You know, the one where something went wrong in production, it was pretty serious, everyone was scrambling to get things back to normal and to figure out what happened, and now we are sitting in a conference room trying to do a post-mortem. And in these we can often subtly make a critical mistake that only ends up delaying things further – we spend too much time trying to figure out “who” instead of “what”. Who checked in that bad code? Who approved that change? Was it even approved in the first place? Who applied that update? Who forgot to patch that system? And so on. We make the assumption that if we can find out who broke something then that is the key to knowing how to prevent it from happening again. But if you have ever tried fixing things around your home or car you know that sometimes you just have to get help from an expert. And when you go to talk to an expert in those areas, what is the first thing they usually ask? “What’s the problem?” Have you ever gone to your mechanic, told them your car was making a noise, and had them turn around and scold you, asking if you drove it too fast or if you might have turned the steering wheel too hard? Of course not. Instead they ask you questions about the problem and your experience so that they can get an idea of how best to help you.
Now I am not saying that at some point there isn’t value in getting to the “who” in these situations. We do want to be able to help people learn and grow and avoid future mistakes. What I am saying is that we need to spend our time and effort focusing on the “what” and “why” around fixing the problem and not on affixing blame to the “who”. This is especially true in the world of security vulnerabilities, where many times people don’t even know that what they have committed was vulnerable in the first place. Can you say “zero-day exploits”?
Sometimes our processes themselves are flawed. It may be that we aren’t testing early enough, often enough or testing the right things. Or it may be that we have not equipped people well enough. For instance – do your developers have the ability to run tests against changed code in an environment that mirrors production? If not, then the tests run in that development environment are really just comparing apples to oranges and you can’t really have confidence the changes will work as expected in production. Or sometimes people just make honest mistakes and we need to have processes that can account for and accommodate them. I remember years ago when I was early in my career doing configuration management (version control). One of our servers was getting pretty full and I was asked to clean out disk space (some of you already know where this is going because you’ve been there). Well, I found an area near the top level of the machine that was pretty full. At the time we were using Rational ClearCase on Unix operating systems and I thought I was looking at an old copy of a source code tree. So, because it appeared old to me, I went to the top of that folder and did an “rm -rf *” on it to remove it. About 5 minutes later, my supervisor came into the office trying to figure out what happened to our source code! What I didn’t realize was that the older source code tree I was looking at was actually a mount point connected back to our main server and I was inadvertently removing it there. Big oops. Fortunately we caught it quickly, had great backups and were able to restore things fast. But this was a classic case of the right guy having the right access doing the wrong thing because he just didn’t know the full story.
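A process that accommodates this kind of honest mistake can be as simple as a check that runs before anything is deleted. The sketch below is one hedged way to do that, not what we had in place back then: it uses Python’s os.path.ismount to look for mount points at or below a directory, and the helper names (find_mount_points, safe_to_delete) are made up for illustration.

```python
import os


def find_mount_points(root, ismount=os.path.ismount):
    """Return every directory at or below root that is a mount point.

    The ismount parameter exists only so the check can be exercised without
    real mounts; by default the standard library's os.path.ismount is used.
    """
    found = []
    if ismount(root):
        found.append(root)
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in list(dirnames):
            full_path = os.path.join(dirpath, name)
            if ismount(full_path):
                found.append(full_path)
                # Prune the walk so we never descend into the mounted tree.
                dirnames.remove(name)
    return found


def safe_to_delete(root, ismount=os.path.ismount):
    """True only when no mount point sits at or below root."""
    return not find_mount_points(root, ismount)
```

A cleanup script could call safe_to_delete on the target directory and refuse to continue, listing the mount points it found, rather than relying on the person at the keyboard to know the full story.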
When you translate that into today’s world of cybersecurity, where Forbes found that nearly half of all companies are delaying cloud deployments because of cybersecurity concerns and Frost & Sullivan found there is an expected shortage of nearly 2 million cybersecurity workers by 2022, it is easy to realize that people are going to make some honest mistakes. So instead of figuring out clever ways to decide who gets handed the office booby prize when things break, invest that time in facilitating an environment where learning and growth can occur so your developers are free to go after innovation.
I sincerely hope that you have enjoyed this two-part series and that it has proven beneficial. For further information on application security and the difference it makes, see this case study, visit the IBM Marketplace, or just try it out for yourself! And if you want to learn more about the impact that cognitive capabilities can have on application security efforts, download our complimentary Ponemon Institute study.