Enterprise Knights of IBM Z


Providing insights to cyber security & resiliency on our platform

Why use zACS? Why use zACM? A brief technical review of both.

By Mike Kasper posted 2 days ago

  

In my last blog post, I discussed how program check analysis, an established method that IBM has been using internally for decades, can be used to detect security vulnerabilities on z/OS.  So it might not surprise you that these proven techniques form the basis for IBM's publicly available z/OS system integrity scanner and monitor tools, the z/OS Authorized Code Scanner and the z/OS Authorized Code Monitor.

How does the z/OS Authorized Code Scanner (zACS) find so many severe security vulnerabilities?

When I was creating the internal tools that later formed the basis for zACS, the problem I was trying to solve was clear.  How can you test something without an API guide?  What if you didn't even have the source code?  Would you need to disassemble the code and figure it out, or was there another way?  IBM knew how to tell if a program check was a sign of a security vulnerability.  If an authorized service accesses storage it should not, in a key it should not, or branches to a location it should not, as a result of a call from an unauthorized caller, it is a strong indication that the failing instruction is violating system integrity and needs to be corrected.  So that part I already knew how to do.  I also knew that z/OS system services typically use four byte parameters, so parameter lists that consist of pointers to storage can usually be tested by providing pointers on every four byte boundary until the end of the parameter list is reached.  Those pointers could point to second level parameter lists, which could point to third level parameter lists and so forth.  The possibilities are infinite.  However, to perform dynamic security testing, we do not have an infinite amount of time or an infinite number of test systems, so I needed to find a way to limit the number of possibilities and test meaningful, not random, parameters.

Then it dawned on me.  One of the fields we were using to analyze program checks was the translation exception address.  However, we were only using the store or fetch indicator bits, not the rest of the address.  What if we used the rest of the translation exception address to determine if the address the program was attempting to store to or fetch from was the same one we supplied?  Then we could tell it was accessing a user provided address using the wrong key or branching to it in the wrong state.  Taking it a step further, by providing specific addresses at specific offsets in the parameter list, we could tell which offset in a parameter list caused a failed reference.  Based on that information we could keep a record of which offsets in a parameter list were being used as addresses and which were not.  Next, we could provide valid addresses at the offsets that were being used, in order to reach the second level parameter lists those parameters point to.  The same approach could be used for the second level parameter list to determine which offsets contain pointers to third level parameter lists and so on, going as many levels deep as you want.  An analogy I like to use is the technology known as sonar.  Sound waves are sent out and the manner in which they bounce back helps map the objects they bounce off.  Radar and lidar use the same concept, and now zACS uses a similar idea to map parameter lists.  It sends out addresses as tests and observes what bounces back.
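
To make the sonar idea concrete, here is a minimal Python sketch of the mapping loop.  It is a toy simulation, not zACS code.  The service is a stand-in function that simply returns the address it failed on (playing the role of the translation exception address), and the names PROBE_BASE, VALID_BUFFER, mock_service and map_pointer_offsets are all hypothetical, invented for this illustration.

```python
from typing import Callable, List, Optional

WORD = 4                   # four byte parameters, as described above
PAGE = 0x1000              # the failing address is only trusted to page granularity
PROBE_BASE = 0x7F000000    # hypothetical range of deliberately unmapped pages
VALID_BUFFER = 0x00100000  # hypothetical address of storage the caller really owns


def probe_address(offset: int) -> int:
    """Give each parameter offset its own unmapped page, so a fault identifies it."""
    return PROBE_BASE + (offset // WORD) * PAGE


def offset_from_fault(fault_address: int) -> int:
    """Recover the parameter offset from the page of the failing address."""
    page = fault_address & ~(PAGE - 1)  # drop the low-order bits
    return ((page - PROBE_BASE) // PAGE) * WORD


def map_pointer_offsets(
    service: Callable[[List[int]], Optional[int]], plist_len: int
) -> List[int]:
    """Send out probe addresses and record which offsets 'bounce back'."""
    offsets = range(0, plist_len, WORD)
    plist = [probe_address(off) for off in offsets]
    found: List[int] = []
    for _ in offsets:  # at most one discovery per word
        fault = service(plist)
        if fault is None:
            break  # the service ran without a failed reference
        off = offset_from_fault(fault)
        if off not in offsets or off in found:
            break  # the fault is not explained by one of our probes
        found.append(off)
        plist[off // WORD] = VALID_BUFFER  # let the service get past this pointer
    return found


def mock_service(plist: List[int]) -> Optional[int]:
    """Toy authorized service that dereferences the words at offsets 0 and 8."""
    for off in (0, 8):
        addr = plist[off // WORD]
        if addr != VALID_BUFFER:
            return addr + 0x10  # fails a little way into the probed page
    return None


print(map_pointer_offsets(mock_service, 16))  # [0, 8]
```

Each pass either discovers one more pointer offset or stops, so the number of calls is bounded by the number of words in the parameter list.  The same loop could then be repeated against the storage each discovered pointer refers to, which is how the second and third levels would be mapped in this model.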

Determining which parameters contain pointers and which do not greatly reduces the combinations of parameter values that need to be tested.  However, building dynamic tests for authorized services based on those results still poses practical challenges.  How can you get repeatable results and avoid randomized fuzz testing?  What about parameter lists with function codes or bit flags?  Before you determine which parameters contain pointers, any parameter could be a pointer or a flag byte.  So the technique the zACS development team decided on was to set the values not being tested at the time to a specific pattern, '003FF7FF'x, that also points to storage zACS has obtained in private storage and set to be hidden storage.  This works as either a 24-bit or 31-bit address and fails predictably if used as a pointer.  It even points to the middle of a page to allow for positive or negative offsets, and it contains a specific bit pattern for more repeatable results than random testing.  Once the pointers have been identified, zACS also tests the fields that are not pointers, to see if function codes provided in those fields result in different outcomes.  Security vulnerabilities are found by analyzing any program checks that occur during the testing process.  If an SVC or PC is being tested dynamically, and it abends attempting to read or write to caller provided storage using system key, zACS reports a potential vulnerability for that instruction.  The service should use the caller's key instead.  Similarly, if it abends due to an attempt to execute an instruction at a caller provided address in an authorized state, then zACS reports a potential privilege escalation.
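
The properties claimed for the '003FF7FF'x filler pattern are easy to check with a little arithmetic.  The Python sketch below does that, and also shows one way a parameter list could be filled with the pattern while a single offset is being probed.  The helper names and the probe value are hypothetical, and a 4 KB page size is assumed.  This is not how zACS itself is written.

```python
import struct
from typing import Dict

FILLER = 0x003FF7FF  # the pattern quoted above
PAGE = 4096          # assumption: 4 KB pages


def describe_filler(value: int = FILLER) -> Dict[str, object]:
    """Check the address properties of the filler pattern."""
    offset = value & (PAGE - 1)
    return {
        "valid_24_bit": value < (1 << 24),   # high-order byte is zero
        "valid_31_bit": value < (1 << 31),   # high-order bit is zero
        "page_base": value & ~(PAGE - 1),
        "page_offset": offset,               # distance back to the start of the page
        "room_above": PAGE - 1 - offset,     # distance forward to the end of the page
    }


def build_plist(words: int, probes: Dict[int, int]) -> bytes:
    """Fill a parameter list with FILLER, overriding the offsets being tested."""
    values = [probes.get(i * 4, FILLER) for i in range(words)]
    return struct.pack(f">{words}I", *values)  # big-endian four byte words


info = describe_filler()
print(hex(info["page_base"]), info["page_offset"], info["room_above"])
# 0x3ff000 2047 2048

# Probe offset 4 with a hypothetical test address; everything else gets the filler.
print(build_plist(3, {4: 0x7F001000}).hex())
# 003ff7ff7f001000003ff7ff
```

With 2047 bytes below and 2048 bytes above inside the same page, a service that adds a modest positive or negative displacement to the filler value still lands in the same hidden page in this model, which is what makes the failure predictable.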

How does the z/OS Authorized Code Monitor (zACM) observe security vulnerabilities in real time?

While zACS allows for the dynamic testing of meaningful parameter lists for authorized services in an efficient way, as another IBMer once told me, "There is no system test like the real world."  What if we could analyze any program check in an authorized program or service, as it happens, to look for signs of a vulnerability?  The benefits are obvious, but at first we did not have any easy way to do it.  The initial internal tool I created to try to capture vulnerabilities in real time used a storage alteration trap to try to capture authorized programs writing into user key storage using system key.  This posed a number of challenges and had performance impacts on the system being tested, so I moved on to program checks next.  Using an internal exit, I tried analyzing any program check that occurred under certain conditions.  Specifically, supervisor state ABEND0C1s or ABEND0C6s might mean a wild branch, and system key ABEND0C4s or ABEND0E0s might reveal a bad fetch or store.  Using my internal exit I could do some additional analysis to determine when to capture a dump or produce a report to describe any potential security vulnerabilities.
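
The first-pass filter described above can be sketched as a small Python function.  This is only an illustration of the conditions named in this post, not the logic of the internal exit or of zACM.  The ProgramCheck record and its field names are hypothetical, and the sketch assumes that PSW keys 0 through 7 count as system keys.

```python
from dataclasses import dataclass
from typing import Optional

SYSTEM_KEY_MAX = 7  # assumption: PSW keys 0-7 are treated as system keys


@dataclass
class ProgramCheck:
    abend_code: str         # for example "0C4"
    supervisor_state: bool  # True if the failing code ran in supervisor state
    psw_key: int            # PSW key at the time of the failure (0-15)


def classify(check: ProgramCheck) -> Optional[str]:
    """Return a reason to look closer, or None if the check is not of interest."""
    if check.supervisor_state and check.abend_code in {"0C1", "0C6"}:
        return "possible wild branch"
    if check.psw_key <= SYSTEM_KEY_MAX and check.abend_code in {"0C4", "0E0"}:
        return "possible bad fetch or store"
    return None


print(classify(ProgramCheck("0C4", supervisor_state=False, psw_key=0)))
# possible bad fetch or store
print(classify(ProgramCheck("0C4", supervisor_state=False, psw_key=8)))
# None
```

A match here is only a candidate.  As the post describes, additional analysis is still needed before deciding whether to capture a dump or produce a report.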

However, to make this available to customers, as the zACS development team had done with zACS, they needed an external interface to use, not just an internal slip exit.  This resulted in the creation of the ERRTYPE=INTEGMON slip:  https://www.ibm.com/docs/en/zos/3.1.0?topic=trap-slip-set-parameters#slpsetp__integmon  Customers wanting to capture diagnostic data when a program check indicates a high likelihood of a security vulnerability could now use this new interface to do so.  That is what the zACS development team did, and they called the new tool the z/OS Authorized Code Monitor, or zACM.  Now when a program check occurs on your system in an authorized program, zACM can analyze it in real time and create a potential vulnerability report or capture a system dump to record the problem as well.

With the most recent release of zACM they also include a capability called auto slip, similar to the auto slip capability in zACS, so that even if the matchlim has been reached for the default zACM slip, it can set a new slip for any new unique problems that occur, in order to capture a dump for them the second time if they reoccur.  Another problem they have had to tackle is the problem of false positives, that is, reporting problems that cannot be exploited.  So zACM often ignores abends in authorized address spaces as a result, because only authorized code can be executed there, whereas SVC and PC routines that can be called by unauthorized programs can be much more easily influenced by the calling program.  By focusing on environments that can be impacted by an unauthorized caller, zACM gives more accurate results.
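
One way to picture the auto slip behavior is as a small state machine.  The Python sketch below is a toy model based only on the description in this post.  The class name, the returned strings, and the idea of identifying a problem by a (module, offset) signature are all assumptions made for illustration, and the real zACM processing may differ.

```python
from typing import Hashable, Set


class AutoSlipModel:
    """Toy model: a default slip with a match limit, plus per-problem auto slips."""

    def __init__(self, matchlim: int) -> None:
        self.default_matches_left = matchlim
        self.armed: Set[Hashable] = set()  # problems that have an auto slip set

    def on_program_check(self, signature: Hashable) -> str:
        if self.default_matches_left > 0:
            # The default slip can still match, so it captures the dump.
            self.default_matches_left -= 1
            return "dump (default slip)"
        if signature in self.armed:
            # Second occurrence of a problem that already has its own slip.
            self.armed.discard(signature)
            return "dump (auto slip)"
        # New unique problem after the match limit: set a slip for next time.
        self.armed.add(signature)
        return "report only, auto slip set"


model = AutoSlipModel(matchlim=1)
events = [("MODA", 0x1A2), ("MODB", 0x40), ("MODB", 0x40)]  # hypothetical signatures
for event in events:
    print(model.on_program_check(event))
# dump (default slip)
# report only, auto slip set
# dump (auto slip)
```

The model deliberately ignores what happens on a third occurrence and does not track which problems have already been dumped, since the post does not describe that.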

In summary, while zACS is a disruptive testing tool that should never be run in a production environment, due to all the invalid parameters it passes to authorized system services and all the program checks that occur as a result, zACM is a non-disruptive monitor that could be used on any system in a production environment.  Together they use program check analysis to find security vulnerabilities revealed by asking not what happens if the faulty instruction abends, but what if it does not?
