
Enabling Go disassembler support (GNU Syntax) on s390x

By Pokala Srinivas posted Tue August 27, 2024 09:40 AM

  

Authors: @Pokala Srinivas & @Vishwanatha H.D

Requirement:

s390x (IBM Z) machines lacked disassembler support for Go binaries. Without a disassembler, debugging these binaries at the instruction level was not possible.

We worked on this requirement and enabled disassembler support for Go on s390x. This will help all Go developers working on the s390x architecture.

Introduction:

A disassembler is a computer program that translates binaries, i.e. machine code, into assembly language. The disassembly, which is the output of a disassembler, presents the instructions in a human-readable format.
Advantages of disassemblers:
1) Helps developers analyse and debug a binary executable or the object code that a compiler generates, which also helps when evaluating compiler optimizations.
2) Helps recover the assembly-level program corresponding to the original source code.
3) Helps in malware analysis, reverse engineering, etc.

Stages of generating Go disassembler output:

Generating Go disassembler output involves four main stages.
 
Stage #1: Parse the s390x "Principles_of_Operation_Z_Arch.pdf" document and generate a CSV file containing the complete s390x instruction set.
Stage #2: Construct the s390x opcode map tables from the instruction set CSV file generated in stage #1.
Stage #3: Parse the byte data of the Go binary to be disassembled, and decode each instruction's opcode and arguments with the help of the s390x map tables.
Stage #4: Print the decoded instructions to the console or to a redirected file. Instructions can be printed in two forms:
a) GNU syntax: the native (AT&T-style) assembly syntax used by the GNU toolchain.
b) Go syntax: the Go pseudo-assembly (Plan 9) syntax.

In the first phase of development, our Go disassembler formats its output in GNU (AT&T) syntax. In the second phase, we are working on printing the output in Go (pseudo-assembly) syntax as well.

Detailed Block Diagram:

Detailed insight into each stage:

Stage #1:

The “Principles_of_Operation_Z_Arch.pdf” file is parsed to generate “s390x.csv”, a CSV file describing the instruction set.
 
Repository cloning, code flow and directory structure:
  • The “arch” repository (https://go.googlesource.com/arch) is cloned, and support for the “s390x” architecture is added to it.
  • A new package, “s390xspec”, is created inside the “arch/s390x” directory to implement all of the stage #1 functionality.
  • The “spec.go” file inside the “arch/s390x/s390xspec/” directory contains the code that parses the PDF and generates the instruction set CSV file, “s390x.csv”.
  • Compiling “spec.go” produces the “s390xspec” binary.
  • The “s390xspec” binary is run to parse the z-ISA PDF file and generate the “s390x.csv” file.
Commands:
  • go build -o s390xspec spec.go
  • ./s390xspec Principles_of_Operation_Z_Arch.pdf > s390x.csv
Each line of the “s390x.csv” file contains the following three fields:
 
  • // instruction

The instruction name, such as "ADD (64)", "ADD (32)" or "BRANCH AND LINK".

  • // mnemonic

An instruction mnemonic with its operand syntax, such as "AG R1,D2(X2,B2)".

  • // encoding

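The instruction encoding, i.e. the sequence of opcode and operand fields in their respective bit positions, each written as "operand@bitposition" and separated by the “|” character.

For example: "47368@0|0@16|R1@24|R2@28|//@32". Here 47368 is the 16-bit opcode 0xB908 in decimal, starting at bit 0; bits 16-23 hold the constant 0; R1 and R2 start at bits 24 and 28; and //@32 marks the end of the 4-byte instruction.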
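As a minimal sketch of how such an encoding string breaks down, the Go program below splits it into fields and derives each field's width from the position of the next element. The “Field” type and “parseEncoding” function are hypothetical names, not the code in spec.go or map.go, and the sketch assumes that the final "//@N" element marks the end of the instruction, so that N/8 gives the length in bytes.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Field is one "name@bitpos" element of an s390x.csv encoding string
// (hypothetical type for this sketch).
type Field struct {
	Name  string // operand name such as "R1", a decimal opcode value, or "//"
	Pos   int    // first bit of the field, counting bit 0 as the leftmost bit
	Width int    // field width in bits, derived from the next element's position
}

// parseEncoding splits an encoding such as "47368@0|0@16|R1@24|R2@28|//@32"
// into fields and returns them with the instruction length in bytes.
// Assumption: the final "//@N" element marks the end of the instruction.
func parseEncoding(enc string) ([]Field, int, error) {
	var fields []Field
	for _, p := range strings.Split(enc, "|") {
		name, posStr, ok := strings.Cut(p, "@")
		if !ok {
			return nil, 0, fmt.Errorf("malformed element %q", p)
		}
		pos, err := strconv.Atoi(posStr)
		if err != nil {
			return nil, 0, fmt.Errorf("bad bit position in %q: %v", p, err)
		}
		fields = append(fields, Field{Name: name, Pos: pos})
	}
	last := len(fields) - 1
	if last < 0 || fields[last].Name != "//" {
		return nil, 0, fmt.Errorf("encoding %q lacks a terminating //@N", enc)
	}
	// Each field extends up to the start of the next element.
	for i := 0; i < last; i++ {
		fields[i].Width = fields[i+1].Pos - fields[i].Pos
	}
	return fields[:last], fields[last].Pos / 8, nil
}

func main() {
	fields, length, err := parseEncoding("47368@0|0@16|R1@24|R2@28|//@32")
	if err != nil {
		panic(err)
	}
	for _, f := range fields {
		fmt.Printf("%-6s bits %2d-%2d\n", f.Name, f.Pos, f.Pos+f.Width-1)
	}
	fmt.Println("length:", length, "bytes")
}
```

For the “AG” row ("227@0|R1@8|X2@12|B2@16|D2@20|8@40|//@48"), D2 occupies bits 20-39, the same span as the ap_DispSigned20_20_39 operand in the tables.go excerpt, and the length comes out as 6 bytes.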

“s390x.csv” file contents:
 
"ADD (32)","A R1,D2(X2,B2)","90@0|R1@8|X2@12|B2@16|D2@20|//@32",
"ADD (32)","AR R1,R2","26@0|R1@8|R2@12|//@16",
"ADD (64)","AG R1,D2(X2,B2)","227@0|R1@8|X2@12|B2@16|D2@20|8@40|//@48",
"BRANCH AND LINK","BAL R1,D2(X2,B2)","69@0|R1@8|X2@12|B2@16|D2@20|//@32",
"BRANCH AND LINK","BALR R1,R2","5@0|R1@8|R2@12|//@16",
"COMPARE AND SIGNAL (short BFP)","KEBR R1,R2","45832@0|0@16|R1@24|R2@28|//@32",
"COMPARE AND SWAP (32)","CS R1,R3,D2(B2)","186@0|R1@8|R3@12|B2@16|D2@20|//@32",

Flow Chart Diagram:

Stage #2:

Construct the s390x opcode map tables from the instruction set CSV file generated in stage #1.
 
Code flow and directory structure:
  • A new package, “s390xmap”, is created inside the “arch/s390x” directory.
  • The “map.go” file inside the “arch/s390x/s390xmap/” directory contains the code that parses the “s390x.csv” file and constructs the opcode map table.
  • Compiling “map.go” produces the “s390xmap” binary.
  • The “s390xmap” binary is run to read the “s390x.csv” file and generate the s390x opcode map table as the “tables.go” file.
  • The “s390xasm” package is created inside the “arch/s390x/” directory, and “tables.go” is placed in that package.
 
Commands:
  • go build -o s390xmap map.go
  • ./s390xmap -fmt=decoder s390x.csv > arch/s390x/s390xasm/tables.go
Note: “decoder” is the output format that prints the decoding map tables as the "tables.go" file.

Each entry of the opcode map table in the “tables.go” file contains four fields, which together hold all the information and rules needed to decode a specific instruction form:
 
  • // opcode mnemonic
An instruction mnemonic, such as "AG R1,D2(X2,B2)".
  • // Mask
A 64-bit mask selecting the fixed opcode bits used to match the instruction.
  • // Value
The 64-bit opcode value that the masked instruction bits must equal.
  • // argument fields
The argument details, in the same form and order as in the instruction manual. This is an array of “argField” structures, each describing how to decode one argument of the instruction.
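To illustrate how a Mask and Value pair can follow from an s390x.csv encoding string, the sketch below sets mask bits for every fixed (numeric) element and copies the opcode bits into the value, assuming the instruction is left-aligned in a 64-bit word. The “maskValue” function is hypothetical and the real map.go may compute these differently, but for the “AG” row of s390x.csv it yields 0xff00000000ff0000 and 0xe300000000080000, the same Mask and Value as the AG entry in the tables.go excerpt.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// maskValue derives a Mask/Value match rule from an s390x.csv encoding.
// Sketch under these assumptions (the real map.go may differ):
//   - the instruction is left-aligned in a 64-bit word, bit 0 being the MSB;
//   - an element whose name is a decimal number is a fixed opcode field;
//   - operand fields (R1, D2, ...) and "//" elements add nothing to the mask.
func maskValue(enc string) (mask, value uint64, err error) {
	parts := strings.Split(enc, "|")
	for i := 0; i+1 < len(parts); i++ {
		name, posStr, _ := strings.Cut(parts[i], "@")
		_, nextStr, _ := strings.Cut(parts[i+1], "@")
		pos, err1 := strconv.Atoi(posStr)
		next, err2 := strconv.Atoi(nextStr)
		if err1 != nil || err2 != nil || next <= pos || next > 64 {
			return 0, 0, fmt.Errorf("bad bit positions near %q", parts[i])
		}
		opc, convErr := strconv.ParseUint(name, 10, 64)
		if convErr != nil {
			continue // operand or unused field: not part of the match rule
		}
		width := uint(next - pos)
		// Shift so that the field's last bit lands on MSB-numbered bit next-1.
		shift := uint(64 - next)
		fieldMask := (uint64(1)<<width - 1) << shift
		mask |= fieldMask
		value |= (opc << shift) & fieldMask
	}
	return mask, value, nil
}

func main() {
	m, v, err := maskValue("227@0|R1@8|X2@12|B2@16|D2@20|8@40|//@48")
	if err != nil {
		panic(err)
	}
	fmt.Printf("AG mask=%#x value=%#x\n", m, v)
}
```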

Opcode map table contents in the “tables.go” file:
 
{ARK, 0xffff000000000000, 0xb9f8000000000000, // ADD (32) (ARK R1,R2,R3)
    [7]*argField{ap_Reg_24_27, ap_Reg_28_31, ap_Reg_16_19}},
{AY, 0xff00000000ff0000, 0xe3000000005a0000, // ADD (32) (AY R1,D2(X2,B2))
    [7]*argField{ap_Reg_8_11, ap_DispSigned20_20_39, ap_IndexReg_12_15, ap_BaseReg_16_19}},
{AG, 0xff00000000ff0000, 0xe300000000080000, // ADD (64) (AG R1,D2(X2,B2))
    [7]*argField{ap_Reg_8_11, ap_DispSigned20_20_39, ap_IndexReg_12_15, ap_BaseReg_16_19}},
{KMA, 0xffff000000000000, 0xb929000000000000, // CIPHER MESSAGE WITH AUTHENTICATION (KMA R1,R3,R2)
    [7]*argField{ap_Reg_24_27, ap_Reg_16_19, ap_Reg_28_31}},
{KMC, 0xffff000000000000, 0xb92f000000000000, // CIPHER MESSAGE WITH CHAINING (KMC R1,R2)
    [7]*argField{ap_Reg_24_27, ap_Reg_28_31}},
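A decoder can use these rows by checking each entry in turn until the instruction bits, masked with Mask, equal Value. The sketch below applies that rule to the five entries above; the “entry” type and “lookup” function are hypothetical names, and the argument fields are left out to keep it short.

```go
package main

import "fmt"

// entry mirrors the first three columns of the tables.go rows
// (mnemonic, Mask, Value); argument fields are omitted in this sketch.
type entry struct {
	op    string
	mask  uint64
	value uint64
}

// table holds the five entries from the tables.go excerpt.
var table = []entry{
	{"ARK", 0xffff000000000000, 0xb9f8000000000000},
	{"AY", 0xff00000000ff0000, 0xe3000000005a0000},
	{"AG", 0xff00000000ff0000, 0xe300000000080000},
	{"KMA", 0xffff000000000000, 0xb929000000000000},
	{"KMC", 0xffff000000000000, 0xb92f000000000000},
}

// lookup returns the first entry whose fixed opcode bits match the
// left-aligned instruction word, i.e. insn&mask == value.
func lookup(insn uint64) (string, bool) {
	for _, e := range table {
		if insn&e.mask == e.value {
			return e.op, true
		}
	}
	return "", false
}

func main() {
	// AG %r1,0(%r0,%r2): bytes e3 10 20 00 00 08, left-aligned in 64 bits.
	op, ok := lookup(0xe310200000080000)
	fmt.Println(op, ok)
}
```

Because AY and AG share the same mask, only the second opcode byte (0x5a versus 0x08) distinguishes them.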

Along with the opcode map tables, the “tables.go” file also contains:
 
  • A const block defining an enumerated “Op” constant for every supported instruction:
const (
    _ Op = iota
    A
    AR
    ARK
    ….
)
  • An array holding the mnemonic string of every supported instruction, indexed by “Op”:
var opstr = [...]string{
    A:   "a",
    AR:  "ar",
    ARK: "ark",
    …….
}
  • Variables describing the characteristics of each operand, such as its type, bit-field position and size:

var (
    ap_Reg_8_11           = &argField{Type: TypeReg, flags: 0x1, BitField: BitField{8, 4}}
    ap_DispUnsigned_20_31 = &argField{Type: TypeDispUnsigned, flags: 0x10, BitField: BitField{20, 12}}
    ap_IndexReg_12_15     = &argField{Type: TypeIndexReg, flags: 0x41, BitField: BitField{12, 4}}
    ….
)
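The BitField{offset, size} literals above can be read as "start at bit offset, counting the leftmost bit as 0, and take size bits". The sketch below extracts fields that way from the “lg %r3, 16(%r0,%r13)” instruction in the objdump output at the end of this post (bytes e330d0100004). The “Offs” and “Bits” field names and the “Parse” method are assumptions made for illustration, not necessarily the package's implementation.

```go
package main

import "fmt"

// BitField locates an operand inside a left-aligned 64-bit instruction word:
// Offs is the first bit (0 = most significant) and Bits is the width.
// Field and method names here are assumptions for this sketch.
type BitField struct {
	Offs uint8
	Bits uint8
}

// Parse shifts the field down to the low-order bits and masks off the rest.
func (b BitField) Parse(insn uint64) uint64 {
	shift := 64 - uint(b.Offs) - uint(b.Bits)
	return (insn >> shift) & (uint64(1)<<b.Bits - 1)
}

func main() {
	// "lg %r3, 16(%r0,%r13)": bytes e330d0100004, left-aligned in 64 bits.
	// DH2 (bits 32-39) is zero here, so DL2 alone gives the displacement.
	const insn = 0xe330d01000040000
	r1 := BitField{8, 4}.Parse(insn)    // like ap_Reg_8_11
	x2 := BitField{12, 4}.Parse(insn)   // like ap_IndexReg_12_15
	b2 := BitField{16, 4}.Parse(insn)   // base register, bits 16-19
	dl2 := BitField{20, 12}.Parse(insn) // like ap_DispUnsigned_20_31
	fmt.Println(r1, x2, b2, dl2)
}
```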

Flow Chart Diagram:

Stage #3:

Parse the byte data of the Go binary to be disassembled, and decode each instruction's opcode and arguments with the help of the s390x map tables. An instruction on s390x is 2, 4 or 6 bytes long.
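The instruction length does not need a table lookup: in z/Architecture, the two leftmost bits of the first opcode byte encode it (00 means 2 bytes, 01 or 10 means 4 bytes, 11 means 6 bytes). The sketch below uses that rule to cut a byte stream into instructions with their addresses, using bytes from the objdump output at the end of this post. The names “insnLen”, “split” and “rawInsn” are hypothetical, not the decode.go API.

```go
package main

import "fmt"

// insnLen returns the length in bytes of the s390x instruction whose first
// opcode byte is b. The two leftmost opcode bits encode the length:
// 00 -> 2 bytes, 01 or 10 -> 4 bytes, 11 -> 6 bytes.
func insnLen(b byte) int {
	switch b >> 6 {
	case 0:
		return 2
	case 1, 2:
		return 4
	default:
		return 6
	}
}

// rawInsn is one undecoded instruction and its address (hypothetical type).
type rawInsn struct {
	PC    uint64
	Bytes []byte
}

// split walks a code buffer that starts at address pc and cuts it into
// instructions; it stops at a truncated instruction at the end of the buffer.
func split(code []byte, pc uint64) []rawInsn {
	var out []rawInsn
	for len(code) >= 2 {
		n := insnLen(code[0])
		if len(code) < n {
			break
		}
		out = append(out, rawInsn{PC: pc, Bytes: code[:n]})
		code = code[n:]
		pc += uint64(n)
	}
	return out
}

func main() {
	// The first three instructions of the objdump output for abi.go:47.
	code := []byte{
		0xe3, 0x30, 0xd0, 0x10, 0x00, 0x04, // lg
		0xec, 0x3f, 0x00, 0x69, 0xa0, 0x65, // clgrjnl
		0xe3, 0xe0, 0xff, 0xe8, 0xff, 0x24, // stg
	}
	for _, in := range split(code, 0x11000) {
		fmt.Printf("%#x  %x\n", in.PC, in.Bytes)
	}
}
```

Run as-is, it prints the addresses 0x11000, 0x11006 and 0x1100c, matching the first three lines of that output.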
Code flow and directory structure:
  • The "s390xasm" package inside the "arch/s390x/" directory implements this stage.
  • The following files in the "s390xasm" package provide the stage #3 functionality:
    • “decode.go” decodes the leading bytes of the input Go binary as a single instruction.
    • “field.go” defines the “BitField” structure and its related operations, which extract an individual bit field, given its offset and size, from a 64-bit doubleword.
    • “inst.go” defines the instruction format and its related operations. Each instruction is held in an “Inst” structure containing the opcode information, the 64-bit raw encoding, the length of the encoding, and the instruction arguments as defined in the z-ISA PDF document.
    • “gnu.go” prints the disassembled output in GNU (AT&T) syntax. Its “GNUSyntax()” function takes an “Inst” structure and a program counter “pc” as input arguments and returns the complete disassembled instruction string. This file also handles the extended mnemonics that the z-ISA PDF document defines for many instructions.
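One detail GNU-style printing has to get right is the signed 20-bit displacement of RXY-format instructions such as lg and stg, which is split into a 12-bit low part (DL2, bits 20-31) and an 8-bit high part (DH2, bits 32-39). The sketch below combines and sign-extends the two parts and formats the operands in the order seen in the objdump output at the end of this post. “disp20” and “formatRXY” are hypothetical helpers, not functions from gnu.go, and the field values are assumed to be already extracted and masked.

```go
package main

import "fmt"

// disp20 combines the DL2 (low 12 bits) and DH2 (high 8 bits) fields of an
// RXY-format instruction into a signed 20-bit displacement.
func disp20(dl2, dh2 uint64) int64 {
	d := int64(dh2<<12 | dl2) // raw 20-bit two's-complement value
	if d&(1<<19) != 0 {
		d -= 1 << 20 // sign-extend from bit 19
	}
	return d
}

// formatRXY renders "mnemonic %rR1, D2(%rX2,%rB2)" in the operand order of
// the objdump output (objdump's column padding is omitted).
func formatRXY(mnemonic string, r1, x2, b2 uint64, d2 int64) string {
	return fmt.Sprintf("%s %%r%d, %d(%%r%d,%%r%d)", mnemonic, r1, d2, x2, b2)
}

func main() {
	// stg e3e0ffe8ff24: R1=14, X2=0, B2=15, DL2=0xfe8, DH2=0xff.
	d := disp20(0xfe8, 0xff)
	fmt.Println(formatRXY("stg", 14, 0, 15, d))
}
```

For the stg instruction e3e0ffe8ff24, DL2 is 0xfe8 and DH2 is 0xff, giving -24, so the sketch prints "stg %r14, -24(%r0,%r15)", matching the listing apart from column padding.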

Flow Chart Diagram:

Stage #4:

Print the disassembled output, formatted according to the instruction format, to the console or to a redirected file.
 
Code flow and directory structure:
  • s390x-specific support is added to the "disasm.go" file in the “src/cmd/internal/objfile” directory, which handles the disassembler command and produces the disassembled output.
  • The “go tool objdump -gnu <go_binary>” command disassembles a Go binary. The "-gnu" option prints the disassembler output in GNU (AT&T) syntax.
  • This command invokes the “Decode()” function in the “decode.go” file, passing it the binary code to be disassembled.
  • On success, “Decode()” returns the matching instruction, along with its arguments, as an “Inst” structure.

“go tool objdump -gnu <go_binary>” command output:
 
TEXT internal/abi.(*RegArgs).Dump(SB) /root/go_compiler/dev/go_disasm1/src/internal/abi/abi.go
  abi.go:47             0x11000                 e330d0100004           lg       %r3, 16(%r0,%r13)
  abi.go:47             0x11006                 ec3f0069a065           clgrjnl  %r3, %r15, 110d8
  abi.go:47             0x1100c                 e3e0ffe8ff24           stg      %r14, -24(%r0,%r15)
  abi.go:47             0x11012                 e3ff0fe8ff71           lay      %r15, -24(%r15,%r0)
  abi.go:47             0x11018                 e3e0f0000024           stg      %r14, 0(%r0,%r15)
  abi.go:48             0x1101e                 c0e500025eb9           brasl    %r14, 0x5cd90

Flow Chart Diagram:

References:

  • https://www.ibm.com/docs/en/SSQ2R2_15.0.0/com.ibm.tpf.toolkit.hlasm.doc/dz9zr006.pdf
  • https://en.wikipedia.org/wiki/Disassembler