Previous article in this series: Linux Assembly Part 1 about Syscalls
This is the second article in the Linux Assembly series. This time, we will focus on how to represent different data types in `nasm` so that we can do something with them.
Registers
Remember the registers we used last time?
Assembly syntax can feel a little special because some registers are reserved for specific purposes, so it's important to understand those registers and how they are used together with instructions and function calls.
There are various registers available for different purposes. See the table below to find out how they are named, and whether or not their values are preserved (persistent) if you make a `call`.

| Description | 64 bit | 32 bit | 16 bit | 8 bit | Persistent? |
|---|---|---|---|---|---|
| Accumulator | RAX | EAX | AX | AL | No |
| Base | RBX | EBX | BX | BL | Yes |
| Counter / 4th Argument | RCX | ECX | CX | CL | No |
| Data / 3rd Argument | RDX | EDX | DX | DL | No |
| Stack Pointer | RSP | ESP | SP | SPL | Yes |
| Base Pointer / Frame Pointer | RBP | EBP | BP | BPL | Yes |
| 1st Argument | RDI | EDI | DI | DIL | No |
| 2nd Argument | RSI | ESI | SI | SIL | No |
| 5th Argument | R8 | R8D | R8W | R8B | No |
| 6th Argument | R9 | R9D | R9W | R9B | No |
| Temporary | R10 - R11 | R10D - R11D | R10W - R11W | R10B - R11B | No |
| Callee-Saved Registers | R12 - R15 | R12D - R15D | R12W - R15W | R12B - R15B | Yes |
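
The columns of the table are not separate registers: the 32, 16 and 8 bit names refer to the low bits of the same 64 bit register. The following is a small Python model of that relationship, not assembly code. The function name `views` and the sample value are made up for this illustration.

```python
# Python model of how the smaller register names alias the low bits of RAX.
# `views` is a hypothetical helper, not part of any library.

def views(rax: int) -> dict:
    """Return the values visible through RAX, EAX, AX and AL."""
    return {
        "RAX": rax & 0xFFFFFFFFFFFFFFFF,  # all 64 bits
        "EAX": rax & 0xFFFFFFFF,          # low 32 bits
        "AX": rax & 0xFFFF,               # low 16 bits
        "AL": rax & 0xFF,                 # low 8 bits
    }

for name, value in views(0x1122334455667788).items():
    print(f"{name:>3} = {value:#x}")
```

The same pattern applies to the other rows, for example `RDI`, `EDI`, `DI` and `DIL`.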
Data Types
Remember the `.data` section in which we declared our `Hello, World!` message from the previous article?

    section .data
        msg db "Hello, World!"

    section .text
        global _start

    _start:
        ; (...)

In the `nasm` language you can also declare other data types, which we are going to learn about now.
Behind the scenes, everything is a `byte` or a `word` because of how the `x86` instruction set was designed at the time, but the `NASM` language provides sized declarations on top of that, which let you work with somewhat higher-level data types in a more structured manner.
Bytes and Words
The basic data types in `nasm` are:

- `byte` is a byte that is `8 bits` long.
- `word` is `2 bytes` long.
- `doubleword` is `4 bytes` long.
- `quadword` is `8 bytes` long.
- `doublequadword` is `16 bytes` long.

Unsigned Integers
Unsigned integers (`unsigned int`) are binary numbers that can be represented as a `byte`, a `word`, a `doubleword`, a `quadword`, or a `doublequadword`. The byte length determines the range of numbers we can represent: an unsigned integer with `n` bits covers `0` to `2^n - 1`.

- An `unsigned integer` as `byte` can represent the numbers from `0` to `255`.
- An `unsigned integer` as `word` can represent the numbers from `0` to `65535`.
- An `unsigned integer` as `doubleword` can represent the numbers from `0` to `4294967295`.
- An `unsigned integer` as `quadword` can represent the numbers from `0` to `18446744073709551615`.
- An `unsigned integer` as `doublequadword` can represent the numbers from `0` to `340282366920938463463374607431768211455`.

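
As a quick cross-check of the ranges above, here is a minimal Python sketch that derives the largest unsigned value from the byte length. The names `SIZES` and `unsigned_max` are assumptions made for this example.

```python
# Largest unsigned value per data type, derived from the byte length.
# `SIZES` and `unsigned_max` are hypothetical names for this sketch.

SIZES = {
    "byte": 1,
    "word": 2,
    "doubleword": 4,
    "quadword": 8,
    "doublequadword": 16,
}

def unsigned_max(num_bytes: int) -> int:
    """An unsigned integer with n bits ranges from 0 to 2^n - 1."""
    return 2 ** (8 * num_bytes) - 1

for name, num_bytes in SIZES.items():
    print(f"{name:>14}: 0 to {unsigned_max(num_bytes)}")
```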
Signed Integers
Signed integers (`signed int`) are binary numbers that are stored in the same way as `unsigned int`, but they represent different number ranges. x86 uses two's complement, so the most significant bit is set to `1` for negative numbers and to `0` for zero and positive numbers.
This means that for a bit length of `n` the number range doesn't start at `0`, and instead starts at `-2^(n-1)` and ends at `+2^(n-1)-1`.

- A `signed integer` as `byte` can represent the numbers from `-128` to `+127`.
- A `signed integer` as `word` can represent the numbers from `-32768` to `+32767`.
- A `signed integer` as `doubleword` can represent the numbers from `-2^31` to `+2^31-1`.
- A `signed integer` as `quadword` can represent the numbers from `-2^63` to `+2^63-1`.
- A `signed integer` as `doublequadword` can represent the numbers from `-2^127` to `+2^127-1`.

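
To make the sign bit more concrete, the following Python sketch reinterprets the same raw bits as a two's complement number. It is only a model of what the CPU does with the bytes; `signed_range` and `as_signed` are names invented for this example.

```python
# Two's complement: the same bits, read as signed instead of unsigned.
# `signed_range` and `as_signed` are hypothetical helpers.

def signed_range(num_bytes: int) -> tuple:
    """Return (min, max) for a signed integer of the given byte length."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def as_signed(raw: int, num_bytes: int) -> int:
    """Reinterpret an unsigned bit pattern as a signed value."""
    data = raw.to_bytes(num_bytes, "little")
    return int.from_bytes(data, "little", signed=True)

print(signed_range(1))     # (-128, 127)
print(as_signed(0x7F, 1))  # 127, most significant bit is 0
print(as_signed(0x80, 1))  # -128, most significant bit is 1
print(as_signed(0xFF, 1))  # -1
```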
Strings
Strings are stored as plain sequences of bytes, which makes them a little quirky to work with. A general-purpose register holds at most a `quadword` (8 bytes), and `cmp` accepts an immediate operand of at most a `double word` (4 bytes), so longer strings have to be processed in chunks due to the bit size limitations of registers.
In order to avoid doing that most of the time, `syscalls` take references (or pointers) to the addresses that contain the `strings` instead of the string contents themselves.
So strings are a special case that's important to keep in mind. Usually, userspace libraries try to abstract away dealing with `string` lengths. A common convention in the `C ABI` world, for example, is that strings are `NUL` terminated. This means that they have a trailing `0x00` byte that marks the end of the series of bytes that contain a `string` value.
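
The terminating-byte convention can be sketched in a few lines of Python. This is only a model of the idea: `memory` stands in for a region of process memory and `c_string` is a hypothetical helper, not a real library function.

```python
# Reading a string that is terminated by a 0x00 byte.
# `c_string` and `memory` are made up for this illustration.

def c_string(memory: bytes, start: int = 0) -> bytes:
    """Return the bytes from `start` up to (not including) the next 0x00."""
    end = memory.index(0x00, start)
    return memory[start:end]

# The string ends at the first 0x00, whatever follows is ignored.
memory = b"Hello, World!\x00leftover"
print(c_string(memory))       # b'Hello, World!'
print(len(c_string(memory)))  # 13
```

Note that the length is not stored anywhere; it has to be discovered by scanning for the terminator.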
NASM Pseudo Instructions
The `NASM` language specifies so-called pseudo instructions. These instructions are not part of the `x86` (or `x86_64`) instruction set, but allow us to declare data in a much easier manner.
Declaring Initialized Data
The current pseudo instructions to declare initialized data are:

- `DB` to declare a byte (8 bit)
- `DW` to declare a word (16 bit)
- `DD` to declare a double word (32 bit)
- `DQ` to declare a quad word (64 bit)
- `DT` to declare ten bytes (80 bit)
- `DO` to declare an octoword, i.e. a double quad word (128 bit)
- `DY` and `DZ` to declare 256 bit and 512 bit values, sized for `YMM` and `ZMM` registers (See AVX512)

What kind of data you can declare with them is limited as follows:

- `DD` can declare a single-precision float
- `DQ` can declare a double-precision float
- `DT` can declare an extended-precision float
- `DT` does not accept numeric constants.
- `DDQ` does not accept float constants as operands.
- Any operand size larger than `DD` (double word) does not accept strings as operands.

However, the pseudo instructions are somewhat data type independent, which means that the same pseudo instruction emits different bytes depending on whether you declare an integer, a float, or a string with it.
Declaring Bytes and Words
As the `x86` (and therefore `x86_64`) instruction set is `little-endian`, the above pseudo instructions also take care of storing multi-byte values in little-endian byte order for us.

    db 0x12                  ; 0x12
    db 0x11,0x12,0x13        ; 0x11 0x12 0x13
    dd 0x11223344            ; 0x44 0x33 0x22 0x11 (note the endianness)
    dq 0x1122334455667788    ; 0x88 0x77 0x66 0x55 0x44 0x33 0x22 0x11 (note the endianness)

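
If you want to double-check a byte order without assembling anything, the layout can be reproduced in Python. This is a sketch that models the little-endian output, not NASM itself; `emit` and `show` are hypothetical helper names.

```python
# Little-endian layout of multi-byte values, least significant byte first.
# `emit` and `show` are made-up helpers for this sketch.

def emit(value: int, num_bytes: int) -> bytes:
    """Return `value` as `num_bytes` bytes in little-endian order."""
    return value.to_bytes(num_bytes, "little")

def show(data: bytes) -> str:
    """Format bytes the way the comments in the listing do."""
    return " ".join(f"0x{b:02x}" for b in data)

print(show(emit(0x11223344, 4)))          # 0x44 0x33 0x22 0x11
print(show(emit(0x1122334455667788, 8)))  # 0x88 0x77 ... 0x22 0x11
```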
Declaring Floating-Point Numbers
Floating-point precision is a little quirky because it depends on the byte length of the type: the more bytes are used, the more significant digits can be represented.

    dd 1.234567e20    ; single-precision float
    dq 1.234567e20    ; double-precision float
    dt 1.234567e20    ; extended-precision float

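
The effect of the byte length on precision can be observed with Python's `struct` module, which packs floats into the 4 byte and 8 byte IEEE 754 formats. This is a sketch under the assumption that those formats correspond to the single and double precision declarations above; the 80 bit extended format has no `struct` equivalent and is left out.

```python
import struct

# The same constant stored with 4 bytes and with 8 bytes.
value = 1.234567e20

single = struct.pack("<f", value)  # 4 bytes, little-endian
double = struct.pack("<d", value)  # 8 bytes, little-endian

single_back = struct.unpack("<f", single)[0]
double_back = struct.unpack("<d", double)[0]

print(len(single), len(double))  # 4 8
print(single_back == value)      # False, precision was lost
print(double_back == value)      # True
```

The 4 byte version has to round the constant, so reading it back gives a slightly different number.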
Declaring Strings
Both character constants and strings can be declared using single quotation marks around them. A string is stored byte by byte, in the order in which it is written. If you declare a `dw 'string'` that doesn't fill out all the reserved bytes, trailing `0x00` bytes are added until the data is a multiple of the declared size.

    db 'A',0x42    ; 'AB' string in ASCII
    dw 'A'         ; 0x41 0x00 (filled with trailing 0x00 byte)
    dw 'AB'        ; 0x41 0x42
    dw 'ABC'       ; 0x41 0x42 0x43 0x00 (filled with trailing 0x00 byte)

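
The padding rule from the listing can be expressed as a short Python model. `declare_string` is a hypothetical function written for this article; it only mimics the byte layout described above.

```python
# Model of string padding: fill with 0x00 up to a multiple of the unit size.
# `declare_string` is a made-up helper, `unit` is 1 for db and 2 for dw.

def declare_string(text: str, unit: int) -> bytes:
    """Return the bytes of `text`, zero-padded to a multiple of `unit`."""
    raw = text.encode("ascii")
    padding = -len(raw) % unit  # bytes missing to reach the next multiple
    return raw + b"\x00" * padding

print(declare_string("A", 2))    # b'A\x00'
print(declare_string("AB", 2))   # b'AB'
print(declare_string("ABC", 2))  # b'ABC\x00'
```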
You can read more about pseudo instructions in Chapter 3 of the NASM Documentation.
Declaring Uninitialized Data
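The current pseudo instructions to declare uninitialized data are listed below. Each of them takes the number of units to reserve as its operand.

- `RESB` to reserve bytes (8 bit)
- `RESW` to reserve words (16 bit)
- `RESD` to reserve double words (32 bit), for example for floats
- `RESQ` to reserve quad words (64 bit), for example for double-precision floats
- `REST` to reserve ten-byte units (80 bit), for example for extended-precision floats
- `RESO` to reserve octowords (128 bit)
- `RESY` and `RESZ` to reserve 256 bit and 512 bit units, sized for `YMM` and `ZMM` registers (See AVX512)

    resb 32    ; reserve 32 bytes
    resw 2     ; reserve 2 words
    resq 10    ; reserve 10 quad words (e.g. 10 double-precision floats)
    resy 2     ; reserve space for 2 YMM-sized values
    resz 4     ; reserve space for 4 ZMM-sized values
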
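
To get a feeling for how much space such reservations take up, the unit sizes can be multiplied out. The Python sketch below does that for the listing above; `UNIT_BYTES` and `reserved_bytes` are names assumed for this example.

```python
# Bytes reserved by each pseudo instruction, per unit.
# `UNIT_BYTES` and `reserved_bytes` are hypothetical names.

UNIT_BYTES = {
    "resb": 1, "resw": 2, "resd": 4, "resq": 8,
    "rest": 10, "reso": 16, "resy": 32, "resz": 64,
}

def reserved_bytes(directive: str, count: int) -> int:
    """Return how many bytes `directive count` sets aside."""
    return UNIT_BYTES[directive] * count

layout = [("resb", 32), ("resw", 2), ("resq", 10), ("resy", 2), ("resz", 4)]
for directive, count in layout:
    print(f"{directive} {count:<3} -> {reserved_bytes(directive, count)} bytes")

total = sum(reserved_bytes(d, c) for d, c in layout)
print("total:", total)  # total: 436
```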