2008/07/28

[0x00]. Notes on Assembly - Basic Terms

Coding with programmatic Lego bricks.

Assembly language is the lowest-level human-readable programming language. It consists of a set of instructions that directly manipulate the CPU and memory, taking the programmer as close to the core of the machine as it gets. Writing programs in Assembly is the computing equivalent of playing with Lego bricks, and this is probably why I like it so much.

Below I have put together a quick reference to Assembly terminology, the differences between Intel and AT&T syntax, and some good practices. I hope newcomers to Assembly may find it useful.

What is what?


Opcode - literally OPeration CODE, a numeric value (binary in its original form, but most often written in hexadecimal) that stands for a basic machine instruction, e.g. incrementing one of the registers by 1, or an AND instruction. The CPU fetches opcodes and executes them according to the instruction set architecture defined by the vendor.

Mnemonic - a human-readable name for an opcode. Mnemonics are usually 2 to 5 characters long (e.g. jz, movsb). Mnemonics, just like opcodes, are specific to the CPU.
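
To make the opcode/mnemonic relationship concrete, here is a minimal Python sketch of a lookup table for a handful of single-byte x86 opcodes as they are encoded in 32-bit mode (mnemonics in Intel syntax). The table name and the disassemble helper are made up for illustration, and the table covers only a few instructions that need no extra operand bytes; a real disassembler also has to deal with prefixes, operands and multi-byte opcodes.

```python
# A few single-byte x86 opcodes (32-bit mode) and their mnemonics.
# Illustrative subset only; the names used here are hypothetical.
OPCODE_TO_MNEMONIC = {
    0x40: "inc eax",
    0x90: "nop",
    0xC3: "ret",
    0xCC: "int3",
}

def disassemble(code):
    """Translate each byte into its mnemonic, or show it as raw data."""
    return [OPCODE_TO_MNEMONIC.get(b, "db 0x%02X" % b) for b in code]

print(disassemble(bytes([0x90, 0x40, 0xC3])))  # ['nop', 'inc eax', 'ret']
```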

Operand - if a mnemonic takes arguments, those arguments are called operands (since they are the input values to an operator). In Assembly a valid operand can be a register, a memory address, a constant or a label (which is in fact a memory address). E.g. the instruction "movl %eax,%ebx" has two operands - %eax and %ebx.

Immediate Operand - a literal (immediate) value written directly into the instruction, e.g. a numeric constant or a fixed memory address.

Register - a basic data storage space that the CPU can access extremely fast, because it lives inside the CPU itself. Registers can be 64-bit (rax), 32-bit (eax), 16-bit (ax) or 8-bit (ah) wide. A pair of 4-bit "nybbles" (byte halves) can be extracted from an 8-bit register using AND masking or SHL/SHR shifts.
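
The masking and shifting tricks are easy to try out in a higher-level language first. The following Python sketch mimics how ax, ah and al overlap the low bits of eax, and how a byte splits into two nybbles; the function names are mine and only serve the illustration.

```python
def sub_registers(eax):
    """Return the 16-bit and 8-bit views that overlap the low half of eax."""
    ax = eax & 0xFFFF        # low 16 bits of eax
    ah = (ax >> 8) & 0xFF    # high byte of ax
    al = ax & 0xFF           # low byte of ax
    return {"ax": ax, "ah": ah, "al": al}

def nybbles(byte):
    """Split an 8-bit value into its high and low 4-bit halves."""
    high = (byte >> 4) & 0x0F   # the SHR trick: shift right by 4
    low = byte & 0x0F           # the AND trick: mask with 0Fh
    return high, low

regs = sub_registers(0x12345678)
print({name: hex(value) for name, value in regs.items()})
# {'ax': '0x5678', 'ah': '0x56', 'al': '0x78'}
print(nybbles(regs["ah"]))  # (5, 6)
```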

Paragraph - a 16-byte chunk of memory. Main memory is broken down into paragraphs starting from address 0000:0000, with a paragraph boundary every 16 (10h) bytes. In the Segment:Offset addressing mode, every segment has to start on a paragraph boundary.

Segment:Offset - the memory addressing notation used in x86 real mode. The Segment is written as a hexadecimal number and always starts at a position in memory that is divisible by the paragraph size (16 bytes), so the actual memory address that the Segment points to equals the Segment value * 16. The Offset is the distance in bytes from the Segment address to the place in memory that you want to refer to, which makes the final address Segment * 16 + Offset. Please refer to the sources section at the end of this post for more details.
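
The arithmetic behind paragraphs and Segment:Offset pairs fits in a few lines. This is a small Python sketch with function names of my own choosing; it ignores address wrap-around and only shows the basic calculation, including the fact that different Segment:Offset pairs can point at the same byte.

```python
PARAGRAPH_SIZE = 16  # bytes, i.e. 10h

def linear_address(segment, offset):
    """Convert a Segment:Offset pair into a plain memory address."""
    return segment * PARAGRAPH_SIZE + offset

def is_paragraph_boundary(address):
    """A segment may only start where the address is divisible by 16."""
    return address % PARAGRAPH_SIZE == 0

# Two different Segment:Offset pairs that name the same location.
print(hex(linear_address(0x0000, 0x7C00)))  # 0x7c00
print(hex(linear_address(0x07C0, 0x0000)))  # 0x7c00
print(is_paragraph_boundary(0x7C00))        # True
```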

Stack - a LIFO (last in, first out) structure located in the upper part of main memory and managed by the CPU, mainly through the push and pop instructions.
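
As a rough model of what push and pop do, here is a Python sketch of a tiny stack living in a byte array. It assumes 4-byte values and a stack pointer that starts at the top of the memory block and moves downward, as the x86 stack does; the class and attribute names are invented for this example and there are no overflow checks.

```python
class Stack:
    """Toy model of a downward-growing stack of 32-bit values."""

    def __init__(self, size=64):
        self.memory = bytearray(size)
        self.esp = size  # stack pointer starts just past the top of memory

    def push(self, value):
        self.esp -= 4  # make room first, then store the value
        self.memory[self.esp:self.esp + 4] = value.to_bytes(4, "little")

    def pop(self):
        value = int.from_bytes(self.memory[self.esp:self.esp + 4], "little")
        self.esp += 4  # release the slot after reading it
        return value

stack = Stack()
stack.push(1)
stack.push(2)
print(stack.pop(), stack.pop())  # 2 1 (last in, first out)
```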

Stack frame - a data structure in memory (on the stack) containing information about subroutine state.

Local and global labels - in the assembly code you can insert a label at any point by typing a label name followed by a colon. When the code is assembled, labels are translated into memory addresses. Global labels start with a letter or an _ (underscore) and can be jumped to from any place in the code. Local labels start with a . (dot) character and belong to the global label that precedes them. A local label can be reached by its short name only from the code between its global label and the next one.
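
How labels turn into addresses can be sketched in Python as well. The toy resolver below is loosely modelled on NASM-style local labels and is not how any particular assembler is implemented: it assumes every instruction occupies exactly one address slot, and it stores each local label under the name of the preceding global label, which is why the same .done can appear twice without a clash. The function and the sample label names are hypothetical.

```python
def resolve_labels(lines):
    """Map every label to the address of the instruction that follows it."""
    addresses = {}
    current_global = ""
    address = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            name = line[:-1]
            if name.startswith("."):
                # Local label: qualify it with the preceding global label.
                name = current_global + name
            else:
                current_global = name
            addresses[name] = address
        else:
            address += 1  # assumption: one slot per instruction
    return addresses

source = [
    "print_loop:", "    lodsb", ".done:", "    ret",
    "read_loop:",  "    lodsb", ".done:", "    ret",
]
print(resolve_labels(source))
# {'print_loop': 0, 'print_loop.done': 1, 'read_loop': 2, 'read_loop.done': 3}
```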

Sources:
Assembly Language Step-By-Step by Jeff Duntemann
http://en.wikipedia.org/wiki/Operand
http://en.wikipedia.org/wiki/64-bit
http://en.wikipedia.org/wiki/Opcode
Explanation of the Segment:Offset notation by Daniel B. Sedory
http://en.wikipedia.org/wiki/Call_stack
