System Programming: Assembler

Introduction to Assemblers and Assembly Language

Encoding instructions as binary numbers is natural and efficient for computers. Humans, however, have a great deal of difficulty understanding and manipulating these numbers. People read and write symbols (words) much better than long sequences of digits.

This topic describes the process by which a human-readable program is translated into a form that a computer can execute, provides a few hints about writing assembly programs, and explains how to run them.

What is an assembler?

A tool called an assembler translates assembly language into binary instructions.

Assemblers provide a friendlier representation than a computer’s 0s and 1s, one that simplifies writing and reading programs.

Symbolic names for operations and locations are one facet of this representation. Another facet is programming facilities that increase a program’s clarity.

An assembler reads a single assembly language source file and produces an object file containing machine instructions and bookkeeping information that helps combine several object files into a program.

Figure illustrates how a program is built.

Most programs consist of several files, also called modules, that are written, compiled, and assembled independently. A program may also use prewritten routines supplied in a program library. A module typically contains references to subroutines and data defined in other modules and in libraries. The code in a module cannot be executed when it contains unresolved references to labels in other object files or libraries.

Another tool, called a linker, combines a collection of object and library files into an executable file, which a computer can run.

1) Assembler:

a program to handle all the tedious mechanical translations

2) Allows you to use:

• symbolic opcodes

• symbolic operand values

• symbolic addresses

3) The Assembler

• keeps track of the numerical values of all symbols

• translates symbolic values into numerical values

4) Time Periods of the Various Processes in Program Development

5) The Assembler Provides:

a. Access to all the machine’s resources by the assembled program, including the entire instruction set of the machine.

b. A means for specifying run-time locations of program and data in memory.

c. Symbolic labels for the representation of constants and addresses.

d. Assemble-time arithmetic.

e. Support for any synthetic instructions.

f. Machine code emitted in a form that can be loaded and executed.

g. Reports of syntax errors, and program listings.

h. An interface to the module linkers and program loader.

i. Expansion of programmer-defined macro routines.
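Item i can be made concrete with a small sketch. This is a minimal macro expander, assuming a hypothetical syntax in which a macro is called as `NAME arg1, arg2` on its own unlabeled line and its body refers to parameters as `&PARAM`. The function `expand_macros` and the `SWAP` macro are illustrative, not taken from any particular assembler.

```python
import re


def expand_macros(lines, macros):
    """Replace each macro call with its body, substituting the arguments.

    macros maps a macro name to (parameter_names, body_lines).
    Assumption (hypothetical syntax): a call is 'NAME arg1, arg2' on its own
    unlabeled line, and the body writes each parameter as &PARAM.
    """
    out = []
    for line in lines:
        fields = line.split(None, 1)
        name = fields[0].upper() if fields else ""
        if name not in macros:
            out.append(line)  # ordinary statement: copy it through unchanged
            continue
        params, body = macros[name]
        args = [a.strip() for a in fields[1].split(",")] if len(fields) > 1 else []
        if len(args) != len(params):
            raise ValueError(f"{name}: expected {len(params)} arguments, got {len(args)}")
        binding = dict(zip(params, args))
        for body_line in body:
            # Replace every &PARAM with its argument; leave unknown names alone.
            out.append(re.sub(r"&(\w+)", lambda m: binding.get(m.group(1), m.group(0)), body_line))
    return out


SWAP = (["A", "B"], ["LOAD R1, &A", "LOAD R2, &B", "STORE R1, &B", "STORE R2, &A"])
print(expand_macros(["SWAP X, Y", "HALT"], {"SWAP": SWAP}))
# -> ['LOAD R1, X', 'LOAD R2, Y', 'STORE R1, Y', 'STORE R2, X', 'HALT']
```

A full macro processor would also have to deal with labels on call lines and with labels generated inside macro bodies; this sketch ignores both.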

Syntax:

Label OPCODE Op1, Op2, ... ; Comment field
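To make the field layout concrete, here is a minimal sketch of splitting one statement into its four fields. It assumes conventions that many, but not all, assemblers follow: a label starts in column 1, an unlabeled statement starts with whitespace, operands are separated by commas, and `;` begins the comment. The name `parse_line` is hypothetical.

```python
def parse_line(line):
    """Split 'Label OPCODE Op1, Op2, ... ; comment' into its four fields.

    Assumption: a label, if present, starts in column 1; an unlabeled
    statement starts with whitespace. ';' begins the comment field.
    """
    code, sep, comment = line.partition(";")
    comment = comment.strip() if sep else None
    label = None
    if code and not code[0].isspace():
        # Column-1 text is the label; split it off from the rest.
        parts = code.split(None, 1)
        label = parts[0]
        code = parts[1] if len(parts) > 1 else ""
    fields = code.split(None, 1)
    opcode = fields[0].upper() if fields else None
    operands = []
    if len(fields) > 1:
        operands = [op.strip() for op in fields[1].split(",") if op.strip()]
    return label, opcode, operands, comment


print(parse_line("LOOP  ADD R1, R2, R3 ; accumulate"))
# -> ('LOOP', 'ADD', ['R1', 'R2', 'R3'], 'accumulate')
print(parse_line("      BR  LOOP"))
# -> (None, 'BR', ['LOOP'], None)
```

Commas or semicolons inside quoted string operands would need a real tokenizer; the sketch does not handle them.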

Pseudo-operations (sometimes called “pseudos,” or directives) are “opcodes” that are actually instructions to the assembler and that do not result in code being generated.

The assembler maintains several data structures:

• A table that maps the text of opcodes to opcode numbers and instruction format(s)

• A “symbol table” that maps defined symbols to their values

Disadvantages of Assembly

• programmer must manage movement of data items between memory locations and the ALU.

• programmer must take a “microscopic” view of a task, breaking it down to manipulate individual memory locations.

• assembly language is machine-specific.

• statements are not English-like (unlike pseudo-code).

The 2-Pass Assembly Process

• Pass 1:
1. Initialize location counter (assemble-time “PC”) to 0
2. Pass over program text: enter all symbols into symbol table
a. May not be able to map all symbols on first pass
b. Use before definition (a forward reference) is usually allowed
3. Determine size of each instruction, map to a location
a. Uses pattern matching to relate opcode to pattern
b. Increment location counter by size
c. Change location counter in response to ORG pseudos

• Pass 2:
1. Insert binary code for each opcode and value
2. “Fix up” forward references and variable-sized instructions

Examples include variable-sized branch offsets and constant fields.
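The two passes can be sketched in Python for a hypothetical toy machine. Everything about the machine is an assumption made for illustration: each opcode has a fixed size (so the variable-sized fix-ups mentioned above are not modeled), operands are one byte, and the pseudo-ops are `ORG` and `BYTE`. Pass 1 only assigns addresses; pass 2 can then resolve the forward reference to `X` because the symbol table is already complete.

```python
# Hypothetical toy machine: mnemonic -> (opcode byte, instruction size in bytes).
OPTAB = {"LOAD": (0x01, 2), "ADD": (0x02, 2), "JMP": (0x03, 2), "HALT": (0xFF, 1)}


def size_of(op):
    """Size in bytes of one statement; BYTE reserves a single data byte."""
    return 1 if op == "BYTE" else OPTAB[op][1]


def pass1(program):
    """Pass 1: walk the source, assigning an address to every label."""
    symtab, lc = {}, 0  # location counter (assemble-time "PC") starts at 0
    for label, op, operand in program:
        if op == "ORG":  # pseudo-op: reposition the location counter
            lc = int(operand)
            continue
        if label is not None:
            if label in symtab:
                raise ValueError(f"duplicate label {label}")
            symtab[label] = lc
        lc += size_of(op)
    return symtab


def pass2(program, symtab):
    """Pass 2: emit (address, bytes) pairs, resolving symbols via symtab."""
    out, lc = [], 0
    for _label, op, operand in program:
        if op == "ORG":
            lc = int(operand)
            continue
        if op == "BYTE":
            code = [int(operand)]
        else:
            opcode, size = OPTAB[op]
            code = [opcode]
            if size == 2:  # operand is a symbol or a numeric literal
                code.append(symtab[operand] if operand in symtab else int(operand))
        out.append((lc, code))
        lc += len(code)
    return out


program = [
    (None, "ORG", "16"),
    ("START", "LOAD", "X"),  # forward reference: X is defined later
    ("LOOP", "ADD", "ONE"),
    (None, "JMP", "LOOP"),
    (None, "HALT", None),
    ("X", "BYTE", "0"),
    ("ONE", "BYTE", "1"),
]
symtab = pass1(program)
print(symtab)  # -> {'START': 16, 'LOOP': 18, 'X': 23, 'ONE': 24}
print(pass2(program, symtab))
# -> [(16, [1, 23]), (18, [2, 24]), (20, [3, 18]), (22, [255]), (23, [0]), (24, [1])]
```

Keeping the passes separate means pass 2 never meets an undefined label unless the label is genuinely missing from the source.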

Pass 1 data structures (illustrated by diagram):

1. The input source program.
2. A Location Counter (LC), used to keep track of each instruction's location.
3. A table, the Machine Operation Table (MOT), that indicates the symbolic mnemonic for each instruction and its length (two, four, or six bytes).
4. A table, the Pseudo-Operation Table (POT), that indicates the symbolic mnemonic and the action to be taken for each pseudo-op in pass 1.
5. A table, the Symbol Table (ST), used to store each label and its corresponding value.
6. A table, the Literal Table (LT), used to store each literal encountered and its corresponding assigned location.
7. A copy of the input to be used by pass 2. This may be stored on a secondary storage device, such as magnetic tape, disk, or drum, or the original source deck may be read by the assembler a second time for pass 2.
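The interplay of LC, ST, and LT in pass 1 can be sketched as follows. It is a simplification under stated assumptions: operands are reduced to the literal or symbol alone (no register field), the `lengths` dictionary stands in for the length column of the MOT, every literal (written with a leading `=`) occupies 4 bytes, and the literal pool is placed after the last statement on a 4-byte boundary rather than at an `LTORG`. The function name `pass1_tables` is hypothetical.

```python
def pass1_tables(program, lengths, start=0):
    """Build the Symbol Table (ST) and Literal Table (LT) in one pass.

    program: list of (label, mnemonic, operand) tuples.
    lengths: stand-in for the MOT length column (mnemonic -> bytes).
    Assumptions: each literal takes 4 bytes; the literal pool follows the
    last statement, aligned to a 4-byte boundary.
    """
    st, lt, lc = {}, {}, start
    for label, mnemonic, operand in program:
        if label:
            st[label] = lc                      # label's value = current LC
        if operand and operand.startswith("=") and operand not in lt:
            lt[operand] = None                  # address assigned at the end
        lc += lengths[mnemonic]
    lc = (lc + 3) // 4 * 4                      # align the literal pool
    for literal in lt:                          # dicts keep insertion order
        lt[literal] = lc
        lc += 4
    return st, lt


lengths = {"L": 4, "A": 4, "BR": 2}             # RX = 4 bytes, RR = 2 bytes
program = [
    ("PROG", "L", "=F'5'"),
    (None, "A", "=F'10'"),
    (None, "A", "=F'5'"),                       # repeated literal: one LT entry
    ("DONE", "BR", "14"),
]
print(pass1_tables(program, lengths))
# -> ({'PROG': 0, 'DONE': 12}, {"=F'5'": 16, "=F'10'": 20})
```

Because literal addresses are only known once the code before the pool has been sized, LT entries start out empty and are filled in at the end of the pass.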

Pass 2 data structures (illustrated by diagram):

1. A copy of the source program input to pass 1.
2. A Location Counter (LC).
3. A table, the Machine Operation Table (MOT), that indicates for each instruction (a) the symbolic mnemonic, (b) its length, (c) its binary machine op-code, and (d) its format.
4. A table, the Pseudo-Operation Table (POT), that indicates for each pseudo-op the symbolic mnemonic and the action to be taken in pass 2.
5. The Symbol Table (ST), prepared by pass 1, containing each label and its corresponding value.
6. A table, the Base Table (BT), that indicates which registers are currently specified as base registers by USING pseudo-ops and what the specified contents of these registers are.
7. A workspace, INST, used to hold each instruction as its various parts (e.g., binary op-code, register field, length field, displacement field) are being assembled together.
8. A workspace, PRINT LINE, used to produce a printed listing.
9. A workspace, PUNCH CARD, used prior to actual output for converting the assembled instructions into the format needed by the loader.
10. An output deck of assembled instructions in the format needed by the loader.
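One pass 2 job that depends on the BT is turning an address into base-register-plus-displacement form. The sketch below assumes System/360-style RX instructions (8-bit opcode, 4-bit R1, X2, and B2 fields, 12-bit displacement); X'58' is the opcode of the 360 `L` (load) instruction. Choosing the register that gives the smallest displacement is the policy assumed here, and the function names are hypothetical.

```python
def base_displacement(address, base_table):
    """Express an address as (base register, 12-bit displacement).

    base_table: stand-in for the BT, mapping register number -> the contents
    promised by a USING pseudo-op. Assumed policy: among registers that can
    reach the address, use the one giving the smallest displacement.
    """
    candidates = [
        (address - contents, reg)
        for reg, contents in base_table.items()
        if 0 <= address - contents <= 0xFFF  # displacement field is 12 bits
    ]
    if not candidates:
        raise ValueError(f"address {address:#x} is not addressable; missing USING?")
    disp, reg = min(candidates)
    return reg, disp


def assemble_rx(opcode, r1, address, base_table, x2=0):
    """Pack an RX instruction: opcode(8) R1(4) X2(4) B2(4) D2(12) = 4 bytes."""
    b2, d2 = base_displacement(address, base_table)
    word = (opcode << 24) | (r1 << 20) | (x2 << 16) | (b2 << 12) | d2
    return word.to_bytes(4, "big")


bt = {12: 0x1000, 11: 0x2000}                 # e.g. after USINGs for R12 and R11
print(base_displacement(0x2010, bt))           # -> (11, 16)
print(assemble_rx(0x58, 1, 0x1234, bt).hex())  # -> 5810c234
```

The INST workspace in the list above plays the role of the `word` variable here: each field is placed into position before the finished instruction is written out.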

Assembler Directives

1. Directives are commands to the assembler.

2. They tell the assembler what you want it to do, for example:

a. Where in memory to store the code

b. Where in memory to store data

c. Where to store a constant and what its value is

d. The values of user-defined symbols
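Items a, b, and d can be sketched with three directives, using hypothetical mnemonics (`ORG`, `EQU`, `RESB`; names and exact behavior vary between assemblers). Here `ORG` says where in memory to place what follows, `EQU` gives a user-defined symbol a value without reserving storage, and `RESB` reserves data bytes at the current location. Operands may use simple assemble-time arithmetic, assumed to be terms joined by `+` or `-`.

```python
import re

TERM = re.compile(r"([+-]?)\s*([A-Za-z_]\w*|\d+)")


def evaluate(expr, symtab):
    """Assemble-time arithmetic: sum of numbers and symbols joined by + or -."""
    total = 0
    for sign, term in TERM.findall(expr):
        value = int(term) if term.isdigit() else symtab[term]  # KeyError: undefined
        total += -value if sign == "-" else value
    return total


def run_directives(program):
    """Process ORG / EQU / RESB statements; return the symbol table and final LC."""
    symtab, lc = {}, 0
    for label, op, operand in program:
        if op == "ORG":     # where in memory the following items go
            lc = evaluate(operand, symtab)
        elif op == "EQU":   # give a symbol a value; no storage is reserved
            symtab[label] = evaluate(operand, symtab)
        elif op == "RESB":  # reserve bytes for data at the current location
            if label:
                symtab[label] = lc
            lc += evaluate(operand, symtab)
        else:
            raise ValueError(f"not a directive in this sketch: {op}")
    return symtab, lc


program = [
    (None, "ORG", "256"),
    ("SIZE", "EQU", "64"),
    ("BUF", "RESB", "SIZE"),
    ("BUFEND", "EQU", "BUF+SIZE"),
    ("COUNT", "RESB", "1"),
]
print(run_directives(program))
# -> ({'SIZE': 64, 'BUF': 256, 'BUFEND': 320, 'COUNT': 320}, 321)
```

None of these statements produces machine code; they only change the location counter or the symbol table.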

Object File Format

Assemblers produce object files. An object file on Unix contains six distinct sections (see Figure):

• The object file header describes the size and position of the other pieces of the file.

• The text segment contains the machine language code for routines in the source file. These routines may not be executable because of unresolved references.

• The data segment contains a binary representation of the data in the source file. The data also may be incomplete because of unresolved references to labels in other files.

• The relocation information identifies instructions and data words that depend on absolute addresses. These references must change if portions of the program are moved in memory.

• The symbol table associates addresses with external labels in the source file and lists unresolved references.

• The debugging information contains a concise description of the way in which the program was compiled, so a debugger can find which instruction addresses correspond to lines in a source file and print the data structures in readable form.

The assembler produces an object file that contains a binary representation of the program and data and additional information to help link pieces of a program. This relocation information is necessary because the assembler does not know which memory locations a procedure or piece of data will occupy after it is linked with the rest of the program. Procedures and data from a file are stored in a contiguous piece of memory, but the assembler does not know where this memory will be located. The assembler also passes some symbol table entries to the linker. In particular, the assembler must record which external symbols are defined in a file and what unresolved references occur in a file.
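What the linker does with this information can be shown with a toy object format. The `ObjectFile` class below is hypothetical and far simpler than a real format such as ELF: addresses are word indices, every module is assembled as if it starts at address 0, `relocs` lists the words that hold absolute addresses, and `unresolved` lists words that must receive the address of a label defined in another file. The hypothetical `link` function places modules back to back, adds each module's load address to its relocatable words, and fills in external references.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    """Toy object file: text plus relocation and symbol information."""
    text: list                                      # words, assembled as if loaded at 0
    relocs: list = field(default_factory=list)      # indices of words holding addresses
    defined: dict = field(default_factory=dict)     # exported label -> offset in text
    unresolved: dict = field(default_factory=dict)  # word index -> external label


def link(objects):
    """Concatenate modules, relocate absolute addresses, resolve externals."""
    bases, globals_, next_base = [], {}, 0
    for obj in objects:  # first, choose a load address for every module
        bases.append(next_base)
        for name, offset in obj.defined.items():
            globals_[name] = next_base + offset
        next_base += len(obj.text)
    image = []
    for obj, base in zip(objects, bases):  # then patch each module's words
        words = list(obj.text)
        for i in obj.relocs:
            words[i] += base  # address computed for origin 0 now starts at base
        for i, name in obj.unresolved.items():
            words[i] = globals_[name]  # KeyError means an undefined external
        image.extend(words)
    return image


sub = ObjectFile(text=[20, 21], defined={"sub": 0})
main = ObjectFile(text=[10, 2, 0], relocs=[1], defined={"main": 0}, unresolved={2: "sub"})
print(link([sub, main]))  # -> [20, 21, 10, 4, 0]
print(link([main, sub]))  # -> [10, 2, 3, 20, 21]
```

Swapping the module order changes both the relocated word and the resolved address, which is exactly why the assembler cannot fill these words in by itself.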

Assembler Data Structures and Variable

• Two major data structures:
a. Operation Code Table (OPTAB): used to look up mnemonic operation codes and translate them to their machine-language equivalents.
b. Symbol Table (SYMTAB): used to store the values (addresses) assigned to labels.

• Variable:
a. The Location Counter (LOCCTR) is used to help assign addresses.
b. LOCCTR is initialized to the beginning address specified in the START statement.
c. The length of each assembled instruction or data area to be generated is added to LOCCTR.
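The LOCCTR rules can be sketched for a SIC-style machine, assuming the conventions of the SIC teaching machine as usually described in textbooks: every instruction is 3 bytes, `WORD` is one 3-byte word, `RESW n` reserves n words, `RESB n` reserves n bytes, `BYTE C'...'` takes one byte per character and `BYTE X'...'` one byte per two hex digits, and the START operand is a hexadecimal address. The function names are hypothetical.

```python
def item_length(opcode, operand, optab):
    """Bytes to add to LOCCTR for one statement (SIC-style conventions)."""
    if opcode in optab:
        return 3                         # every SIC instruction is 3 bytes
    if opcode == "WORD":
        return 3                         # one 24-bit word
    if opcode == "RESW":
        return 3 * int(operand)
    if opcode == "RESB":
        return int(operand)
    if opcode == "BYTE":                 # C'EOF' or X'F1'
        kind, value = operand[0], operand[2:-1]
        return len(value) if kind == "C" else len(value) // 2
    raise ValueError(f"invalid operation code: {opcode}")


def assign_addresses(program, optab):
    """Pass-1 address assignment; returns SYMTAB and the program length."""
    body, locctr = program, 0
    if program and program[0][1] == "START":
        locctr = int(program[0][2], 16)  # START operand is a hex address
        body = program[1:]
    start, symtab = locctr, {}
    for label, opcode, operand in body:
        if opcode == "END":
            break
        if label:
            symtab[label] = locctr       # label gets the current LOCCTR value
        locctr += item_length(opcode, operand, optab)
    return symtab, locctr - start


optab = {"LDA": 0x00, "STA": 0x0C, "RSUB": 0x4C}  # only the keys are used here
program = [
    ("COPY", "START", "1000"),
    ("FIRST", "LDA", "FIVE"),
    (None, "STA", "ALPHA"),
    (None, "RSUB", None),
    ("FIVE", "WORD", "5"),
    ("ALPHA", "RESW", "1"),
    ("EOF", "BYTE", "C'EOF'"),
    ("BUF", "RESB", "4096"),
    (None, "END", "FIRST"),
]
symtab, length = assign_addresses(program, optab)
print({k: hex(v) for k, v in symtab.items()})
# -> {'FIRST': '0x1000', 'FIVE': '0x1009', 'ALPHA': '0x100c', 'EOF': '0x100f', 'BUF': '0x1012'}
print(hex(length))  # -> 0x1012
```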

OPTAB and SYMTAB

• OPTAB must contain the mnemonic operation code and its machine-language equivalent.

• In a more complex assembler, it also contains information about instruction format and length.

• For a machine that has instructions of different lengths, we must search OPTAB in the first pass to find the instruction length for incrementing LOCCTR.

• SYMTAB includes the name and value (address) of each label, together with flags to indicate error conditions.

• OPTAB and SYMTAB are usually organized as hash tables, with the mnemonic operation code or label name as the key, for efficient retrieval.
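Python's built-in `dict` is a hash table, so it can stand in for both tables. This sketch (hypothetical names; the `duplicate` flag and the error message text are illustrative) shows pass 1 looking up each mnemonic in OPTAB to get the instruction length, recording labels in SYMTAB, and flagging a duplicate definition instead of silently overwriting the first one.

```python
# OPTAB: mnemonic -> (machine opcode, length in bytes); dicts are hash tables.
OPTAB = {"LDA": (0x00, 3), "STA": (0x0C, 3), "RSUB": (0x4C, 3)}


def pass1_symtab(program, optab, start=0):
    """Fill SYMTAB from (label, mnemonic) pairs, flagging error conditions."""
    symtab, errors, locctr = {}, [], start
    for lineno, (label, mnemonic) in enumerate(program, start=1):
        if label:
            if label in symtab:
                symtab[label]["flags"].add("duplicate")  # keep the first value
            else:
                symtab[label] = {"value": locctr, "flags": set()}
        entry = optab.get(mnemonic)                      # average O(1) lookup
        if entry is None:
            errors.append((lineno, f"invalid operation code {mnemonic}"))
            continue
        locctr += entry[1]                               # length from OPTAB
    return symtab, errors


program = [("FIRST", "LDA"), ("LOOP", "STA"), ("LOOP", "LDA"), (None, "JUMPX"), (None, "RSUB")]
symtab, errors = pass1_symtab(program, OPTAB)
print(symtab["LOOP"])  # -> {'value': 3, 'flags': {'duplicate'}}
print(errors)          # -> [(4, 'invalid operation code JUMPX')]
```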


Monday, 7 April 2014
