JWASM

JWASM Macro Assembler is an x86 assembler that targets 16, 32 and 64 bit platforms. JWASM is designed as a MASM-compatible assembler using the historical Intel notation and is available under the Sybase Open Watcom Public License. It produces binaries for the DOS, Windows, Linux, OS/2 and FreeBSD operating systems. JWASM is an almost complete rewrite of the earlier Watcom assembler WASM. JWasm is written in portable C and has been successfully tested with the Open Watcom development environment, the Microsoft Visual Studio family of development tools, the GNU (GCC) compiler and others. It is currently being upgraded by Japheth.
History
JWASM is an upgrade of the earlier Open Watcom assembler WASM. It has been extensively rewritten to modernise it, extend its capacity and add additional platform support. Among its design targets is a very high level of MASM compatibility. Its initial release is dated 05/20/2008 as v1.7. The current version as of 1/19/2010 is JWasm v2.02 and it is now 64 bit capable. It is continually updated to support the latest operating systems. Its capacity to build 32 bit binaries is effectively release standard: the assembler is compatible with the 32 bit version of MASM to the extent that it successfully builds most 32 bit MASM code.
Copyright String
JWasm v2.02, Jan 19 2010, Masm-compatible assembler.
Portions Copyright (c) 1992-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.
Usage
JWASM conforms to the historical Intel x86 assembly notation commonly associated with the Microsoft Macro Assembler and uses the standard Microsoft documentation, and later material, as its technical reference.
Abbreviated Notation
The historical Intel notation is a fully specified format which occurs in the following form.
mov eax, DWORD PTR
Over time the parsers in assemblers have improved to the stage where if the assembler can recognise the size of the data then the SIZE specifier can be omitted.
mov eax,
This allows for clearer code that is easier to read. There are some contexts, however, where the source operand is a memory operand and the assembler cannot determine its data size from the other operand. In these contexts the historical data SIZE specifiers must be used.
movzx eax, ; this generates an error as the data SIZE cannot be determined by the assembler
movzx eax, BYTE PTR ; zero extend a BYTE into the 32 bit EAX register
OFFSET Operator
Characteristic of the historical Intel notation is the distinction between fixed and transient addressing using the OFFSET operator. Data written in either the initialised or uninitialised data sections has a known ADDRESS at assembly time, as do code labels, and all of these are referenced by the OFFSET operator. Transient addressing is performed with the normal Intel mnemonics for reading the stack within a procedure.
For a corresponding data entry in the initialised data section,
textitem db "This is a text item",0
You address this data entry in the following manner.
mov eax, OFFSET textitem
Transient Stack Addressing
Operating systems provide the area of memory referred to as the stack, and under x86 hardware the stack is the main method of transferring arguments to procedures. Arguments are normally placed on the stack by the PUSH mnemonic in the following form. This example assumes the STDCALL calling convention and 32 bit data sizes.
push arg3
push arg2
push arg1
call FunctionName
The CALL mnemonic pushes the return address onto the stack then branches to the address of the named procedure. If the procedure has a stack frame where the stack pointer register ESP is stored in the base pointer register EBP, the first argument for the procedure occurs at address [ebp+8]. While this form of mnemonic notation can be written by experienced assembler programmers, the assembler provides a naming method to remove an unnecessary level of abstraction from writing code of this type.
The programmer can use the name of the argument in place of the direct [EBP+displacement] notation to make the code more readable with no loss of performance. When the programmer needs to use the ADDRESS of a transient stack variable (normally referred to as a LOCAL variable) there are a number of methods. In a prototyped function call the ADDR operator can be used to obtain the address of a LOCAL variable. Alternatively the direct Intel mnemonic LEA can be used to load the effective address of the variable into a register.
lea eax, named_local_variable
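As a minimal sketch of the naming method described above, the following procedure uses argument and LOCAL names where the bare notation would use [EBP+displacement]. The procedure name AddTwo and the names first, second and total are hypothetical and chosen only for illustration. The displacements in the comments assume the default 32 bit STDCALL stack frame.

```asm
.386
.model flat, stdcall
option casemap:none

.code

; Hypothetical procedure: adds its two DWORD arguments and returns the sum in EAX.
AddTwo PROC first:DWORD, second:DWORD
    LOCAL total:DWORD          ; transient stack variable, normally at [ebp-4]

    mov eax, first             ; equivalent to: mov eax, [ebp+8]
    add eax, second            ; equivalent to: add eax, [ebp+12]
    mov total, eax             ; equivalent to: mov [ebp-4], eax
    lea eax, total             ; EAX = address of the LOCAL, not its contents
    mov eax, [eax]             ; read the LOCAL back through that address
    ret                        ; the assembler emits the frame epilogue and removes the arguments
AddTwo ENDP

end
```

The named and bare forms assemble to the same instructions, so the choice between them is one of readability only.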
Square Brackets
JWASM supports the historical Intel technique of using named variables to represent both fixed and transient addresses. Square brackets are used around the complex addressing Intel notation to denote that the contents are a memory operand. Programmers coming from a different background where square brackets are used as general ADDRESS operators have at times had problems with this notation difference, but the historical Intel notation as it is implemented in JWASM tolerates the use of square brackets around named variables by simply ignoring them. Intel and compatible x86 processors do not have mnemonics to produce an extra level of indirection implied by the ambiguous usage of square brackets.
There is some flexibility in how square brackets can be used in historical Intel notation compatible assemblers.
mov eax, [ecx+edx]
mov eax, [ecx][edx]
Both notations are correct here and in the second example the extra pair of square brackets function as an ADDITION operator.
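The effect of square brackets around a named variable, as opposed to a register, can be sketched as follows. The variable name counter is hypothetical and exists only for this illustration.

```asm
.386
.model flat, stdcall
option casemap:none

.data
counter DWORD 5                ; hypothetical variable used only for this sketch

.code
start:
    mov eax, counter           ; loads the contents of counter
    mov ecx, [counter]         ; same instruction: brackets around a name are ignored
    mov edx, OFFSET counter    ; loads the address of counter
    mov eax, [edx]             ; brackets around a register do dereference it
    ret
end start
```

The first two instructions load the same value, while the last pair shows the explicit two step form needed when the address itself is held in a register.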
Limited Type Checking
JWASM supports a pseudo high level notation for creating procedures that perform argument size and count checking. It is part of a system using the PROC, ENDP, PROTO and INVOKE operators. The PROTO operator is used to define a function prototype that has a matching PROC that is terminated with the ENDP operator. The prototyped procedure can then be called with the INVOKE operator, which is protected by the limited size and argument count checking. There is additional notation at a more advanced level for turning off the automatically generated stack frame for the procedure, for cases where stack overhead in the procedure call may have an effect with very small procedures. JWASM code can also be written completely free of the pseudo high level notation using only bare Intel mnemonics.
Using an example prototype from the 32 bit Windows API function set,
SendMessage PROTO STDCALL :DWORD,:DWORD,:DWORD,:DWORD
SendMessage equ <SendMessageA>
The code to call this function using the INVOKE notation is as follows.
invoke SendMessage,hWin,WM_COMMAND,wParam,lParam
Which is translated exactly to,
push lParam
push wParam
push WM_COMMAND
push hWin
call SendMessage
The advantage of the INVOKE method is that it tests the size of the data types and the argument count and generates an assembly time error if the arguments do not match the prototype.
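A self-contained sketch of the PROTO, PROC and INVOKE system is shown here without any dependency on the Windows API. The procedure Scale and its argument names are hypothetical. The commented-out line illustrates the kind of call that the argument count check is expected to reject.

```asm
.386
.model flat, stdcall
option casemap:none

; Hypothetical procedure, prototyped so that INVOKE can check calls to it.
Scale PROTO :DWORD, :DWORD

.code

Scale PROC value:DWORD, factor:DWORD
    mov eax, value
    imul eax, factor           ; EAX = value * factor
    ret
Scale ENDP

start:
    invoke Scale, 10, 4        ; two DWORD arguments match the prototype
    ; invoke Scale, 10         ; too few arguments: rejected at assembly time
    ret
end start
```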
Pseudo High Level Emulation
JWASM conforms to the historical MASM notation in terms of emulating high level control and loop structures.
It supports the .IF block structure,
.if
-
.elseif
-
.else
-
.endif
It also supports the .WHILE loop structure,
.while eax > 0
sub eax, 1
.endw
And the .REPEAT loop structure.
.repeat
sub eax, 1
.until eax < 1
The high level emulation also supports C runtime comparison operators that work according to the same rules as Intel mnemonic comparisons. For the .IF block notation the distinction between SIGNED and UNSIGNED data is handled with a minor data type notation variation, where the storage size DWORD, which is by default UNSIGNED, can also be specified as SDWORD for SIGNED comparison. This data type distinction is only appropriate for the pseudo high level notation as it is unused at the mnemonic level of code, where the distinction is determined by the range of conditional evaluation techniques available in the Intel mnemonics.
The combined pseudo high level emulation allows JWASM to more easily interface with current operating systems that use a C style application programming interface. Generally the pseudo high level interface is used for non-speed critical code where clarity and readability are the most important factors. Speed critical code is usually written directly in mnemonics.
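The DWORD and SDWORD distinction can be sketched with two variables that hold the same bit pattern. The variable names uval and sval are hypothetical. The comments describe the comparison the assembler is expected to generate for each data type.

```asm
.386
.model flat, stdcall
option casemap:none

.data
uval DWORD  0FFFFFFFFh         ; compared as the unsigned value 4294967295
sval SDWORD -1                 ; same bit pattern, compared as the signed value -1

.code
start:
    xor eax, eax
    .if uval > 1               ; unsigned comparison: condition is true
        inc eax
    .endif
    .if sval > 1               ; signed comparison: condition is false
        inc eax
    .endif
    ret                        ; only the first block has incremented EAX
end start
```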
Pre-processor
The pre-processor in JWASM emulates the capacity in the Microsoft Macro Assembler and for most practical purposes it is near enough to identical. It is an old design dating back to about 1990, when Microsoft introduced the MASM 6.00 series of assemblers, and it is known to experienced users as quirky and complicated to use for advanced macro designs. Notwithstanding its archaic format it is a reasonably powerful pre-processor with loop techniques, conditional testing, text manipulation commands and the normal text substitution methods associated with arguments passed to the pre-processor.
At its simplest a macro in JWASM is constructed as follows.
ItemName MACRO argument1, argument2, argument3:VARARG
mov argument1, argument2
mov argument3, argument1
ENDM
This macro is called as follows,
ItemName eax, ecx, edx
It is expanded by the pre-processor to,
mov eax, ecx
mov edx, eax
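As a sketch of the loop techniques mentioned above, a macro can iterate over a VARARG argument list with the FOR directive. The macro name PushAll is hypothetical and is not part of JWASM or MASM.

```asm
.386
.model flat, stdcall
option casemap:none

; Hypothetical macro: emits one PUSH per argument, in the order given.
PushAll MACRO regs:VARARG
    FOR reg, <regs>
        push reg
    ENDM
ENDM

.code
start:
    PushAll eax, ecx, edx      ; expands to: push eax / push ecx / push edx
    add esp, 12                ; discard the three DWORDs again
    ret
end start
```

The angle brackets around regs pass the whole argument list to FOR as a single text item, which FOR then splits at the commas.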
Support
JWasm Home
JWasm project page on SourceForge
The MASM Forum
Licence
JWASM is licensed under the Sybase Open Watcom Public License and is available for use in environments and projects that are excluded by the Microsoft EULA for MASM. JWASM places no restrictions on writing Open Source software or writing software for non-Microsoft operating systems.
 