-----------------------------------------------------------------
HUGO v2.5 TECHNICAL SYSTEM SPECIFICATION

Under the Hood of Hugo and the .HEX File Format

(Revision 0.20)

Copyright (c) 1995-1999 by Kent Tessman
-----------------------------------------------------------------

This is a draft document and as such is subject to revision.
Comments are welcome.

Current revision:  02/01/99

-----------------------------------------------------------------
TABLE OF CONTENTS
-----------------------------------------------------------------

I.     INTRODUCTION
       I.a.  How Hugo Works

II.    ORGANIZATION OF THE .HEX FILE
       II.a.  Memory Map
       II.b.  The Header

III.   TOKENS AND DATA TYPES
       III.a.  Tokens
       III.b.  Data Types

IV.    ENGINE PARSING

V.     GRAMMAR

VI.    EXECUTABLE CODE
       VI.a.  A Simple Program
       VI.b.  Expressions

VII.   ENCODING TEXT

VIII.  THE OBJECT TABLE
       VIII.a.  Objects
       VIII.b.  Attributes

IX.    THE PROPERTY TABLE
       IX.a.  Before, After, and Other Complex Properties

X.     THE EVENT TABLE

XI.    THE DICTIONARY AND SPECIAL WORDS
       XI.a.  Dictionary
       XI.b.  Special Words

XII.   RESOURCEFILES

XIII.  THE HUGO COMPILER AND HOW IT WORKS
       XIII.a.  Compile-Time Symbol Data
       XIII.b.  The Linker

XIV.   THE HUGO ENGINE AND HOW IT WORKS
       XIV.a.  Runtime Symbol Data
       XIV.b.  Non-Portable Functionality
       XIV.c.  Savefile Format

XV.    DARK SECRETS OF THE HUGO DEBUGGER
       XV.a.  Debugger Expression Evaluation
       XV.b.  The .HDX File Format

APPENDIX A:  CODE PATTERNS

-----------------------------------------------------------------
I.  INTRODUCTION
-----------------------------------------------------------------

Most Hugo programmers will likely never need to bother with the
detailed information in this manual, but anyone porting Hugo to a
new platform, writing an interface or tool for the language, or
just interested in taking a closer look at how the Hugo Compiler
writes a compiled program (and how the Hugo Engine interprets it)
might find a technical specification useful, even if only to
verify the occasional behavior or detail.

This look under the hood outlines the configuration of data and
code storage used by Hugo, and gives an extensive overview of how
the various aspects of the language are compiled and interpreted.

This technical specification of the language internals is not a
complete programming guide; familiarity with the language and a
handy copy of the Hugo Programming Manual will be helpful, as
will access to the Hugo source code (written in ANSI C and
available at the time of this writing at
ftp://ftp.gmd.de/if-archive/programming/hugo/source/).

The standard Hugo source distribution is hugov*_source.tar.gz.
Operating-system-specific sources (i.e., implementations of
non-portable functions) are typically hugov*_OSname_source.zip.

Please note that while this document does address differences
between the current version of Hugo and previous versions, it is
by no means complete in that respect.  For example, a
current-version implementation of the Hugo Engine that conforms
to this specification is not guaranteed to run programs compiled
with all previous versions of Hugo.  For further elaboration on
such differences, please see the Hugo source itself.

Author e-mail (The General Coffee Company Film Productions):

Hugo Home Page:
    http://www.geocities.com/hollywood/academy/5976/hugo.html
    (As of this revision)

-----------------------------------------------------------------
I.a.  HOW HUGO WORKS
-----------------------------------------------------------------

The Hugo system is composed of two parts:  the compiler and the
engine (the interpreter).  (The debugger is actually a modified
build of the engine, with an additional command layer to
facilitate debugging examination and manipulation of the runtime
state.)

The compiler is responsible for reading source files and writing
executable code; it does this by first tokenizing a given line of
code--breaking it down into a series of byte values representing
its contents--and then determining how the line(s) should be
written (i.e., identified, optimized, and encoded) in order to
fit properly into the current construct.  The compiler is also
responsible for organizing and writing tables representing object
data, property data, the dictionary, etc.

The engine in turn reads the file produced by the compiler
(called a .HEX file, after the default extension), and follows
the compiled instructions to execute low-level functions such as
object movement, property assignment, text output, and expression
evaluation.  These low-level operations are, for the most part,
transparent to the programmer.

-----------------------------------------------------------------
II.  ORGANIZATION OF THE .HEX FILE
-----------------------------------------------------------------

-----------------------------------------------------------------
II.a.  MEMORY MAP
-----------------------------------------------------------------

If all the separate segments of a .HEX file were stacked
sequentially on top of each other, the resulting pile would look
something like this:

    DATA STORAGE:       MAXIMUM SIZE:

    +-----------------+---------------+
    | (Link data for  |               |
    |  .HLB files)    |               |
    +-----------------+---------------+
    | Text bank       |    16384K     |
    +-----------------+---------------+
    | Dictionary      |      64K      |
    +-----------------+---------------+
    | Special words   |      64K      |
    +-----------------+---------------+
    | Array space     |      64K      |
    +-----------------+---------------+
    | Event table     |      64K      |
    +-----------------+---------------+
    | Property table  |      64K      |
    +-----------------+---------------+
    | Object table    |      64K      |
    +-----------------+---------------+
    | Grammar and     |               |
    | Executable code |     256K      |
    +-----------------+---------------+
    | Header          |  $0040 bytes  |
    +-----------------+---------------+
    (Bottom:  $000000)

    MAXIMUM SIZE:  17024K bytes

Each new segment begins on a boundary divisible by 16; the end of
a segment is padded with zeroes until the next boundary.

For each segment, data is generally stored in sequential chunks,
following two or more bytes giving information about the size of
the table.

    Dictionary table:  the first two bytes give the total number
        of entries.  The third byte is always 0, so that
        dictionary entry 0 is an empty string ("").  Following
        the dictionary table, a number of bytes may optionally be
        written for runtime dictionary expansion.

    Special words:  the first two bytes give the total number of
        special words.

    Array space:  the first 420 bytes give the global variable
        defaults (2 bytes each).  For each array entry, the first
        two bytes give the array length.

    Event table:  the first two bytes give the total number of
        events.

    Property table:  the first two bytes give the total number
        (n) of properties.  The following n*2 bytes give the
        property defaults.

    Object table:  the first two bytes give the total number of
        objects.
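
The padding rule and the leading count word described above can
be sketched in a few lines of Python.  This is only an
illustration, not code from the Hugo sources:  the function names
pad_to_boundary() and table_count() are hypothetical, and the
count is read low byte first, as this specification describes for
all 16-bit words.

```python
def pad_to_boundary(segment: bytes, boundary: int = 16) -> bytes:
    """Zero-pad a segment so that the next one starts on a multiple of `boundary`."""
    remainder = len(segment) % boundary
    if remainder == 0:
        return segment          # already ends on a boundary
    return segment + bytes(boundary - remainder)


def table_count(table: bytes) -> int:
    """Read the two-byte count at the start of a table (low byte, then high byte)."""
    return table[0] + table[1] * 256


# A 17-byte segment grows to 32 bytes; a table starting 2C 01 holds 300 entries.
padded = pad_to_boundary(b"\x01" * 17)
count = table_count(bytes([0x2C, 0x01]))
```

A tool that walks a .HEX file could use the same two helpers for
every table listed above, since each begins with a count word.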

-----------------------------------------------------------------
II.b.  THE HEADER
-----------------------------------------------------------------

The header is reserved a total of 64 bytes at the start of the
compiled .HEX file, immediately preceding the grammar table.  It
contains the bulk of information regarding table offsets,
junction routine addresses, etc.:  essentially, it is a map to
where to find things in the file.

Compile with the -u switch to display a map of memory usage in
the .HEX file that reflects the offsets and addresses encoded in
the header.

    Byte   Length   Description

    $00      1      Version of Hugo Compiler used

                    (The version format was changed between v2.0
                    and v2.1.  Version 2.0 programs contained the
                    value 2; version 2.1 programs contain the
                    value 21, version 2.2 programs contain 22,
                    etc.)

     01      2      ID string (compiler-generated)

                    (Pre-v2.3 allowed the programmer to specify
                    an ID string, an unnecessary convention now--
                    the ID string used to be used to create the
                    default savefile name.  Precompiled headers
                    have the ID string "$$".)

     03      8      Serial number

     0B      2      Address of start of executable code

     0D      2      Object table offset
     0F      2      Property table offset
     11      2      Event table offset
     13      2      Array space offset
     15      2      Dictionary offset
     17      2      Special words table offset

                    (Table offsets are equal to the offset of the
                    beginning of the table from the start of
                    data, divided by 16.)

     19      2      Init routine indexed address
     1B      2      Main routine indexed address
     1D      2      Parse routine indexed address
     1F      2      ParseError routine indexed address
     21      2      FindObject routine indexed address
     23      2      EndGame routine indexed address
     25      2      SpeakTo routine indexed address
     27      2      Perform routine indexed address

                    (Pre-v2.5 had no Perform junction routine;
                    verb routines were called directly by the
                    engine.)

     29      2      Text bank offset

    In .HDX (debuggable) Hugo executables only:

     3A      1      Debuggable flag, set to 1
     3B      3      Absolute start of debugging information
     3E      2      Debug workspace (in array table)

A note on data storage:  Whenever 16-bit words (i.e., two bytes
representing a single value) are written or read, it is in
low-byte/high-byte order, with the first byte being the remainder
of x/256 (or the modulus x%256), and the second byte being the
integer value x/256.

(This order is used consistently in Hugo's internal structure.
For another example, see "Appendix A:  Code Patterns".  Several
of the conditional statements--'if', 'elseif', etc.--use two
bytes to give the absolute skip distance to the next statement if
the conditional test fails.  The pair is coded in
low-byte/high-byte order.)

-----------------------------------------------------------------
III.  TOKENS AND DATA TYPES
-----------------------------------------------------------------

The first two places to start inspecting how the Hugo compiler
writes a .HEX file are:

    1.)  what byte values are written to represent each
         individual token (i.e., keywords, built-in functions,
         etc.), and

    2.)  how different data types and values are formatted.

-----------------------------------------------------------------
III.a.  TOKENS
-----------------------------------------------------------------

    $00  (not used)      10  #               20  for
     01  (               11  ~               21  return
     02  )               12  >=              22  break
     03  .               13  <=              23  and
     04  :               14  ~=              24  or
     05  =               15  &               25  jump
     06  -               16  >               26  run
     07  +               17  <               27  is
     08  *               18  if              28  not
     09  /               19  ,               29  true
     0A  |               1A  else            2A  false
     0B  ;               1B  elseif          2B  local
     0C  {               1C  while           2C  verb
     0D  }               1D  do              2D  xverb
     0E  [               1E  select          2E  held
     0F  ]               1F  case            2F  multi

     30  multiheld       40  eldest          50  window
     31  newline         41  younger         51  random
     32  anything        42  elder           52  word
     33  print           43  prop#           53  locate
     34  number          44  attr#           54  parse$
     35  capital         45  var#            55  children
     36  text            46  dictentry#      56  in
     37  graphics        47  textdata#       57  pause
     38  color           48  routine#        58  runevents
     39  remove          49  label#          59  arraydata#
     3A  move            4A  object#         5A  call
     3B  to              4B  value#          5B  stringdata#
     3C  parent          4C  eol#            5C  save
     3D  sibling         4D  system          5D  restore
     3E  child           4E  notheld         5E  quit
     3F  youngest        4F  multinotheld    5F  input

     60  serial$         70  readfile
     61  cls             71  writeval
     62  scripton        72  readval
     63  scriptoff       73  playback
     64  restart
     65  hex             75  colour
     66  object          76  picture
     67  xobject         77  sound
     68  string          78  music
     69  array           79  repeat
     6A  printchar
     6B  undo
     6C  dict
     6D  recordon
     6E  recordoff
     6F  writefile

Some of these, particularly the early tokens, are as simple as
punctuation marks that are recognized by the engine as delimiting
expressions, arguments, etc.  Non-punctuation stand-alone tokens
('to', 'in', 'is') are used for similar purposes, to give form to
a particular construction.  Others, such as 'save', 'undo', and
'recordon', are engine functions that, when read, trigger a
specific action.

Note also tokens ending with '#':  these primarily represent data
types that are not directly enterable--the '#' character is
separated and read as a discrete word in a parsed line of Hugo
source.  For example, the occurrence of a variable name in the
source will be compiled into 'var#' (token $45) followed by a
single byte giving the number of the variable being referenced.
(See the following section on Data Types for more details.)

-----------------------------------------------------------------
III.b.  DATA TYPES
-----------------------------------------------------------------

Internally, all data is stored as signed 16-bit integers (that
may be treated as unsigned as appropriate).  The valid range is
-32768 to 32767.

Following are the formats for the various data types used by
Hugo; to see them in practice, it is recommended to consult the
Hugo C source code and the functions CodeLine() in HCCODE.C--for
writing them in the compiler--and GetValue() and GetVal() in
HEEXPR.C--for reading them via the engine.

    ATTRIBUTE:  <attr#> <1 byte>

        The single byte represents the number of the attribute,
        which may range from $00 to $7F (0 to 127).

        Attribute $10, for example, would be written as:

            $44 10

    DICTIONARY ENTRY:  <dictentry#> <2 bytes>

        The 2 bytes (one 16-bit word) represent the address of
        the word in the dictionary table.  The null string ("")
        is $00.

        If the word "apple" was stored at the address $21A0, it
        would be written as:

            $46 A0 21

    OBJECT:  <object#> <2 bytes>

        The two bytes (one 16-bit word) give the object number.

        Objects $0002 and $01B0 would be written as,
        respectively:

            $4A 02 00
            $4A B0 01

    PROPERTY:  <prop#> <1 byte>

        The single byte gives the number of the property being
        referenced.

        Property $21 would be written as:

            $43 21

    ROUTINE:  <routine#> <2 bytes>

        The two bytes (one 16-bit word) give the indexed address
        of the routine.  In Hugo v2.5, all blocks of executable
        code begin on an address divisible by 4; this allows 256K
        of memory to be addressable via the range 0 to 65535.
        (Code is padded with up to three null ($00) values to the
        next address divisible by 4.)

        For example, a routine beginning at $004004 would be
        divided by 4 and encoded as the indexed address $1001, in
        the form:

            $48 01 10

        This goes for routines, events, property routines, and
        even conditional code blocks following 'if', 'while',
        etc.

    VALUE (i.e., INTEGER CONSTANT):  <value#> <2 bytes>

        A value may range from -32768 to 32767; negative numbers
        follow signed-value 16-bit convention by being x + 65536
        where x is a negative number.

        For example, the values 10 ($0A), 16384 ($4000), and -2
        would be written as:

            $4B 0A 00
            $4B 00 40
            $4B FE FF    ($FFFE = 65534 = -2 + 65536)

    VARIABLE:  <var#> <1 byte>

        A program may have up to 240 global variables (numbered 0
        to 239), and 16 local variables for the current routine
        (numbered 240 to 255).  Since 240 + 16 = 256, the number
        of the variable being specified will fit into a single
        byte.

        In the compiler, the first global variable (i.e.,
        variable 0) is predefined as "object".  It would be
        written as a sequence of two bytes:

            $45 00

        A routine's second argument or local would be numbered
        241 (since 240 ($F0) is the first local variable), and
        would be written as:

            $45 F1

-----------------------------------------------------------------
IV.  ENGINE PARSING
-----------------------------------------------------------------

The engine is responsible for all the low-level parsing of an
input line (i.e., player command).  Upon receiving an input, the
engine parses the line into separate words, storing them in the
word array.  The word array--i.e., that which is referenced in a
Hugo program via "word[n]"--is an internal structure coded using
the 'word' token instead of 'arraydata#'.  A static, read-only
parser string called 'parse$' is used for storage of important
data, such as a parser-error-causing word/phrase that cannot
otherwise be communicated as an existing dictionary entry.

The first parsing pass also does the following:

    1.  Allows integer numbers from -32768 to 32767.

    2.  Time given in "hh:mm" format is converted to an integer
        number representing the total minutes since midnight,
        i.e., through the formula:  hh * 60 + mm.  The original
        "hh:mm" is stored in parse$.

    3.  Up to one word (or set of words) in quotation marks is
        allowed; if found, it is stored in parse$.

    4.  Special words are processed, i.e., removals and
        user-defined punctuation are removed, compounds are
        combined, and synonyms are replaced.  (See the section
        below on Special Words for an explanation of how these
        are encoded.)
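
As a small worked example of step 2 in the list above, the
following Python sketch applies the formula hh * 60 + mm to a
single input word.  The function name time_to_minutes() is
hypothetical, and the validation shown (plain ASCII digits on
both sides of the colon) is an assumption; this specification
does not say how the engine rejects malformed times.

```python
def time_to_minutes(word: str):
    """Return minutes since midnight for an "hh:mm" word, or None if it is not one."""
    hours, colon, minutes = word.partition(":")

    def is_number(text: str) -> bool:
        # Assumed validation: one or more plain ASCII digits.
        return text.isascii() and text.isdigit()

    if not colon or not is_number(hours) or not is_number(minutes):
        return None
    return int(hours) * 60 + int(minutes)


# "10:30" is 10 * 60 + 30 = 630 minutes since midnight.
example = time_to_minutes("10:30")
```

In the engine the original "hh:mm" text is kept in parse$, so a
caller of a helper like this would store the word before
replacing it with the integer.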

If a user-defined Parse routine exists (i.e., if bytes $1D-1E in
the header are not $0000), it is called next.  If the routine
returns true, the engine parsing routine is called a second time
to reconcile any changes to the word set.

If at any point the parser is unable to continue, either because
an unknown word--one not found in the dictionary table--is found,
or because there is a problem later, in grammar matching
(described below), a parser error is generated, and parsing is
stopped.  (The unknown or otherwise problem-causing word is
stored in parse$.)

The engine has a set of standard parser errors that may be
overridden by a user-provided ParseError routine (i.e., if bytes
$1F-20 in the header are not $0000).  If there is no ParseError
routine, or if ParseError returns false, the default parser error
message is printed.

-----------------------------------------------------------------
V.  GRAMMAR
-----------------------------------------------------------------

The grammar table starts immediately following the header (at
$40, or 64 bytes into the .HEX file).  It is used for matching
against the player's input line to determine the verbroutine to
be called, and if applicable, the object(s) and xobject (i.e.,
the indirect object).

(Note that if the input line begins with an object instead of a
verb--i.e., if it is directed toward a character, as in "Bob, get
the object"--then grammar is matched against the phrase
immediately following the initial object.)

The grammar table is composed of a series of verb or xverb (i.e.,
non-action verb) blocks, each beginning with either <verb> ($2C)
or <xverb> ($2D).  A $FF value instead of either <verb> or
<xverb> indicates the end of the grammar table.  A grammar table
that looks like

    000040:  FF

has no entries.

Following the verb type indicator is a single byte giving the
number of words (i.e., synonyms) for this particular verb.
Following that are the dictionary addresses of the individual
words.
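
The verb header layout just described can be read with a short
Python sketch.  It is a sketch under stated assumptions, not the
engine's own code:  read_verb_header() is a hypothetical name,
dictionary addresses are taken to be in the low-byte/high-byte
order used throughout this specification, and only plain
dictionary addresses are handled (the v2.5 $FFFF form discussed
in this section is not).

```python
VERB, XVERB, END_OF_TABLE = 0x2C, 0x2D, 0xFF


def read_verb_header(data: bytes, pos: int):
    """Return (kind, dictionary_addresses, next_pos), or None at end of table."""
    kind = data[pos]
    if kind == END_OF_TABLE:
        return None
    if kind not in (VERB, XVERB):
        raise ValueError("not a verb or xverb block at offset %d" % pos)

    count = data[pos + 1]       # number of verb words (synonyms)
    pos += 2
    addresses = []
    for _ in range(count):
        # Each address is one 16-bit word, low byte first.
        addresses.append(data[pos] + data[pos + 1] * 256)
        pos += 2
    return ("verb" if kind == VERB else "xverb"), addresses, pos


# Two verb words stored at dictionary addresses $21A0 and $1234.
header = read_verb_header(bytes([0x2C, 0x02, 0xA0, 0x21, 0x34, 0x12]), 0)
```

The returned next_pos points at the first grammar line of the
block, which is where a fuller reader would continue.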

Consider the simple grammar definition:

    verb "get", "take"
        * object                DoGet

If this were the first verb defined, the start of the grammar
table would look like

    000040:  2C 02 x2 x1 y2 y1

where $x1x2 is the dictionary address of "get", and $y1y2 is the
dictionary address of "take".

(Version 2.5 introduced a separate, although rarely used,
variation of the verb header.  A verb or xverb definition can
contain something like

    verb get_object

where get_object is an object or some other value.  In this case,
the verb word is get_object.noun instead of an explicitly defined
word.  The grammar table in this case would look like

    000040:  2C 01 FF FF 4A x2 x1

where $FFFF is the signal that instead of a dictionary word
address, the engine must read the following discrete value, where
$4A is the <object#> token, and $x1x2 is the object number of
get_object.  This extension is provided so that grammar may be
dynamically coded and changed at runtime.)

Following the verb header giving the number of verb words and the
dictionary address of each are one or more grammar lines, each
beginning with a "*" signifying the matched verb word.  (For an
elaboration of valid grammar syntax specification, please see the
Hugo Manual.)

Grammar lines are encoded immediately following the verb header,
so that in the first example given above,

    verb "get", "take"
        * object                DoGet

becomes:

    000040:  2C 02 x2 x1 y2 y1
    000046:  08 66 48 z2 z1
    00004B:  FF

where $z1z2 is the indexed routine address of DoGet.  The $FF
byte marks the end of the current verb definition.  Immediately
following this is either another <verb> or <xverb> token, or a
second $FF to indicate the end of the verb table.

-----------------------------------------------------------------
VI.  EXECUTABLE CODE
-----------------------------------------------------------------

-----------------------------------------------------------------
VI.a.  A SIMPLE PROGRAM
-----------------------------------------------------------------

The following is a simple Hugo program:

    routine main
    {
        print "Hello, Sailor!"
        pause
        return
    }

It will print "Hello, Sailor!", wait for a keypress, and exit.
When compiled, the grammar table and executable code look like
this:

    000040:  FF 00 00 00 33 5B 0E 00 5C 79 80 80 83 40 34 67
    000050:  75 7D 80 83 86 35 4C 57 21 4C 0D 21 4C 00 00 00

Here is what those 32 bytes represent:

    000040:  FF

        The grammar table is empty; no grammar has been defined.
        The first entry in the grammar table is $FF, signifying
        end-of-table.

    000041:  00 00 00

        Padding to the next address boundary.

    000044:  33

        A 'print' token.

    000045:  5B 0E 00 5C 79 80 80 83 40 34 67 75 7D 80 83 86 35
                      H  e  l  l  o  ,     S  a  i  l  o  r  !

        A 'stringdata#' ($5B) token of 14 characters ($000E),
        followed by the encoded string "Hello, Sailor!"  (Since
        this is a print statement, the text is written directly
        into the code instead of in the text bank.)

    000056:  4C

        An 'eol#' token, to signal end-of-line for the current
        print statement.

    000057:  57

        A 'pause' token.

    000058:  21 4C

        A 'return' token, followed by 'eol#'.  (If there is a
        value being returned, that expression comes between $21
        and $4C.  Since in this case the expression is blank, the
        $4C comes immediately.)

    00005A:  0D 21 4C

        The closing brace symbol $0D marks the end of the
        routine.  All routines are automatically followed by a
        default $21 and $4C--the equivalent of "return false".

-----------------------------------------------------------------
VI.b.  EXPRESSIONS
-----------------------------------------------------------------

Expressions are encoded as the tokenized representation of the
expression.  Consider the following code excerpts, assuming that
global initializations have included

    global glob
    array arr[10]

and, within the current routine,

    local loc

(Assume also that 'glob' and 'loc' are the first global variable
and first local variable defined.)

1.  loc = 10

    This is coded using the pattern

        <var#> <1 byte> = <value#> <2 bytes>

    so that the resulting code looks like:

        45 F0   05   4B 0A 00   4C
        loc     =    10

    The variable number $F0 specifies the first local variable
    (i.e., local variable 0, where the variable number of local
    variable 'n' is 240+n).

2.  glob = 5 * (2 + 1)

    Again, this is coded as a variable assignment:

        <var#> <1 byte> = <expression>

        45 0C   05   4B 05 00   08   01   4B 02 00   07   4B 01 00   02   4C
        glob    =    5          *    (    2          +    1          )

    Since the compiler always defines a number of global
    variables itself, the first-defined global is never 0.  If
    there are 12 pre-defined globals, the first user-defined
    global has variable number $0C.

3.  arr[loc] = word[2]

    The pattern for this array element assignment is:

        <arraydata#> <2 bytes> [ <expression> ] = <word> [ <expression> ]

        59 F0 00   0E   45 F0   0F   05   52     0E   4B 02 00   0F   4C
        arr        [    loc     ]    =    word   [    2          ]

    (Note that word[n] is not handled the same as array[n].)

4.  arr[1] = random(obj.prop #2)

    (Assuming that 'obj' and 'prop' are the first-defined object
    and property, respectively.)

        <arraydata#> <2 bytes> [ <expression> ] = random ( <expression> )

        59 F0 00   0E   4B 01 00   0F   05   51
        arr        [    1          ]    =    random

        01   4A 00 00   03   43 06   10   4B 02 00   02   4C
        (    obj        .    prop    #    2          )

5.  glob += (loc++ * arr[7])

        45 0C   07   05   01   45 F0   07   07   08
        glob    +    =    (    loc     +    +    *

        59 F0 00   0E   4B 07 00   0F   02   4C
        arr        [    7          ]    )

6.  if loc = glob + 11

    (See the section below on Code Patterns for details on how
    'if' statements and other conditionals are coded.)

        18   21 00   45 F0   05   45 0C   07   4B 0B 00   4C
        if           loc     =    glob    +    11

    The two bytes following the 'if' token ($18) give the skip
    distance (i.e., $0021 bytes) to the next-executed instruction
    if the current expression evaluates false.

-----------------------------------------------------------------
VII.  ENCODING TEXT
-----------------------------------------------------------------

Text is written uncompressed into the .HEX file, since there is
no real need for compression and little memory would be saved by
whatever minor compression might be practical.
All text, however--including text in 'print' statements,
dictionary entries, and the text bank--is encoded by adding $14
(decimal 20) to each 8-bit ASCII value in order to prevent casual
browsing of game data.

Text in 'print' statements is written directly into the code in
the form:

    <stringdata#> <2 bytes> ...encoded string...

where the length of the string is given by the first two bytes
following <stringdata#>.

Text in dictionary entries is encoded in the dictionary table.  A
dictionary entry with a given address appears in the dictionary
at address+2 (since the first two bytes in the dictionary table
are reserved for the number of entries) as:

    <1 byte> ...encoded dictionary entry...

where the single byte gives the length of the entry, and the
maximum allowable length of a dictionary entry is 255 characters.

Text written to the text bank is encoded at a given address as:

    <2 bytes> ...encoded text...

where the length of the encoded text is given by the first two
bytes.  (Note that an address in the text bank requires 3 bytes
in the game code, however, since the length of the text bank can
exceed 64K.)

-----------------------------------------------------------------
VIII.  THE OBJECT TABLE
-----------------------------------------------------------------

-----------------------------------------------------------------
VIII.a.  OBJECTS
-----------------------------------------------------------------

The object table begins with two bytes giving the total number of
objects.  The objects then follow in sequential order.

Each object requires 24 bytes:

    Bytes    0 - 15    Attributes (128 bits in total,
                       1 bit/attribute)
            16 - 17    Parent
            18 - 19    Sibling
            20 - 21    Child
            22 - 23    Property table position

The offset of any given object 'n' from the start of the object
table can therefore be found using:

    offset = n * 24 + 2

(Pre-v2.1 objects had only 32 possible attributes, and the object
size was only 12 bytes, with only 4 bytes given to the attribute
array.)

If an object has no parent, sibling, and/or child, the
appropriate two-byte word is set to $0000.
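
The 24-byte object record and the offset formula above can be
illustrated with a short Python sketch.  The names
object_offset() and read_object() are hypothetical, and the
attribute bytes are returned undecoded, since this sketch makes
no assumption about the order of bits within each byte.

```python
OBJECT_SIZE = 24    # bytes per object in v2.1 and later


def object_offset(n: int) -> int:
    """Offset of object n from the start of the object table."""
    return n * OBJECT_SIZE + 2


def read_object(table: bytes, n: int) -> dict:
    """Split object n's record into its fields (16-bit words are low byte first)."""
    base = object_offset(n)
    record = table[base:base + OBJECT_SIZE]
    if len(record) < OBJECT_SIZE:
        raise IndexError("object %d lies outside the table" % n)

    def word(i: int) -> int:
        return record[i] + record[i + 1] * 256

    return {
        "attributes": record[0:16],         # raw attribute bytes
        "parent": word(16),
        "sibling": word(18),
        "child": word(20),
        "property_position": word(22),
    }


# A two-object table: object 1 has parent 2, child $0105, properties at $0030.
record = bytes(16) + bytes([0x02, 0x00, 0x00, 0x00, 0x05, 0x01, 0x30, 0x00])
table = bytes([0x02, 0x00]) + bytes(OBJECT_SIZE) + record
obj = read_object(table, 1)
```

A value of 0 in the parent, sibling, or child field means that
the object has none, as noted above.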

The property table position represents the offset of the
beginning of the given object's property data from the start of
the property table, as described below.

-----------------------------------------------------------------
VIII.b.  ATTRIBUTES
-----------------------------------------------------------------

The 16 bytes of the attribute array contain 8 bits each, giving a
total of 128 possible attributes (in v2.1 and later; 32 in
earlier versions).  Essentially, the bits are thought of
sequentially:  the first byte represents attributes 0 to 7, the
second byte represents attributes 8 to 15, etc.

-----------------------------------------------------------------
IX.  THE PROPERTY TABLE
-----------------------------------------------------------------

The property table begins with two bytes giving the total number
of properties.  This is followed by a list of default property
values, each of one 16-bit (2 byte) word.

After this, the properties themselves begin, starting with object
0.  The property values are entered sequentially, with no
explicit identification of what object a particular value belongs
to.  It is the object's object-table entry that gives the
location of a given object's property data in the property table.

Each property requires at least 2 bytes:

    Byte    0      Property number
            1      Number of data words
            2 -    Data in 16-bit word form (2 bytes each)

Property routines are given a "length" of 255 ($FF), which
indicates that one word of data follows, representing the
(indexed) address of the routine.

At the end of each object in the property table comes the
property number 255 ($FF)--not to be confused with the "length"
255, which denotes a routine address.  "Property" 255 is an
exception to the two-byte minimum; it does not have any attached
length byte or data words.

Each object has a place in the property table, even if it has no
properties per se.  A propertyless object simply has the value
255 at its position in the property table.
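
The per-object layout described above can be walked with a short
Python sketch.  It is illustrative only:  read_properties() is a
hypothetical name, 'pos' is assumed to be an index into the given
bytes at which the object's property data begins (how that index
is derived from the object's property table position is left to
the caller), and the .HLB variation mentioned in this section is
not handled.

```python
END_OF_OBJECT = 0xFF     # "property" 255 closes an object's property list
ROUTINE_LENGTH = 0xFF    # "length" 255 marks a property routine


def read_properties(table: bytes, pos: int) -> list:
    """Return (number, is_routine, words) for each property starting at pos."""
    properties = []
    while table[pos] != END_OF_OBJECT:
        number = table[pos]
        length = table[pos + 1]
        pos += 2

        is_routine = length == ROUTINE_LENGTH
        word_count = 1 if is_routine else length    # a routine has one address word
        words = []
        for _ in range(word_count):
            words.append(table[pos] + table[pos + 1] * 256)    # low byte first
            pos += 2
        properties.append((number, is_routine, words))
    return properties


# Property 6 holds the words 1 and 2; property 7 is a routine at indexed
# address $001A; the final $FF ends the object.
data = bytes([0x06, 0x02, 0x01, 0x00, 0x02, 0x00,
              0x07, 0xFF, 0x1A, 0x00,
              0xFF])
props = read_properties(data, 0)
```

A propertyless object yields an empty list, since the first byte
read at its position is already 255.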

(Property data being written for an .HLB linkable file is
slightly altered.  For example, property routines are marked by
$FE instead of $FF.  See the section below entitled The Linker.)

-----------------------------------------------------------------
IX.a.  BEFORE, AFTER, AND OTHER COMPLEX PROPERTIES
-----------------------------------------------------------------

Consider the following complex property for an unspecified
object:

    after
    {
        object DoGet
        {
            "You pick up the object."
        }
        object
        {
            "You can't do that with the object."
        }
    }

(A simple explanation of the above is that <object>.after is
called following a call to a verbroutine with which <object> was
involved.  If <object> was the object of the verbroutine (i.e.,
the 'object' global), and the 'verbroutine' global was DoGet, the
first block runs.  The second block will run if no previous block
has run.  For a full description of complex properties, see the
Hugo Manual.)

First of all, the entry in the property table for <object>.after
will point to the first line of code in the property routine.
Arbitrarily, let's assume this is $000044:  the earliest possible
code address following a blank grammar table.

    000040:  FF 00 00 00 45 00 48 1A 00 25 15 00 47 00 00 00
    000050:  0D 00 00 00 45 00 25 18 00 47 00 19 00 0D 00 00
    000060:  0D 21 29 4C

That can be compared to the original source code as:

    000044:  45 00 48 1A 00

        The initial "object DoGet" block header, assuming that
        the engine-defined global 'object' is global variable
        number 0, and that the address of DoGet is $000068
        (represented as the indexed address $001A).

    000049:  25 15 00

        Following the 'jump' token ($25) is the indexed address
        to jump to if "object DoGet" isn't matched.  In this
        case, it is $0015, which translates to the absolute
        address $000054 (i.e., the address of the next header).

    00004C:  47 00 00 00

        The <textdata#> label ($47) is followed by three bytes
        giving the address in the text bank of the printed string
        "You pick up the object."

    000050:  0D 00 00 00

        $0D signals the end of this block of executable code,
        followed by 00s padding to the next address boundary.

    000054:  45 00

        This block header is simply "object".

    000056:  25 18 00

        As above, following the 'jump' token ($25) is the indexed
        address to jump to if the block header isn't matched.  In
        this case, it is $0018, which translates to $000060
        (i.e., the closing $0D of the 'after' routine).

    000059:  47 00 19 00 0D 00 00

        The second line of text is printed here, followed by $0D
        to signal the end of this block of code and 00s padding
        to the next address boundary.

    000060:  0D 21 29 4C

        A $0D signals the end of the 'after' routine.  Property
        routines are followed by an automatic $21, $29, and $4C
        (i.e., "return true").

-----------------------------------------------------------------
X.  THE EVENT TABLE
-----------------------------------------------------------------

The event table begins with two bytes giving the total number of
events.

Each event requires 4 bytes:

    Bytes   0 - 1    Associated object (0 for a global event)
            2 - 3    Address of event routine

-----------------------------------------------------------------
XI.  THE DICTIONARY AND SPECIAL WORDS
-----------------------------------------------------------------

-----------------------------------------------------------------
XI.a.  DICTIONARY
-----------------------------------------------------------------

The dictionary begins with two bytes giving the total number of
entries.

Each dictionary entry is composed of 1 or more bytes:

    Byte    0      Length of entry (number of characters)
            1 -    Entry as an encrypted string

-----------------------------------------------------------------
XI.b.  SPECIAL WORDS
-----------------------------------------------------------------

The special words table begins with two bytes giving the total
number of entries.

Each entry requires 5 bytes:

    Byte    0        Type (0 = synonym, 1 = removal,
                     2 = compound, 3 = user-defined punctuation)
            1 - 2    First dictionary address
            3 - 4    Second address (for synonyms and compounds)

-----------------------------------------------------------------
XII.  RESOURCEFILES
-----------------------------------------------------------------

A resourcefile is used to store multiple images, sounds, music
tracks, etc. in one manageable file format.  The format of a Hugo
resourcefile is fairly straightforward.

Every resourcefile starts with a header of 6 bytes:

    00         'R'
    01         Version number (i.e., 25 for version 2.5)
    02 - 03    Number of resources
    04 - 05    Length of index, in bytes

Following the header is the index itself.  Each resource entry in
the index looks like:

    00         Length of entry name (i.e., 'n' bytes)
    01 - n     Entry name
    3 bytes    Offset in resourcefile from end of index
    3 bytes    Length of resource, in bytes

Resources are then appended sequentially immediately following
the index.

-----------------------------------------------------------------
XIII.  THE HUGO COMPILER AND HOW IT WORKS
-----------------------------------------------------------------

For reference, here is a simplified map of the compiler's
function calls, along with the source files in which they are
located.  The leftmost functions are all called from main() in
hc.c.

+----------------+
| ParseCommand() | - Parse command line, including filenames,
|    hcmisc.c    |   switches, and other settings
+----------------+
        |
+----------------+
|  OpenFiles()   | - Open initial source file, objectfile,
|    hcfile.c    |   listing, and temporary files
+----------------+
        |
+----------------+  +----------------------------------+
|    Pass1()     |  | GetLine() - hcfile.c             |
|    hcpass.c    |--|                                  |
|                |  | CompilerDirective() - hccomp.c   |  [1.1]
| (Definitions)  |  | CompilerMem() - hccomp.c         |
+----------------+  | AddDirectory() - hcmisc.c        |
        |           |                                  |
        |           | Def...() - hcdef.c               |  [1.2]
        |           |                                  |
        |           | PrinttoAll() - hcmisc.c          |
        |           |                                  |
        |           | (LinkerPass1() - hclink.c)       |  [1.3]
        |           +----------------------------------+
        |
+----------------+  +----------------------------+
|    Pass2()     |  | GetWords() - hcfile.c      |
|    hcpass.c    |--|                            |
|                |  | Build...() - hcbuild.c     |  [2.1]
|    (Build)     |  |                            |
+----------------+  | (LinkerPass2() - hclink.c) |  [2.2]
        |           +----------------------------+
        |                          |
+----------------+     +--------------------------+
|    Pass3()     |     | BuildCode() - hcbuild.c  |  [2.3]
|    hcpass.c    |--+  +--------------------------+
|                |  |                |
| (Resolve/Link) |  |    +-----------------------+
+----------------+  |    | Code...() - hccode.c  |  [2.4]
                    |    | CodeLine() - hccode.c |
                    |    +-----------------------+
                    |                |
                    |                |  [2.5]
                    |    \+------------------------+
                    +-----| Write...() - hcfile.c  |
                         /| WriteCode() - hcfile.c |
                          +------------------------+

In PASS 1, the initial source file and any included files are read
into one contiguous temporary file (called "allfile" in the
source).  Any compiler directives (i.e., lines beginning with '#',
'$' or '@') are processed here [1.1], as are definitions of
objects, attributes, properties, global variables, constants, and
routines [1.2].  Once a line of source has been parsed and split
into discrete words, it is written to "allfile" using
PrinttoAll().

PASS 2 is where the bulk of compilation takes place.  Lines of
pre-parsed source are read from "allfile".
After Pass 1, all symbols (except local variables) are known.
Individual constructs such as verbs, objects, routines, and events
are processed via Build...() functions (i.e., BuildVerb(),
BuildObject(), etc.) [2.1].  At any point in Pass2() or later, the
tokenized line currently being processed is held in the global
'word[]' array, with the number of tokens in the current line in
'words'.

Sections of executable code, such as routines, events, or property
routines, are generated by calling BuildCode() [2.3], which in
turn calls appropriate Code...() functions as necessary (i.e.,
CodeDo(), CodeIf(), CodeWhile(), etc.), or simply CodeLine() for
any line that doesn't require special treatment [2.4].  Compiled
byte-code is emitted to the objectfile via WriteCode() [2.5].

(In a departure from the normal order of defining symbols,
synonyms, compound words, removals, and user-defined punctuation
are defined in Pass2().  Local variables are defined in
BuildCode().)

By PASS 3, all executable code has been written to the objectfile,
structures exist in memory representing to-be-constructed tables,
and the text bank (long sections of printed text) exists in a
temporary file.  First, ResolveAddr() (from hcmisc.c) patches all
references that were unknown at the time they were compiled.
Pass 3 then writes the object table, the property table, the event
table, the array table, synonyms/removals/compounds/user-defined
punctuation, the dictionary, and the text bank.

If a debuggable executable (called an .HDX file) is being
generated, the last thing Pass3() does is to write the symbolic
names of all objects, properties, attributes, aliases, globals,
routines, events, and arrays to the end of the file.


-----------------------------------------------------------------
XIII.a.
COMPILE-TIME SYMBOL DATA ----------------------------------------------------------------- Here are the various structures, arrays, and variables used by the compiler to keep track of symbols at compile-time: Objects: objctr - total number of objects object[n] - symbolic name of object n object_hash[n] - hash value of symbol name objattr[n][s] - attribute set s (32 attributes/set) oprop[n] - location in propdata[] array objpropaddr[n] - location in property table parent[n] - physical parent (not ancestor) sibling[n] - physical sibling child[n] - physical child oreplace[n] - number of times replaced using the 'replace' directive Attributes: attrctr - total number of attributes attribute[n] - symbolic name of attribute n attribute_hash[n] - hash value of symbol name Properties: propctr - total number of properties property[n] - symbolic name of property n property_hash[n] - hash value of symbol name propset[p] - true if property p has been defined for current object propadd[p] - ADDITIVE_FLAG bit is true if property p is additive; COMPLEX_FLAG bit is true if property p is a complex property propdata[a][b] - array of all property data propheap - size of property table Labels: labelctr - total number of labels label[n] - symbolic name of label n label_hash[n] - hash value of symbol name laddr[n] - indexed address of label Routines: routinectr - total number of routines routine[n] - symbolic name of routine n routine_hash[n] - hash value of symbol name raddr[n] - indexed address of routine rreplace[n] - number of times replaced using the 'replace' directive Events (although not really symbols): eventctr - total number of events eventin[n] - object to which event n is attached eventaddr[n] - indexed address of event code Aliases: aliasctr - total number of aliases alias[n] symbolic name of alias n alias_hash[n] - hash value of symbol name aliasof[n] - attribute or property aliased (either the attribute number, or the property number plus MAXATTRIBUTES) Global variables: 
globalctr - total number of global variables global[n] - symbolic name of global n global_hash[n] - hash value of symbol name globaldef[n] - initial value of global at startup Local variables: localctr - total number of locals defined in the current code block local[n] - symbolic name of local n local_hash[n] - hash value of symbol name unused[n] - true until local n is used Constants: constctr - total number of constants constant[n] - symbolic name of constant n constant_hash[n] - hash value of symbol name constantval[n] - defined value of constant n Array: arrayctr - total number of arrays array[n] - symbolic name of array n array_hash[n] - hash value of symbol name arrayaddr[n] - location in array table arraylen[n] - length of array n arraysize - current size of array table Dictionary: dictcount - total number of dictionary entries dicttable - current size of dictionary lexentry[n] - dictionary entry n lexaddr[n] - location of entry n in dictionary table lexnext[n] - location of word following n in the lexentry[] array lexstart[c] - location of first word beginning with character c in lexentry[] lexlast[c] - location of last word beginning with character c in lexentry[] Special words: syncount - total number of synonyms, compounds, removals, and user-defined punctuation syndata[n] - synstruct structure of n The use of ..._hash[n] is a rough form of hash-table coding. The compiler, in FindHash() in hcdef.c, produces an ALMOST unique value for a given symbol based on the characters in it. Only if ..._hash[n] matches an expected value does a more expensive string comparison have to be performed to validate the "match" (or reject it). ----------------------------------------------------------------- XIII.b. 
THE LINKER ----------------------------------------------------------------- The compiler has to be able to both create a linkable file (called an .HLB file, as it is usually a precompiled version of the library) and read it back when a '#link' directive is encountered. In the first case, the compiler writes an .HLB file whenever the -h switch is set at invocation. In order to do that, it does the following things: 1. Property routines, normally marked by a "length" of 255, are changed to a "length" of 254. 2. All addresses are appended to the end of the file instead of being resolved in Pass3(). (Labels, being local and therefore not visible outside the .HLB file, are an exception; they are resolved as usual.) 3. Additional data (such as symbolic names) of objects and properties are written in Pass3(). Immediately following the object table, the compiler, in Pass3(), writes all the relevant data for attributes, aliases, globals, constants, routines. 4. The value "$$" is written into the ID string in the header. Reading back (i.e., linking) an .HLB file is done in two steps: LinkerPass1() [1.3], called from Pass1(), and LinkerPass2() [2.2], called from Pass2(). (The linker routines are found in the source file hclink.c.) LinkerPass1() simply skims the .HLB file for symbols and defines them accordingly, along with any relevant data. It also reads the .HLB file's text bank and writes it to the current file's temporary file containing the current text bank. Note that since linking must be done before any other definitions, there is no need to calculate offsets here for things like object numbers, addresses in the text bank, etc. LinkerPass2() is responsible for reading the actual executable code. It does this mainly with a simple read/write (in blocks of 16K or smaller). 
It then reads the resolve table appended to the end of the .HLB
file and writes it to the current resolve table so that Pass3()
can properly resolve the offset code addresses at the end of
compilation.  (Since the actual start of executable code will vary
depending on the length of the grammar table, it is not known at
the .HLB file's compile time what a given address may ultimately
be.  It is only known that, for example, routine R is called from
position P in the source.  Both R and P must be adjusted for the
offset.)

In Pass3(), ResolveAddr() is now able to resolve addresses from
the linked file.  Additionally, those properties with a "length"
of 254 are adjusted so that their values--which are really
addresses of property routines--are adjusted as per the offset;
the "length" of these properties is then written as 255.


-----------------------------------------------------------------
XIV.  THE HUGO ENGINE AND HOW IT WORKS
-----------------------------------------------------------------

Here is a simple map of the main engine loop and the associated
functions:

+-------------+    +----------------------------+
|  RunGame()  |----| RunRoutine("init" routine) |
|   herun.c   |    |          herun.c           |
+-------------+    +----------------------------+
      /|\                       \|/
       |                         |
       |                         |
       |          \+----------------------------+
       |      +----| RunRoutine("main" routine) |   MAIN EXECUTION
       |      |   /|          herun.c           |   LOOP [1.1]
       |      |    +----------------------------+
       |      |                  |
       |      |                   \
       |      |                    +-------Player input [1.2]
       |      |                   /
       |      |                  |
       |      |                 \|/
       |      |          +----------------+
       |      |          |    Parse()     |   [2.1]
       |      |          |   heparse.c    |
       |      |          +----------------+
      /|\    /|\                 |
       |      |          +----------------+  +---------------------------+
       |      |          | MatchCommand() |--| MatchWord() - heparse.c   |
       |      |          |   heparse.c    |  | MatchObject() - heparse.c |
       |      |          +----------------+  +---------------------------+
       |      |                  |   [2.2]              [2.3]
       |  If input               |
       |   is not----------------+
       |    valid                |
       |                         |
       |                      If input
       |                      is valid
       |                         |
       |                        \|/
       |            +-------------------------+
       |            | RunRoutine(performaddr) |   [3.1]
       |            |         herun.c         |
       |            +-------------------------+
       |/                        |               .
       +-------------------------+                .
        \                                          .
                                        +-----------------------+
                                        | Expression evaluator: |  [4.1]
                                        |       heexpr.c        |
                                        |                       |
                                        |      SetupExpr()      |
                                        |           |           |
                                        | GetValue()--GetVal()  |
                                        |           |           |
                                        |      EvalExpr()       |
                                        +-----------------------+

The functions in herun.c comprise most of the core game loop and
calling points.  RunGame() manages the game loop itself [1.1],
which can be thought of as being:

        Main --> Player input --> Parsing --> Action (if valid)

Player input [1.2] is the point at which the engine requests a new
input line (usually from the keyboard, but possibly from another
source such as a file during command playback).

The Parsing section [2.1] refers to the in-engine breakdown and
analysis of the input line.  The input line is matched against the
grammar table in MatchCommand() [2.2]--using MatchWord() and
MatchObject() [2.3] to identify either individual words as
specified in the grammar, or groups of words that may represent an
object name.

If a match is made, the appropriate globals (object, xobject,
verbroutine) are set, and Perform() is called [3.1] (or, if
Perform() has not been defined, the built-in substitute).

RunRoutine() is the method by which any function calls are
executed.  At any point in RunRoutine() (or in functions called by
it), the value 'mem[codeptr]' is the byte value (i.e., the token
number) of the current instruction.  The value of 'codeptr'
advances as execution progresses.

Whenever it is necessary for the engine to evaluate an expression,
the expression evaluator subsystem in heexpr.c is invoked [4.1].
Here, the 'eval[]' array is initialized with the expression to be
evaluated by calling SetupExpr() (which will in turn call
GetValue() to sequentially retrieve the elements of the
expression).  The expression currently in 'eval[]' is solved by
calling EvalExpr().


-----------------------------------------------------------------
XIV.a.
RUNTIME SYMBOL DATA ----------------------------------------------------------------- Code execution: mem[] - loaded .HEX file image defseg - current memory segment codeseg - code segment (i.e., 0) codeptr - current code position stack_depth - current calling depth Display: pbuffer[] - print buffer for line-wrapping currentpos - current position (pixel or character) currentline - current row (line) full - counter for PromptMore() page-ending fcolor, bgcolor, - colors for foreground, background, icolor, input, and default background default_bgcolor currentfont - current font bitmask textto - if non-zero, text is printed to this array SCREENWIDTH, - maximum possible screen dimensions SCREENHEIGHT inwindow - true if in a window physical_windowwidth, - "physical" window dimensions, physical_windowheight, in pixels or characters physical_windowleft, physical_windowtop, physical_windowright, physical_windowbottom charwidth, lineheight, - for font output management FIXEDCHARWIDTH, FIXEDLINEHEIGHT, current_text_x, current_text_y Parsing: words - number of parsed words in input word[] - breakdown of input into words wd[] - breakdown of input into dictionary entries Arguments and expressions: var[] - global and local variables passlocal[] - locals passed to a routine arguments_passed - number of arguments passed ret - return value (from a routine) incdec - amount a value is being incremented or decremented Undo management: undostack[] - for saving undo information undoptr - number of operations undoable undoturn - number of operations for this turn undoinvalid - when 'undo' is invalid undorecord - true when recording undo info ----------------------------------------------------------------- XIV.b. NON-PORTABLE FUNCTIONALITY ----------------------------------------------------------------- The Hugo Engine requires a number of non-portable functions which provide the interface layer between the engine and the operating system on which it is running. 
These functions are: hugo_blockalloc - Large-block malloc() hugo_blockfree - Large-block free() hugo_splitpath - For splitting/combining filename/path hugo_makepath elements as per OS naming conventions hugo_getfilename - Asks the user for a filename hugo_overwrite - Verifies overwrite of a filename hugo_closefiles - fcloseall() or equivalent hugo_getkey - getch() or equivalent hugo_getline - Keyboard line input hugo_waitforkey - Cycles while waiting for a keypress hugo_iskeywaiting - Reports if a keypress is waiting hugo_timewait - Waits for 1/n seconds hugo_init_screen - Performs necessary screen init. hugo_hasgraphics - Returns graphics availability hugo_setgametitle - Sets title of window/screen hugo_cleanup_screen - Performs necessary screen cleanup hugo_clearfullscreen - Clears entire display area hugo_clearwindow - Clears currently defined window hugo_settextmode - Performs necessary text init. hugo_settextwindow - Defines window in display area hugo_settextpos - Sets cursor/text-output position hugo_scrollwindowup - Scrolls currently defined window hugo_font - Sets font for text output hugo_settextcolor - Sets foreground color for text hugo_setbackcolor - Sets background color for text hugo_color - Returns a valid color reference hugo_print - Outputs formatted text hugo_charwidth - Returns width of a given character hugo_textwidth - Returns width of a given string hugo_strlen - strlen() accommodating embedded codes hugo_specialchar - Translation for special characters For elaboration of the intent and implementation of these functions, see heblank.c in the standard source distribution (hugov*_source.tar.gz), or one of the implementations such as hewin.c (in hugov*_win32_source.zip, the Windows 9x/NT source package), hedjgpp.c (in hugov*_32bit_source.zip, the 32-bit DOS package), etc. ----------------------------------------------------------------- XIV.c. 
SAVEFILE FORMAT
-----------------------------------------------------------------

Hugo saves the game state by (among other things) saving the
volatile memory from the start of the object table to the start of
the text bank (i.e., including objects, properties, array data,
and the dictionary).  It does this, however, in a format that only
notes if the data has changed from its initial state.

The structure of a Hugo savefile looks like this:

        0000 - 0001     ID (assigned by compiler at compile-time)
        0002 - 0009     Serial number

        000A - 0209     All variables (global and local, 256*2 bytes)

        020A - 0C09     Undo data (256*5*2 bytes, assuming a MAXUNDO
                        depth of 256 operations)
        0C0A - 0C0B     undoptr
        0C0C - 0C0D     undoturn
        0C0E            undoinvalid
        0C0F            undorecord

        0C10 -          Object table to text bank (see below)

In saving from the object table up to the start of the text bank,
the engine performs a comparison of the original gamefile against
in-memory data (which may have changed).

If a given byte 'n' is non-zero, it indicates that the next 'n'
sequential bytes are identical between the gamefile and the memory
image being saved.

If 'n' is 0, the byte that follows it gives the value from the
memory image.  (Although it takes 2 bytes to represent a single
changed byte, the position within both the gamefile and the memory
image only increases by 1.)

The practical implementation of the Hugo savefile format is found
in RunSave() and RunRestore() in herun.c.


-----------------------------------------------------------------
XV.  DARK SECRETS OF THE HUGO DEBUGGER
-----------------------------------------------------------------

The Hugo Debugger is basically a modified build of the Hugo
Engine; the two share the same core code for program execution,
but the debugger wraps it in a calling framework that allows the
user (or the debugger itself) to control--i.e., start, stop, or
step through--execution.

The key difference with the debugger build of the engine is in
RunRoutine(), which in the debugger looks more like this:

               ...
                |
                |
        +--------------+     +------------+
        | RunRoutine() |---->| Debugger() |  (if debugger_interrupt
        |   herun.c    |     |    hd.c    |   is non-false)
        +--------------+     +------------+
                |
                |
               ...

The debugger build contains a global flag called
debugger_interrupt; if this flag is non-false, RunRoutine() is
interrupted before executing the next instruction.

The Debugger() function is responsible for switching to and
updating the debugger display.  Debugger() is also the hub for any
debugger functions initiated by the user, such as setting
breakpoints, setting watch expressions, changing values, moving
objects, etc.

The debugger controls program execution by returning from
Debugger() to RunRoutine().  If debugger_interrupt is true, only
the current instruction will execute, then control will pass back
to Debugger() (i.e., stepping).  In order to resume free
execution, Debugger() returns with debugger_interrupt set to
false.

A number of other variables in the debugger influence program
execution in addition to debugger_interrupt:

        debugger_run        - true when engine is running freely
        debugger_collapsing - true when collapsing the call
        debugger_step_over  - true if stepping over (i.e., same-
                              level stepping)
        debugger_skip       - true if skipping next instruction
        debugger_finish     - true if finishing current routine
        debugger_step_back  - true if stepping backward
        step_nest           - for stepping over nested calls (i.e.,
                              with debugger_step_over)


-----------------------------------------------------------------
XV.a.  DEBUGGER EXPRESSION EVALUATION
-----------------------------------------------------------------

The debugger must evaluate expressions in several contexts,
including when solving watch expressions and when changing an
existing value.  (In-debugger expression management is contained
primarily in hdval.c.)

In order to do this, the debugger includes a minimal version of
the compiler's expression parser.  It parses a user-supplied
expression in the function ParseExpression().
What ParseExpression() does is to essentially compile that expression, storing the result in the debug workspace in the array table. (The address of the debug workspace--256 bytes after any user- defined array storage--is found in the header in .HDX files.) After writing the expression, the debugger can then set codeptr to the start of the debug workspace, then call the engine's SetupExpr() and EvalExpr() functions as it would to evaluate any other expression. ----------------------------------------------------------------- XV.b. THE .HDX FILE FORMAT ----------------------------------------------------------------- The .HDX file format for Hugo debuggable executables, as well as having some additional information in the header (see the section above on The Header) and a 256 byte workspace reserved at the end of the array table, appends symbolic debugging data as follows: Object names - For each object: 1 byte giving the length, followed by the name as a string # of properties - 2 bytes Property names - For each property: 1 byte (length), then the name # of attributes - 2 bytes Attribute names - For each attribute: 1 byte (length), then the name # of aliases - 2 bytes Alias names - For each alias: 1 byte (length), then the name, then two bytes for the association # of routines - 2 bytes Routine names - For each routine: 1 byte (length), then the name # of events - 2 bytes Event data - 4 bytes for each--2 bytes for the parent; 2 bytes for the address # of arrays - 2 bytes Array data - For each array: 1 byte for the name length, followed by the name, followed by 2 bytes for the address (Note that it isn't necessary to store the total number of objects, since that is already available at the start of the normal object table.) 
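
As an illustration only, the layout of the appended .HDX symbolic
data can be walked with a short reader such as the Python sketch
below.  It is not taken from the Hugo source.  The function names
(read_word, read_name, read_names, parse_hdx_symbols) are
hypothetical, the starting offset and the object count are assumed
to be supplied by the caller, two-byte values are assumed to be in
low-byte/high-byte order, and names are assumed to be stored as
plain, unencrypted text.

```python
def read_word(data, pos):
    """Read a 2-byte value (low-byte/high-byte order is assumed)."""
    return data[pos] | (data[pos + 1] << 8), pos + 2


def read_name(data, pos):
    """Read a 1-byte length followed by that many name characters."""
    length = data[pos]
    start = pos + 1
    # Assumption: names are plain single-byte text, not encrypted.
    return data[start:start + length].decode("latin-1"), start + length


def read_names(data, pos, count):
    """Read 'count' consecutive length-prefixed names."""
    names = []
    for _ in range(count):
        name, pos = read_name(data, pos)
        names.append(name)
    return names, pos


def parse_hdx_symbols(data, pos, object_count):
    """Walk the symbolic data in the order listed in section XV.b.

    'pos' is the offset where the symbolic data begins and
    'object_count' comes from the start of the object table; both
    are supplied by the caller in this sketch.
    """
    symbols = {}

    # Object names carry no count of their own.
    symbols["objects"], pos = read_names(data, pos, object_count)

    # Properties and attributes: a 2-byte count, then the names.
    for key in ("properties", "attributes"):
        count, pos = read_word(data, pos)
        symbols[key], pos = read_names(data, pos, count)

    # Aliases: each name is followed by a 2-byte association.
    count, pos = read_word(data, pos)
    aliases = []
    for _ in range(count):
        name, pos = read_name(data, pos)
        association, pos = read_word(data, pos)
        aliases.append((name, association))
    symbols["aliases"] = aliases

    # Routines: a 2-byte count, then the names.
    count, pos = read_word(data, pos)
    symbols["routines"], pos = read_names(data, pos, count)

    # Events: 4 bytes each (2 for the parent, 2 for the address).
    count, pos = read_word(data, pos)
    events = []
    for _ in range(count):
        parent, pos = read_word(data, pos)
        address, pos = read_word(data, pos)
        events.append((parent, address))
    symbols["events"] = events

    # Arrays: each name is followed by a 2-byte address.
    count, pos = read_word(data, pos)
    arrays = []
    for _ in range(count):
        name, pos = read_name(data, pos)
        address, pos = read_word(data, pos)
        arrays.append((name, address))
    symbols["arrays"] = arrays

    return symbols, pos
```

The function returns both the collected names and the offset just
past the last array entry, so a caller can check that the whole
block was consumed.  No global variable names are read here,
because the list in this section does not include them.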
----------------------------------------------------------------- APPENDIX A: CODE PATTERNS ----------------------------------------------------------------- What follows is a detailed breakdown of how the set of valid tokens in Hugo is encoded and read within compiled code. Tokens simply marked TOKEN are coded just as the byte value of the token in question; no other formatting or necessary token/value is required to follow. These are typically used for delimitation, signalling the end of a structure or structure component, etc. STATEMENTS are those tokens that are read by the engine as some sort of operation--typically, these are "start of line" tokens, with some exceptions. VALUES return an integer value to the engine within the context of an expression. See Section III.b on Data Types, which describes all the valid types of values. INTERNAL tokens never appear in source code. These are added by the compiler for use by the engine. A "code block" is any executable statement(s) followed by a terminating $0D ('}'). Constructions may include "expressions" or "values"; the difference between the two is that values are expected to be discrete data types. Note also that GetVal() in "heexpr.c" allows a solvable expression bracketed by $01 ('(') and $02 (')') to be treated as a discrete value. "Source:" references point to places in the Hugo C source code that may help to clarify how a particular construction is coded/interpreted. While not specifically mentioned, the compiling of many tokens is localized in CodeLine() in "hccode.c", and the execution of many simple statements is localized in RunRoutine() in "herun.c". The reading of values from data types or expressions begins with GetValue() in "heexpr.c", with the basic identification of values in GetVal(). 01 ( TOKEN 02 ) TOKEN 03 . 
TOKEN 04 : reserved (not coded) 05 = TOKEN 06 - TOKEN 07 + TOKEN 08 * TOKEN 09 / TOKEN 0A | TOKEN 0B ; TOKEN 0C { TOKEN 0D } TOKEN (Signifies the end of a code block) 0E [ TOKEN 0F ] TOKEN 10 # TOKEN 11 ~ TOKEN 12 >= TOKEN 13 <= TOKEN 14 ~= TOKEN 15 & TOKEN 16 > TOKEN 17 < TOKEN 18 if STATEMENT 18 4C As in: if {...} Where the two bytes of are the absolute distance--in low-byte/high-byte order-- from the first byte of the pair to the next line of code that will execute if evaluates to false, i.e., the distance to . If evaluates to a non- false value, is run. Note that $4C indicates end-of-line. is simply a TOKENized representation of the expression as it appears in the source line. Source: hccode.c - CodeIf() herun.c - RunIf() 19 , TOKEN 1A else STATEMENT 1A As in: else {...} Where runs only if no immediately preceding if or elseif condition has been met. If a previous condition has been met, control passes ahead to , i.e., forward the number of bytes given by the two bytes of . Source: hccode.c - CodeLine() herun.c - RunIf() 1B elseif STATEMENT 1B 4C As in: elseif {...} See 'if'. Source: hccode.c - CodeIf() herun.c - RunIf() 1C while STATEMENT : 1C 4C 25 As in: while {...} As long as evaluates to a non-false value, is run. Note the implicit 'jump' ($25) coded by the compiler to maintain the loop-- is only an address; only the two-byte address following $25 is written as a jump-back point. See 'if'. (NOTE: Because the is written as a two-byte indexed address, it must begin on an address boundary, padded with up to three $00 values, if necessary.) Source: hccode.c - CodeWhile() herun.c - RunIf() 1D do STATEMENT 1D : 1C 4C As in: do {...} while If, after executes, evaluates to a non-false value, the engine returns to (which must begin on an address boundary). The two bytes following 'while' ($1C) match the syntax of the normal WHILE loop, but are undefined for this usage. 
Instead, the distance to the next statement is given after the 'do' token ($1D) in the two bytes of . Source: hccode.c - CodeDo() herun.c - RunDo() 1E select STATEMENT 1E When encountered by the engine, resets the conditional-statement evaluator, i.e., so that the next 'case' conditional is treated as an 'if' instead of an 'elseif'. Note that the variable that follows 'select' in a line of source code is not coded here (but it is needed by the compiler to construct subsequent 'case' statements). See 'case'. Source: hccode.c - CodeSelect() herun.c - RunIf() 1F case STATEMENT Treated identically by the engine to 'elseif' once a 'select' token ($1E) has reset the conditional- statement evaluator to no previous matches. In other words, what the compiler does is take select case case ... case else and restructure it into 1F 05 4C 1F 05 4C 1A Note that $1A is the 'else' token, $05 is the '=' token, and that the two bytes of give the distance to the next 'case'. Source: hccode.c - CodeSelect() herun.c - RunIf() 20 for STATEMENT : 20 4C 25 As in: for (; ; ) {...} The , if given in the source code, is coded as a regular executable assignment of some data type. Again, nothing is explicitly coded at --it is simply a reference point for the 'jump' ($25) to return to. The 'for' ($20) line operates as a regular conditional test (see 'if'). The is appended after the conditional block is coded. This, like the is simply a regular executable assignment. Source: hccode.c - CodeFor() herun.c - RunIf() 21 return STATEMENT 21 4C As in "return ". Where is optional, so that a standalone 'return' order can be coded as: 21 4C 22 break STATEMENT 22 23 and TOKEN 24 or TOKEN 25 jump STATEMENT 25
As in: "jump