Throughout this documentation, written text may have several meanings which must be carefully separated. In order to do so, some graphical conventions will be used:
The vertical bar ( | ) will denote a choice to be made among a finite number of options, like in for|if|while.
The type of an atomic entity is defined by a basic type and, optionally, by a validation function. Nonatomic types are defined by the way other types are grouped together.
The basic built-in atomic types do not need declaration. They are:
Note that no reference to actual size is being made. The programmer must know whether 32-, 64- or 128-bit integers can be used, but the source need not.
Nonatomic types group together other types in various ways. The supported layouts are:
The type string is a shorthand for sequence of char. The type fixedstring(number) is a shorthand for array(number) of char.
A user-defined type is a refinement of one of the above types, defined through a validation function. Such a function is defined using the keyword type or reftype as a routine type. It may have side effects, and must return an atomic value.
A variable x has the user-defined type mytype if:
- a routine called mytype can be called and is declared using the keyword type or reftype, and mytype(x) is a non-zero atom; or
- a record of the type mytype has been declared, x has exactly as many fields as listed in the declaration of mytype, and each field of x has the expected type.
Please note that variables carry user-defined type information, but values don't. This subtle difference will surface in 5.3.2 (assigning elements to nonatoms and tracking their types).
The type of a member of a record may be specified using the context information of the other members of the record.
To do this, the member must be declared as "_" in the record declaration. Later, the type declaration has the following form:
[global ]type|reftype record_name.member_name(parameters)
...
end type|reftype
parameters has one or two parameters (see 8.1 below).
To refer to members of the structure, use the syntax .membername.
Example:
record stringWithIndex(
string s,_ index)
--type of this member will be defined later as stringWithIndex.index
end record
...some possibly unrelated code ...
type stringWithIndex.index(integer i)
return i>=0 and i<=length(.s)
end type
A reftype might have been used as well.
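For readers who want to experiment with the idea outside OpenEuphoria, here is a minimal Python sketch of a member whose validity depends on a sibling member. The class StringWithIndex and the helper check_index are names made up for this illustration; they only mimic the record and the deferred type declaration shown above.

```python
from dataclasses import dataclass


def check_index(record, i):
    """Validation function for the index member; it may look at the sibling member s."""
    return isinstance(i, int) and 0 <= i <= len(record.s)


@dataclass
class StringWithIndex:
    s: str
    index: int

    def __post_init__(self):
        # The check runs once all members are known, like the deferred type declaration.
        if not check_index(self, self.index):
            raise TypeError("index does not fit the string it refers to")


ok = StringWithIndex("abc", 3)   # accepted: 0 <= 3 <= length("abc")
```

Building StringWithIndex("abc", 4) raises TypeError in this sketch, which plays the role of the failed type check.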
When a variable is going to be assigned a value, it may be checked that the variable can hold such a value. To determine this, the type or reftype function attached to the variable is called, and the check succeeds if it returns True. A failed type check causes an exception. The handler for this exception may take corrective action or just let the running program abort.
When the "with typecheck" directive is in force, this process is performed on every assignment. As this impacts performance, systematic type checking may be turned off. For obvious reasons, type checking will still take place at a few implementation specific places.
When systematic type checking is turned off, you may wish to keep some control over which type checks are performed, because you need the checking - for instance, when the type functions have side effects.
To this end, you can add the keyword check as a prefix to a (ref)type. Types thus earmarked are always checked, even when type checking is off.
You can further restrict checking to some variables by creating two twin types, one without the check prefix and one wrapping the latter, but with the check prefix.
You can further fine tune the checking by having user defined types that return True (check passed) unless some condition is met, in which case some real action takes place instead.
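The interplay between systematic checking and the check prefix can be modelled in a few lines of Python. This is only a sketch under stated assumptions: TypedVariable, TypeCheckError, the typecheck argument and the always_check flag are hypothetical names standing for a variable, the exception, the "with typecheck" directive and the check prefix respectively.

```python
class TypeCheckError(Exception):
    """Raised when a value fails the validation function of a variable."""


class TypedVariable:
    def __init__(self, validator, always_check=False):
        self.validator = validator        # plays the role of the type function
        self.always_check = always_check  # plays the role of the check prefix
        self.value = None

    def assign(self, value, typecheck=True):
        # The validation function runs when checking is on, or when the type is earmarked.
        if (typecheck or self.always_check) and not self.validator(value):
            raise TypeCheckError(f"{value!r} rejected by the type function")
        self.value = value


def is_positive(x):
    return x > 0


plain = TypedVariable(is_positive)
plain.assign(-1, typecheck=False)      # accepted: systematic checking is off

guarded = TypedVariable(is_positive, always_check=True)
guarded.assign(5, typecheck=False)     # still checked, and accepted
```

Calling guarded.assign(-1, typecheck=False) raises TypeCheckError, because the earmarked type is checked regardless of the global setting.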
You can give alternate names to types. This may enhance code readability, as the same type may have several interpretations in the same program. But it is mostly needed to call type checking functions for types with a compound name, like sequence of integer. You can do this using the statement:
type | reftype altname is aliased
This creates an alias and a function. The alias altname can be used wherever aliased could be used. The function altname() checks whether its argument belongs to the type aliased; it is a type or a reftype according to the statement used.
An identifier is a string of consecutive letters, digits and underscores, starting with a letter. What a letter precisely means depends on implementation, but always includes the ranges 'a'-'z' and 'A'-'Z'.
Lower and upper case letters are different; contrary to some other languages, OpenEuphoria is case sensitive, which means that the exact spelling of an identifier is taken into account.
So, "var" and "VaR" are different, valid identifiers, while "_top" and "2read" are not valid. "Ý3z2" may or may not be valid, according to implementation specific rules.
A quoted character is a single (double-byte) character inside single quotes, or an escape sequence, also inside single quotes. Supported escape sequences are:
\' single quote
\" double quote
\n newline (ASCII 10) (you cannot use \N)
\r carriage return (ASCII 13)
\t tab (ASCII 9)
\\ backslash
\(number) the character with ASCII/Unicode code number.
Text is anything between double quotes. It is not processed at all, except for escape sequence resolution.
Verbatim text is enclosed between matching groups of three consecutive double quotes ("""). In verbatim mode, any character, including whitespace, is considered as part of the string. There is no special meaning for the backslash character, and there is no escape sequence processing as a result.
There is also an intermediate long text mode. Strings in this mode are enclosed between matching $" and "$. line_end characters '\r' and '\n' are ignored in long text mode, but other characters, including escape sequences, are treated as in normal text mode.
Example 1:
"This is a very long string which "&
"has to be broken for readability reasons."
could be written as:
$"This is a very long string which
has to be broken for readability reasons."$
Example 2:
"This is a very long string which \n"& "has to be broken for readability reasons."
could be written as:
"""This is a very long string which
has to be broken for readability reasons."""
They fall into three categories:
The number must be in decimal digits.
Internally, OpenEuphoria performs as many automatic conversions as it can, taking advantage of available hardware, to minimize memory usage by numerical items, while retaining the precision of these numbers.
Additionally, underscores may be freely used inside numbers to enhance readability. An underscore is not a real digit, since it must not start a number.
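The underscore rule can be sketched in Python as follows. The function parse_decimal is a hypothetical helper, not part of OpenEuphoria, and it covers decimal integers only; whether a trailing underscore is acceptable is not stated above, so this sketch simply tolerates it.

```python
def parse_decimal(text):
    """Parse a decimal integer literal that may contain underscores."""
    if not text or text[0] == "_":
        raise ValueError("a number must start with a digit")
    digits = text.replace("_", "")      # underscores carry no value
    if not all(c in "0123456789" for c in digits):
        raise ValueError("only decimal digits and underscores are allowed")
    return int(digits)


print(parse_decimal("1_000_000"))  # 1000000
```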
Support for fractional numbers, which allow exact computations using the four elementary operations, is to be included.
Comments may appear at the end of any physical line of a source file. If the line is otherwise empty, the comment may start the line.
A comment starts with the characters "--" and extends to the physical end of line (the next line_end).
OpenEuphoria does not process comments in any way. The facility is provided in order to document your code so that others, or possibly yourself, find understanding the code a relatively easy task, so that it can be maintained or upgraded fairly easily. Time spent commenting code will often bring a large reward in terms of cuts in maintenance and debugging time, if nothing else.
Precise, concise, relevant, useful commenting is an obscure art that may make the difference between ordinary and outstanding coders.
They are defined by the use of infix or prefix operators, as opposed to routine calls, which use prefix identifiers acting on a list of arguments enclosed between parentheses.
Remember that an infix notation is one that goes in between its operands (like the usual multiplication), while a prefix notation appears before its operands.
They are:
+ addition of numbers
- subtraction of numbers
* multiplication of numbers
/ division of numbers
& concatenation of sequences
&& bitwise and
|| bitwise or
^ binary inversion
~~ bitwise xor
<< binary left shift
>> binary right shift
>>> binary signed right shift
If the left operand of one of the above operators is not atomic while the right operand is, the operation will be performed on each of the elements of the left operand. So, adding 1 to an array means adding 1 to each array element.
If both operands are nonatomic and have the same length, the operation is performed on each pair of matching elements in turn. So, {3,5}+{2,-4}={3+2,5-4}.
Contrary to Euphoria, this scheme does not automatically extend to logical operators. The "with seq_compat" directive turns the legacy behaviour on and off at will.
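The element-wise extension of arithmetic operators can be modelled in Python as shown here. This is a sketch, not the interpreter's algorithm: apply_op is a made-up name, Python lists stand for nonatoms, and recursing into nested lists is an assumption. The case of an atom on the left and a nonatom on the right is not described above, so the sketch refuses it.

```python
import operator


def apply_op(op, left, right):
    """Apply a binary operator, extending it element-wise to lists as described above."""
    if isinstance(left, list) and isinstance(right, list):
        if len(left) != len(right):
            raise ValueError("operands must have the same length")
        return [apply_op(op, a, b) for a, b in zip(left, right)]
    if isinstance(left, list):
        return [apply_op(op, a, right) for a in left]
    if isinstance(right, list):
        # Atom on the left, nonatom on the right: not covered by the text above.
        raise NotImplementedError("atom op nonatom is not described here")
    return op(left, right)


print(apply_op(operator.add, [3, 5], [2, -4]))  # [5, 1]
print(apply_op(operator.add, [1, 2, 3], 1))     # [2, 3, 4]
```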
When more than two operators appear in a row, without parentheses separating them, there is a choice to be made: which operation to perform first? This is an important question, since the results may differ.
There is a predefined set of rules to help the OpenEuphoria interpreter make a reasonable guess. The rules may be overridden using parentheses to force another evaluation order.
Here is the chart of operator precedence:
highest precedence: routine calls
                    unary +/-
                    bit-level operations
                    * /
lowest precedence:  &
Routine calls are evaluated first, then parenthesized expressions, starting at the deepest nesting level of them. The order of evaluation of items of the same precedence is undefined.
Thus, for instance, 3+2*4 is the same as 3+(2*4). To perform the addition first, code (3+2)*4.
Also, if the function f sets x to 3 whatever its argument, x+f(x) is 3+3=6 regardless of what x is.
The construct {{expression}} creates an object of non atomic datatype whose first element is the first expression in the list and so on. {} denotes an empty nonatomic object.
For this purpose, records are ordered in the way their elements were declared in the record definition of their type.
"" is equivalent to an empty string.
Single elements of nonatomic objects (or nonatoms in the sequel) are accessed using an index enclosed between square brackets, as in: ThisList[4]. Note that any nonatom, even the returned value from a function, may be indexed.
Records have named parts, or members, which are used to access them using the syntax: record name.member name, like in: ThisCustomer.name.
Since record fields are declared in an ordered way, records also support indexed accessing: the index "n" then refers to the n-th field in the declaring enumeration.
Indexes may be negative, in which case the elements are counted backwards. So, ThisList[-1] is the last element of ThisList, ThisList[-2] the second last and so on.
You can use floating point numbers as indexes. They are rounded to the next integer downward before any further processing. So, s[-0.3] is s[-1].
0 is never a valid standalone index. See section 3.5 below for valid uses of 0 in index specifications.
Indexes whose absolute value is greater than one plus the length of the container they index always cause an exception. 0 and +/-(length(container)+1) are only allowed when specifying an empty slice (see 3.5.2 below).
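The rules for standalone indexes (1-based, negative values counted from the end, fractional values rounded downward, 0 rejected) can be summed up in a short Python sketch. normalize_index is a hypothetical helper written for this illustration only.

```python
import math


def normalize_index(index, length):
    """Convert a 1-based, possibly negative or fractional index to the range 1..length."""
    i = math.floor(index)          # fractional indexes are rounded downward
    if i < 0:
        i = length + 1 + i         # -1 is the last element, -2 the second last
    if i < 1 or i > length:
        raise IndexError(f"index {index} is not a valid standalone index")
    return i


print(normalize_index(-0.3, 5))  # 5, since -0.3 rounds down to -1
print(normalize_index(-2, 5))    # 4
```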
Accessing several elements in a row is possible, and is done through slices. A slice is a comma separated list of indexes and ranges. A range is specified as lower..upper, where lower and upper are the lower and upper desired index values. Obviously, the latter is not less than the former, after conversion to positive standard indexes.
So, the statement:
NewList=ThisList[3,1..-4,-2,3..6]
generates a list formed of elements of ThisList, in the following way:
NewList[1] is ThisList[3]
NewList[2] is ThisList[1]
NewList[3] is ThisList[2]
...
NewList[-5] is ThisList[-2]
NewList[-4] is ThisList[3]
...
NewList[-1] is ThisList[6]
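To see how such a comma separated slice unfolds, here is a small Python model. It is a sketch only: take and to_positive are made-up names, a range is written as a (lower, upper) tuple, and the indexes are assumed to be in range (no validation is done). A list of length 10 is assumed for ThisList.

```python
def to_positive(index, length):
    """Map a 1-based index, negative or not, to its positive 1-based form."""
    return length + 1 + index if index < 0 else index


def take(seq, *parts):
    """Build a list from single indexes (int) and inclusive ranges ((lower, upper) tuples)."""
    n = len(seq)
    result = []
    for part in parts:
        if isinstance(part, tuple):
            lower, upper = (to_positive(p, n) for p in part)
            result.extend(seq[lower - 1:upper])      # inclusive 1-based range
        else:
            result.append(seq[to_positive(part, n) - 1])
    return result


this_list = list(range(101, 111))                     # ThisList[k] holds 100+k
new_list = take(this_list, 3, (1, -4), -2, (3, 6))    # ThisList[3,1..-4,-2,3..6]
print(new_list)
# [103, 101, 102, 103, 104, 105, 106, 107, 109, 103, 104, 105, 106]
```

With this choice of values, new_list[-5] in Python is 109, the second last element of this_list, as stated in the table above.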
Of course, if an element of a non-atom is a non-atom itself, several square bracketed index specifications may follow one another, like in: mymatrix[1..3][4]. This is a sequence of length three, exactly {mymatrix[1][4],mymatrix[2][4],mymatrix[3][4]}.
For records, use names rather than indexes, even though they are just as valid ways to access record parts. For instance, the following
CustList[27..41][name,zipcode,nbOrders]
will generate a sequence of data extracted from a 15-element subsequence of CustList starting with the 27-th. We assumed that CustList is a sequence of Customers, which are records the declaration of which involves members named name, nbOrders and zipcode. The statement above generates a sequence since the type of all its elements is the same. Each of its elements is a sequence (of object) of length 3, since it is quite likely formed by a string, another string and an integer.
name has a rank in the enumeration of fields that build the Customer type. If that rank is 3, you could code
CustList[27..41][3,zipcode,nbOrders]
with exactly the same meaning as above. However, if that rank changes in future versions of your program, the "3" index will have to be changed to its new value, while the field name would remain the same. This is why using names is recommended over using indexes when possible.
[..] and [] are shorthands for [1..-1]. [n..] is a shorthand for [n..-1]. The word end may be used as the last element of a sequence, synonym with -1 in this context only. The same is to be said of the $ sign.
You may specify empty slices when they are made of ranges the upper value of which is exactly one less than the lower value. One of the index values must be valid though.
Thus, s[2..1], s[4..-5] and s[1..0] are all empty objects, assuming length(s)=7 so that -5 reads as 3. The last example is the only valid use of 0 in indexes; any other situation causes an exception. In the same vein, s[2..3,1..0] has length 2, while s[3..2,1..0] is empty.
But s[13..12] causes an exception, since both index values are way out of range.
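The empty slice rules can be checked with a small Python model. This is a sketch under assumptions: slice_range is a hypothetical helper handling one lower..upper range with integer bounds, and it treats a range as empty only when upper is exactly lower-1 and at least one bound is a valid index.

```python
def slice_range(seq, lower, upper):
    """Return seq[lower..upper] with 1-based inclusive bounds, allowing empty slices."""
    n = len(seq)
    lo = n + 1 + lower if lower < 0 else lower
    hi = n + 1 + upper if upper < 0 else upper
    lo_valid = 1 <= lo <= n
    hi_valid = 1 <= hi <= n
    if lo_valid and hi_valid and lo <= hi:
        return seq[lo - 1:hi]
    if hi == lo - 1 and (lo_valid or hi_valid):
        return []                  # empty slice: upper is exactly one less than lower
    raise IndexError(f"invalid slice {lower}..{upper} for length {n}")


s = list("abcdefg")                # length(s)=7
print(slice_range(s, 2, 1))        # []
print(slice_range(s, 4, -5))       # []
print(slice_range(s, 1, 0))        # []
```

In this model slice_range(s, 13, 12) raises IndexError, since neither bound is a valid index.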
end and $ are again synonyms for "the last element of", or -1.
If s has nonatomic type and if t has the format described below, s[[t]] is a valid syntax for a part of s with variable index depth. This is especially handy for tree management. s[[t]] is s[t[1]][t[2]]...[t[length(t)]].
Each t[i] must be a sequence made of atoms and sequences of length 1 or 2. Atoms are converted to sequences of length 1, and both specify single indexes. Sequences of length 2 stand for slices in an obvious lower..upper way.
For instance, assume t={2,{-1,{3,4}},{{1},4}}. Then
s[[t]]=s[2][-1,3..4][1..4]
This sequence has three nonatomic elements, each of which is of length 4. Each of them consists of the 4 first elements of elements of s[2]. The first of these elements is the last of s[2]; the second and third are respectively third and fourth in s[2]. Note that the last {1} could be written 1 as well.
The sequence t is said to be a subexpression representation for s[[t]] relative to s.
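For the simplest case, where every t[i] is a single index, variable depth indexing amounts to following a path into nested containers. The Python sketch below covers only that restricted case (no slices); get_path is a hypothetical name and nested lists stand for nonatoms.

```python
def get_path(tree, path):
    """Follow a list of 1-based indexes (negative ones count from the end) into nested lists."""
    node = tree
    for index in path:
        if index == 0 or abs(index) > len(node):
            raise IndexError(f"index {index} out of range")
        node = node[index - 1] if index > 0 else node[index]
    return node


tree = [1, [2, [3, 4]], 5]
print(get_path(tree, [2, 2, -1]))  # 4, that is tree[2][2][-1] in 1-based notation
```

Because the path is ordinary data, it can be computed, stored and reused, which is what makes this form convenient for tree management.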
Additionally, note that a list of indexes may come from a sequence through desequencing (see 5.5.2 below), so that s[#(t)] stands for s[t[1],t[2],...,t[-1]].
Manipulating nonatoms is always necessary once they are created and populated. While these manipulations can be done through a limited set of operations and routines stored in external files, this implies loss of performance and frequent reinventing of the wheel. For these reasons, OpenEuphoria provides quite a few built-in handling routines for nonatoms.
Functions are provided in order to know how many, and which, elements are in the nonatom:
If what is an atom, match returns as find would.
i such that some slice of target starting at i equals what. Always returns the empty sequence if what is the empty sequence.
Elements may be added to nonatoms as single objects or sequences, at any position in the sequence.
Here is a list of available routines:
If s1 and s2 are nonatoms, s1 & s2 is a sequence whose length is the sum of the lengths of s1 and s2. The first length(s1) elements of s1 & s2 are those of s1, followed by those of s2.
s,x)
x to s as its last element. Its length is length(s)+1 whatever x is.
s,x)
x to s as its first element. Its length is length(s)+1 whatever x is.
Three functions are provided:
remove(target,places) returns the sequence target from which the elements whose index belongs to the sequence of integers places were removed, notionally starting from the last. places is assumed to be sorted in ascending order; strange, but sometimes desired results might happen otherwise.
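The "notionally starting from the last" rule is what keeps the remaining positions meaningful while elements disappear. A Python sketch of the idea follows; it is not the built-in itself. Positions are 1-based, and places is assumed to be sorted in ascending order as required above.

```python
def remove(target, places):
    """Return a copy of target without the elements at the 1-based positions in places."""
    result = list(target)
    for place in reversed(places):   # start from the last so earlier positions stay valid
        del result[place - 1]
    return result


print(remove([10, 20, 30, 40, 50], [2, 4]))  # [10, 30, 50]
```

Deleting in ascending order instead would shift the later elements first, so position 4 would no longer designate the value 40.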
replace(target,places,items) returns a nonatom of the same type as target. It is obtained by removing from target the slices specified in places and replacing the carved out slices by elements of the sequence of nonatoms items, inserted as sequences. Each element of the sequence places is a pair of integers, the lower and upper bounds of each slice to process. items must have the same length as places, as each slice specified in places is replaced by the element at the same position in items.
If places has the form {i1,i2}, where i1 and i2 are integers, this is converted to {{i1,i2}} first. If places is just an integer i, this is changed to {{i,i}}.
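The behaviour of replace, including the two shorthand forms of places, can be modelled as follows. This Python sketch rests on several assumptions: slices are 1-based, inclusive, non-overlapping and listed in ascending order; items is always a list of replacement lists, even when places is given in a shorthand form; and the result is a plain list.

```python
def replace(target, places, items):
    """Replace each 1-based inclusive slice in places by the matching element of items."""
    if isinstance(places, int):
        places = [[places, places]]          # a single index i becomes {{i,i}}
    elif len(places) == 2 and all(isinstance(p, int) for p in places):
        places = [list(places)]              # {i1,i2} becomes {{i1,i2}}
    if len(items) != len(places):
        raise ValueError("items must have the same length as places")
    result = list(target)
    # Work from the last slice backwards so that earlier bounds remain valid.
    for (lower, upper), item in reversed(list(zip(places, items))):
        result[lower - 1:upper] = item
    return result


print(replace([1, 2, 3, 4, 5, 6], [[2, 3], [5, 5]], [["a"], ["b", "c"]]))
# [1, 'a', 4, 'b', 'c', 6]
```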
The call fit(target,source,padding) causes source to be copied to target even though the lengths may not match. In this case, an ordinary assignment would have raised an error.
If length(target)<=length(source), only the portion of source that fits into target is copied, effectively discarding elements of source with higher indexes.
Otherwise, if padding is a char, the elements of target in excess relative to source's length are replaced by that char. padding may have the special value _, in which case these elements of target remain unchanged.
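The three cases of fit (truncation, padding, leaving the excess untouched) can be sketched in Python. The sentinel KEEP is a made-up stand-in for the special value _, and copying in place into a Python list is an assumption of this sketch.

```python
KEEP = object()   # stands in for the special padding value _


def fit(target, source, padding=KEEP):
    """Copy source into target in place, truncating or padding when lengths differ."""
    n = min(len(target), len(source))
    target[:n] = source[:n]          # only the part of source that fits is copied
    if padding is not KEEP:
        for i in range(n, len(target)):
            target[i] = padding      # pad the positions source could not fill
    return target


print(fit([0, 0, 0], [1, 2, 3, 4, 5]))       # [1, 2, 3]
print(fit(list("xxxxx"), list("ab"), " "))   # ['a', 'b', ' ', ' ', ' ']
print(fit(list("xxxxx"), list("ab")))        # ['a', 'b', 'x', 'x', 'x']
```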
Nonatoms are ordered sets of elements; so, they can be reordered. As the number of permutations on a given number of symbols rapidly increases with that number, it is neither practical nor efficient to directly specify a permutation of a nonatom. However, the following functions cover the most frequent cases and can be combined into any sort of shuffling.
Conditions are made of clauses linked together by logicals. A condition must evaluate to a boolean value of True or False. 0 stands for False, any other atom stands for True.
A truth table is a table assigning a boolean return value to any pair of booleans. To draw truth tables easily, we'll represent True by T and False by F.
and! F ! T !
---+---+---!
 F ! F ! F !   Read: "and" returns False, except when both arguments are True.
---+---+---!   In that case only, it returns True.
 T ! F ! T !
-----------!

or ! F ! T !
---+---+---!
 F ! F ! T !   Read: "or" returns True, except when both arguments are False.
---+---+---!   In that case only, it returns False.
 T ! T ! T !
-----------!

not!   !
---+---!
 F ! T !       Read: "not" returns True if its argument is False, and False
---+---!       otherwise.
 T ! F !
-------!

 = ! F ! T !
---+---+---!
 F ! T ! F !   Read: "=" returns True when its operands have the same boolean
---+---+---!   value; else it returns False. This is the negated truth table of the
 T ! F ! T !   "xor" logical operator, which is not supported for this reason.
-----------!
From close inspection of the tables above, it follows that you need not always compute both arguments of a logical to know its return value; computing the first one is often enough.
This saves useless instruction execution, and may greatly simplify programming. The short-circuit rules are:
- The second argument of "and" is computed if and only if the first argument is True.
- The second argument of "or" is computed if and only if the first one is False.
Note that short-circuit evaluation applies to any use of logicals. This is not true in Euphoria, where it only applies inside the conditions of if, elsif and while statements. The "with RDS" directive turns compatibility mode on and off in this respect as well.
Assume Address is a record type that has a member called name, and that addrbook is a sequence of Address. Then
i=1
while i<=length(addrbook) and addrbook[i].name!="myname" do
i+=1
end while
if i>length(addrbook) then i=0 end if
will scan the address book for a record whose name member is equal to "myname". A value of 0 stands for name not found; else i holds the ordinal of the first occurrence of "myname" in a member of a record in addrbook.
Without short-circuit evaluation, this code would fail if "myname" is not found, because addrbook[length(addrbook)+1] would be evaluated, causing an exception. In such a case, the code would be something like:
found=0
for i=1 to length(addrbook) do
if addrbook[i].name="myname" then
found=i
exit
end if
end for
So, an extra state variable is needed: even if i is available after the end for statement, a maximal value for i may mean that "myname" appeared as the last name or did not appear at all. The found variable is 0 on failure, and else means as above. And what if there was no exit statement?
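Python's and operator short-circuits in the same way, so the address book scan can be reproduced there for experimentation. The names Address and find_name are chosen for this sketch only; positions are reported 1-based, with 0 meaning "not found", to match the text above.

```python
from dataclasses import dataclass


@dataclass
class Address:
    name: str


def find_name(addrbook, wanted):
    """Return the 1-based position of the first record named wanted, or 0 if absent."""
    i = 1
    # The right-hand test is skipped once i runs past the end, so no IndexError occurs.
    while i <= len(addrbook) and addrbook[i - 1].name != wanted:
        i += 1
    return 0 if i > len(addrbook) else i


book = [Address("ann"), Address("bob")]
print(find_name(book, "bob"))  # 2
print(find_name(book, "zed"))  # 0
```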
As routine calls are resolved first, they may affect the variables appearing in a condition.
Further, it may be desirable to record the value of expressions that appeared inside conditions. Because of short-circuit evaluation capabilities of OpenEuphoria, it is not always possible to compute the expressions prior to the condition evaluation, as this may raise exceptions.
To address this situation, you can embed assignments in conditions, using the := form of the assignment operator.
So:
if f0(a)=x:=f(b) and b=y:=g(a) then ...
will result in the following:
- x will always hold the value of f(b), regardless of what happens next.
The x=f(b) assignment might have been taken out of the if statement, for better readability.
- if f0(a)=f(b), y will hold the value of g(a) at the time it was computed,
thus taking into account the possible side effects of f and f0. It is not modified otherwise.
An obvious use of this feature is to know why an if block was entered or not in the case of several clauses in the condition.
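Python offers a comparable construct, the := assignment expression, which makes it easy to try out the behaviour described above. In this sketch, check is a hypothetical helper; f0, f and g are passed in as ordinary functions, and None marks a variable that was never assigned.

```python
def check(a, b, f0, f, g):
    """Evaluate the condition and report which intermediate values were computed."""
    x = y = None
    # x is always bound; y is bound only when the first clause holds (short-circuit).
    entered = f0(a) == (x := f(b)) and b == (y := g(a))
    return entered, x, y


# First clause fails: g is never called, so y keeps its previous value.
print(check(1, 2, lambda a: 5, lambda b: 6, lambda a: 2))  # (False, 6, None)
# First clause holds: y records g(a) even though the whole condition is False.
print(check(1, 2, lambda a: 5, lambda b: 5, lambda a: 3))  # (False, 5, 3)
```

Inspecting x and y afterwards tells which clause made the condition fail.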
Variables are tags that identify data the program where they appear will act upon. These tags are general_identifiers.
A variable has a number of attributes, or attached data that can be retrieved. They are called metadata, and are retrieved using the construct general_identifier@meta.
The available metadata are:
- name
- x@name is "x". Seems redundant, but see 5.4 below.
- assigned
- x@assigned is False if x never was assigned a value, True otherwise.
- value
- x@value holds the contents of x. Valid only if x@assigned is True.
- size
- the number of bytes x occupies in memory. This is mainly useful for interfacing with other languages.
- type
- the routine id of the type checking routine assigned to the variable when it was declared.
- deftype
- the routine id of the common type of all elements of a nonatom. Available for nonatoms only.
- id
- an integer you can use as an alternate way to access x (see 5.4).
- scope
- a value that tells in which part of the program the symbol is defined. See 5.2 below.
- format
- a default format used to display the value of the variable. See format string specification in the entry for printf() in part C.
- decl_mode
- this is True if the variable was declared using new_var(), and False otherwise.
- readonly
- roNo for variables, roYes for locked variables, roConst for constants.
- types
- available for nonatoms only. It is a sequence of integers the length of the nonatom. Each integer is the routine_id of the type function of the matching element.
Only the value and format metadata can be directly changed; the other attributes are read-only, or can be changed only through dedicated routines.
A record of all metadata a single symbol has can be retrieved using the get_meta function. The record has the reserved type SystemVarMeta and has elements with the names and indexes as in the list above. The argument of get_meta is either a double quoted variable name, or an expression evaluating to the id of the variable the metadata of which are requested.
Formats are specified like for printf() use. See the entry for this function in the alphabetical part C. The @format is used only if it has another value than "", which it has by default.
A program is made of a main file (the one you feed the interpreter with) and zero or more auxiliary files. Named scopes inside files may exist (see Chapter 6, "Included files and namespaces".). Both are referred to as abstract files.
A symbol can be visible:
- from more than one abstract file in the program
- from the abstract file it lives in only
- from only part of a single file
As a result, the scope metadata has three possible values: sGlobal, sPublic and sPrivate, respectively.
Symbols that have a different scope coexist together. But, at any given time, only one of them is referred by the name they share. This symbol is said to shadow the others.
Private symbols shadow public symbols, and public symbols shadow global symbols without namespace.
Clash between two symbols sharing the same name and both visible at some point is an error condition, since the interpreter does not know which one the general_identifier designates. Obviously, the error occurs only when the ambiguous symbol is used.
The word "symbol" is purposely used here instead of "variable", because the notions above also apply to routines (see Chapter 8 "Routines").
Types in OpenEuphoria describe logical properties of values a variable may hold. There are four ways to declare a variable, and all but one require an explicit type:
- declaration in a var-decl statement;
- declaration by on-the-fly creation;
- declaration as formal parameter of a routine;
- declaration as a for loop index
Only in the last case explicit typing is absent. But, from the values the three parameters of a for loop have, an integer or atom type is guaranteed.
Formally, there are four sorts of types in OpenEuphoria:
Nonatoms rely on a default type, which is their deftype metadata. Elements of nonatoms may have any type, but they should pass the type checking thus defined. They are registered as having this default type.
The programmer always has the option to specify the type of an element in a nonatom using the cast primitive. When this happens, a type check of the current element using the supplied type is performed, again regardless of current type checking status.
The cast primitive has the following form:
cast(general_identifier,index,type)
This is a procedure call which acts on the nonatom specified as first argument. It sets the type information for element container[index] to type. type is either a type name or the routine id of the type function.
A variable must be declared before being used. There is no exception to this principle but "for" loop indexes.
A variable definition takes the following form:
[global |static ]type {identifier[=value] }
that is, a type name followed by one or more items. These items are either variable names or name=value initialized variables.
The variable's initial value is computed before the variable is created. This allows an identifier to shadow another one while retrieving the shadowed value at initialization time.
The optional global keyword makes the variable(s) visible outside of their current abstract file, giving them a scope metadata of sGlobal. It is not allowed for private routines.
The optional static keyword applies to routine private variables. It makes their values persist between invocations of the routine.
A declaration may appear in any place outside routines or blocks.
Declarations in routines must be grouped right after the routine definition, as in:
function deloddnumbers(sequence of integer s)
integer i=1
sequence s0
--you can't move any of the two lines above past here.
s0=remainder(s,2)
while i<=length(s) do
if s0[i]=1 then
s=remove(s,i)
s0=remove(s0,i) --keep the parity flags aligned with s
else i+=1 --it is easy to forget, but definitely necessary...
end if
end while
return s
end function
Additionally, since routine variables are private, you cannot declare them as global.
Section 5.4 below will show you how to relax the restrictions above.
Constants are identifiers that are assigned a value at initialization time. That value cannot change hereafter. Using constants instead of hard-coded repeated values is recommended for two reasons at least:
VAT_rate may look more self-explanatory, when looking at the program source for maintenance or debugging, than say 0.0825;
Declaring a constant takes the following form, quite similar to a variable declaration:
[global ]constant {[type ]name=value}
Indeed, you can declare any number of constants in a single statement.
This statement may appear everywhere a variable declaration is allowed. Contrary to variables, the type specification is optional, a type of object being assumed if it is not present.
It may happen that a constant is declared with some value even though a constant with the same name and the same value is visible. In this case, the duplicate declaration is ignored, whereas Euphoria throws an error in this situation. Note that a constant defined inside a routine cannot be global.
Attempting to modify the value of a constant will raise an exception. There is no way to change the value of a constant using only OpenEuphoria statements.
Rather than being referred to by its name, a symbol can be accessed through its id metadata. Routines will have routine_id's (see Chapter 8), and variables have variable_id's. One may consider that all variables are named elements of a large sequence, and the variable_id's are indexes into this sequence.
When a variable is destroyed in any fashion, mainly because it is a private, nonstatic variable of a returning routine, its id is not recycled. This guarantees that an id always refers to the same variable or to no variable at all, which will cause an error on assignment.
The built-in function isvarid takes an integer and returns True if this integer is the id of a variable and False otherwise.
Individual elements of nonatoms have variable ids, so that "s[3][5]@id" makes sense and returns an integer you can use as shown below. The id "follows" the element it tags during the transformations of the host array/sequence, so that the returned id may well give you the contents of s[2][7] if some elements were added or removed from s[3] or s.
Five routines are provided to handle variables through their id's:
- id(name) returns the id of the variable whose name name evaluates to. -1 is returned if no such variable exists.
- get_var(id) returns the value of the variable with that id.
- set_var(id,value) sets the value of the variable with that id to value.
- var(id) returns the name of the variable with that id. For elements of nonatoms, their name is returned, or "" if none is applicable.
- analyze_id(id) returns a sequence of object of length 3. The first term is the variable name, or element name, or "" for unnamed nonatom elements. The second element is the index of the element if applicable, or 0 else. The third element is the id of the parent if applicable, or id itself otherwise.
Recursively calling analyze_id will yield the index sequence by which you can access the element with id id deeply nested in a nonatom.
Note that set_var will fail if the symbol with this id is not to be written to ( var(id)@readonly != roNo ).
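The guarantee that ids are never recycled is easy to model. The Python class below is a toy model, not the interpreter's data structure: VarTable is a made-up name, its methods borrow the routine names used in this chapter, and readonly handling and nonatom elements are left out.

```python
import itertools


class VarTable:
    """Variables addressed by ids that are never reused."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._vars = {}                      # id -> [name, value]

    def new_var(self, name, value=None):
        var_id = next(self._next_id)
        self._vars[var_id] = [name, value]
        return var_id

    def del_var(self, var_id):
        del self._vars[var_id]               # the id is retired, never handed out again

    def isvarid(self, var_id):
        return var_id in self._vars

    def get_var(self, var_id):
        return self._vars[var_id][1]

    def set_var(self, var_id, value):
        self._vars[var_id][1] = value        # KeyError if the id refers to no variable

    def var(self, var_id):
        return self._vars[var_id][0]


table = VarTable()
credit = table.new_var("credit", 0)
table.del_var(credit)
other = table.new_var("tmp")
print(other != credit, table.isvarid(credit))  # True False
```

Because the counter only moves forward, a stale id can never silently designate a newer variable.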
Example: assume you have a variable named balance. Its value must be assigned to the variable credit if it is nonnegative, and to the variable debit if it is less than 0. You also want to print a message reflecting what has just been done. The printing format of credits may not be the same as for debits.
A simple solution can be devised using the tools above:
baltype={var_id(credit),var_id(debit)}
...
b_id=baltype[1+(balance<0)]
set_var(b_id,balance)
msg=sprintf("Your %s is " & var(b_id)@format,{var(b_id),get_var(b_id)})
In Euphoria (2.4 and before), you'd have to explicitly write an if statement to perform this admittedly simple task.
Also note that, since variable id's are global, they can be used to access shadowed symbols or static private variables.
It may be useful to create variables in other places than in variable declarations, specially inside routines. This can be done as follows:
id=new_var(type,name,_)
id=new_var(type,name,value)
This is equivalent to saying in the proper place "type name" or "type name=value", and gives you the id for this variable. Note the use of the anonymous placeholder '_' when no initial value is provided.
Also note that variables can be created conditionally using this mechanism. You cannot new_var() a global or static variable. A variable declared in this way is private if it is inside a routine and just public else.
Creating a symbol clashing with an existing one, or accessing an id that does not exist, are error conditions, as might be expected.
As new_var() is primarily intended to create temporary variables, you may remove them once their short life span is over. This can be done as
del_var(id)
For obvious reasons, there are limitations to the use of such a tool:
id=var(id)@id
If the general_identifier of a variable appears on the left side of an assignment symbol (see 5.5.2 below), its value will be subject to change:
- the righthand side of the assignment symbol is evaluated;
- the type function associated to the variable is called, with the resulting value as an argument;
- if the variable can be written to, and if the above call returned a nonzero atom, the value becomes the new value of the variable. Otherwise, an exception is raised.
Otherwise (the identifier does not appear on the left side of an assignment symbol), the value of the variable is substituted for its identifier at run-time.
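The assignment steps above can be modeled outside OpenEuphoria. The Python sketch below is only an illustration of the sequence "evaluate, type-check, then store". The names Variable, TypeCheckError and nonnegative are hypothetical and do not belong to the language.

```python
class TypeCheckError(Exception):
    """Hypothetical stand-in for the exception raised on a failed assignment."""


class Variable:
    def __init__(self, name, type_func, value, readonly=False):
        self.name = name
        self.type_func = type_func  # plays the role of the type routine
        self.value = value
        self.readonly = readonly

    def assign(self, compute_rhs):
        new_value = compute_rhs()        # 1. evaluate the righthand side
        ok = self.type_func(new_value)   # 2. call the type function on the result
        if self.readonly or not ok:      # 3. store only if writable and nonzero
            raise TypeCheckError(self.name)
        self.value = new_value


def nonnegative(x):
    # Returns 1 for nonnegative numbers and 0 otherwise.
    return 1 if isinstance(x, (int, float)) and x >= 0 else 0


credit = Variable("credit", nonnegative, 0)
credit.assign(lambda: 40 + 2)            # accepted: the value becomes 42

try:
    credit.assign(lambda: -5)            # rejected by the type function
    rejected = False
except TypeCheckError:
    rejected = True
```

In this model the old value survives a rejected assignment, which matches the rule that the value only becomes the new value when the check succeeds.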
If a variable is passed by reference to a routine (see 8.3), the routine will modify it only if it can write to it.
A variable may be assigned a value using its id and the set_var() routine, or using an initialization on declaration; but these are by no means the most frequent way of doing it.
There are three ways to assign a value to a variable using assignment operators:
general_identifier assignment expression
#({general_identifiers}) assignment expression
#({general_identifiers})# assignment expression
In the first form, a variable gets (modified by) the value on the right side. The second form allows this to take place on several variables at the same time, so that they are assigned, or modified by, the same value to which the righthand side evaluates.
The third form normally requires the righthand side of the assignment to be a nonatom. The first element of the list on the left side of the second # is assigned the first element on the right side, and so on until one or both sides run out of elements. It could be called "desequencing", as it sends the contents of a sequence to several variables. If the righthand side is an atom, it is treated as a sequence of length 1.
To retrieve only some elements from the righthand side in intermediate position (an element of higher index is retrieved), use the "_" universal placeholder where a variable would be expected. This effectively discards the value that would otherwise be assigned.
As an example, if fIxed is an array of char and seQ is a sequence, you can assign the contents of seQ to fIxed, truncating extra characters if seQ is too long, by coding #(fIxed)#=seQ. If seQ is not long enough, extra characters to the right of seQ are not affected.
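The desequencing rules described above (stop when either side runs out, "_" discards a value, an atom counts as a sequence of length 1) can be sketched in Python. This is a model under stated assumptions, not OpenEuphoria code: the function desequence, the PLACEHOLDER object and the dictionary of results are hypothetical devices used only for illustration.

```python
PLACEHOLDER = object()  # stands for the "_" universal placeholder


def desequence(targets, rhs):
    """Return {target name: value} for a desequencing assignment.

    targets: list of variable names, or PLACEHOLDER to discard a value.
    rhs: a list (nonatom) or a single value (atom, treated as length 1).
    """
    values = rhs if isinstance(rhs, list) else [rhs]
    assigned = {}
    # zip stops as soon as either side runs out of elements.
    for target, value in zip(targets, values):
        if target is not PLACEHOLDER:
            assigned[target] = value
    return assigned


both = desequence(["a", "b", "c"], [1, 2])       # "c" is left untouched
third_only = desequence([PLACEHOLDER, PLACEHOLDER, "z"], [10, 20, 30])
from_atom = desequence(["a", "b"], 7)            # the atom acts as {7}
```

Variables that receive no value simply do not appear in the result, mirroring the statement that elements beyond the shorter side are not affected.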
You can specify an alternate name for an element of a nonatom. This is especially handy when complex index specifications are involved. The available tools are:
name aliased as alias
rename aliased as newalias
unname alias
name supplies an alternate name for an element in an array or sequence.
rename changes an existing alias to another one.
unname makes an alias unavailable.
Aliases, in all this section, are identifiers, while aliased has the form identifier{index specification}. They act exactly as structure members do. As a result, an element keeps its name even if its position in the host sequence changes, as long as it exists.
OpenEuphoria adopted the open philosophy of Euphoria in the sense that a lot of functionality is to be found in libraries rather than in the language itself. The main advantage is that anyone can customize or upgrade routines in the libraries easily - they are plain text source files -, rather than tinker with the OpenEuphoria interpreter/compiler source itself, which may be written in another language. The drawbacks are loss of performance, version conflicts and symbol clashes.
The physical files in which the program looks for such routines and symbols are called included files.
Because symbols from different files may share the same name and be visible from the same location in the program, there must be a way to unambiguously refer to any of them.
Namespaces are the way. They are identifiers that prefix the symbol name. The prefix is separated from the raw symbol name by a colon ':'.
Namespaces apply to global symbols only. By construction, there is only zero or one public symbol and zero or one private symbol to be seen from any given location in the program. However, global symbols do not necessarily harbor explicit namespaces. Global symbols are in the default namespace.
Auxiliary files are made available to the main file using the include statement:
include filename|(expression)
include filename|(expr) as namespace
Remember that a filename is either a string or a parenthesized expression whose run-time value is to be interpreted as a file name. The (generated) string is passed to the operating system as a filename as-is, and must conform to whatever syntax rules the OS enforces, like double quoting long file names with spaces in them.
The simplest form makes global symbols in filename visible to the other files. This may lead to symbol clashes, some of which are caused by files the coder did not write. See sections 6.3 and 7 below.
The second form allows using the prefix namespace: for global symbols in filename. Several filenames may share the same namespace.
When a file is included for the first time, its statements are executed. Other subsequent include statements relative to this file do not trigger this action.
include statements always declare namespaces: the "default" namespace is used even when none is supplied.
The same file may appear with various namespaces in the same physical file. This is not really a feature, but legacy behaviour. On the brighter side, various files may include the same file with different namespaces.
Using a string enclosed between parentheses causes the string to be considered as an expression, the evaluation of which is used as a filename.
Namespaces are a way for a given file to refer to symbols in another given file. As a result, a namespace is known only in the file where it appears after an as keyword. So, there are only two sorts of symbol clashes:
- clashes between symbols sharing the same explicit namespace: the coder is responsible for them and must alter his/her own code to set things right;
- clashes between symbols without namespaces may originate from files the coder did not write. And (s)he included them in order not to rewrite them. Tools are provided for the coder to manage such conflicts between external libraries.
Because of the somewhat undiscriminating nature of the include statement, which has symbols appearing in two namespaces when one was specified, and which acts in the same way upon all symbols in the included file, another construct is needed to get a more controllable behaviour. Changing the rules for include would most likely break too much Euphoria code.
The statement
import filename|(expression) as namespace
makes the symbols of filename appear in the namespace namespace. The symbols are not visible in the default: namespace, contrary to what the include statement does.
A string immediately following import and enclosed in parentheses is an expression that must evaluate to a string. That string is then processed as a filename, just as it would be for an include statement.
Thus, import (misc.oe) as msc will look for a record called misc, with a member named oe, or a sequence misc with a named element oe. If this can be found and holds a string, this string is the filename to be imported.
But import misc.oe as msc will look for a file called misc.oe and will make its global symbols visible in the namespace msc only.
Because it is sometimes convenient or useful to use global symbols without using prefixes, it is possible to select symbols to be promoted to the default (unprefixed) namespace.
The supported syntaxes are:
promote "{identifier}" from namespace --grant unprefixed access to symbols explicitly specified in the list.
promote identifier from namespace --identifier is assumed to be a sequence of strings, each of them being the name of a promoted symbol as above.
promote _ from namespace --promotes all symbols from namespace.
promote but list from namespace --promote all symbols but the supplied exclusion list. The list may have either of the first two forms above.
Promoted symbols can then be accessed as if the file they come from had been included using the longer form of the include statement.
Promoted symbols can be demoted, which means they still exist in their source namespace, but no longer in default. The following syntaxes are supported:
demote "{identifier}" [from namespace]
demote identifier [from namespace]
demote _ from namespace
demote but "{identifier}" from namespace
demote but identifier from namespace
These forms drop unprefixed access for the symbols listed, in a way almost symmetric to how promote adds them.
Why "almost"? Because the first two forms do not need to specify namespaces.
Indeed, there is normally one symbol of each name in the default namespace, and there is no ambiguity in the command given, hence no systematic need for the extra argument.
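The effect of promote and demote on the default namespace can be pictured with two dictionaries. The Python sketch below is a rough model only: the namespace msc, its three symbols and the helper functions are hypothetical, and real symbol resolution involves more than a dictionary lookup.

```python
namespaces = {"msc": {"lower": 1, "upper": 2, "trim": 3}}  # hypothetical content
default = {}  # models the default (unprefixed) namespace


def promote(names, namespace):
    """names is a list of symbol names, or None to model the '_' form."""
    source = namespaces[namespace]
    for name in (list(source) if names is None else names):
        default[name] = source[name]


def promote_but(excluded, namespace):
    """Promote every symbol of the namespace except the excluded ones."""
    kept = [n for n in namespaces[namespace] if n not in excluded]
    promote(kept, namespace)


def demote(names):
    for name in names:
        default.pop(name, None)  # the symbol stays in its source namespace


promote_but(["trim"], "msc")
after_promote = sorted(default)
demote(["upper"])
after_demote = sorted(default)
```

Note that demote needs no namespace argument in this model, for the reason given above: a name identifies at most one symbol in the default namespace.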
The construct
[global ]scope identifier
... some code ...
end scope
lets you pretend that the enclosed code comes from an external file, whose name does not matter. Any code that may appear at some position in a source file may appear in a scope block at the same position.
A global symbol inside a named scope can be used:
- using the identifier: prefix, as if it came from another file;
- without prefix; this requires the use statement (see 7.3 below).
A scope declared as global can be considered as an external file, and is assumed to have been included using the include statement, with the scope name as a namespace. For this reason, namespaces are said to be associated to abstract files, and not only to physical files.
Unnamed scopes may appear in routines and, in this case, follow the same rules as code in routines.
This is especially handy when a small part of a routine needs a few variables not needed elsewhere in the routine. For clarity of code, putting them inside a scope block helps separate them from the ones most likely to be used.
The global symbols of a named scope can be accessed without any prefixing by issuing use scopename.
The use statement is always local, which means that its effects stop at the end of execution of code in the scope or routine it appears in.
A routine is a piece of code which can be called and eventually returns. The first phrase means that control may be transferred to the first executable statement of the routine. This is not necessarily the first general-code statement, since variable initializations are executable statements. The second phrase means that, when the routine is finished with its work, it returns control to the statement following the calling statement.
On return, a routine may provide a value, be it an atom or not. If the routine returns anything, the routine call is evaluated to the returned value.
There are several keywords to define routines, because they have different histories and roles to play.
The definition of a routine involves:
- attributes, which may be check, global or forward;
- a routine type keyword, chosen in the following list:
- routine
- procedure
- function
- type
- reftype
- handler
- an identifier, which must not clash with a variable name;
- a pair of parentheses enclosing a possibly empty list of formal parameters.
Section 8.3 below describes formal parameters of routines.
routine is the generic word designating a piece of code which has its own variables, can be called and must return, unless it terminates some process.
A procedure is a routine which does not return any value.
A function is a routine which must return a value.
A type is a special sort of function. It must return a boolean, and takes exactly one argument. Specifying a type routine defines a user-defined type with the same name.
A reftype is a special sort of function. It must return a boolean, and takes exactly two arguments: an integer, which is the id of the variable to be assigned, and the value to be assigned to it. Specifying a reftype routine defines a user-defined type of the same name.
A handler is a routine designed to handle events. These events are triggered at run-time, and handlers are not primarily meant to be called explicitly, even though they can as any routine. The argument list of a handler is described in 11.2 below.
Additionally, when a variable is assigned, the (ref)type function associated to it is executed prior to the assignment, and an exception is raised if it returns false (see Chapter 1, "Types", for details). For this reason, (ref)types may be considered as a hybrid between functions and handlers.
Sometimes, it makes the code clearer to use a routine even though it was not defined yet. In order to do this, you can use the forward attribute, followed by the routine definition.
When time has come to code the routine statements, just issue the statement routine type routine name, and go ahead with the statements in the routine. The full definition of the routine is already known, so that this shorter form is enough. You still can give the full definition again, but an error will occur if there is a mismatch between the two definitions.
Obviously, if a routine is declared forward and no flesh is added to this bone, an exception will occur at the end of the parsing of the source file.
An explicit routine call is made of:
- the routine name;
- a pair of parentheses enclosing a list of values, called arguments of the call.
The list of values should conform to the list of types specified in the routine definition. For instance, if foo was defined as routine foo(integer i,string s), then a call to foo must be like foo(expr1,expr2). expr1 is checked to be an integer, and expr2 is checked to be a string. If one of the checks fails, or if there are not exactly two arguments, an exception will occur.
If the routine is to return a value, and if this value is to be used, then one of the syntaxes below must be used:
general_identifier assignment foo(i,s)
#({general_identifiers}) assignment foo(i,s)
#({general_identifiers})# assignment foo(i,s)
Here, assignment stands for the equal sign preceded with any operator listed in 3.1.
These forms are only special cases of corresponding forms of assignment shown in section 5.5.2 above, and don't need much more commenting, as a routine call is just another expression.
You may ignore the value returned by a function by desequencing it to the empty list, as in #{}#=foo(i,s). #(_)#=foo(i,s) would do as well.
To signal that a routine must terminate and return control to the statement logically following the routine call, use the return statement. return by itself just terminates the routine; return expr does so and returns the value of expr.
The concept of "statement logically following the call" is as follows:
- if no value is returned, the statement logically following the call is the statement physically following the call;
- if a value is returned and the routine call is not part of a compound expression, the statement following the call is the assignment of the returned value;
- if the call belongs to a compound expression, the statement logically following the call is the next step in evaluating this expression.
If a routine does not have an explicit return statement, return is assumed right before the end mark of the routine.
The resume statement asks the routine to terminate and reexecute the statement which triggered the routine. For this reason, this statement is meant for exception handlers and, normally, should not be executed when the routine is called explicitly. There is no point in returning a value here, so specifying a value to return is not supported.
Formal parameters are characterised by three properties:
- a passing mode
- a type
- a name
An argument supplied for a formal parameter is either a single variable name, a constant or a more complex expression like "x+1". Here, "variable name" extends to whatever might have a variable id, like a named sequence element or record member.
An argument which is not a variable name can only be passed by value.
However, when an argument is a variable name, two actions can be performed. Proceeding as above is the first option; the alternative is to let the variable be temporarily aliased by the associated formal parameter in the routine body.
The first method is called "pass by value", and is the only method explicitly used by Euphoria. It is the default passing mode in OpenEuphoria.
The second method is called "pass by reference", and must be explicitly enabled in the formal parameter specification.
To allow passing by reference of a formal parameter, prefix it by the update keyword in the routine definition. When calling a routine, if the n-th argument is an expression other than a variable name and is supposed to be passed by reference, it will be passed by value instead. You can use the keyword byval just before the expression to emphasize that the effect of the update keyword is temporarily suppressed.
When calling a routine, if the n-th argument is a variable name and is supposed to be passed by reference, it must be preceded by one of the keywords byref or byval; otherwise an exception will occur. The variable will be passed by reference if byref is used, and passed by value if byval is used.
Remember that, when a variable is passed by reference, any modification made by the called routine to the formal parameter to which this variable is mapped by the routine call is reflected to this variable. When passed by value, no modification is reflected, since the routine operates only on a local copy of the variable.
The type of a parameter is explicitly stated or not. In the latter case, the type of the previous parameter is assumed. Hence, the first formal parameter in a routine definition must have an explicit type.
Specifying a formal parameter as array[(size)] or sequence indicates that no specific type is expected for the elements of the nonatom passed. You may give a size to an array or just leave it out.
forward function foo(string s,byref result,integer i)
...
seq="heLlo, woRld!"
...
x=foo(seq,append(seq,x),3)
--the second parameter will be passed by value regardless of the update keyword,
--as nothing can be mapped to this compound expression.
x=foo(length(s),s,j)
--ERROR: length(s) is an integer, and the first parameter
--is a string.
x=foo(seq,seq,0)
--this one is correct
...
routine foo
--nothing more to say, here comes the beef.
...
result=lower(seq)
--after the correct call to foo, seq is "hello, world!",
--since result is the second parameter of foo, is passed by reference and
--seq is a variable name passed to foo as second argument, so that result
--aliases it.
Variables declared inside a routine are private and shadow any existing symbol with the same name. Formal parameters of routines have the same behaviour.
A routine can access by name any public or global variable in scope at the time of the call, as well as its own private variables. A private variable cannot be accessed by name outside of its routine.
On return from a routine, all its private variables cease to exist, except those declared as static.
Explicit invocation of a routine takes the following form: the routine name, a left parenthesis, the possibly empty, comma separated list of arguments, and finally a right parenthesis.
For instance, i=find("myself",someSequence) is a routine call with two arguments. The first argument is "myself" and the second one is someSequence. It is a function-like call, since a value is retrieved from the called routine on return.
The arguments must match the formal parameters in number and type. Failure to do so will raise an "ArgError" exception.
You can use a sequence to represent several consecutive arguments in a routine call. To do so, the sequence must be prefixed by the # desequencing sign, and be enclosed in parentheses to avoid any risk of being mistaken for a hex number.
Thus, if you want to issue the call foo(1,2,3,0), and you have a nonatom fooArgs at hand with the value {1,2,3}, you can issue foo(#(fooArgs),0) with precisely the same effect, which is to call the foo routine with the four arguments 1, 2, 3 and 0.
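Readers familiar with other languages may recognise this as argument unpacking. As a point of comparison only, the Python sketch below uses the star operator to the same effect. The function foo is a hypothetical four-argument routine, and the analogy says nothing about how OpenEuphoria would implement desequencing.

```python
def foo(a, b, c, d):
    # Hypothetical four-argument routine used only for illustration.
    return (a, b, c, d)


foo_args = [1, 2, 3]

direct = foo(1, 2, 3, 0)
spliced = foo(*foo_args, 0)  # comparable to foo(#(fooArgs),0)
```

Both calls hand the same four arguments to foo, so direct and spliced hold equal values.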
Just like variables, routines have an index called routine_id. Five functions are provided to manage dynamic calls:
- routine_id(expr) returns the id of the routine designated by the value of expr. If no such routine exists, -1 is returned.
- get_name(expr) returns the name of the routine whose id expr evaluates to. If no such routine exists, the empty string "" is returned.
- call_routine(expr,list) calls the routine whose id is given by the value of expr with the argument list list. In particular, if the routine called does not take arguments, list should be {}.
- call_proc(expr,list) acts as call_routine, but applies only to procedures. Provided for compatibility only.
- call_func(expr,list) acts as call_routine, but applies only to function-like calls. Provided for compatibility only.
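The behaviour of these functions can be pictured as a lookup table from ids to routines. The Python sketch below is a model under assumptions: the register decorator is hypothetical, and numbering ids from 0 in definition order is a choice made for the sketch, not something the text above specifies.

```python
_routines = []  # the position in this list plays the role of the routine id


def register(func):
    """Hypothetical helper: record a routine so that it gets an id."""
    _routines.append(func)
    return func


def routine_id(name):
    for rid, func in enumerate(_routines):
        if func.__name__ == name:
            return rid
    return -1  # no such routine


def get_name(rid):
    if 0 <= rid < len(_routines):
        return _routines[rid].__name__
    return ""  # no such routine


def call_routine(rid, args):
    # args is the argument list; use [] for a routine without arguments.
    return _routines[rid](*args)


@register
def add(x, y):
    return x + y


rid = routine_id("add")
total = call_routine(rid, [2, 3])
```

Because the caller only needs an integer, the routine to call can be chosen at run-time, for instance by indexing a sequence of ids.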
Built-in routines, even though they are defined in no file, have routine id's which can be retrieved as for any user-defined routine.
Global routines may be accessed by name when the abstract file they are defined in is included by other files. Any routine can be accessed through its routine id.
OpenEuphoria provides quite a few built-in routines, like the length() function (counts the number of elements), the integer() type (returns True on integers and False otherwise) and so on. They are treated as global symbols, and also have a namespace of their own, called builtin.
Defining a routine with the same name as an existing one in a given namespace (including default: or builtin:) generates a warning and shadows the preexisting routine with the newer one.
The @ construct used for variables (see 6.1) is also available for routines, with a few restrictions however because some of them don't always make sense. The available metadata for routines are:
- name: the name of the routine.
- type: a small integer representing the keyword used to define the routine. The recommended mapping is, in order:
  - routine
  - function
  - procedure
  - type
  - reftype
  - handler
- id: the routine id of the visible routine with the given name.
- format: available for types and reftypes only. Default format to be used to print variables of this type.
- types: a possibly empty array of integers, which are the routine_ids of the types of the formal parameters, in the order that they were enumerated at definition time.
- scope: sGlobal or sPublic, as for variables.
Unless stated otherwise, the meanings of the same metadata for routines and variables are very similar. The get_meta function is also available for routines (see 5.1) exactly as for variables, except that it returns a record of the reserved type SystemRtMeta, with names and positions as in the list above.
Code blocks are code between a blocktype statement and the matching end blocktype statement.
Blocks are nested, which means that the order of the blocktypes and the order of the end blocktypes must be exactly the reverse of each other. Failure to do so causes an irrecoverable syntax error.
The statement label identifier may appear just before a code block and uniquely identifies it. The identifier can be used in instructions which control code block execution (see 9.4 and 9.5 below).
Remember that this block may have a substructure:
if cond then
general-code
[elsif cond then general-code]
[else general-code]
end if
This will execute some general-code depending on the values of the cond expressions, according to the following rules:
- Each cond is evaluated until one of them is true or all of them are false.
- If one of the cond evaluates to True, the next general-code statements are executed until the next elsif, else or end if is found. Execution then resumes right after the end if statement closing the if block.
- Otherwise, the code following the else statement is executed if there is such a statement. Execution then resumes right after the closing end if statement.
When several courses of action might be taken according to the value of some expression, you can always stack a few elsif statements inside an if block. However, it may not be the clearest way to code this sort of situation, and this is why an alternative construct is provided. Also, you may want to take several branches in succession, which the if statement does not allow.
The structure of a select block is as follows:
select expr
case statement
[otherwise general code]
end select
and a case statement is one of the following:
case expr:
general code
case rel_op expr:
general code
case expr thru expr:
general code
case condition:
general code
The expression following the keyword select is evaluated, and this value is called the selector of the block. Decisions will be made according to its value. Each branch of the decision tree is represented by a case statement; an otherwise branch may be there as well.
The keyword case may be followed by four different types of items:
- a single expression, whose value is computed and matched against the selector. The general code that follows is executed if and only if the two values were equal.
- a relational operator, followed by an expression. As above, the expression is evaluated. The general code executes if and only if the condition "selector rel_op expr value" is true. A relational operator of '=' may be omitted, since it leads to the case above.
- two expressions separated by the keyword thru. The two expressions are evaluated; the general code executes if and only if the selector is inside the closed interval those two values bound.
- a condition involving the symbol "_": the condition is evaluated as for an if statement, except that the "_" symbol stands for the selector. The general code executes if and only if the condition evaluates to True.
Each case statement starts with (a simplified form of) a conditional clause. The code following a case statement is executed whenever the corresponding condition is true. After that, if the block was not exited, the next case statement is inspected.
The process goes on until one of three mutually exclusive situations happens:
- the select block is exited using the break statement. No more case statements are processed, and execution resumes right after the end select statement.
- the end select statement is reached: no action is taken, and the block is exited.
- the otherwise statement is reached: see section below.
This statement is optional, and is allowed only inside a select block. It may appear at most once in a block.
If it is reached, and one of the branches of the select block was taken, the block is exited; otherwise, execution continues past the otherwise statement.
Allowed only inside a case branch, it causes that branch to be exited and the next case condition to be tested. If there is none left, the select block is exited.
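The control flow of a select block differs from a chain of elsif tests, since several branches may run in succession. The Python sketch below models the rules given above for case, break and otherwise. It is an illustration only: run_select, note and demo are hypothetical names, each case is represented by a (condition, action) pair, and the otherwise branch is assumed to come last, as in the syntax shown earlier.

```python
def run_select(selector, cases, otherwise=None):
    """Model of a select block.

    cases: list of (condition, action) pairs, in source order.
    condition(selector) returns a bool; action() returns "break" to
    model a break statement, anything else to let the next case be tested.
    """
    taken = False
    for condition, action in cases:
        if condition(selector):
            taken = True
            if action() == "break":
                return  # break: resume right after 'end select'
    if otherwise is not None and not taken:
        otherwise()  # otherwise reached with no branch taken


def note(log, text, then_break=False):
    """Build an action that records text, then optionally breaks."""
    def action():
        log.append(text)
        return "break" if then_break else None
    return action


def demo(selector):
    log = []
    cases = [
        (lambda s: s == 5, note(log, "equals 5")),                # single expression
        (lambda s: s > 3, note(log, "above 3")),                  # relational operator
        (lambda s: 1 <= s <= 10, note(log, "1 thru 10", True)),   # thru, then break
        (lambda s: s % 5 == 0, note(log, "multiple of 5")),       # condition on "_"
    ]
    run_select(selector, cases, otherwise=note(log, "otherwise"))
    return log
```

With these cases, a selector of 5 runs the first three branches and then breaks, while a selector of 20 runs the second and fourth branches and skips the otherwise code because a branch was taken.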
The complete syntax is as follows:
for identifier=start value to end value [by increment] do
general code
end for
When the for statement is reached from outside the block, start value, end value and increment are computed. If the "by" clause is not present, the increment is set to 1. They all must evaluate to an atom. These values are not computed again during the subsequent loop iterations.
The loop index variable identifier must not have been declared, and is assigned the start value. It cannot be modified.
If (start value-end value)*increment is greater than zero, no iteration is performed and the loop is exited, as there is no way for the index variable to get closer to the end value. Otherwise, the first iteration starts.
If the for statement is reached from inside the loop and the index is not between the start and end values, the loop is exited without any further iteration, and execution resumes right after the end for statement. Otherwise, a new iteration starts. Once the loop is exited, the loop index variable remains available until the next for statement using the same identifier as its index variable.
When the end for statement is reached, the index is incremented by the increment and control is transferred to the for statement.
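The iteration rules above can be checked against a small model. The Python sketch below lists the values the index would take. The function name for_indices is hypothetical, and the guard against a zero increment is an addition of the sketch, since the text does not say what happens in that case.

```python
def for_indices(start, end, increment=1):
    """Return the successive index values of a for loop, as a list."""
    if increment == 0:
        # Guard added by this sketch: the loop would never terminate.
        raise ValueError("increment must not be zero")
    # start, end and increment are evaluated once, before the first iteration.
    if (start - end) * increment > 0:
        return []  # the index can never get closer to the end value
    low, high = min(start, end), max(start, end)
    values = []
    index = start
    while low <= index <= high:   # test made when the for statement is reached
        values.append(index)
        index += increment        # done when 'end for' is reached
    return values


ascending = for_indices(1, 5)
descending = for_indices(5, 1, -2)
skipped = for_indices(5, 1)
```

The third call performs no iteration, because with the default increment of 1 the index would only move away from the end value.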
while cond do
general-code
end while
Executes an iteration of the loop if cond is true. Otherwise transfers control right after the end while statement.
The end while statement transfers control back to the while statement, so that cond is evaluated again.
The complete syntax is as follows:
wfor identifier=start value to end value [by increment] do
general code
end wfor
This loop is a sort of hybrid between a for and a while loop, hence the wfor name.
If the by clause is not present, the increment is set to 1. These values are computed whenever the wfor statement is executed, and always must evaluate to atoms.
The loop index variable identifier must have been declared. It is an ordinary variable which may be assigned inside the loop.
If the index is not between the start and end values, the loop is exited without any (further) iteration. Execution resumes right after the end wfor statement. Otherwise, a new iteration starts.
When the end wfor statement is reached, the index is incremented by the increment and control is transferred to the wfor statement.
Exiting a block means that the next executed statement is the one following the end blocktype statement which ends the block.
A code block will be said to be "active" relative to this statement if it contains the statement.
The exit statement can be used to exit a loop, the exif statement can be used to exit an if block, and the break statement can be used to exit a select block.
They all have an optional argument. If they don't, the current relevant block is exited. Otherwise, the specified block (see below) is exited.
The optional argument of an exiting keyword is either a number or an identifier. The phrase "relevant block" translates as "loop block" when referring to an exit statement, as "if block" when referring to an exif statement and as "select block" when a break statement is involved.
If the argument is an identifier, it must be a label tagging an active relevant block. Labels are set using the label statement described in 9.1 above. Failing this consistency criterion raises an exception. Otherwise, the block tagged by this label is exited.
If the argument is an integer greater than zero, this number is the number of relevant blocks nesting the current one that must be exited. Thus exit 1 means "exit the active loop above the current one", exit 2 exits the loop above the one above the current one, and so on.
If the number is negative, then the active relevant blocks above the current one are counted backwards from the topmost one to determine the block to be exited. Thus, exit -1 means "exit the topmost active loop block", exit -2 means "exit the active loop just below the topmost one", and so on.
An argument of 0 is ignored, as it would only emphasize that the current relevant block is to be exited.
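The way the optional argument selects a block can be summarised as a lookup in the stack of active relevant blocks. The Python sketch below is a model only: block_to_exit is a hypothetical name, blocks are represented by their labels, and raising LookupError for an out-of-range number is an assumption of the sketch, since the text above only specifies an exception for a label that does not tag an active relevant block.

```python
def block_to_exit(active_blocks, arg=0):
    """Pick the block designated by the argument of exit, exif or break.

    active_blocks: labels of the active relevant blocks, outermost
    (topmost) first and current block last.
    arg: 0 (or omitted), a positive or negative integer, or a label.
    """
    if isinstance(arg, str):
        if arg not in active_blocks:
            raise LookupError(arg)  # label does not tag an active relevant block
        return arg
    current = len(active_blocks) - 1
    if arg > 0:
        position = current - arg    # arg levels above the current block
    elif arg < 0:
        position = -arg - 1         # counted from the topmost block
    else:
        position = current          # 0: the current block itself
    if not 0 <= position <= current:
        raise LookupError(arg)      # assumption: out-of-range is an error
    return active_blocks[position]


loops = ["outer", "middle", "inner"]
```

With three nested loops, an argument of 1 and an argument of -2 both designate the middle loop, while 2 and -1 both designate the outer one.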
A loop iteration can be stopped at any point during its execution using the keywords next or retry. These keywords accept the same optional argument as exit.
The next statement causes a new iteration of the loop to occur. This means that control is transferred to the opening statement of the loop, causing index update in for or wfor loops, and condition evaluation in a while loop.
The retry statement causes the current iteration of the loop to start again. This means that control is transferred to the first statement inside the loop block. Thus, the index of a for or wfor loop is not updated, and the condition of a while loop is not evaluated.
A run-time debugger makes it extremely easy to debug a program, much easier at least than scattering a few print() statements and having to guess what is going wrong in program flow, variable assignments and other issues.
The integrated debugger is enabled by the with trace statement, and completely turned off by the without trace statement. This default behaviour saves execution time.
If the debugger is enabled, you start it by the trace(1) statement, and turn it off from the running program by the trace(0) statement.
A command/status line will also be provided, as the debugger may process user input (see 10.2 below) and display some information to the user.
The main debugger screen shows about 15 lines of code, highlighting the one to be executed. This line will remain near the middle of the screen most of the time, so that some code before and after it can always be seen. It will be called the active line. Another line may be highlighted in some other way, and will be called the spot line.
Another part of the screen is reserved to show the values of the most recently accessed variables. These values are updated as source statements are executed.
The debugger must be implemented in such a way that it will not trace itself, nor trace events it may (cause to) trigger.
The following actions should be requested using one-key keyboard shortcuts:
- toggle display between debugger screen and running application (recommended: F2);
- stop tracing and go (recommended: q);
- quit program and debugger altogether (recommended: Q);
- see more code upstream (recommended: PageUp for one page, UpArrow for one line);
- see more code downstream (recommended: PageDn for one page, DnArrow for one line);
- set the spot line as next statement to execute (recommended: F5);
- restore display of active line (recommended: End);
- execute active statement (recommended: Enter);
- toggle breakpoint at spot line (recommended: F8);
- show more of a large variable value in a text box (recommended: s);
- revert to normal display, closing the variable display box (recommended: S);
- reinitialize program and restart debugging from scratch (recommended: Home).
Scrolling through code using the mouse buttons, movements or wheel actions is to be provided.
Rather than immediate actions, the following are commands aimed at inducing specific behaviour from the debugger, or at setting up some trace scheduling.
The b command allows you to enter a conditional expression. This expression must be a valid OpenEuphoria condition. This condition sets up a dynamic breakpoint, which is triggered any time the condition is true. The expression may use any variable in scope at the time it is defined. Whenever one of these variables gets out of scope (for instance, returning from a routine), the dynamic breakpoint is disabled.
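The life cycle of such a dynamic breakpoint can be sketched in Python. The class name DynamicBreakpoint, the representation of a scope as a dictionary and the explicit list of variable names are assumptions made for illustration; the specification does not prescribe any of them.

```python
class DynamicBreakpoint:
    """Model of a conditional breakpoint tied to the variables it uses."""

    def __init__(self, condition, variables):
        self.condition = condition          # callable taking the scope dict
        self.variables = tuple(variables)   # names the condition depends on
        self.enabled = True

    def check(self, scope):
        """Return True when the breakpoint triggers in the given scope."""
        if not self.enabled:
            return False
        # A variable going out of scope disables the breakpoint for good.
        if any(name not in scope for name in self.variables):
            self.enabled = False
            return False
        return bool(self.condition(scope))


bp = DynamicBreakpoint(lambda scope: scope["i"] > 2, ["i"])
results = [
    bp.check({"i": 1}),   # condition false
    bp.check({"i": 3}),   # condition true: breakpoint triggers
    bp.check({}),         # i left scope: breakpoint is disabled
    bp.check({"i": 5}),   # stays disabled even if a new i appears
]
```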
This breakpoint is independent from the static breakpoint that F8 toggles on and off.
The ? command allows you to enter a valid OpenEuphoria expression. This expression will be treated as a most recently modified variable and displayed as such.
The s command will prompt you to enter an OpenEuphoria expression. If this expression is not among the displayed variables, it is added as the ? command would add it. Moreover, a text box will open up and display a good deal of the expression value, considerably more than ? would have allowed. The S command (10.1) closes the box.
The status line referred to in 10.1.1 will display information about the state of the static and dynamic breakpoints, as well as an indication of which one was last triggered.
Exceptions are situations which most likely come from an error. Exceptions will cause default or user-defined routines - handlers actually - to be executed when available. This way, the program knows that something possibly went wrong and may take corrective action as needed to avoid or soften the crash.
Events are actions taken by the machine code being executed. Trapping them, also using handlers, allows the program to be informed of what is going on. Such hooks are of obvious use for debugging or profiling purposes, but they may serve many more useful programming needs as well.
Because the same mechanism is used in both contexts, the term event will be used to refer to both exceptions and program events indifferently.
A handler is installed using the statement set_handler(expr, expr). The first expr must resolve to the routine id of a handler. The second expr must resolve to a string representing an event name.
get_handler(expr), where expr resolves to an event name, returns the id of the handler for this event.
A handler is called with five parameters:
The following table lists the events and exceptions that call a handler, the parameter they pass as a last argument and the default handler action.
Name | Last parameter | Default action |
AfterAssign | {} | does nothing |
AfterCall | argument list | does nothing |
AfterIndex | {} | does nothing |
AfterRead | value | returns the value |
AfterReturn | {value}, or {} if none | does nothing |
AfterWarning | {warning text,warning code} | does nothing |
ArgError | {argument #,value} | aborts |
BeforeAssign | value | calls type checking code, possibly issuing TypeError, or does nothing |
BeforeCall | argument list | checks types, possibly issuing ArgError |
BeforeExecute | statement text | does nothing |
BeforeIndex | {} | converts indices to positive, possibly issuing IndexBounds |
BeforeRead | {} | does nothing |
BeforeReturn | {value}, or {} if none | does nothing |
BeforeWarning | {warning text,warning code} | displays the warning |
IndexBounds | value | aborts |
RaisedError | error message | prints the line # and the supplied error message, then aborts |
RuntimeError | {statement text,error code} | aborts |
SyntaxError | statement text | aborts |
TypeError | value | aborts |
UnknownToken | statement text | aborts |
ZeroDivide | {} | aborts |
An assignment is preceded by a BeforeAssign and followed by an AfterAssign.
Reading a variable is preceded by a BeforeRead and followed by an AfterRead.
Calling a routine is preceded by a BeforeCall in the current scope and followed by an AfterCall in the routine scope.
Returning from a routine is preceded by a BeforeReturn in the routine scope and followed by an AfterReturn in the calling statement's scope.
Accessing an element in a nonatom is preceded by a BeforeIndex and followed by an AfterIndex.
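The bracketing of operations by Before and After events can be sketched as follows. This Python model is illustrative only: the class TracedVariables and its log are hypothetical, and the payloads follow the "Last parameter" column of the table above, with an empty tuple standing for {}.

```python
class TracedVariables:
    """Variable store that records the events surrounding each access."""

    def __init__(self):
        self.values = {}
        self.log = []   # (event name, last parameter) pairs, in firing order

    def _fire(self, event, payload):
        self.log.append((event, payload))

    def assign(self, name, value):
        self._fire("BeforeAssign", value)   # type checking would happen here
        self.values[name] = value
        self._fire("AfterAssign", ())

    def read(self, name):
        self._fire("BeforeRead", ())
        value = self.values[name]
        self._fire("AfterRead", value)      # default action returns the value
        return value


store = TracedVariables()
store.assign("x", 7)
result = store.read("x")
event_names = [event for event, _ in store.log]
```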
UnknownToken is issued by the interpreter when it does not recognise a token.
RuntimeError is issued for a variety of reasons.
ZeroDivide is called when a division by zero happens.
RaisedError is called only by the "error" primitive.
This procedure takes a string as its argument, and passes it, as well as the usual four other arguments, to the RaisedError handler. Both second and third arguments are zero.
It may be desirable, when a resume or return instruction is executed, to execute a dynamically generated statement after leaving the handler, but before the standard action is taken. The argument for resume_execute() and return_execute() is an expression which is fed to execute() at the appropriate time.
Example: assume that a string is being scanned, and its length may vary in the process. A while or wfor loop may do the trick, except that, after almost any statement modifying the scanning index or the length of the string, a check must be performed to avoid an index out of bounds exception.
A clean solution then is to instruct the relevant handler to quietly exit the loop whenever this condition happens.
Thus, one may code:
IOBhandler=get_handler("IndexBounds")
handler IndexBounds(integer varid,sequence subex,integer indexid)
    if varid=scanned@id then return_execute("exit")
    --when scanned is subscripted with an out-of-bounds index, just exit
    else call_proc(IOBhandler,{varid,subex,indexid})
    --otherwise, chain to previous handler
    end if
end handler
--now the loop
i=1
while i<=length(scanned) do
    ....
    --code that no longer needs repeated checks like
    --"if i>length(scanned) then exit end if"
    ...
end while
set_handler("IndexBounds",IOBhandler) --restore previous handler
The code inside the loop got rid of repeated checks and is clearer and leaner as a result. There is hardly any performance loss, since the handler is invoked only on an error condition.
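The save, override, chain and restore pattern used in this example can be modelled in Python. Everything here is an assumption made for illustration: the dictionary registry, the two-parameter handler signature, the helper checked_get, and the ExitLoop exception that stands in for return_execute("exit").

```python
class ExitLoop(Exception):
    """Python stand-in for return_execute("exit")."""


handlers = {}


def set_handler(event, handler):
    handlers[event] = handler


def get_handler(event):
    return handlers[event]


def default_index_bounds(var_name, index):
    # Default action for IndexBounds: abort.
    raise IndexError("%s[%d] is out of bounds" % (var_name, index))


set_handler("IndexBounds", default_index_bounds)


def checked_get(var_name, seq, index):
    """1-based element access that raises the IndexBounds event."""
    if 1 <= index <= len(seq):
        return seq[index - 1]
    return get_handler("IndexBounds")(var_name, index)


def scan(text):
    previous = get_handler("IndexBounds")       # save previous handler

    def quiet(var_name, index):
        if var_name == "scanned":
            raise ExitLoop                      # quietly leave the loop
        return previous(var_name, index)        # otherwise, chain

    set_handler("IndexBounds", quiet)
    collected = []
    i = 1
    try:
        while True:                             # no explicit bounds check
            collected.append(checked_get("scanned", text, i))
            i += 1
    except ExitLoop:
        pass
    finally:
        set_handler("IndexBounds", previous)    # restore previous handler
    return "".join(collected)
```

In this model the loop body contains no bounds test at all. The handler is reached only once, when the index runs past the end, and any other variable still gets the default abort behaviour through chaining.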
Object oriented programming, or OOP, is not directly built into OpenEuphoria. External libraries will get notifications of OOP syntactic constructs and will have to implement these constructs.
The OOP library is to be included in the reserved "OO" namespace. Normally, functions of the library are not called directly; the interpreter plugs in the appropriate OO calls, like a preprocessor would.
Action | OE syntax | Translation |
Starts a class definition | class Identifier | constant Identifier=OO:begin_class() |
Ends a class definition | end class | OO:end_class() |
Declares a private part of a class | private do | OO:begin_private() |
Ends a private part of a class | end private | OO:end_private() |
Declares a public part of a class | public do | OO:begin_public() |
Ends a public part of a class | end public | OO:end_public() |
Declares a protected part of a class | protected do | OO:begin_protected() |
Ends a protected part of a class | end protected | OO:end_protected() |
Applies a method to an object | Identifier1->Identifier2({Expr}) | OO:call_method(Identifier1,Identifier2,{{Expr}}) |
Gets a member of a class | Identifier1->Identifier2 | OO:get_member(Identifier1,Identifier2) |
Sets a member of a class | Identifier1->Identifier2=Expr | OO:set_member(Identifier1,Identifier2,Expr) |
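The preprocessor-like rewriting described by this table can be sketched in Python. This is a rough model under assumptions: the function translate is hypothetical, it handles one whole statement per line only, and it reads {Expr} in the method row as "zero or more comma-separated expressions" wrapped in a sequence literal.

```python
import re

# Fixed keyword statements and their translations, taken from the table.
KEYWORDS = {"end class": "OO:end_class()"}
for part in ("private", "public", "protected"):
    KEYWORDS[part + " do"] = "OO:begin_%s()" % part
    KEYWORDS["end " + part] = "OO:end_%s()" % part

CLASS = re.compile(r"^class\s+(\w+)$")
CALL = re.compile(r"^(\w+)->(\w+)\((.*)\)$")
SET = re.compile(r"^(\w+)->(\w+)\s*=\s*(.+)$")
GET = re.compile(r"^(\w+)->(\w+)$")


def translate(line):
    """Rewrite one OE statement into its OO library call, if any."""
    text = line.strip()
    if text in KEYWORDS:
        return KEYWORDS[text]
    match = CLASS.match(text)
    if match:
        return "constant %s=OO:begin_class()" % match.group(1)
    match = CALL.match(text)
    if match:
        return "OO:call_method(%s,%s,{%s})" % match.groups()
    match = SET.match(text)
    if match:
        return "OO:set_member(%s,%s,%s)" % match.groups()
    match = GET.match(text)
    if match:
        return "OO:get_member(%s,%s)" % match.groups()
    return line   # not an OOP construct: left untouched
```

A real interpreter would perform this substitution on parsed tokens rather than on raw text; the regular expressions only serve to make the mapping in the table concrete.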