The Slogan Handbook - Basic Data Types

Data types in Slogan fall into three categories: basic types, composite types and user-defined types. Basic types, the topic of this chapter, include numbers, characters, strings, symbols and booleans. Composite types are more complex because they are formed by combining values of several simpler types. Arrays, pairs, lists, hash tables, sets and records are examples of composite types. Slogan also allows the definition of new types that conform to user-specified interfaces.

4.1 Numbers

Slogan classifies numbers as integers, rationals, reals and complex numbers. This classification is hierarchical: all integers are rational, all rational numbers are real, and all real numbers are complex. Orthogonal to these categories, a number is also either exact or inexact. In most cases, a computation that involves an inexact number produces an inexact result. One exception to this rule is multiplying an inexact number by the exact 0, which produces an exact result. Operations that mathematically produce irrational numbers for some rational arguments (e.g., sqrt) may produce inexact results even for exact arguments.
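The following sketch illustrates these exactness rules. It assumes that `*` and the `is_exact`/`is_inexact` predicates behave as described in this section, and that `sqrt` is available under that name, as the paragraph suggests.

```slogan
// one inexact operand makes the result inexact
is_inexact(2 * 1.5)
// true

// the exception: multiplying by the exact 0 gives an exact result
is_exact(0 * 1.5)
// true

// an irrational result can only be approximated
is_inexact(sqrt(2))
// true
```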

There are predicates¹ that can be used to determine the specific type of a number.


is_integer(123)
// true
is_real(123)
// true
is_real(1/23)
// true
is_integer(1/23)
// false
is_number(1/23)
// true
is_number(123)
// true
is_rational(1/23)
// true

Exact integer and rational arithmetic is supported to arbitrary precision; the size of an integer or of the denominator or numerator of a ratio is limited only by system storage constraints.
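A minimal sketch of arbitrary-precision exact arithmetic, using a hypothetical `factorial` helper written in the `function`/`if`/`else` style shown later in this chapter. The value of 30! is far too large for a 64-bit machine integer, yet it remains exact.

```slogan
// hypothetical helper: exact factorial by recursion
function factorial(n)
  if (n <= 1) 1
  else n * factorial(n - 1)

factorial(30)
// 265252859812191058636308480000000
is_exact(factorial(30))
// true

// exact rational arithmetic never rounds
1/3 + 1/6 == 1/2
// true
```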

Slogan numbers are written in a straightforward manner not much different from ordinary conventions for writing numbers. An exact integer is normally written as a sequence of numerals preceded by an optional sign. For example, 3, +19, -100000, and 208423089237489374 all represent exact integers.

An exact rational number is normally written as two sequences of numerals separated by a slash (/) and preceded by an optional sign. For example, 3/4, -6/5, and 1/1208203823 are all exact rational numbers. A ratio is reduced immediately to lowest terms when it is read and may in fact reduce to an exact integer.
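The reduction to lowest terms can be observed directly. This sketch assumes `==` compares exact numbers by value, as in the examples elsewhere in this section.

```slogan
6/8 == 3/4       // both read as the same reduced ratio
// true
4/2 == 2         // this ratio reduces to an exact integer
// true
is_integer(4/2)
// true
```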

Inexact real numbers are normally written in either floating-point or scientific notation. Floating-point notation consists of a sequence of numerals followed by a decimal point and another sequence of numerals, all preceded by an optional sign. Scientific notation consists of an optional sign, a sequence of numerals, an optional decimal point followed by a second string of numerals, and an exponent; an exponent is written as the letter e followed by an optional sign and a sequence of numerals. For example, 1.0 and -200.0 are valid inexact integers, and 1.5, 0.034, -10e-10 and 1.5e-5 are valid inexact rational numbers. The exponent is the power of ten by which the number preceding the exponent should be scaled, so that 2e3 is equivalent to 2000.0.
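A short sketch of how floating-point and scientific notation are read, assuming the predicates introduced above. Note that an inexact number such as -200.0 still counts as an integer.

```slogan
2e3 == 2000.0      // the exponent scales by a power of ten
// true
is_inexact(1.5e-5)
// true
is_integer(-200.0) // an inexact integer
// true
is_exact(-200.0)
// false
```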

Exact and inexact real numbers are written as exact or inexact integers or rational numbers; no provision is made in the syntax of Slogan numbers for non-rational real numbers, i.e., irrational numbers.

The exactness of a numeric representation may be overridden by prefixing the constant with either 0e or 0i: 0e forces the number to be exact, and 0i forces it to be inexact. For example, 1, 0e1, 1/1, 0e1/1, 0e1.0, and 0e1e0 all represent the exact integer 1, and 0i3/10, 0.3, 0i0.3, and 3e-1 all represent the inexact rational 0.3.

    
is_exact(123 * 100)
// true
is_exact(123 * 100.0)
// false
is_inexact(123 * 100.0)
// true
1 == 1.0
// false
1 == 0e1.0
// true
inexact(1) == 1.0
// true
0i1 == 1.0
// true
0i1 == exact(1.0)
// false

Numbers are written by default in base 10, although the special prefixes 0b (binary), 0o (octal), 0d (decimal), and 0x (hexadecimal) can be used to specify base 2, base 8, base 10, or base 16. For radix 16, the letters a through f or A through F serve as the additional numerals required to express digit values 10 through 15. For example, 0b10101 is the binary equivalent of 21₁₀, 0o72 is the octal equivalent of 58₁₀, and 0xC7 is the hexadecimal equivalent of 199₁₀. Numbers written in floating-point and scientific notations are always written in base 10.

Underscores may be inserted into a number to improve readability. For example, the integer 1234567 could be written as 1_234_567.

Complex number literals take the form R+Ii, where R is the real part and I is the imaginary part, for example 3+7i.
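A small sketch of complex literals in use. It assumes that 0+1i is accepted as a literal of the R+Ii form, and that an exact complex result with a zero imaginary part compares equal to the corresponding real number.

```slogan
is_number(3+7i)
// true
is_real(3+7i)            // complex, but not real
// false
add(3+7i, 1+2i) == 4+9i
// true
mult(0+1i, 0+1i) == -1   // i squared
// true
```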

There are functions that correspond to the arithmetic and comparison operators. These functions accept an arbitrary number of arguments.


add(1,2,3,4,5)
// 15
number_is_lt(1,2,3,4,5)
// true
number_is_lt(1,2,3,40,5)
// false
number_is_lt(1,2,3,4,4)
// false
number_is_lteq(1,2,3,4,4)
// true
mult(20, 30, 40)
// 24000

4.1.1 Bitwise Operations

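This section discusses functions that perform bitwise operations on integers. Some of the most useful of these functions are listed below:

band        bitwise AND
bior        bitwise inclusive OR
bxor        bitwise exclusive OR
bnot        bitwise NOT
bshift      left/right arithmetic shift
is_bit_set  predicate to test a bit by position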

If the number of bits to shift is negative, bshift performs a right-shift. Otherwise, the bits are shifted left.

Bitwise operations assume that integers are represented in two's complement, even if they are not represented that way internally.
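The following sketch shows shifting and the two's complement view on small values. It uses only the functions listed at the start of this section and assumes they take the argument order used in this chapter (the value first, then the shift count for `bshift`).

```slogan
bshift(1, 4)     // shift left by 4
// 16
bshift(16, -2)   // a negative count shifts right
// 4
bshift(-8, -1)   // arithmetic shift preserves the sign
// -4
bnot(0)          // in two's complement, NOT 0 is -1
// -1
bnot(5)
// -6
bxor(5, 3)       // 101 xor 011 = 110
// 6
band(-1, 255)    // -1 has every bit set
// 255
```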

The following program shows how to interpret an integer as a compact set of independent bits.² Note that it uses only the lowest 32 bits of the integer (bit positions 0 to 31), although the underlying value may have more bits. To view the binary representation of an integer, the program calls the built-in number_to_string function. This function takes an optional second argument that specifies the base in which the result string is formatted; to get a binary string, pass 2.


function turn_bit_on(bits, i)
  if (i <= 31) bior(bits, bshift(1, i))
  else bits

let a, b = turn_bit_on(0, 31), turn_bit_on(0, 31)

number_to_string(a, 2)
// 10000000000000000000000000000000
number_to_string(b, 2)
// 10000000000000000000000000000000

a = turn_bit_on(turn_bit_on(a, 1), 5)
b = turn_bit_on(turn_bit_on(b, 1), 2)

number_to_string(a, 2)
// 10000000000000000000000000100010
number_to_string(b, 2)
// 10000000000000000000000000000110

number_to_string(band(a, b), 2) // intersection
// 10000000000000000000000000000010

number_to_string(bior(a, b), 2) // union
// 10000000000000000000000000100110

is_bit_set(a, 1) // membership test
// true
is_bit_set(a, 10)
// false
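The bit set can also shrink. The sketch below defines a hypothetical `turn_bit_off` helper (not a built-in) as the counterpart of `turn_bit_on`, clearing a bit with `band` and `bnot`. It then applies the same idea to whole sets to compute a set difference. `turn_bit_on` is repeated so that the block stands alone.

```slogan
function turn_bit_on(bits, i)
  if (i <= 31) bior(bits, bshift(1, i))
  else bits

// hypothetical helper: clear bit i by ANDing with a mask that has only bit i off
function turn_bit_off(bits, i)
  if (i <= 31) band(bits, bnot(bshift(1, i)))
  else bits

let c = turn_bit_on(turn_bit_on(0, 1), 5)
number_to_string(c, 2)
// 100010
number_to_string(turn_bit_off(c, 1), 2)
// 100000

// set difference: members of c that are not in d
let d = turn_bit_on(0, 5)
number_to_string(band(c, bnot(d)), 2)
// 10
```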

4.1.2 Fixnums

Fixnums represent exact integers within the closed range [-2^(w-1), 2^(w-1) - 1], where w is the fixnum width. The implementation-specific value of w can be determined via the function fixnum_width, and the endpoints of the range via the functions least_fixnum and greatest_fixnum.
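The range formula can be checked against the implementation. This is a sketch that assumes `fixnum_width`, `least_fixnum` and `greatest_fixnum` are called with no arguments and behave as described above. It uses the generic `bshift` rather than `fxshift`, because 2^(w-1) itself lies just outside the fixnum range.

```slogan
let w = fixnum_width()
greatest_fixnum() == bshift(1, w - 1) - 1
// true
least_fixnum() == 0 - bshift(1, w - 1)
// true
```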

The names of arithmetic procedures that operate only on fixnums begin with the prefix "fx" to set them apart from their generic counterparts. The following example demonstrates some of the most useful operations on fixnums:


fxadd(1, 21)
// 22
fxadd(1, greatest_fixnum())
//> error: FIXNUM overflow

fx_is_eq(1, 1)
// true
fx_is_gt(1, 2)
// false
fx_is_gt(10, 2)
// true
fx_is_lteq(10, 2)
// false
fx_is_lteq(10, 10)
// true
fxsub(20, 32)
// -12
fxmult(20, 32)
// 640
fxdiv(20, 32)
// 0
fxdiv(20, 2)
// 10

Bit and shift operations on fixnums assume that fixnums are represented in two's complement, even if they are not represented that way internally.


number_to_string(fxior(4294967296, fxshift(1, 2)), 2)
// 100000000000000000000000000000100
number_to_string(fxior(4294967296, fxshift(fxshift(1, 2), -3)), 2)
// 100000000000000000000000000000000

4.1.3 Flonums

Flonums are inexact real numbers. Implementations typically use the IEEE double-precision floating-point representation for flonums. Flonum-specific function names begin with the prefix "fl" to set them apart from their generic counterparts.


fladd(1.2, 4.5)
// 5.7
flmult(1.2, 4.5)
// 5.3999999999999995
fl_is_eq(0, 0.)
//> error: (Argument 1) FLONUM expected
fl_is_eq(0., 0.)
// true
fl_is_lt(-1., 0.)
// true
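Because flonums are binary floating-point approximations, some decimal sums do not come out exactly. The sketch below contrasts this with exact rationals. It assumes IEEE double-precision flonums, as is typical.

```slogan
// 0.1 and 0.2 have no exact binary representation
fl_is_eq(fladd(0.1, 0.2), 0.3)
// false

// the same sum with exact rationals is exact
1/10 + 2/10 == 3/10
// true
```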

4.2 Characters

Characters are atomic objects representing letters, digits, special symbols such as $ or #, and certain non-graphic control characters such as space and newline. Character literals are written with the \ prefix. For example, the character A is written in Slogan source code as \A. Special characters have the following named literals:


\newline
\return
\tab
\space
\backspace
\alarm
\vtab
\esc
\delete
\nul

Any Unicode character may be written with the syntax \xhh, \uhhhh or \Uhhhhhhhh, where hh, hhhh and hhhhhhhh stand for two, four or eight hexadecimal digits representing a valid Unicode scalar value.
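As a sketch, the three hexadecimal forms below should all denote the letter A (code point 0x41 = 65). This assumes the hex digits follow the prefix directly, as in \x41.

```slogan
\x41 == \A
// true
\u0041 == \A
// true
\U00000041 == \A
// true
integer_to_char(65) == \A
// true
```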


\A == \A
// true
\A == \a
// false
\A == char_upcase(\a)
// true
\c > \b
// true

There are many predicates useful for finding information about characters and for comparing them:


is_char(\A)
// true
is_char(65)
// false
is_char(integer_to_char(65))
// true
char_is_numeric(\2)
// true
char_is_alphabetic(\e)
// true
char_is_lower_case(\e)
// true
char_is_eq(\a, \a) // `==` optimized to work with characters
// true
char_is_lteq(\a, \b) // `<=` optimized to work with characters
// true

Let us write a new predicate for characters that returns true if its argument is a vowel. This function also introduces the case expression.


function is_vowel(c)
  case (c)
    \a -> true
  | \e -> true
  | \i -> true
  | \o -> true
  | \u -> true
  | else -> false

The case expression evaluates an expression and compares its value to those in a list of clauses. The comparison is done with the is_eq function, which checks whether two values are stored in the same location in memory. On a successful match, the value of that clause is returned. An optional else clause can be given to return a default value if no clause matches.

Multiple clauses that return the same value can be combined into a single clause that matches against a list of values. In is_vowel the else clause can also be omitted, because the default value of case is false. These two points lead to the following rewrite of the function:


function is_vowel(c)
  case (c) [\a, \e, \i, \o, \u] -> true

// Usage:
is_vowel(\o)
// true
is_vowel(\b)
// false
is_vowel(\a)
// true
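List clauses, multiple clauses and an else default can also be mixed in one case expression. The sketch below combines the two clause styles shown above in a hypothetical `char_kind` function. It assumes that a list clause may appear in the multi-line, `|`-separated form.

```slogan
// hypothetical helper: classify a character
function char_kind(c)
  case (c)
    [\a, \e, \i, \o, \u] -> "vowel"
  | [\0, \1, \2, \3, \4, \5, \6, \7, \8, \9] -> "digit"
  | else -> "other"

char_kind(\e)
// vowel
char_kind(\7)
// digit
char_kind(\z)
// other
```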

4.3 Strings

A string is a sequence of characters enclosed in double quotes. Slogan supports the Unicode standard, which means that Slogan strings can represent scripts from all of the world's written languages. The following are examples of valid string literals:


"hello, world"

// a string may span multiple lines.
"this is a really
long message..."

"ἐγὼ εἰμί"


"he said: \"hello, there\""
// he said: "hello, there"

The escape sequences that can appear in a string literal, and their meanings, are listed below:


\n    newline
\t    tab
\r    return
\\    backslash
\b    backspace
\a    alarm
\v    vertical-tab
\"    double-quote
\e    escape
\d    delete
\0    nul
\u    unicode character encoded in 4 hexadecimal digits
\x    unicode character encoded in 2 hexadecimal digits
\U    unicode character encoded in 8 hexadecimal digits
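A brief sketch of escape sequences in use. It assumes that each escape produces a single character in the resulting string, and that the \x and \u forms take exactly the number of hex digits given in the table.

```slogan
string_length("a\tb")   // the tab escape is one character
// 3
"\x41" == "A"
// true
"\u03bb" == "λ"         // Greek small letter lambda
// true
string_length("\u03bb")
// 1
```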

The following examples show some common operations on strings:

let s = "For all its power, the computer is a harsh taskmaster."

// accessing individual characters by index:
string_at(s, 2)
// \r
s[2]
// \r

// slicing or extracting substrings:
substring(s, 4, 17)
// all its power
s[4:17]
// all its power
s[4:]
// all its power, the computer is a harsh taskmaster.
s[:17]
// For all its power
string_length(s[:17])
// 17

/* `count` is more generic than `string_length`; it can
   also find the length of other "collections" of data, like arrays and lists. */
count(s)
// 54

// searching:
string_index_of(s, ",")
// 17
string_index_of(s, "computer")
// 23

string_append("abc", "def", "xyz")
// abcdefxyz

// split the string at commas and spaces: 
string_split(s, [\,, \space])
// [For, all, its, power, the, computer, is, a, harsh, taskmaster.]
strings_join("-", string_split(s, [\,, \space]))
// For-all-its-power-the-computer-is-a-harsh-taskmaster.

// comparisons
string_is_eq("abc", "abc")
// true
"abc" == "abc"
// true
string_is_eq("aBC", "abc")
// false
string_is_ci_eq("aBC", "abc")
// true
string_is_lt("abc", "xyz")
// true
"abc" < "xyz"
// true
"abc" >= "abc"
// true

4.4 Symbols

Symbols are used for a variety of purposes as symbolic names in Slogan programs. Symbol constants are written by prefixing identifiers with the quote mark ('). All characters valid in identifiers can be used in symbols. Symbols containing spaces or special characters are written by enclosing the symbol name in backticks (`). The following are all valid symbols:


'abc
'$abc
'`abc def`
'`abc+def`

Strings could be used for most of the same purposes, but an important characteristic of symbols makes comparing them for equality much more efficient: two symbols with the same sequence of characters are stored in the same memory location. This makes it possible to test them for equality with the is_eq function, which quickly checks whether its arguments point to the same location in memory. Comparing strings by content, on the other hand, requires checking the characters of both strings.


let a = "hello"
let b = "hello"

// `==` will compare each character in both strings
a == b
// true

// so does `string_is_eq`
string_is_eq(a, b)
// true

/* `is_eq` only checks whether two objects are stored in the same
   location in memory */
is_eq(a, b)
// false
is_eq(a, a)
// true

// In contrast to strings, two symbols made of the same sequence of characters
// can be efficiently compared for equality just by checking their memory locations.
let x = 'hello
let y = 'hello
let z = 'Hello

is_eq(x, y)
// true
is_eq(x, z)
// false

It is possible to construct new symbols from strings and to convert symbols to strings:


is_eq('hello, string_to_symbol("hello"))
// true
"hello" == symbol_to_string('hello)
// true
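Conversion also works for symbols written with backticks, and symbol identity remains case-sensitive. The following sketch uses only the functions shown above.

```slogan
symbol_to_string('`abc def`) == "abc def"
// true
is_eq(string_to_symbol("abc def"), '`abc def`)
// true
is_eq(string_to_symbol("Hello"), 'hello)   // case matters
// false
```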

¹A predicate is a function that answers a question with a true or false value.

²Slogan has a composite type bitarray that can represent bitmaps of arbitrary sizes. This type will be introduced in the next chapter.

band	bitwise AND
bior	bitwise inclusive OR
bxor	bitwise exclusive OR
bnot	bitwise NOT
bshift	left/right arithmetic shift
is_bit_set	predicate to test bit by position