14

Processing Text Files

Part II of this book presents complete Slogan programs that perform useful real-world tasks. These programs also illustrate how to write good Slogan code.

The first program we study shows how to take textual data as input, process it and output the result. It is inspired by AWK, a programming language that is an important component of the UNIX programming environment. The program we are going to develop reads "records" from an input file and processes each record by calling a function defined by the user. A record is a single line of the file, and the elements within a record are separated by spaces. As an example, the following listing shows a file named emp.dat with five records. Each record has three elements: the name of an employee, their hourly rate and the total number of hours they worked.


Mark   45  12
Susan  56  10
Matt   67  14
Jake   45   0
Jones  34  15 
      

Our program lets the user generate a report of the total wage payable to each employee simply by writing a function such as the following, which skips employees who worked no hours:


function action(name, hourly_rate, hours_worked)
  when (hours_worked > 0)
    showln(name, " ", hourly_rate * hours_worked)
      

To prepare the arguments for a call to the action function, each line of the input file is processed as follows:

  1. Split the line into individual elements, using space as the delimiter.
  2. If an element is in numeric format, convert it to a number.
  3. If an element is a boolean literal (true or false), convert it to the corresponding boolean value.
  4. Call action with the resulting elements as its arguments.

The algorithm described above is implemented by the following handle_lines function, which reads records from the global variable input:


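function handle_lines()
  let (line = read_line(input))
    // read_line returns an end-of-file object once input is exhausted
    when (not(is_eof_object(line)))
    { // split the record, convert its elements and pass them to action
      apply(action, map(parse_token, string_split(line)))
      // the recursive call is in tail position, so it behaves like a loop
      handle_lines() }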
      

To convert individual elements into numbers and boolean values where appropriate, handle_lines calls the helper function parse_token:


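function parse_token(t)
  if (char_is_numeric(t[0]) || t[0] == \.)
    // string_to_number may raise an error or return false for text such
    // as "1st"; in either case the token is kept as a string
    let (n = try string_to_number(t) catch (_) false)
      if (n) n else t
  else if (t == "true") true
  else if (t == "false") false
  else t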
      
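
A few calls make the behaviour of parse_token concrete. This sketch assumes that the parse_token and action functions defined in this chapter have already been evaluated; the results noted in the comments follow directly from the definition of parse_token. The last two lines apply the four processing steps by hand to the first record of emp.dat. The variable name fields is only illustrative.

```slogan
// assumes parse_token and action from this chapter are already defined
parse_token("45")      // the number 45
parse_token(".5")      // the number 0.5
parse_token("false")   // the boolean false
parse_token("Mark")    // the string "Mark", unchanged
parse_token("-3")      // the string "-3": a leading minus sign is not checked for

// the four steps, applied by hand to one record of emp.dat
let fields = string_split("Mark   45  12")   // step 1
apply(action, map(parse_token, fields))      // steps 2 to 4; prints Mark 540
```

Because only a leading digit or decimal point triggers the numeric conversion, negative values such as -3 reach action as strings, so a data file containing them would need a slightly broader test in parse_token.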

As mentioned earlier, the logic for processing each record is expressed by the user-defined action function. How can we load this function into our data processor? One way is to have the user write the function in a Slogan script and pass the script's name to the program as a command line argument. The program can then load and evaluate the script at run time, which makes all the definitions in the script available to the program. The following code snippet gets the script name from the command line, then loads and evaluates the script:


// ignore the first command line argument, which is the program's name.
let args = rest(command_line())

if (is_empty(args))
  error("expected script name as command line argument")
else
  reload(first(args))
      
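
Because the whole script is evaluated, an action script is free to define helpers alongside action. The following is a minimal sketch; the script name pay.sn and the helper function wage are hypothetical and are not part of awk.sn.

```slogan
// pay.sn: a hypothetical action script. wage is a helper defined only
// for this script's use; since reload evaluates the whole script, both
// wage and action become available to the program.
function wage(rate, hours) rate * hours

function action(name, hourly_rate, hours_worked)
  when (hours_worked > 0)
    showln(name, " ", wage(hourly_rate, hours_worked))
```

Passed to the program in place of a script containing only action, pay.sn produces the same report as the action function shown at the start of this chapter; the program itself needs no changes.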

The only thing that remains to be done is to initialize the global variable input, which handle_lines reads from, so that it points to the data file. The program accepts the name of this file as the command line argument that follows the script name. If the user does not specify a data file, input defaults to the standard input.


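// drop the script name; what remains, if anything, is the data file name
args = rest(args)

// read from the standard input unless a data file was named
let input = current_reader()
when (not(is_empty(args)))
  input = file_reader(first(args))

// finally, process every record in input
handle_lines()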
      

That completes our text file processor! The full listing of the program is available in the script awk.sn, which can be compiled into a stand-alone program by running the command:


$ slogan -x awk          
      

User-defined action functions can now be written as separate scripts and fed to this program along with the data file to process. For example, we can save the action function from the beginning of this chapter (with its parameters renamed) in a script called "myaction.sn" and run it on the "emp.dat" file:


// myaction.sn          
function action(name, rate_per_hour, total_hours)
  when (total_hours > 0)
    showln(name, " ", rate_per_hour * total_hours)    
      

$ ./awk myaction.sn emp.dat
//> Mark 540
    Susan 560
    Matt 938
    Jones 510
      
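
An action script can also keep state between records, because its top-level definitions persist for the whole run. The sketch below, a hypothetical script named total.sn, prints a running total of the wages instead of each employee's wage.

```slogan
// total.sn: a hypothetical action script that keeps state between
// records. The global total is defined once when the script is loaded
// and updated by every call to action.
let total = 0

function action(name, hourly_rate, hours_worked)
{ total = total + hourly_rate * hours_worked
  showln(name, " ", total) }
```

Running ./awk total.sn emp.dat should print one line per record, ending with Jones 2548, since 540 + 560 + 938 + 0 + 510 = 2548. Unlike myaction.sn, this script also prints a line for Jake, whose zero hours leave the total unchanged.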

Exercise 14.1.   Modify the data processor to accept multiple data files as input. This means the command line interface has to be extended to ./awk action.sn data1 data2 ... dataN. The action function should be applied to the records of each file in turn.

Exercise 14.2.   Make the name of the data file that is currently being processed available to the action function.

Exercise 14.3.   Update the handle_lines function to report invalid inputs with user-friendly messages and exit cleanly.

Exercise 14.4.   The data processor can only deal with records whose elements are separated by spaces. Change the program so that the delimiter character can be specified by the user.

