NESH

Version 1.0

Introduction

Nesh, meaning "Splendid new unix shell" in the black tongue of Mordor, or Not Exactly a Shell in English, is a scripting language intended for scripts which handle networks, strings and shared data, with other minor bonuses compared with standard shells or C. This document describes its elements with example scripts.

Text Literals

Literal text is enclosed in double quotes and is modelled on the C printf format string, including escaped special characters such as \n for newline and %s for string parameter. These strings can be used on the right hand side of assignments or on their own, meaning output the result. Thus the nesh hello world script is:

#!/bin/nesh
"Hello world\n"

Variables

Variables are named with the $ character and a single letter, e.g. $a. Variables can be one of these types: number, string, server, client, file, named pipe, structure or view, with the type set by what the script puts into them. Variables can appear on the left or right side of assignments, or on their own to mean output the contents.

External environment variables (eg $HOME) and the script's parameters (eg $1) are available as read-only string variables.

String Variables

String variables differ from both C and Pascal strings in having neither visible null-termination nor length bytes, and can freely mix printable text with binary. This can be confusing because it looks as though it shouldn't work: data can be read from the network and manipulated as text (ie substring searches don't fall off the end) or as binary (ie it's not been modified to make string operations work).

String Slicing

Most string manipulation is done with the slice feature, which has many variants, but essentially extracts a substring from a variable, specified by symbols attached to the source variable. The symbols are brackets (round or square) enclosing start and end indicators (string, number or chop).

Square brackets indicate inclusion of the end points, round brackets indicate exclusion. Start and end brackets do not have to match. A string for the start means search for that substring from the start of the source string, and a string for the end means search for that string from just after the start string. Strings followed by an asterisk and count (eg * 3) mean count that many occurrences of the string, so * 1 is the default.

Numbers mean indices into the string, with 0 at the end meaning the end of the string. For example,

$a = "Hello world\n"
$b = $a["wo" "ld"]        # sets $b to "world"
$b = $a("He" "lo")        # sets $b to "l"
$b = $a["l" * 3 0]        # sets $b to "ld\n"
$b = $a[0 " ")            # sets $b to "Hello"
$b = $a(5 0]              # sets $b to "world\n"
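
The bracket rules can also be checked outside nesh. The following is a minimal Python model of them, not nesh's own implementation: the function name nesh_slice and its argument layout are invented for illustration, the count is only modelled for the start string, and treating an end of 0 as end-of-string for both bracket types is an assumption. It reproduces the five results above.

```python
def nesh_slice(src, open_br, start, end, close_br, count=1):
    """Model of $a<open_br>start end<close_br>; count models '* n' on a start string."""
    # Locate the start marker: a substring (count-th occurrence) or a numeric index.
    if isinstance(start, str):
        pos = -1
        for _ in range(count):
            pos = src.find(start, pos + 1)
            if pos < 0:
                return ""                        # no match gives an empty result
        first, after = pos, pos + len(start)
    else:
        first, after = start, start + 1
    begin = first if open_br == "[" else after   # "[" includes the marker, "(" skips it

    # Locate the end marker, searching from just after the start marker.
    if isinstance(end, str):
        pos = src.find(end, after)
        if pos < 0:
            return ""
        stop = pos + len(end) if close_br == "]" else pos
    elif end == 0:
        stop = len(src)                          # 0 at the end means end of string
    else:
        stop = end + 1 if close_br == "]" else end
    return src[begin:stop]


a = "Hello world\n"
print(repr(nesh_slice(a, "[", "wo", "ld", "]")))        # 'world'
print(repr(nesh_slice(a, "(", "He", "lo", ")")))        # 'l'
print(repr(nesh_slice(a, "[", "l", 0, "]", count=3)))   # 'ld\n'
print(repr(nesh_slice(a, "[", 0, " ", ")")))            # 'Hello'
print(repr(nesh_slice(a, "(", 5, 0, "]")))              # 'world\n'
```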

The special symbol \ is equivalent to 0 for defining the result, but has the additional "chop" function of changing the source string by removing the result of the slice from the start or end as appropriate. So:

$a = "First line\nSecond line\nThird line\n"
$b = $a[\ "\n"] # sets $b to "First line\n" and reduces $a to "Second line\nThird line\n"
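
The chop form can be modelled in the same spirit. This Python sketch covers only a chop from the front of the string; the name chop_front is hypothetical, and leaving the source untouched when the end string is not found is an assumption, since that case is not described here.

```python
def chop_front(src, end):
    """Model of a front chop: return (result, remainder left in the source)."""
    pos = src.find(end)
    if pos < 0:
        return "", src          # assumption: nothing found, nothing removed
    cut = pos + len(end)        # "]" keeps the end string in the result
    return src[:cut], src[cut:]


a = "First line\nSecond line\nThird line\n"
lines = []
while a:                        # repeat until nothing is left of the source
    b, a = chop_front(a, "\n")
    lines.append(b)
print(lines)                    # ['First line\n', 'Second line\n', 'Third line\n']
```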

Client Variables

Client variables represent network connections where the script is the client, in the sense of making contact with an existing process already listening on a known port, and talking to it. Support for passing through firewalls via a SOCKS-5 proxy is mostly automatic: if the target machine's name contains a dot and the SOCKS_PROXY environment variable is set, then it's done via the proxy. Nothing needs to be different in the scripts themselves.

Using the network is a two-stage process: the variable must first be "bound" to the target machine and port with the bind symbol := with a source in the format of computer name followed by a colon and port number. The variable can then be used for reading and writing the network as if it were a string variable.

Normally only the currently available data is read, but reading the variable as a capital letter means keep reading data until the connection closes and return all of it. Writing to it as a capital letter means close the connection after the write.

For example, a local finger utility can be implemented as:

#!/bin/nesh
$w := "localhost:79" # bind to target machine and port, local machine for simplicity
$w = "%s\n" $1        # send target username from command line,
                      # adding \n for finger server's protocol
$r = $w               # read result from network
$r                    # display result

As conveniences, the port number can be given by a TCP service name from /etc/services and the network variable can be output directly without the intermediate result variable, so:

#!/bin/nesh
$w := "localhost:finger" # bind to target machine and port, local machine for simplicity
$w = "%s\n" $1            # send target username from command line,
                          # adding \n for finger server's protocol
$w                        # read result from network and output it
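
For comparison, resolving a target given either as a numeric port or as an /etc/services name can be sketched in Python. The helper name parse_target is hypothetical, and this shows only how such a string might be resolved, not how nesh does it; socket.getservbyname is the standard library call that consults the services database.

```python
import socket


def parse_target(target):
    """Split "machine:port" and resolve the port to a number."""
    host, _, port = target.rpartition(":")
    if port.isdigit():
        return host, int(port)
    # Not a number, so treat it as a TCP service name such as "finger".
    # getservbyname raises OSError if the name is not in the services database.
    return host, socket.getservbyname(port, "tcp")


print(parse_target("localhost:79"))     # ('localhost', 79)
```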

More generally, string slicing makes it easy to parse a general user@machine command for remote fingering:

#!/bin/nesh
$x = $1 # copy parameter to variable for slicing

$u = $x[0 "@") # get user name without the @
$m = $x("@" 0] # get machine name, also without the @

$w := "%s:finger" $m      # bind to target machine and port
$w = "%s\n" $u            # send target username, adding \n for finger server's protocol
$w                        # read result from network and output it

In some cases the port number can be given as a symbolic name resolved on the remote computer, if that machine is running the port brokering service on port 54321. Eg one can connect to "pbm.com:tbg" without knowing which port the tbg service is running on today.

Server Variables

Server variables represent a service provided over the network by a script listening to a particular port. Reading from these variables produces client variables, one per connection made by a remote client, and writing to them has no meaning other than to close the service by overwriting it.

Server variables are set up by binding to a port with the format colon and port (no machine name, it's always the local one), e.g. a simple chat demo expecting to be contacted with "telnet hostname 2000":

#!/bin/nesh
$w := ":2000" # listen on port 2000
$c = $w # accept a connection

$c = "I'm a chatter, who are you? "          # strike up a conversation
$r = $c                                      # read their response
$c = "Hello, %s\n" $r                        # reply to them by name

"Logged a connection from %s\n" $r # and report it

File Variables

File variables represent files, set up by binding the file's name to the variable. Reading or writing the file variable reads or writes the entire file. For example, to show a user's details from the password file:

#!/bin/nesh
$f := "/etc/passwd"        # bind filename to variable
$p = $f                    # read in whole file
$r = $p[$1 "\n"]           # get the username parameter and find its line
$r                         # output it (or nothing if not found)

New files are created with permissions 777, relying on umask to cut them back to what's actually required.
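
The effect of asking for 777 and letting umask trim it can be seen with a short Python sketch on a POSIX system. The function name create_with_777 and the umask of 022 are chosen for illustration only; the open flags nesh itself uses are not stated here.

```python
import os
import stat
import tempfile


def create_with_777(path):
    """Create path asking for mode 0o777 and report the mode actually granted."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o777)
    os.close(fd)
    return stat.S_IMODE(os.stat(path).st_mode)


old_mask = os.umask(0o022)                  # a common default umask
try:
    with tempfile.TemporaryDirectory() as d:
        mode = create_with_777(os.path.join(d, "demo"))
finally:
    os.umask(old_mask)                      # restore the caller's umask
print(oct(mode))                            # 0o755 with a umask of 022
```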

Existing file descriptors can also be bound to file variables as direct numbers, with the only useful cases normally 0, 1 and 2 for standard input, standard output and standard error. For example:

#!/bin/nesh
$k := 0                                # standard input
$o := 1                                # standard output
$e := 2                                # standard error

$o = "Who are you? "                   # printf
$r = $k                                # gets
$o = "Hello, %s\n" $r                  # printf
$e = "Aargh, it's all gone wrong\n"    # sudden panic attack

Named Pipes

These variables are defined by binding to the name of the pipe preceded by the pipe character (|) and can be read and written as for files. Their semantics follow the underlying operating system's named pipes so are only useful in special cases, usually with labels (see Labels and Sleep below).

Structures

Structures are compound variables, most generally an array of multi-field elements, where each field is itself a variable of any type. The size of the array and the fields themselves can be changed while the script runs. Structure shape changing operations are done by assignment (=) if the left hand side is already a structure, or binding (:=) to define a new structure.

Fields are referenced with the / character and their name, which can be variable, and array size is set with the * character and a number. Array size can also be increased by one with the ++ operator, allowing gradual buildup of the structure.

(Note all structure operations involve the / character and for convenience literal strings consisting entirely of letters and digits following a / don't need to be in quotes.)

Examples of structure operations:

$a := /name + /email * 1000    # define a new structure
$a = $a + /url                 # add a field to each element of array
$a = $a * 2000                 # make the array bigger, keeping contents
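
One way to picture these shape-changing operations is as a list of records. The Python sketch below is only a model: the function names define, add_field and resize are invented here, and giving new fields an empty string as their initial value is an assumption.

```python
def define(fields, size):
    """Model of defining a structure with the given fields and array size."""
    fields = list(fields)
    return {"fields": fields,
            "rows": [dict.fromkeys(fields, "") for _ in range(size)]}


def add_field(struct, name):
    """Model of adding a field to every element of the array."""
    struct["fields"].append(name)
    for row in struct["rows"]:
        row[name] = ""


def resize(struct, size):
    """Model of changing the array size while keeping existing contents."""
    rows = struct["rows"]
    del rows[size:]                          # shrink if needed
    missing = size - len(rows)
    rows.extend(dict.fromkeys(struct["fields"], "") for _ in range(missing))


a = define(["name", "email"], 1000)
add_field(a, "url")
a["rows"][0]["name"] = "Sauron"
resize(a, 2000)
print(len(a["rows"]), a["rows"][0]["name"], sorted(a["rows"][1999]))
```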

Individual variables within the structure are accessed via view variables, see next section.

Views

These variables provide access to all the fields of a single element of a structure's array. They are set by indexing into a structure or searching for a textual match in various ways, and are a reference into the structure, not a copy, so they can be updated in the natural way.

Examples of view use:

$a := /name + /email * 0            # start with empty array
$v = $a/++                          # grow as entries appear
$v/name = "Sauron"                  # set fields
$v/email = "barad-dur@hotmail.com"
...
$v = $a/name/Saruman                # search on any field
"Saruman's address is %s\n" $v/email

$v = $a/42 # or index directly
"Darklord 42 is %s, email %s\n" $v/name $v/email

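The point that a view is a reference rather than a copy can be illustrated with a small Python model. The helpers grow and search are hypothetical names; matching by substring follows the hotmail example later in this document, while returning the first match, or None when nothing matches, is an assumption.

```python
rows = []                                    # an empty array of /name + /email elements


def grow(rows):
    """Model of growing the array by one and returning a view of the new element."""
    rows.append({"name": "", "email": ""})
    return rows[-1]


def search(rows, field, text):
    """Model of searching on a field: first element whose field contains text."""
    for row in rows:
        if text in row[field]:
            return row
    return None


v = grow(rows)
v["name"] = "Sauron"                         # writes through to the structure
v["email"] = "barad-dur@hotmail.com"
print(rows[0]["name"])                       # Sauron
print(search(rows, "email", "hotmail") is v) # True: same element, not a copy
```
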
Control Structures

There are three control structures: if, while and labels, each followed by a block of code. A block is enclosed in curly brackets {}.

If

The if symbol is a question mark, followed by an expression and a block, and optionally by an else (another question mark) and another block. The expression is considered true if it's a number that isn't zero, a string that isn't empty (""), a network connection or named pipe that's open or a file variable of a file that exists. For example, the hello world script can accept an optional parameter to replace world:

#!/bin/nesh

? $1                    # does the command line parameter contain any text?
{
    "Hello %s\n" $1     # yes it does, so use it
}
?                       # else
{
    "Hello world\n"     # no it doesn't, so default
}

While

The while symbol is a double question mark followed by an expression, as for the if symbol, and a block which is repeated for as long as the expression is true. For example, a simple text search script with usage like grep:

#!/bin/nesh

$f := $2                # bind to name of file to search
$p = $f                 # read file into $p
?? $p                   # while there's any of it left...
{
    $x = $p[\ "\n"]     # extract the first line, chopping it off the remainder
    $t = $x[$1 0]       # try to find the target string in it
    ? $t                # did this produce anything?
    {
        $x              # output the line containing the text
    }
}

More ambitiously, an http proxy:

$w := ":3128" # become a server on this port

?? "cows not home yet"       # run forever
{
    $c = $w                     # pick up a client connection
    $r = $c                     # get their request
    $s = $r("GET http://" "/")  # parse servername
    $x = $r("GET http://" 0]
    $p = $x["/" " ")            # and page
    $r = $x("\n" 0)             # and rest of request
    $x = "%s:http" $s           # target server and port
    $b := $x                    # make connection
    $b = "GET %s HTTP/1.0\n%s\n\n" $p $r    # send command
    $c = $B                     # get and forward full page
}

Unlike if, the while condition can also be a view into a structure, and unlike most uses of view variables it can usefully be non-unique. The while block is executed once for each match, with the matching view put into a special variable ($) meaningful only within that block.

For example, assuming $a is a structure from the structure/view examples above:

"The following users (if any) have hotmail addresses\n"
?? $a/email/hotmail
{
    "%s (%s)\n" $/name $/email
}

The whole structure may also be used as a condition, meaning loop over every element.

Labels and Sleep

Scripts such as general two-way proxies often have to be waiting on input from more than one source, or other events, and react to whichever arrives first. This is done by going to sleep and providing labels for blocks to execute when something happens. Note the labels are not pre-emptive interrupt handlers, they are only given control when the script is asleep.

At the end of a label's block of code the script goes back to sleep. Since the sleep command can take a time-out parameter and the script may sleep in different places, it's significant that return from a label is to the same sleep command and that the time-out period is restarted then.

Labels for network connections or named pipes consist of a variable name followed immediately by a colon and a block of code in curly brackets. Labels for signal handlers consist of the signal number followed immediately by a colon and the block. Sleep consists of a colon with an optional time-out period in milliseconds.

A variable's label is triggered when something is "incoming" on that variable. For client variables, standard input and named pipes, "incoming" means data ready to be read from the variable. For server variables it means a client is trying to make contact.

Here's a simple telnet example for protocols which (unlike telnet itself) just pass user-suitable strings back and forth, such as echo, finger and http:

#!/bin/nesh

# parameters are hostname and port, eg tel localhost finger

$k := 0 # get a keyboard input variable
: # and sleep until something happens

$k:                             # user typing
{
        ? $w                    # are we already connected?
        {
            $w = $k             # then send command to server
        }
        ?                       # not connected, could be first time through, or a protocol
                                # that needs a new connection for each command
        {
            $w := "%s:%s" $1 $2 # open network connection
            $w = $k             # and send the command
        }
}

$w: # incoming from network
{
    $w # display network message
}
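
The sleep-and-label pattern resembles a conventional readiness loop. As a rough Python analogy, not a description of nesh's internals, the sketch below waits once on a set of sources and runs the handler for whichever is readable. The function name sleep_once is hypothetical; a nesh script would return to the same sleep after each block, which here would mean calling the function again in a loop.

```python
import selectors
import socket


def sleep_once(labels, timeout_ms=None):
    """Block until a labelled source is readable, run its handler, return how many ran."""
    sel = selectors.DefaultSelector()
    for source, handler in labels.items():
        sel.register(source, selectors.EVENT_READ, handler)
    timeout = None if timeout_ms is None else timeout_ms / 1000
    ready = sel.select(timeout)          # empty list if the time-out expires
    for key, _ in ready:
        key.data(key.fileobj)            # run the block attached to that label
    sel.close()
    return len(ready)


near, far = socket.socketpair()          # stands in for a network connection
received = []
far.sendall(b"hello\n")
ran = sleep_once({near: lambda s: received.append(s.recv(64))}, timeout_ms=1000)
near.close()
far.close()
print(ran, received)                     # 1 [b'hello\n']
```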

Here's an example of cleaning up on control-C (signal 2)

$f := "datafile"                # access database file
$a = %m $f                      # read as memory format structure
...                             # fiddle with it

2:                              # jump here on control-C when asleep
{
    $f = %f $a                  # save to file in flat binary format
    . 0                         # and exit with success code
}

Forking

Forking splits the script into two separate processes at the point the fork command (the @ character) is executed. The fork command returns true to the parent process and false to the child, so it's normally used in an if statement to allow each side to know which it is.

The nesh interpreter has a default SIGCHLD handler which waits for any child that terminates before its parent, so zombie processes are cleared up without the script's intervention.
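
The true-to-parent, false-to-child convention mirrors the POSIX fork call, which returns the child's process id (non-zero) to the parent and 0 to the child. A minimal Python sketch of that convention, for POSIX systems only, follows; the function name fork_worker is hypothetical, and the explicit waitpid stands in for the automatic clean-up described above.

```python
import os


def fork_worker(exit_code):
    """Fork, let the child exit with exit_code, and return that code to the parent."""
    pid = os.fork()
    if pid:                                   # parent: fork returned a true value
        _, status = os.waitpid(pid, 0)        # reap the child so no zombie remains
        return os.WEXITSTATUS(status)
    os._exit(exit_code)                       # child: fork returned 0 (false)


print(fork_worker(3))                         # 3
```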

A somewhat complex but typical example of forking is a generic transparent proxy script which retains a single control process listening for new clients while forking off a worker-process to deal with each new client. The parent sleeps waiting on new connections, and each worker-process sleeps waiting for new traffic in either direction. Since all traffic goes through here, it can be monitored, logged or modified as required.

Note the exit command (dot, followed by numerical error code or string termination message), needed to eliminate worker-processes after their connection ends. Only the control process remains running permanently.

#!/bin/nesh

# first parameter is destination machine and port, second is port for this service

$w := $2 # listen on this port
: # and sleep until something happens

$w:                              # new client connection arrives
{
    $s = $w                      # make connection
    ? @                          # split off a worker
    {
        $s := 0                  # parent closes connection
    }
    ?                            # worker starts here
    {
        $b := $1                 # and pass through
        $w := 0                  # child discards listener
    }
}

$s:                              # worker gets message from client
{
    $a = $s
    ? $a                         # if it's data
    {
        $b = $a                  # pass it on
        # data in $a can be logged or fiddled with here
    }
    ?                            # if it's not data, the connection is closing
    {
        . 0                      # normal exit, worker must die here or it
                                 # would hang around forever
    }
}

$b:                              # traffic going the other way, mirror of above case
{
    $a = $b
    ? $a
    {
        $s = $a
    }
    ?
    {
        . 0
    }
}

Running Programs

Programs can be run by putting commands in backward single quotes (`), providing their output as their string value. The command itself can be a literal (in double quotes) or a variable - it can't be loose text. So a simple remsh utility for executing context-less commands interactively on a remote machine would be:

#!/bin/nesh

$w := $1                # become a server on this port
?? 1
{
        $s = $w         # get a connection from a client
        $s = `$s`       # execute incoming command and return results
}

This can be used with the nesh telnet script above by running "tel hostname port" and typing commands.
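
The idea of a command's output becoming its string value is the same as command substitution in other languages. A rough Python equivalent, assuming a POSIX shell is available, is sketched below; the function name backquote is hypothetical and error handling is left out.

```python
import subprocess


def backquote(command):
    """Run command through the shell and return its standard output as a string."""
    done = subprocess.run(command, shell=True, capture_output=True, text=True)
    return done.stdout


print(repr(backquote("echo hello")))     # 'hello\n'
```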

Arithmetic

Variables are of type number if a number is put into them. Numbers consist of digits not enclosed in quotes. The arithmetic operations are +, -, * and / with the usual precedence rules. So, for example, scanning ports up to 10000 can be done with:

#!/bin/nesh

$p = 0 # port counter

?? $p - 10000                     # is it 10000 yet?
{
    $w := "localhost:%d" $p       # try opening it
    ? $w                          # any luck?
    {
        "Port %d works\n" $p
    }
    $p = $p + 1                   # increment counter
}

Format Operators

The format of data can be changed in various ways by format operators, consisting of the % character and a letter, somewhat like printf formatting commands. The format operator is followed by an expression of the appropriate input type and can be assigned or output as a value of the appropriate output type.

Specifically:

    %d changes a string to a number, or 0 if it's not a valid number
    %s changes a number to a string
    %t changes a binary string to text, each byte becoming two hexadecimal characters
    %b changes a string of hex characters into a binary string, or goes wrong if the input isn't valid
    %f changes a structure variable into a flat binary dump for filing
    %m reverses %f, restoring a structure from the flat format
    %l makes a structure from a text buffer by making each line an element of the /line field
    %e changes a structure into XML-like format for export to non-nesh systems
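
The four simplest conversions in the list above have close counterparts elsewhere, which may help in picturing them. The Python functions below are an illustrative model only: their names are invented, the lower-case hex output is Python's choice rather than anything stated for nesh, and "goes wrong" is modelled as a ValueError.

```python
def fmt_d(text):
    """Model of %d: string to number, or 0 if it is not a valid number."""
    try:
        return int(text)
    except ValueError:
        return 0


def fmt_s(number):
    """Model of %s: number to string."""
    return str(number)


def fmt_t(data):
    """Model of %t: binary string to text, two hex characters per byte."""
    return data.hex()


def fmt_b(text):
    """Model of %b: hex characters to binary; raises ValueError on bad input."""
    return bytes.fromhex(text)


print(fmt_d("42"), fmt_d("forty-two"))   # 42 0
print(fmt_s(79))                         # 79
print(fmt_t(b"\x00\xff"))                # 00ff
print(fmt_b("00ff") == b"\x00\xff")      # True
```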

This page written and maintained by Jeremy Maiden, last updated 19th April 2001.