cupl — interpreter for the CORC and CUPL languages


cupl [-f fieldwidth] [-v nnn[y]] [-w linewidth]


The -w option sets the line width (default 80).

The -f option sets the field width (default 20).

The -v option enables debugging output. At level 1, the parse tree is prettyprinted. At level 2, definition/reference counts for each variable and label are printed after each run. At level 3, an execution trace is displayed as the parse tree is interpreted. At level 4, each token intern and cons-cell allocation during parsing is also dumped. A suffix of y enables parser debugging messages.


CUPL was an early (1966) teaching language implemented as a batch compiler on the IBM/360 at Cornell University. It was descended from an earlier (1962) experimental language called CORC (CORnell Compiler), which was in turn derived loosely from Algol-58. Statements made without qualification about CUPL below also apply to CORC.

CUPL is documented in CUPL: The Cornell University Programming Language, by R.J. Walker, a manual first printed in November 1966. This implementation is based on the July 1967 second printing.

CORC is documented in An Instruction Manual For CORC, by R.W. Conway, W.L. Maxwell, R.J. Walker. This implementation tracks the 2nd edition of September 1963.

The purpose of this implementation is to preserve a CUPL/CORC implementation for the edification of historians and students of programming-language design. CUPL and CORC were representative of a significant class of teaching languages in their period, and study of their design casts a clear light on the preoccupations of their time.

Introduction to the Languages

The source distribution includes, in the file cupl.doc, a transcription of all the relevant parts of the CUPL manual (the bulk of the text is a general tutorial on scientific programming). Another file, corc.doc, similarly excerpts the CORC manual.

CUPL has only one scalar type, a long floating-point real corresponding to C double (round-off rules coerce scalars to integer in contexts like subscripting). It supports vector and matrix aggregates, and has operations specialized for linear-algebra calculations. There is no function abstraction and all variables are global; program chunking is achieved through BLOCK or BEGIN blocks which resemble parameterless subroutines.

CUPL rather resembles early BASICs, minus BASIC's string facility. It is oriented towards scientific calculation and linear algebra, and would be nearly impossible (or, at any rate, extremely painful) to use for anything else.

The programming-support features of CUPL and CORC resembled those of the better-known WATFOR and WATFIV compilers, incorporating elaborate error-correction and trace output features using a runtime monitor.

The only documented incompatibility between the CUPL and CORC languages is the interpretation of GO TO <label> when <label> is associated with a block. In CUPL, this is a go to beginning of block; in CORC, it's a go to end of block (which in CUPL is GO TO <block> END). The interpreter switches on CORC interpretation whenever it detects a CORC-specific word (such as NOTE) during lexing.

The CORC statement TITLE and the triple iteration construct have no counterparts in CUPL.


We reproduce here a nearly exact transcription of Appendix A of the Walker manual, CUPL: The Cornell University Programming Language.

Tags of the form [See m-n: ...] are not part of the original document; they are references (by section-page number in the original manual) to notes which follow the appendix transcription. These notes are also excerpts from the manual.

There is one correction in the text. The original manual listed both LN and LOG as built-in functions and described "LOG(a)" as "natural log of a". We believe this is incorrect; we have changed the "LOG(a)" in the original to LN(a) and inserted a new "LOG(a)" entry. This implementation's LOG function is, accordingly, log_10() and not log_e().
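
As an illustration of the distinction described above, here is a minimal Python sketch. The names cupl_ln and cupl_log are hypothetical and are not taken from the interpreter's source; the sketch only shows which logarithm each built-in is meant to compute.

```python
import math

def cupl_ln(a):
    """LN(a): natural logarithm (base e). Hypothetical helper name."""
    return math.log(a)

def cupl_log(a):
    """LOG(a): common logarithm (base 10), as in this implementation."""
    return math.log10(a)

print(cupl_log(100.0))   # base-10 log of 100
print(cupl_ln(100.0))    # natural log of 100, a different value
```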

There are a few typographical changes to fit it into the ASCII character set. The differences:

  • ^[-+]nnn is used to render exponent superscripts.

  • subscripts are simply appended to their metavariables.

  • `x' between digits is used to render the multiplication sign.

  • lines of dashes below headings indicate underscores.

  • page breaks in the original are represented by form feeds here.

Otherwise, the appendix A transcription is exact, even down to hyphen breaks and spacing. ^L represents a page break. In the following notes, hyphen breaks and exact spacing are not preserved, but the original text is, with the following additional typographical changes:

  • |a| is used to render the absolute-value operation.

  • <= and /= are used to render non-ASCII symbols.

  • In the original, a couple of instances of |x * 10**n| and |y * 10**n| were actually set as |10^nx| and |10^ny|, where ^n represents a superscript. This is excessively hard to read in ASCII.

The combination of Appendix A and the notes includes essentially all the manual's documentation of the CUPL language itself. We have not transcribed appendix D, "Error Considerations and Actions", nor appendix B, "Functions", because the former depends on the parsing machinery of the original compiler and the latter documents range restrictions and precision constraints for the special functions (which are not duplicated in our implementation).

The following .cupl files, included with this distribution, are also transcribed from the manual. We include every non-pathological program example. The following list maps programs to original page numbers:

cubic.cupl -- 7-7  (boxed)
fancyquad.cupl -- 2-8  (coding form example)
poly11.cupl -- 7-9  (boxed)
power.cupl -- 3-4  (exercise 5b) 
prime.cupl -- 12-5 (boxed)
quadratic.cupl -- 2-2  (boxed)
random.cupl -- 10-2 (boxed)
rise.cupl -- 3-15 (exercise 7)
simplequad.cupl -- 2-1  (boxed)
squares.cupl -- 5-7  (output example)
sum.cupl -- 3-4  (exercise 5a)

We have supplied leading comments for most of these; they are otherwise unaltered.

                          Appendix A

                        Summary of CUPL


          Letters: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
          Digits:  0 1 2 3 4 5 6 7 8 9
          Special Characters:  + - * / ( ) . , ' =

          Normal decimal usage:  e.g. 3, 1.725, -.06 .
          "Scientific" notation:  -1.2E-3  for -1.2 x 10^-3 .
          Truncated to 9 significant figures by the system.
          Range:  Absolute values from 10^-78 to 10^76, and 0 .

     Variables and Labels
          a.  Consist of 1 to 8 letters or digits, beginning with
              a letter; no blanks or special characters.

          b.  Must not be one of the following "reserved words":

                 ABS        DOT     IF    OR        THEN
                 ALL        ELSE    INV   PERFORM   TIMES
                 ALLOCATE   END     LE    POSMAX    TO
                 AND        EXP     LET   POSMIN    TRC
                 ATAN       FLOOR   LN    RAND      TRN
                 BLOCK      FOR     LOG   READ      WATCH
                 BY         GE      LT    SGM       WHILE
                 COMMENT    GO      MAX   SIN       WRITE
                 COS        GT      MIN   SQRT
                 DET        IDN     NE    STOP

          c.  Must be unique--same name cannot be used for both a
              label and a variable.

          d.  Variables have automatic initial value of zero.

          e.  Variables can be scalar, vector, or matrix.  [See 8-3 and 9-8]

                If A   is a matrix variable:

                   A        refers to the entire matrix.
                   A(I,J)   refers to the element at the
                              intersection of the ith row
                              and the jth column
                   A(*,J)   refers to the jth column [see 9-2]
                   A(I,*)   refers to the ith row    [see 9-2]

                If V is a vector variable:

                   V refers to the entire vector
                   V(I) refers to the ith component.

              [See 11-3 for subscript round-off rules]

          f.  A vector is a 1-column matrix

          g. IDN is the identity matrix of any appropriate size.

     Arithmetic Operators
          a.  +, -, /, * for multiplication, ** for exponentiation

          b.  Normal precedence:  **, * and /, + and -
                                  Parentheses from inner to outer.
                                  Sequence of + and - from left to
                                  right.

          c. Types of operand:    +, -, /, *, ** for scalars
                                  +, -, *, and numerator of / for
                                     vectors and matrices.

          No spaces, or splitting at end of line, in any number,
          variable, label, reserved word, or **.  Spaces allowable
          anywhere else.

          Form                Action              Type of Argument
          ----                ------              ----------------
          ABS(a)        absolute value of a      any expression
          ATAN(a)       arctangent of a, in      numerical valued
                          radians                  expression
          COS(a)        cosine of a, a in        numerical valued
                          radians                  expression
          EXP(e)        e raised to a power      numerical valued
          FLOOR(a)      greatest integer not     numerical valued
                          exceeding a              expression
          LN(a)         natural log of a         numerical valued
          LOG(a)        log to base 10 of a      numerical valued
          SQRT(a)       positive square root     numerical valued
                          of a                     expression
          SIN(a)        sine of a, a in          numerical valued
                          radians                  expression
          MAX(a,b,...)  maximum value of all     any expressions
                          elements of all

          MIN(a,b,...)  minimum value of all     any expressions
                          elements of all
          RAND(a)       next pseudo-random       any expression
                          number, or array
                          of numbers, in a
                          sequence in which
                          a was the last
          DET(a)        determinant of a         square array valued
          DOT(a,b)      dot product of a         vector-valued expres-
                             and b                 sions of equal
          INV(a)        inverse of a             square array valued
          POSMAX(a)     row position of maxi-    array valued expression
                          mum element of a
          POSMIN(a)     row position of mini-    array valued expression
                          mum element of a
          SGM(a)        sigma of a (sum of       array valued expression
                          all elements)
          TRC(a)        trace of a (sum of       array valued expression
                          elements on prin-
                          cipal diagonal)
          TRN(a)        transpose of a           array valued expression

          Symbol   With scalar expressions   With array expressions
          ------   -----------------------   ----------------------
            =      equals                    all corresponding ele-
                                               ments equal
            NE     not equal to              at least one pair of
                                               corr. ele. not equal
            LE     less than or equal to     all corr. ele. less than
                                               or equal
            GE     greater than or equal     all corr. ele. greater
                     to                        than or equal
            LT     less than                 all corr. ele. less than
                                               or equal; at least one
                                               less than
            GT     greater than              all corr. ele. greater
                                               than or equal; at least
                                               one greater than

     [See 11-4 for round-off rules applying to relations]
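
The array column of the relation table above can be sketched in Python as follows. This is only an illustration: the function name array_relation is hypothetical, arrays are assumed to be flat lists of equal length, and the round-off rules of 11-4 are ignored.

```python
def array_relation(op, a, b):
    """Apply a CUPL-style relation to two equal-length lists, element by element."""
    pairs = list(zip(a, b))
    if op == "=":
        return all(x == y for x, y in pairs)
    if op == "NE":
        return any(x != y for x, y in pairs)
    if op == "LE":
        return all(x <= y for x, y in pairs)
    if op == "GE":
        return all(x >= y for x, y in pairs)
    if op == "LT":
        # all less than or equal, and at least one strictly less
        return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)
    if op == "GT":
        # all greater than or equal, and at least one strictly greater
        return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)
    raise ValueError("unknown relation: " + op)
```

Note that under these rules two identical arrays satisfy LE and GE but neither LT nor GT.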


     The following symbols are used in the statement descriptions:

          v1, v2, ...            variables (scalar, vector, or
                                 matrix except as noted)

          r1, r2, ...            relations

          slabel1, slabel2, ...  statement labels

          blabel1, blabel2, ...  block labels

          e1, e2, ...            arithmetic expressions (a meaningful
                                 and conformable combination of
                                 numbers, variables, functions, and
                                 arithmetic operators)

Statements should begin in column 10 of the programming form.  If
continued onto more than one line, the second and subsequent lines
should begin in column 15.  Columns 73 to 80 must not be used.

Any statement may be given a label--beginning in column 1 of the
programming form.

     Assignment Statement
          LET v1 = e1

     Sequence Control Statements
          GO TO slabel1

          GO TO blabel1        Used only inside block 'blabel';
                               causes skip to end of block.

          IF e1 r1 e2 THEN s1 ELSE s2    where s1 and s2 are any type
                                         of statement except IF or

                      Either the THEN phrase or the ELSE phrase may
                      be omitted, but one or both must be given.

               Compound conditions may be used:

                      IF e1 r1 e2 AND e3 r2 e4 AND ... THEN s1 ELSE s2

                      IF e1 r1 e2 OR e3 r2 e4 OR ... THEN s1 ELSE s2

               but AND and OR phrases may not be mixed in the same
               statement.                 ---


     Iteration Control Statements
          A 'block' consists of a sequence of statements preceded by

              blabel1 BLOCK   and followed by   blabel1 END

     A block may be located anywhere in the program; it is executed
         only by a PERFORM statement calling it by name.  Blocks
         may be nested but not overlapped.  A block may contain
         any kind of statement, including PERFORM, except for a
         PERFORM referring to the block itself.

     PERFORM blabel1

     PERFORM blabel1 e1 TIMES    where e1 has integer value.

     PERFORM blabel1 WHILE e1 r1 e2

         Compound conditions may be used:

              WHILE e1 r1 e2 AND e3 r2 e4 AND ...

              WHILE e1 r1 e2 OR e3 r2 e4 OR ...

         but AND and OR phrases may not be mixed in the same
         statement.

     PERFORM blabel1 FOR v1 = e1, e2, ...

                     FOR sv1 = e1 TO e2 BY e3    where sv1 is
                                                  a scalar vari-
                                                  able.

                          The order of the TO and BY phrases can
                          be reversed; the BY phrase can be
                          omitted if e3 = 1.   [See 8-3 for more]

     Communication Statements
          READ v1, v2, ...   [See 5-2 and 9-2 for description]

          WRITE v1, v2, ... , 'title message', ... , /v3, /v4

                    Three types of elements may appear in the list
                    after WRITE:

                       1.  Variable.  Prints:

                              name of scalar and current value;

                              name of vector and current values of

                              name of each row vector of matrix and
                              current value of components.

                       2.  Variable preceded by a / . Current values
                              only are printed.

                       3.  A message enclosed in single quotes.  The
                              exact image will appear as a title on
                              the output.  Any characters except the
                              quote may be used in such a message.  A
                              message cannot continue onto a second
                              line on the programming form--a separate
                              item in the same or another WRITE state-
                              ment must be used.

                       [See 5-3 and 8-10 for more formatting details]

          [WRITE ALL  See 4-12 and 8-3 for a description]

          COMMENT   A comment line can be inserted at any time by
                    writing COMMENT in the label field (columns 1-7) of
                    the programming form.  Such a line will appear in
                    the program listing, but has no effect on execution.

     Dimensioning of Vectors and Matrices
          ALLOCATE mv1(e1, e2), vv2(e3), ...  where mv1  is a matrix
                                   variable and vv2 is a vector
                                   variable, and e1, e2 and e3 have
                                   integer values.

              When space is initially allocated to an array, the
              values of all the elements are zero.  If space is
              later changed by another allocation the values of
              those elements common to the old and the new alloca-
              tions are unchanged; the values of new elements are
              zero.  [See also 9-1]
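
The reallocation rule above might be sketched in Python as follows. The helper name allocate is hypothetical, matrices are modelled as lists of rows, and "elements common to the old and the new allocations" is assumed to mean elements at the same (row, column) position.

```python
def allocate(old, rows, cols):
    """Return a rows x cols matrix of zeros, keeping values from `old`
    wherever a (row, column) position exists in both allocations."""
    new = [[0.0] * cols for _ in range(rows)]
    if old:
        for i in range(min(rows, len(old))):
            for j in range(min(cols, len(old[0]))):
                new[i][j] = old[i][j]
    return new

m = allocate(None, 2, 2)   # initial allocation: all zeros
m[0][1] = 5.0
wider = allocate(m, 1, 3)  # element (1,2) survives, new element is zero
```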

     Tracing Changes in Value During Execution

          WATCH v1, v2, ...   where v1  and v2 are scalar variables.

              This will cause the system to monitor the values of
              each of the variables listed and print the new value
              each time one of the listed variables is assigned a
              value by a LET or READ statement.  This operation is
              temporary and is automatically discontinued for a
              particular variable after 10 such assignments.  [See 8-3]
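
A rough Python model of the WATCH behaviour described above is given here. The class name Watcher and the trace line format are assumptions made for illustration; only the ten-assignment cutoff comes from the text.

```python
class Watcher:
    LIMIT = 10  # monitoring stops after this many assignments per variable

    def __init__(self):
        self.remaining = {}  # variable name -> reports still to be printed
        self.log = []        # trace lines (format is illustrative only)

    def watch(self, *names):
        for name in names:
            self.remaining[name] = self.LIMIT

    def assign(self, env, name, value):
        """Model a LET or READ storing `value` into scalar `name`."""
        env[name] = value
        left = self.remaining.get(name, 0)
        if left > 0:
            self.log.append(f"{name} = {value}")
            self.remaining[name] = left - 1

env = {}
w = Watcher()
w.watch("X")
for k in range(12):
    w.assign(env, "X", float(k))  # only the first 10 are reported
w.assign(env, "Y", 1.0)           # Y is not watched
```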


     Data to be read by the execution of the READ statements is
     provided on the same form, after the last statement of the
     program.  The first data line is indicated by writing *DATA
     in columns 1 to 5.  Data items may be entered on this line
     beginning in column 7 and in columns 1 to 72 of any following
     lines.


     Data items are separated by commas.  An item may be either a
     number,  n1,  or an expression of the form  v1 = n1 .  The
     latter form is for checking purposes--it must correspond to the
     variable v1 in the associated READ statement.

     Data items are read in sequence as the program is executed.
     To an array with q elements appearing in a READ statement,
     there must correspond q successive items in the data list.
     If the array is a matrix, the items must be ordered by rows.

Other quotes:

From 4-12:

     Another statement that can be used for checking purposes is

           WRITE ALL

This will cause the values of all of the variables in the program to be
printed.  There is no WATCH ALL statement.

From 5-2:

     If READ v occurs in the program and the corresponding entry on the
datalist is w=n, where w is not the same as v, an error message is given.
The value n is assigned to v, no change is made in w, and the program
continues.  Thus the inclusion of items of the type v=n in the data list
provides checks against the accidental omission of data or inclusion of
extra numbers.

     If the total data list is too short, the machine will give an error
message and will give the value 1 for each of the missing numbers.  If the
list is too long the extra entries will be ignored but no error message will
be given.
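
The checking behaviour quoted above can be modelled with a short Python sketch. The function name read_var and the message wording are hypothetical; data items are represented either as plain numbers or as (name, number) tuples standing for the v=n form.

```python
def read_var(name, data, env, messages):
    """Model READ for one scalar: consume the next data item and store it."""
    if not data:
        # Data list too short: report it and supply the value 1.
        messages.append(f"no data left for {name}; using 1")
        env[name] = 1.0
        return
    item = data.pop(0)
    if isinstance(item, tuple):
        label, value = item
        if label != name:
            # Mismatched label: report it, but still assign to `name`.
            messages.append(f"data labelled {label} read into {name}")
    else:
        value = item
    env[name] = value

env, messages = {}, []
data = [("A", 2.0), 3.0, ("X", 4.0)]
for var in ["A", "B", "C", "D"]:
    read_var(var, data, env, messages)
```

Here C receives 4.0 despite the mismatched label X, and D receives 1 because the list ran out; both events produce a message.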

From 5-3:

     Numbers are given to 9 significant figures. A number n /= 0 in the range
.0001 <= |n| < 100000 is printed in the usual decimal form, e.g. -327.512736,
0.0243472150 .  Zero is printed simply as 0.  All other numbers appear in the
form mEp with 1 <= |m| < 10, e.g. -2.31562007E+04, 5.00000000E-17 .
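
A possible Python rendering of this number format is sketched below. The name format_number is hypothetical, and Python rounds to 9 significant figures where the original system may have truncated, so the last digit can differ. Field placement of the decimal point is not modelled.

```python
def format_number(n):
    """Format n with 9 significant figures, decimal or mEp form."""
    if n == 0:
        return "0"
    sci = f"{n:.8E}"                  # e.g. '-3.27512736E+02'
    exponent = int(sci.split("E")[1])
    if -4 <= exponent <= 4:           # .0001 <= |n| < 100000
        # 9 significant figures means (8 - exponent) digits after the point
        return f"{n:.{8 - exponent}f}"
    return sci

print(format_number(-327.512736))
print(format_number(0.024347215))
print(format_number(5e-17))
```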

     A line of output is divided into six "fields", each 20 characters long.
Each variable name or value occupies one field.  The decimal point of a
number always comes at the seventh position in a field.  Each WRITE statement
starts a new line, but within a given WRITE statement fields are used 
consecutively, new lines being started as necessary.  The one exception occurs
when the name of a variable would come at the end of a line and its value
on the next line; in this case, the last field is left blank and the name
starts the next line.

     A field may be purposely skipped by simply omitting a variable name;
for example,

          WRITE ,A,,,/B

will put A and its value in fields 2 and 3, and the value of B in 6,
leaving fields 1, 4 and 5 vacant.  The statement 


will cause a whole line to be skipped.
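
The field-filling rules quoted above can be approximated in Python as follows. This is a sketch only: the item encoding ("skip", "named", "value") and the function name layout are invented for illustration, and cells are kept as strings rather than padded to 20 characters.

```python
FIELDS_PER_LINE = 6

def layout(items):
    """Distribute WRITE list items over lines of six fields each."""
    lines = [[]]

    def put(cell):
        if len(lines[-1]) == FIELDS_PER_LINE:
            lines.append([])
        lines[-1].append(cell)

    for item in items:
        kind = item[0]
        if kind == "skip":        # omitted variable: leave a field vacant
            put("")
        elif kind == "value":     # /v form: value only
            put(item[1])
        else:                     # "named": name field followed by value field
            if len(lines[-1]) == FIELDS_PER_LINE - 1:
                put("")           # never strand a name in the last field
            put(item[1])
            put(item[2])
    return lines

# WRITE ,A,,,/B with A = 1.5 and B = 2.5
line = layout([("skip",), ("named", "A", "1.5"), ("skip",), ("skip",),
               ("value", "2.5")])
```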

From 8-3:

     The only restrictions on the use of a subscripted variable are in

          PERFORM k FOR v = ---

and

          WATCH v,...

Here v must be a non-subscripted variable.  Also, WRITE ALL will print only
the values of the non-subscripted variables.

From 8-10:

     1. A new line is started for each vector and each row of a matrix.

     2. The values of the elements of a vector or a row of a matrix are
put successively in fields 2 through 6, repeating as necessary.

     3. In field 1 of the first line used for a vector or for a row of
a matrix is put the name of the vector or a symbol for the row of the matrix,
unless these are suppressed by a slash before the array in the WRITE list.
The slash does not change the spacing described in 2 above.

     4. A variable, subscripted or not, appearing in the WRITE list immediately
after an array, starts a new line.

     There is one exception to these rules.  If M is a matrix with only one
column, to save space it will be printed as if it were a vector, that is, in
the form

          M = m11      m21      m31     etc.

From 9-1:

Vectors as matrices
     Consideration of relations involving matrices and vectors can be
simplified by regarding a vector as a 1-column matrix.  This convention is
adopted in CUPL, and so, for example

          ALLOCATE X(7)

and

          ALLOCATE X(7, 1)

have precisely the same meaning.  After either of these allocations the
variables X(2) and X(2, 1) are meaningful and have the same value; X(M, N)
is meaningful only if N has the value 1.

From 9-2:

     For any matrix M, M(*, J) denotes the 1-column matrix (a vector)
which is the J-th column of M.  Similarly, M(I, *) denotes the 1-row matrix
(not a vector) which is the I-th row of M.  If M is m x n, then M(*, J) and
M(I, *) are m x 1 and 1 x n, and they can be used as matrices of these
sizes in any statement except ALLOCATE.  For example:

     READ M(*, 3)

will read data into the third column of M, leaving the rest of M unchanged.
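
The star notation can be pictured with a small Python sketch. The helper names column, row and read_into_column are hypothetical; matrices are modelled as lists of rows and subscripts are 1-based as in CUPL.

```python
def column(m, j):
    """M(*, J): the J-th column as a 1-column matrix."""
    return [[r[j - 1]] for r in m]

def row(m, i):
    """M(I, *): the I-th row as a 1-row matrix."""
    return [list(m[i - 1])]

def read_into_column(m, j, data):
    """READ M(*, J): overwrite column J only, leaving the rest of M alone."""
    for i, value in enumerate(data):
        m[i][j - 1] = value

m = [[1, 2, 3],
     [4, 5, 6]]
third = column(m, 3)
second_row = row(m, 2)
read_into_column(m, 3, [9, 8])
```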

From 9-8:

     So much space is needed to compute INV(A) or DET(A) that the size of
A is limited to 40x40 in these expressions.

From 11-3:

Automatic Integer Round-off
a. The value of a subscript is rounded to the nearest integer.

b. If the round-off involves a change of greater than 10^-9 (approximately)
   an error message is given.
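
A minimal Python sketch of this subscript rule follows. The name round_subscript and the message text are assumptions; the tolerance is the approximate 10^-9 figure quoted above.

```python
import math

TOLERANCE = 1e-9  # approximate threshold quoted in the manual

def round_subscript(value, messages):
    """Round a subscript to the nearest integer, noting large adjustments."""
    nearest = math.floor(value + 0.5)
    if abs(value - nearest) > TOLERANCE:
        messages.append(f"subscript {value} rounded to {nearest}")
    return nearest

messages = []
a = round_subscript(3.0000000001, messages)  # tiny change: no message
b = round_subscript(2.7, messages)           # large change: message recorded
```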

From 11-4:

Automatic Relative Round-off for x r y
a. If both x and y are zero the condition is applied as it stands.

b. If either x or y is not zero:

   (i)   Both x and y are multiplied by 10**n, where n is chosen so that
         the larger of |x * 10**n|, |y * 10**n| lies between .1 and 1.

   (ii)  x * 10**n and y * 10**n are truncated to 14 decimal places

   (iii) The specified condition is interpreted on the resulting numbers.
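
The scalar round-off procedure above could be sketched in Python using the decimal module. This is an illustration under stated assumptions, not the interpreter's method: the name compare is hypothetical, inputs are converted through their shortest decimal representation, and the scaled larger value is taken to lie in [.1, 1).

```python
from decimal import Decimal, ROUND_DOWN
import operator

RELATIONS = {"=": operator.eq, "NE": operator.ne, "LE": operator.le,
             "GE": operator.ge, "LT": operator.lt, "GT": operator.gt}
PLACES = Decimal("1E-14")  # keep 14 decimal places

def compare(x, rel, y):
    """Evaluate x rel y after relative scaling and truncation."""
    dx, dy = Decimal(str(x)), Decimal(str(y))
    if dx == 0 and dy == 0:
        return RELATIONS[rel](dx, dy)          # rule a: applied as it stands
    larger = max(abs(dx), abs(dy))
    n = -(larger.adjusted() + 1)               # scale larger into [.1, 1)
    sx = dx.scaleb(n).quantize(PLACES, rounding=ROUND_DOWN)
    sy = dy.scaleb(n).quantize(PLACES, rounding=ROUND_DOWN)
    return RELATIONS[rel](sx, sy)

print(compare(1.0, "=", 1.000000000000001))  # differs only past 14 places
print(compare(1.0, "LT", 1.001))
```

The effect is that two values agreeing in their first 14 significant decimal places are treated as equal.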


We include here a transcription of appendix F of our reference document, An Instruction Manual For CORC, R.W. Conway, W.L. Maxwell, R.J. Walker.

There are a few typographical changes to fit it into the ASCII character set. The differences:

  • ^[-+]nnn is used to render exponent superscripts.

  • The square root radical sign surrounding b is rendered as b^-2

  • |a| is used to render the absolute-value operation.

  • `x' between digits is used to render the multiplication sign.

  • lines of dashes below headings indicate underscores.

  • page breaks in the original are represented by ^L here.

Error corrections:

  • Under `Sequence Control Statements', the IF keyword in the three if-statement templates was erroneously typed as `If'.

  • Under `Iteration Control Statements', the first AND keyword in the compound-AND example was incorrectly lowercased. In item 1 of the BEGIN explanation, the word `statement' was incorrectly uppercased.


  • Tags [See *1], [See *2], etc., are not part of the original; they reference footnotes following the transcript.

  • Tags such as [See 5-8] are not part of the original; they reference other quotes from the text, given below by page number.

Otherwise, the appendix F transcription is exact, even down to hyphen breaks and spacing (allowing for the fact that the original typewriter spacing was somewhat irregular...).

The combination of Appendix F and the notes includes essentially all the manual's documentation of the CORC language itself. Most of the text is tutorials and exercises.

This implementation preserves the CORC-62 distinction between LOG and LN, contrary to the transcript below which describes CORC-63 (which identifies both with the natural-log function). CORC-62 also lacked the INT function, allowed only non-compound logical expressions in REPEAT...UNTIL, and allowed an alternate spelling `TIME' of `TIMES'.

The following .corc files, included with this distribution, are also transcribed from the manual. We include every complete program example. The following list maps programs to original page numbers:

simplecorc.corc -- 4-6
gasbill.corc -- 4-9
hearts.corc -- 4-10
sumsquares.corc -- 5-4
powercorc.corc -- 5-6
factorial.corc -- 5-6
quadcorc.corc -- 5-9
title.corc -- 7-3 (note: this one uses continuations)

We have supplied leading comments and test data for these programs; they are otherwise unaltered.

                                APPENDIX F
                       Summary of the CORC Language

     Acceptable Characters
          Letters: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
          Digits:  0 1 2 3 4 5 6 7 8 9
          Special Characters:  + - / ( ) * $ , . =

          Normal decimal usage -- + sign may be omitted; decimal point
                    may be omitted from an integer.
          Only 11 significant figures will be considered (8 on Burroughs
                    220).
          Output: 8 significant figures.
          Acceptable range: Absolute values from 10^-308 to 10^+308
                    (10^-49 to 10^+50 on Burroughs 220), and 0.
          Scientific notation: 1.2345 x 10^6 may be written 1.2345*10$(6).

     Variables and Labels
          a. 1 to 8 letters or digits; no blanks or special characters.
          b. First character must be a letter.
          c. Must not be one of the following "reserved" words:

                    ABS     DECREASE   GO         LET   NOTE      STOP
                    AND     ELSE       GTR        LN    OR        THEN
                    ATAN    END        IF         LOG   RAND      TIMES
                    BEGIN   EQUAL      INC        LSS   READ      TITLE
                    BY      EXP        INCREASE   MAX   REPEAT    TO
                    COS     FOR        INT        MIN   SIN       UNTIL
                    DEC     GEQ        LEQ        NEQ   SQRT      WRITE

          d. Statement labels must be unique -- particular label appears only
                    once in label field.
                    Block label used for only one set of BEGIN-END statements.
          e. Variables must be listed in Dictionary. If no initial value
             is given zero is assumed.  [See *1]
          f. One or two subscripts, enclosed in parentheses.  May be
             any expression (including subscripted variables) whose
             value at execution is a positive integer not greater than
             the maximum declared for this variable in Dictionary. [See D-2]
             In Dictionary and Input Data subscripts given as integers in
                    subscript field; parentheses are not used.  [See *1]
     Arithmetic Operators
          + Addition
          - Subtraction
          / Division
          * Multiplication (must be expressed, no implicit multiplication)
          $ Exponentiation; a^b written as a $(b)

     Rules of Precedence
          a. Expressions in parentheses first, from inner to outer.
          b. $ before *, /, + or -
          c. * or / before + or -
          d. Sequence of + and - from left to right

          Argument any expression, a:
                    ABS(a)   |a|
                    EXP(a)       e^a
                    SIN(a)   sin a, a in radians
                    COS(a)   cos a, a in radians
                    ATAN(a)  arctan a, a in radians
                    INT(a)   [a], greatest integer less than or equal to a

          Arguments two or more expressions, a, b,  ... f:
                    MAX(a, b, ... , f) value equal to greatest of expressions
                    MIN(a, b, ... , f) value equal to least of expressions
          Argument  any positive expression, b:
                    LN(b) or LOG(b) log b, natural logarithm of b
          Argument any non-negative expression, b:
                    SQRT(b)        + b^-2 
          Argument a variable, v:
                    RAND(v) next in a sequence of pseudo-random numbers.
                        See Chapter 7

     Relations (used only in IF and REPEAT ... UNTIL statements)
          EQL =              NEQ !=
          LSS <              LEQ <=
          GTR >              GEQ >=
     Card Numbers
          Strictly increasing sequence of 4 digit numbers, beginning
               with 0010.
          Initially right digit should be zero on all cards to leave
               room for later additions.

     Program Arrangement [See *1]
          Arrange and number programming forms in the following order:
                    1. Preliminary Description
                    2. Dictionary [See 2-10]
                         List all variables to be used
                         Specify initial values if not zero
                         Specify maximum value for each subscript.
                    3. Program Statements
                         One statement per line; for continuation place
                              C in column 21 of following line and begin
                              in column 42.  Do not break in middle of
                              variable, label, or number.
                         THEN, ELSE, OR and AND phrases of IF statement
                              on following lines, beginning in col 42,
                              but C in col 21 not required.
                    4. Input Data
                         Variables, Subscripts and values of data to
                              be called by READ statements in program.

     CORC Statements
     The following symbols are used in the statement descriptions:

          v, w, x   variables
          r, s, t   relations
          a, c      statement labels
          b         block label
          e, f, g, h, j, k    arithmetic expressions (any meaningful
                                 combination of numbers, variables,
                                 functions and arithmetic operators)

     Computation Statements
          LET v = e
          INCREASE v BY e    or   INC v BY e 
          DECREASE v BY e    or   DEC v BY e 
     Sequence Control Statements
          GO TO a             Statement with label a to be executed next.
          GO TO b             Used only inside block b; causes skip to
                              END of block b.

          IF e r f            Go to statement a if condition e r f is
             THEN GO TO a     satisfied; otherwise go to statement c.
             ELSE GO TO c

          IF e r f            Go to statement a if all of the conditions
             AND g s h        listed are satisfied; otherwise go to
             ...              statement c.
             THEN GO TO a
             ELSE GO TO c

          IF e r f            Go to statement a if any one of the conditions
             OR g s h         listed is satisfied; otherwise go to state-
             ...              ment c.
             THEN GO TO a          AND and OR phrases cannot be mixed in
             ELSE GO TO c          the same IF statement.

          STOP                Last statement in execution of program; not
                              necessarily written last on Program State-
                              ment sheet.

     Iteration Control Statements
          b BEGIN  Define limits of a block; b in label field, BEGIN-END in
          b END    statement field.
               1. Block may be entered (executed) only by REPEAT statement.
               2. Block may be located anywhere in program.
               3. Blocks may be nested, but not overlapped.
               4. Block b may contain any type of CORC statement, in-
                  cluding REPEAT, but not REPEAT b, ... , a REPEAT
                  Statement referring to itself.
          REPEAT b e TIMES     Value of e a non-negative integer.
          REPEAT b UNTIL e r f Continue repetition of block b until
                               condition e r f is satisfied.
          REPEAT b UNTIL e r f AND g s h AND ... Continue repetition
                               of block b until all conditions listed
                               are satisfied.  (Not available on
                               Burroughs 220)
          REPEAT b UNTIL e r f OR g s h OR ... Continue repetition of
                               block b until any one of the conditions
                               listed is satisfied.  (Not available on
                               Burroughs 220)
          REPEAT b FOR   v = e, f, g, ..., ..., (h, j, k), ... Repeat
                               block b once for each expression on list,
                               with value of expression assigned to
                               variable v.  Three expressions on list
                               enclosed in parentheses mean from h to k
                               in steps of j. [See 5-8]

     Communication Statements
          READ v, w, x, ...    Read an Input Data card for each variable
                               on list; variables on cards read must
                               agree with variables on list.

          WRITE v, w, x, ...   Print variable and current value, three
                               to a line.  Each WRITE statement starts a
                               new line.

          TITLE message        Print "message" in computational results
                               when this statement is encountered in
                               execution of program.

          NOTE message         "Message" will appear in copy of program
                               only, not in execution.  Used for program
                               notes only.

[*1] This implementation of CORC does not support or allow a Dictionary section. Instead, variable initializations must be done via CUPL-style DATA and ALLOCATE statements.

[*2] Ignore this section. Program statements are free-format, and continuations are not supported (the example program test/title.corc shows the syntax, but it will break cupl). Data for READ statements is accepted in CUPL format following the keyword *DATA.

For completeness, however, the Dictionary feature is documented here.

From 2-10:

Dictionary of Variables
     In addition to a set of statements CORC requires a pro-
gram to contain another part known as the Dictionary.  The Diction-
ary of a program is merely a list of all the variables used in
the program, along with, if desired, the initial assigned values of
the variables.  If no initial value is specified the computer
assigns the initial value zero.

From 2-11:

     In the above example the Dictionary might look like this:

     A     1
     B    -1
     C    -6

The `above example' is the simplecorc.corc program. The CORC Dictionary is equivalent to a CUPL DATA section, but it also allowed the programmer to dimension array variables. The example form on 2-12 makes it clear that the Dictionary was distinguished from the program proper by occupying a different, lower range of card line numbers.

From 5-8:

Three expressions on a list, enclosed in parentheses, are in-
terpreted in the following way:
     1. The first expression gives the initial value for the vari-
        able.
     2. The second expression gives the difference between con-
        secutive values.
     3. The third expression indicates where to stop -- the final
        value for the variable is less than or equal to the value
        of the third expression.
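
The three rules above can be sketched in C as follows. This is an illustration only, not code from this implementation: the function name expand_triple and its buffer interface are hypothetical, and the treatment of a zero or negative step is an assumption, since the excerpt does not cover that case.

```c
#include <stddef.h>

/* Expand one (first, step, last) triple into the successive values
 * assigned to the loop variable. Stores at most cap values in out and
 * returns the total number of values the triple generates.
 * Assumption: a zero or negative step generates no values. */
size_t expand_triple(double first, double step, double last,
                     double *out, size_t cap)
{
    size_t n = 0;

    if (step <= 0.0)
        return 0;
    for (;;) {
        /* Compute first + n*step directly so that round-off does not
         * accumulate as it would with repeated addition. */
        double v = first + (double)n * step;
        if (v > last)       /* final value must be <= third expression */
            break;
        if (n < cap)
            out[n] = v;
        n++;
    }
    return n;
}
```

With the triple (1, 2, 8) the sketch yields 1, 3, 5, 7: the last value is the largest one that does not exceed the third expression.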

[... examples omitted ...]

    More than one such "triple" may be used on a list, and
"triples" may be intermixed with separate expressions;

From 6-5:

   A particular variable either has no subscripts, one sub-
script, or two subscripts and this use must be consistent through-
out a program.  A variable cannot appear as X(I) in one statement
and X(I, J) or just X in another statement of the same program.
The nature of a variable (whether it is to be subscripted or not)
must be indicated when the variable is listed in the CORC Diction-
ary.  This is done by giving the maximum value of any subscripts
that will be used in columns 21-25 of the Dictionary form.  If no
subscripts will be used these columns will be left blank in the
line for that variable.  If one subscript is to be used, the maxi-
mum value that that subscript will take on anywhere in the pro-
gram must be given in columns 21-23; columns 24-25 are left blank.
(Note that this is the maximum value of the subscript, and not
the maximum value of the variable.)  If two subscripts are to be
used the maximum value of the first is given in columns 21-23
and the maximum value of the second in columns 24-25.  For ex-
ample, if the Dicitonary [sic] looks like the following:

          SCALAR
          VECTOR    45
          MATRIX   100     2
          ARRAY      3    32

then SCALAR is a simple variable that will not have any sub-
scripts anywhere in the program.  VECTOR will have one subscript
everywhere it appears in the program [...] MATRIX [...] will
appear each time with two subscripts [...] ARRAY will also
always have two subscripts; [...]

From D-2:

Automatic Integer Round-off
a. The value of a subscript is rounded off to the nearest
   integer.
b. If the round-off involves a change of greater than
   10^-9 (approximately;   the number is subject to some
   variation) an error message is given.
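
A minimal C sketch of this subscript rule follows. The name round_subscript, the return convention, and the fixed tolerance of exactly 1e-9 are assumptions made for illustration (the manual itself calls the figure approximate); the sketch also assumes the value fits in a long.

```c
#define SUBSCRIPT_TOLERANCE 1e-9   /* assumed; the manual says "approximately" */

/* Round value to the nearest integer for use as a subscript.
 * Returns 0 and stores the result in *result on success; returns -1
 * (where the original would print an error message) if rounding
 * changes the value by more than the tolerance. */
int round_subscript(double value, long *result)
{
    /* Round half away from zero using only a cast, so no libm is needed. */
    long nearest = (long)(value < 0.0 ? value - 0.5 : value + 0.5);
    double change = value - (double)nearest;

    if (change < 0.0)
        change = -change;
    if (change > SUBSCRIPT_TOLERANCE)
        return -1;
    *result = nearest;
    return 0;
}
```

A subscript computed as 3.0000000001 is accepted as 3, while 2.5 is rejected.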

From D-3:

Automatic Relative Round-off for x r y
a. If both x and y are zero the condition is applied as it stands.
b. If either x or y is not zero:
   (i) Both x and y are divided by MAX(|x|, |y|);
   (ii) The results are rounded off to the nearest 
        integer if this involves a change of less
        than 10^-9 (10^-7 for the Burroughs 220), but
        not otherwise;
   (iii) The specified condition is interpreted for
        the resulting numbers.
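
The relative round-off can be sketched in C as below. This is a sketch under stated assumptions, not the code used by this implementation: it reads step (i) as normalizing both operands by dividing by MAX(|x|, |y|), so that the larger has magnitude 1, and it uses the 10^-9 tolerance. The names abs_d, snap, relative_round and corc_equal are hypothetical.

```c
#define REL_TOLERANCE 1e-9   /* 1e-7 would model the Burroughs 220 */

static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Step (ii): replace v by the nearest integer only when doing so
 * changes it by less than the tolerance. Callers pass |v| <= 1. */
static double snap(double v)
{
    long nearest = (long)(v < 0.0 ? v - 0.5 : v + 0.5);
    return abs_d(v - (double)nearest) < REL_TOLERANCE ? (double)nearest : v;
}

/* Prepare x and y for a relation test according to rules a and b. */
void relative_round(double x, double y, double *xr, double *yr)
{
    double scale;

    if (x == 0.0 && y == 0.0) {     /* rule a: apply condition as it stands */
        *xr = x;
        *yr = y;
        return;
    }
    scale = abs_d(x) > abs_d(y) ? abs_d(x) : abs_d(y);
    *xr = snap(x / scale);          /* step (i), then step (ii) */
    *yr = snap(y / scale);
}

/* Step (iii), shown for the equality relation only. */
int corc_equal(double x, double y)
{
    double xr, yr;

    relative_round(x, y, &xr, &yr);
    return xr == yr;
}
```

Under this reading, two values that differ only by floating-point noise, such as 0.1 + 0.2 and 0.3, compare equal, while 1.0 and 1.001 do not.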

Differences from the original CUPL

The most obvious differences are also the most trivial. CUPL was first implemented on an IBM/360 Model 30; CORC on Burroughs 1604 and 220 machines. Both used a small capital-letters-only character set SIXBIT, and followed the archaic IBM practice of using a slashed-O for alphabetic O and plain 0 for zero. Original CUPL/CORC listings thus look rather odd to the modern eye.

The original CUPL was a batch system with a fixed-field card format; labels in columns 1-8, statements in 10-72, statement continuations beginning in column 15 (CORC's format differed only in detail from this). In CUPL, data for the program was supplied following a special *DATA label in the same deck as the program; CORC did not require this marker (it is not clear from the CORC documentation how end-of-program was recognized).

On modern output devices, slashed-0 tends to be used, if at all, for zero. We have not tried to preserve IBM's reversal. Nor have we tried to enforce the columnation requirements, and we don't implement the continuation convention (new CUPL is free-format, with newlines ignored). We do preserve much of the visual appearance of CUPL listings by insisting on all caps and tab-indenting statements. We also preserve the *DATA mechanism for supplying initializations.

More significant differences arise from differences between the word size and floating-point format of CUPL's original host and those of a typical modern C implementation. The 360 had a 32-bit word; original CUPL scalars ranged from 1e76 to 1e-78 with nine decimal digits of precision. As for CORC: the Burroughs 1604 was documented as having a much wider range, 1e308 to 1e-308 with 11 digits of precision; the Burroughs 220 supported 1e-49 to 1e50 with 8 digits of precision.

On today's typical 32-bit microprocessor such as an Intel 486, C floats are 32 bits and have roughly 1e+38 to 1e-38 range and 6 digits of precision; doubles are 64 bits, with range roughly 1e308 to 1e-308 and 15 digits of precision. This implementation uses doubles to emulate CUPL/CORC scalars.
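
The exact limits depend on the host, and can be checked with the standard <float.h> macros. The short sketch below is not part of the cupl sources; the function names are hypothetical, and the CUPL figures it tests against are the ones quoted above.

```c
#include <float.h>
#include <stdio.h>

/* Print the decimal precision and range of the host's float and double. */
void print_host_limits(void)
{
    printf("float:  %d digits, %g to %g\n", FLT_DIG, FLT_MIN, FLT_MAX);
    printf("double: %d digits, %g to %g\n", DBL_DIG, DBL_MIN, DBL_MAX);
}

/* Return 1 if the host double covers the nine-digit, 1e-78 to 1e76
 * range quoted above for original CUPL scalars, 0 otherwise. */
int double_covers_cupl(void)
{
    return DBL_DIG >= 9 && DBL_MAX >= 1e76 && DBL_MIN <= 1e-78;
}
```

On any host where double_covers_cupl() returns 1, a double can hold every original CUPL scalar; a float cannot whenever FLT_DIG is below 9.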

We know from the documentation that the original CUPL compiler ran in 64K of core. The present implementation is easily twice that size. However, given the cycle speeds of the 1960s, it certainly runs a good deal faster than the original CUPL, even with interpretation overhead.

We don't implement original CUPL's error-correction facilities. Though clever, they would make the parser forbiddingly complex, and are anyway much less important in an interactive environment.

There are many limits in original CUPL/CORC that we do not enforce. There is no limit on the length of variable names short of the lexer's very long token buffer length. There is no hard limit on the number of statements in a program. There is no hard limit on the size of arrays.

While the format of number output does not exactly conform to the original CUPL/CORC rules, it is sufficiently ugly to please any but the pickiest. We implement all of 5.2 except the fixing of the decimal point at position 7 in each field. Instead we simply use printf(3)'s %f and %e at field-width precision.

Also, by default, we wrap after four 20-char fields rather than 6, so as to fit on an 80-column line. Command-line options to change the line and field widths are available.

Unix Implementation Notes

The CUPL/CORC implementation is built around YACC and LEX. The rest is ANSI C.

The YACC grammar just builds a parse tree, which is passed to interpret() for interpretation. This method requires that all programs are always small enough that the entire tree can be held in memory, but it has the advantage that front and back end are very well separated. It is a winning strategy on virtual-memory systems.

One hack that greatly simplifies the grammar productions is that the lexer actually returns parse tree nodes, even for atoms like identifiers, strings, and numbers. In fact, the lexical analyzer even does label and variable name resolution with the same simple piece of code; each IDENTIFIER token is looked up in the identifier list when it's recognized, so the parse tree early becomes a DAG. (The -v1 option causes the compiler to dump its parse tree for inspection.)
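
The identifier lookup can be pictured with the following C sketch. It is an illustration of the interning idea only: struct node, the identifiers list and intern() are hypothetical names, and the real node layout in the sources is richer than this.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical identifier node; every use of a name shares one node. */
struct node {
    char *name;
    struct node *next;      /* chain of all identifiers seen so far */
};

static struct node *identifiers = NULL;

/* Return the unique node for name, creating it on first sight.
 * Returns NULL only if memory is exhausted. */
struct node *intern(const char *name)
{
    struct node *np;

    for (np = identifiers; np != NULL; np = np->next)
        if (strcmp(np->name, name) == 0)
            return np;      /* already known: share the existing node */

    np = malloc(sizeof *np);
    if (np == NULL)
        return NULL;
    np->name = malloc(strlen(name) + 1);
    if (np->name == NULL) {
        free(np);
        return NULL;
    }
    strcpy(np->name, name);
    np->next = identifiers;
    identifiers = np;
    return np;
}
```

Because two occurrences of the same identifier yield the same pointer, any tree built from these nodes has shared leaves, which is why it is a DAG rather than a strict tree.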

Most of the smarts are in interpret() and its sub-functions. Because array variables can be re-allocated, the internals have to use a dynamic vector/array type with its own indexing machinery. The code to manipulate this type lives in monitor.c.

Note that much of this machinery is quite generic and could be re-used for other languages with little change.

The implementation trades away some possible efficiencies for simplicity. Most importantly, each value has an attached malloc object to hold its elements, even when there is only one such element (as for scalars) which could reasonably be represented by a static field.
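
As a rough picture of that tradeoff, here is a C sketch of a value whose elements always live in a separately allocated block, scalars included. The struct and function names are hypothetical and are not taken from monitor.c; subscripts are assumed to start at 1.

```c
#include <stdlib.h>

/* Hypothetical value: a scalar is simply a 1 x 1 array. */
struct value {
    int rows, cols;
    double *elements;       /* always malloc'd, even for a scalar */
};

/* (Re)allocate v as a zero-filled rows x cols array.
 * Returns 0 on success, -1 on bad dimensions or allocation failure. */
int value_alloc(struct value *v, int rows, int cols)
{
    double *p;

    if (rows < 1 || cols < 1)
        return -1;
    p = calloc((size_t)rows * (size_t)cols, sizeof *p);
    if (p == NULL)
        return -1;
    free(v->elements);      /* discard any previous allocation */
    v->elements = p;
    v->rows = rows;
    v->cols = cols;
    return 0;
}

/* Address of element (i, j), or NULL if a subscript is out of range. */
double *value_at(struct value *v, int i, int j)
{
    if (i < 1 || i > v->rows || j < 1 || j > v->cols)
        return NULL;
    return &v->elements[(size_t)(i - 1) * (size_t)v->cols + (size_t)(j - 1)];
}

/* Release the element storage. */
void value_free(struct value *v)
{
    free(v->elements);
    v->elements = NULL;
    v->rows = v->cols = 0;
}
```

A value must start out as { 0, 0, NULL } before its first value_alloc(). Keeping one code path for scalars and arrays is the simplification; the extra allocation per scalar is the cost.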

There are some comments in the code which discuss the possibility of a back end that would emit C. This would be easy to do if there were any serious corpus of CUPL/CORC code demanding to be translated. The compiler back end would emit code shaped like the parse tree, which would then link monitor.c as runtime support.

The only nontrivial difference between CUPL and CORC is the interpretation of GO TO <label> when <label> is associated with a block. In CUPL, this is a go to beginning of block; in CORC, it's go to end of block (which in CUPL is GO TO <block> END). The interpreter sets a flag when it sees any of the appropriate CORC-specific keywords (NOTE, BEGIN, DEC, DECREASE, EQL, GEQ, GTR, INC, INCREASE, INT, LEQ, LSS, NEQ, REPEAT, TITLE, UNTIL, $) during lexing, and execute() modifies its behavior appropriately.
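
The dialect switch might look roughly like the C sketch below. Only the keyword list comes from the text above; the enum, the block record and both function names are hypothetical and do not describe the actual lexer or execute() code.

```c
#include <string.h>

enum dialect { DIALECT_CUPL, DIALECT_CORC };

/* Hypothetical block record: indexes of the block's first statement
 * and of its END statement. */
struct block {
    int begin_index;
    int end_index;
};

/* Keywords that mark a program as CORC, as listed above. */
static const char *const corc_keywords[] = {
    "NOTE", "BEGIN", "DEC", "DECREASE", "EQL", "GEQ", "GTR", "INC",
    "INCREASE", "INT", "LEQ", "LSS", "NEQ", "REPEAT", "TITLE", "UNTIL", "$"
};

/* Return 1 if token is one of the CORC-specific keywords, else 0. */
int is_corc_keyword(const char *token)
{
    size_t i;

    for (i = 0; i < sizeof corc_keywords / sizeof corc_keywords[0]; i++)
        if (strcmp(token, corc_keywords[i]) == 0)
            return 1;
    return 0;
}

/* Where GO TO <block label> lands: the beginning of the block in CUPL,
 * the END of the block in CORC. */
int goto_block_target(enum dialect d, const struct block *b)
{
    return d == DIALECT_CORC ? b->end_index : b->begin_index;
}
```

A lexer built this way would switch a global dialect flag to DIALECT_CORC the first time is_corc_keyword() returns 1.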


Eric S. Raymond.