The -w option sets the line width (default 80).
The -f option sets the field width (default 20).
The -v option enables debugging output. At level 1, the parse tree is prettyprinted. At level 2, definition/reference counts for each variable and label are printed after each run. At level 3, an execution trace is displayed as the parse tree is printed. At level 4, each token intern and cons-cell allocation during parsing is also dumped. A suffix of y enables parser debugging messages.
CUPL was an early (1966) teaching language implemented as a batch compiler on the IBM/360 at Cornell University. It was descended from an earlier (1962) experimental language called CORC (CORnell Compiler), which was in turn derived loosely from Algol-58. Statements made without qualification about CUPL below also apply to CORC.
CUPL is documented in CUPL: The Cornell University Programming Language, by R.J. Walker, a manual first printed in November 1966. This implementation is based on the July 1967 second printing.
CORC is documented in An Instruction Manual For CORC, by R.W. Conway, W.L. Maxwell, R.J. Walker. This implementation tracks the 2nd edition of September 1963.
The purpose of this implementation is to preserve a CUPL/CORC implementation for the edification of historians and students of programming-language design. CUPL and CORC were representative of a significant class of teaching languages in their period, and study of their design casts a clear light on the preoccupations of their time.
The source distribution includes, in the file cupl.doc, a transcription of all the relevant parts of the CUPL manual (the bulk of the text is a general tutorial on scientific programming). Another file, corc.doc, similarly excerpts the CORC manual.
CUPL has only one scalar type, a long floating-point real corresponding to C double (round-off rules coerce scalars to integer in contexts like subscripting). It supports vector and matrix aggregates, and has operations specialized for linear-algebra calculations. There is no function abstraction and all variables are global; program chunking is achieved through BLOCK or BEGIN blocks which resemble parameterless subroutines.
CUPL rather resembles early BASICs, minus BASIC's string facility. It is oriented towards scientific calculation and linear algebra, and would be nearly impossible (or, at any rate, extremely painful) to use for anything else.
The programming-support features of CUPL and CORC resembled those of the better-known WATFOR and WATFIV compilers, incorporating elaborate error-correction and trace output features using a runtime monitor.
The only documented incompatibility between the CUPL and CORC languages is the interpretation of GO TO <label> when <label> is associated with a block. In CUPL, this is a go to beginning of block; in CORC, it's go to end of block (which in CUPL is GO TO <block> END). The interpreter switches on CORC interpretation whenever it detects a CORC-specific word (such as NOTE) during lexing.
The CORC statement TITLE and the triple iteration construct have no counterparts in CUPL.
We reproduce here a nearly exact transcription of Appendix A of the Walker manual, CUPL: The Cornell University Programming Language.
Tags of the form [See m-n: ...] are not part of the original document; they are references (by section-page number in the original manual) to notes which follow the appendix transcription. These notes are also excerpts from the manual.
There is one correction in the text. The original manual listed both LN and LOG as built-in functions but described "LOG(a)" as "natural log of a". We believe this is incorrect; we have changed the "LOG(a)" in the original to LN(a) and inserted a new "LOG(a)" entry. This implementation's LOG function is, accordingly, log_10() and not log_e().
There are a few typographical changes to fit it into the ASCII character set. The differences:
^[-+]nnn is used to render exponent superscripts.
subscripts are simply appended to their metavariables.
`x' between digits is used to render the multiplication sign.
lines of dashes below headings indicate underscores.
page breaks in the original are represented by form feeds here.
Otherwise, the appendix A transcription is exact, even down to hyphen breaks and spacing. ^L represents a page break. In the following notes, hyphen breaks and exact spacing are not preserved, but the original text is, with the following additional typographical changes:
|a| is used to render the absolute-value operation.
<= and < are used to render non-ASCII symbols.
In the original, a couple of instances of |x * 10**n| and |y * 10**n| were actually set as |10^nx| and |10^ny|, where ^n represents a superscript. This is excessively hard to read in ASCII.
The combination of Appendix A and the notes includes essentially all the manual's documentation of the CUPL language itself. We have not transcribed appendix D, "Error Considerations and Actions", nor appendix B, "Functions", because the former depends on the parsing machinery of the original compiler and the latter documents range restrictions and precision constraints for the special functions (which are not duplicated in our implementation).
The following .cupl files, included with this distribution, are also transcribed from the manual. We include every non-pathological program example. The following list maps programs to original page numbers:
cubic.cupl -- 7-7 (boxed)
fancyquad.cupl -- 2-8 (coding form example)
poly11.cupl -- 7-9 (boxed)
power.cupl -- 3-4 (exercise 5b)
prime.cupl -- 12-5 (boxed)
quadratic.cupl -- 2-2 (boxed)
random.cupl -- 10-2 (boxed)
rise.cupl -- 3-15 (exercise 7)
simplequad.cupl -- 2-1 (boxed)
squares.cupl -- 5-7 (output example)
sum.cupl -- 3-4 (exercise 5a)
We have supplied leading comments for most of these; they are otherwise unaltered.
Appendix A
Summary of CUPL
ELEMENTS OF THE LANGUAGE
Characters
----------
Letters: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Digits: 0 1 2 3 4 5 6 7 8 9
Special Characters: + - * / ( ) . , ' =
Numbers
-------
Normal decimal usage: e.g. 3, 1.725, -.06 .
"Scientific" notation: -1.2E-3 for -1.2 x 10^-3 .
Truncated to 9 significant figures by the system.
Range: Absolute values from 10^-78 to 10^76, and 0 .
Variables and Labels
--------------------
a. Consist of 1 to 8 letters or digits, beginning with
a letter; no blanks or special characters.
b. Must not be one of the following "reserved words":
ABS DOT IF OR THEN
ALL ELSE INV PERFORM TIMES
ALLOCATE END LE POSMAX TO
AND EXP LET POSMIN TRC
ATAN FLOOR LN RAND TRN
BLOCK FOR LOG READ WATCH
BY GE LT SGM WHILE
COMMENT GO MAX SIN WRITE
COS GT MIN SQRT
DET IDN NE STOP
c. Must be unique--same name cannot be used for both a
label and a variable.
d. Variables have automatic initial value of zero.
e. Variables can be scalar, vector, or matrix. [See 8-3 and 9-8]
If A is a matrix variable:
A refers to the entire matrix.
A(I,J) refers to the element at the
intersection of the ith row
and the jth column
A(*,J) refers to the jth column [see 9-2]
A(I,*) refers to the ith row [see 9-2]
^L
A-2
If V is a vector variable:
V refers to the entire vector
V(I) refers to the ith component.
[See 11-3 for subscript round-off rules]
f. A vector is a 1-column matrix
g. IDN is the identity matrix of any appropriate size.
Arithmetic Operators
--------------------
a. +, -, /, * for multiplication, ** for exponentiation
b. Normal precedence: **, * and /, + and -
Parentheses from inner to outer.
Sequence of + and - from left to
right.
c. Types of operand: +, -, /, *, ** for scalars
+, -, *, and numerator of / for
vectors and matrices.
Spacing
-------
No spaces, or splitting at end of line, in any number,
variable, label, reserved word, or **. Spaces allowable
anywhere else.
Functions
---------
Form Action Type of Argument
---- ------ ----------------
ABS(a) absolute value of a any expression
ATAN(a) arctangent of a, in numerical valued
radians expression
COS(a) cosine of a, a in numerical valued
radians expression
EXP(e) e raised to a power numerical valued
expression
FLOOR(a) greatest integer not numerical valued
exceeding a expression
LN(a) natural log of a numerical valued
expression
LOG(a) log to base 10 of a numerical valued
expression
SQRT(a) positive square root numerical valued
of a expression
SIN(a) sine of a, a in numerical valued
radians expression
MAX(a,b,...) maximum value of all any expressions
elements of all
arguments
^L
A-3
MIN(a,b,...) minimum value of all any expressions
elements of all
arguments
RAND(a) next pseudo-random any expression
number, or array
of numbers, in a
sequence in which
a was the last
DET(a) determinant of a square array valued
expression
DOT(a,b) dot product of a vector-valued expres-
and b sions of equal
dimensions
INV(a) inverse of a square array valued
expression
POSMAX(a) row position of maxi- array valued expression
mum element of a
POSMIN(a) row position of mini- array valued expression
mum element of a
SGM(a) sigma of a (sum of array valued expression
all elements)
TRC(a) trace of a (sum of array valued expression
elements on prin-
cipal diagonal)
TRN(a) transpose of a array valued expression
Relations
---------
Symbol With scalar expressions With array expressions
------ ----------------------- ----------------------
= equals all corresponding ele-
ments equal
NE not equal to at least one pair of
corr. ele. not equal
LE less than or equal to all corr. ele. less than
or equal
GE greater than or equal all corr. ele. greater
to than or equal
LT less than all corr. ele. less than
or equal; at least one
less than
GT greater than all corr. ele. greater
than or equal; at least
one greater than
[See 11-4 for round-off rules applying to relations]
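The array forms of the relations tabulated above can be sketched in Python. This is an illustration only, not code from this implementation: it assumes arrays are flat lists of equal length, the helper names are hypothetical, and the round-off rules of 11-4 are ignored.

```python
def array_eq(a, b):
    """'=': all corresponding elements equal."""
    return all(x == y for x, y in zip(a, b))

def array_ne(a, b):
    """NE: at least one pair of corresponding elements not equal."""
    return any(x != y for x, y in zip(a, b))

def array_le(a, b):
    """LE: all corresponding elements less than or equal."""
    return all(x <= y for x, y in zip(a, b))

def array_ge(a, b):
    """GE: all corresponding elements greater than or equal."""
    return all(x >= y for x, y in zip(a, b))

def array_lt(a, b):
    """LT: all less than or equal, and at least one strictly less."""
    return array_le(a, b) and any(x < y for x, y in zip(a, b))

def array_gt(a, b):
    """GT: all greater than or equal, and at least one strictly greater."""
    return array_ge(a, b) and any(x > y for x, y in zip(a, b))
```

Note that under these definitions two arrays can be neither LT, GT, nor equal to one another, for example [1, 3] and [2, 2].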
STATEMENTS
The following symbols are used in the statement descriptions:
v1, v2, ... variables (scalar, vector, or
matrix except as noted)
^L
A-4
r1, r2, ... relations
slabel1, slabel2, ... statement labels
blabel1, blabel2, ... block labels
e1, e2, ... arithmetic expressions (a meaningful
and conformable combination of
numbers, variables, functions, and
arithmetic operators)
Statements should begin in column 10 of the programming form. If
continued onto more than one line, the second and subsequent lines
should begin in column 15. Columns 73 to 80 must not be used.
Any statement may be given a label--beginning in column 1 of the
form.
Assignment Statement
--------------------
LET v1 = e1
Sequence Control Statements
---------------------------
GO TO slabel1
GO TO blabel1 Used only inside block 'blabel';
causes skip to end of block.
IF e1 r1 e2 THEN s1 ELSE s2 where s1 and s2 are any type
of statement except IF or
PERFORM.
Either the THEN phrase or the ELSE phrase may
be omitted, but one or both must be given.
Compound conditions may be used:
IF e1 r1 e2 AND e3 r2 e4 AND ... THEN s1 ELSE s2
IF e1 r1 e2 OR e3 r2 e4 OR ... THEN s1 ELSE s2
but AND and OR phrases may not be mixed in the same
statement. ---
STOP
Iteration Control Statements
----------------------------
A 'block' consists of a sequence of statements preceded by
blabel1 BLOCK and followed by blabel1 END
^L
A-5
A block may be located anywhere in the program; it is executed
only by a PERFORM statement calling it by name. Blocks
may be nested but not overlapped. A block may contain
any kind of statement, including PERFORM, except for a
PERFORM referring to the block itself.
PERFORM blabel1
PERFORM blabel1 e1 TIMES where e1 has integer value.
PERFORM blabel1 WHILE e1 r1 e2
Compound conditions may be used:
WHILE e1 r1 e2 AND e3 r2 e4 AND ...
WHILE e1 r1 e2 OR e3 r2 e4 OR ...
but AND and OR phrases may not be mixed in the same
statement.
PERFORM blabel1 FOR v1 = e1, e2, ...
FOR sv1 = e1 TO e2 BY e3 where sv1 is
a scalar vari-
able
The order of the TO and BY phrases can
be reversed; the BY phrase can be
omitted if e3 = 1. [See 8-3 for more]
Communication Statements
------------------------
READ v1, v2, ... [See 5-2 and 9-2 for description]
WRITE v1, v2, ... , 'title message', ... , /v3, /v4
Three types of elements may appear in the list
after WRITE:
1. Variable. Prints:
name of scalar and current value;
name of vector and current values of
components;
name of each row vector of matrix and
current value of components.
2. Variable preceded by a / . Current values
only are printed.
^L
A-6
3. A message enclosed in single quotes. The
exact image will appear as a title on
the output. Any characters except the
quote may be used in such a message. A
message cannot continue onto a second
line on the programming form--a separate
item in the same or another WRITE state-
ment must be used.
[See 5-3 and 8-10 for more formatting details]
[WRITE ALL See 4-12 and 8-3 for a description]
COMMENT A comment line can be inserted at any time by
writing COMMENT in the label field (columns 1-7) of
the programming form. Such a line will appear in
the program listing, but has no effect on execution.
Dimensioning of Vectors and Matrices
------------------------------------
ALLOCATE mv1(e1, e2), vv2(e3), ... where mv1 is a matrix
variable and vv2 is a vector
variable, and e1, e2 and e3 have
integer values.
When space is initially allocated to an array, the
values of all the elements are zero. If space is
later changed by another allocation the values of
those elements common to the old and the new alloca-
tions are unchanged; the values of new elements are
zero. [See also 9-1]
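The reallocation rule just described (common elements kept, new elements zero) can be sketched in Python. This is a minimal illustration under our own assumptions: a matrix is a list of rows, a vector is a 1-column matrix, and the function name allocate is hypothetical rather than taken from this implementation.

```python
def allocate(old, rows, cols=1):
    """Return a rows x cols array of zeros, copying over any elements
    whose positions exist in both the old and the new allocation."""
    new = [[0.0] * cols for _ in range(rows)]
    if old:
        for i in range(min(rows, len(old))):
            for j in range(min(cols, len(old[0]))):
                new[i][j] = old[i][j]
    return new
```

For example, growing a 2x2 array to 3x3 keeps the four old values in the upper-left corner and zero-fills the remaining five elements.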
Tracing Changes in Value During Execution
-----------------------------------------
WATCH v1, v2, ... where v1 and v2 are scalar variables.
This will cause the system to monitor the values of
each of the variables listed and print the new value
each time one of the listed variables is assigned a
value by a LET or READ statement. This operation is
temporary and is automatically discontinued for a
particular variable after 10 such assignments. [See 8-3]
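The WATCH behaviour described above can be sketched in Python as a counter per watched variable. The class and method names are hypothetical, the message format is ours rather than the original system's, and variables are assumed to live in a plain dictionary.

```python
class Watcher:
    """Report assignments to watched scalars, at most LIMIT times each."""
    LIMIT = 10

    def __init__(self, names):
        # Remaining number of reports for each watched variable.
        self.remaining = {name: self.LIMIT for name in names}
        self.log = []

    def assign(self, variables, name, value):
        """Perform the assignment, recording it if the variable is still watched."""
        variables[name] = value
        if self.remaining.get(name, 0) > 0:
            self.remaining[name] -= 1
            self.log.append(f"{name} = {value}")
```

After the tenth reported assignment the variable is still assigned normally; only the reporting stops.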
DATA
Data to be read by the execution of the READ statements is
provided on the same form, after the last statement of the
program. The first data line is indicated by writing *DATA
in columns 1 to 5. Data items may be entered on this line
beginning in column 7 and in columns 1 to 72 of any following
lines
^L
A-7
Data items are separated by commas. An item may be either a
number, n1, or an expression of the form v1 = n1 . The
latter form is for checking purposes--it must correspond to the
variable v1 in the associated READ statement.
Data items are read in sequence as the program is executed.
To an array with q elements appearing in a READ statement,
there must correspond q successive items in the data list.
If the array is a matrix, the items must be ordered by rows.
Other quotes:
From 4-12:
Another statement that can be used for checking purposes is
WRITE ALL
This will cause the values of all of the variables in the program to be
printed. There is no WATCH ALL statement.
From 5-2:
If READ v occurs in the program and the corresponding entry on the
datalist is w=n, where w is not the same as v, an error message is given.
The value n is assigned to v, no change is made in w, and the program
continues. Thus the inclusion of items of the type v=n in the data list
provides checks against the accidental omission of data or inclusion of
extra numbers.
If the total data list is too short, the machine will give an error
message and will give the value 1 for each of the missing numbers. If the
list is too long the extra entries will be ignored but no error message will
be given.
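The READ checking rules quoted from 5-2 can be sketched in Python. This is a hedged illustration, not this implementation's reader: it assumes the whole data list has already been joined into one comma-separated string, and the class name DataList and its error messages are hypothetical.

```python
class DataList:
    """Sequential reader for a CUPL-style data list."""

    def __init__(self, text):
        # Items are separated by commas; empty items are dropped in this sketch.
        self.items = [s.strip() for s in text.split(",") if s.strip()]
        self.pos = 0
        self.errors = []

    def read(self, name):
        """Return the next value for variable `name`."""
        if self.pos >= len(self.items):
            # List too short: report an error and supply the value 1.
            self.errors.append(f"data list exhausted reading {name}")
            return 1.0
        item = self.items[self.pos]
        self.pos += 1
        if "=" in item:
            tag, number = (part.strip() for part in item.split("=", 1))
            if tag != name:
                # Mismatched check name: report, but still assign the value.
                self.errors.append(f"data item {tag} read into {name}")
            return float(number)
        return float(item)
```

Extra items left over at the end of the list are simply never read, which matches the rule that a too-long list produces no error message.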
From 5-3:
Numbers are given to 9 significant figures. A number n /= 0 in the range
.0001 <= |n| < 100000 is printed in the usual decimal form, e.g. -327.512736,
0.0243472150 . Zero is printed simply as 0. All other numbers appear in the
form mEp with 1 <= |m| < 10, e.g. -2.31562007E+04, 5.00000000E-17 .
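The choice between decimal and mEp form can be sketched in Python as follows. This is an approximation for illustration: the name format_number is hypothetical, Python rounds where the original system may have truncated, and the sketch does not position the decimal point within a field.

```python
import math

def format_number(n):
    """Render n with 9 significant figures in the style described in 5-3."""
    if n == 0:
        return "0"
    magnitude = abs(n)
    if 0.0001 <= magnitude < 100000:
        # Digits after the point needed to show 9 significant figures.
        decimals = 8 - math.floor(math.log10(magnitude))
        return f"{n:.{decimals}f}"
    # Otherwise mEp form with 1 <= |m| < 10 and 8 digits after the point.
    return f"{n:.8E}"
```

Values whose magnitude rounds up across a power of ten (for example 99999.99999) may gain a digit in this sketch; a faithful formatter would have to handle that case explicitly.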
A line of output is divided into six "fields", each 20 characters long.
Each variable name or value occupies one field. The decimal point of a
number always comes at the seventh position in a field. Each WRITE statement
starts a new line, but within a given WRITE statement fields are used
consecutively, new lines being started as necessary. The one exception occurs
when the name of a variable would come at the end of a line and its value
on the next line; in this case, the last field is left blank and the name
starts the next line.
A field may be purposely skipped by simply omitting a variable name;
for example,
WRITE ,A,,,/B
will put A and its value in fields 2 and 3, and the value of B in 6,
leaving fields 1, 4 and 5 vacant. The statement
WRITE
will cause a whole line to be skipped.
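The field assignment rules above can be sketched in Python. The sketch is ours and makes simplifying assumptions: only scalar variables appear, quoted messages are not handled, and the function name layout_fields and the "name of"/"value of" placeholders are hypothetical.

```python
def layout_fields(write_list, fields_per_line=6):
    """Map the list after WRITE onto output lines of six fields each."""
    cells = []
    for item in write_list.split(","):
        item = item.strip()
        if item == "":
            cells.append("")                      # omitted name: skip a field
        elif item.startswith("/"):
            cells.append(f"value of {item[1:]}")  # value only
        else:
            # Do not split a name and its value across two lines.
            if len(cells) % fields_per_line == fields_per_line - 1:
                cells.append("")
            cells.append(f"name of {item}")
            cells.append(f"value of {item}")
    while len(cells) % fields_per_line != 0:
        cells.append("")
    return [cells[i:i + fields_per_line]
            for i in range(0, len(cells), fields_per_line)]
```

Calling layout_fields(",A,,,/B") reproduces the manual's example: A occupies fields 2 and 3, the value of B lands in field 6, and fields 1, 4 and 5 stay vacant. An empty list yields one blank line, like a bare WRITE.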
From 8-3:
The only restrictions on the use of a subscripted variable are in
PERFORM k FOR v = ---
and
WATCH v,...
Here v must be a non-subscripted variable. Also, WRITE ALL will print only
the values of the non-subscripted variables.
From 8-10:
1. A new line is started for each vector and each row of a matrix.
2. The values of the elements of a vector or a row of a matrix are
put successively in fields 2 through 6, repeating as necessary.
3. In field 1 of the first line used for a vector or for a row of
a matrix is put the name of the vector or a symbol for the row of the matrix,
unless these are suppressed by a slash before the array in the WRITE list.
The slash does not change the spacing described in 2 above.
4. A variable, subscripted or not, appearing in the WRITE list immediately
after an array, starts a new line.
There is one exception to these rules. If M is a matrix with only one
column, to save space it will be printed as if it were a vector, that is, in
the form
M = m11 m21 m31 etc.
From 9-1:
Vectors as matrices
-------------------
Consideration of relations involving matrices and vectors can be
simplified by regarding a vector as a 1-column matrix. This convention is
adopted in CUPL, and so, for example
ALLOCATE X(7)
and
ALLOCATE X(7, 1)
have precisely the same meaning. After either of these allocations the
variables X(2) and X(2, 1) are meaningful and have the same value; X(M, N)
is meaningful only if N has the value 1.
From 9-2:
For any matrix M, M(*, J) denotes the 1-column matrix (a vector)
which is the J-th column of M. Similarly, M(I, *) denotes the 1-row matrix
(not a vector) which is the I-th row of M. If M is m x n, then M(*, J) and
M(I, *) are m x 1 and 1 x n, and they can be used as matrices of these
sizes in any statement except ALLOCATE. For example:
READ M(*, 3)
will read data into the third column of M, leaving the rest of M unchanged.
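The column and row selections described in 9-2 can be sketched in Python. This is an illustration under our own assumptions (a matrix is a list of rows, subscripts are 1-based as in CUPL); the function names are hypothetical and not part of this implementation.

```python
def column(matrix, j):
    """M(*, J): the J-th column, as a 1-column matrix."""
    return [[row[j - 1]] for row in matrix]

def row(matrix, i):
    """M(I, *): the I-th row, as a 1-row matrix."""
    return [list(matrix[i - 1])]

def read_column(matrix, j, data):
    """READ M(*, J): store successive data items down column J, in place."""
    for target_row, value in zip(matrix, data):
        target_row[j - 1] = value
```

After read_column(m, 3, ...) only the third column of m changes; the rest of the matrix is left as it was.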
From 9-8:
So much space is needed to compute INV(A) or DET(A) that the size of
A is limited to 40x40 in these expressions.
From 11-3:
Automatic Integer Round-off
---------------------------
a. The value of a subscript is rounded to the nearest integer.
b. If the round-off involves a change of greater than 10^-9 (approximately)
an error message is given.
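The subscript rule above can be sketched in Python. The function name round_subscript is hypothetical, and the tolerance is taken as exactly 10^-9 although the manual only gives it approximately.

```python
import math

def round_subscript(value, tolerance=1e-9):
    """Round a subscript to the nearest integer.

    Returns (subscript, needs_error), where needs_error is True when the
    rounding changed the value by more than the tolerance.
    """
    nearest = math.floor(value + 0.5)
    needs_error = abs(value - nearest) > tolerance
    return nearest, needs_error
```

So a subscript that is off by accumulated floating-point noise is silently repaired, while one such as 2.7 is still rounded to 3 but flagged for an error message.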
From 11-4:
Automatic Relative Round-off for x r y
--------------------------------------
a. If both x and y are zero the condition is applied as it stands.
b. If either x or y is not zero:
(i) Both x and y are multiplied by 10**n, where n is chosen so that
the larger of |x * 10**n|, |y * 10**n| lies between .1 and 1.
(ii) x * 10**n and y * 10**n are truncated to 14 decimal places
(iii) The specified condition is interpreted on the resulting numbers.
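The three steps of 11-4 can be sketched in Python for scalar operands. This is our reading of the rule, not code from this implementation: the names scaled_truncate and compare are hypothetical, and binary floating point only approximates the decimal truncation the manual describes.

```python
import math
import operator

RELATIONS = {"=": operator.eq, "NE": operator.ne, "LE": operator.le,
             "GE": operator.ge, "LT": operator.lt, "GT": operator.gt}

def scaled_truncate(x, y):
    """Scale x and y by a common power of ten, then truncate to 14 places."""
    if x == 0 and y == 0:
        return x, y
    larger = max(abs(x), abs(y))
    # Choose n so that larger * 10**n lies between .1 and 1.
    n = -(math.floor(math.log10(larger)) + 1)
    scale = 10.0 ** n
    return (math.trunc(x * scale * 1e14) / 1e14,
            math.trunc(y * scale * 1e14) / 1e14)

def compare(x, relation, y):
    """Interpret 'x relation y' on the scaled, truncated operands."""
    sx, sy = scaled_truncate(x, y)
    return RELATIONS[relation](sx, sy)
```

The practical effect is that values differing only in their last few bits compare equal: compare(0.1 + 0.2, "=", 0.3) is true here, whereas the raw floating-point comparison is false.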
We include here a transcription of appendix F of our reference document, An Instruction Manual For CORC, R.W. Conway, W.L. Maxwell, R.J. Walker.
There are a few typographical changes to fit it into the ASCII character set. The differences:
^[-+]nnn is used to render exponent superscripts.
The square root radical sign surrounding b is rendered as b^-2
|a| is used to render the absolute-value operation.
`x' between digits is used to render the multiplication sign.
lines of dashes below headings indicate underscores.
page breaks in the original are represented by ^L here.
Error corrections:
Under `Sequence Control Statements', the IF keyword in the three if-statement templates was erroneously typed as `If'.
Under `Iteration Control Statements', the first AND keyword in the compound-AND example was incorrectly lowercased. In item 1 of the BEGIN explanation, the word `statement' was incorrectly uppercased.
Notes:
Tags [See *1], [See *2], etc, are not part of the original; they reference footnotes following the transcript.
Tags such as [See 5-8] are not part of the original; they reference other quotes from the text given below by page number.
Otherwise, the appendix F transcription is exact, even down to hyphen breaks and spacing (allowing for the fact that the original typewriter spacing was somewhat irregular...).
The combination of Appendix F and the notes includes essentially all the manual's documentation of the CORC language itself. Most of the text is tutorials and exercises.
This implementation preserves the CORC-62 distinction between LOG and LN, contrary to the transcript below which describes CORC-63 (which identifies both with the natural-log function). CORC-62 also lacked the INT function, allowed only non-compound logical expressions in REPEAT...UNTIL, and allowed an alternate spelling `TIME' of `TIMES'.
The following .corc files, included with this distribution, are also transcribed from the manual. We include every complete program example. The following list maps programs to original page numbers:
simplecorc.corc -- 4-6
gasbill.corc -- 4-9
hearts.corc -- 4-10
sumsquares.corc -- 5-4
powercorc.corc -- 5-6
factorial.corc -- 5-6
quadcorc.corc -- 5-9
title.corc -- 7-3 (note: this one uses continuations)
We have supplied leading comments and test data for these programs; they are otherwise unaltered.
APPENDIX F
Summary of the CORC Language
Acceptable Characters
---------------------
Letters: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Digits: 0 1 2 3 4 5 6 7 8 9
Special Characters: + - / ( ) * $ , . =
Numbers
-------
Normal decimal usage -- + sign may be omitted; decimal point
may be omitted from an integer.
Only 11 significant figures will be considered (8 on Burroughs
220)
Output: 8 significant figures.
Acceptable range: Absolute values from 10^-308 to 10^+308
(10^-49 to 10^+50 on Burroughs 220), and 0.
Scientific notation: 1.2345 x 10^6 may be written 1.2345*10$(6).
Variables and Labels
--------------------
a. 1 to 8 letters or digits; no blanks or special characters.
b. First character must be a letter.
c. Must not be one of the following "reserved" words:
ABS DECREASE GO LET NOTE STOP
AND ELSE GTR LN OR THEN
ATAN END IF LOG RAND TIMES
BEGIN EQUAL INC LSS READ TITLE
BY EXP INCREASE MAX REPEAT TO
COS FOR INT MIN SIN UNTIL
DEC GEQ LEQ NEQ SQRT WRITE
d. Statement labels must be unique -- particular label appears only
once in label field.
Block label used for only one set of BEGIN-END statements.
e. Variables must be listed in Dictionary. If no initial value
is given zero is assumed. [See *1]
f. One or two subscripts, enclosed in parentheses. May be
any expression (including subscripted variables) whose
value at execution is a positive integer not greater than
the maximum declared for this variable in Dictionary. [See D-2]
In Dictionary and Input Data subscripts given as integers in
subscript field; parentheses are not used. [See *1]
^L
Arithmetic Operators
--------------------
+ Addition
- Subtraction
/ Division
* Multiplication (must be expressed, no implicit multiplication)
$ Exponentiation; a^b written as a $(b)
Rules of Precedence
-------------------
a. Expressions in parentheses first, from inner to outer.
b. $ before *, /, + or -
c. * or / before + or -
d. Sequence of + and - from left to right
Functions
---------
Argument any expression, a:
ABS(a) |a|
EXP(a) e^a
SIN(a) sin a, a in radians
COS(a) cos a, a in radians
ATAN(a) arctan a, a in radians
INT(a) [a], greatest integer less than or equal to a
Arguments two or more expressions, a, b, ... f:
MAX(a, b, ... , f) value equal to greatest of expressions
listed
MIN(a, b, ... , f) value equal to least of expressions
listed
Argument any positive expression, b:
LN(b) or LOG(b) log b, natural logarithm of b
e
Argument any non-negative expression, b:
SQRT(b) + b^-2
Argument a variable, v:
RAND(v) next in a sequence of pseudo-random numbers.
See Chapter 7
Relations (used only in IF and REPEAT ... UNTIL statements)
---------
EQL = NEQ !=
LSS < LEQ <=
GTR > GEQ >=
^L
Card Numbers
------------
Strictly increasing sequence of 4 digit numbers, beginning
with 0010.
Initially right digit should be zero on all cards to leave
room for later additions.
Program Arrangement [See *2]
-------------------
Arrange and number programming forms in the following order:
1. Preliminary Description
2. Dictionary [See 2-10]
List all variables to be used
Specify initial values if not zero
Specify maximum value for each subscript.
3. Program Statements
One statement per line; for continuation place
C in column 21 of following line and begin
in column 42. Do not break in middle of
variable, label, or number.
THEN, ELSE, OR and AND phrases of IF statement
on following lines, beginning in col 42,
but C in col 21 not required.
4. Input Data
Variables, Subscripts and values of data to
be called by READ statements in program.
CORC Statements
---------------
The following symbols are used in the statement descriptions:
v, w, x variables
r, s, t relations
a, c statement labels
b block label
e, f, g, h, j, k arithmetic expressions (any meaningful
combination of numbers, variables,
functions and arithmetic operators)
Computation Statements
----------------------
LET v = e
INCREASE v BY e or INC v BY e
DECREASE v BY e or DEC v BY e
^L
Sequence Control Statements
---------------------------
GO TO a Statement with label a to be executed next.
GO TO b Used only inside block b; causes skip to
END of block b.
IF e r f Go to statement a if condition e r f is
THEN GO TO a satisfied; otherwise go to statement c.
ELSE GO TO c
IF e r f Go to statement a if all of the conditions
AND g s h listed are satisfied; otherwise go to
... statement c.
THEN GO TO a
ELSE GO TO c
IF e r f Go to statement a if any one of the conditions
OR g s h listed is satisfied; otherwise go to state-
... ment c.
THEN GO TO a AND and OR phrases cannot be mixed in
ELSE GO TO c the same IF statement.
STOP Last statement in execution of program; not
necessarily written last on Program State-
ment sheet.
Iteration Control Statements
----------------------------
b BEGIN Define limits of a block; b in label field, BEGIN-END.
b END statement field.
1. Block may be entered (executed) only by REPEAT statement.
2. Block may be located anywhere in program.
3. Blocks may be nested, but not overlapped.
4. Block b may contain any type of CORC statement, in-
cluding REPEAT, but not REPEAT b, ... , a REPEAT
Statement referring to itself.
REPEAT b e TIMES Value of e a non-negative integer.
REPEAT b UNTIL e r f Continue repetition of block b until
condition e r f is satisfied.
REPEAT b UNTIL e r f AND g s h AND ... Continue repetition
of block b until all conditions listed
are satisfied. (Not available on Burroughs
220)
REPEAT b UNTIL e r f OR g s h OR ... Continue repetition of
block b until any one of the conditions
listed is satisfied. (Not available on
Burroughs 220)
^L
REPEAT b FOR v = e, f, g, ..., ..., (h, j, k), ... Repeat
block b once for each expression on list,
with value of expression assigned to
variable v. Three expressions on list
enclosed in parentheses mean from h to k
in steps of j. [See 5-8]
Communication Statements
------------------------
READ v, w, x, ... Read an Input Data card for each variable
on list; variables on cards read must
agree with variables on list.
WRITE v, w, x, ... Print variable and current value, three
to a line. Each WRITE statement starts a
new line.
TITLE message Print "message" in computational results
when this statement is encountered in
execution of program.
NOTE message "Message" will appear in copy of program
only, not in execution. Used for program
notes only.
[*1] This implementation of CORC does not support or allow a Dictionary section. Instead, variable initializations must be done via CUPL-style DATA and ALLOCATE statements.
[*2] Ignore this section. Program statements are free-format, with continuations not supported (though the example program test/title.corc shows the syntax, it will break cupl). Data for read statements is accepted in CUPL format following the keyword *DATA.
For completeness, however, the Dictionary feature is documented here.
From 2-10:
Dictionary of Variables
-----------------------
In addition to a set of statements CORC requires a pro-
gram to contain another part known as the Dictionary. The Diction-
ary of a program is merely a list of all the variables used in
the program, along with, if desired, the initial assigned values of
the variables. If no initial value is specified the computer
assigns the initial value zero.
From 2-11:
In the above example the Dictionary might look like this:
A 1
B -1
C -6
ROOT
X1
X2
The `above example' is the simplecorc.corc program. The CORC Dictionary is equivalent to a CUPL DATA section, but also allowed the programmer to dimension array variables. The example form on 2-12 makes it clear that the Dictionary was distinguished from the program proper by being in a different, lower range of card line numbers.
From 5-8:
Three expressions on a list, enclosed in parentheses, are in-
terpreted in the following way:
1. The first expression gives the initial value for the vari-
able.
2. The second expression gives the difference between con-
secutive values.
3. The third expression indicates where to stop -- the final
value for the variable is less than or equal to the value
of the third expression.
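The interpretation of a triple listed above, together with the mixed list form shown in the REPEAT ... FOR template of Appendix F, can be sketched in Python. The function names are hypothetical, triples are represented as Python tuples, and the sketch assumes a positive step, since the excerpt does not say how other steps were treated.

```python
def expand_triple(start, step, stop):
    """Values start, start+step, ... that do not exceed stop."""
    if step <= 0:
        raise ValueError("this sketch assumes a positive step")
    values = []
    count = 0
    while start + count * step <= stop:
        values.append(start + count * step)
        count += 1
    return values

def expand_for_list(items):
    """Flatten a FOR list in which tuples stand for (h, j, k) triples."""
    values = []
    for item in items:
        if isinstance(item, tuple):
            values.extend(expand_triple(*item))
        else:
            values.append(item)
    return values
```

For instance, the triple (1, 2, 8) yields 1, 3, 5, 7: the final value is the last one not exceeding the third expression.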
[... examples omitted ...]
More than one such "triple" may be used on a list, and
"triples" may be intermixed with separate expressions;
From 6-5:
A particular variable either has no subscripts, one sub-
script, or two subscripts and this use must be consistent through-
out a program. A variable cannot appear as X(I) in one statement
and X(I, J) or just X in another statement of the same program.
The nature of a variable (whether it is to be subscripted or not)
must be indicated when the variable is listed in the CORC Diction-
ary. This is done by giving the maximum value of any subscripts
that will be used in columns 21-25 of the Dictionary form. If no
subscripts will be used these columns will be left blank in the
line for that variable. If one subscript is to be used, the maxi-
mum value that that subscript will take on anywhere in the pro-
gram must be given in columns 21-23; columns 24-25 are left blank.
(Note that this is the maximum value of the subscript, and not
the maximum value of the variable.) If two subscripts are to be
used the maximum value of the first is given in columns 21-23
and the maximum value of the second in columns 24-25. For ex-
ample, if the Dicitonary [sic] looks like the following:
SCALAR
VECTOR 45
MATRIX 100 2
ARRAY 3 32
then SCALAR is a simple variable that will not have any sub-
scripts anywhere in the program. VECTOR will have one subscript
everywhere it appears in the program [...] MATRIX [...] will
appear each time with two subscripts [...] ARRAY will also
always have two subscripts; [...]
From D-2:
Automatic Integer Round-off
---------------------------
a. The value of a subscript is rounded off to the nearest
integer.
b. If the round-off involves a change of greater than
10^-9 (approximately; the number is subject to some
variation) an error message is given.
From D-3:
Automatic Relative Round-off for x r y
--------------------------------------
a. If both x and y are zero the condition is applied as it stands.
b. If either x or y is not zero:
(i) Both x and y are multiplied by MAX(|x|, |y|);
(ii) The results are rounded off to the nearest
integer if this involves a change of less
than 10^-9 (10^-7 for the Burroughs 220), but
not otherwise;
(iii) The specified condition is interpreted for
the resulting numbers.
The most obvious differences are also the most trivial. CUPL was first implemented on an IBM/360 Model 30; CORC on CDC 1604 and Burroughs 220 machines. Both used a small, capital-letters-only character set (SIXBIT), and followed the archaic IBM practice of using a slashed O for alphabetic O and a plain 0 for zero. Original CUPL/CORC listings thus look rather odd to the modern eye.
The original CUPL was a batch system with a fixed-field card format; labels in columns 1-8, statements in 10-72, statement continuations beginning in column 15 (CORC's format differed only in detail from this). In CUPL, data for the program was supplied following a special *DATA label in the same deck as the program; CORC did not require this marker (it is not clear from the CORC documentation how end-of-program was recognized).
On modern output devices, slashed-0 tends to be used, if at all, for zero. We have not tried to preserve IBM's reversal. Nor have we tried to enforce the columnation requirements, and we don't implement the continuation convention (new CUPL is free-format, with newlines ignored). We do preserve much of the visual appearance of CUPL listings by insisting on all caps and tab-indenting statements. We also preserve the *DATA mechanism for supplying initializations.
More significant differences arise from differences between the word size and floating-point format of CUPL's original host and those of a typical modern C implementation. The 360 had a 32-bit word; original CUPL scalars ranged from 1e76 to 1e-78 with nine decimal digits of precision. As for CORC: the Control Data 1604 was documented as having a much wider range, 1e308 to 1e-308 with 11 digits of precision; the Burroughs 220 supported 1e-49 to 1e50 with 8 digits of precision.
On today's typical 32-bit microprocessor such as an Intel 486, C floats are 32 bits and have roughly 1e+38 to 1e-38 range and 6 digits of precision; doubles are 64 bits, with range roughly 1e308 to 1e-308 and 15 digits of precision. This implementation uses doubles to emulate CUPL/CORC scalars.
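The host's actual limits can be read from <float.h> rather than taken on trust. The sketch below is illustrative only; double_covers and report_limits are hypothetical names, not functions in this distribution.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Does the host's double offer at least `digits` decimal digits and a
 * decimal exponent range reaching max_exp10 and min_exp10? */
static int double_covers(int digits, int max_exp10, int min_exp10)
{
    assert(digits > 0);
    return DBL_DIG >= digits
        && DBL_MAX_10_EXP >= max_exp10
        && DBL_MIN_10_EXP <= min_exp10;
}

/* Print the limits the C implementation actually guarantees. */
static void report_limits(FILE *out)
{
    fprintf(out, "float:  %d digits, 1e%d to 1e%d\n",
            FLT_DIG, FLT_MIN_10_EXP, FLT_MAX_10_EXP);
    fprintf(out, "double: %d digits, 1e%d to 1e%d\n",
            DBL_DIG, DBL_MIN_10_EXP, DBL_MAX_10_EXP);
}
```

For example, double_covers(9, 76, -78) asks whether doubles can hold the original CUPL scalar range. Note that on IEEE 754 hosts DBL_MIN_10_EXP is -307, so a request for a normalized 1e-308 sits just past the edge of what the macro reports.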
We know from the documentation that the original CUPL compiler ran in 64K of core. The present implementation is easily twice that size. However, given the cycle speeds of the 1960s, it certainly runs a good deal faster than the original CUPL, even with interpretation overhead.
We don't implement original CUPL's error-correction facilities. Though clever, they would make the parser forbiddingly complex, and are anyway much less important in an interactive environment.
There are many limits in original CUPL/CORC that we do not enforce. There is no limit on the length of variable names short of the lexer's very long token buffer length. There is no hard limit on the number of statements in a program. There is no hard limit on the size of arrays.
While the format of number output does not exactly conform to the original CUPL/CORC rules, it is sufficiently ugly to please any but the pickiest. We implement all of 5.2 except the fixing of the decimal point at position 7 in each field. Instead we simply use printf(3)'s %f and %e at field-width precision.
Also, by default, we wrap after four 20-char fields rather than 6, so as to fit on an 80-column line. Command-line options to change the line and field widths are available.
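The field layout described above can be sketched as follows. This is an assumption-laden illustration, not the interpreter's own output routine: the function names are hypothetical, six decimal places are used throughout, and the thresholds for switching from %f to %e are invented for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Number of whole fields that fit on one output line. */
static int fields_per_line(int line_width, int field_width)
{
    assert(field_width > 0);
    return line_width / field_width;
}

/* Format one value left-justified in a field of field_width characters.
 * Very small or very large magnitudes use %e (thresholds are assumed).
 * Returns the number of characters snprintf wanted to write. */
static int format_field(char *buf, size_t size, double value, int field_width)
{
    double mag = value < 0.0 ? -value : value;

    if (mag != 0.0 && (mag < 1e-4 || mag >= 1e8))
        return snprintf(buf, size, "%-*.6e", field_width, value);
    return snprintf(buf, size, "%-*.6f", field_width, value);
}

/* Write n values, starting a new line whenever a line is full. */
static void print_values(FILE *out, const double *v, int n,
                         int line_width, int field_width)
{
    char buf[64];
    int per_line = fields_per_line(line_width, field_width);
    int i;

    assert(per_line > 0 && field_width < (int)sizeof buf);
    for (i = 0; i < n; i++) {
        format_field(buf, sizeof buf, v[i], field_width);
        fputs(buf, out);
        if ((i + 1) % per_line == 0 || i + 1 == n)
            fputc('\n', out);
    }
}
```

Passing the values of the -w and -f options as line_width and field_width would reproduce the wrapping behavior described here.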
The CUPL/CORC implementation is built around YACC and LEX. The rest is ANSI C.
The YACC grammar just builds a parse tree, which is passed to interpret() for interpretation. This method requires that every program be small enough for the entire tree to be held in memory, but it has the advantage that front and back end are very well separated. It is a winning strategy on virtual-memory systems.
One hack that greatly simplifies the grammar productions is that the lexer actually returns parse tree nodes, even for atoms like identifiers, strings, and numbers. In fact, the lexical analyzer even does label and variable name resolution with the same simple piece of code; each IDENTIFIER token is looked up in the identifier list when it's recognized, so the parse tree early becomes a DAG. (The -v1 option causes the compiler to dump its parse tree for inspection.)
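The identifier lookup can be pictured with a sketch like the one below. It is not the actual lexer code; the node layout, the token value, and the name intern_identifier are all hypothetical. The point it illustrates is that every occurrence of a name yields the same node pointer, which is what turns the tree into a DAG.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { IDENTIFIER = 258 };      /* hypothetical token code */

typedef struct node {
    int type;                   /* token or production type */
    char *name;                 /* spelling, for identifiers */
    struct node *left, *right;  /* children, unused for atoms */
    struct node *next;          /* chain of identifiers seen so far */
} node;

static node *identifiers = NULL;

/* Return the unique node for `name`, creating it on first sight. */
static node *intern_identifier(const char *name)
{
    node *np;

    for (np = identifiers; np != NULL; np = np->next)
        if (strcmp(np->name, name) == 0)
            return np;          /* already known: share the node */

    np = calloc(1, sizeof *np);
    assert(np != NULL);
    np->name = malloc(strlen(name) + 1);
    assert(np->name != NULL);
    strcpy(np->name, name);
    np->type = IDENTIFIER;
    np->next = identifiers;
    identifiers = np;
    return np;
}

/* In a LEX rule the action might then read (assuming a node member in
 * the YACC value union):
 *     yylval.node = intern_identifier(yytext); return IDENTIFIER;
 */
```

Because labels and variables pass through the same lookup, a definition/reference count kept on the shared node would cover both kinds of name.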
Most of the smarts are in interpret() and its sub-functions. Because array variables can be re-allocated, the internals have to use a dynamic vector/array type with its own indexing machinery. The code to manipulate this type lives in monitor.c.
Note that much of this machinery is quite generic and could be re-used for other languages with little change.
The implementation trades away some possible efficiencies for simplicity. Most importantly, each value has an attached malloc object to hold its elements, even when there is only one such element (as for scalars) which could reasonably be represented by a static field.
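A value type of the kind described above might look like the following sketch. The struct layout and function names are hypothetical rather than taken from monitor.c, and 1-origin subscripts are an assumption made for the example.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int rows, cols;      /* 1 x 1 for a scalar, n x 1 for a vector */
    double *elements;    /* always heap-allocated, even for scalars */
} value;

static void value_init(value *v)
{
    v->rows = v->cols = 0;
    v->elements = NULL;
}

/* (Re)allocate zero-filled storage for a rows x cols value.
 * Returns 0 on success, -1 if memory is exhausted. */
static int value_resize(value *v, int rows, int cols)
{
    double *fresh;

    assert(rows > 0 && cols > 0);
    fresh = calloc((size_t)rows * (size_t)cols, sizeof *fresh);
    if (fresh == NULL)
        return -1;
    free(v->elements);
    v->elements = fresh;
    v->rows = rows;
    v->cols = cols;
    return 0;
}

/* Address of element (i, j), or NULL if a subscript is out of range.
 * Subscripts are taken to be 1-origin (an assumption). */
static double *value_at(value *v, int i, int j)
{
    if (i < 1 || i > v->rows || j < 1 || j > v->cols)
        return NULL;
    return &v->elements[(size_t)(i - 1) * (size_t)v->cols + (size_t)(j - 1)];
}
```

Treating a scalar as a 1 x 1 value keeps a single code path for every access, at the cost of one allocation per scalar; that is the trade described in the preceding paragraph.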
There are some comments in the code which discuss the possibility of a back end that would emit C. This would be easy to do if there were any serious corpus of CUPL/CORC code demanding to be translated. The compiler back end would emit code shaped like the parse tree, which would then link monitor.c as runtime support.
The only nontrivial difference between CUPL and CORC is the interpretation of GO TO <label> when <label> is associated with a block. In CUPL, this jumps to the beginning of the block; in CORC, it jumps to the end of the block (which in CUPL is written GO TO <block> END). The interpreter sets a flag when it sees any of the CORC-specific keywords (NOTE, BEGIN, DEC, DECREASE, EQL, GEQ, GTR, INC, INCREASE, INT, LEQ, LSS, NEQ, REPEAT, TITLE, UNTIL, $) during lexing, and execute() modifies its behavior appropriately.
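The flag mechanism can be sketched as below. This is an illustration under stated assumptions, not the interpreter's code: the names corc_mode, note_keyword, block and goto_block_target are hypothetical, and a block is reduced to a pair of statement indices.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static bool corc_mode = false;   /* set once a CORC-only keyword is seen */

static const char *const corc_keywords[] = {
    "NOTE", "BEGIN", "DEC", "DECREASE", "EQL", "GEQ", "GTR", "INC",
    "INCREASE", "INT", "LEQ", "LSS", "NEQ", "REPEAT", "TITLE", "UNTIL", "$"
};

/* Called by the lexer for each keyword it recognizes. */
static void note_keyword(const char *word)
{
    size_t i;

    for (i = 0; i < sizeof corc_keywords / sizeof corc_keywords[0]; i++)
        if (strcmp(word, corc_keywords[i]) == 0)
            corc_mode = true;
}

typedef struct {
    int first_stmt;   /* index of the block's first statement */
    int end_stmt;     /* index of the block's END */
} block;

/* Where GO TO <label> lands when <label> names a block. */
static int goto_block_target(const block *b)
{
    assert(b != NULL);
    return corc_mode ? b->end_stmt : b->first_stmt;
}
```

The flag is sticky in this sketch: once any CORC keyword has been seen, every later block-label jump uses the CORC reading.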