Data representation sublanguage
From the SETL Wiki
The data representation sublanguage (DRSL), a.k.a. the data structure representation language (DSRL) or the data structure specification language (DSSL), is a language for annotations that can be added to a SETL program to add type constraints and to direct the interpreter to use specific data structures in order to improve performance.
These annotations should not normally change the behaviour of a program, though it is possible to prevent a programming from running by adding incorrect type constraints.
Because of the dynamically-typed nature of the language, normally all variables are represented as a variant structure, with a tag identifying the type and a pointer to the content. A common use of a DRSL annotation is to specify that a variable has a particular type (e.g. a fixed-size integer), so that the value can be represented directly, avoiding a pointer indirection on each access.
Note: only CIMS SETL implements the DRSL, and CIMS SETL itself is not available, thus the following examples are untested (they produce the correct output in GNU SETL, which parses but ignores the repr declarations. The corresponding Java programs have been tested, of course.)
Contents |
[edit] Examples
[edit] An implementation of the Sieve of Eratosthenes
[edit] With no annotations
SETL version
1: program sieve;
2:
3: const limit := 1000;
4: var candidates, x, y;
5:
6: candidates := {2 .. limit};
7: (for x in [2 .. limit])
8: if x in candidates then
9: (for y in {x * x, x * (x + 1) .. limit})
10: candidates less:= y;
11: end;
12: end if;
13: end;
14: print(candidates);
15:
16: end program;
Roughly-equivalent Java program
import java.math.BigInteger;
import java.util.HashSet;
public class Sieve
{
private static final int LIMIT = 1000;
private static final BigInteger LIMIT_BIG =
BigInteger.valueOf(LIMIT);
public static void main(String[] args)
{
Object candidates =
new BigIntRange(BigInteger.ONE, LIMIT_BIG).asHashSet();
Object x;
for (x = new BigInteger("2");
((BigInteger) x).compareTo(LIMIT_BIG) <= 0;
x = ((BigInteger) x).add(BigInteger.ONE))
{
if (((HashSet<?>) candidates).contains(x))
{
BigInteger xBig = (BigInteger) x;
BigIntRange factors =
new BigIntRange(xBig.multiply(xBig),
xBig.multiply(xBig.add(BigInteger.ONE)),
LIMIT_BIG);
for (Object y: factors)
{
((HashSet<?>) candidates).remove(y);
}
}
}
for (Object n: (Iterable<?>) candidates)
{
System.out.println(n);
}
}
}
This algorithm should be familiar to most programmers. It computes all primes between 1 and some known limit (here 1000) by eliminating multiples of successively-identified prime numbers.
With no type annotations, the interpreter has no information about the possible types of the variables, so must represent all variables and set/tuple elements as variants.
The type casts in the Java program represent points where the SETL interpreter would need to check the type of a variant. It is not exactly equivalent. Overloading (whereby the '+' operator could mean integer addition or set union) is not considered, for example.
Notes:
- In the diagram, execution is currently at line 10. 3 has just been identified as a prime number and we are about to remove its first factor (9) from the candidate set. LIMIT is 20 and the hash table is 7 elements long (in reality it would be larger than this.)
An obvious, simple improvement we can make is to eliminate the variant structures by declaring the types of the three variables:
[edit] With primary type annotations
SETL version
1: program sieve;
2:
3: const limit := 1000;
4:
5: repr
6: mode candidate: integer 1 .. limit;
7: mode candidateSet: set(candidate);
8: x, y: candidate;
9: candidates: candidateSet;
10: end;
11:
12: var candidates, x, y;
13:
14: candidates := {2 .. limit};
15: (for x in [2 .. limit])
16: if x in candidates then
17: (for y in {x * x, x * (x + 1) .. limit})
18: candidates less:= y;
19: end;
20: end if;
21: end;
22: print(candidates);
23:
24: end program;
Roughly-equivalent Java program
import java.util.HashSet;
public class Sieve
{
private static final int LIMIT = 1000;
public static void main(String[] args)
{
HashSet<Integer> candidates =
new IntRange(1, LIMIT).asHashSet();
Integer x;
for (x = 2; x <= LIMIT; ++ x)
{
if (candidates.contains(x))
{
IntRange factors =
new IntRange(x * x, x * (x + 1), LIMIT);
for (Integer y: factors)
{
candidates.remove(y);
}
}
}
for (Object n: (Iterable<?>) candidates)
{
System.out.println(n);
}
}
}
Now, we have eliminated three inefficiencies:
- We no longer use arbitrary-precision integer arithmetic. The candidate primes have an upper bound (1000) and so the largest value required (995006 == 997 * (997 + 1)) easily fits in a 32-bit word.
- We no longer need to check the type of each variable's value whenever we access it, because they now all have fixed types.
- Similarly we no longer need to check the types of values extracted from the candidates set, because now it is constrained to hold only (fixed-width) integers.
This still does not resemble the program that a Java programmer would write to solve the same problem however. Typically implementations of the Sieve use an array of bits, one per candidate prime, to indicate whether the number is still a candidate prime or not. This substantially reduces the memory requirements of the program. The hash table used in the example above will require a minimum of two 32-bit words (and probably more) per integer. That's 998 initial candidate primes * 2 words per candidate * 4 bytes per word, or around 8000 bytes. A bit vector will require 998 bits / 8 byts per byte, which is 125 bytes, a 16-fold saving.
We could rewrite the SETL program to represent the set as an array of 32-bit unsigned integers, and use bitwise arithmetic to manipulate individual bits, but it would be much easier if we could specify a representation in which the interpreter does this for us, and fortunately we can - it is known as a remote set:
[edit] Using a remote set
SETL version
1: program sieve;
2:
3: const limit := 1000;
4:
5: repr
6: mode candidate: integer 1 .. limit;
mode candidateSet: set(candidate);
7: mode candidateSet: remote set(candidate);
8: x, y: candidate;
9: candidates: candidateSet;
10: end;
11:
12: var candidates, x, y;
13:
14: candidates := {2 .. limit};
15: (for x in [2 .. limit])
16: if x in candidates then
17: (for y in {x * x, x * (x + 1) .. limit})
18: candidates less:= y;
19: end;
20: end if;
21: end;
22: print(candidates);
23:
24: end program;
Roughly-equivalent Java program
import java.util.BitSet;
public class Sieve
{
private static final int LIMIT = 1000;
public static void main(String[] args)
{
BitSet candidates = new BitSet(LIMIT);
candidates.set(2, LIMIT + 1);
Integer x;
for (x = 2; x <= LIMIT; ++ x)
{
if (candidates.get(x))
{
IntRange factors =
new IntRange(x * x, x * (x + 1), LIMIT);
for (Integer y: factors)
{
candidates.clear(y);
}
}
}
for (int n = 2; n <= LIMIT; ++ n)
{
if (candidates.get(n))
{
System.out.println(n);
}
}
}
}
What not to do with a remote set
One may be tempted to change the candidate elimination loop:
(for y in {x * x, x * (x + 1) .. limit}) candidates less:= y; end';
to the simpler looking:
candidates -:= {x * x, x * (x + 1) .. limit};
However this will actually make the program less efficient. The first version (with the loop) will not actually construct a remote set (bit vector) or sparse set (hash table.) Instead it just loops through the numbers to be eliminated from the set, starting with x, x*(x+1), x*(x+2) etc. and removing each from the candidate set by clearing a single bit. As x increases, the step between multiples of x obviously increases and so the number of bits in the vector that must be modified decreases proportionally. The second expression (that computes the set difference), on the other hand, will construct a temporary remote set, populate it with the candidates to be eliminated and compute the set difference using a bitwise AND over the entire vector. Unlike in the loop version, the bit vectors don't get any smaller as x increases, and the bitwise AND must always be computed over (limit - 1) bits. This would increase the complexity of the algorithm from O(n.log n.log(log n)) to O(n.(log n)2).
[edit] More examples to follow
[edit] See also
- repr
- A study of SETL (one of very few DRSL documents on the web)


