kparc.com/b/
I have been studying this code for a while and I think I have understood most of it, although in general I am quite far out of my comfort zone. I find it fascinating. It is terse code, not your typical C, that’s for sure, but beautiful in its own way. It makes use of several clever tricks that are not what I would call self-explanatory. I hope the following notes help other interested people.
Although authorship is not mentioned anywhere, several references indicate that the code in kparc.com/b/ was written by Arthur Whitney. It seems to be his style. Love it or hate it.
The following files define an interpreter for a language called b, isomorphic to C. When compiled with the k.c file instead of b.c, you would obtain the k interpreter. Unfortunately, the k.c file is not publicly available.
In the readme file, these notes are included:
A.S kernel interface (map unmap ..) |
Additionally, the c.h file defines some macros to write C in a terser manner, and a.h is the header file that accompanies a.c. There are also some test files included in C and b (t.c and t.b) and a Makefile.
Included by a.h, this file consists of some basic definitions for a terser C language.
Single letters for common types:
typedef unsigned char C,*S,*K; |
Z is added for static types. For static inline, a preceding _ is used (why not either a Z for static or define _I, _J, and so on?):
#define ZI static I #define _ static inline |
Control statements are replaced by single letter macros:
#define R return |
Signatures of built-in functions:
V*memcpy(); |
Maximum and minimum are also defined as macros, which return a copy of the max or min value:
#define MN(a,b) ({typeof(a)_a=(a);typeof(a)_b=(b);_a<_b?_a:_b;}) |
AB is the abort macro. It uses functions from a.h: os(s) prints the string s and a new line to stdout, exit(1) exits with an error. In case exit fails, (V*)0L will produce a segmentation fault.
#define AB(s) (os(s),exit(1),(V*)0L) //abort string |
System calls (read, write, open, close, fstat, mmap, munmap and exit) are called using assembler. The S(n) macro defines a new global name n such that it first moves the content from rcx (4th parameter in function calls) to r10 (4th parameter in syscalls), then sets rax to the desired syscall number (defined in unistd_64.h), makes the syscall, and returns:
#include<asm/unistd_64.h> |
(I find it slightly surprising that system calls numbers are taken from the unistd_64.h header file instead of given as integers. I guess it probably is done so to avoid problems with changes between versions, but this is one of the few concessions to defined constants).
Header file for C library (a.c).
Basically, all values (chars, integers and floats) are of type K (which is a char*). The type of the K value is determined in basis to the following constants:
#define KC 1L // char |
Additionally, NL represents a new (not assigned) value, and QQ represents an error string. X2 and X3 are only used in commented code (fixed arrays are used instead in this version of the code):
#define NL (K)(X1<<48) |
There are two main kinds of values: unique values and array values.
Unique values are encoded in a single K with the actual type in the first 16 bits (a pointer only uses the last 48 bits), except floats. Floating point numbers in 64 bits use the first 12 bits for the sign and the exponent. The exponent will only be zero for negative zeros and subnormal numbers, but these are not supported, since DAZ (denormal-as-zero) and FTZ (flush to zero) are set at init time (see _start, below).
Arrays are implemented by adding 8 extra bytes with metadata before the actual array elements: memory (bucket), references, type, unique type, and 4 bytes for array size.
Extern definitions (public interface to C library in a.c):
extern J nt[],ip();extern K G[],r1(),tn(),j2(),k0(),sS(),enm(),o();V exit(),w2(),r0();S pi(),pf(),px();F fp(); |
Some other functions are defined as static inlined functions:
The X function replaces the value pointed to by k with the value in y and returns an empty K value (NL). It decreases the reference counter of the value pointed to by k (with r0, see below), so memory will become available if this was the only reference:
_ K X(K*k,K y){R r0(*k),*k=y,NL;} |
Output functions print to stdout (using the w2 function from a.c) chars (given as integers), newlines, strings (with an added newline), integers, floats and integers in hexadecimal (the last 3 functions use the print functions pi, pf and px, defined in a.c). All these functions (except nl, to print newlines) return the printed value (nb: chars are given and returned as integers):
_ I oc(I i){R w2((S)&i),i;} _ V nl(){w2("\n");} _ S os(S s){R w2(s),nl(),s;} _ J oi(J j){R os(pi(j)),j;} _ F of(F f){R os(pf(f)),f;} _ J ox(J j){R os(px(j)),j;} |
Scan functions to find, in the string s, a character c and return the pointer, or zero if not found. The function scn limits the search to a number of characters n, and returns this number if c is not found, or the index:
_ S sc(S s,I c){W(*s-c)P(!*s++,0)R s;} _ I scn(S s,I c,I n){N(n,P(c==s[i],i))R n;} |
Next, there are several functions to declare arrays and unique values.
The P1 to P3 functions define default parameters (x, y, z), but it looks like they have been replaced by a fixed array (see struct/fixedarray experiment in a.c):
//_ K P1(J x){R(K)(X1<<48|x);} |
Array values are declared using tn(t,n), from a.c, which allocates memory for n elements of the type t. The 0 type is used for pointers. An array of pointers is equivalent to a list of arbitrary values:
_ K kK(I n){R tn(0,n);} |
These functions create a new char array from a string, either with n chars (pn) or zero terminated (kp):
_ K pn(S s,I n){R memcpy(kC(n),s,n);} |
More functions to work with character arrays: the functions c0, c1, c2 and c3 create char arrays of 0, 1, 2 and 3 characters, respectively; the function jc appends a character c to the char array in x using the j2 function (from a.c), which joins two arrays:
_ K c0(){R kC(0);} _ K c1(C x){K r=kC(1);R*r=x,r;} _ K c2(C x,C y){K r=kC(2);R*r=x,r[1]=y,r;} _ K c3(C x,C y,C z){K r=kC(3);R*r=x,r[1]=y,r[2]=z,r;} _ K jc(K x,C c){R j2(x,kc(c));} |
Functions to create new unique values such that the type is stored in the first 16 bits, except for floating point values:.
_ K kc(J x){R(K)(KC<<48|x);} _ K ki(unsigned x){R(K)(KI<<48|(J)x);} _ K kf(F f){R*(K*)&f;} _ K ks(J x){R(K)(KS<<48|x);} _ K qs(S s){R(K)(QQ<<48|(J)s);} |
The A(x) macro will return the type of x if it is a unique value or zero for arrays. I(x) returns the value of a unique integer. Q(x) returns with x if it is an error string, and Qs(c,s) will return s as an error string if the condition c is true. The macro N1 is similar to the macro N, but does a do-while instead of a while loop (why isn’t it defined in c.h?):
#define A(x) ({J _j=(J)(x);!_j||_j>>52?KF:15&_j>>48;}) //#define Q(c,i) P(c,qi(i)) //error index(nyi,rank,length,type,..) |
The macros Ax and xi are shortcuts for A(x) and I(x), while xf access x as a float value (why isn’t there a F(x) macro?):
#define Ax A(x) |
Array attributes are stored before the actual array values, and therefore accessed with negative indices with the macros xm, xr, xu, xt, xn. The 8 bytes preceding the array elements store the attributes mturnnnn corresponding to memory bucket, type, return value type (for user functions), reference counter, and a 32 bits integer which indicates the size of the array:
#define xm x[-8] //mem |
The macros xx, xy and xz are used to access the first 3 values in an array of K values, while xI, xF and xK give access to x as an array of integers, floats and K values, respectively (xC is not needed because K values are already defined as a char*). The macros Xc, Xi and Xx allow to access the i-th element of an array of the C, I or K type. Finally, X0(e) returns a copy of e after decreasing the reference counter of x:
#define xx xK[0] |
In addition to the above macros that work for the x variables, they are defined for the variables y, z and r. The variables x, y and z usually correspond to function parameters, while r is the return value:
#define Ay A(y) |
In the following macros for function definitions, the letter indicates the return type, with a preceding Z for static functions, and the final number from 1 to 3 indicates the number of arguments:
#define ZV1(f) ZV f(K x) |
The k1, k2 and k3 functions return an array of K with 1, 2 or 3 values:
_ K1(k1){K r=kK(1);R rx=x,r;} |
These functions join several elements in a single array (j2, defined in a.c, joins two arrays):
_ K3(j3){R j2(j2(x,y),z);} // join 3 K values (x,y,z) in K array |
This is the C library. It includes basic memory management and I/O functions, as well as functions to help building interpreters.
Signatures of the functions defined in A.S to make system calls:
J read(),write();I open(),close(),fstat(),munmap();S mmap();V exit(); |
Memory is allocated using mmap. The function ma allocates n bytes of memory if d is zero, or maps memory for the file with file descriptor d. The initial memory address used is 0x70000 (I think this is a partially arbitrary choice, just large enough to leave room for the stack in lower addresses). This address will be rounded to the nearest memory page boundary, and it is increased after each allocation (nb: it is kept in a static variable):
ZS ma(I d,J n){ZJ p=0x700000;p+=d?0:n;R mmap( |
Files are also opened using mmap. The string s contains a file name, and the number of bytes (the file size given by fstat) will be stored in n:
ZS mf(S s,J*n){ |
The function w2 writes a zero-terminated string to stdout, while r2 reads a maximum of 256 input characters from stdin (while also printing them to stdout) and returns the address of the zero-terminated string with these characters, stored in the static buffer b:
V w2(S s){write(2,s,strlen(s));} |
All the format functions (named pi for integers, px for hexadecimal integers and pf for floating point values) use the static buffer b, with 24 bytes, to write characters from the end of the buffer:
ZC b[24]; |
The static function ng, which precedes a string with a minus sign, is used for formatting negative values:
ZS ng(S s){R*--s='-',s;} |
The pi function returns a string with a formatted integer in decimal base with the help of the static function pu, which prints an unsigned integer (in decimal base) from the least significant digit at the end of the buffer until the most significant one:
ZS pu(S s,J i){J j;do*--s='0'+i-10*(j=i/10);W(i=j);R s;} |
Integers can also be printed in hexadecimal base using px. This function calls hh(s,c), which writes the hexadecimal representation of the byte c to the string s. The symbol to use is selected on basis to the property that, in ASCII, ‘W’ is defined as ‘a’-10:
ZS hh(S s,C c){N(2,C a=i?c&15:c>>4;s[i]="0W"[9<a]+a)R s;} |
Floating point values are formatted with pf:
S pf(F f){ |
Parsing functions read values from the string passed as the first argument with a maximum of n characters. The function ip reads integers and fp reads floating point values.
J ip(S p,I n){P('-'==*p,-ip(p+1,n-1))J j=0;N(n,j=10*j+p[i]-'0')R j;} |
Unique values are stored in K values, with the type in the first 2 bytes. Array values are stored in memory from a memory pool structured as a hash table, where the hash is related to the approximate array size. The address of the buckets is stored in the static array M. The total amount of assigned memory bytes (which is usually lower than the allocated one) is stored in the workspace variable W. I am not sure why the initial value is -32 (may it be because of the 32 bytes need for a fixed array?):
ZK M[31]; ZJ W=-32; |
The M[i] value points to a memory region with space available for 16<<i bytes. If more than one such region is available, the addressed memory will contain a pointer to it, in a linked list.
The function m1(n) returns a pointer from the memory pool with available space for n bytes. If there is a region available in a bucket, it is returned, and the bucket set to zero. If there is not, but there is memory in a larger bucket, it is returned from there and the remaining memory is assigned to a lower size bucket. If no memory is available, it is allocated with ma:
ZK m1(J n){ |
Arrays of type t and size n are allocated using the function tn(t,n). The size for each type is stored in the nt array:
// types are chijefs:
// pointer (list), char, hex byte?, int, long, e?, float, symbol (char as int)
J nt[]={8,1,2,4,8,4,8,4};
K tn(I t,I n){
// allocate memory (minimum for 1 value) for every non-string type
K x=m1(MX(1,n)*nt[KS<t?0:t]);
R
W+=16L<<xm, // add assigned memory to workspace
xu=0,xt=t,xn=n, // set metadata (not a function, type, length)
x; // return memory address of first element
}
The static functions l1 and l0 respectively allocate and free fixed arrays for three pointer values and I think l2 should return an array of K strings with their values but has not been implemented yet. These fixed arrays have type 8 (there should probably be a #define KL 8L in a.h, but a comment indicates that these fixed arrays are an experiment). When a fixed array is freed with l0, its y and z elements are freed calling l0 recursively (I think the l0(xz) is unnecessary, it will be called from l0(xy)) and its memory is returned to the pool, making it the first element in the corresponding bucket and referencing the previous one as a linked list:
ZK1(l2){R kp("[]");} // I think this is a TODO |
More generally, arrays increase and decrease their reference counters with r1 and r0, respectively. If there is an overflow, r1 will abort (Is this expected? Why are there no similar checks anywhere else?). When r0 is called and there are no remaining references, it is returned to the pool:
K1(r1){ |
In practice, what happens is the following: the first time m1 is called, it will allocate enough space for n bytes, but in fact allocating a memory region whose size is a power of 2 multiplied by 16 bytes (with a minimum of 4 Mb). The not used memory is assigned to different buckets in the M variable, such that M[0] points to a 16 bytes memory region, M[1] points to a 32 bytes memory region, M[2] to a 64 bytes memory region, and so on. Next time m1 is called it will return an address from the smallest possible bucket. If the corresponding bucket does not contain a pointer (is zero), it will take the pointer from the next bucket, divide the region in two, and return one while assigning the other to M. Memory is never really deallocated, it is just returned to the pool. This is done storing pointers to regions of the same size in a linked list, such that a pointer is stored in the memory region pointed to by the address in M, and so on. Notice that, although a memory region can be divided in smaller ones, they are never merged back.
The additional function j2(x,y) is used to merge two arrays in one, copying the elements of y at the end of x or copying the elements of both in a new array if there is not available space or there is some reference to x. For this, it uses the static function xiy, which copies values from an array in another, assuming there is enough memory available:
ZK xiy(K x,I i,K y){ |
The file a.c also includes several functions that can be used to build an interpreter. For this, it makes use of the ev() and ps() functions, defined elsewhere (ie. in the the b.c or k.c file):
K ev(),ps(); |
There is space available for 26 global variables (including functions), stored in the G array. Each of them will be named by a single letter from a to z. There is also a K0 value which is not used by b.c, although it is initialized (in the km function, see below):
K G[26]; ZK K0; |
The interpreter program will start with the _start function, avoiding the fatter (!) main function. This function first writes a 32 bit integer before the stack pointer (at address -4(%rsp)) and calls ldmxcsr to set the 32 bit register MXCSR, which contains flags for control and status information regarding SSE instructions. Then, it moves the pointer at 8(%rsp), which corresponds to argv, to %rdi (first function argument) and jump to km.
V _start(){ asm("movl $0x9fc0,-4(%rsp);ldmxcsr -4(%rsp);lea 8(%rsp),%rdi;jmp km"); } |
The flags of the MXCSR register are the following:
FZ bit 15 Flush To Zero
RN bits 13 and 14 are 0 Round To Nearest
PM bit 12 Precision Mask
UM bit 11 Underflow Mask
OM bit 10 Overflow Mask
ZM bit 9 Divide By Zero Mask
DM bit 8 Denormal Mask
IM bit 7 Invalid Operation Mask
DAZ bit 6 Denormals Are Zero
The flags FZ (also called FTZ) and DAZ make possible to code unique values of other types as a special case of floating point values, since they imply, in practice, that a number with a zero exponential part is not a valid floating point number.
Inside km, G values are initialized to be new values (NL), and K0 to be a zero length char array (this value is never used later). Then, the first argument of the program (a is argv[0], so ++a is argv[1]), if given, is loaded with the ld function. Finally, the program enters in an infinite loop reading user input with r2, evaluating the obtained string with es and printing the result with pr:
V km(S*a){ (K0=kK(0))=c0(); // init K0 to empty char array |
The loading function ld reads a file line by line, processing its input with pr and es:
ZS1(ld){ // and print result, possibly returning an error string |
Strings (zero-terminated) are evaluated, both inside ld and the infinite loop in km, by es. This function will call different functions for system commands (\\ or \ to exit, \w to return workspace size, \v for global variables, \f for functions, \t to time a command and \l to load a file) or ev (defined in b.c or k.c) for everything else:
ZS1(es){ |
The commands \v and \f call the vf function (with argument 1 for functions and zero otherwise). It returns an array with the numbers of the variables used with type string (char*). This will be understood by the pr function to format it as a list of variables with single letter names:
ZK vf(I f){ |
Command \t times the following command calling tm. If it is called as \t:n, the function is repeated n times. The static function ms, called from tm, returns the time from the last reset of the tsc counter in milliseconds, assuming that each cpu cycle lasts 0.58e-6 ms. The number of cycles is got calling the asm instruction rdtsc, which stores the 64 bits number in two 32 bits one (in this case, a and d):
ZF ms(){ |
ZS1(tm){ |
The result of es when loading files or in the REPL is printed by pr, which does not do much more than calling the output function o, which will call, for non-errors, the more complex se function:
ZK1(pr){if(NL-x)r0(o(x));R x;} // print non-empty values // formatted as TxX, where T is the type number of the return value and X // is its address in hexadecimal |
These additional functions are called from se:
// Return item in position i as unique element |
Finally, there is an enm function, which creates an array of integers from 0 to xi:
K1(enm){K r=kI(xi);N(rn,Ri=i)R r;} |
The a.c file also includes a rand static function to generate (semi)random numbers, but it is not used (maybe a missing \r system command? It may be easily implemented multiplying by the the result of ms and dividing to have a float between 0 and 1). The digits of pi are used as seed, together with the value 4294957665 (recommended in numerical recipes):
ZI rand(){ZJ j0=-314159;R j0=4294957665L*j0+(j0>>32);} |
That was the easy part. Now it gets complicated.
The b interpreter (I assume the k one is not too different) works generating strings with machine code and executing them casting to a function. It is basically a compiler. Of course, no constants are used, just opcodes in hexadecimal. There are also no comments, except for this gem:
// :+-*% ^&|<=> x64 JJ Jj o2 cc tst RET cll psh pop acdbsbsd89.. o[m[s|d]] c3 eb+1 e8+4 e9+4 [f2/66/4*][0f] 5* 7*+1 b*+4 0f8*+4 03 23 2b 3b (6b 83) 89 8b .. |
This means:
Some useful references:
There are two functions: ps parse a string to generate an array of K values that ev executes, casting to a function, the type number in yu indicates the resulting type of the function (a generic K value, a float value or an integer):
// Evaluate a function, the x argument must be a K* array, |
The ps function is where all the fun is. It includes several embedded functions:
K ps(S s){ // used to fill mod(2),reg(3),r/m(3) byte // integers, it means shift right (/) 4-o?f(o,r,y): // neg // mov mov lea imul m(3-o?a:3,A[r],a?A[x]:4), // 3==s[p]/64 => mod == 11 => r/m is a register // (mod=00, r/m=101) // with displacement addressing mode the instruction // will have one instruction byte, one for mod,reg,r/m // and four for the address, else instruction, mod,reg,r/m // and one byte for the register // check if instruction uses relative address argument: 0xe8==*s|| //function call (p?8-s[1]/16:4>*s/16||8==*s/16)&&5==(0xc7&s[1+p]) ) I t=T[xi-'a']; R 14==t?-2:13==t?-4:2*t-26; } // this will eventually abort (sh not implemented) // shift left, implemented as an addition // (a shift left is a multiplication by 2, which is the // same as adding it to itself) // but it means shr applied to ints) e(s,y), // first argument is y // second argument is one (kc(129)) or // kc(127) for # (why?) or zero for * (why?) kc(128+('#'==a?-1:'*'!=a)) // a is func number, r will be zero for // the first literal long argument if the // op is not a comparison |