
Welcome to the Tensile home page


Introduction

Tensile (formerly NSL) is a programming language intended primarily for processing text documents in various input formats and in various languages. It is being developed to be as lightweight as possible while still being able to solve a wide range of tasks. It can be used as a stand-alone tool as well as a CGI engine. It is not intended to be embeddable like Tcl, but since the interpreter is very compact, it can be attached to an application without much overhead.

Tensile should be easy to learn (though the current lack of documentation may be a considerable obstacle ;). Its syntax is much simpler than that of Perl or even awk, and is more like Tcl or csh. It does, however, have some peculiarities in syntax as well as in programming techniques, so it will probably take some time to get accustomed to.

However, Tensile is not a rapid-development language. Its core does not, and shall not, include 'complete solutions'. In spite of its rather high level, it should be regarded as a toolbox by means of which a programmer may implement whatever he wants. Only such an approach (IMHO) can keep the language small, efficient, and easy both to learn and to use.

The Tensile interpreter is really compact: a stripped binary image (under Linux/Intel) occupies only about 100k! (This doesn't include the support functions, which have been moved to a separate library, libutils, of about 80k.) It also requires DB1 or GDBM (present on most systems) and libltdl, which comes with the package (just in case). I am strongly inclined to maintain this compactness: new features should not be coded into the core but rather implemented as separate modules. Besides, several compile-time options are provided to enable or disable some of the language elements.

Tensile is meant to be portable: its core doesn't use any features beyond ISO C and POSIX library functions. A list of (more or less) successful builds on various platforms will be available really soon.

ATTENTION!

The fate of Tensile depends on you!

Your feedback is vitally important for developing Tensile; how else could I learn what Tensile lacks, what bugs and defects it has, and so on? So, to everyone interested in Tensile: if you use it (or have tried it), please send your comments, requests, advice, complaints, etc.

History

I started working on Tensile in 2000 when I was faced with the task of extracting structured data from free-form texts. One of the basic problems was that those texts were not in plain-text format and contained a lot of non-Latin-1 characters in various fonts, most of which complied with no standard encoding. After several attempts to hard-code everything I needed in C, I realized that a higher-level language was necessary. However, the peculiarities of the task would have required a deep knowledge of any such language (like Perl). As my knowledge was insufficient and I am by nature too lazy to deepen it, I decided to write my own high-level language (HLL) that would meet my needs better than any existing one.

So, in June 2001 the core of the language was complete. Then I thought that my work might be useful to other members of the programming community. On the other hand, a lot of work remains before Tensile becomes a widely usable language. With this in mind, I decided to take advantage of the services provided by SourceForge. However, SF seemed unwilling to provide any feedback, so I moved to GNU Savannah (savannah.gnu.org). The Savannah services for Tensile can be found here.

Until 1 June 2002 the language was called NSL (short for 'New Scripting Language'). After some thought, however, it was renamed Tensile, an unobvious abbreviation of ThE New Scripting LanguagE. It was the only word that more or less fitted the original abbreviation, and at the same time it suggests the high flexibility and extensibility of the language.

The latest downloads can be obtained from Savannah and ILI RAN. FTP access is available via sunsite.dk.

The reference guide for Tensile can be found here. Note that it is incomplete; if you have questions, use the Savannah mailing lists or feel free to mail me directly.

Basic features

There are three main concepts that set Tensile apart from other languages.

Automata

All the complex string translations (such as case mapping, converting from one encoding to another, producing collation sequences, etc.) are performed with the help of user-defined finite-state automata*. Unlike many other languages, Tensile does not use Unicode internally, since IMHO that would greatly hurt portability and compactness (well, I may be wrong :).

Automata may be grouped into sequences in which the output of one automaton is the input of the next (similar to Unix pipes). Such sequences are referred to below as autoseqs.


* More exactly, what Tensile uses are pushdown transducers, not finite-state automata, but the former term is rarely used nowadays, so I chose the latter as the more comprehensible one.

The idea was inspired by the Omega project of J. Plaice and Y. Haralambous, but the implementation is entirely different and shares no code with it.

Storages

Tensile does not have real structured data types (in principle, it has a single data type: the string). Their role is played by storages. A storage is an abstraction of a data collection with an opaque internal structure. The user always operates on storages as sets of key-value pairs, no matter whether the storage is actually an array, a table (a dictionary), or an SQL query result; only the form of the key differs. The user may now define her own storage types.

Streams

Tensile programs deal not with OS-level files but with streams. A stream is thought of as a flow of raw text interlaced with markup tags. The exact way those tags are formed is hidden by a stream driver, so that an application always deals with HTML-like structures. As of now, only plain-text and HTML stream drivers are implemented (plus special streams for CGI support). Besides, a user may define her own stream types by means of the language itself.

Note that streams define only the data layout, not the way the data is stored. For those stream types that do not impose a physical representation, the representation is determined by a flow, which is analogous to the protocol prefix of a URL. Currently the core provides the following flow types: ordinary files, pipes, and Tensile strings. Other flow types may be (and are) defined by extension modules and, with some limitations, by Tensile programs.

There are still a few other things to mention.

TO DO

A lot of things:


The author and maintainer of this project is Artem V. Andreev.
Mail me at artem@AA5779.spb.edu