Red Programming Language: vector

Showing posts with label vector. Show all posts

April 26, 2015

0.5.3: Faster compilation and extended vector! support

The main point of this minor release is to speed up compilation time by introducing a new way for the compiler to store Red values required for constructing the environment during the runtime library startup.

Introducing Redbin

Red already provides two text-oriented serialization formats, following the base Rebol principles. Here are the available serialization formats now in Red with some pros/cons:

MOLD format

provides a default readable text format, very close to the source code version
cannot properly encode many values

MOLD/ALL format

can encode series offsets
some values with literal forms that rely on words that can be natively encoded (none, true/false, objects, ...)
human-readable, but not always nice-looking

Redbin format

can encode any value accurately
supports words binding
can encode contexts efficiently
supports cycles in blocks
can encoded name/value pairs in any context
extremely fast loading time
very small storage space used when compressed
non human-readable

So far, the existing environment source code (mostly block values) was converted to pure Red/System construction code which was pretty simple and straightforward to implement, but was generating thousands of extra lines of code, slowing down the native compilation process. The right solution for that was to introduce a new binary serialization format for Red values called Redbin (very inspired by Carl's REBin proposal).

Redbin's specification focuses on optimizing the loading time of encoded values, by making their stored representation very close to their memory representation, bypassing the parsing and validation stages. Moreover, the Redbin payload is compressed using the Crush algorithm (that Qtxie ported to Red/System), which features one of the fastest decompressors around while having a general compression ratio very close to the deflate algorithm (but compression speed is about an order of magnitude slower). This fits perfectly the needs for our Redbin use-case.

So the gains compared to pre-0.5.3 version are:

compilation time of empty Red program is ~40% faster!

generated executable of empty Red program is about 100KB smaller (278KB only on Windows now).

faster startup time, as the Redbin decoding process is much faster than the previous Red-stack-oriented construction approach.

Those benefits also extend to user code, your static series will be saved in Redbin format as well.

Redbin format is currently emitted by the compiler and decoded by the Red runtime, but there is no encoder yet in the runtime that would allow user code to emit Redbin format. We will provide that support in a future version, it is not high priority for now. A "compact" version of the encoding format will also be added, so that Redbin can also be a good choice for remote data exchange.

Compilation from Rebol console

For those using Red toolchain from Rebol2 console, a new rc function is introduced to avoid reloading the toolchain on each run. Typical session looks like this:

    >> do %red.r
    >> rc "-c tests\demo.red"

    -=== Red Compiler 0.5.3 ===-

    Compiling /C/Dev/Red/tests/demo.red ...
    ...compilation time : 416 ms

    Compiling to native code...
    Script: "Red/System PE/COFF format emitter" (none)
    ...compilation time : 12022 ms
    ...linking time     : 646 ms
    ...output file size : 284160 bytes
    ...output file      : C:\Dev\Red\demo.exe
 
    >> call/output "demo.exe" s: make string! 10'000
    == 0
    >> print s

      RedRed              d
      d     d             e
      e     e             R
      R     R   edR    dR d
      d     d  d   R  R  Re
      edRedR   e   d  d   R
      R   e    RedR   e   d
      d    e   d      R   e
      e    R   e   d  d  dR
      R     R   edR    dR d

Collation tables

Since 0.5.2, Red provides collation tables for more accurate case folding support. Those tables can now be accessed by users using these paths:

    system/locale/collation/upper-to-lower
    system/locale/collation/lower-to-upper

Each of these tables is a vector of char! values which can be freely modified and extended by users in order to cope with some specific local rules for case folding. For example, in French language, the uppercase of letter é can be E or É. There is a divide among French people about which one should be used and in some cases, it can just be a typographical constraint. By default, Red will uppercase é as É, but this can be easily changed if required, here is how:

    uppercase "éléphant"
    == "ÉLÉPHANT"
    
    table: system/locale/collation/lower-to-upper
    foreach [lower upper] "àAéEèEêEôOûUùUîIçC" [table/:lower: upper]

    uppercase "éléphant"
    == "ELEPHANT"

Extended Vector! datatype

Vector! datatype now supports more actions and can store more datatypes with different bit-sizes. For integer! and char! values, you can store them as 8, 16 or 32 bits values. For float!, it is 32 or 64 bits. Several syntactic forms are accepted for creating a vector:

    make vector! <slots>
    make vector! [<values>]
    make vector! [<type> <bit-size> [<values>]]
    make vector! [<type> <bit-size> <slots> [<values>]]

    <slots>    : number of slots to preallocate (32-bit slots by default)
    <values>   : sequence of values of same datatype
    <type>     : name of accepted datatype: integer! | char! | float!
    <bit-size> : 8 | 16 | 32 for integer! and char!, 32 | 64 for float!

The type of the vector elements can be inferred from the provided values, so it can be omitted (unless you need to force a bit-size different from the values default one). If a value with a bit-size greater than the vector elements one, is inserted in the vector, it will be truncated to the bit-size of the vector.

For example, creating a vector that contains 1000 32-bit integer values:

    make vector! 1000

Or if you want to specify the bit-size of the vector element:

    make vector! [char! 16 1000]
    make vector! [float! 64 1000]

You can also initialize a vector from a block as below:

    make vector! [1.1 2.2 3.3 4.4]

Again you can also specify the bit-size of the vector element:

    make vector! [integer! 8 [1 2 3 4]]

For integer! and char! vectors, you can use all math and bitwise operators now.

    x: make vector! [1 2 3 4]
    y: make vector! [2 3 4 5]
    x + y
    == make vector! [3 5 7 9]

In case of different bit-sizes, the resulting vector will be using the highest bit-size. If a math operation is producing a result that does not fit the bit-size, the result is currently truncated to the bit-size (using a AND operation). Ability to read and change the bit-size of a vector will be added in future releases.

The following actions are added to vector! datatype: clear, copy, poke, remove, reverse, take, sort, find, select, add, subtract, multiply, divide, remainder, and, or, xor.

The vector! implementation is not yet final, some of its actions will get optimized for better performances and, in future, rely on SIMD for even faster operations. For multidimensional support, it will be implemented as a new matrix! datatype in the near future, inheriting from vector!, so the additional code required will be kept minimal.

Bugfixing

This was a short-term release, but we managed to fix a few bugs anyway.

What's next

Another minor release will follow with many runtime library additions and new toolchain improvements. See the planned features for 0.5.4 on our Trello board.

The 0.6.0 release will also most probably be split in two milestones, one for GUI and another for Android support.

In the meantime, enjoy this new release! :-)

March 15, 2015

0.5.1: New console and errors support

This new release brings many new features, improvements and some bugfixes that will make Red more usable, especially for newcomers. The initial intent for this release was just to replace the existing console implementation, but it looked like the right time to finally implement also proper general error handling support.

New console engine

The old console code we were using so far for the Red REPL was never meant to last that long, but as usual in software development, temporary solutions tend to become more permanent than planned. Though, the old console code really needed a replacement, mainly for:

removing the dependency to libreadline and libhistory, they were creating too many issues on the different Unix platforms, so became troublesome for many newcomers.

having a finer-grained control over keystrokes on text input, in order to implement convenient features like word completion.

having a bigger platform-independent part, so that we can add any kind of backends, like GUI ones, without duplicating too much code.

So, the new console code gets rid of third-party libraries and runs only on what the OS provides. The new features are:

built-in history, accessible from system/console/history
customizable prompt from system/console/prompt
word and object path completion using TAB key
ESC key support for interrupting a multi-line input

Other notable console-related improvements:

about function now returns also the build timestamp.
what function has now a more readable output.
Console output speed on Windows is now very fast, thanks to the patch provided by Oldes for buffered output.

The console code is not in its final form yet, it needs to be even more modular and wrapped in a port! abstraction in the future.

Errors support

Red now supports first class errors as the error! datatype. They can be user-created or produced by the system. The error definitions are stored in the system/catalog/errors object.

 red>> help system/catalog/errors
 `system/catalog/errors` is an object! of value:
     throw            object!   [code type break return throw continue]
     note             object!   [code type no-load]
     syntax           object!   [code type invalid missing no-header no-rs-h...
     script           object!   [code type no-value need-value not-defined n...
     math             object!   [code type zero-divide overflow positive]
     access           object!   [code type]
     user             object!   [code type message]
     internal         object!   [code type bad-path not-here no-memory stack...

User errors can be created using make action followed by an error integer code or a block containing the category and error name:

 red>> make error! 402
 *** Math error: attempt to divide by zero
 *** Where: ???

 red>> make error! [math zero-divide]
 *** Math error: attempt to divide by zero
 *** Where: ???

These examples are displaying an error message because the error value is the returned value, we still need to implement a full exception handling mechanism using throw/catch natives in order to enable raising user errors that can interrupt the code flow. The error throwing sub-system is implemented and used by the Red runtime and interpreter, just not exposed to the user yet.

Errors can be trapped using the try native. An error! value will be returned if an error was generated and can be tested using the error? function.

 red>> a: 0 if error? err: try [1 / a][print "divide by zero"]
 divide by zero
 red>> probe err
 make error! [
    code: none
    type: 'math
    id: 'zero-divide
    arg1: none
    arg2: none
    arg3: none
    near: none
    where: '/
    stack: 3121680
 ]
 *** Math error: attempt to divide by zero
 *** Where: /

Currently the console will display errors if they are the last value. That behavior will be improved once the exception system for Red will be in place.

Errors when displayed from compiled programs, provide calling stack information to make it easier to locate the source code where the error originated from. For example:

    Red []
    
    print mold 3 / 0

will produce the following error once compiled and run:

    *** Math error: attempt to divide by zero
    *** Where: /
    *** Stack: print mold /

SORT action

Sorting data is now supported in Red, in a polymorphic way as in Rebol. The sort action is very versatile and useful. Let's start from a basic example:

    scores: [2 3 1 9 4 8]
    sort scores
    == [1 2 3 4 8 9]

As you can see, sort modifies the argument series, you can keep the series unchanged by using copy when passing it as argument:

    str: "CgBbefacdA"
    sort copy str
    == "aABbCcdefg"
    sort/case copy str
    == "ABCabcdefg"
    str
    == "CgBbefacdA"

By default, sorting is not sensitive to character cases, but you can make it sensitive with the /case refinement.

You can use /skip refinement to specify how many elements to ignore, it's handy when you need to sort records of a fixed size.

    name-ages: [
        "Larry" 45
        "Curly" 50
        "Mo" 42
    ]
    sort/skip name-ages 2
    == ["Curly" 50 "Larry" 45 "Mo" 42]

The /compare refinement can be used to specify how to perform the comparison. (It does not yet support block! as argument)

    names: [
        "Larry"
        "Curly"
        "Mo"
    ]
    sort/compare names func [a b] [a > b]
    == ["Mo" "Larry" "Curly"]

Combining it with /skip refinement, you can do some complex sorting task.

    name-ages: [
        "Larry" 45
        "Curly" 50
        "Mo" 42
    ]
    sort/skip/compare copy name-ages 2 2    ;-- sort by 2nd column
    == ["Mo" 42 "Larry" 45 "Curly" 50]

The /all refinement will force the entire record to be passed to the compare function. This is useful if you need to compare one or more fields of a record while also doing a skip operation. In the following example, sorting is done by the second column, in descending order:

    sort/skip/compare/all name-ages 2 func [a b][a/2 > b/2]
    == ["Curly" 50 "Larry" 45 "Mo" 42]

Sort uses Quicksort as its default sorting algorithm. Quicksort is very fast, but it is an unstable sorting algorithm. If you need stable sorting, just add /stable refinement, it will then use Merge algorithm instead to perform the sort.

New datatypes

A couple of new datatypes were added in this release, mostly because of internal needs in Red runtime to support the new features.

The typeset! datatype has been fully implemented, and is on par with the Rebol3 version. A typeset! value is a set of datatypes stored in a compact array of bits (up to 96 bits). Datatype lookups are very fast in typesets and they are mostly used internally for runtime type-checking support. The following actions are supported on typeset! values: make, form, mold, and, or, xor, complement, clear, find, insert, append, length?. Comparison operators are also supported.

A preliminary implementation of the vector! datatype is also part of this release. A vector! value is a series of number values of same datatype. The internal implementation uses a more compact memory storage format than a block! would do, while, on the surface, behaving the same way as other series. Only 32-bit integer values can be stored for now in vectors. The following actions are supported by vector! values: make, form, mold, at, back, head, head?, index?, insert, append, length?, next, pick, skip, tail, tail?. The implementation will be completed in future releases.

Runtime type checking support

It has finally being implemented, as proper error handling support is now available. So from this release on, function arguments types will be check against the function specification and non-conforming cases will result in an error. Return value type-checking will be added later.

The type-checking might break some existing Red code around that was letting silently pass invalid arguments, so check your code with this new release before upgrading.

The compiler does not do any type checking yet, that will be added at a later stage (though, don't expect too much from it, unless you annotate with types every function exhaustively).

Also notice that the runtime type-checking implementation is making the Red interpreter a little bit faster, thanks to a new optimized way to handle function specification blocks (an optimized spec block is cached after first call, resulting in much faster processing time afterwards).

Red/System improvements

Exceptions handling has been improved, introducing the catch statement allowing to catch exceptions using an integer filtering value. Here is a simple example in the global context:

    Red/System []

    catch 100 [
        print "hello"
        throw 10
        print "<hidden>"
    ]
    print " world"

will output

    hello world

The integer argument for catch intercepts only exceptions with a lower value, providing a simple, but efficient filtering system.

In addition to that, uncaught exceptions are now properly reporting a runtime error instead of passing silently. This new enhanced low-level exception system is supporting the new higher-level Red error handling system.

A couple of new compiler directives have been also added in order to strengthen the interfacing with Red layer:

    #get <path>

The #get directive returns a red-value! pointer on a value referred by a Red object path. This is used internally in the runtime to conveniently access the Red system object content from Red/System code. This directive will be extended in the future to access also words from Red global context.

    #in <path> <word>

The #in directive returns a red-word! pointer to a Red word bound to the object context referred by path.

What's next?

In addition to many minor pending improvements, we will be working on a minor release that will introduce the Redbin format for accurately serialize Red values in binary form. Redbin format will be used to make the compilation process much faster, as it currently slows down pretty quickly as the Red-level environment code size grows up.

Enjoy this new release! :-)

Pages

April 26, 2015

0.5.3: Faster compilation and extended vector! support

March 15, 2015

0.5.1: New console and errors support