
Wednesday, 28 September 2016

Mixing iteration and read methods would lose data

I just upgraded to Python 2.7.12 and my code now errors with the rather unhelpful message "Mixing iteration and read methods would lose data". Googling for this yields a number of StackOverflow posts where people are iterating over a file and then, within the loop, doing read calls on the same file. This sounds like a good error for their case, but in my case the fixes are somewhat more inane.
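For reference, the StackOverflow case is trivially reproducible; a contrived sketch like this raises the same ValueError:

with open("data.txt") as f:
    for line in f:
        # Iteration uses an internal read-ahead buffer, so a direct read()
        # here could silently skip data; Python raises instead:
        # ValueError: Mixing iteration and read methods would lose data
        f.read(10)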

This errors:

    persistence.write_dict_uint32_to_list_of_uint32s(output_file, { k: [] for k in persistence.read_set_of_uint32s(input_file) })
This fixes that error:
    set_value = persistence.read_set_of_uint32s(input_file)
    persistence.write_dict_uint32_to_list_of_uint32s(output_file, { k: [] for k in set_value })
Another error at another location:
    persistence.write_set_of_uint32s(output_file, persistence.read_set_of_uint32s(input_file))
Some simple rearranging makes this error go away:
    set_value = persistence.read_set_of_uint32s(input_file)
    persistence.write_set_of_uint32s(output_file, set_value)
And finally, the next error at another location:
def read_string(f):
    s = ""
    while 1:
        v = f.read(1) # errors here
        if v == '\0':
            break
        s += v
    return s
And the fix for this:
def read_bytes(f, num_bytes):
    return f.read(num_bytes)

def read_string(f):
    s = ""
    while 1:
        v = read_bytes(f, 1)
        if v == '\0':
            break
        s += v
    return s
This error seems like a horribly broken idea that is popping up in places where it wasn't intended.

Friday, 26 June 2015

Generating e-books with Calibre on Hostmonster

One thing the new generated Imaginary Realities web site currently lacks, which the old hand-written version had, is epub and PDF versions of the published issues that can be downloaded and read at one's leisure.  Or put away for offline reference, if that's what you're into.

These e-books were generated by Calibre, which is a wonderful piece of work written primarily in Python.  It allows you to add a web page as an e-book: it'll go through and follow the links, and put the result together as an HTML e-book.  Then you can convert that e-book into any number of formats, including the two formats I concentrate on, PDF and epub.  All this is done through a pretty standard media library interface for e-books.  But you can also drive the conversion by command line if you are willing to put in a bit more work.

As the new web site is generated into HTML using Jinja2, I wanted to take the output and then, as a final step, push it through Calibre's ebook-convert command.  Compiling Calibre from source is a complicated endeavour, one the author himself warns about, so I wanted to avoid that.  Instead, I downloaded and installed one of the Linux static builds.  By default it installs to /opt, but it's possible to redirect it to use ~/ as the base directory instead.

Once installed, it's a fairly straightforward matter to generate the e-books (see the source code): take the command path, and the command arguments which map to the original conversion done in the GUI (working these out involves some guesswork).

The command:

command_path = "/home/mememe/opt/calibre/ebook-convert"
The arguments:
    standard_arguments = ""
    standard_arguments += " --disable-font-rescaling"
    standard_arguments += " --margin-bottom=72"
    standard_arguments += " --margin-top=72"
    standard_arguments += " --margin-left=72"
    standard_arguments += " --margin-right=72"
    standard_arguments += " --chapter=/"
    standard_arguments += " --page-breaks-before=/"
    standard_arguments += " --chapter-mark=rule"
    standard_arguments += " --output-profile=default"
    standard_arguments += " --input-profile=default"
    standard_arguments += " --pretty-print"
    standard_arguments += " --replace-scene-breaks=\"\""
    standard_arguments += " --toc-filter=.*\[\d+\].*"
And then invoke calibre for each issue and each format:
output_basename = "imaginary-realities-v%02di%02d-%04d%02d" % (volume_number, issue_number, year_number, month_number)
for suffix in ("epub", "pdf"):
    output_filename = output_basename + "." + suffix
    ret = subprocess.call([command_path, html_path, os.path.join(output_path, output_filename)])
Unfortunately, if you're on shared hosting, then you're at the mercy of whomever administers it. It turns out that PDF generation uses Qt components, and if you're not on the right version of libc++ or some combination of libraries, then the PyQt extension modules that come with Calibre's static install will simply fail to import. It's not so much a bug in Calibre as Hostmonster offering a dated environment. The Mobi format is also affected by this problem.

Epub is about the only format which will generate without the use of Qt. Unfortunate, but there's not much that can be done about it without investing a lot more work, and there are so many other things that I could do with this project. My iPad 1 (which is a poorly aging piece of junk) will accept both epub and PDF in Apple's e-book reader app. Hopefully, most other modern devices can also handle the epub format.  We'll see!  It might also be that there's a different tool which can be more easily installed or compiled and which will generate PDFs of the same level of quality.

Thursday, 25 June 2015

Imaginary Realities source code

The Imaginary Realities web site is generated using Jinja2 from Python flat files.  I've changed the repository from private to public, as there's no real reason to keep it shelved away.  If my hosting goes down, someone else can easily generate their own version of the web site and host it, should that take their fancy.

The git repository is hosted on bitbucket:

https://bitbucket.org/rmtew/imaginary-realities

Friday, 12 June 2015

Imaginary Realities website updated!

I've finally found the time to give the Imaginary Realities website a more up-to-date look.  The main change is that it's now generated by Jinja2, a Python templating library which I'd thoroughly recommend.  It made everything simple, up to and including picking out and embedding the featured article.
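The core of the generation step really is only a few lines; a minimal sketch of the kind of rendering involved (the template name and article data here are illustrative, not the site's actual code):

from jinja2 import Environment, FileSystemLoader

# Illustrative data; the real site builds this from Python flat files.
articles = [
    {"title": "First article", "slug": "first-article"},
    {"title": "Second article", "slug": "second-article"},
]

env = Environment(loader=FileSystemLoader("templates"))
template = env.get_template("index.html")  # assumed template name
html = template.render(articles=articles, featured=articles[0])

with open("index.html", "w") as f:
    f.write(html)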


There's always more to do.  The next steps are likely to be taking advantage of the Reddit and Disqus Python APIs to make discussion of articles more readily discovered.

There's a whole lot of shenanigans with Disqus and something I think they call "Discover". At first you could opt out of it showing on your website; now you can kind of opt out of having your own posts discoverable.  It's unclear whether this means that you will one day wake up and find giant rows of trashy thumbnails linking to "top 10 celebrities who benefited from an all cabbage diet" and so forth.  I googled for ages trying to work this out, and they don't seem to appear at the moment, so fingers crossed.  The license for this site is Creative Commons non-commercial, so we'd have to look for a Disqus replacement if this started happening.

Saturday, 6 June 2015

stacklessemu

I use mobile broadband, and reducing bandwidth usage saves me a lot of money.  So when I wanted to play with pypes after hearing that Yahoo Pipes was being shut down, I somehow ended up taking Peter Szabo's greenstackless.py module, improving it a little, and putting it up as a package on PyPI.

It provides good coverage of the Stackless API.  It has already been used as a backend for Peter Szabo's syncless project, and with the improvements I've made (adding the stackless.run() method, and fixing a greenlet/tasklet circular reference bug), it's even more capable.  But the devil is in the details.


It doesn't support the pre-emptive tasklet interruption that Stackless does, and can't, as that is implemented via modifications to the internal Python VM source code, which is one of the few modifications Stackless makes in its capacity as a fork of Python.  It may be possible to support this; I believe Jeff Senn noted way back that he had written a similar module with pre-emptive support based on the tracing hooks (with the downside that debuggers become unusable).
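I haven't written this, but the tracing-hook approach presumably looks something along these lines, with the scheduler switching back in every so many trace events (all names here are illustrative, not from any real emulator):

import sys
import greenlet

TICK_LIMIT = 1000   # trace events between forced switches
_ticks = [0]
scheduler = None    # a real emulator's scheduler greenlet would register here

def _preemption_trace(frame, event, arg):
    _ticks[0] += 1
    if scheduler is not None and _ticks[0] >= TICK_LIMIT:
        _ticks[0] = 0
        scheduler.switch()   # pre-empt the running tasklet
    return _preemption_trace  # keep receiving 'line' events for this frame

# Installing the hook is what makes debuggers unusable: they need it too.
sys.settrace(_preemption_trace)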

It doesn't support the threading model of Stackless Python.  That model derives from having a scheduler per thread, where most tasklets belonging to that scheduler hold slices of the stack of that thread and are therefore symbiotically attached to it.  It should be quite possible to support this with additional work, but that's something to set aside for when it's needed.

It doesn't support the module-level properties that Stackless implements.  And it can't, unless it relies on hacks suggested in places like StackOverflow, where class instances are injected into sys.modules to act as modules.
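For reference, the hack in question looks something like this sketch, where an instance of a module subclass is swapped into sys.modules so that property descriptors work (the current-tasklet lookup is a placeholder):

import sys
import types

def _get_current_tasklet():
    return None  # placeholder; an emulator would return the running tasklet

class _ModuleProxy(types.ModuleType):
    @property
    def current(self):
        # Behaves like the module-level stackless.current property.
        return _get_current_tasklet()

# Swap this module's sys.modules entry for the proxy instance, keeping
# the original module's globals available through the proxy.
_proxy = _ModuleProxy(__name__)
_proxy.__dict__.update(sys.modules[__name__].__dict__)
sys.modules[__name__] = _proxy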

Future work I'm tempted to do is working out whether I can add the tealet project as an alternative to the greenlet backend it currently has.  Greenlets are of course the stack slicing of Stackless extracted into an extension module; tealet is the "next generation" version of the stack slicing intended for Stackless Python.  It isn't used there already because it was such a wide-ranging, low-level change to the workings of Stackless that adopting it would introduce potential instability, which is not desirable at this time.

Kristjan Valur implemented a greenlet emulation module for tealet (in the same repo), so it may be that using it is as simple as running stacklessemu on tealet's greenlet emulation, which in turn runs on tealet.  It's a pity it wasn't uploaded to PyPI so I could try it out, but I don't think that's in Kristjan's wheelhouse.

Saturday, 16 August 2014

MIPS32 support

My satellite receiver has the odd bug, but otherwise it is generally a pretty good piece of hardware and software.  However, the bugs and the inability to fix them are something I wish I could do something about.  The firmware is written in MIPS assembly language and runs on an Ali 3602 chip.  Scanning through it, you can see interesting things like the license for Linux-NTFS, something which was reported to the GPL violations mailing list several years back (with no action taken).

Anyway, I can't afford IDA, the main interactive disassembler out there.  So I have my own, Peasauce, which comparatively has a token number of features.  Up until now, it only disassembled m68k machine code, but I've just finished adding basic MIPS support.  It's nowhere near perfect, but it's a start.  And it shows how much work goes into IDA.


The next architecture is likely to be ARM, although like MIPS it gets more complicated.  There are ARM and ARM Thumb instructions, which are differently sized and mix together to some extent.  Likewise, there are MIPS32 and MIPS16 instructions, which are differently sized and mix together to some extent.  But that's a problem for another occasion.  The work on this could be endless, if I had the time.

Wednesday, 29 January 2014

Roguelike MUD progress #7 - Web-based client

Previous post: Roguelike MUD progress #6

I set a mini-goal for myself, to display a curses-based program on a web page.

The first step was to find a no-frills, easy to adopt and extend Python websocket solution. One I've stumbled on a few times is the simple websockets client/server gist, and I used a slightly fixed version of it as a base. The next step was to find decent web-based terminal emulation. There are frameworks available which ditch the terminal window, but they come with larger problems.  The best candidate is term.js.

So taking the simple websockets simple_websocket_client.html and adding term.js, then modifying simple_websocket_server.py, it was easy to get a simple echo working with embedded ANSI control sequences.

First, I tried to proxy the input and output of the nano editor. Unfortunately, doing this with the subprocess module resulted in nano outputting a message about not supporting redirection.  There's probably a way to do this, but it was too much bother and not really my goal anyway.

Next, I went for something simpler: straightforward proxying of input and output over a telnet connection, specifically to my Roguelike MUD codebase.  There were some teething problems.  Text incoming from the browser-based terminal through the websocket is UTF-8 encoded, and similarly, outgoing text has to be as well.  But responses from the MUD contain both ANSI escape sequences and telnet negotiation sequences.  In order to work around this, I stripped the telnet negotiation out of outgoing text, and modified the relatively basic websocket code to also send binary frames.
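The stripping itself is straightforward; something along these lines (a simplification that only handles three-byte IAC option sequences and ignores subnegotiation, which a real implementation has to deal with):

IAC = "\xff"  # telnet "interpret as command" byte

def strip_telnet_negotiation(data):
    # Drop IAC WILL/WONT/DO/DONT sequences so that only text and ANSI
    # escapes remain; what's left can be UTF-8 encoded for the websocket.
    out = []
    i = 0
    while i < len(data):
        if data[i] == IAC and i + 2 < len(data):
            i += 3  # skip IAC, command byte, option byte
        else:
            out.append(data[i])
            i += 1
    return "".join(out)

# Example: IAC WILL ECHO, followed by ordinary text.
mud_data = "\xff\xfb\x01Welcome!\r\n"
print strip_telnet_negotiation(mud_data).decode("latin-1").encode("utf-8")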

A working connection:


In-game display:


As the picture above shows, the ANSI sequences are working perfectly.  But the character set does not seem to contain the line drawing characters.  My game interface already has debugging tools which display accessible character sets.

The main character sets:


As can be seen above, the only character set with even a hint of line drawing characters is the one exposed through the ESC(0 ANSI sequence.  Most of these sequences are faked by term.js, where known character sets are mocked up from the underlying unicode-supporting font.

One solution is to do what I do with PuTTY: identify the connected client, and if it's one which supports unicode, switch unicode on using terminal control sequences.  Modifying term.js to respond to the ENQ character (0x05) and send its terminal name (xterm) was easy enough.
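On the server side the check can then mirror what is done for PuTTY: send ENQ, look at the answerback, and enable UTF-8 if the terminal is recognised. A rough sketch, with the socket handling simplified (ESC % G is the control sequence that switches a VT-style terminal into UTF-8 mode):

ENQ = "\x05"

def negotiate_unicode(conn):
    # Ask the terminal to identify itself via its ENQ answerback.
    conn.send(ENQ)
    answerback = conn.recv(32)  # the modified term.js answers "xterm"
    if "xterm" in answerback:
        conn.send("\x1b%G")  # ESC % G: switch the terminal to UTF-8
        return True
    return False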

This is what I should be seeing now:


This is what I am actually seeing:


The correct unicode characters are being sent, but not being displayed correctly by term.js.  I don't expect this will be that difficult to correct when I get some more time to work on this.

Saturday, 25 January 2014

Stackless Python 2.7.6

I've finally completed the last part of the Stackless Python release process for our fork of Python 2.7.6.  We're a little behind on a few releases, but the source code for all of them has been stable and working in the Mercurial repository in a timely fashion.  Unfortunately, due to the political problems that have arisen with our Stackless 2.8 work, we've taken the repository private until we've addressed them.

Thanks to Robert Babiak for the MacOS installer, and to Anselm Kruis for the Windows installers, the documentation, and for setting up readthedocs, accessible through the documentation link below.

Links:




Monday, 4 February 2013

pysrd

One of my long-term projects is to programmatically generate game logic from the D&D 3.5 SRD.  The idea is that, ideally, if the required legwork were done, any time I wanted to prototype something around a game experience it should be possible to plug in the processed data and be 98% of the way to having stock D&D gameplay.  There's a lot of hand-waving there around the phrase "required legwork", but that's not really important for the purpose of this post.


I've extracted the low level scripts I use to both extend the database and introspect its contents, and published them on github under the project name pysrd.




The SRD is available from Wizards of the Coast as a set of RTF documents.  This, of course, is not ideal, but fortunately it is possible to get the data from andargor.com in different database formats, including SQLite.  Unfortunately, however, not all SRD data is included in these databases.  Fortunately, the OpenSRD project includes generated HTML versions of the RTF documents, which can be processed more easily.

Sunday, 11 November 2012

Peasauce, interactive disassembler.. someday

From Resource to Peasauce

I have a list of hobby projects that I've always wanted to find the time and energy to work on, and one of these is an interactive disassembler.  Years ago when people used a type of computer called "the Amiga", it had an extremely usable interactive disassembler called Resource.  You would point it at a file and it would turn the file into assembly language code and data, after which you would interactively disassemble some more, and finally make it write out a file containing the assembly language source code.

Disassembled and commented code in Resource.
Recently I finally managed to start using some spare time to work on "Peasauce", and have made a pretty good start.  What it can currently do isn't really useful, given that editing a disassembled file or exporting it isn't implemented yet. But being able to select a file, see the disassembled results and then scroll around them qualifies as interactive disassembling in the loosest sense.  So it's a good start.

Disassembled code in Peasauce.
In order to help make the code more general, in addition to Amiga executable files, it also loads those from the Atari ST and X68000 platforms. I've never used either, but they're both based on the Motorola 680x0 CPU family, so it's a minimal amount of extra work over and above parsing a few semi-documented or undocumented file structures.

UI

I initially started using Tkinter, because it comes with the standard library.  Unfortunately, out of the box it just wasn't flexible enough to work in the way I wanted.  As an alternative, I first looked at PySide, but wasn't able to locate documentation to easily work out whether it suited my needs.  The website looks like it is in a transitional phase, and several links to useful-sounding documentation were unfortunately dead.  This left wxPython, which I've used before.  As a fringe benefit of having a small play around with PySide, I first encountered crashes, and then with Anselm Kruis' recent patch to Stackless Python was able to verify that this decade-long ABI incompatibility was fixed and that the crashes went away.

With wxPython, I'm currently using the ListCtrl in virtual mode, where it simply knows how many list items there are and requests list items as needed for display.  Strangely, as the number of list items increases it gets slower, but that's likely my fault for some reason. As a short term solution while there's minimal interactivity it's good enough, but the time is almost right for something more dynamic.

The bisect module

I was iterating over ordered lists looking for the right entry, and this was profiling as the place where most time was spent.  A quick browse of the Python documentation introduced me to a module I'd never heard of before: the bisect module.  By keeping two lists instead of one, bisect lets me look up objects in the second list by searching the first.  That makes it sound rather complicated; a simplified piece of example code is a lot more helpful.
import bisect

# lookup_keys is sorted, and lookup_values[i] corresponds to lookup_keys[i].
lookup_index = bisect.bisect_left(lookup_keys, lookup_key)
if lookup_index == len(lookup_keys) or lookup_keys[lookup_index] != lookup_key:
    # No exact match, so take the value belonging to the preceding key.
    lookup_value = lookup_values[lookup_index - 1]
else:
    lookup_value = lookup_values[lookup_index]
While I haven't actually looked, writing this up has made me suspect that if I also bung judicious use of this into the list control's virtual item fetching, it might even solve the slowness problem with larger files.  Actually going away and looking at the code confirms it.  I'll need another indexing list though, and to keep that synchronised with the other two.

Opensauce

My laptop is a little unreliable and addressing that isn't on the table, so a good way to ensure I don't lose the source code should anything happen is to throw it up on github with everyone else's half-completed projects, where it can also sit mouldering and abandoned.  Or slowly be developed further, of course.

Link: Peasauce on github.

Tuesday, 26 June 2012

Stackless Python and Stack Overflow

I often search for programming related information, and gladly use the Stack Overflow results.  More often than not, the answer I am looking for is already there.  This unfortunately means that people blindly post their questions there, without it occurring to them that there might be a better place where actual experts are present.  This better place is the Stackless Python mailing list.

For projects where the number of knowledgeable users present on Stack Overflow is high enough, it obviously serves as an invaluable resource.  For a lesser used niche project like Stackless Python, not so much.

Wednesday, 20 June 2012

Mnemosyne 2 finally released

Just noticed there was a new release of Mnemosyne, the spaced repetition flashcard program I use, which also happens to be written in Python.  It's ten times larger than the previous version, but at 20 megabytes that's not a big deal.  A quick look suggests that this is due to the inclusion of mplayer and the updated version of Qt being used.


There was no information I could locate about the repercussions of installing 2.0 over an existing 1.x installation, so I went ahead and installed it under a different menu and installation directory name.  Either as part of the installation process, or when first run, it located my existing cards and imported them.


The revision interface looks pretty much the same.  The card browsing interface, which I didn't screenshot, is much improved however.


There's also new support for synchronisation with mobile devices built in, and a whole range of other more standard features used in the implementation of the new version.  I haven't had a chance to explore these yet, but am looking forward to it.

The older version had support for user-authored plug-ins written in Python, which I only used to convert Chinese pinyin numbers to tone marks.  Unfortunately, there's no matching support in the new version yet.  For a long time, it seemed like this was one of those projects where the author had given up on it and was effectively distracted by a never-ending "rewrite".  Glad to see that's not the case.

Saturday, 3 March 2012

pyuv experimentation

The appeal of libuv, and therefore of its Python binding pyuv, is that it provides asynchronous access to a wide range of functionality.  I don't think there is any documentation for libuv itself; if you want to understand the nuances of how to use it, you probably need to jump back and forwards between C source files, interpreting the use which has been made of various macros.  The pyuv documentation, however, provides a decent overview of the functionality available and how it can be used from Python.

Unfortunately, there is still a degree of libuv source code reading involved in working out what happens in various situations. In this case, I was curious what happened when the server disconnected a client that was trying to read from its accepted connection.

Server Code

The code starts by locating or creating a loop, which is responsible for managing the events and callback dispatching related to the IO that is performed with respect to that loop. Next, a timer is created which in theory calls a function every 20th of a second. In reality, it is a hack to work around the fact that you can't directly specify a maximum timeout to the run_once method. Instead, the timeout enables run_once to exit rather than blocking indefinitely.

Next, a TCP connection is created. It is set to be allowed to reuse the same port without waiting, bound to a listening address and then the loop is run until the callback indicating the listen operation occurs.
import pyuv
loop = pyuv.Loop.default_loop()
timer = pyuv.Timer(loop)
timer.start(lambda *args: None, 0.0, 0.05)
listen_socket = pyuv.TCP(loop)
listen_socket.nodelay(True)
listen_socket.bind(("0.0.0.0", 3000))
had_listen_callback = False
def listen_callback(*args):
    global had_listen_callback
    print "listen_callback", args
    had_listen_callback = True

listen_socket.listen(listen_callback, 5)
while not had_listen_callback:
    timer.again()
    loop.run_once()
The listen callback is called every time there is an incoming connection, so that it can be accepted and handled however desired. The code now proceeds to accept the connection, write token data to it, and then wait until the data has been sent.
incoming_socket = pyuv.TCP(loop)
listen_socket.accept(incoming_socket)
had_write_callback = False
def write_callback(*args):
    global had_write_callback
    print "write_callback", args
    had_write_callback = True

incoming_socket.write("DATA", write_callback)

while not had_write_callback:
    timer.again()
    loop.run_once()
And finally, once the data has been sent, the socket is closed.
had_close_callback = False
def close_callback(*args):
    global had_close_callback
    had_close_callback = True

incoming_socket.close(close_callback)
while not had_close_callback:
    timer.again()
    loop.run_once()
Client Code

The client code has the same boilerplate to begin with, but then tries to connect to the listening address, and waits for the connection to be established.
import pyuv
loop = pyuv.Loop.default_loop()
timer = pyuv.Timer(loop)
timer.start(lambda *args: None, 0.0, 0.05)
client_socket = pyuv.TCP(loop)
had_connect_callback = False
def connect_callback(*args):
    global had_connect_callback
    print "connect_callback", args
    had_connect_callback = True

client_socket.connect(("127.0.0.1", 3000), connect_callback)
while not had_connect_callback:
    timer.again()
    loop.run_once()
Next, the client starts reading, and prints out the arguments every time it receives data in its callback.
had_read_callback = False
def read_callback(*args):
    global had_read_callback
    print "read_callback", args
    had_read_callback = True

client_socket.start_read(read_callback)
while True:
    timer.again()
    loop.run_once()
Results

The answer to the question of what happens when the client is disconnected mid-read is that the read callback receives a data value of None and an error value of UV_EOF. Pretty simple actually.
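So handling disconnection in the read callback reduces to a check like the following sketch; the callback signature shown (handle, data, error) and the pyuv.errno constant match the pyuv version I was using, so treat them as assumptions:

import pyuv

def read_callback(handle, data, error):
    # data is None when the connection is closed; error should then be
    # pyuv.errno.UV_EOF rather than a real failure.
    if data is None:
        if error == pyuv.errno.UV_EOF:
            print "server disconnected us"
        handle.close()
        return
    print "read_callback received:", data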

My main reason for playing with pyuv is the hope of replacing the monkey-patching framework that is stacklesslib with one that can cover a wider variety of thread-blocking functionality. The existing Stackless-compatible socket module is based on asyncore, and by extension it wraps the existing Python networking functionality in an asynchronous manner; because of this, it can handle the more complicated functionality like makefile and ioctl.

Unfortunately, libuv, and by extension pyuv, abstracts away or does not provide access to much of the standard API. This means a lot more work if those aspects are to be emulated. Sure, it might be easier to simply not implement them, but that loses a lot of the benefit of writing a monkey-patched module. If the replacement socket module provides makefile, then it is compatible with a wider range of other standard library modules.

I've written the most straightforward part of a socket module for Stackless based on pyuv, and it can be seen within a Google hosted project here.

Friday, 23 December 2011

Python garbage collection crash

I ran the unit tests for the Windows debug build of Stackless Python 2.7.2; the tests pass, but on exit the interpreter crashes.  The problem is that the garbage collector finds a one element list, where that element has either already been garbage collected or was never initialised.  Logically, the following occurs.


How do you track down the cause of this crash?

Friday, 16 December 2011

Gyp and libuv

One open source project that I've heard a bit about recently is libuv.  A cross-platform asynchronous IO library, it is something that I can plug into Kristjan Valur's stacklesslib module.  And all the work writing the Python bindings has already been done by Saúl Ibarra Corretgé in his pyuv project.

Anyway, I spent some time today getting it compiling for Windows, and thought I would post a note about gyp so that I remember to avoid the same problems in future.  Gyp is a pretty handy build system: just define the gist of what needs to be compiled in a mildly arcane mark-up, and it will churn out Makefiles, Visual Studio projects and solutions, and a range of other things.  However, as with most programming related things, it is the little details that trip you up and cause you to waste your time.

Missing include files

If you are compiling with Visual Studio 2008 and the code includes inttypes.h or stdint.h, errors will occur, as these files are not present.

...
libuv\include\uv.h(54): fatal error C1083: Cannot open include file: 'stdint.h': No such file or directory
...
Apparently the solution to this, after a lot of googling on what constitutes a wi-fi connection over here in New Zealand, is to download something like the custom msinttypes versions of these files, and then bung them somewhere Visual Studio will find them... wherever that is.

Overriding gyp settings

If you are compiling a static library with Visual Studio using gyp, it will start with a range of default settings, and you will not be able to override these settings no matter where you put your own custom versions in your gyp configuration file.  The specific problem I had was that Python extensions are compiled with a RuntimeLibrary setting of MultiThreadedDLL, while libuv was compiled with a different setting of MultiThreaded.  These are of course incompatible, and a variety of linker errors are caused by it.
...
LIBCMT.lib(crt0dat.obj) : error LNK2005: __amsg_exit already defined in MSVCRT.lib(MSVCR90.dll)
...
LINK : warning LNK4098: defaultlib 'MSVCRT' conflicts with use of other libs; use /NODEFAULTLIB:library
LINK : warning LNK4098: defaultlib 'LIBCMT' conflicts with use of other libs; use /NODEFAULTLIB:library
LIBCMT.lib(crt0.obj) : error LNK2001: unresolved external symbol _main
build\lib.win32-2.7\pyuv.pyd : fatal error LNK1120: 1 unresolved externals
In order to force this gyp setting to change from the default, it needs to be defined within a conditional expression.  The default setting is also within a conditional expression, and... well... work it out.  Maybe it's in the documentation somewhere, but not in a clearly stated way that I could find.
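For the record, the shape of what eventually worked was roughly the following, with the setting buried inside a condition in the target's section of the .gyp file. This is a sketch from memory, so treat the exact nesting as approximate:

'conditions': [
  ['OS=="win"', {
    'msvs_settings': {
      'VCCLCompilerTool': {
        # 2 = /MD (MultiThreadedDLL), matching how Python extensions are built.
        'RuntimeLibrary': 2,
      },
    },
  }],
],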

Saturday, 5 November 2011

Scheduler deadlock

One common approach to creating tasklets is to wrap the callable that will run within the newly created tasklet in a function that captures exceptions.

Something like the following:

import traceback

def create_tasklet(f, *args, **kwargs):
    try:
        return f(*args, **kwargs)
    except BaseException as e:
        # Let TaskletExit, SystemExit and KeyboardInterrupt raise up.
        # (TaskletExit subclasses SystemExit, so one check covers both.)
        if isinstance(e, (SystemExit, KeyboardInterrupt)):
            raise
        traceback.print_exc()
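For completeness, a sketch of how the wrapper gets used: the tasklet is bound to the wrapper, with the real callable passed in as its first argument (the worker function here is just an illustration).

import stackless

def worker(name):
    print "working on", name

# The wrapper runs inside the new tasklet and guards the real callable.
stackless.tasklet(create_tasklet)(worker, "some-task")
stackless.run()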
Another exception that is worth handling specially is RuntimeError. In the case where the last runnable tasklet tries to block, the scheduler will instead raise a RuntimeError on it in order to prevent deadlock. Catching this exception and showing what the main tasklet is doing, and why it is blocked, is helpful; that is assuming you are running the scheduler in the main tasklet.

Something like the following:
import traceback

import stackless

def create_tasklet(f, *args, **kwargs):
    try:
        return f(*args, **kwargs)
    except BaseException as e:
        if isinstance(e, RuntimeError) and e.args and str(e.args[0]).startswith("Deadlock:"):
            print "Deadlocked tasklet, the main tasklet is doing the following:"
            traceback.print_stack(stackless.main.frame)
        # Let TaskletExit, SystemExit and KeyboardInterrupt raise up.
        elif isinstance(e, (SystemExit, KeyboardInterrupt)):
            raise
        traceback.print_exc()
For future reference, the situation where I needed this was where I was polling for completed asynchronous file IO events. In theory this was programmed in a way where it would schedule any tasklets blocked waiting for their IO to complete, so they would run when the scheduler was next pumped. However, due to a race condition, the main tasklet sent the results of the IO operation to the channel before the tasklet performing the operation blocked to wait for those results. Then, when that other tasklet did block, the deadlock situation occurred.

Sunday, 11 September 2011

Minimal Sqlite library size

The Sqlite website states that the library size can be less than 300KiB, and that by omitting optional features it is possible to get the size down below 180KiB.

SQLite is a compact library. With all features enabled, the library size can be less than 300KiB, depending on compiler optimization settings. (Some compiler optimizations such as aggressive function inlining and loop unrolling can cause the object code to be much larger.) If optional features are omitted, the size of the SQLite library can be reduced below 180KiB.
There are also disclaimers that omitting optional features is unsupported and not guaranteed to work.
Important Note: The SQLITE_OMIT_* compile-time options are unsupported.
And there are three out of the many which do not work. When they are specified, compilation errors occur due to other external code depending on parts of them:
  • SQLITE_OMIT_COMPLETE
  • SQLITE_OMIT_DISKIO
  • SQLITE_OMIT_GET_TABLE
Which leaves the following options that do work:
nmake "OPTS=-DSQLITE_OMIT_ALTERTABLE -DSQLITE_OMIT_ANALYZE -DSQLITE_OMIT_ATTACH -DSQLITE_OMIT_AUTHORIZATION -DSQLITE_OMIT_AUTOINCREMENT -DSQLITE_OMIT_AUTOINIT -DSQLITE_OMIT_AUTOMATIC_INDEX -DSQLITE_OMIT_AUTORESET -DSQLITE_OMIT_AUTOVACUUM -DSQLITE_OMIT_CAST -DSQLITE_OMIT_BETWEEN_OPTIMIZATION -DSQLITE_OMIT_BLOB_LITERAL -DSQLITE_OMIT_BUILTIN_TEST -DSQLITE_OMIT_BTREE_COUNT -DSQLITE_OMIT_CHECK -DSQLITE_OMIT_COMPILEOPTION_DIAGS -DSQLITE_OMIT_COMPOUND_SELECT -DSQLITE_OMIT_DATETIME_FUNCS -DSQLITE_OMIT_DECLTYPE -DSQLITE_OMIT_EXPLAIN -DSQLITE_OMIT_FLAG_PRAGMAS -DSQLITE_OMIT_FLOATING_POINT -DSQLITE_OMIT_FOREIGN_KEY -DSQLITE_OMIT_INCRBLOB -DSQLITE_OMIT_INTEGRITY_CHECK -DSQLITE_OMIT_LIKE_OPTIMIZATION -DSQLITE_OMIT_LOAD_EXTENSION -DSQLITE_OMIT_LOCALTIME -DSQLITE_OMIT_LOOKASIDE -DSQLITE_OMIT_MEMORYDB -DSQLITE_OMIT_OR_OPTIMIZATION -DSQLITE_OMIT_PAGER_PRAGMAS -DSQLITE_OMIT_PRAGMA -DSQLITE_OMIT_PROGRESS_CALLBACK -DSQLITE_OMIT_QUICKBALANCE -DSQLITE_OMIT_REINDEX -DSQLITE_OMIT_SCHEMA_PRAGMAS -DSQLITE_OMIT_SCHEMA_VERSION_PRAGMAS -DSQLITE_OMIT_SHARED_CACHE -DSQLITE_OMIT_SUBQUERY -DSQLITE_OMIT_TCL_VARIABLE -DSQLITE_OMIT_TEMPDB -DSQLITE_OMIT_TRACE -DSQLITE_OMIT_TRIGGER -DSQLITE_OMIT_TRUNCATE_OPTIMIZATION -DSQLITE_OMIT_VACUUM -DSQLITE_OMIT_UTF16 -DSQLITE_OMIT_VIEW -DSQLITE_OMIT_VIRTUALTABLE -DSQLITE_OMIT_WAL -DSQLITE_OMIT_XFER_OPT" -f makefile.msc all dll
Additionally there are some other features that can be disabled by modifying the makefile before compilation.
# Comment out the optional feature flags in the makefile, and switch
# the optimisation flag from -O2 to -Os (optimise for size).
MAKEFILE_NAME = "Makefile.msc"
s = open(MAKEFILE_NAME, "r").read()
s = s.replace("OPT_FEATURE_FLAGS = $(OPT_FEATURE_FLAGS) -DSQLITE_ENABLE_FTS3=1",
    "# OPT_FEATURE_FLAGS = $(OPT_FEATURE_FLAGS) -DSQLITE_ENABLE_FTS3=1")
s = s.replace("OPT_FEATURE_FLAGS = $(OPT_FEATURE_FLAGS) -DSQLITE_ENABLE_RTREE=1",
    "# OPT_FEATURE_FLAGS = $(OPT_FEATURE_FLAGS) -DSQLITE_ENABLE_RTREE=1")
s = s.replace("OPT_FEATURE_FLAGS = $(OPT_FEATURE_FLAGS) -DSQLITE_ENABLE_COLUMN_METADATA=1",
    "# OPT_FEATURE_FLAGS = $(OPT_FEATURE_FLAGS) -DSQLITE_ENABLE_COLUMN_METADATA=1")
s = s.replace("-O2", "-Os")
open(MAKEFILE_NAME, "w").write(s)
And gives the minimal file sizes of:
381,162 libsqlite3.lib
373,588 sqlite3.lo
382,464 sqlite3.exe
327,680 sqlite3.dll
This is not quite as simple (or perhaps achievable) as advertised. My current assumption is that the Sqlite website is four or five years out of date in describing what is possible.

Friday, 3 June 2011

Wanted: Memory conservative key-value store

I would really like to find a key-value store that is memory conservative. What we currently have is like a souped-up version of the dumbdbm standard library module, but with a cache budget: as it loads in new values, it flushes older ones to make room. However, as the amount of data managed increases, so does the amount of key metadata indicating whereabouts the values lie on disk. So now the next step is to either add some form of caching for key metadata, or find a suitable free open source solution.

Does anyone know of a suitable one that is not constrained by the GPL? It doesn't have to be Python, but Python bindings are a bonus.

Considering Sqlite

When thinking of low memory database solutions, Sqlite is one that comes to mind, and even better, it comes as part of the Python distribution these days. And even betterer, there's a custom port for my uncommon platform of choice. And even... bettererer, it has an IO abstraction layer that allows it to work with custom IO solutions with minimal additional work. Additionally, reading the spiel makes it sound appealing memory-wise:

SQLite is a compact library. With all features enabled, the library size can be less than 300KiB, depending on compiler optimization settings. (Some compiler optimizations such as aggressive function inlining and loop unrolling can cause the object code to be much larger.) If optional features are omitted, the size of the SQLite library can be reduced below 180KiB. SQLite can also be made to run in minimal stack space (4KiB) and very little heap (100KiB), making SQLite a popular database engine choice on memory constrained gadgets such as cellphones, PDAs, and MP3 players. There is a tradeoff between memory usage and speed. SQLite generally runs faster the more memory you give it. Nevertheless, performance is usually quite good even in low-memory environments.
But you know what? I am as yet unable to get it down to 180KiB, no matter how many features I compile out of it using the handy SQLITE_OMIT_* options. And not all options can be omitted if I want to use pysqlite, as it does not suit the maintainer to support them.

Here's a clipped table of the code sizes for various cross-compilations:

     Overall   libsqlite3.a   libpysqlite3.a   Description
1          0              0                0   Without sqlite
2     487536         425060            49860   Sqlite with optimise for size
3     365212         308730            46740   Sqlite with optimise for size + code omissions
4     536608         472290            50560   Sqlite with full optimise
5     402492         344440            47430   Sqlite with full optimise + code omissions

In the Windows Python 2.7 installation, _sqlite3.pyd is 48KB and sqlite3.dll is 417KB. So the sizes above are still comparatively larger than that, even expecting both of those to have been built with no omissions and full optimisation. But more or less close enough.

Considering home grown

Any third party solution would need to be adapted to deal with the custom IO needs, unless it was written in pure Python. At this point, the simplest solution is just to extend what I already have.

Edit: Just a note, the key desired feature is memory management. It should be possible to put hard constraints on the amount of memory it uses, both for the cached records read from disk, and for the lookup information that maps keys to location of records on disk. Most key value stores I have looked at either claim to keep all keys in memory as a feature, or just keep them all in memory because it is the simple thing to do.
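To make that concrete, the record cache side is essentially a size-bounded LRU structure. A toy sketch of the idea (not our actual code; flushing dirty values back to disk is elided):

import collections

class BoundedValueCache(object):
    # Toy sketch: evict least recently used values once the byte budget
    # is exceeded.
    def __init__(self, budget_bytes):
        self.budget = budget_bytes
        self.used = 0
        self.entries = collections.OrderedDict()

    def put(self, key, value):
        if key in self.entries:
            self.used -= len(self.entries.pop(key))
        self.entries[key] = value
        self.used += len(value)
        while self.used > self.budget and self.entries:
            # Oldest entry goes first; a real store would write it out here.
            old_key, old_value = self.entries.popitem(last=False)
            self.used -= len(old_value)

    def get(self, key):
        value = self.entries.pop(key)  # KeyError means: load from disk instead
        self.entries[key] = value      # re-insert as most recently used
        return value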

Friday, 29 April 2011

Python lazyimporting

At work, we have a modified version of this lazyimport module. Originally, I used it to work out which modules were never actually made use of, but the suggestion was made that it should be enabled by default. And it turns out that enabling it does result in worthwhile memory savings, something which is very useful when you have a budget. But it's not all plain sailing..
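For anyone unfamiliar with the technique, the gist of lazy importing is a stub that sits in sys.modules and only performs the real import on first attribute access. A stripped-down illustration, not the actual module (the real one handles dotted names and much else):

import sys

class LazyModuleStub(object):
    def __init__(self, name):
        self.__dict__["_name"] = name
        self.__dict__["_module"] = None

    def __getattr__(self, attr):
        if self._module is None:
            # First real use: drop the stub so the import machinery does
            # a genuine import, then remember the real module.
            if sys.modules.get(self._name) is self:
                del sys.modules[self._name]
            self.__dict__["_module"] = __import__(self._name)
        return getattr(self._module, attr)

sys.modules["csv"] = LazyModuleStub("csv")
import csv                  # binds the stub; nothing imported yet
print csv.list_dialects()   # first attribute access triggers the real import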

We've encountered problems along the way, one of which I will mention while it is fresh, for anyone who might be using, or planning to use, the same code.

static PyObject *
get_warnings_attr(const char *attr)
{
    static PyObject *warnings_str = NULL;
    PyObject *all_modules;
    PyObject *warnings_module;
    int result;

    if (warnings_str == NULL) {
        warnings_str = PyString_InternFromString("warnings");
        if (warnings_str == NULL)
            return NULL;
    }

    all_modules = PyImport_GetModuleDict();
    result = PyDict_Contains(all_modules, warnings_str);
    if (result == -1 || result == 0)
        return NULL;

    warnings_module = PyDict_GetItem(all_modules, warnings_str);
    if (!PyObject_HasAttrString(warnings_module, attr))
            return NULL;
    return PyObject_GetAttrString(warnings_module, attr);
}

The problem is that PyDict_GetItem returns a borrowed reference; then the subsequent call to PyObject_HasAttrString causes the replacement of the lazy loading stub with the real imported module (and the garbage collection of the stub); and finally, PyObject_GetAttrString chokes on its clobbered reference. Strangely, this only happens consistently for certain people, in binaries they themselves built which are linked against the Python static library. The solution is that the warnings module needs to be excluded from lazy importing, or else this code will crash.

If there were a central repository for this module and a maintainer, I'd get our code released so that others could benefit from our changes (or we could get feedback on them).

Anyone else using this incredibly cool piece of code?

Wednesday, 16 March 2011

Weird "Linked-In" recruiting emails

I often get emails saying something like "Hey, I'm SOMEONE from REPUTABLE COMPANY. I saw your profile on Linked-In and we'd like to know if you are interested in POSITION with us." Yesterday's email was for "Senior Python Developer".

I reply to these saying that I already have a job, but that I deleted my Linked-In profile a year or two ago, and that I'd appreciate it if they could tell me where they got my contact information. The resultant emails include backpedaling and claims that they must have seen a forum post by me, my (non-existent) github profile, or something. Consistently, the initial email always states Linked-In, but follow-up inquiry from me results in hand-waving.

The emails always come from some HR person at a reputable company. I'd love to know what is going on. Is Linked-In selling my old information (even though I believe I deleted my account) as part of a database? Is there some lead harvesting service that collects this information and sells it?