05. Dec
2003

Template Untwining

I am thinking about reverse-template engines, that is engines for taking templates and their instances and extracting the data that was filled into the templates.

For example if I have the Zope Page Template:

<h1 tal:content='document/title'> Document Title Goes Here</h1>

And this instance of it:

<h1>Rampaging Rucksacks Ravage Rome</h1>

I would like an automated way of extracting document/title, possibly getting this Python data structure:

{ 'document/title': 'Rampaging Rucksacks Ravage Rome', }

Well, I should probably head off to my trusted editor, write a few unittests and see if I can’t hack together something to satisfy them.

02. Dec
2003

Line-wrapping when soft hyphenation is wrong

Update: (31. december 2003) It seems my reading of the HTML 4 specification is not universal. And, to make matters worse, neither of the HTML, Unicode or ISO Latin 1 standards agree on what the shy character should mean. See Soft Hyphenation (SHY) - a hard problem? for the gory details.

There is this nifty little HTML character entity, , which lets one give soft hyphenation hints to browsers. But, when writing documentation for code, one sometimes needs write really, unbelivably, imagination-strainingly long identifiers. And, having them hyphenated, when split over multiple lines might not be what one wants (at least, it is not what I want).

Exempli Gratia

Here is an example of what might happen, if the browser cannot break your long identifier into little bits and pieces:

sheared.web.querystring.UnvalidatedInput.as_text

So, when you has to type in a pathologically long identifier from a program, and you want it to gracefully break in two on the middle, what can you do?

One solution

I have only tested this with Mozilla Firebird 0.7.1, it might not work on any other browsers.

Those of us living in CSS-land can actually do something. Rendering a white-space with zero font-size produces no visible break when inside a word. But, if you place such a zero-width white-space inside a word, the word is happily chopped into pieces where you inserted it.

So, this goes into your stylesheet:

.shy { font-size: 0; }

This is your HTML:

sheared.<span class='shy' /> </span>web.<span class='shy' /> </span>querystring.<span class='shy' /> </span>UnvalidatedInput.<span class='shy' /> </span>as_text

And, suddenly it renders beautifully:

sheared. web. querystring. UnvalidatedInput. as_text

02. Dec
2003

Mozilla versus

Mozilla and it’s offspring are all very cool browsers, but there is one thing they really ought to implement. Soft hyphenation.

Soft hyphenation lets you give the browser hints about where it can break long, continous words (i.e. anything not containing normal white-space). So, for example, I can write sheared.web.querystring.UnvalidatedInput.as_text, and give the browser hints as to where it would be okay to break it up, so the pieces can fit on one line.

This works by putting  in your text, in those places where it would okay, if the browser breaks a word into pieces when it becomes too long to fit it’s containing box. And, when the browser feels it has break your carefully crafted long words into bits and pieces, it tacks a hyphen on the end of all but the last of the pieces. Just like you would expect from hyphenation.

So, for example the HTML behind the long method name above is

sheared.&shy;web.&shy;querystring.&shy;UnvalidatedInput.&shy;as_text

And, in a browser handling  the code above and the long method name in the paragraph above would both be split over several lines and properly hyphenated. Alas, not in the world.

I mean, really, how difficult can it be? So, I don’t know anything about writing browsers or layout engines, but I do know that there has been a feature request open since sometime in 1999, to implement this. I just does not seem to happen.

02. Dec
2003

Programmer’s Puzzles

Doing Code Katas reminded me of some programmer’s puzzles (or exercises if you will) I once stumbled upon. I can’t remember where I first saw them, and Google isn’t much help. So, I decided to dig them out from ${HOME} and post them here for other peoples enjoyment.

Given a pointer to the first node in a singly-linked list, determine in linear time whether the list terminates or has a cycle, without storing to memory.
How can you construct a linked list that can be traversed forward and backward using only enough space for one pointer per node, but still allowing the nodes to have arbitrary addresses?
Prove or disprove: Any Boolean expression with multiple occurences of at least one variable can be simplified.
Can you always divide an integer by a power of two using the sign-extending “arithmetic right shift” instruction? Construct a sequence that works for all inputs on your machine. Would this have been easier to write back in the 60’s?
If your processor doesn’t have a bit population count instruction that counts the 1 bits in a word, construct a fast sequence that does so without looping or loading from memory.
Write code that counts the number of rightmost zero bits in a word, again without looping or loading or using a special bit scan instruction. Is it easier to count the number of leftmost zero bits?

01. Dec
2003

Code Fu

My code fu is strong these days, three Code Katas in the last few days.

- Counting Code Lines

Did this one (Kata thirteen) again, this time in a real programming language. Python.

Some 50 odd lines of Python code to strip C and C++ style comments from a text-file and count the remaining lines (excluding empty lines). It’s a simple read-lex-parse-write cycle. Nothing fancy, but it works.

- Word Chains

This one (Kata Nineteen) was different, counting lines of source code is pretty straight forward; either you (ab)use cpp(1) or you write a simple lexer-parser pair. Figuring out how to build the word chains required a bit more thought work.

First, I decided I would do a recursive depth-first search from a to b, examining the candidates at each step in the order most likely to lead quickly to b. This is naturally a hodgepot of heuristics, but measuring the distance between two words by the number of letters they differ, and trying the different branches in order of increasing distance from the goal, seemed like a fine plan.

Next I had to think through if it was worth building a datastructure for the words adjacancy lists, or if one should just grep through the wordlist at every step. In the end I decided to go with the simplest possible solution, grepping through the adjacancy list at each step from a to b, looking for possible words to try. Naturally through all of this, one must keep a list of all the words already visited, so one does not goes there looking again.

This adds up to a grand total of 80 lines of Python code, but most of it is pretty simple stuff (e.g. distance between words, grepping through a list, some wrapping it up as a script), the core of it all is this:

def wordchain(here, there, visited=None):
    if distance(here, there) &lt; 2:
        return [here, there]

    if visited is None:
        visited = [here]

    adj = adjacant(here)
    adj.sort(lambda a, b: cmp(distance(a, there),
                              distance(b, there)))
    for word in adj:
        if word in visited:
            continue
        visited.append(word)

        try:
            wc = wordchain(word, there, visited)
            wc.insert(0, here)
            break
        except NoWordChainException:
            pass
    else:
        raise NoWordChainException, (here, there)

    return wc

01. Dec
2003

Kata Thirteen

Paraphrasing Kata Thirteen:

Your mission, should you choose to accept it, is to count lines of actual code in Java source-code.

Let us see, the real task is to remove comments from the source code, anything left after that must be actual code-lines. So, this should suffice (assuming GNU cpp(1)):

cpp -fpreprocessed -P - - \ | grep -v '^[[:space:]]*$' \ | wc -l

No need to complicate the task with a real programming language, when we have all the tools available.

30. Nov
2003

Re: A Diversion

In A Diversion PragDave poses the following kata:

Think of binary numbers: sequences of 0’s and 1’s. How many n-digit binary numbers are there that don’t have two adjacent 1 bits? For example, for three-digit numbers, five of the possible eight combinations meet the criteria: 000, 001, 010, 011, 100, 101, 110, 111. What is the number for sequences of length 4, 5, 10, n?

Having given the answer for 3-digit binary numbers, he also asks why this relationship exists. Well, I think I can answer that. It is simple combinatorics. Since typesetting math in HTML is such a pain, I will just post the LaTeX note I wrote to explain the relationship to myself:

\documentclass{article} \title{Re: A Diversion} \author{Sune Kirkeby} \date{30. November 2003} \begin{document} \newcommand{\bicoef}[2]{ \left(\begin{array}{c} #1\\#2 \end{array}\right)} \maketitle Let $a$ be the number of $n$-digit binary numbers with exactly $k$ $1$ bits, where no two $1$-bits are adjacant. Then $$a = \bicoef{n-k+1}{k} $$ Let $b$ be the number of $n$-digit binary numbers that do not have two adjacant $1$ bits. Then $$b = \sum_{k=0}^n \bicoef{n-k+1}{k}$$ For $n=3$: $$\bicoef{3-0+1}{0} + \bicoef{3-1+1}{1} + \bicoef{3-2+1}{2} + \bicoef{3-3+1}{3} = 1 + 3 + 1 + 0 = 5$$ \end{document}