Log in

No account? Create an account
misc tech's Journal
[Most Recent Entries] [Calendar View] [Friends]

Below are the 20 most recent journal entries recorded in misc tech's LiveJournal:

Tuesday, September 15th, 2009
10:36 am
Quicksort is not my favourite sorting algorithm.
We've been trying to hire some more programmers at work, and being able to sort the wheat from the chaff is a useful thing. Normally we ask a handful of basic java questions (abstract vs interface) and writing a compareTo method.

Asking ridiculously easy questions is a good thing, as it was surprising to me how often people get them wrong. Given a chance, I'd love to ask harder questions, like:

What's wrong with Quicksort?

Quicksort is often the first sorting algorithm that comes to mind. It's easy to define and generally accepted to be 'fast enough'. Unfortunately, most people aren't aware of the drawbacks of using quicksort:

Can you guess what my answer is?Collapse )

A large problem with quicksort is that it's easy to implement badly, and hard to implement well.

Implementing quicksort well is described in 'Engineering a Sort Function' by McIllroy and Bentley. They describe how to pick a pivot element and how to partition and swap efficiently. And the trick of defaulting to insertion sort for small lists. This describes the current implementation of quicksort in java and the c standard libraries.

Generating malicious input is described by McIllroy in 'A Killer Adversary for Quicksort'. The trick is to write a comparison function that constructs the worst input on the fly.

So what would I like to see people using instead? Merge sort and Radix sort would be my preferred candidates.

Merge sort is stable and has a worst case runtime of O(nlogn). It is also very easy to write, and if you're sneaky you can make it run closer to O(n) on mostly sorted data. The trick is to partition the list into ascending and descending sequences, and then merge these.

This trick is used in a few places. Notably in python's default sorting method 'timsort'. Tim peters has written an excellent explanation of the code, and a few clever optimizations for speeding things up in an imperative version.

Adaptive merge sort can be written in functional languages too. 'Runsort' outlines the same trick in prolog, and a handful of tricks to make it faster. (I managed to rattle off a version of this in haskell)

Radix sort on the other hand shines when sorting lists of strings (i.e. when comparisons become relatively expensive), usually outperforming quicksort by a factor of two. A good introduction to this is in Engineering Radix Sort' by McIllroy (again!) et al.

They describe a variant of Radix Sort - 'American Flag Sort' that tries to minimize space overhead by permuting the list in-place.

Admittedly - quicksort is not 'the great satan of sorting functions', but knowing when to use it involves knowing the weaknesses. Even so, Naive quicksort is still the easiest sort function to write on a whiteboard.

(And the recent two-pivot quicksort variant is still a damn cool bit of work.)
Sunday, May 31st, 2009
3:52 am
A couple of things in tech:

Wolfram Alpha has launched, in an attempt to distribute a know it all to the masses, but unfortunately the one we have recieved is autistic.

It is meant to answer questions - like a calculator is a tool for answering mathematical problems, this is a calculator for facts and data.
The attempt to encompass this is a lot of natural language processing, and the solution entails building something bigger than Wolfram's Ego.

The examples work but the web is now littered with counterexamples.

Microsoft have amalgamated their search offerings together under a new brand and direction. Presenting 'Bing' - A 'decision engine' apparently.
It isn't here yet but there is an annoying video which shows the perfect consumer tool. Want to buy stuff find stuff to buy.

It isn't anything new beyond better integration of existing components, geared towards purchasing things.

Google on the other hand have come up with something novel - a wave. A numver of simple extentions to existing instant messaging systems to add significant features - real time editing and collaboration, and structured content containing widgets and media.

On top of this they've developed a highly integrated web app that looks like gmail, but interacts with the desktop (drag and drop), with a lot of well thought out features.

They are releasing the client and the server openly - unlike the highly centralised models of interaction like twitter and facebook - this is more like a new version of jabber/google chat.

In fact, the protocol itself is just an extention to XMPP.

Unlike the last two efforts above, this is achieving something new on top of existing protocols and standards.

And what it is trying to achieve is a federated real time collaborative conversation and content tool.

Which is pretty awesome.

I can see it easily replacing email, forums and instant messaging - even livejounal for me.

As well as replacing bug tracking for work.

It's a great step forward that builds on the present rather than abandoning it. They demonstrated integration with existing apps and keeping most of the features.

I think I should mention again the whole open nature and federated thing - this is not an attempt to build a great tool, but to give everyone a great tool to use.
Tuesday, April 21st, 2009
4:09 am
I wrote some horrible C to do co-routines using the labels as values in gnu c or c99(?)

Thursday, March 26th, 2009
5:03 pm
hungarian notation is not a type system
Hungarian notation comes up every now and again, like some deep sea monster, and frequently the cause is Joel Spolsky.

His article on Systems vs Apps Hungarian advocates "application style" and then goes on to argue it's merits for preventing cross site scripting (XSS) attacks.

I think that his example is a foolish attempt at xss prevention that amounts to security theater.

The debate between Systems and Apps Hungarian regardless of prefix is essentially the debate between naming variables after their type or naming variables after their intent.

I agree that encoding intent in names is a good thing, and leads to better code, but I don't think Joel is right that you can prevent cross site scripting through naming conventions.

The security of a system is not measured by its strengths but by its weaknesses. Joel argues that using hungarian notation can help prevent XSS attacks, but it he admits it will not always work.

He argues that it is better than nothing, but fails to compare it to complete solutions. No matter how secure your door is, it won't matter
if they break through the walls. His method of prevention may prevent some errors but it cannot eradicate them altogether.

Keeping raw strings and html seperate requires seperate types for each, and it's trivial in most languages to define new classes or objects. Another alternative is templates wherein things are escaped by default.

Many classes of software vunerabilities are type errors - format string attacks, cross site scripting, and sql injection - and there is no excuse for modern software to be vulnerable to these. The methods of prevention are well established, and documented.

There will always be a burden on the programmer to ensure safety. Using hungarian notation for this does not alleivate the programmer at all, and can require significant maintenance.

Type safety requires a type system, not a naming convention.
Wednesday, March 4th, 2009
9:08 pm
Monday, March 2nd, 2009
12:26 pm
more toy language wafflings: adding parsing evaluation grammar operators to unification

I've been playing with a dialect of prolog with the following differences:

no cut - replaced with explicit backtracking and set of new control flow statements and operators; functions instead of predicates (including closures); a different syntax with less punctuation, but also operator precedence based.

but like prolog it has logical variables, unification, and goal directed evaluation.

I picked prolog as a base because it's a very precise, expressive language. I enjoy logic programming, and wish to make it more applicable to the unix environment

I've been toying with the idea for a while, and wrote a couple of parsers a while back, but wasn't happy with the syntax or likely semantics. I had a bit of inspiration last week and started knocking out a parser and interpreter. Here is some example code in this toy language:
some more examples, and exectuable parser expression grammarsCollapse )

Code and a lackluster readme are here:
Sunday, February 8th, 2009
4:38 am
Why is markup generation so awkward?


Here, Jeff Atwood writes about the problems of producing HTML from code and argues that treating markup and code equally is a good thing.

I've had a few thoughts on the matter:

Producing markup or structural output from code is one of these awkward bits of coding that often leads to complex software. Allowing you to embed the markup into your code is an example at an attempt to make things simpler. ...Collapse )To wrap up:

I think that MVC is an attempt to make complex things possible, but often
doing so at the expense of simplicity.

I think embedding markup in code is akin to treating it like a 'native' data
structure, which makes it simpler to deal with.

I think that embedding query languages is a similar step forward which
would also make coding easier.

Finally, I don't think this alone is a silver bullet - bigger programs will
always need more structure to make them managable, but we don't
need to write big programs all of the time.
Tuesday, November 25th, 2008
12:15 pm
Interesting project on GetACoder
Most of you are aware of various websites that link up people wanting software projects written with freelance programmers. On one of these, GetACoder, user AlanT has recently posted an interesting project for a debugger:

Bug finder

The purpose of this project is to create a debugger program. This program will take as input the source code another program, and will analyze that other program and determine if it will run to completion, or have an error, or go into an infinite loop.

To state that another way, given a function f and input x, determine if f(x) will halt.

If you scroll down the page, KurtG makes an interesting comment.
Thursday, July 31st, 2008
1:06 pm
Today I have been embedding a javascript interpreter in python:

>>> from qtscript import *
>>> q = js()
>>> q("function f(x) { return x*2}")
>>> q['f'](2)
>>> def id(x): return x
>>> q['id'] = id
>>> q('id(4)')
>>> q('id')([1,2,3])
[1.0, 2.0, 3.0]
>>> q("function g(x) { return x(10)}")
>>> q('g(id)')
>>> f = q['f']
>>> g = q['g']
>>> g(f)

codeCollapse )
Saturday, July 19th, 2008
3:29 pm
As part of a foolish attempt to get people to learn prolog, I've been writing some tutorials.

It has been fun, and some people were interested, but busy/lazy. I was planning to build up to a proper lisp interpreter, but I haven't got that far yet.

eval(X,O) :- defined(X,A), eval(A,O).
eval([X|T],O) :- defined(X,A), eval([A|T],O).

eval(X,X) :- number(X); X = t ; X = [].

eval([define,X,Y],t) :- \+ defined(X,_), asserta(defined(X,Y)).

eval([cond],_) :- !, fail.
eval([cond,[H,A]|_],Z) :- eval(H,O), \+ O = [],!, eval(A,Z).
eval([cond,_|T],Z) :- eval([cond|T],Z),!.

eval([F|A],X) :- eval_list(A,Ae), apply(F,Ae,X),!.

eval_list([H|T],[Ho|To]) :- eval(H,Ho), eval_list(T,To).

apply(add,[X,Y],Z) :- Z is X + Y.
apply(add,[X,Y|T],Z) :- I is X + Y, apply(add,[I|T],Z).

apply(mul,[X,Y],Z) :- Z is X * Y.
apply(mul,[X,Y|T],Z) :- I is X * Y, apply(mul,[I|T],Z).

apply(sub,[X,Y],Z) :- Z is X - Y.
apply(div,[X,Y],Z) :- Z is X / Y.

apply(mod,[X,Y],Z) :- Z is X mod Y.
apply(pow,[X,Y],Z) :- Z is X ** Y.

apply(eq,[X,Y],t) :- X = Y.
apply(eq,[X,Y],[]) :- \+ X = Y.

apply(atom,[X],t) :- atom(X); X = [].


apply([lambda,[],E],[],O) :- eval(E,O).
apply([lambda,[A|Ta],E],[L|Tl],O) :- subst(A,L,E,E2), apply([lambda,Ta,E2],Tl,O).

subst(A,B,[A|T],[B|L]) :- subst(A,B,T,L).
subst(A,B,[H|T],[H|L]) :- subst(A,B,T,L).

% gprolog
:- predicate_property(defined,dynamic).
% swi prolog
:- dynamic defined/2.

defined(_,[]) :- !,fail.

Due to the implementation being a little quirky on purpose, we can do things like:
?-  eval_list([[define,3,4],[add,3,3]],X).

X = [t, 8] 

?- eval_list([[define,quote,sub],[quote,8,2]],X).

X = [t, 6] 

?- eval_list([[define,define,add],[define,3,[quote,8,2]]],X).

X = [t, 10] 


As well as more normal things like this:
?- eval_list([[define,square,[lambda,[x],[mul,x,x]]],[square,4]],X).

X = [t, 16] 

Wednesday, April 16th, 2008
11:04 am
Hooray! Phrack is back!

No-one told me that phrack was being published again.
Saturday, March 1st, 2008
1:43 pm
The Truth about Unix: The user interface is horrid
From cincy!duke!decvax!utzoo!datamat!rumor Sat Aug  1 23:37:27 PDT 1981
The Truth about UNIX: fa.unix-wizards

                   The truth about Unix:
                The user interface is horrid

                      Donald A. Norman
                  Department of Psychology
                Program in Cognitive Science
          Center for Human Information Processing
            University of California, San Diego
                 La Jolla, California 92093
                 (to appear in Datamation)

     Unix is a highly touted operating system.  Developed at
the  Bell  Telephone Laboratories and distributed by Western
Electric, it has  become  a  standard  operating  system  in
Universities,  and  it promises to become a standard for the
large micro- mini- systems for  home,  small  business,  and
educational setting.  But for all its virtues as a system --
and it is indeed an elegant system -- Unix is a disaster for
the casual user.  It fails both on the scientific principles
of human engineering and even in just plain   common  sense.
The motto of the designers of Unix towards the user seems to
be "let the user beware."

If Unix is really to become a general systemCollapse )
Saturday, February 23rd, 2008
9:48 pm
Wednesday, February 6th, 2008
1:26 pm
If like me you read too many programming blogs, then the following might ring a few bells:

a poor substitute for parody followsCollapse )
Thursday, January 17th, 2008
5:46 am
JOB: MediaWiki hacker wanted
Job title: MediaWiki hacker

I am starting up a company that will produce an inclusionist fork of Wikipedia. Over time it will also collect and integrate data from other sources. I'm looking for a person or persons to work as programmers on the website.


* Linux
* familiarity with wikis and Wikipedia


* ideally you are an experienced MediaWiki hacker who has written MediaWiki extensions
* Python
* experience contributing to open source projects
* experience as a website sysadmin, particularly running Ubuntu and Apache.

Pay: negotiable. Initially on a contract basis. Eventually I hope to offer a full-time job and stock options to a person or persons who is/are competent and efficient. This is your chance to get in at the start of what may be the next big thing. Ideally I'd like someone based in Edinburgh, if not that then in the UK, if not that then anywhere in the world.
Wednesday, November 21st, 2007
3:24 pm
ciphergoth links to this excellent juxtaposition of quotes about Amazon's new ebook and it's oppressive nature.

He has also been working on an interesting approach to solving zooko's triangle (aka useful naming systems, and the tradeoffs involved in the design).

Finally, there is a nice rosetta stone of programming language syntax someone found.

Also: If there is any random subject you would be interested in seeing, I can probably post a number of papers on the subjects?

Edit: Bonus GCC Bug: what does the following code print?
#include <stdio.h>
#include <stdlib.h>

int main() {
const int i = 2;
return printf("%d\n",-10*abs(i-1));
Hint: It isn't -10.
Tuesday, October 30th, 2007
10:48 pm
The ICFP 2007 Contest Presentation is online.

Also, an excellent video on QuickCheck in Erlang, a tool for generating, testing and simplifying test cases.
Saturday, September 1st, 2007
12:34 pm
Halting Password Puzzles
Instead of hashing a password for a set amount of iterations and checking, this uses an arbitrary amount set by the user. The effect is that they have a password hash function, that given a hash and a password will halt eventually if given the right password, and loop forever on wrong input.

We revisit the venerable question of ``pure password''-based key derivation and encryption, and expose security weaknesses in current implementations that stem from structural flaws in Key Derivation Functions (KDF). We advocate a fresh redesign, named Halting KDF (HKDF), which we thoroughly motivate on these grounds:

1. By letting password owners choose the hash iteration count, we gain operational flexibility and eliminate the rapid obsolescence faced by many existing schemes.
2. By throwing a Halting-Problem wrench in the works of guessing that iteration count, we widen the security gap with any attacker to its theoretical optimum.
3. By parallelizing the key derivation, we let legitimate users exploit all the computational power they can muster, which in turn further raises the bar for attackers.

HKDFs are practical and universal: they work with any password, any hardware, and a minor change to the user interface. As a demonstration, we offer real-world implementations for the TrueCrypt and GnuPG packages, and discuss their security benefits in concrete terms.

This is really, really neat and quite simple. I hope to see this implemented soon.
12:34 pm
Q - Equational programming Language.
A language based around general term rewriting rather than the lambda calculus.

The symbolic differentiator is very nice, and the other examples are useful too. Q in a nutshell is worth skimming over to get an idea for the language.

For instance:
[] X = [];
[F|Fs] X = [F X|Fs X];
==> [(+1),(*2),(1/)] 4
12:34 pm
Sage - a free computer algebra system for python.
Looks useful, and interesting. Basically a consistent front end for lots of existing programs, and it even has compatibility with propreitary systems.

This would be an excellent thing to use instead of the awful clunky languages that normally accompany mathematics software, and more useful to boot. You don't need to know much python to add numbers, computer factors or to draw pretty graphs.

Here is a recent blog post with a more indepth overview, and here is a short presentation
About LiveJournal.com