External Sort Code in C/C++

Dave Howorth dhoworth at mrc-lmb.cam.ac.uk
Wed Feb 24 10:45:46 UTC 2010


Karl Auer wrote:
> On Tue, 2010-02-23 at 22:17 -0600, MirJafar Ali wrote:
>> External sorting is well defined by Prof. Knuth as sorting process in
>> the files and not in memory.  For example, I want to sort 100GB on my
>> machine with 1GB RAM. I don't think that STL and standard binary sort
>> will do that they are all in-memory sorting codes.
>>
> Oh, OK, I see what you mean.
> 
> The general principle is to reduce the external data to a set of keys
> that DOES fit in memory, or to index the external data on disk, then
> traverse the index. Actually sorting the file *on disk* is rarely
> useful.

Actually, if the keys fit in memory, you don't need an external sort :)

External sorts are used when the data is too big for that. They were
used much more often in the days when main memory was measured in KB and
the storage devices were tapes, or drums if you were lucky. But they can
still be useful.

The correct wikipedia page http://en.wikipedia.org/wiki/External_sorting
has a reasonable summary. It mentions a library
http://stxxl.sourceforge.net/ so I guess there will be some applications
that use it. google also gives a fair few hits.

MirJafar, what have you tried?

Cheers, Dave






More information about the ubuntu-users mailing list