Atom feed for keyword "optimization"

Reverse File Iterator and Premature Optimization
Oct 03, 2007


This afternoon, I had a need for a reverse file iterator in a script I was writing to process a quite large log file. After some unsuccessful googling followed by some quick hacking, I was left with a frustratingly close, but nonfunctional, iterator. Instead of fixing it while I was at work, I hacked up a solution calling "tail" and left it for a while.

When I came back to finish the job this evening, I figured that I ought to go look at the source of tail for some inspiration. Surely the unix hackers had figured this problem out long ago? In reverse.c, I found the inspiration I needed:

    for (; pos >= start; pos--) {
        /* A seek per char isn't a problem with a smart stdio */
        if (fseeko(fp, pos, SEEK_SET) != 0) {
        //snip
            if ((ch = getc(fp)) == '\n')

I had been reading the file in chunks, splitting the chunk into lines, handling a cache of the lines, and compensating for unfinished lines, all because I had buried deep down in my lizard brain the idea that an fseek per character was "slow". In pure python, for chrissake!

Properly reminded of the fact that premature optimization can sneak up anywhere, I ended up with this code:

import os

class reversefile(object):
    """Iterate backwards through a file. f should be an open file handle"""
    def __init__(self, f):
        self._f = f
        self.end = os.stat(f.name).st_size

    def __iter__(self): return self

    def next(self):
        if self.end == 0:
            raise StopIteration

        pos = self.end-2
        while pos >= 0:
            self._f.seek(pos)
            if self._f.read(1) == '\n':
                end = self.end
                self.end = pos
                return self._f.read(end - pos - 1)
            pos -= 1

        end = self.end
        self.end = 0
        self._f.seek(0)
        return self._f.read(end).strip("\n")

You can see the whole thing, with source and some very brief tests I hacked up, here.

update:

Reddit user Brian posts a solution that goes much faster on his platform (I've done no performance testing of my code, it wasn't important for me) and handles platform linebreaks better. In other words, it's better for all purposes. Thanks Brian, I wish I'd found that when I was googling!

[# ] python, file, iterator, optimization, openbsd

I'm a programmer for a small company in Baltimore, Maryland, USA. Besides programming, I play competitive ultimate. I blog at irregular intervals about various programming topics, but mostly Python.