This afternoon, I needed a reverse file iterator for a script I was writing to process a rather large log file. After some unsuccessful googling and some quick hacking, I was left with a frustratingly close but nonfunctional iterator. Instead of fixing it at work, I hacked up a workaround that called "tail" and left it for a while.
When I came back to finish the job this evening, I figured I ought to look at the source of tail for some inspiration. Surely the Unix hackers had solved this problem long ago? In reverse.c, I found the inspiration I needed:
```c
for (; pos >= start; pos--) {
	/* A seek per char isn't a problem with a smart stdio */
	if (fseeko(fp, pos, SEEK_SET) != 0) {
		//snip
	}
	if ((ch = getc(fp)) == '\n')
		//snip
```
I had been reading the file in chunks, splitting each chunk into lines, maintaining a cache of those lines, and compensating for lines left unfinished at chunk boundaries, all because I had buried deep down in my lizard brain the idea that a seek per character was "slow". In pure Python, for chrissake!
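For comparison, the chunked approach might look roughly like the following. This is a simplified, hypothetical sketch, not the original script: the `reverse_lines_chunked` name and the 4096-byte `chunk_size` default are illustrative, and it assumes a seekable file opened in binary mode. Most of the code is bookkeeping for a line whose beginning lies in a chunk that hasn't been read yet.

```python
import os


def reverse_lines_chunked(f, chunk_size=4096):
    """Hypothetical sketch: yield the lines of a seekable binary file from
    last to first, each with its trailing newline, reading backwards in
    chunks of chunk_size bytes."""
    f.seek(0, os.SEEK_END)
    pos = f.tell()  # offset of the first byte already pulled into buf
    buf = b""       # bytes from pos up to the end of the not-yet-yielded region
    while True:
        # Find the newline that ends the previous line, ignoring buf's last
        # byte (it is this line's own newline or the end of an unterminated line)
        nl = buf.rfind(b"\n", 0, len(buf) - 1)
        if nl >= 0:
            yield buf[nl + 1:]
            buf = buf[:nl + 1]
        elif pos > 0:
            # Unfinished line: prepend the previous chunk and look again
            read_size = min(chunk_size, pos)
            pos -= read_size
            f.seek(pos)
            buf = f.read(read_size) + buf
        else:
            if buf:
                yield buf  # whatever is left is the first line of the file
            return
```

Prepending each chunk copies the buffer, so a single very long line costs time quadratic in its length; that is tolerable in a sketch but is one more thing to get right.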
Properly reminded of the fact that premature optimization can sneak up anywhere, I ended up with this code:
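```python
import os

class reversefile(object):
    """Iterate backwards through a file, yielding lines from last to first.

    f should be an open, seekable file handle opened in binary mode ("rb").
    Each line is returned with its trailing newline, as in forward iteration.
    """
    def __init__(self, f):
        self._f = f
        # self.end is the offset just past the last byte not yet returned
        f.seek(0, os.SEEK_END)
        self.end = f.tell()

    def __iter__(self):
        return self

    def next(self):
        if self.end == 0:
            raise StopIteration
        # Skip the byte at end-1: it is either this line's newline
        # or the last character of an unterminated final line
        pos = self.end - 2
        while pos >= 0:
            # One seek and one-byte read per character, as in tail's reverse.c
            self._f.seek(pos)
            if self._f.read(1) == b"\n":
                # The line runs from pos+1 to end; the file position is now pos+1
                end = self.end
                self.end = pos + 1
                return self._f.read(end - pos - 1)
            pos -= 1
        # No earlier newline: what remains is the first line of the file
        end = self.end
        self.end = 0
        self._f.seek(0)
        return self._f.read(end)

    __next__ = next  # Python 3 name for the iterator protocol method
```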
You can see the whole thing, with source and some very brief tests I hacked up, here.
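The same one-seek-per-character loop also fits naturally into a generator function, which lets Python hold the position state between lines instead of an iterator class. A minimal sketch, assuming a seekable file opened in binary mode; `reverse_lines` is a hypothetical name, and `io.BytesIO` stands in for a real log opened with `open(path, "rb")`:

```python
import io
import os


def reverse_lines(f):
    """Yield the lines of a seekable binary file from last to first,
    each with its trailing newline, doing one seek per byte examined."""
    f.seek(0, os.SEEK_END)
    end = f.tell()  # offset just past the last byte not yet yielded
    # Skip the byte at end-1: it is the line's newline or part of the line
    pos = end - 2
    while pos >= 0:
        f.seek(pos)
        if f.read(1) == b"\n":
            # The file position is now pos+1, the start of the line
            yield f.read(end - pos - 1)
            end = pos + 1
            pos = end - 2
        else:
            pos -= 1
    if end > 0:
        f.seek(0)
        yield f.read(end)  # the first line of the file


log = io.BytesIO(b"first\nsecond\n\nlast\n")
print(list(reverse_lines(log)))  # [b'last\n', b'\n', b'second\n', b'first\n']
```

The difference is mostly stylistic: the class exposes its current offset as `end`, while the generator keeps it local.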