In the last article, I discussed tracking down a segfault in grep. This article continues down the rabbit hole.
At the end of the last instalment, I was left with a puzzle. Why was the version of grep from textproc/bsdgrep acting in a different manner to /usr/bin/bsdgrep? Time to put on my debugging cap, and dig down a little further.
Let's start off with trying a few variants of the input set:
Excellent, we made it crash! Can we slim it down even more?
What about the behaviour on mac?
It was around this time that I realised I had been looking at the wrong binary. D'oh!
After actually reading TFM, I determined that FreeBSD builds its system binaries from SVN and not from ports. The binary I had been looking at previously was a different version!
A bit of poking around at freebsd.org and googling for grep.c revealed this as the likely repo path:
Lets pull it down from svn:
Note - I use set depth with svn to avoid pulling down the whole FreeBSD source tree, which is quite large.
Lets check the behaviour:
So, what does the debugger have for us?
That looks very familiar indeed. In my last article, that was the backtrace I got on my macbook when I first started to inspect the bug.
For comparison purposes, here it is again:
Lets help ourselves out by making a debug build:
What does valgrind have for us?
As in the previous article, valgrind crashes when I try to attach the debugger. At least it gives up some some useful info before doing so. The badness starts in the printline function, in util.c on line 469.
With valgrind down for the count, I decide to soldier on with gdb alone:
Ouch - that would explain the segfault. A huge value is being passed as the size param to fwrite. Why?
I'm not going to pretend I know all the C conversion rules down to the letter, but it seems fairly obvious that we are ending up with a -4 in a place where it shouldn't be, and it is getting converted into an unsigned long in the process.
As in the previous article, this bug comes down to the offsets rm_so and rm_eo. These are offsets that point to the beginning and end of the matching parts of the line.
I read through the code a few times, and added some debug output to get an idea of what it was trying to do.
Here is the problematic loop inside util.c written in pythonic pseudo-code:
for match in matches:
ifnot print_only_matching_part:
print non_matching_part
if print_in_color:
print color_escape_sequence_start
print matching_part
if print_in_color:
print color_escape_sequence_end
start_of_next_non_matching_part = match.end_offset
if print_only_matching_part:
print end_line
This is the c version with some debugging output added to the relevant parts:
for (i = 0; i < m; i++) {
if (!oflag)
{
fprintf(stderr, "%s:%d match %d fwrite rm_so %d rm_eo %d a %zu\n",
__FILE__, __LINE__, i, matches[i].rm_so, matches[i].rm_eo, a);
fwrite(line->dat + a, matches[i].rm_so - a, 1, stdout);
}
if (color)
fprintf(stdout, "\33[%sm\33[K", color);
fwrite(line->dat + matches[i].rm_so,
matches[i].rm_eo - matches[i].rm_so, 1,
stdout);
if (color)
fprintf(stdout, "\33[m\33[K");
a = matches[i].rm_eo;
fprintf(stderr, "%s:%d match %d a = %zu\n", __FILE__, __LINE__, i, a);
if (oflag)
putchar('\n');
}
After adding the debug output, I ran a few experiments with different patterns:
This shows us that the loop is choking when handed overlapping pattern matches. There must be an assumption here that matching parts of the line do not overlap. In this case, they do - and when the code tries to print out the non-matching part of the line, it ends up with a negative value for the size of that part.
At this point I verified that this bug was still happening in the head branch, and then sent a bug report off to FreeBSD - 197555.
The probable fix for this bug is to check whether the non-matching part of the line has already been printed out by a previous match. A fairly big assumption was violated however, so some more careful testing and debugging is warranted. I am going to leave that to the FreeBSD maintainers.
Apple can either poke around and fix their own version, or wait for a fix from upstream. I do wonder why they switched from GNU grep in the first place - possibly a licensing issue?
If you've made it this far, thanks for coming along on the ride. And remember, there are always more bugs lurking just under the surface.
Like the article? Please follow me on twitter and check out my bio.