Darryl Gove's blog

Thursday May 07, 2009

The perils of strlen

Just been looking at an interesting bit of code. Here's a suitably benign version of it:

#include <string.h>
#include <stdio.h>

int main(void)
{
  char string[50];
  string[49]='\0';
  int i;
  int j=0;
  for (i=0; i<strlen(string); i++)
  {
   if (string[i]=='1') {j=i;}
  }
  printf("%i\n",j);
}

Compiling this bit of code leads to a loop that looks like:

                        .L900000109:
/* 0x002c         12 */         cmp     %i5,49
/* 0x0030         10 */         add     %i3,1,%i3
/* 0x0034         12 */         move    %icc,%i4,%i2
/* 0x0038         10 */         call    strlen  ! params =  %o0 ! Result =  %o0
/* 0x003c            */         or      %g0,%i1,%o0
/* 0x0040            */         add     %i4,1,%i4
/* 0x0044            */         cmp     %i4,%o0
/* 0x0048            */         bcs,a,pt        %icc,.L900000109
/* 0x004c         12 */         ldsb    [%i3],%i5

The problem is that for each character tested there's also a call to strlen! The reason for this is that the compiler cannot be sure what the call to strlen actually returns. As far as it knows, the return value might depend on some external variable that could change as the loop progresses, so it has to repeat the call every time the loop condition is evaluated.

There are a lot of functions defined in the libraries that the compiler could optimise, if it was certain that it recognised them. The compiler flag that enables recognition of the "builtin" functions is -xbuiltin (which is included in -fast). This enables the compiler to do things like recognise calls to memcpy or memset and in some instances produce more optimal code. However, it doesn't recognise the call to strlen.

In terms of solving the problem, there are two approaches. The most portable approach is to hold the length of the string in a temporary variable:

  int length=strlen(string);
  for (i=0; i<length; i++)

Another, less portable, approach is to use #pragma no_side_effect. This pragma tells the compiler that the return value of the function depends only on the parameters passed into the function. So the result of calling strlen depends only on the string that is passed in, which does not change inside the loop. The modified code looks like:

#include <string.h>
#include <stdio.h>
#pragma no_side_effect(strlen)

int main(void)
{
  char string[50];
  string[49]='\0';
  int i;
  int j=0;
  for (i=0; i<strlen(string); i++)
  {
   if (string[i]=='1') {j=i;}
  }
  printf("%i\n",j);
}

And more importantly, the resulting disassembly looks like:

                        .L900000109:
/* 0x0028          0 */         sub     %i1,49,%o7
/* 0x002c         11 */         add     %i3,1,%i3
/* 0x0030          0 */         sra     %o7,0,%o5
/* 0x0034         13 */         movrz   %o5,%i4,%i2
/* 0x0038         11 */         add     %i4,1,%i4
/* 0x003c            */         cmp     %i4,%i5
/* 0x0040            */         bcs,a,pt        %icc,.L900000109
/* 0x0044         13 */         ldsb    [%i3],%i1

Comments:

Now that Sun Studio understands the gnu __attribute__((pure)), wouldn't that be better?

The documentation of no_side_effect isn't clear on what happens when you dereference pointer arguments.

Posted by Marc on May 07, 2009 at 04:20 PM PDT #
