Is there any compiler and library where strcmp() returns values other than -1, 0 and 1?


Question


Although common sense and the literature are clear about the behaviour of strcmp():

int strcmp( const char *lhs, const char *rhs );

Negative value if lhs appears before rhs in lexicographical order.

Zero if lhs and rhs compare equal.

Positive value if lhs appears after rhs in lexicographical order.

I can't seem to make it return any values other than -1, 0 and 1.

Granted, this behaviour is consistent with the definition, but I was expecting values greater than 1 or less than -1, since the definition only says that the result will be <0, 0 or >0, not -1, 0 or 1.

I tested this in several compilers and libraries with the same results. I would like to see an example where that's not the case.

Sample code:

#include <stdio.h>
#include <string.h>

int main(void)
{
   printf("%d ", strcmp("a", "a"));
   printf("%d ", strcmp("abc", "aaioioa"));
   printf("%d ", strcmp("eer", "tsdf"));
   printf("%d ", strcmp("cdac", "cdac"));
   printf("%d ", strcmp("zsdvfgh", "ertgthhgj"));
   printf("%d ", strcmp("abcdfg", "rthyuk"));
   printf("%d ", strcmp("ze34", "ze34"));
   printf("%d ", strcmp("er45\n", "io\nioa"));
   printf("%d", strcmp("jhgjgh", "cdgffd"));
   return 0;
}

Result: 0 1 -1 0 1 -1 0 -1 1


Answer 1:


The C standard clearly says (C11 §7.24.4.2 The strcmp function):

The strcmp function returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2.

It doesn't say how much greater than or less than zero the result must be; a function that always returns -1, 0 or +1 meets the standard; so does a function that sometimes returns values with a magnitude larger than 1, such as -27, 0, +35. If your code is to conform to the C standard, it must not assume either set of results; it may only assume that the sign of the result is correct.

Here is an implementation of strcmp() — named str_cmp() here so that the result can be compared with strcmp() — which does not return -1 or +1:

#include <string.h>
#include <stdio.h>

static int str_cmp(const char *s1, const char *s2)
{
    while (*s1 == *s2 && *s1 != '\0')
        s1++, s2++;
    int c1 = (int)(unsigned char)*s1;
    int c2 = (int)(unsigned char)*s2;
    return (c1 - c2);
}

int main(void) 
{  
   printf("%d ", strcmp("a", "a"));
   printf("%d ", strcmp("abc", "aAioioa"));
   printf("%d\n", strcmp("eer", "tsdf"));

   printf("%d ", str_cmp("a", "a"));
   printf("%d ", str_cmp("abc", "aAioioa"));
   printf("%d\n", str_cmp("eer", "tsdf"));
   return 0;
}

When run on a Mac (macOS Mojave 10.14.6; GCC 9.2.0; Xcode 11.13.1), I get the output:

0 1 -1
0 33 -15

I did change your data slightly — "aaioioa" became "aAioioa". The overall result is no different (but the value 33 is bigger than you'd get with the original string) — the return value is less than, equal to, or greater than zero as required.

The str_cmp() function is a legitimate implementation and is loosely based on a historically common implementation of strcmp(). It has slightly more care in the return value, but you can find two minor variants of it on p106 of Brian W Kernighan and Dennis M Ritchie The C Programming Language, 2nd Edn (1988) — one using array indexing, the other using pointers:

int strcmp(char *s, char *t)
{
    int i;
    for (i = 0; s[i] == t[i]; i++)
        if (s[i] == '\0')
            return 0;
    return s[i] - t[i];
}

int strcmp(char *s, char *t)
{
    for ( ; *s == *t; s++, t++)
        if (*s == '\0')
            return 0;
    return *s - *t;
}

The K&R code might not return the expected result if the plain char type is signed and one of the strings contains 'accented characters', that is, characters in the range -128 .. -1 (or 0x80 .. 0xFF when viewed as unsigned values). My str_cmp() code avoids this by converting the data to unsigned char; the (int) cast isn't really necessary because the assignments perform that conversion anyway. The subtraction of two unsigned char values converted to int produces a result in the range -255 .. +255. However, versions of the C library that return only -1, 0 or +1 clearly don't use direct subtraction like that.
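The signed-char pitfall can be made concrete with a small sketch. The names knr_cmp, uchar_cmp and plain_char_is_signed are hypothetical, chosen only for illustration: the first follows the K&R pattern, the second adds the unsigned char conversion. On an implementation where plain char is signed, the two should disagree about the sign for a byte such as 0xE9.

```c
#include <limits.h>

/* K&R-style comparison: subtracts plain char values (hypothetical name). */
int knr_cmp(const char *s, const char *t)
{
    for ( ; *s == *t; s++, t++)
        if (*s == '\0')
            return 0;
    return *s - *t;
}

/* Same loop, but the differing bytes are compared as unsigned char. */
int uchar_cmp(const char *s, const char *t)
{
    for ( ; *s == *t; s++, t++)
        if (*s == '\0')
            return 0;
    return (unsigned char)*s - (unsigned char)*t;
}

/* Nonzero when plain char is a signed type on this implementation. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Calling both functions with "\xE9" and "a" shows the difference: uchar_cmp() gives a positive result everywhere, while knr_cmp() gives a negative one wherever plain_char_is_signed() is true.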

Note that the C11 standard §7.24.4 String comparison functions says:

The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp is determined by the sign of the difference between the values of the first pair of characters (both interpreted as unsigned char) that differ in the objects being compared.

See also the related question "How do I check if a value matches a string?". The outline there shows:

if (strcmp(first, second) == 0)    // first equal to second
if (strcmp(first, second) <= 0)    // first less than or equal to second
if (strcmp(first, second) <  0)    // first less than second
if (strcmp(first, second) >= 0)    // first greater than or equal to second
if (strcmp(first, second) >  0)    // first greater than second
if (strcmp(first, second) != 0)    // first unequal to second

Note how comparing to zero uses the same comparison operator as the test you're making.

You could (but probably shouldn't) write:

if (strcmp(first, second) <= -1)    // first less than second
if (strcmp(first, second) >= +1)    // first greater than second

You'd still get the same results, but it is not sensible to do so; always comparing with zero is easier and more uniform.

You can get a -1, 0, +1 result using:

unsigned char c1 = *s1;
unsigned char c2 = *s2;
return (c1 > c2) - (c1 < c2);

For unrestricted integers (rather than integers restricted to 0 .. 255), this is safe because it avoids integer overflows whereas subtraction gives the wrong result. For the restricted integers involved with 8-bit characters, overflow on subtraction is not an issue.




Answer 2:


The specification says that the return value has to be negative, zero or positive, but it doesn't pin down the exact value. A particular library may behave in more specific ways.

The spec means that code like this is not portable:

if (strcmp(a, b) == 1)

This may "work on my machine" but fail for someone else who uses a different library.

What you should write instead is:

if (strcmp(a, b) > 0)

That's all it really means: expect values other than just 1/-1 and code accordingly.
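To see why the == 1 test is fragile, here is a minimal sketch. It assumes a byte-difference implementation, called diff_cmp here (a hypothetical name, as are the two predicates), standing in for a library whose strcmp() returns the difference of the first differing bytes.

```c
/* Hypothetical strcmp() stand-in that returns the difference of the
   first differing bytes, compared as unsigned char. */
int diff_cmp(const char *s1, const char *s2)
{
    while (*s1 == *s2 && *s1 != '\0')
        s1++, s2++;
    return (unsigned char)*s1 - (unsigned char)*s2;
}

/* Fragile: true only when the result happens to be exactly 1. */
int is_greater_fragile(const char *a, const char *b)
{
    return diff_cmp(a, b) == 1;
}

/* Portable: relies only on the sign of the result. */
int is_greater(const char *a, const char *b)
{
    return diff_cmp(a, b) > 0;
}
```

With "b" and "a" both predicates agree, because the difference happens to be exactly 1. With "z" and "a" the difference is larger than 1, so only is_greater() gives the right answer.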




Answer 3:


Please re-read this bit

Negative value if lhs appears before rhs in lexicographical order.

Is -1 sufficient for this statement to be true?

Zero if lhs and rhs compare equal.

Positive value if lhs appears after rhs in lexicographical order.

Is 1 sufficient for this statement to be true?

So the sample code is acting as per spec.

EDIT

Just test the return value for zero, less than zero or more than zero. As per spec this should work in all implementations.

EDIT 2

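I think this will fulfil the spec - have not tested :-(

   size_t i; // Declared outside the loop so it is still in scope at the return
   for (i = 0; s1[i] && s2[i] && s1[i] == s2[i]; ++i) {
     // Empty
   }
   // s1 first, compared as unsigned char: negative when s1 sorts before s2
   return (unsigned char)s1[i] - (unsigned char)s2[i];

This will return values other than 1, -1 or 0.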




Answer 4:


Here are a few examples of C libraries with strcmp() implementations that do not always return -1, 0 or +1:

The Bionic libc has a BSD-based implementation of strcmp():

int
strcmp(const char *s1, const char *s2)
{
    while (*s1 == *s2++)
        if (*s1++ == 0)
            return (0);
    return (*(unsigned char *)s1 - *(unsigned char *)--s2);
}

Dietlibc does the same. Its version is even non-conforming when configured with WANT_SMALL_STRING_ROUTINES, because it then subtracts plain char values, which may be signed:

int
strcmp (const char *s1, const char *s2)
{
#ifdef WANT_SMALL_STRING_ROUTINES
    while (*s1 && *s1 == *s2)
        s1++, s2++;
    return (*s1 - *s2);
#else
    // a more advanced, conforming implementation that tests multiple characters
    // at a time but still returns the difference of characters as unsigned bytes
#endif
}

Glibc has this implementation of strcmp in its generic directory, used for exotic architectures:

int
strcmp (p1, p2)
     const char *p1;
     const char *p2;
{
  register const unsigned char *s1 = (const unsigned char *) p1;
  register const unsigned char *s2 = (const unsigned char *) p2;
  unsigned reg_char c1, c2;

  do
    {
      c1 = (unsigned char) *s1++;
      c2 = (unsigned char) *s2++;
      if (c1 == '\0')
        return c1 - c2;
    }
  while (c1 == c2);

  return c1 - c2;
}

The musl C library has a very compact implementation:

int strcmp(const char *l, const char *r)
{
    for (; *l==*r && *l; l++, r++);
    return *(unsigned char *)l - *(unsigned char *)r;
}

The newlib has this implementation:

int
_DEFUN (strcmp, (s1, s2),
    _CONST char *s1 _AND
    _CONST char *s2)
{
#if defined(PREFER_SIZE_OVER_SPEED) || defined(__OPTIMIZE_SIZE__)
  while (*s1 != '\0' && *s1 == *s2)
    {
      s1++;
      s2++;
    }

  return (*(unsigned char *) s1) - (*(unsigned char *) s2);
#else
  // a more advanced approach, testing 4 bytes at a time, still returning the difference of bytes
#endif
}

Many alternative C libraries seem to follow the same pattern and return the difference of bytes, which matches the specification. But the implementations you tested seem to consistently return -1, 0 or +1. Don't rely on this. It might change in future releases, or even with the same system using different compilation flags.
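Code that depends only on the sign is unaffected by which style a library uses. A common case is a qsort() comparator, sketched below. The names cmp_str_ptrs and sort_strings are hypothetical; qsort() and strcmp() are the standard library functions.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of char pointers. qsort() only
   looks at the sign of the result, so any conforming strcmp() works. */
int cmp_str_ptrs(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Sorts an array of n string pointers into strcmp() order. */
void sort_strings(const char **strings, size_t n)
{
    qsort(strings, n, sizeof strings[0], cmp_str_ptrs);
}
```

The resulting order should be the same whether the underlying strcmp() returns -1/0/+1 or byte differences, because only the sign is ever inspected.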



Source: https://stackoverflow.com/questions/59779056/is-there-any-compiler-and-library-where-strcmp-returns-values-other-than-1-0

