C++ Newsletter/Tutorial Issue 1
Issue #001
October, 1995
Contents
- Introduction
- Using C++ as a Better C Part 1 - function prototypes
- Introduction to Namespaces Part 1 - introductory comments
- Performance - handling a common strcmp() case
INTRODUCTION
This newsletter is being distributed at no cost to all interested parties, and may be copied subject to the copyright restrictions specified below. The newsletter will come out once or twice a month and will contain a variety of information on C++, such as descriptions of new language features, advice on moving a project to C++, and performance hints.
If you have comments or suggestions on newsletter content, please send them to the address given below.
USING C++ AS A BETTER C - PART 1
People often ask about how to get started with C++ or move a project or development team to the language. There are many answers to this question. One of the simplest and best is to begin using C++ as a "better C". This term doesn't have a precise meaning but can be illustrated via a series of examples. We will cover some of these examples in forthcoming issues of the newsletter.
One simple but important difference between C and C++ concerns function definition and invocation. In older versions of C ("Classic C"), a function would be defined like this:
    f(s)
    char* s;
    {
        return 0;
    }
The return type of this function is implicitly "int", and the function has no prototype. In ANSI C and in C++, a similar definition would be:
    int f(char* s)
    {
        return 0;
    }
Why does this matter? Well, suppose that you call the function with this invocation:
    f(s)
    char* s;
    {
        return 0;
    }
    g()
    {
        f(23);
    }
In Classic C, this would be a serious programming error, because a value of integer type (23) is being passed to a function expecting a character pointer. However, the compiler would not flag the error, and the likely result would be a runtime failure such as a crash. By contrast, in ANSI C with a prototype in scope, and in C++, the compiler would flag such usage.
Very occasionally, you want to cheat, and actually pass a value like 23 as a character pointer. To do this, you can say:
f((char*)23);
Such usage is typically only seen in very low level systems programming.
Using function prototypes in C++ is a big step forward from Classic C; this approach will eliminate a large class of errors in which the wrong number or types of arguments are passed to a function.
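As a rough illustration of the point above, here is a minimal sketch of a prototyped function and its caller. The function body (returning the string length) is a hypothetical one chosen only so that the example does something observable; the two commented-out calls show the kinds of mistakes a C++ compiler would be expected to reject.

```cpp
#include <cstring>

// Prototype: tells the compiler that f takes one const char* and returns int.
int f(const char* s);

int f(const char* s)
{
    // Hypothetical body for illustration only: report the string's length.
    return static_cast<int>(std::strlen(s));
}

int g()
{
    // f(23);           // rejected: an int does not convert to const char*
    // f("a", "b");     // rejected: too many arguments for the prototype
    return f("motor");  // accepted: the argument matches the prototype
}
```

Uncommenting either of the rejected calls should turn a potential runtime failure into a compile-time diagnostic.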
INTRODUCTION TO NAMESPACES - PART 1
Namespaces are a relatively new C++ feature just now starting to appear in C++ compilers. We will be describing some aspects of namespaces in subsequent newsletters.
What problem do namespaces solve? Well, suppose that you buy two different general-purpose class libraries from two different vendors, and each library has some features that you'd like to use. You include the headers for each class library:
#include "vendor1.h"
#include "vendor2.h"
and then it turns out that the headers have this in them:
    // vendor1.h
    ... various stuff ...
    class String {
        ...
    };
    // vendor2.h
    ... various stuff ...
    class String {
        ...
    };
This usage will trigger a compiler error, because class String is defined twice. In other words, each vendor has included a String class in the class library, leading to a compile-time clash. Even if you could somehow get around this compile-time problem, there is the further problem of link-time clashes, where two libraries contain some identically-named symbols.
The namespace feature gets around this difficulty by means of separate named namespaces:
    // vendor1.h
    ... various stuff ...
    namespace Vendor1 {
        class String {
            ...
        };
    }
    // vendor2.h
    ... various stuff ...
    namespace Vendor2 {
        class String {
            ...
        };
    }
There are no longer two classes named String, but instead there are now classes named Vendor1::String and Vendor2::String. In future discussions we will see how namespaces can be used in applications.
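To show the two classes coexisting in one translation unit, here is a small self-contained sketch. The members of each String class (length(), c_str(), and the private data) are invented for illustration and are not taken from any real vendor library; only the namespace names Vendor1 and Vendor2 come from the text above.

```cpp
#include <cstddef>
#include <cstring>

namespace Vendor1 {
    // Hypothetical String that remembers only the length of its text.
    class String {
    public:
        explicit String(const char* s) : len_(std::strlen(s)) {}
        std::size_t length() const { return len_; }
    private:
        std::size_t len_;
    };
}

namespace Vendor2 {
    // Hypothetical String that keeps a pointer to the caller's text.
    class String {
    public:
        explicit String(const char* s) : text_(s) {}
        const char* c_str() const { return text_; }
    private:
        const char* text_;
    };
}

// Both classes are usable side by side because each name is qualified.
std::size_t use_both()
{
    Vendor1::String a("motor");
    Vendor2::String b("motoring");
    return a.length() + std::strlen(b.c_str());
}
```

Because every use is written as Vendor1::String or Vendor2::String, the compiler never sees two competing definitions of a single name.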
PERFORMANCE TIPS
In this section of the newsletter we will present some practical performance tips for improving code speed and reducing memory usage. Some of these tips will be useful only for C++ code and some will be more general and applicable to C or other languages.
As a first example, consider an application using C-style strings and functions such as strcmp(). A recent experience with this sort of application involved a function that does word stemming, that is, takes words such as "motoring" and reduces them to their root stem, in this case "motor".
In profiling this function, it was observed that much of the overall time was being spent in strcmp(). For the C++ compiler in question (Borland 3.1), this function is written in assembly language and is quite fast. Attempts to speed it up by unrolling the equivalent code locally at the point of call will typically slow things down.
But it's still the case that calling a function, even one implemented in assembly language, has some overhead, which comes from saving registers, manipulating stack frames, actual transfer of control, and so on. So it might be worth trying to exploit a common case -- the case where you can determine the relationship of the strings by looking only at the first character.
So we might use an inline function in C++ to encapsulate this logic:
    inline int local_strcmp(const char* s, const char* t)
    {
        return (*s != *t ? *s - *t : strcmp(s, t));
    }
If the first characters of each string do not match, there's no need to go further by calling strcmp(); we already know the answer.
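The following is a self-contained sketch of the same idea, not the exact code that was measured. It makes one assumption worth labeling: the first characters are compared as unsigned char, which is how the C standard specifies that strcmp() compares, so that the shortcut and the fallback agree in sign even for characters outside the 7-bit ASCII range. The helper sign() is a hypothetical name added only to make results easy to compare.

```cpp
#include <cstring>

// Sketch: decide on the first character when possible, else call strcmp().
inline int local_strcmp(const char* s, const char* t)
{
    // Assumption: compare as unsigned char, matching strcmp()'s ordering.
    const unsigned char a = static_cast<unsigned char>(*s);
    const unsigned char b = static_cast<unsigned char>(*t);
    return a != b ? a - b : std::strcmp(s, t);
}

// Hypothetical helper: reduce a comparison result to -1, 0, or 1.
inline int sign(int n)
{
    return (n > 0) - (n < 0);
}
```

Only the sign of the result is meaningful, just as with strcmp() itself; callers should test for less than, equal to, or greater than zero rather than for particular values.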
Another way to implement the same idea is via a C macro:
    #define local_strcmp(s, t) ((s)[0] != (t)[0] ? (s)[0] - (t)[0] : \
                                strcmp((s), (t)))
This approach has a couple of disadvantages, however. Macros are hard to get right because of the need to parenthesize arguments so as to avoid subtly wrong semantics, and because an argument that appears more than once in the macro body, as s and t do here, is evaluated each time it appears. Writing local_strcmp() as a real function is more natural.
And macros are less likely to be understood by development tools such as browsers or debuggers. Inline functions are also a source of problems for such tools, but they at least are part of the C++ language proper, and many C++ compilers have a way of disabling inlining to help address this problem.
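One concrete way to see how the macro form can go subtly wrong is to pass it an argument with a side effect. In the sketch below, the macro is renamed LOCAL_STRCMP so that it can sit next to the inline function, and next_word() and the calls counter are hypothetical names invented for the demonstration.

```cpp
#include <cstring>

// Macro form: each argument is pasted in wherever it appears in the body.
#define LOCAL_STRCMP(s, t) ((s)[0] != (t)[0] ? (s)[0] - (t)[0] : \
                            std::strcmp((s), (t)))

// Function form: each argument is evaluated exactly once, at the call.
inline int local_strcmp(const char* s, const char* t)
{
    return (*s != *t ? *s - *t : std::strcmp(s, t));
}

static int calls = 0;   // hypothetical counter of next_word() invocations

// Hypothetical word source with a visible side effect.
const char* next_word()
{
    ++calls;
    return "motor";
}

int calls_made_by_macro()
{
    calls = 0;
    int r = LOCAL_STRCMP(next_word(), "motoring");
    (void)r;            // result unused; we only care about the call count
    return calls;
}

int calls_made_by_inline()
{
    calls = 0;
    int r = local_strcmp(next_word(), "motoring");
    (void)r;
    return calls;
}
```

With the macro, next_word() runs once for the first-character test and again in whichever branch is taken; with the inline function it runs once. If next_word() advanced through an input stream, the macro version would silently skip words.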
How much speedup is this approach good for? In the word stemming program, for input of about 65000 words, the times in seconds were:
    strcmp()                    9.7
    inline local_strcmp()       7.5
    #define local_strcmp()      7.5
or a savings of about 23%. Obviously, this figure will vary with the compiler and the application.
This particular speedup is achieved by exploiting a common case -- the case where the first letters of two strings are different. For applications involving English words, this is often a good assumption. For some other types of strings, it may not be.
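Before adopting the shortcut, it may be worth checking how often the common case actually occurs in your own data. The sketch below counts, for a list of strings, how many distinct pairs can be ordered from the first character alone. The function name and the four-word sample list are hypothetical and are not drawn from the word stemming program.

```cpp
#include <cstddef>

// Hypothetical sample data for illustration only.
const char* const sample_words[] = { "motor", "motoring", "stem", "word" };

// Counts the distinct pairs (i < j) whose first characters differ, that is,
// the pairs for which the first-character test alone decides the comparison.
std::size_t pairs_decided_by_first_char(const char* const words[],
                                        std::size_t n)
{
    std::size_t decided = 0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            if (words[i][0] != words[j][0]) {
                ++decided;
            }
        }
    }
    return decided;
}
```

For the sample list, five of the six pairs are decided by the first character; only "motor" against "motoring" falls through to a full comparison. A low ratio on real data would suggest that the shortcut adds a test without saving many calls.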
-------------------------
Copyright (c) 1995 Glen McCluskey. All Rights Reserved.
This newsletter may be further distributed provided that it is copied in its entirety, including the newsletter number at the top and the copyright and contact information at the bottom.
Glen McCluskey & Associates
Professional C++ Consulting
Internet: glenm@glenmccl.com
Phone: (800) 722-1613 or (970) 490-2462
Fax: (970) 490-2463
FTP: rmii.com /pub2/glenm/newslett (for back issues)
Web: http://www.rmii.com/~glenm