Copy Link
Add to Bookmark
Report

C++ Newsletter/Tutorial Issue 2

eZine's profile picture
Published in 
CPPNL
 · 2 years ago

Issue #002
November, 1995

Contents

  • Using C++ as a Better C Part 2 - references
  • Introduction to Namespaces Part 2 - using namespace
  • Writing Robust C++ Code Part 1 - assert() and subscript checking
  • Performance - handling lots of small strings via a C++ class


USING C++ AS A BETTER C - PART 2

In the last newsletter we discussed using function prototypes in C++ to eliminate a common type of error encountered in C, that of calling a function with the wrong number or types of arguments. Another C++ feature that can be used to reduce programming errors is known as references.

A reference is another name for an object. For example, in this code:

    int i; 

int& ir = i;


ir is another name for i. To see how references are useful, and also how they're implemented, consider writing a function that has two return values to pass back. In ANSI C, we might say:

    void f(int a, int b, int* sum, int* prod) 
{
*sum = a + b;
*prod = a * b;
}

void g()
{
int s;
int p;

f(37, 47, &s, &p);
}


In C++, we would say:

    void f(int a, int b, int& sum, int& prod) 
{
sum = a + b;
prod = a * b;
}

void g()
{
int s;
int p;

f(37, 47, s, p);
}


One way of viewing references is to consider that they have some similarities to C pointers, but with one level of pointer removed. Pointers are a frequent source of errors in C.

A reference must be initialized, and its value (the pointed at object) cannot be changed after initialization. The value of the reference cannot change, but the value of the referenced object can, unless the reference is declared as const. So, for example:

    int i = 0; 
int& ir = i;
ir = -19; // i gets the value -19


is acceptable, while:

    const int& irc = 47; 
irc = -37; // error


is not. A constant reference that points at a value like 47 can be implemented using a temporary.

References are useful in argument passing and return. Another use is illustrated below in the section on writing robust code.

INTRODUCTION TO NAMESPACES - PART 2

In the last issue we discussed one of the problems solved by namespaces, that of conflicting class names. You might be using class libraries from two different vendors, and they both have a String class in them. Using namespaces can help to solve this problem:

    namespace Vendor1 { 
class String { ... };
}

namespace Vendor2 {
class String { ... };
}


How would you actually use the String classes in these namespaces?

There are a couple of common ways of doing so. The first is simply to qualify the class name with the namespace name:

    Vendor1::String s1, s2, s3;


This usage declares three strings, each of type Vendor1::String.

Another approach is to use a using directive:

    using namespace Vendor1;


Such a directive specifies that the names in the namespace can be used in the scope where the using directive occurs. So, for example, one could say:

    using namespace Vendor1; 

String s1, s2, s3;


and pick up the String class found in the Vendor1 namespace.

What happens if you say:

    using namespace Vendor1; 
using namespace Vendor2;


and both namespaces contain a String class? Will there be a conflict? The answer is "no", unless you try to actually use String. The using directive doesn't add any names to the scope, but simply makes them available. So usage like:

    String s1;


would give an error, while:

    Vendor1::String s1; 
Vendor2::String s2;


would still work.

You might have noticed that namespaces have some similarities of notation with nested classes. But namespaces represent a more general way of grouping types and functions. For example, if you have:

    class A { 
void f1();
void f2();
};


then f1() and f2() are member functions of class A, and they operate on objects of class A (via the "this" pointer). In contrast, saying:

    namespace A { 
void f1();
void f2();
}


is a way of grouping functions f1() and f2(), but no objects or class types are involved.

WRITING ROBUST C++ CODE - PART 1

Many of the techniques used in writing robust C code also apply in C++. For example, if you have a function that is supposed to be passed a person's name, as a C-style string, it would be wise to say:

    #include <assert.h> 

void f(char* name)
{
assert(name && *name);

...
}


to perform basic checks on the passed-in pointer. assert() is a function (actually a macro) that checks whether its argument is true (non-zero), and aborts the program if not.

But C++ offers additional opportunities to the designer interested in producing quality code. For example, consider a common problem in C, where vector bounds are not checked during a dereference operation, and a bad location is accessed or written to.

In C++, you can partially solve this problem by defining a Vector class, with a vector dereferencing class member defined for the Vector, and the vector size stored:

#include <stdio.h> 
#include <assert.h>

class Vector {
int len; // number of elements
int* ptr; // pointer to elements
public:
Vector(int); // constructor
~Vector(); // destructor
int& operator[](int); // dereferencing
};

//constructor
Vector::Vector(int n)
{
assert(n >= 1);

len = n; // store length
ptr = new int[n]; // get storage to store elements
assert(ptr);
}

//destructor
Vector::~Vector()
{
delete ptr;
}

//dereferencing
int& Vector::operator[](int i)
{
assert(i >= 1 && i <= len);

return ptr[i - 1]; // return reference to vector slot
}

//driver program
main()
{
int i;
const int N = 10;
Vector v(N);

for (i = 1; i <= N; i++) // correct usage
v[i] = i * i;

for (i = 1; i <= N; i++) // correct usage
printf("%d %d\n", i, v[i]);

v[0] = 0; // will trigger assert failure

return 0;
}


In this example, we create a vector of 10 elements, and the vector is indexed 1..10. If the vector is dereferenced illegally, as in:

    v[0] = 0;


an assertion failure will be triggered.

One objection to this technique is that it can be slow. If every vector reference requires a function call (to Vector::operator[]), then there may be a large performance hit. However, performance concerns can be dealt with by making the dereferencing function inline.

Two other comments about the above example. We are assuming in these newsletters that if operator new() fails, it returns a NULL pointer:

    ptr = new int[n]; 
assert(ptr); // check for non-NULL pointer


The current draft ANSI standard says that when such a failure occurs, an exception is thrown or else a new handler is invoked. Because many C++ implementations still use the old approach of returning NULL, we will stick with it for now.

The other comment concerns the use of references. In the code:

    v[i] = i * i;


the actual code is equivalent to:

    v.operator[](i) = i * i;


and could actually be written this way (see a C++ reference book on operator overloading for details).

Vector::operator[] returns a reference, which can be used on the left-hand side of an assignment expression. In C the equivalent code would be more awkward:

#include <stdio.h> 

int x[10]; // use f() to index into x[10]

int* f(int i)
{
return &x[i - 1];
}

main()
{
*f(5) = 37;

printf("%d %d\n", *f(5), x[4]);

return 0;
}

PERFORMANCE TIPS

In performance tips this issue, we will present a complete C++ class along with its implementation code, which is appended to the end of the discussion.

This C++ example addresses a common problem in applications that use a lot of small strings. The problem has to do with the overhead associated with allocating the string via malloc() (in C) or operator new() (C++). Typically, such overhead is 8-16 bytes per string. And allocating then deallocating many small blocks will tend to fragment memory.

The Copy class deals with the problem by internally allocating large blocks and then shaving off small chunks for individual strings. It keeps track of all the large blocks allocated and deallocates them when a given Copy object is no longer needed. To use this system, you would allocate a Copy object for each major subsystem in your application that uses small strings. For example, at one point in your application, you might need to read in a dictionary from disk and use it for a while. You would allocate a Copy object and then use it to allocate the strings for each word, then flush the strings all at once.

In the application that this class was devised for, implementing string copying in this way saved 50K out of a total available memory pool of 500K. This is with Borland C++, which rounds the number of requested bytes for a string to the next multiple of 16, or an average wastage of 8 bytes. Since the Copy class uses 1024-byte chunks, on average 512 bytes will be wasted for a given set of strings, so the breakeven point would be 512 / 8 = 64 or more strings.

There are many variations on this theme. For example, if you are certain that the strings will never be freed, then you can simply grab a large amount of memory and shave chunks off of it, without worrying about keeping track of the allocated memory. Or if you have many objects of one class, such as tree nodes, you can overload operator new() for that class to do a similar type of thing.

Note that this particular storage allocator is not general. The allocated storage is aligned on 1-byte boundaries. This means that trying to allocate other than char* objects may result in performance degradation or a memory fault (such as "bus error" on UNIX systems). And the performance gains of course decline somewhat with large strings, while the wastage increases from stranding parts of the 1024-byte allocated chunks.

This same approach could be used in C or assembly language, but C++ makes it easier and encourages this particular style of programming.

An example of usage is included. A dictionary of 20065 words with total length 168K is read in. Without use of the Copy class it requires 354K, an 111% overhead. With the Copy class it takes 194K, an overhead of 15%. This is a difference of 160K, or 8 bytes per word. The results will of course vary depending on a particular operating system and runtime library. And the Copy version runs about 20% faster than the conventional version on a 486 PC.

The driver program that is included will work only with Borland C++, so you will need to write some other code to emulate the logic.

#include <string.h> 
#include <assert.h>

const int COPY_BUF = 1024; // size of buffer to get
const int COPY_VEC = 64; // starting size of vector

class Copy {
int ln; // number of buffers in use
int maxln; // max size of vector
char** vec; // storage vector
int freelen; // length free in current
char* freestr; // current free string
public:
Copy(); // constructor
~Copy(); // destructor
char* copy(char*); // copy a string
};

// constructor
Copy::Copy()
{
ln = 0;
maxln = 0;
vec = 0;
freelen = 0;
freestr = 0;
}

// destructor
Copy::~Copy()
{
int i;

// delete buffers

for (i = 0; i < ln; i++)
delete vec[i];

// delete vector itself

if (vec)
delete vec;
}

// copy a string
char* Copy::copy(char* s)
{
int i;
char** newvec;
int len;
char* p;

assert(s && *s);

len = strlen(s) + 1;

// first time or current buffer exhausted?

if (len > freelen) {
if (!vec || ln == maxln) {

// reallocate vector

maxln = (maxln ? maxln * 2 : COPY_VEC);
newvec = new char*[maxln];
assert(newvec);
for (i = 0; i < ln; i++)
newvec[i] = vec[i];
if (vec)
delete vec;
vec = newvec;
}

// allocate new buffer

vec[ln] = new char[COPY_BUF];
assert(vec[ln]);
freelen = COPY_BUF;
freestr = vec[ln];
ln++;
}

// allocate and copy string

freelen -= len;
p = freestr;
freestr += len;
strcpy(p, s);
return p;
}

#ifdef DRIVER

#include <stdio.h>
#include <alloc.h>

main()
{
long cl;
const int MAXLINE = 256;
char buf[MAXLINE];
FILE* fp;
char* s;
#ifdef USE_COPY
Copy co;
#endif

cl = coreleft();
fp = fopen("c:/src/words", "r");
assert(fp);
while (fgets(buf, MAXLINE, fp) != NULL) {
#ifdef USE_COPY
s = co.copy(buf);
#else
s = new char[strlen(buf) + 1];
assert(s);
strcpy(s, buf);
#endif
}
fclose(fp);
printf("memory used = %ld\n", cl - coreleft());

return 0;
}

#endif


-------------------------

Copyright (c) 1995 Glen McCluskey. All Rights Reserved.

This newsletter may be further distributed provided that it is copied in its entirety, including the newsletter number at the top and the copyright and contact information at the bottom.

Glen McCluskey & Associates
Professional C++ Consulting
Internet: glenm@glenmccl.com
Phone: (800) 722-1613 or (970) 490-2462
Fax: (970) 490-2463
FTP: rmii.com /pub2/glenm/newslett (for back issues)
Web: http://www.rmii.com/~glenm

← previous
next →
loading
sending ...
New to Neperos ? Sign Up for free
download Neperos App from Google Play
install Neperos as PWA

Let's discover also

Recent Articles

Recent Comments

Neperos cookies
This website uses cookies to store your preferences and improve the service. Cookies authorization will allow me and / or my partners to process personal data such as browsing behaviour.

By pressing OK you agree to the Terms of Service and acknowledge the Privacy Policy

By pressing REJECT you will be able to continue to use Neperos (like read articles or write comments) but some important cookies will not be set. This may affect certain features and functions of the platform.
OK
REJECT