Java Newsletter by Glen McCluskey - Issue 2
Issue #002
March, 1996
Contents
- Comparing C/C++ With Java Part 2 - Sizes of Primitive Types
- Chars, Unicode, and File I/O
- A Way of Doing Class Initialization
- What Happens When You Output a Character?
- An Annotated Example of Java Usage
- Interfacing to an Applet
INTRODUCTION
In this issue we continue to introduce the Java language. The centerpiece of the issue is a substantial annotated program example. We will also talk about Java I/O in several contexts, and show an interesting technique for doing class initialization.
COMPARING C/C++ WITH JAVA PART 2 - SIZES OF PRIMITIVE TYPES
If you've used C or C++ at all you will be familiar with common fundamental types like char, int, and double. Java also has these types, but with a couple of twists.
The first new angle is that the types are of uniform size across all Java implementations. Specifically, sizes in bits are:
boolean N/A
byte 8
char 16
short 16
int 32
long 64
float 32
double 64
The boolean type is not integral and so no size is listed. It can have the values true and false. The character type is 16 bits using the Unicode character set, more about which below.
The advantages of uniform sizes are obvious. Even today it is still very easy to stumble across code that is non-portable because someone assumed that an "int" would hold more than 16 bits (it doesn't on most PCs) or that a long and a pointer are the same size.
Java has no sizeof() operator like C and C++. With uniform data sizes, and the compiler handling the details of computing the size of space needed for an allocation statement like:
long x[] = new long[189];
there is not the same need for such an operator.
One drawback to this approach is that if the size of a data type is not a "natural" fit with the underlying hardware, some penalties in performance can be expected. For example, it's true today that the natural size for long is 32 bits on many machines, and requiring that such a type be 64 bits may result in slower code. But with the rapid pace of change in hardware, this concern isn't that significant, especially when weighed against the benefits of uniform sizes.
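The sizes in the table above can be checked from a running program. What follows is only a sketch, and it assumes a JDK recent enough to provide the SIZE constants on the wrapper classes (Byte.SIZE, Character.SIZE, and so on), which were added to the library after this issue was written. The class name Sizes and the method table() are made up for illustration.

```java
// Prints the size in bits of each numeric primitive type.
// Assumption: the wrapper-class SIZE constants are available
// (they are not part of the JDK this newsletter describes).
class Sizes {
    // Builds the table as text, one "type bits" pair per line.
    static String table() {
        StringBuilder sb = new StringBuilder();
        sb.append("byte ").append(Byte.SIZE).append('\n');
        sb.append("char ").append(Character.SIZE).append('\n');
        sb.append("short ").append(Short.SIZE).append('\n');
        sb.append("int ").append(Integer.SIZE).append('\n');
        sb.append("long ").append(Long.SIZE).append('\n');
        sb.append("float ").append(Float.SIZE).append('\n');
        sb.append("double ").append(Double.SIZE).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(table());
    }
}
```

There is no entry for boolean, matching the table: the library offers no size constant for it.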
CHARS, UNICODE, AND FILE I/O
In the last section we mentioned that a char in Java is 16 bits, stored as two bytes. The high byte is typically 0 (as it is for ASCII text), and several of the Java library classes and methods allow one to specify the high byte. Here's an example of byte and character I/O that illustrates some of these points, in a file "uni.java":
import java.io.*;
public class uni {
    public static void main(String args[])
    {
        InputStream istr = null;
        try {
            istr = new FileInputStream("testfile");
        }
        catch (FileNotFoundException e) {
            System.err.println("*** file not found ***");
            System.exit(1);
        }
        try {
            int b;
            String s = "";
            while ((b = istr.read()) != -1) {
                s += (char)b;
            }
            System.out.print(s);
        }
        catch (IOException e) {
            // report a read failure instead of silently ignoring it
            System.err.println("*** read error ***");
            System.exit(1);
        }
        System.exit(0);
    }
}
In this example, we attempt to open a file input stream to an input file "testfile", catching an exception and bailing out if the open fails (more about exceptions below). Note that we don't close the file explicitly. That is left to something akin to a C++ destructor, a method called finalize() that is invoked when the object is garbage collected. Because there is no guarantee of when, or whether, garbage collection will run, calling close() explicitly is the more dependable habit. We will talk about this area at some point; the semantics of resource cleanup and freeing are different in Java because of delayed object destruction.
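For comparison, here is a sketch of the same read loop with the stream closed explicitly rather than left to finalize(). It is an illustration under stated assumptions, not a replacement for uni.java: the class name CloseDemo and the method slurp() are made up, an in-memory stream stands in for "testfile" so that the example runs anywhere, and StringBuilder comes from a later JDK than the one this issue describes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class CloseDemo {
    // Reads every byte from the stream into a String, one char per byte,
    // and closes the stream before returning, whether or not reading
    // succeeded.
    static String slurp(InputStream istr) {
        StringBuilder sb = new StringBuilder();
        try {
            int b;
            while ((b = istr.read()) != -1) {
                sb.append((char) b);
            }
        } catch (IOException e) {
            System.err.println("*** read error ***");
        } finally {
            try {
                istr.close();
            } catch (IOException ignored) {
                // nothing more can be done if close() itself fails
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for new FileInputStream("testfile").
        InputStream istr = new ByteArrayInputStream("abc".getBytes());
        System.out.println(slurp(istr));
    }
}
```

The finally clause runs on both the normal and the exceptional path, which is what makes the cleanup independent of garbage collection.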
Then we read bytes from the file using the read() method. The bytes are returned as ints, so that -1 can be used to indicate end of file (C has a similar trick with EOF). We take each int (byte) and cast it to a character and append it to a String object that we'd initialized to the empty string. Finally, we print the string.
A String object has a sequence of characters in it, and we have converted the input bytes that were read into characters and shoved them into the string. Since characters are Unicode, we have converted a sequence of input bytes into Unicode.
But it's not quite this easy. Casting to a character implicitly supplies a 0 to fill the high byte of the character, and code that relies on this is not very portable. A better way to express the line:
s += (char)b;
would be:
byte x[] = {(byte)b};
s += new String(x, 0);
In other words, build an array of bytes and construct a String from them, with the high byte fill value explicitly specified.
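The String(byte[], int) constructor used above was deprecated in later JDK releases in favor of constructors that name a character encoding. The sketch below assumes such a later JDK, one that provides java.nio.charset.StandardCharsets. ISO-8859-1 is chosen because it maps each byte to the character with the same numeric value, which amounts to a high byte of 0. The class name ByteDecode and the method latin1() are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class ByteDecode {
    // Converts bytes to a String, stating the encoding explicitly.
    // Under ISO-8859-1 each byte 0..255 becomes the char with that value.
    static String latin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] x = {(byte) 0x41, (byte) 0xE9};   // 'A' and e-acute
        String s = latin1(x);
        System.out.println((int) s.charAt(0));   // prints 65
        System.out.println((int) s.charAt(1));   // prints 233
    }
}
```

Naming the encoding makes the assumption about the input visible in the code, which is the same goal the explicit high byte serves in the original line.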
We will be saying more about Java I/O in the future. The Java library has a variety of classes and methods for dealing with input and output of various types. The I/O example shown above illustrates a way of doing low-level input. There are higher-level mechanisms available in the library.
A WAY OF DOING CLASS INITIALIZATION
In C++ one can use constructors to initialize class object instances when they're created, and employ static data members that are initialized when the program starts. But what if you'd like some code to be executed once for a given class, to set things up for the class? One way of doing this in Java is to say:
public class A {
    static {
        System.out.println("got to static initialization");
    }
    public static void main(String args[])
    {
        System.out.println("start of main()");
        A c1 = new A();
        A c2 = new A();
    }
}
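No matter how many instances of class A are created, the block of code at the top is executed only once, when the class is first initialized. Here A is also the class that contains main(), so the block runs before main() starts and its message is printed ahead of "start of main()". The block serves as a hook for doing one-time class setup.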
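One way to see the "only once" behavior is to count. In the sketch below, which uses the made-up class name Registry and two made-up counters, the static block increments one counter and the constructor increments another.

```java
class Registry {
    static int initCount = 0;       // times the static block has run
    static int instanceCount = 0;   // objects constructed so far

    // Runs once, when the class is initialized.
    static {
        initCount++;
    }

    // Runs once per object created.
    Registry() {
        instanceCount++;
    }

    public static void main(String[] args) {
        new Registry();
        new Registry();
        new Registry();
        System.out.println("static block runs: " + initCount);      // 1
        System.out.println("instances created: " + instanceCount);  // 3
    }
}
```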
WHAT HAPPENS WHEN YOU OUTPUT A CHARACTER?
The technique shown in the previous section has one very important use. When you say:
System.out.println("x");
what happens? It's interesting to trace through the sequence of operations used to output a character.
In the first place, System is a class defined in the Java library. It is a wrapper class that you do not actually create object instances of, nor may you derive from the System class, because it is declared as "final". In C++ such a class is sometimes referred to as a "static global class".
System.out is defined as:
public static PrintStream out;
meaning that it's available to all and that there is only one object instance of PrintStream for "out". This PrintStream stream corresponds to standard output, kind of like file descriptor 1 in UNIX, stdout in C, or cout in C++. Similar streams are established for input and standard error output.
The output stream is initialized via a static initialization block of the type illustrated above. The actual code is:
out = new PrintStream(new BufferedOutputStream(
new FileOutputStream(FileDescriptor.out), 128), true);
This is a mouthful that says that a PrintStream is based on a BufferedOutputStream (with a 128-byte buffer), which is in turn based on a FileOutputStream with a specified file descriptor, and that output is line buffered (the final argument, true, requests a flush whenever a newline is written).
Saying:
System.out.println("xxx");
means that you're invoking the println(String) method for a PrintStream. Doing so immediately results in the sequence:
PrintStream.print("xxx");
PrintStream.write('\n');
PrintStream.print("xxx") contains a loop that iterates over the characters in the String ("xxx" is a String, not an array of characters), calling PrintStream.write() for each. PrintStream.write() calls out.write(), implementing line buffering as it goes.
What is out.write()? When the output stream was initialized, we created a PrintStream object and said that it should be based on a BufferedOutputStream. "out" is an instance variable of a class FilterOutputStream from which PrintStream derives ("extends"), and out is set to reference a BufferedOutputStream object. In a similar way, BufferedOutputStream is based on FileOutputStream.
out.write() in BufferedOutputStream collects characters into a buffer (whose size was specified in the creation line illustrated above). When the buffer becomes full, out.flush() is called. This results in a different write() being called, this time in the FileOutputStream class. It writes a sequence of bytes to the file descriptor specified when the stream was created. This last method is native, that is, it is implemented in C or assembly language and not in Java code itself.
This approach to I/O is quite flexible and powerful, and names like "stream nesting" and "stream filtering" are used to describe it. It's not a terribly efficient approach, however, especially since Java itself is interpreted and many of the higher layers of the system are written in Java itself.
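The same nesting can be built by hand. The sketch below layers a PrintStream on a BufferedOutputStream with a 128-byte buffer, as in the initialization code quoted earlier, but substitutes an in-memory ByteArrayOutputStream for the FileOutputStream so that the bytes reaching the bottom layer can be inspected. The class name StreamNesting and the method capture() are made up for illustration.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class StreamNesting {
    // Sends text through PrintStream -> BufferedOutputStream -> sink
    // and returns whatever arrived at the sink.
    static String capture(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream ps =
            new PrintStream(new BufferedOutputStream(sink, 128), true);
        ps.println(text);   // autoflush pushes the line down to the sink
        ps.flush();         // explicit flush, in case nothing was pending
        return sink.toString();
    }

    public static void main(String[] args) {
        String got = capture("xxx");
        // The captured text is "xxx" plus the platform line separator.
        System.out.println(got.length() + " characters reached the sink");
    }
}
```

Swapping the bottom layer while leaving the upper layers untouched is the flexibility that the term "stream nesting" refers to.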
One other note: when trying to figure out just what methods are called in an example like the one in this section, it's helpful to use the profiling feature of JDK:
$ java -prof xxx
This shows called methods, who called them, and how many times they were called.
AN ANNOTATED EXAMPLE OF JAVA USAGE
Here is a longer example of a complete Java program (not an applet). This program does simple expression evaluation, so for example, input of:
(1 + 2) * (3 + 4)
yields a value of 21.
If you're not familiar with this sort of programming, similar to what is found in language compilers themselves, a brief explanation is in order. The program takes input and splits it into what are called tokens, logical chunks of input. For the input above, the tokens are:
(
1
+
2
)
*
(
3
+
4
)
and the white space is elided. Then the program tries to make sense of the stream of input tokens. It implicitly applies a grammar:
expr -> term | expr [+-] term
term -> fact | term [*/] fact
fact -> number | ( expr )
Don't worry too much if you don't understand this. It's a way of describing the structure of input. You can think of it as a way of converting an input expression into the Reverse Polish Notation that some older calculators used.
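To make the Reverse Polish Notation remark concrete: the expression (1 + 2) * (3 + 4) corresponds to the RPN sequence 1 2 + 3 4 + *, in which each operator follows its two operands. That sequence was worked out by hand for this note. The sketch below evaluates such a sequence with a stack. It is separate from the calculator program, and the class name Rpn and the method eval() are made up; it also assumes a JDK with generics and java.util.ArrayDeque, which postdate this issue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class Rpn {
    // Evaluates a space-separated RPN expression.
    // Malformed input makes pop() throw NoSuchElementException.
    static double eval(String rpn) {
        Deque<Double> stack = new ArrayDeque<Double>();
        for (String tok : rpn.trim().split("\\s+")) {
            if (tok.equals("+") || tok.equals("-")
                    || tok.equals("*") || tok.equals("/")) {
                double right = stack.pop();   // second operand is on top
                double left = stack.pop();
                switch (tok.charAt(0)) {
                    case '+': stack.push(left + right); break;
                    case '-': stack.push(left - right); break;
                    case '*': stack.push(left * right); break;
                    default:  stack.push(left / right); break;
                }
            } else {
                stack.push(Double.parseDouble(tok));
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        System.out.println(eval("1 2 + 3 4 + *"));   // prints 21.0
    }
}
```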
Here is the actual program, in a file "calc.java". We will have more to say about this program in the next section below. Annotations are given in /* */ comments, while regular program comments use //. (Note: we're not trying to do anything fancy with comments for JavaDoc purposes, a subject to be presented another time).
import java.io.*;
public class calc {
    private String in_line;                 // input line
    private int in_len;                     // input line length
    private int currpos;                    // position in line
    /*
    The input line, its length, and the current position in it.
    */
    private byte curr_tok;                  // current token
    private int val_token;                  // value if num
    /*
    The current token and its value if it's a number.
    */
    private boolean had_err;                // error in parsing
    /*
    Used to record whether a parsing error occurred on the input.
    Exception handling could also be used for this purpose, and
    is used for another type of error (divide by 0).
    */
    private static final byte T_NUM = 1;    // token values
    private static final byte T_LP = 2;
    private static final byte T_RP = 3;
    private static final byte T_PLUS = 4;
    private static final byte T_MINUS = 5;
    private static final byte T_MUL = 6;
    private static final byte T_DIV = 7;
    private static final byte T_EOF = 8;
    private static final byte T_BAD = 9;
    /*
    Possible token values. These are private (available only to the
    class), static (shared across all class object instances), and
    final (constant).
    */
    // get next token from input line
    private void get_token()
    {
        // skip whitespace
        while (currpos < in_len) {
            char cc = in_line.charAt(currpos);
            /*
            in_line.charAt(currpos) returns the current character from
            the string.
            */
            if (cc != ' ' && cc != '\t')
                break;
            currpos++;
        }
        // at end of line?
        if (currpos >= in_len) {
            curr_tok = T_EOF;
            return;
        }
        // grab token
        char cc = in_line.charAt(currpos);
        currpos++;
        if (cc == '+' || cc == '-')
            curr_tok = (cc == '+' ? T_PLUS : T_MINUS);
        else if (cc == '*' || cc == '/')
            curr_tok = (cc == '*' ? T_MUL : T_DIV);
        else if (cc == '(' || cc == ')')
            curr_tok = (cc == '(' ? T_LP : T_RP);
        /*
        This block of code could also be handled via a switch statement
        or in a couple of other ways.
        */
        else if (Character.isDigit(cc)) {
            int n = Character.digit(cc, 10);
            while (currpos < in_len) {
                cc = in_line.charAt(currpos);
                if (!Character.isDigit(cc))
                    break;
                currpos++;
                n = n * 10 + Character.digit(cc, 10);
            }
            val_token = n;
            curr_tok = T_NUM;
            /*
            The above code grabs a number. Character.isDigit(char) is a method
            of the Character class that returns true if the character is a
            digit. Character.digit(char, int) converts a character to a number
            for a given number base (10 in this case).
            The primitive types like char have corresponding class types, though
            you cannot call a method directly on a primitive type object. You
            must instead use the techniques illustrated here.
            */
        }
        else {
            curr_tok = T_BAD;
        }
        /*
        The case where the token can't be recognized.
        */
    }
    // constructor, used to set up the input line
    public calc(String s)
    {
        in_line = s;
        in_len = in_line.length();
        currpos = 0;
        had_err = false;
        get_token();
    }
    /*
    The constructor sets up an object instance for doing calculations. We
    set up the input line, clear any error condition, and grab the first
    token.
    */
    // addition and subtraction
    private double expr()
    {
        // get first term
        double d = term();
        // additional terms?
        while (curr_tok == T_PLUS || curr_tok == T_MINUS) {
            byte t = curr_tok;
            get_token();
            if (t == T_PLUS)
                d += term();
            else
                d -= term();
        }
        return d;
    }
    /*
    This and the next method are similar. They grab a term() or fact()
    and then check to see if there are more of them. This matches input
    like:
        1 + 2 + 3 + 4 ...
    As each token is consumed, another one is grabbed.
    */
    // multiplication and division
    private double term()
    {
        // get first factor
        double d = fact();
        // additional factors?
        while (curr_tok == T_MUL || curr_tok == T_DIV) {
            byte t = curr_tok;
            get_token();
            if (t == T_MUL)
                d *= fact();
            else {
                double d2 = fact();
                if (d2 == 0.0 && !had_err)
                    throw new ArithmeticException();
                d /= d2;
                /*
                This code is similar to expr() above but we check for division by 0
                and throw an arithmetic exception if we find it. We will see below
                where this exception is handled.
                */
            }
        }
        return d;
    }
    // numbers and parentheses
    private double fact()
    {
        double d;
        // numbers
        if (curr_tok == T_NUM) {
            d = val_token;
            get_token();
        }
        /*
        If a number, retrieve the value stored in val_token.
        */
        // parentheses
        else if (curr_tok == T_LP) {
            get_token();
            d = expr();
            if (curr_tok != T_RP) {
                had_err = true;
                d = 0.0;
            }
            get_token();
        }
        /*
        If (, then grab the expression inside and check for ). If not found,
        record that we had an error. We could also throw an exception at this
        point.
        */
        // garbage
        else {
            had_err = true;
            get_token();
            d = 0.0;
        }
        /*
        The token was not recognized, so we have bad input.
        */
        return d;
    }
    // parse input and get and print value
    public String get_value()
    {
        double d;
        try {
            d = expr();
        }
        catch (ArithmeticException ae) {
            return "*** divide by 0 ***";
        }
        if (had_err || curr_tok != T_EOF)
            return "*** syntax error ***";
        else
            return String.valueOf(d);
        /*
        Here is where we actually try to get the value of the expression. We
        convert its value back to a String for reasons of flexibility in
        handling error conditions.
        Division by 0 will result in an exception being thrown and caught
        here.
        If we encountered an error, or if we've not exhausted the input string
        (for example, for input "((0)))"), then we also flag an error.
        Otherwise, we return the string value of the double using the method
        String.valueOf(double).
        */
    }
    // get a line of input from the keyboard
    private static String getline()
    {
        DataInput di = new DataInputStream(System.in);
        String inp;
        try {
            inp = di.readLine();
        }
        catch (IOException ignored) {
            inp = null;
        }
        /*
        This is a wrapper function to get a line of input from the keyboard.
        */
        return inp;
    }
    // driver
    public static void main(String args[])
    {
        String inp = "";
        // command line arguments
        if (args.length > 0) {
            for (int i = 0; i < args.length; i++)
                inp = inp + args[i];
            calc c = new calc(inp);
            System.out.println(c.get_value());
            /*
            If there are command-line arguments, we will append them into one
            string using the "+" operator and then evaluate the value of the
            expression. args.length is the number of command-line arguments, and
            args[i] is the i-th argument.
            The line:
                calc c = new calc(inp);
            creates a new calc object and calls its constructor with inp as the
            String argument to the constructor.
            c.get_value() returns the expression value as a String.
            */
        }
        // no command line arguments, prompt user
        else {
            for (;;) {
                System.out.print("input string: ");
                System.out.flush();
                /*
                We flush output here because it's normally line buffered and we've not
                output a newline character.
                */
                inp = getline();
                if (inp == null)
                    break;
                /*
                End of input.
                */
                calc c = new calc(inp);
                System.out.println(c.get_value());
            }
        }
    }
}
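A note on getline(): DataInputStream.readLine() was deprecated in later JDK releases, and BufferedReader.readLine() is the usual replacement. The sketch below shows a wrapper with the same contract, returning null at end of input or on an I/O error. It is an illustration only and not part of calc.java. The class name LineInput is made up, and a StringReader stands in for the keyboard so that the example runs without interaction.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class LineInput {
    // Returns the next line without its terminator, or null at end of
    // input or on an I/O error, like getline() in the calculator.
    static String getline(BufferedReader br) {
        try {
            return br.readLine();
        } catch (IOException ignored) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Hypothetical canned input in place of the keyboard.
        BufferedReader br =
            new BufferedReader(new StringReader("(1 + 2) * (3 + 4)\n"));
        System.out.println(getline(br));   // prints (1 + 2) * (3 + 4)
        System.out.println(getline(br));   // prints null
    }
}
```

For keyboard input, the reader would instead be built as new BufferedReader(new InputStreamReader(System.in)), created once and reused for every line.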
INTERFACING TO AN APPLET
Suppose that we want to take the above calculator program and call it from an applet. How would we do this? Here's a simple example of an applet that will interface with the calculator code.
import java.awt.*;
public class applet extends java.applet.Applet {
    public void paint(Graphics g)
    {
        String input_expr = getParameter("input_expr");
        calc c = new calc(input_expr);
        String out = c.get_value();
        g.drawString("Input = " + input_expr, 25, 50);
        g.drawString("Value = " + out, 25, 75);
    }
}
This is similar to the applet illustrated in the last issue, save for the lines:
String input_expr = getParameter("input_expr");
calc c = new calc(input_expr);
String out = c.get_value();
The last two of these we saw in the example above. The first line illustrates how one can get parameters passed to the applet from HTML code, kind of similar to command-line parameters. The corresponding HTML to run this applet would be:
<html>
<head>
<title>Interface to Calculator Applet</title>
</head>
<body>
<applet code="applet.class" width=150 height=150>
<param name=input_expr value="1/2/3*4">
</applet>
</body>
</html>
This HTML is similar to that illustrated in newsletter #001, save for the line:
<param name=input_expr value="1/2/3*4">
which actually passes in the parameter value. When this applet is executed, the result will be something like:
Input = 1/2/3*4
Value = 0.666667
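The value follows from the loop in term(), which folds factors in from left to right, so 1/2/3*4 is treated as ((1/2)/3)*4 rather than, say, 1/(2/(3*4)). The sketch below repeats that arithmetic step by step. The class name LeftAssoc and the method value() are made up for illustration, and the number of digits printed depends on the JDK's double-to-String conversion.

```java
class LeftAssoc {
    // Applies the operators of 1/2/3*4 in the order term() does.
    static double value() {
        double d = 1.0;
        d /= 2.0;   // 0.5
        d /= 3.0;   // one sixth
        d *= 4.0;   // two thirds
        return d;
    }

    public static void main(String[] args) {
        System.out.println(value());
    }
}
```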
ACKNOWLEDGEMENTS
Thanks to Thierry Ciot, Mike Paluka, and Alan Saldanha for help with proofreading.
SUBSCRIPTION INFORMATION / BACK ISSUES
To subscribe to the newsletter, send mail to majordomo@world.std.com with this line as its message body:
subscribe java_letter
Back issues are available via FTP from:
rmii.com /pub2/glenm/javalett
or on the Web at:
-------------------------
Copyright (c) 1996 Glen McCluskey. All Rights Reserved.
This newsletter may be further distributed provided that it is copied in its entirety, including the newsletter number at the top and the copyright and contact information at the bottom.
Glen McCluskey & Associates
Professional Computer Consulting
Internet: glenm@glenmccl.com
Phone: (800) 722-1613 or (970) 490-2462
Fax: (970) 490-2463
FTP: rmii.com /pub2/glenm/javalett (for back issues)
Web: http://www.rmii.com/~glenm