Saturday, January 22, 2011

Smooth operators, part deux

Time to finish what I started, running through Java's operators. These are the less used and, in my opinion, less useful ones.

Since adding and subtracting one is so common, there are operators just for that.
  • ++ increments by one
  • -- decrements by one
So, for example, if n is 8, then ++n or n++ makes n equal to 9, and --n or n-- makes it 7.

Yeah, you can put the operator in front of the numeric variable or after it. The difference is that, when the operator is in front (pre-increment and pre-decrement), then the expression evaluates as the changed value. When the operator is in the back (post-increment and post-decrement), then the expression evaluates as the original value ... but the change happens after that evaluation. Uh, this is confusing. OK, say n is 8 again.
  • ++n == 9
  • n++ == 8
  • --n == 7
  • n-- == 8
This still might not be clear, so here's a bit more about it.
  • (++n) + 50 == 59, and n becomes 9
  • (n++) + 50 == 58, and n becomes 9 here too
See that? So the choice of pre- or post- depends on how you want to evaluate n in that context. To be honest, most of the time I end up using the post- versions.
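Here's a minimal sketch of those two bullets as runnable code. The class and method names (IncrementDemo, preIncrement, postIncrement) are made up for illustration; each method returns the value of the whole expression along with what n ends up as.

```java
class IncrementDemo {
    // Returns { value of (++n) + 50, value of n afterwards }.
    static int[] preIncrement(int n) {
        int result = (++n) + 50;   // n is bumped first, then used
        return new int[] { result, n };
    }

    // Returns { value of (n++) + 50, value of n afterwards }.
    static int[] postIncrement(int n) {
        int result = (n++) + 50;   // the old n is used, then n is bumped
        return new int[] { result, n };
    }

    public static void main(String[] args) {
        int[] pre = preIncrement(8);
        int[] post = postIncrement(8);
        System.out.println(pre[0] + ", n = " + pre[1]);    // 59, n = 9
        System.out.println(post[0] + ", n = " + post[1]);  // 58, n = 9
    }
}
```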

I should mention that I used parentheses in that last item to avoid any worry about Java doing the wrong thing. If I write x+++y, what does that mean? Is it (x++)+y or x+(++y)? It's the first one, because the compiler always grabs the longest operator it can (++ before +), but I'll just use parentheses and not worry about it.
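If you want to see for yourself how x+++y gets read, here's a small sketch (the class name MaximalMunchDemo is just for illustration). Only x changes, which tells you the expression was treated as (x++)+y.

```java
class MaximalMunchDemo {
    // Returns { sum, x afterwards, y afterwards }.
    static int[] addWithTriplePlus(int x, int y) {
        int sum = x+++y;   // read as (x++) + y
        return new int[] { sum, x, y };
    }

    public static void main(String[] args) {
        int[] r = addWithTriplePlus(3, 4);
        System.out.println(r[0] + " " + r[1] + " " + r[2]);  // 7 4 4
    }
}
```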

Anyway, you never need to use these, because you can obviously just use + 1, but they are handy.

Now I'm going to shift (har har) to talking about bitwise operators. These are for when you want to fiddle and twiddle the very bits inside numeric values.
  • << is for shifting bits left
  • >> is for signed-shifting bits right
  • >>> is for unsigned-shifting bits right
  • & is for bitwise AND
  • | is for bitwise OR
  • ~ is for bitwise NOT (inversion)
  • ^ is for bitwise XOR ... honestly, I forgot they even had this one until I was writing this
Way back in this blog I told you that all Java numeric types are signed (char being the lone unsigned oddball among the integral types). Despite that, Java goes the extra mile and gives you a way to shift bits right in an unsigned mode. When you use >>>, zeroes are always shifted in. When you use >>, the sign bit (leftmost bit) is repeated instead. If this sounds like crazy-talk, then study up on two's complement.
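A quick sketch of the difference between the two right shifts, using a negative int so the sign bit matters. The class and method names here are hypothetical.

```java
class ShiftDemo {
    static int signedRight(int value, int bits)   { return value >> bits; }
    static int unsignedRight(int value, int bits) { return value >>> bits; }
    static int left(int value, int bits)          { return value << bits; }

    public static void main(String[] args) {
        System.out.println(left(1, 4));             // 16
        System.out.println(signedRight(-16, 2));    // -4, the sign bit is repeated
        System.out.println(unsignedRight(-16, 2));  // 1073741820, zeroes shifted in
        System.out.println(Integer.toBinaryString(unsignedRight(-16, 2)));
    }
}
```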

Bitwise operators aren't used very much. Back in the day clever C programmers would use << and >> to multiply and divide by powers of 2, because it translates to faster machine instructions. These days, though, that's too clever; let the compiler figure out how to make things fast. Use bitwise operators in those rare instances when you really need to work with bits.

I'm going to mention right now that the bitwise operators don't shortcut, but the logical ones I discussed last time do. I'll expand on that sometime in the future.
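Until then, here's a rough sketch of what "shortcut" means. It relies on the fact that & also accepts two boolean operands; the counter and method names are invented for the demo. With a false left-hand side, && never bothers to evaluate the right-hand side, while & always does.

```java
class ShortCircuitDemo {
    private static int calls = 0;

    // A right-hand operand with a visible side effect.
    private static boolean touch() {
        calls++;
        return true;
    }

    // How many times touch() runs when evaluating left && touch().
    static int callsWithLogicalAnd(boolean left) {
        calls = 0;
        boolean ignored = left && touch();
        return calls;
    }

    // How many times touch() runs when evaluating left & touch().
    static int callsWithNonShortCircuitAnd(boolean left) {
        calls = 0;
        boolean ignored = left & touch();
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(callsWithLogicalAnd(false));          // 0
        System.out.println(callsWithNonShortCircuitAnd(false));  // 1
    }
}
```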

The last set of operators left to discuss are the assignment operators. These exist because this sort of thing is really common.
x = x + y;
Instead of typing all of that out, you can use the corresponding assignment operator.
x += y;
This is another case like the increment and decrement operators, where you don't need the operators but they can be handy in certain situations. Use them only if they don't make the code less readable. Anyway, the following operators have corresponding assignment operators.
  • +, -, *, /, and % become +=, -=, *=, /=, and %=
  • &, ^, |, <<, >>, and >>> become &=, ^=, |=, <<=, >>=, and >>>=
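Here's a small sketch of a few of these in action (names are hypothetical). One wrinkle worth knowing: the assignment operators quietly cast the result back to the type of the left-hand variable, so b += amount compiles for a byte even though b = b + amount would not without an explicit cast.

```java
class CompoundAssignDemo {
    // b += amount behaves like b = (byte) (b + amount).
    static byte addToByte(byte b, int amount) {
        b += amount;
        return b;
    }

    // Runs a value through several assignment operators in a row.
    static int applyAll(int x) {
        x += 3;    // 5 becomes 8
        x *= 2;    // 16
        x %= 7;    // 2
        x <<= 2;   // 8
        return x;
    }

    public static void main(String[] args) {
        System.out.println(addToByte((byte) 10, 5));   // 15
        System.out.println(addToByte((byte) 127, 1));  // -128, wrapped by the hidden cast
        System.out.println(applyAll(5));               // 8
    }
}
```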

And now, here at the end, I must take back some of what I said before, because there is one more operator that I must mention, which is used all the time, and that operator is =.
x = 1
Wait, that's not ... how is that ... huh?

Yeah, you use the assignment operator, plain old =, to make zillions of Java statements, but the expression created by it also evaluates to something. Its value is whatever is on the right side of the assignment. Consider this bit of code.
int x = 7;
System.out.println (x = 9);
This actually compiles and prints 9.

Having explained this to you, I need to urge you not to do this a lot. It's weird looking and creepy. It looks like I meant to write x == 9 and print out the result of a comparison. The only context where I think this use of the evaluated value of an assignment makes sense is in a chained assignment like this.
int x, y, z;
x = y = z = 0;
Hey! You made it! Nice work. You should feel pretty empowered now, because you've covered all the Java operators, even the weird ones, so you can bust out whatever you need to perform your calculations.

Monday, January 17, 2011

Smooth operators, part 1

I think I'm turning completist in my postings here. I want to have something covering each aspect of Java, including the boring stuff. So, let's get Java's operators out of the way. Maybe I can tease up something interesting about them ...

I'll cover them in my particular order, but I want to remind you that if you're ever in doubt about which operator has precedence over another, use parentheses in your expressions. Chances are you'll be helping someone else make sense of your expressions later, and they will be thankful.

First, basic math.
  • + is for adding
  • - is for subtracting
  • * is for multiplying
  • / is for dividing
Stunning, right?

From what I've seen, Java will happily overflow (or underflow) using even these basic operators. So, adding Integer.MAX_VALUE + 1 will spin you around to Integer.MIN_VALUE without complaint. Whether this is a good thing or not is up for debate.
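Here's a sketch of that wraparound. The overflows() helper assumes Java 8 or later, where Math.addExact throws an ArithmeticException instead of wrapping; the class and method names are my own.

```java
class OverflowDemo {
    // Plain int addition: wraps around silently on overflow.
    static int wrappingAdd(int a, int b) {
        return a + b;
    }

    // Assumes Java 8+: Math.addExact refuses to wrap and throws instead.
    static boolean overflows(int a, int b) {
        try {
            Math.addExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(wrappingAdd(Integer.MAX_VALUE, 1) == Integer.MIN_VALUE);  // true
        System.out.println(overflows(Integer.MAX_VALUE, 1));                         // true
        System.out.println((long) Integer.MAX_VALUE + 1);                            // 2147483648
    }
}
```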

Java likes doing integer math using int values and floating-point math using double values. If you try to mix and match types, Java will promote the narrower operand to the wider type of the other. This can help you out of some overflow situations, but you should still be careful. Knowing the type promotion rules is a good idea, but I'm not going to talk about them now.

Dividing integers, by the way, always gives you an integer result. So, 11 / 4 equals the integer 2. You get the remainder with:
  • % is for mod (modulo)
That means 11 % 4 equals the integer 3, which is the remainder of eleven divided by four. If you want the more exact floating-point answer, cast at least one of the operands to a float or double (eh, use double) first.
((double) 11) / ((double) 4)
See what I mean about parentheses? I can't remember whether casting binds more tightly to the numbers than the division operator. So I use parentheses and don't worry about it.
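For the record, here's a sketch that pins these down (hypothetical class and method names). It also shows that a cast does bind more tightly than division, so (double) a / b converts a first and then divides.

```java
class DivisionDemo {
    static int quotient(int a, int b)  { return a / b; }
    static int remainder(int a, int b) { return a % b; }

    // The cast applies to a alone; b is then promoted to double for the division.
    static double exact(int a, int b)  { return (double) a / b; }

    public static void main(String[] args) {
        System.out.println(quotient(11, 4));    // 2
        System.out.println(remainder(11, 4));   // 3
        System.out.println(exact(11, 4));       // 2.75
        System.out.println(quotient(-11, 4));   // -2, integer division truncates toward zero
        System.out.println(remainder(-11, 4));  // -3, the remainder takes the dividend's sign
    }
}
```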

Let's continue with some unary operators. That means they only take one operand.
  • + is for, well, don't change the sign, which is terribly exciting and I don't think I've ever used this
  • - is for negating (sign flipping)
  • ! is for logical NOT
That unary plus, gosh, I forgot it was there. Check this out: +42 is the same as 42. I guess if you want to emphasize that a number's sign is to be left alone, then it's nice to have it.

The logical NOT operator only works on boolean values, while all the rest I've covered so far work on numeric primitive values (and, as of Java 5, their wrapper types, thanks to auto-unboxing). Also, the binary + works on String values by concatenating them.

Speaking of operators for Boolean values:
  • && is for logical AND
  • || is for logical OR
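  • ^ is for logical XOR when both operands are Booleans (there's no short-circuit ^^ version, because XOR always needs to look at both sides)
Again, && and || only work on Booleans, so you can't treat any other type as a Boolean like some other languages allow.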

Let's get into relational operators, which render Boolean results.
  • == is for equality; for objects that really means identity (the same reference), not the kind of equality Object.equals() lets you define
  • != is for inequality
  • < and > are for less than and greater than
  • <= and >= are for less than or equals and greater than or equals
  • instanceof is for type checking
Most of these are pretty obvious. The == and != operators can work on pretty much any type, while < and > and their variants are for numeric types.

The instanceof operator checks whether something is an "instance of" some class, meaning of that exact class or a subtype. For example, these expressions are all true.
"foo" instanceof String
"foo" instanceof Object
new Integer (42) instanceof Number
I've used the ternary operator a couple of times already in past postings. You can think of it as a mini version of an if-then-else statement. It has three expressions: an initial one, one for when the initial one is true, and one for when the initial one is false. The initial expression has got to evaluate to a Boolean value, and the other two have to match in their type but can otherwise be whatever type you want.

This is way easier to just explain by example.
int x = "foo" instanceof String ? 42 : 43;  // x becomes 42
score = x + (answer.equals ("Deuteronomy") ? 10 : 0);
The ternary operator is useful, but overusing it makes your code hard to read. If you have to do some wacky text formatting to make your expressions not look like gobbledy-gook, think about using a boring if-then-else instead.

I think that's enough operators for now. These are the most common ones you'll use by far. In my next post I'll cover the less used ones and, in my opinion, ones you may want to avoid.

Wednesday, January 12, 2011

Hashing things out

Now that I talked about equals(), I can discuss its partner in crime, the hashCode() method. This is another method on java.lang.Object that is therefore available on every Java object and, like equals(), you'll often want to override it with your own implementation.

In fact, the rule is: if you override equals(), you need to override hashCode() as well. I'll explain why.

First, though, you need to understand how hash tables work. If you don't, I suggest perusing the Wikipedia article for hash table, at least as a starting point.

The purpose of the hashCode() method is to provide the index of the bucket within a hash table where an object should go. At least in Java, it's not necessary for you to work in how big the hash table is when computing the code. For a hash table of size n, an object's hash index can be o.hashCode() modulo n.
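As a rough sketch of that last sentence (not how java.util.HashMap actually does it internally), here's a bucket index computed from a hash code. Hash codes can be negative, and the plain % operator would then give a negative index, so this version uses Math.floorMod, which assumes Java 8 or later. The names are hypothetical.

```java
class BucketIndexDemo {
    // Maps any object to a bucket in the range 0 .. tableSize - 1.
    static int bucketIndex(Object o, int tableSize) {
        return Math.floorMod(o.hashCode(), tableSize);
    }

    public static void main(String[] args) {
        System.out.println(bucketIndex(Integer.valueOf(35), 16));  // 3
        System.out.println(bucketIndex(Integer.valueOf(-7), 16));  // 9, whereas -7 % 16 is -7
    }
}
```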

One reason why the hashCode() method is part of Object is that it's just so darn useful. Hash tables show up everywhere in computing. Java's own data structures use hash tables internally. So it is convenient to have the method available all the time for any object.

A more important reason is that the concept of a hash code is so closely tied to the concept of equality. If two objects are equal, then they must reduce down to the same hash code. Said in Java terms, if a.equals (b) is true, then a.hashCode() == b.hashCode() must also be true. That's why whenever you override equals() for a class, you have to override hashCode() as well, so that they continue to agree.

All right, so why do two equal objects need to return the same hash code? Let's take the highly useful class java.util.HashSet as an example. This class implements a set of objects. A set cannot contain an object more than once. Sets cannot have duplicates.

This particular class implements its set storage by using an internal hash table. When you add something to a HashSet, that object is stored in the hash table, inside the bucket determined by its hashCode() value. That makes it fast and easy to find again, especially so that the HashSet can stop you from successfully adding the object a second time.

A HashSet, like almost every other set implementation in Java, decides that it already contains some object if it finds an equal object within it. That is, equal as in equals(). So here are the steps for adding an object to a HashSet, laid out.
  1. Compute the hash code "c" of the object to be added.
  2. Find the bucket corresponding to "c" in the hash table.
  3. Check every object in the bucket. If any one is equal to the object according to equals(), don't add it. Otherwise, add it.
If two equal objects return different hash codes, then it will be possible to add them both to the same set (because step 2 can target the wrong bucket). That violates the "semantics" for Java sets, and will pretty much screw everything up for you.
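To watch it go wrong, here's a sketch with a deliberately broken class. BadKey is made up for this demo: its equals() compares ids, but its hashCode() hands every instance a different number. The set happily takes two "equal" objects.

```java
import java.util.HashSet;
import java.util.Set;

class BrokenHashDemo {
    // Hypothetical class that breaks the rule on purpose.
    static final class BadKey {
        private static int nextHash = 1;
        private final int id;
        private final int hash = nextHash++;   // a different hash code per instance

        BadKey(int id) { this.id = id; }

        @Override public boolean equals(Object other) {
            return other instanceof BadKey && ((BadKey) other).id == id;
        }

        @Override public int hashCode() { return hash; }
    }

    // Adds two equal keys to a HashSet and reports how many it kept.
    static int sizeAfterAddingTwoEqualKeys() {
        Set<BadKey> set = new HashSet<>();
        set.add(new BadKey(42));
        set.add(new BadKey(42));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(new BadKey(42).equals(new BadKey(42)));  // true
        System.out.println(sizeAfterAddingTwoEqualKeys());          // 2, a "set" with duplicates
    }
}
```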

It's important to mention that it is still OK for two unequal objects to return the same hash code. A collision like this is good to avoid, since it makes hash tables less effective, but it's not the end of the world. Those two unequal objects will just end up in the same bucket in hash tables.

The default implementation of hashCode() doesn't look at your class's fields at all. Like the default equals(), it is tied to object identity, so two separate instances will almost always get different hash codes, no matter what they hold. That's consistent as long as you keep the default equals(), where only identical objects are equal. However, once you redefine equals() to compare the "significant" fields of your class, two separate instances can be equal and still have different default hash codes. If that happens, you won't be able to use instances of your class in hash tables.

So, when you override equals(), you need to override hashCode() to compute its value based only on those significant fields. There are recipes for creating hashCode() methods, and I'll plug Effective Java by Joshua Bloch again as a great resource. Heck, some Java IDEs will even write them for you. Here is an example for the StringPair class we created in the last post, and since I'm hardcore I wrote it by hand.
@Override public int hashCode() {
    int c = 17;
    c = 37 * c + (s1 != null ? s1.hashCode() : 0);
    c = 37 * c + (s2 != null ? s2.hashCode() : 0);
    return c;
}
Since both string fields s1 and s2 are involved in equating StringPair objects, they both appear here. The important thing to notice is that the hash code for StringPair is calculated based on the hash codes of the String fields within it. As long as String.hashCode() follows the rules (which it does), then this implementation will render the same value for two objects with equal strings.
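To tie it together, here's a sketch that puts the equals() and hashCode() methods from these two posts into one class and drops a few instances into a HashSet. The wrapper class StringPairHashDemo and the distinctCount() helper are just scaffolding for the demo.

```java
import java.util.HashSet;
import java.util.Set;

class StringPairHashDemo {
    static final class StringPair {
        private final String s1;
        private final String s2;

        StringPair(String s1, String s2) {
            this.s1 = s1;
            this.s2 = s2;
        }

        @Override public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof StringPair)) return false;
            StringPair o = (StringPair) other;
            if (!(s1 != null ? s1.equals(o.s1) : o.s1 == null)) return false;
            if (!(s2 != null ? s2.equals(o.s2) : o.s2 == null)) return false;
            return true;
        }

        @Override public int hashCode() {
            int c = 17;
            c = 37 * c + (s1 != null ? s1.hashCode() : 0);
            c = 37 * c + (s2 != null ? s2.hashCode() : 0);
            return c;
        }
    }

    // Counts how many of the given pairs a HashSet treats as distinct.
    static int distinctCount(StringPair... pairs) {
        Set<StringPair> set = new HashSet<>();
        for (StringPair p : pairs) {
            set.add(p);
        }
        return set.size();
    }

    public static void main(String[] args) {
        StringPair a = new StringPair("foo", "bar");
        StringPair b = new StringPair("foo", "bar");
        StringPair c = new StringPair("foo", "baz");
        System.out.println(a.hashCode() == b.hashCode());  // true
        System.out.println(distinctCount(a, b, c));        // 2
    }
}
```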

Monday, January 10, 2011

All things equal

In my last post I described the equals() method of the root Java class java.lang.Object. In this post I'll delve into some of the details of this important method. I'll start by restating and expanding on some of what I said last time.

What does it mean for two objects to be equal? For primitive types, the answer is easy, as it usually boils down to just math. Objects make this question more complex, though. They have types, and they are complex, that is, they can be composed of more than one piece of data.

Generally, two objects can only be equal if they are of the same type. So, a String object can only ever be equal to a String object. But what about derived classes? For example, if I have an Employee object for "Bob" as well as a Manager object for "Bob", and Manager derives from Employee, can the two objects possibly be equal?

The answer depends on how you can determine equality between Manager objects. If the procedure is the same as for any sort of Employee (that is, you'll just use the same procedure to compare Manager objects as you do Employee objects), then you'll be fine. But, if you need to consider extra information specific to Manager objects when comparing them, then you'll be in trouble. More on that in a bit.

When a class has multiple fields, then it's a good rule of thumb that two instances of that class can be considered equal if all of their corresponding fields are themselves "equal". If you imagine a StreetAddress class with fields for house number, street, city, and state (OK, it only does houses, sorry apartment dwellers), then two instances of that class should only be equal if they have the same values for house number, street, city and state.

An even better rule of thumb, though, is that two instances of a class can be considered equal if all of their corresponding "interesting" fields are themselves equal. It's up to you as the class designer to figure out what is interesting for the comparison. Our Employee class may be set up with an ID field that has a guaranteed unique value for each employee. You would certainly want to then define equals() for that class to only check the IDs of two instances being compared. Checking the rest would be superfluous, as long as the IDs are set properly.

So back to the type question. An implementation of Employee.equals() like this should serve well for subtypes like Manager as well, so you could just run with it. But imagine that you need to compare additional "interesting" fields for Manager instances to determine equality. How can you compare an Employee instance correctly then, since it lacks that data? You pretty much can't. You've broken one of the rules for implementing equals().

Ah, the rules. They make a lot of sense, really. Here they are, straight out of the Java API.
  • An object must always be equal to itself. (reflexive)
  • If object A equals object B, object B equals object A. (symmetric)
  • If A equals B and B equals C, then A equals C. (transitive)
  • If A equals B now, A equals B later unless you change one of those "interesting" fields. (consistent)
  • An object is never equal to null.

Not following these rules will lead you to a world of hurt, because you will be using an unusual concept for "equality". You can see that adding those extra "interesting" fields to the Manager class would make equals() asymmetric with respect to Employee objects.
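Here's a sketch of that asymmetry. These Employee and Manager classes are stripped-down stand-ins I made up, with a hypothetical department field playing the role of the extra "interesting" data.

```java
class AsymmetryDemo {
    static class Employee {
        final int id;

        Employee(int id) { this.id = id; }

        @Override public boolean equals(Object other) {
            if (!(other instanceof Employee)) return false;
            return ((Employee) other).id == id;
        }

        @Override public int hashCode() { return id; }
    }

    static class Manager extends Employee {
        final String department;   // extra "interesting" field

        Manager(int id, String department) {
            super(id);
            this.department = department;
        }

        @Override public boolean equals(Object other) {
            if (!(other instanceof Manager)) return false;
            Manager m = (Manager) other;
            return super.equals(m) && department.equals(m.department);
        }
    }

    public static void main(String[] args) {
        Employee bob = new Employee(7);
        Manager bobTheBoss = new Manager(7, "Accounting");
        System.out.println(bob.equals(bobTheBoss));  // true, only the ids are compared
        System.out.println(bobTheBoss.equals(bob));  // false, bob is not a Manager
    }
}
```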

The default implementation for equals() satisfies all of the rules, but it is usually way too strict. Consider this class, which holds a pair of strings.
package gjd;
public class StringPair {
    private String s1;
    private String s2;
    public StringPair (String s1, String s2) {
        this.s1 = s1;
        this.s2 = s2;
    }
    public String getString1() { return s1; }
    public String getString2() { return s2; }
    public static void main (String args[]) {
        StringPair p1 = new StringPair (args[0], args[1]);
        StringPair p2 = new StringPair (args[0], args[1]);
        System.out.println (p1);
        System.out.println (p2);
        System.out.println (p1.equals (p2));
    }
}
When I run this class with arguments "foo" and "bar", I get this.
gjd.StringPair@8813f2
gjd.StringPair@d58aae
false
Even though the two objects hold the same data, they aren't equal. You can see that the toString() method kind of stinks here too; it gives us the class name followed by the object's hash code in hexadecimal, which looks like a memory address but isn't guaranteed to be one. The default equals() method says the two instances are not equal because they are not the exact same object, that is, they are not identical. By default, only identical objects are equal. You have to do extra work to allow two separate object instances to be equal.

Here is that work for this class.
@Override public boolean equals (Object other) {
    if (this == other) return true;
    if (!(other instanceof StringPair)) return false;
    StringPair o = (StringPair) other;
    if (!(s1 != null ? s1.equals (o.s1) : o.s1 == null)) return false;
    if (!(s2 != null ? s2.equals (o.s2) : o.s2 == null)) return false;
    return true;
}
There is some new stuff here.

  • The @Override is an annotation which tells the Java compiler that I'm overriding the equals() method of the Object class. It's optional, but highly recommended.
  • The instanceof operator tells you whether an object is an instance of some class. The object can be of that exact class or of a subclass.
  • The exclamation point in the second if condition means "not". So, the condition evaluates to true if the object passed in is not a StringPair object.
  • The (StringPair) thing there is called a cast. It gives you a different type of reference to an object. Here, I am casting "other", which we said was an Object, to the StringPair class and storing the narrower reference in the variable "o". We know this is OK to do because the instanceof operator already checked the actual type of "other" for us.
  • The two lines before return true verify that the corresponding fields of the two objects are themselves equal. I'm going to have to leave it to you to tear those lines apart for now. Find out about the "ternary operator" in the meantime. Sorry about that. I just don't want to get too far off-topic right now.
The overall structure of this equals() method follows a recipe given in the book Effective Java by Joshua Bloch. You should add that book to your Amazon wish list. If we study the method for a little bit, we can confirm that it follows the rules for equals().
  • Is it reflexive? Yup, the first line guarantees that.
  • Is it symmetric? Sure, it doesn't matter which object the method is called on and which one is passed in.
  • Is it transitive? It is, because it depends on String equality being transitive, and it is.
  • Is it consistent? Yes. Provided that each object's s1 and s2 fields don't change, equals() will keep returning the same thing.
  • Does it prevent equality with null? It actually does, because if you pass in null, the instanceof check fails. The null value is like school on Thanksgiving: no class.
Now I run the class, and:
gjd.StringPair@8813f2
gjd.StringPair@d58aae
true
Good deal. Despite all I've written here, there are still other nuances of the equals() method that you'll need to learn. Do some research, and especially look into good Java books like Effective Java to find out more. Next time, I'll delve into the hashCode() method, the little brother of equals().

Thursday, January 6, 2011

The root of it all

When I divulged the truth about inheritance in Java, I revealed that all objects in Java ultimately descend from a single base class, java.lang.Object. It's worth checking out what this class is all about since, after all, what it provides runs through every object in Java.

Object doesn't offer up any fields for derived classes, but there are a handful of methods that are available. Some are great as they are; others you may want to override; and others are best shunned. First let's talk about the ones that are great as they stand.

The getClass() method returns a java.lang.Class instance that represents the class of an object. This may turn your brain inside out a little, so let me explain. There is a class shipped with Java whose name is Class. An instance of Class houses information about some specific Java class: maybe String, or Integer, or whatever. Even Object and, yes, Class itself, have their Class instances. These instances are useful for comparing the types of objects in some cases, and also for a lot of work with reflection, where you find out at runtime what's up with your classes. That's sort of an advanced topic, but at least right now you know it's there.
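A small sketch of getClass() at work (the helper names are my own). Note that comparing Class instances checks for the exact class, which is stricter than instanceof.

```java
class GetClassDemo {
    static String className(Object o) {
        return o.getClass().getName();
    }

    // True only when o's runtime class is exactly c, not a subclass of it.
    static boolean exactlyOfClass(Object o, Class<?> c) {
        return o.getClass() == c;
    }

    public static void main(String[] args) {
        Object n = Integer.valueOf(42);
        System.out.println(className("foo"));               // java.lang.String
        System.out.println(exactlyOfClass(n, Integer.class)); // true
        System.out.println(exactlyOfClass(n, Number.class));  // false, though n instanceof Number is true
        System.out.println(className(String.class));         // java.lang.Class
    }
}
```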

The methods notify(), notifyAll(), and wait() (in three forms) are all useful for doing multithreaded computation. Again, an advanced topic, but here's an overview. A thread can decide to pause for a while (or indefinitely) and wait for a signal to be delivered to potentially get going again. The thread calls a wait() method on some particular object of its choosing. Later, some other thread decides it wants to kick start one or all threads in such a paused state, so it uses either notify() or notifyAll() to wake one or all of them up. These methods are but one way to perform thread synchronization in Java.
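Here's a bare-bones sketch of one thread waiting and another notifying. Everything in it (the class, the ready flag, the method names) is invented for illustration. Two details I haven't covered: a thread has to hold the object's lock, via synchronized, to call wait() or notifyAll() on it, and the wait goes in a loop because a thread can wake up without having been notified.

```java
class WaitNotifyDemo {
    private final Object lock = new Object();
    private boolean ready = false;
    private String message;

    // Blocks the calling thread until some other thread calls publish().
    String awaitMessage() throws InterruptedException {
        synchronized (lock) {
            while (!ready) {     // re-check the condition after every wakeup
                lock.wait();
            }
            return message;
        }
    }

    // Stores a message and wakes up every thread waiting on the lock.
    void publish(String text) {
        synchronized (lock) {
            message = text;
            ready = true;
            lock.notifyAll();
        }
    }

    // Starts a second thread that publishes, then waits for its message.
    static String runOnce(final String text) {
        final WaitNotifyDemo demo = new WaitNotifyDemo();
        Thread producer = new Thread(new Runnable() {
            public void run() {
                demo.publish(text);
            }
        });
        producer.start();
        try {
            String received = demo.awaitMessage();
            producer.join();
            return received;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnce("wake up"));  // wake up
    }
}
```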

Now let's talk about the Object methods you may want to override.

The equals() method is a big deal. It determines whether one object instance is equal to another, for some meaning of "equal". This is conceptually distinct from two objects being identical, that is, actually being the same bits in memory somewhere. You could have, for example, an Employee class with an ID field, and decide that every actual real-life employee has a unique ID. If that's the case, the equals() method for Employee can simply check the ID fields of two Employee objects to see if they match; if so, then you know they are "equal", regardless of whatever else they hold. That check is good enough.

The default implementation for equals() returns true only if two objects are identical. This is often overly restrictive, so you can override the method to be more intelligent about things.

The hashCode() method returns an integer value corresponding to an object. This is pretty much for supporting hash tables and similar data constructs. The rule is that two objects that are equal according to equals() must yield the same hash code. If this doesn't happen, then hash tables don't work right. The art of crafting equals() and hashCode() methods for your own classes is deserving of its own post, some other time.

The toString() method returns a string corresponding to an object. This is usually for debugging purposes, so the string should be human-readable. You can use it to render a more machine-readable specification, but in my opinion you should define a separate method for that, since you're getting into issues of parsing and such. The default implementation of toString() is not great, although it does tell you what class an object is. Definitely override this one.
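As a sketch, here's a tiny made-up Point class with its own toString(), next to a helper that rebuilds what the default implementation produces: the class name, an @ sign, and the hash code in hexadecimal.

```java
class ToStringDemo {
    // Hypothetical class with a human-readable toString().
    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override public String toString() {
            return "Point(" + x + ", " + y + ")";
        }
    }

    // Rebuilds the string that Object's own toString() returns.
    static String defaultStyle(Object o) {
        return o.getClass().getName() + "@" + Integer.toHexString(o.hashCode());
    }

    public static void main(String[] args) {
        Object plain = new Object();
        System.out.println(plain);                 // java.lang.Object@ followed by some hex digits
        System.out.println(defaultStyle(plain));   // the same text as the line above
        System.out.println(new Point(3, 4));       // Point(3, 4)
    }
}
```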

Finally, let's talk about the black sheep methods of Object.

First, the finalize() method is sort of like a C++ destructor, except it turns out that you can't tell when or if it ever gets called. Prevailing opinion in the Java world is that you should never use this method for anything important. I don't recall ever using it for anything I've written. There are some cases where finalize() is useful, but think hard before you go there.

And now, lastly, the poor clone() method. The design of object cloning in Java is considered broken beyond hopes of repair, and you can tell why as I describe this method.

The clone() method creates a copy of an object. So, Employee.clone() creates a new Employee object that is often, but doesn't have to be (*sigh*), equal to the original object. The default implementation of the method does a "shallow copy" of the original, copying over field contents as is, which is sometimes OK but can cause trouble depending on what sort of fields your class has. So, if you want to support cloning of your own objects, you may need to override this thing. Oh, and your object has to implement the interface Cloneable, because the JVM won't let you clone an object that doesn't. And Object doesn't implement Cloneable.
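Here's a sketch of the shallow copy biting you. Roster is a made-up class that implements Cloneable and leans on the default clone(); the original and the copy end up sharing the very same list.

```java
import java.util.ArrayList;
import java.util.List;

class CloneDemo {
    // Hypothetical class that relies on Object's shallow clone().
    static final class Roster implements Cloneable {
        final List<String> names = new ArrayList<>();

        @Override public Roster clone() {
            try {
                return (Roster) super.clone();   // copies the list reference, not the list
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);     // can't happen, we implement Cloneable
            }
        }
    }

    // Adds a name to the clone and checks whether the original sees it too.
    static boolean originalSeesChangeMadeToClone() {
        Roster original = new Roster();
        Roster copy = original.clone();
        copy.names.add("Bob");
        return original.names.contains("Bob");
    }

    public static void main(String[] args) {
        System.out.println(originalSeesChangeMadeToClone());  // true
    }
}
```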

"What were they thinking?!", you might ask. "Why did they put a method in a base class that always fails? Why is its default implementation wrong for many classes I might write? Why do I need to deal with this interface?" Folks smarter than me have studied this tragedy at length, and it boils down to good intentions gone wrong. Fortunately, there are alternatives for doing object copying in Java, which I can cover at a later date.

Hopefully this posting has put you on a more solid basis with understanding the Java object hierarchy, by highlighting the root class over all of Java. Remember that its facilities are always available for you for fundamental, common tasks.

Tuesday, January 4, 2011

Stringing along

In my last couple of posts I gave some love to Java's primitive types. Now I want to talk about a single class, java.lang.String. There's a lot more to this class than just "Hello world".

First off, you already know how to define a string:
String s = "Hello world";
C and C++ veterans may be wondering about null termination, so I'll tell you now that the concept doesn't carry over. The string consists only of its constituent characters, plus the rest of the typical Object stuff. If you insert a null character ('\u0000') into a string, it just sits there in the string and doesn't mean anything particularly special.
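A quick sketch to prove the point (class and method names are made up): a null character in the middle of a string counts toward the length like any other character, and everything after it is still there.

```java
class NullCharDemo {
    // "Hel", then a null character, then "lo".
    static String withEmbeddedNull() {
        return "Hel\u0000lo";
    }

    public static void main(String[] args) {
        String s = withEmbeddedNull();
        System.out.println(s.length());       // 6
        System.out.println(s.indexOf("lo"));  // 4
        System.out.println(s.endsWith("lo")); // true
    }
}
```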

There are lots of handy methods on the String class. Here is but a sample.
String s = "Hello world";
System.out.println (s.charAt (0));  // H
System.out.println (s.contains ("lo"));  // true
System.out.println (s.endsWith ("rld"));  // true
System.out.println (s.indexOf ('l'));  // 2
System.out.println (s.lastIndexOf ('l'));  // 9
System.out.println (s.length());  // 11
System.out.println (s.replace ("Hello", "Hi"));  // Hi world
System.out.println (s.startsWith ("Hel"));  // true
System.out.println (s.substring (3, 5));  // lo
System.out.println (s.toLowerCase());  // hello world
System.out.println (s.toUpperCase());  // HELLO WORLD
As you peruse the API for String, you may notice that there aren't any methods for changing the characters in a string, for modifying a string. This is intentional, because like the wrapper classes String is immutable. Once its characters are set they cannot be changed. For example, the replace() method above doesn't change s, it returns a new string with the replacement performed.
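Here's that replace() behavior as a minimal sketch, with hypothetical names. The original string comes back untouched.

```java
class ImmutableStringDemo {
    // Returns { the original string after the call, the string replace() returned }.
    static String[] replaceAndKeep(String s) {
        String replaced = s.replace("Hello", "Hi");
        return new String[] { s, replaced };
    }

    public static void main(String[] args) {
        String[] result = replaceAndKeep("Hello world");
        System.out.println(result[0]);  // Hello world
        System.out.println(result[1]);  // Hi world
    }
}
```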

At some point I'll talk in depth about immutability, and then I can cover its benefits. For now, check out my post on the wrapper classes for some discussion on that.

A common mistake in Java involves string comparison. Check this out:
String s = "Hello world";
System.out.println (s == "Hello world");  // might not be true
System.out.println (s.equals ("Hello world"));  // true
It's usually wrong to compare strings for equality using the == comparison operator. How come?

In Java, objects are treated as references to data, like pointers in C or C++. So in the code above, s contains not the string "Hello world" itself, but sort of a reference to a memory location where the string lives. This is in contrast to primitive types, where the values are stored directly.

Depending on how a string is created, its variable may be pointing to a different copy of its characters than some other string variable referencing the same characters. They might or might not share the same copy. It's OK (great, actually) if they do, but it's not required. For that reason, using == can fail, because that compares the references to the strings, not the strings themselves.

In the code above, == happens to work, because I used a string literal on both ends, and Java uses one "canonical" (interned) copy for identical literals. But, if I had read in the string for s from some file, say, then it probably wouldn't work.
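Here's a sketch that makes == fail on purpose, without needing a file. The string is assembled at runtime with a StringBuilder, so it lives in its own object even though the characters match the literal. The names are hypothetical.

```java
class StringIdentityDemo {
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // Builds "Hello world" at runtime, as a new String object.
    static String buildAtRuntime() {
        return new StringBuilder("Hello").append(" world").toString();
    }

    public static void main(String[] args) {
        String literal = "Hello world";
        String built = buildAtRuntime();
        System.out.println(sameReference(literal, "Hello world"));   // true, same interned literal
        System.out.println(sameReference(literal, built));           // false, different objects
        System.out.println(literal.equals(built));                   // true
        System.out.println(sameReference(literal, built.intern()));  // true, intern() finds the canonical copy
    }
}
```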

So, get in the habit of using equals() to compare strings. It's the convention anyway, and others looking at your code will thank you.

Combining strings is easy.
String s = "Hello" + " " + "world";
Yeah, Java lets you use the + operator for string concatenation. Java doesn't let you do operator overloading / overriding like C++, but they did hook us up with this, at least.

Now, while this is great, it can cause a lot of trouble. This loop, for example, will have pretty horrendous performance.
String s = "";
for (int i = 0; i < someBigNumber; i++) {
  s = s + Integer.toString (i);
  s = s + " ";
}
The problem lies in how Java does string concatenation using the + operator. It kind of goes like this:
  1. Make a new "buffer" for the concatenated string.
  2. Copy the first string into the buffer.
  3. Copy the second string into the buffer.
  4. Convert the buffer contents into a string.
While this is perfectly reasonable for one-off concatenations, you can see that in a loop it requires tons of buffer creations and tons of copies, and the copies get longer for each iteration. Normally you don't try to optimize your code until late in the game, but this is a noteworthy exception. The way to fix this is to use the java.lang.StringBuilder class, or java.lang.StringBuffer if you are using Java 1.4 or older.
StringBuilder buff = new StringBuilder();
for (int i = 0; i < someBigNumber; i++) {
  buff.append (Integer.toString (i));
  buff.append (" ");
}
String s = buff.toString();
Here, only one buffer is ever created, and there is far less copying, since the builder only copies its contents when it needs to grow. It's way way faster. Any good static analysis tool (FindBugs is my fave) should detect these cases for you, but it's better never to write them.

So this is a pretty decent introduction to Java strings. There are other issues to consider, but knowing how to create them, compare them for equality, and properly concatenate them is more than enough to get you moving.