Saturday, October 29, 2011

Making Exceptions

The idea of exceptions goes back before Java, but Java embraced the idea more than any language before it.

The fundamental problem exceptions solve is how to send error information back to the caller of a function / method / whatever. Before exceptions were prevalent, there were a couple of preferred solutions (among others). First, the function could return an error value instead of a "normal" answer; for example, a function that normally returns a non-negative number could return -1 if there was some problem. Second, the function could set some reserved global variables with error codes and messages and so on. Both solutions have their own problems. Among them is that the human doing the programming has to remember to check for error conditions. And, because humans are often lazy, rushed, or sloppy, it's easy for those checks to not get done.

Java provides two kinds of exceptions, and one of them, called checked exceptions, cannot be ignored. If a method "throws" a checked exception back to its caller, the calling code must either deal with the exception or throw it on up to its caller. Either option requires the programmer to explicitly do something: either write some code to deal with the exception, or have the method declare that it throws the exception. The programmer is forced to confront the possibility of an error. Overall, this leads to more robust code.

Let's say we have a method for opening a file that can throw java.io.IOException, which is a checked exception. The method definition could look like this:

public java.io.File open (String name) throws java.io.IOException {
  // ...
}

Any code that calls the method must deal with the possibility of an IOException popping out. It can just pass the exception on, throwing it itself:

public void doSomethingWithAFile() throws java.io.IOException {
  // ...
  // ... create a FileOpener object with our open method in it ...
  java.io.File myFile = fileOpener.open (theFileName);
  // ...
}

The other option is to handle it itself. This involves a "try-catch" block.

public void doSomethingWithAFile() {
  // ...
  try {
    java.io.File myFile = fileOpener.open (theFileName);
    // ...
  } catch (java.io.IOException exc) {
    // ... do something here, like logging or recovering ...
  }
  // ...
}

The code that could throw exceptions goes into the code block after "try". After the try block comes a "catch" block for the exception that should be caught there. It's fine for a catch block to throw some other exception; in fact, that's a well-known tactic, and a good idea a lot of the time.
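Here's a minimal sketch of that tactic: the catch block translates a low-level IOException into an exception that means something to the caller, and keeps the original as the "cause" so no information is lost. ConfigException and ConfigLoader are hypothetical names made up for illustration, not library classes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical higher-level (checked) exception for this illustration.
class ConfigException extends Exception {
    ConfigException(String message, Throwable cause) {
        super(message, cause); // keep the original exception as the cause
    }
}

class ConfigLoader {
    // Reads a whole config file, translating IOException into ConfigException.
    static String load(Path path) throws ConfigException {
        try {
            return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        } catch (IOException exc) {
            throw new ConfigException("could not read config " + path, exc);
        }
    }
}
```

Callers of load() now only have to think about ConfigException, but getCause() still leads back to the underlying I/O problem when someone needs to debug it.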

I said Java provides two kinds of exceptions: checked and unchecked. The difference is that you don't need to explicitly deal with an unchecked exception. If you don't catch it, it automatically propagates up to your code's caller, and so on up the call stack; if nothing catches it, the thread it was thrown in dies (in a simple single-threaded program, that stops execution). Methods that throw unchecked exceptions don't even have to declare that they throw them.
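To see where an uncaught exception ends up, here's a small sketch that runs a task on its own thread and uses the standard Thread.setUncaughtExceptionHandler hook to record what escaped. UncaughtDemo and runAndCapture are hypothetical names for this illustration. The worker thread dies, but the calling thread carries on.

```java
import java.util.concurrent.atomic.AtomicReference;

class UncaughtDemo {
    // Runs the task on its own thread; returns the exception that escaped it, or null.
    static Throwable runAndCapture(Runnable task) {
        AtomicReference<Throwable> escaped = new AtomicReference<>();
        Thread worker = new Thread(task);
        // Invoked when an exception propagates out of run() without being caught.
        worker.setUncaughtExceptionHandler((t, e) -> escaped.set(e));
        worker.start();
        try {
            worker.join(); // wait for the worker thread to finish (or die)
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        // This thread is still running, whatever happened to the worker.
        return escaped.get();
    }
}
```

For example, runAndCapture(() -> { throw new IllegalStateException("boom"); }) hands back that IllegalStateException instead of taking the whole program down.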

Let's add a line to our "open" method to make sure the name passed in isn't null. If it is, we'll throw the unchecked IllegalArgumentException. Since that exception is unchecked, it doesn't have to be mentioned in the method definition.

public java.io.File open (String name) throws java.io.IOException {
  if (name == null) {
    throw new IllegalArgumentException ("null name");
  }
  // ...
}

Calling code can still catch IllegalArgumentException, but it doesn't have to in order to compile and run.

Unchecked exceptions obviously make code less safe, that is, more prone to stopping completely when something goes wrong. So why have them? Why aren't all Java exceptions checked?

Unchecked exceptions are supposed to be used for situations that are not "recoverable". A programmer who opts for throwing a checked exception is saying that calling code may be able to deal with the error that led to the exception, or should at least try to. On the other hand, throwing an unchecked exception implies that there isn't much calling code can do, usually, so it's expected to just let execution stop. Calling code can, of course, still catch unchecked exceptions, but it's not required.

All that said, the majority of Java developers (myself not included) seem to hate checked exceptions. Historically they were overused, forcing developers to deal with a lot of conditions that are either truly unrecoverable or not important to deal with at all. Prime example:

package java.io;

public class FileInputStream extends InputStream {
  // ...
  public void close() throws IOException {
  // ...
  }
  // ...
}

A FileInputStream is used to read binary data from a file. When you're done reading from the stream, you need to close it so that the system can release the resources associated with the file. The close() method throws a checked exception, yet you almost never care if a stream fails to close once you're through with it. Dealing with this exception leads to all sorts of unnecessary, whacked-out try-catch gymnastics. (Java 7 promises to help, but still.)
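For comparison, here's a sketch of the pre-Java 7 gymnastics next to the try-with-resources statement that Java 7 introduced. The task (reading the first byte of a file) and the method names are made up for illustration.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class CloseDemo {
    // Pre-Java 7 style: close() needs its own try-catch inside a finally block.
    static int firstByteOld(String fileName) throws IOException {
        InputStream in = null;
        try {
            in = new FileInputStream(fileName);
            return in.read();
        } finally {
            if (in != null) {
                try {
                    in.close();
                } catch (IOException ignored) {
                    // nothing useful to do if close() fails
                }
            }
        }
    }

    // Java 7 and later: the stream is closed automatically when the block exits.
    static int firstByteNew(String fileName) throws IOException {
        try (InputStream in = new FileInputStream(fileName)) {
            return in.read();
        }
    }
}
```

One difference worth knowing: try-with-resources doesn't silently swallow a failing close(). It propagates that exception, or attaches it as a "suppressed" exception if the body already threw one.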

Unfortunately, some in the Java community overreacted and abandoned checked exceptions completely. For example, the Spring and Hibernate libraries mostly throw unchecked exceptions, even for conditions that warrant checked exceptions. The argument is that it makes programmers' jobs easier. No doubt you can write less code because you don't need try-catch blocks or "throws" clauses on your own methods, but the temptation to ignore the errors returns, and the code overall is less robust.

What's called for is a more balanced approach. Use checked and unchecked exceptions judiciously, balancing convenience with robustness.

Saturday, March 26, 2011

Immutability and builders

Once you get to a certain point in working with Java, you always keep in mind how your code will work in a multi-threaded environment. It isn't a strange thing to think about: servlets run with threads, as do EJBs and Swing applications. Even if you are not dealing with threads now, you could be in the future, or someone else might try to apply your code to threads.

One of the easiest ways to handle the question "Will this work with threads?" is to make your classes immutable. The state of an immutable object cannot be changed after its construction. If an object is immutable, then there is no chance that threads will see different state in the object when they shouldn't. Immutability also leads to a simpler API for your class and overall more predictable behavior.

Here's a typical immutable class.
public class USAddress {
  private final String streetAddress;
  private final String city;
  private final String state;

  public USAddress (String a, String c, String s) {
    streetAddress = a;
    city = c;
    state = s;
  }
  public String getStreetAddress() { return streetAddress; }
  public String getCity() { return city; }
  public String getState() { return state; }
}
The fields in this class are all final, which means two things: first, each can be assigned only once; second, each must be assigned by the time the constructor finishes. These properties of final help make the class immutable.

I didn't include implementations of equals() or hashCode(), but I'll just say that they would depend on the fields in the class. Immutability implies that the hash code of a USAddress object never changes, which is great because it will never get lost in a hash table. For some objects, if there are a lot of computations involved for generating a hash code, you could just calculate it once and cache it internally for speed.
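Here's a sketch of how those two methods might look for the three-field USAddress, including the compute-once hash. It leans on the standard java.util.Objects helpers; the cachedHash field name is just illustrative. The cache is a plain, non-final int because it's filled in lazily, which is harmless here: every thread would compute the same value from the same unchanging fields. (String caches its hash code in much the same way.)

```java
import java.util.Objects;

final class USAddress {
    private final String streetAddress;
    private final String city;
    private final String state;
    private int cachedHash; // 0 means "not computed yet"

    USAddress(String a, String c, String s) {
        streetAddress = a;
        city = c;
        state = s;
    }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof USAddress)) return false;
        USAddress other = (USAddress) o;
        return Objects.equals(streetAddress, other.streetAddress)
            && Objects.equals(city, other.city)
            && Objects.equals(state, other.state);
    }

    @Override public int hashCode() {
        int h = cachedHash;
        if (h == 0) {
            // The fields never change, so computing this once is enough.
            h = Objects.hash(streetAddress, city, state);
            cachedHash = h;
        }
        return h;
    }
}
```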

One downside to immutable classes is that they must have all their data passed to them on construction. Let's expand the class above and see what happens. I'm going to leave out the getter methods.
public class USAddress {
  private final String streetAddressLine1;
  private final String streetAddressLine2;
  private final String city;
  private final String state;
  private final String zipCode;
  private final boolean isPostOfficeBox;
  private final boolean isAptOrCondo;

  public USAddress (String a1, String a2, String c, String s, String z,
      boolean p, boolean ac) {
    streetAddressLine1 = a1;
    streetAddressLine2 = a2;
    city = c;
    state = s;
    zipCode = z;
    isPostOfficeBox = p;
    isAptOrCondo = ac;
  }
  // ... getters ...
}
The problem is the constructor. It takes five strings in a row and then two booleans in a row, so it's easy to mix up the order of the parameters and forget which one is which.
USAddress a = new USAddress ("1234 Elm Street", "Apt. 56",
    "Springfield", "MA", "01103", false, true);
Quick, what do the two booleans mean again?

So you can imagine that some classes can have even more fields, with a variety of types, and working with their constructors gets ridiculous. Fortunately, there is a design pattern that can help you out: the Builder pattern.

A builder is a class that builds instances of another class. You use it instead of calling that class's constructor directly. Here is a builder example.
public class USAddressBuilder {
  final String streetAddressLine1;
  String streetAddressLine2 = null;
  final String city;
  final String state;
  String zipCode = null;
  boolean isPostOfficeBox = false;
  boolean isAptOrCondo = false;

  public USAddressBuilder (String a, String c, String s) {
    if (a == null) {
      throw new IllegalArgumentException ("null street address");
    }
    if (c == null) {
      throw new IllegalArgumentException ("null city");
    }
    if (s == null) {
      throw new IllegalArgumentException ("null state");
    }
    this.streetAddressLine1 = a;
    this.city = c;
    this.state = s;
  }
  public USAddressBuilder streetAddressLine2 (String a) {
    this.streetAddressLine2 = a; return this;
  }
  public USAddressBuilder zipCode (String z) {
    this.zipCode = z; return this;
  }
  public USAddressBuilder isPostOfficeBox (boolean p) {
    this.isPostOfficeBox = p; return this;
  }
  public USAddressBuilder isAptOrCondo (boolean ac) {
    this.isAptOrCondo = ac; return this;
  }

  public USAddress build() {
    return new USAddress (this);
  }
}
Let's tear this one down.
  • This builder has the same fields as the class it builds. Some of the fields—those that are required—are final, but the optional ones aren't. The optional ones even get default values.
  • The builder's constructor only takes in the fields that are required.
  • To set the optional fields, you call specific builder methods for them. These methods return the builder again, so you can chain calls together (see below).
  • The build() method constructs a USAddress object from the builder data. The constructor (not shown) simply copies the builder fields into the object.
Here's how the builder is used.
USAddress a =
    new USAddressBuilder ("1234 Elm St.", "Springfield", "MA")
    .streetAddressLine2 ("Apt. 56").zipCode ("01103")
    .isAptOrCondo (true).build();
This code is longer, but it's much easier to read. Other nice things you get:
  • You can leave out fields that aren't important (like isPostOfficeBox).
  • The logic for constructing the class is mostly moved over to the builder. This is nice for classes that have lots of other stuff in them.
  • The builder has the option of reusing objects that it already constructed. If you ask for an object with the same data, and those objects are immutable, the builder could just as well hand back an instance it made earlier. This can save on memory usage. (For more, check out another design pattern, Flyweight.)
  • The builder has the option of sending back a subclass instance. Imagine an AddressBuilder that returns Address objects; if you use a builder and pass in US-style address information, the builder can send you back a specific subclass of Address that specializes in US address data.
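The reuse idea can be sketched with a small cache behind a factory method; a builder's build() method could consult a cache the same way. This is a simplified illustration, assuming a hypothetical immutable ZipCode class. It isn't a full Flyweight implementation, and a real cache would also need to think about how big it's allowed to grow.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical immutable value class used only for this illustration.
final class ZipCode {
    private static final Map<String, ZipCode> CACHE = new ConcurrentHashMap<>();
    private final String digits;

    private ZipCode(String digits) { this.digits = digits; }

    // Hands back a previously built instance when the data matches.
    static ZipCode of(String digits) {
        return CACHE.computeIfAbsent(digits, ZipCode::new);
    }

    String digits() { return digits; }
}
```

Because ZipCode is immutable, sharing one instance among every caller who asks for "01103" is safe.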
There are some downsides.
  • You have to write a lot more code to implement this builder. It's a sacrifice you have to make for easier use later on.
  • It's another class. You can mitigate this downside a little by making the builder a static nested class (what people often loosely call an inner class) of what it builds, which is what I usually do.
  • Some libraries and frameworks can only use constructors for your classes. I see this more as a problem on their end and not a fault of this pattern, but it's a practical consideration. You might have to make allowances to work within the constraints imposed on you.
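Here's a sketch of the nested arrangement, cut down to four fields to keep it short. It also fills in the USAddress(Builder) constructor the bullet list mentions but doesn't show; making that constructor private means the builder is the only way to get an instance. Because Builder is a static nested class, you don't need an existing USAddress to create one.

```java
final class USAddress {
    private final String streetAddressLine1;
    private final String city;
    private final String state;
    private final String zipCode;

    // Only the builder can call this; it copies the builder's fields.
    private USAddress(Builder b) {
        streetAddressLine1 = b.streetAddressLine1;
        city = b.city;
        state = b.state;
        zipCode = b.zipCode;
    }

    String getStreetAddressLine1() { return streetAddressLine1; }
    String getCity() { return city; }
    String getState() { return state; }
    String getZipCode() { return zipCode; }

    // Static nested class: no enclosing USAddress instance is needed.
    static final class Builder {
        private final String streetAddressLine1; // required
        private final String city;               // required
        private final String state;              // required
        private String zipCode = null;           // optional

        Builder(String a, String c, String s) {
            if (a == null || c == null || s == null) {
                throw new IllegalArgumentException("null required field");
            }
            streetAddressLine1 = a;
            city = c;
            state = s;
        }

        Builder zipCode(String z) { zipCode = z; return this; }

        USAddress build() { return new USAddress(this); }
    }
}
```

Usage reads just like before, only with the builder's name qualified: new USAddress.Builder("1234 Elm St.", "Springfield", "MA").zipCode("01103").build().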

Monday, March 14, 2011

Using Guice for dependency injection

So last time I described what dependency injection is. In this post I'll run quickly through how you can use a dependency injection framework. I'm going to pick Guice since I'm familiar with it, and because it's really straightforward.

You instruct Guice on how to do injection using a "module".
public class FlamingoModule extends AbstractModule {
  @Override protected void configure() {
    bind (RouletteBall.class);
    bind (RouletteTable.class);
    bind (Integer.TYPE).annotatedWith (SeatsPerTable.class)
      .toInstance (8);
    bind (SpecialtyDrink.class).to (Margarita.class);
  }
  @Provides RouletteWheel provideRouletteWheel (RouletteBall ball) {
    return new RouletteWheel (ball, true);
  }
}
This module's configure() method sets up some bindings, which is how Guice gets from what you ask it for to what it gives you. The first two bindings just make Guice aware of the RouletteBall and RouletteTable classes. The last one tells Guice that whenever someone asks for a SpecialtyDrink from Guice, it should deliver an instance of the Margarita class.

The third binding tells Guice that whenever it's asked for an integer annotated with @SeatsPerTable, it should send back the value 8. Annotations are a way that you can have Guice inject different values for the same type. Generally you need to code up the annotations yourself.

The provideRouletteWheel() method illustrates a different way Guice can get you objects. When Guice is asked for a RouletteWheel instance, it will run the "provider" method to create one. Guice handles injecting a RouletteBall instance into the method's ball parameter.

In order to finish wiring up the code, we'll need to use the @Inject annotation. This tells Guice to inject a dependency. So, the RouletteTable constructor needs a little work.
@Inject public RouletteTable (RouletteWheel w,
                              @SeatsPerTable int n) {
  numberOfSeats = n;
  wheel = w;
}
Now, when Guice needs a RouletteTable, it knows about it (from the module) and knows to inject values into its constructor. The third binding in the module's configure() method lets it inject n, and the provider method takes care of w. Since Guice also knows about RouletteBall, it can inject an instance of that class into the provider method.

To finally tie it all together, you need a starting point, something that lets you talk to Guice. That's called an injector.
Injector injector = Guice.createInjector (new FlamingoModule());
RouletteTable t = injector.getInstance (RouletteTable.class);
You can see that I could define a new module for a different casino, say, one that uses European roulette wheels and seats ten per table, and use that module with Guice to get a differently constructed RouletteTable object. The knowledge of how to create objects is wrapped up nicely in the modules and doesn't interfere with the use of the objects.

So, after all this, you may wonder why this is a good idea. What does this buy you, besides the kind of abstract architectural benefits?

One thing you get is not having to call a bevy of constructors just to make a high-level object. Instead of new this and new that getting passed to new something else, the "rules" for creating the objects are laid out in a more declarative fashion.

Another thing you get is tighter control of object creation. For example, Guice lets you inject dependencies as singletons, so you only ever get one instance across all injections. As another example, a provider method gives you free rein to control exactly how objects are built.

Perhaps the most powerful thing you get is simple swapping of object creation systems. Suppose you want to perform some testing of the code that uses RouletteTable, and you need to be able to peek into and tweak the RouletteTable and RouletteWheel instances. No problem:
Injector injector = Guice.createInjector (new TestingModule());
Now you can create a module that generates objects designed for testing purposes. The code that uses those objects doesn't need to change at all.

That's quite enough about Guice. For more, check out its user's guide. Hopefully this quick tour of Guice has shown you how neat dependency injection is and what it can do for you.

Monday, March 7, 2011

What the heck is dependency injection?

In my travels through the Java world over the years, there have been some concepts which I've had trouble finding a simple, succinct definition for. One of them is "dependency injection" (DI), which is one of the hot Java concepts of the last few years. Let me try explaining what it is and why it's useful.

In the normal way of working with Java, you build your objects with constructors. A lot of the time, those constructors also build the other objects the class depends on.
class RouletteBall {}

class RouletteWheel {
  private RouletteBall ball;
  public RouletteWheel() {
    ball = new RouletteBall();
  }
}

class RouletteTable {
  private RouletteWheel wheel;
  private int numberOfSeats;
  public RouletteTable (int n) {
    numberOfSeats = n;
    wheel = new RouletteWheel();
  }
}
Very straightforward. When you create a new RouletteTable, its constructor creates the necessary RouletteWheel, which in turn creates the necessary RouletteBall. The higher-level classes create their own dependencies.

There's another way to do this.

class RouletteBall {}

class RouletteWheel {
  private RouletteBall ball;
  public RouletteWheel (RouletteBall b) {
    ball = b;
  }
}

class RouletteTable {
  private RouletteWheel wheel;
  private int numberOfSeats;
  public RouletteTable (RouletteWheel w, int n) {
    numberOfSeats = n;
    wheel = w;
  }
}

// then you do this
RouletteWheel wheel = new RouletteWheel (new RouletteBall());
RouletteTable table = new RouletteTable (wheel, 8);
Oh snap, guess what, we just added dependency injection. But keep reading; there's more to this.

The constructors here have been changed to take in the dependencies from outside, instead of generating them internally. The dependencies are supplied, or injected, into the class instances. Hence, dependency injection.

The term "inversion of control" (or IOC) is often applied to this sort of thing. The "control" refers to control over object creation, and the location of that control has been "inverted" from inside the classes to outside of them.

This doesn't seem particularly earth-shattering, and really it isn't at this point. Like a lot of design patterns, it's just a good idea, one that comes from the experience of many smart people working with object-oriented languages. It turns out that employing dependency injection gives you lots of flexibility.

For example, say we augmented our RouletteWheel class to be either European style (single-zero) or American style (double-zero).
class RouletteWheel {
  private RouletteBall ball;
  private boolean doubleZero;
  public RouletteWheel (RouletteBall b, boolean dz) {
    ball = b;
    doubleZero = dz;
  }
}
The first form of the RouletteTable class has a problem now, because its constructor needs to specify a style for the table's wheel. You'd have to add another parameter to the constructor, or maybe a second constructor. There are several ways to cope, but every one of them means changing RouletteTable.

The second form of RouletteTable has no trouble with this change, because it just takes in whatever RouletteWheel instance it's handed. No code changes! This is also great because it encapsulates the details of how a RouletteWheel is created; the RouletteTable class doesn't need to know or care about that.
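To make that concrete, here's the second form with the two-style wheel plugged in. RouletteTable's constructor is unchanged; only the wiring code decides which wheel to hand over. The Casino class and the getter methods are hypothetical additions for this sketch so the result can be checked.

```java
class RouletteBall {}

class RouletteWheel {
    private final RouletteBall ball;
    private final boolean doubleZero;
    RouletteWheel(RouletteBall b, boolean dz) {
        ball = b;
        doubleZero = dz;
    }
    boolean isDoubleZero() { return doubleZero; } // added for this sketch
}

class RouletteTable {
    private final RouletteWheel wheel;
    private final int numberOfSeats;
    // Same as the second form: the wheel is injected from outside.
    RouletteTable(RouletteWheel w, int n) {
        numberOfSeats = n;
        wheel = w;
    }
    RouletteWheel getWheel() { return wheel; }      // added for this sketch
    int getNumberOfSeats() { return numberOfSeats; } // added for this sketch
}

// Hypothetical wiring code: this, not RouletteTable, picks the wheel style.
class Casino {
    static RouletteTable americanTable() {
        return new RouletteTable(new RouletteWheel(new RouletteBall(), true), 8);
    }
    static RouletteTable europeanTable() {
        return new RouletteTable(new RouletteWheel(new RouletteBall(), false), 10);
    }
}
```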

I'm going to stop here for now. There's more to talk about, but I'd rather let the basic concept of dependency injection sink in. Next time I'll discuss how some frameworks support dependency injection and can help you out.

Thursday, February 17, 2011

What's a bean?

Java evolved from simply a language into an entire ecosystem of APIs and libraries and frameworks, more than any mere mortal can comprehend. You just cannot be an expert in all of them, as more are created every day, it seems, and the ones that were all the rage five years ago are now ground into dust beneath the wheels of what's newer and slicker. Some concepts still pervade a good number of them though, and one of them is the Java "bean".

There are many specific kinds of "beans", such as Enterprise Java Beans and MBeans and Persistent Entity Beans, but there is also just this general concept of a bean. Its origin is way back before the proliferation of APIs and frameworks, and the idea was that if you made a Java class just so, then tools, or other things that got a hold of your class, could find out things about it. A very simple set of conventions was determined, and if your class follows those conventions, then it's a bean.

Your class can be a bean and something else too. It's not a restrictive concept, like most Java concepts became for a while there. It really is only a set of conventions, a small one at that, which you can often adopt, and if you do, it can help you out.

Here are the conventions.

First, your class needs to have a public "no-arg" constructor; that is, it must be possible for code anywhere to make an instance of it by just saying new MyClass(). Fortunately, if you don't write any constructor for your class, Java makes a no-arg constructor for you. If you do write some other constructors, Java won't generate one, so just add it yourself:
public MyClass() {}
Second, and this is kind of optional, your class should have properties. The idea behind a property is that you have a public getter and (if you like) a public setter method, both named after the property. Here:
public String getLastName() { return ln; }
public void setLastName (String n) { ln = n; }
The methods above define a read-write property called "lastName". Note that the property name starts with a lowercase letter. The getter takes no arguments and returns the property value, while the setter takes in a new value and returns nothing. If you want the property to be read-only, omit the setter.

The type of the property can be anything you like. One wrinkle: if it's boolean, the getter can start with "is" instead of "get".
public boolean isAdult() { return (age >= 18); }
Inside your class, the property doesn't have to be represented by a field in a one-to-one relationship. Do whatever you like in there, but only expose publicly a named property of some type.
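As a quick check of what "tools can find out things about it" means, the JDK's own java.beans.Introspector can list a bean's properties using nothing but these naming conventions. Person and BeanDemo are hypothetical classes for this sketch. Passing Object.class as the "stop class" keeps the inherited getClass() method from showing up as a property called "class".

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Set;
import java.util.TreeSet;

class BeanDemo {
    // Hypothetical bean that follows the conventions above.
    public static class Person {
        private String ln;
        private int age;

        public Person() {} // public no-arg constructor

        public String getLastName() { return ln; }
        public void setLastName(String n) { ln = n; }

        public int getAge() { return age; }
        public void setAge(int a) { age = a; }

        public boolean isAdult() { return age >= 18; } // read-only, no backing field
    }

    // Lists the property names the Introspector derives from the method names.
    static Set<String> propertyNames(Class<?> beanClass) {
        try {
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            Set<String> names = new TreeSet<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                names.add(pd.getName());
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

For Person this should report adult, age, and lastName. Note that "adult" shows up even though there's no field by that name; the Introspector only looks at the methods.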

The third convention is ... well, I'm going to be a little ornery here and say that there aren't any more conventions you need to follow. The Wikipedia entry for JavaBean says your class should be serializable, which means that instances of it need to be able to be saved off and restored, but I don't think that's strictly required. (The entry itself says it "should" be serializable, so there.) And the official JavaBean spec has bits about beans exchanging events, but again I see that as optional if you want it.

So, in my ornery opinion, any Java class with a public no-arg constructor, and maybe with some properties, is a bean. There.

I claim this because by following these two simple conventions your classes can be wired into all sorts of neat things. The Apache Commons BeanUtils library is one of them, and while that library maybe isn't all that jazzy when used directly, it's the foundation for some really powerful Java technologies—just read the overview to see some examples.

Another powerful framework drawing from the bean concept is Spring, whose most fundamental offerings are all about creating and configuring objects for you as long as they are beans (well, often even if they aren't, but it really really likes beans).

I said above that "for a while there" Java concepts got restrictive. For example, the Java Servlet specification (which I still use all the time) says that, in order for a Java class to respond to HTTP requests, it has to extend a particular class HttpServlet and implement specific methods and use certain other classes and so on and so on. On its own, this is fine. But for a working Java developer, having to know this specialized set of classes for servlets, plus another for Enterprise Java Beans, and another for some other framework, well, it's a real pain. The community called out for a return to simplicity.

And so, frameworks like Spring and new language features like annotations arrived, and mercifully, the primary goals for these endeavors included "making your life easier so you can get your work done". One big way of making that happen was to leave the restrictive concepts behind and instead let you work with simple ones, like the humble bean. Then, after some hints and nudges, the heavy lifting would be done for you.

(If this sounds like you're encouraged to be lazy, then stop using Java and go work in assembly language. :) )

So, next time you're slapping some new classes together, think about making some of them beans along the way. It might not help you right away, but down the line that decision might pay off. Even the mere fact that the bean concept is a foundation for all these regions of the Java ecosystem should indicate that it's not a bad idea to adopt it for yourself.

Friday, February 4, 2011

Taking shortcuts

Now that I finished going through all of the Java operators, I want to talk about the special way that two of them work. Those two are the logical AND and OR operators, also known as && and ||. These guys can be lazy and skip part of their evaluation; sometimes this is good for you, and sometimes not.

Let's take logical AND. If both its operands in an expression are true, then the whole thing evaluates to true. Otherwise, the expression evaluates to false. That's just AND.

Now, let's focus on that first operand. If it evaluates to true, what does that tell you about the expression's value? Well, not much, you still need to check on the second operand. But suppose the first operand evaluates to false? If that happens, you know that the whole expression is false already. The value of the second operand doesn't matter.

Java takes this lesson to heart. If the first operand for logical AND evaluates to false, then it doesn't evaluate the second operand at all. This is called a shortcut or short-circuit Boolean operation. It is good for efficiency, especially for something like this:
if (debugging && timeConsumingMethod()) {
  doSomething();
}
Because logical AND shortcuts, that really expensive timeConsumingMethod won't be called unless you are debugging. Might as well not pay the price if you can help it.
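Here's a small sketch that counts how often the right-hand operand actually runs. The counter and the helper names are made up for the demonstration; timeConsumingMethod just stands in for something expensive.

```java
class ShortCircuitDemo {
    static int calls = 0;

    // Stands in for an expensive check; records each time it is evaluated.
    static boolean timeConsumingMethod() {
        calls++;
        return true;
    }

    // Returns how many times the right operand of && was evaluated.
    static int evaluateWith(boolean debugging) {
        calls = 0;
        if (debugging && timeConsumingMethod()) {
            // doSomething() would go here
        }
        return calls;
    }
}
```

evaluateWith(false) returns 0 because the expensive call never happens, while evaluateWith(true) returns 1.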

Logical OR works the same way, except opposite sorta. If the first operand in its expression is true, then it shortcuts, because the entire expression must evaluate to true at that point no matter what. For example:
if (answer == null || !(answer.equals ("apple"))) {
  wrongAnswer();
}
Without short circuiting, the equals() call would throw a nasty NullPointerException when answer is null. Instead, though, because OR shortcuts, it's safe. This is a common trick for dealing with nulls in comparisons.

While shortcut operations are useful, you do have to be careful with them. Avoid having the second operand do something as a side effect that you really want to happen all the time.
int i = 0;
while (i < a.length) {
  if (a[i] == null || a[i++].equals ("")) {
    System.out.println ("I found a blank!");
  }
}
This all-too-clever loop tries to run through an array of strings, looking for either nulls or empty strings. The loop index gets incremented with that i++ bit. As long as the array contains no nulls, this loop will work, but as soon as the first null arrives, because logical OR shortcuts, the loop will get stuck, printing out "I found a blank!" forever and ever, or until you hit Control-C.

(I don't think I've covered the while loop yet. It's a basic loop that keeps going "while" the Boolean expression next to while evaluates to true. If the expression is false to begin with, the loop never executes even once. So, you have to make sure you either make that condition false at some point, or break out of the loop using either break or return.)

The example above is a bit contrived, but traps like it can and do happen more subtly in real code.
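One way to defuse that trap is to take the increment out of the condition entirely, so it happens on every pass no matter how the || evaluates. This sketch uses a for loop and counts blanks instead of printing; the countBlanks name is hypothetical.

```java
class BlankFinder {
    // Counts entries that are null or empty; the index advances on every pass.
    static int countBlanks(String[] a) {
        int blanks = 0;
        for (int i = 0; i < a.length; i++) {
            // Short-circuiting still protects equals() from a null here,
            // but no side effect hides in the second operand anymore.
            if (a[i] == null || a[i].equals("")) {
                blanks++; // the original printed "I found a blank!" here
            }
        }
        return blanks;
    }
}
```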

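Two more things. First, the bitwise AND and OR operators, known as & and |, do not short circuit, even when you apply them to boolean operands (where they act as logical AND and OR that always evaluate both sides). They are more akin to mathematical operations, so this makes sense.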
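A sketch of the difference on boolean operands, again with a made-up counter: with && the right side is skipped when the left side is false, with & it always runs.

```java
class EagerDemo {
    static int calls = 0;

    static boolean check() {
        calls++;
        return true;
    }

    // Returns how many times check() ran, using & when eager is true, && otherwise.
    static int rightEvaluations(boolean left, boolean eager) {
        calls = 0;
        boolean result = eager ? (left & check()) : (left && check());
        return calls;
    }
}
```

With left set to false, rightEvaluations(false, false) gives 0 and rightEvaluations(false, true) gives 1.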

Second, don't use the shortcut feature to implement flow control. This is something that is idiomatic in scripting languages, but it ain't the Java way. Here are some examples. First, the typical way you open a file in Perl.
open HANDLE, "<myfile.txt" or die "Cannot open myfile.txt";
The die command, which causes the script to exit, will not execute as long as the file is opened successfully.

Another example, from bash scripting:
[[ -n $DEBUG ]] && echo File is opened.
If the DEBUG variable is unset or empty, the test fails, the AND short circuits, and the debug message isn't printed out.

While this is pretty nifty and all, doing the same thing in Java will likely lead to confusion, although it may help your geek cred. Eh, it's not worth it. Really, use if statements instead.

Saturday, January 22, 2011

Smooth operators, part deux

Time to finish what I started, running through Java's operators. These are the less used and, in my opinion, less useful ones.

Since adding and subtracting one is so common, there are operators just for that.
  • ++ increments by one
  • -- decrements by one
So, for example, if n is 8, then ++n or n++ makes n equal to 9, and --n or n-- makes it 7.

Yeah, you can put the operator in front of the numeric variable or after it. The difference is that, when the operator is in front (pre-increment and pre-decrement), then the expression evaluates as the changed value. When the operator is in the back (post-increment and post-decrement), then the expression evaluates as the original value ... but the change happens after that evaluation. Uh, this is confusing. OK, say n is 8 again.
  • ++n == 9
  • n++ == 8
  • --n == 7
  • n-- == 8
This still might not be clear, so here's a bit more about it.
  • (++n) + 50 == 59, and n becomes 9
  • (n++) + 50 == 58, and n becomes 9 here too
See that? So the choice of pre- or post- depends on how you want to evaluate n in that context. To be honest, most of the time I end up using the post- versions.
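Here's a tiny sketch that runs both cases from the list above, starting from n = 8 each time, and hands back the expression's value along with n afterward. The class and method names are just for illustration.

```java
class IncrementDemo {
    // Returns {value of the expression, value of n afterward}.
    static int[] preIncrementPlus50() {
        int n = 8;
        int result = (++n) + 50; // n becomes 9 first, so 9 + 50
        return new int[] {result, n};
    }

    static int[] postIncrementPlus50() {
        int n = 8;
        int result = (n++) + 50; // uses the old value 8, then n becomes 9
        return new int[] {result, n};
    }
}
```

The first returns {59, 9} and the second {58, 9}: same final n, different expression value.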

I should mention that I used parentheses in those last items to avoid any worry about Java doing the wrong thing. If I write x+++y, what does that mean? Is it (x++)+y or x+(++y)? It's the first one, because the compiler grabs the longest operator it can (++) when it reads the source, but I'd still rather use parentheses than make readers stop and think about it.
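A quick check of how x+++y is read: the source splits into x, ++, +, y, which means (x++) + y. The values below are arbitrary choices for the sketch.

```java
class TokenDemo {
    // Returns {x+++y, x afterward, y afterward}, starting from x = 1, y = 10.
    static int[] plusPlusPlus() {
        int x = 1;
        int y = 10;
        int r = x+++y; // parsed as (x++) + y: 1 + 10, then x becomes 2
        return new int[] {r, x, y};
    }
}
```

If it parsed the other way, as x + (++y), you'd get 12 with y becoming 11; instead you get 11, with x becoming 2 and y left at 10.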

Anyway, you never need to use these, because you can always write n = n + 1 instead, but they are handy.

Now I'm going to shift (har har) to talking about bitwise operators. These are for when you want to fiddle and twiddle the very bits inside numeric values.
  • << is for shifting bits left
  • >> is for signed-shifting bits right
  • >>> is for unsigned-shifting bits right
  • & is for bitwise AND
  • | is for bitwise OR
  • ~ is for bitwise NOT (inversion)
  • ^ is for bitwise XOR ... honestly, I forgot they even had this one until I was writing this
Way back in this blog I told you that all Java numeric types are signed. Despite that, Java goes the extra mile and gives you a way to shift bits right in an unsigned mode. When you use >>>, zeroes are always shifted in. When you use >>, the sign bit (leftmost bit) is repeated instead. If this sounds like crazy-talk, then study up on two's complement.
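A small sketch of the difference on a negative int. With -16, whose bits are 1111...11110000, >> keeps copying the 1 sign bit in, while >>> shifts in zeroes and produces a large positive number.

```java
class ShiftDemo {
    static int signedShift(int v, int n)   { return v >> n; }  // repeats the sign bit
    static int unsignedShift(int v, int n) { return v >>> n; } // always shifts in zeroes
}
```

signedShift(-16, 2) gives -4, but unsignedShift(-16, 2) gives 1073741820 (hex 3FFFFFFC). For non-negative values the two agree: both turn 16 into 4.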

Bitwise operators aren't used very much. Back in the day clever C programmers would use << and >> to multiply and divide by powers of 2, because it translates to faster machine instructions. These days, though, that's too clever; let the compiler figure out how to make things fast. Use bitwise operators in those rare instances when you really need to work with bits.
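Here's one reason the shift trick is too clever: for negative numbers, >> and / don't even agree. This sketch (ShiftVsDivide is an illustrative name) shows integer division truncating toward zero while the arithmetic shift rounds toward negative infinity.

```java
// Shows why ">> 1" is not a drop-in replacement for "/ 2".
class ShiftVsDivide {
    public static void main(String[] args) {
        int n = -7;
        System.out.println(n / 2);  // -3: integer division truncates toward zero
        System.out.println(n >> 1); // -4: the arithmetic shift rounds toward negative infinity
        System.out.println((n * 8) == (n << 3)); // true: left shifts do match multiplication
    }
}
```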

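I'm going to mention right now that the bitwise operators don't short-circuit, but the logical ones I discussed last time (&& and ||) do. That is, && and || skip evaluating their right side when the left side already decides the answer. I'll expand on that sometime in the future.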
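& and | also accept two boolean operands, where they act as logical AND and OR that always evaluate both sides. This little sketch (the class and its check() helper are made up for illustration) counts how many times the right-hand side actually runs.

```java
// Compares short-circuiting && with non-short-circuiting & on booleans.
class ShortCircuitDemo {
    static int calls = 0;

    // Records that it was called, then returns true.
    static boolean check() {
        calls++;
        return true;
    }

    public static void main(String[] args) {
        calls = 0;
        boolean r1 = false && check(); // check() is skipped: false already decides it
        System.out.println(r1 + " " + calls); // false 0

        calls = 0;
        boolean r2 = false & check();  // both sides are always evaluated
        System.out.println(r2 + " " + calls); // false 1
    }
}
```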

The last set of operators left to discuss is the assignment operators. They exist because this sort of thing is really common.
x = x + y;
Instead of typing all of that out, you can use the corresponding assignment operator.
x += y;
This is another case like the increment and decrement operators, where you don't need the operators but they can be handy in certain situations. Use them only if they don't make the code less readable. Anyway, the following operators have corresponding assignment operators.
  • +, -, *, /, and % become +=, -=, *=, /=, and %=
  • &, ^, |, <<, >>, and >>> become &=, ^=, |=, <<=, >>=, and >>>=
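Here's a quick run through a few of those operators, each next to its long-hand equivalent in a comment. CompoundAssignDemo and applyAll() are illustrative names. The byte part shows one subtle difference worth knowing: a compound assignment quietly casts its result back to the variable's type, which the long-hand version does not.

```java
// Compound assignment operators next to their long-hand equivalents.
class CompoundAssignDemo {
    static int applyAll(int x) {
        x += 5;   // same as x = x + 5;
        x *= 2;   // same as x = x * 2;
        x %= 7;   // same as x = x % 7;
        x <<= 3;  // same as x = x << 3;
        return x;
    }

    public static void main(String[] args) {
        System.out.println(applyAll(10)); // 10 -> 15 -> 30 -> 2 -> 16

        byte b = 10;
        b += 1;          // OK: behaves like b = (byte) (b + 1)
        // b = b + 1;    // would not compile: b + 1 is an int
        System.out.println(b); // 11
    }
}
```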

And now, here at the end, I must take back some of what I said before, because there is one more operator that I must mention, which is used all the time, and that operator is =.
x = 1
Wait, that's not ... how is that ... huh?

Yeah, you use the assignment operator, plain old =, to make zillions of Java statements, but the expression created by it also evaluates to something. Its value is whatever is on the right side of the assignment. Consider this bit of code.
int x = 7;
System.out.println (x = 9);
This actually compiles and prints 9.

Having explained this to you, I need to urge you not to do this a lot. It's weird looking and creepy. It looks like I meant to write x == 9 and print out the result of a comparison. The only context where I think this use of the evaluated value of an assignment makes sense is in a chained assignment like this, where several variables get the same value.
int x, y, z;
x = y = z = 0;
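Here's the assignment-as-expression business in runnable form (AssignmentValueDemo is an illustrative name). Assignment groups from right to left, which is why the chained version works.

```java
// The value of an assignment expression, and chained assignment.
class AssignmentValueDemo {
    public static void main(String[] args) {
        int x = 7;
        System.out.println(x = 9); // prints 9, and x really is 9 now
        System.out.println(x);     // 9

        int a, b, c;
        a = b = c = 0;             // grouped as a = (b = (c = 0))
        System.out.println(a + " " + b + " " + c); // 0 0 0
    }
}
```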
Hey! You made it! Nice work. You should feel pretty empowered now, because you've covered all the Java operators, even the weird ones, so you can bust out whatever you need to perform your calculations.

Monday, January 17, 2011

Smooth operators, part 1

I think I'm turning completist in my postings here. I want to have something covering each aspect of Java, including the boring stuff. So, let's get Java's operators out of the way. Maybe I can tease up something interesting about them ...

I'll cover them in my particular order, but I want to remind you that if you're ever in doubt about which operator has precedence over another, use parentheses in your expressions. Chances are you'll be helping someone else make sense of your expressions later, and they will be thankful.

First, basic math.
  • + is for adding
  • - is for subtracting
  • * is for multiplying
  • / is for dividing
Stunning, right?

Java's integer arithmetic will happily overflow (or underflow) using even these basic operators. So, adding Integer.MAX_VALUE + 1 will spin you around to Integer.MIN_VALUE without complaint. Whether this is a good thing or not is up for debate.
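A quick demonstration of the wraparound, in a class I'm calling OverflowDemo for illustration. No exception gets thrown in either direction.

```java
// Integer overflow and underflow wrap around silently.
class OverflowDemo {
    public static void main(String[] args) {
        int big = Integer.MAX_VALUE;                         // 2147483647
        System.out.println(big + 1);                         // -2147483648
        System.out.println((big + 1) == Integer.MIN_VALUE);  // true
        System.out.println(Integer.MIN_VALUE - 1);           // 2147483647
    }
}
```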

Java likes doing integer math using int values and floating-point math using double values. If you try to mix and match types, Java will promote the narrower operand to the wider type (for example, an int combined with a long becomes a long, and an int combined with a double becomes a double). This can help you out of some overflow situations, but you should still be careful. Knowing the type promotion rules is a good idea, but I'm not going to talk about all of them now.
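Here's the kind of overflow situation promotion can rescue you from, and one where it doesn't (PromotionDemo is an illustrative name). The catch is that promotion happens per operation: int times int is done in int math, and only the already-overflowed result gets widened to long.

```java
// Promotion only helps if it happens before the overflowing operation.
class PromotionDemo {
    public static void main(String[] args) {
        int million = 1000000;
        long wrong = million * million;            // int * int overflows, then widens
        long right = ((long) million) * million;   // long * int: the int is promoted first
        System.out.println(wrong == right);        // false
        System.out.println(right);                 // 1000000000000

        double mixed = 7 / 2.0;                    // the int 7 is promoted to double
        System.out.println(mixed);                 // 3.5
    }
}
```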

Dividing integers, by the way, always gives you an integer result. So, 11 / 4 equals the integer 2. You get the remainder with:
  • % is for mod (modulo)
That means 11 % 4 equals the integer 3, which is the remainder of eleven divided by four. If you want the more exact floating-point answer, cast at least one of the operands to a float or double (eh, use double) first.
((double) 11) / ((double) 4)
See what I mean about parentheses? I can't remember whether casting binds more tightly to the numbers than the division operator. So I use parentheses and don't worry about it.
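Here are the division examples as a runnable sketch (DivisionDemo is an illustrative name). Two extras: the cast does in fact bind more tightly than /, and % on a negative left operand gives a negative remainder, because the sign follows the dividend.

```java
// Integer division, remainder, and casting to get a floating-point result.
class DivisionDemo {
    public static void main(String[] args) {
        System.out.println(11 / 4);                        // 2
        System.out.println(11 % 4);                        // 3
        System.out.println(((double) 11) / ((double) 4));  // 2.75
        System.out.println((double) 11 / 4);               // 2.75 too: the cast applies first
        System.out.println(-11 % 4);                       // -3: sign follows the left operand
    }
}
```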

Let's continue with some unary operators. That means they only take one operand.
  • + is for, well, don't change the sign, which is terribly exciting and I don't think I've ever used this
  • - is for negating (sign flipping)
  • ! is for logical NOT
That unary plus, gosh, I forgot it was there. Check this out: +42 is the same as 42. I guess if you want to emphasize that a number's sign is to be left alone, then it's nice to have it.

The logical NOT operator only works on boolean values, while all the rest I've covered so far work on numeric primitive values (and, since Java 5, the wrapper types via autoboxing). Also, the binary + works on String values by concatenating them.

Speaking of operators for Boolean values:
  • && is for logical AND
  • || is for logical OR
  • ^ is for logical XOR when both operands are Booleans (on numbers, the same symbol means bitwise XOR, which I'll cover next time)
Again, && and || only work on Booleans, so you can't treat any other type as a Boolean like some other languages allow.
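Here's a truth table printed by a small sketch (BooleanOpsDemo is an illustrative name). Note that ^ works on two booleans as logical XOR. On the other hand, something like if (1) won't even compile, since an int is not a Boolean.

```java
// Prints a truth table for Java's Boolean operators.
class BooleanOpsDemo {
    public static void main(String[] args) {
        boolean[] values = { false, true };
        System.out.println("a     b     a&&b  a||b  a^b   !a");
        for (boolean a : values) {
            for (boolean b : values) {
                System.out.printf("%-5b %-5b %-5b %-5b %-5b %-5b%n",
                        a, b, a && b, a || b, a ^ b, !a);
            }
        }
    }
}
```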

Let's get into relational operators, which render Boolean results.
  • == is for equality - for objects, that's really identity, not the kind of equality Object.equals() lets you define
  • != is for inequality
  • < and > are for less than and greater than
  • <= and >= are for less than or equals and greater than or equals
  • instanceof is for type checking
Most of these are pretty obvious. The == and != operators can work on pretty much any type, while < and > and their variants are for numeric types.

The instanceof operator checks whether something is an "instance of" some class, meaning of that exact class or a subtype. For example, these expressions are all true.
"foo" instanceof String
"foo" instanceof Object
new Integer (42) instanceof Number
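A runnable version of those instanceof examples, with a couple of extras (InstanceofDemo is an illustrative name, and Integer.valueOf stands in for new Integer). instanceof works for interfaces too, and null is never an instance of anything.

```java
// Checks instanceof against a class, a superclass, an interface, and null.
class InstanceofDemo {
    public static void main(String[] args) {
        Object s = "foo";
        Object n = Integer.valueOf(42);
        System.out.println(s instanceof String);        // true: exact class
        System.out.println(s instanceof CharSequence);  // true: String implements this interface
        System.out.println(n instanceof Number);        // true: Integer is a subclass of Number
        System.out.println(n instanceof String);        // false
        Object nothing = null;
        System.out.println(nothing instanceof Object);  // false
    }
}
```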
I've used the ternary operator a couple of times already in past postings. You can think of it as a mini version of an if-then-else statement. It has three expressions: an initial one, one for when the initial one is true, and one for when the initial one is false. The initial expression has got to evaluate to a Boolean value, and the other two need compatible types but can otherwise be whatever type you want.

This is way easier to just explain by example.
int x = "foo" instanceof String ? 42 : 43;  // x becomes 42
score = x + (answer.equals ("Deuteronomy") ? 10 : 0);
The ternary operator is useful, but overusing it makes your code hard to read. If you have to do some wacky text formatting to make your expressions not look like gobbledy-gook, think about using a boring if-then-else instead.

I think that's enough operators for now. These are the most common ones you'll use by far. In my next post I'll cover the less used ones and, in my opinion, ones you may want to avoid.

Wednesday, January 12, 2011

Hashing things out

Now that I talked about equals(), I can discuss its partner in crime, the hashCode() method. This is another method on java.lang.Object that is therefore available on every Java object and, like equals(), you'll often want to override it with your own implementation.

In fact, the rule is: if you override equals(), you need to override hashCode() as well. I'll explain why.

First, though, you need to understand how hash tables work. If you don't, I suggest perusing the Wikipedia article for hash table, at least as a starting point.

The purpose of the hashCode() method is to provide a number that determines the bucket within a hash table where an object should go. At least in Java, you don't need to factor in how big the hash table is when computing the code. For a hash table of size n, an object's bucket index can be derived from o.hashCode() modulo n, taking care with negative hash codes, which are perfectly legal.
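Here's a sketch of turning a hash code into a bucket index, assuming a simple table with n buckets (BucketIndex and bucketFor() are illustrative names). One wrinkle: hash codes can be negative, and Java's % keeps the sign of its left operand, so this clears the sign bit first. java.util.HashMap does its own variation on this internally, and its exact formula is an implementation detail.

```java
// Maps an object's hash code to a bucket index in a table of n buckets.
class BucketIndex {
    // Clearing the sign bit keeps the result in the range [0, n).
    static int bucketFor(Object o, int n) {
        return (o.hashCode() & 0x7fffffff) % n;
    }

    public static void main(String[] args) {
        Object[] keys = { "foo", "bar", Integer.valueOf(-5) };
        for (Object key : keys) {
            System.out.println(key + " -> hash " + key.hashCode()
                    + ", bucket " + bucketFor(key, 16));
        }
    }
}
```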

One reason why the hashCode() method is part of Object is that it's just so darn useful. Hash tables show up everywhere in computing. Java's own data structures use hash tables internally. So it is convenient to have the method available all the time for any object.

A more important reason is that the concept of a hash code is so closely tied to the concept of equality. If two objects are equal, then they must reduce down to the same hash code. Said in Java terms, if a.equals (b) is true, then a.hashCode() == b.hashCode() must also be true. That's why whenever you override equals() for a class, you have to override hashCode() as well, so that they continue to agree.

All right, so why do two equal objects need to return the same hash code? Let's take the highly useful class java.util.HashSet as an example. This class implements a set of objects. A set cannot contain an object more than once. Sets cannot have duplicates.

This particular class implements its set storage by using an internal hash table. When you add something to a HashSet, that object is stored in the hash table, inside the bucket determined by its hashCode() value. That makes it fast and easy to find again, especially so that the HashSet can stop you from successfully adding the object a second time.

A HashSet, like almost every other Set implementation in Java, decides that it already contains some object if it finds an equal object within it. That is, equal as in equals(). So here are the steps for adding an object to a HashSet, laid out.
  1. Compute the hash code "c" of the object to be added.
  2. Find the bucket corresponding to "c" in the hash table.
  3. Check every object in the bucket. If any one is equal to the object according to equals(), don't add it. Otherwise, add it.
If two equal objects return different hash codes, then it will be possible to add them both to the same set (because step 2 can target the wrong bucket). That violates the "semantics" for Java sets, and will pretty much screw everything up for you.
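The three steps above can be sketched as a toy set. ToyHashSet is a hypothetical class for illustration only: it never resizes, doesn't allow null, and is no substitute for java.util.HashSet. Its add() method is where the steps happen, and it only ever searches the one bucket picked by the hash code, which is exactly why equal objects must agree on their hash codes.

```java
import java.util.ArrayList;
import java.util.List;

// A toy hash set that follows the three steps for adding an object.
class ToyHashSet<T> {
    private final List<List<T>> buckets = new ArrayList<>();

    ToyHashSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<T>());
        }
    }

    // Returns true if the item was added, false if an equal item was already there.
    boolean add(T item) {
        int c = item.hashCode();                                          // step 1
        List<T> bucket = buckets.get((c & 0x7fffffff) % buckets.size());  // step 2
        for (T existing : bucket) {                                       // step 3
            if (existing.equals(item)) {
                return false;
            }
        }
        bucket.add(item);
        return true;
    }

    int size() {
        int total = 0;
        for (List<T> bucket : buckets) {
            total += bucket.size();
        }
        return total;
    }

    public static void main(String[] args) {
        ToyHashSet<String> set = new ToyHashSet<>(8);
        System.out.println(set.add("foo"));              // true
        System.out.println(set.add(new String("foo")));  // false: equal, same bucket
        System.out.println(set.size());                  // 1
    }
}
```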

It's important to mention that it is still OK for two unequal objects to return the same hash code. A collision like this is good to avoid, since it makes hash tables less effective, but it's not the end of the world. Those two unequal objects will just end up in the same bucket in hash tables.
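Here's a real collision you can try (CollisionDemo is an illustrative name): the strings "Aa" and "BB" have the same hash code under String's hashCode() formula, yet a HashSet keeps both, because equals() tells them apart.

```java
import java.util.HashSet;
import java.util.Set;

// Two unequal strings with the same hash code can live in the same set.
class CollisionDemo {
    public static void main(String[] args) {
        System.out.println("Aa".hashCode());     // 2112
        System.out.println("BB".hashCode());     // 2112
        System.out.println("Aa".equals("BB"));   // false

        Set<String> set = new HashSet<>();
        set.add("Aa");
        set.add("BB");
        System.out.println(set.size());          // 2
    }
}
```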

The default implementation of hashCode() is based on object identity, just like the default equals(). It doesn't look at your fields at all; it tries to hand out distinct values for distinct objects. That's consistent with the default equals(), which only considers identical objects equal. However, once you redefine equals() so that two separate objects with matching "significant" fields are equal, the default hashCode() will almost certainly still give those two equal objects different hash codes. If that happens, you won't be able to use instances of your class in hash tables.

So, when you override equals(), you need to override hashCode() to compute its value based only on those significant fields. There are recipes for creating hashCode() methods, and I'll plug Effective Java by Joshua Bloch again as a great resource. Heck, some Java IDEs will even write them for you. Here is an example for the StringPair class we created in the last post, and since I'm hardcore I wrote it by hand.
@Override public int hashCode() {
    int c = 17;
    c = 37 * c + (s1 != null ? s1.hashCode() : 0);
    c = 37 * c + (s2 != null ? s2.hashCode() : 0);
    return c;
}
Since both string fields s1 and s2 are involved in equating StringPair objects, they both appear here. The important thing to notice is that the hash code for StringPair is calculated based on the hash codes of the String fields within it. As long as String.hashCode() follows the rules (which it does), then this implementation will render the same value for two objects with equal strings.
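To see the equals()/hashCode() pair earn its keep, here's a self-contained StringPair that combines the hashCode() above with the equals() from the last post, then drops a few pairs into a HashSet. I left out the gjd package and the public modifier so it runs as a single file.

```java
import java.util.HashSet;
import java.util.Set;

// StringPair with matching equals() and hashCode(), used in a HashSet.
class StringPair {
    private final String s1;
    private final String s2;

    StringPair(String s1, String s2) {
        this.s1 = s1;
        this.s2 = s2;
    }

    @Override public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof StringPair)) return false;
        StringPair o = (StringPair) other;
        if (!(s1 != null ? s1.equals(o.s1) : o.s1 == null)) return false;
        if (!(s2 != null ? s2.equals(o.s2) : o.s2 == null)) return false;
        return true;
    }

    @Override public int hashCode() {
        int c = 17;
        c = 37 * c + (s1 != null ? s1.hashCode() : 0);
        c = 37 * c + (s2 != null ? s2.hashCode() : 0);
        return c;
    }

    public static void main(String[] args) {
        Set<StringPair> set = new HashSet<>();
        set.add(new StringPair("foo", "bar"));
        set.add(new StringPair("foo", "bar"));  // equal to the first, so not added
        set.add(new StringPair("foo", null));   // null fields are handled too
        System.out.println(set.size());                                  // 2
        System.out.println(set.contains(new StringPair("foo", "bar")));  // true
    }
}
```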

Monday, January 10, 2011

All things equal

In my last post I described the equals() method of the root Java class java.lang.Object. In this post I'll delve into some of the details of this important method. I'll start by restating and expanding on some of what I said last time.

What does it mean for two objects to be equal? For primitive types, the answer is easy, as it usually boils down to just math. Objects make this question more complex, though. They have types, and they are complex, that is, they can be composed of more than one piece of data.

Generally, two objects can only be equal if they are of the same type. So, a String object can only ever be equal to a String object. But what about derived classes? For example, if I have an Employee object for "Bob" as well as a Manager object for "Bob", and Manager derives from Employee, can the two objects possibly be equal?

The answer depends on how you can determine equality between Manager objects. If the procedure is the same as for any sort of Employee (that is, you'll just use the same procedure to compare Manager objects as you do Employee objects), then you'll be fine. But, if you need to consider extra information specific to Manager objects when comparing them, then you'll be in trouble. More on that in a bit.

When a class has multiple fields, then it's a good rule of thumb that two instances of that class can be considered equal if all of their corresponding fields are themselves "equal". If you imagine a StreetAddress class with fields for house number, street, city, and state (OK, it only does houses, sorry apartment dwellers), then two instances of that class should only be equal if they have the same values for house number, street, city and state.

An even better rule of thumb, though, is that two instances of a class can be considered equal if all of their corresponding "interesting" fields are themselves equal. It's up to you as the class designer to figure out what is interesting for the comparison. Our Employee class may be set up with an ID field that has a guaranteed unique value for each employee. You would certainly want to then define equals() for that class to only check the IDs of two instances being compared. Checking the rest would be superfluous, as long as the IDs are set properly.

So back to the type question. An implementation of Employee.equals() like this should serve well for subtypes like Manager as well, so you could just run with it. But imagine that you need to compare additional "interesting" fields for Manager instances to determine equality. How can you compare an Employee instance correctly then, since it lacks that data? You pretty much can't. You've broken one of the rules for implementing equals().

Ah, the rules. They make a lot of sense, really. Here they are, paraphrased from the Java API documentation for Object.equals().
  • An object must always be equal to itself. (reflexive)
  • If object A equals object B, object B equals object A. (symmetric)
  • If A equals B and B equals C, then A equals C. (transitive)
  • If A equals B now, A equals B later unless you change one of those "interesting" fields. (consistent)
  • An object is never equal to null.

Not following these rules will lead you to a world of hurt, because you will be using an unusual concept for "equality". You can see that adding those extra "interesting" fields to the Manager class would make equals() asymmetric with respect to Employee objects.
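Here's that asymmetry in code, using hypothetical Employee and Manager classes where Employee compares IDs and Manager also insists on matching departments. Manager keeps Employee's ID-based hashCode(), so it's only equals() that misbehaves here.

```java
// Hypothetical classes showing how a subclass's extra "interesting"
// field breaks the symmetry of equals().
class Employee {
    final int id;

    Employee(int id) { this.id = id; }

    @Override public boolean equals(Object other) {
        if (!(other instanceof Employee)) return false;
        return id == ((Employee) other).id;   // only the ID matters
    }

    @Override public int hashCode() { return id; }
}

class Manager extends Employee {
    final String department;

    Manager(int id, String department) {
        super(id);
        this.department = department;
    }

    @Override public boolean equals(Object other) {
        if (!(other instanceof Manager)) return false;  // a plain Employee never matches
        Manager m = (Manager) other;
        return id == m.id && department.equals(m.department);
    }
}

class SymmetryDemo {
    public static void main(String[] args) {
        Employee bob = new Employee(7);
        Manager bobTheManager = new Manager(7, "Sales");
        System.out.println(bob.equals(bobTheManager)); // true: Employee compares IDs only
        System.out.println(bobTheManager.equals(bob)); // false: symmetry is broken
    }
}
```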

The default implementation for equals() satisfies all of the rules, but it is usually way too strict. Consider this class, which holds a pair of strings.
package gjd;
public class StringPair {
    private String s1;
    private String s2;
    public StringPair (String s1, String s2) {
        this.s1 = s1;
        this.s2 = s2;
    }
    public String getString1() { return s1; }
    public String getString2() { return s2; }
    public static void main (String args[]) {
        StringPair p1 = new StringPair (args[0], args[1]);
        StringPair p2 = new StringPair (args[0], args[1]);
        System.out.println (p1);
        System.out.println (p2);
        System.out.println (p1.equals (p2));
    }
}
When I run this class with arguments "foo" and "bar", I get this.
gjd.StringPair@8813f2
gjd.StringPair@d58aae
false
Even though the two objects hold the same data, they aren't equal. You can see that the toString() method kind of stinks here too; it gives us the class name, an @, and then the object's hash code in hexadecimal, which you can loosely think of as a memory address. The default equals() method says the two instances are not equal because they are not the exact same object, that is, they are not identical. By default, only identical objects are equal. You have to do extra work to allow two separate object instances to be equal.

Here is that work for this class.
@Override public boolean equals (Object other) {
    if (this == other) return true;
    if (!(other instanceof StringPair)) return false;
    StringPair o = (StringPair) other;
    if (!(s1 != null ? s1.equals (o.s1) : o.s1 == null)) return false;
    if (!(s2 != null ? s2.equals (o.s2) : o.s2 == null)) return false;
    return true;
}
There is some new stuff here.

  • The @Override is an annotation which tells the Java compiler that I'm overriding the equals() method of the Object class. It's optional, but highly recommended.
  • The instanceof operator tells you whether an object is an instance of some class. The object can be of that exact class or of a subclass.
  • The exclamation point in the second if condition means "not". So, the condition evaluates to true if the object passed in is not a StringPair object.
  • The (StringPair) thing there is called a cast. It gives you a different type of reference to an object. Here, I am casting "other", which we said was an Object, to the StringPair class and storing the narrower reference in the variable "o". We know this is OK to do because the instanceof operator already checked the actual type of "other" for us.
  • The two lines before return true verify that the corresponding fields of the two objects are themselves equal. I'm going to have to leave it to you to tear those lines apart for now. Find out about the "ternary operator" in the meantime. Sorry about that. I just don't want to get too far off-topic right now.
The overall structure of this equals() method follows a recipe given in the book Effective Java by Joshua Bloch. You should add that book to your Amazon wish list. If we study the method for a little bit, we can confirm that it follows the rules for equals().
  • Is it reflexive? Yup, the first line guarantees that.
  • Is it symmetric? Sure, it doesn't matter which object the method is called on and which one is passed in.
  • Is it transitive? It is, because it depends on String equality being transitive, and it is.
  • Is it consistent? Yes. Provided that each object's s1 and s2 fields don't change, equals() will keep returning the same thing.
  • Does it prevent equality with null? It actually does, because if you pass in null, the instanceof check fails. The null value is like school on Thanksgiving: no class.
Now I run the class, and:
gjd.StringPair@8813f2
gjd.StringPair@d58aae
true
Good deal. Despite all I've written here, there are still other nuances of the equals() method that you'll need to learn. Do some research, and especially look into good Java books like Effective Java to find out more. Next time, I'll delve into the hashCode() method, the little brother of equals().

Thursday, January 6, 2011

The root of it all

When I divulged the truth about inheritance in Java, I revealed that all objects in Java ultimately descend from a single base class, java.lang.Object. It's worth checking out what this class is all about since, after all, what it provides runs through every object in Java.

Object doesn't offer up any fields for derived classes, but there are a handful of methods that are available. Some are great as they are; others you may want to override; and others are best shunned. First let's talk about the ones that are great as they stand.

The getClass() method returns a java.lang.Class instance that represents the class of an object. This may turn your brain inside out a little, so let me explain. There is a class shipped with Java whose name is Class. An instance of Class houses information about some specific Java class: maybe String, or Integer, or whatever. Even Object and, yes, Class itself, have their Class instances. These instances are useful for comparing the types of objects in some cases, and also for a lot of work with reflection, where you find out at runtime what's up with your classes. That's sort of an advanced topic, but at least right now you know it's there.
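A small taste of getClass() (GetClassDemo is an illustrative name). Calling it on an object and using the .class literal both give you the same Class object for String, and even Class has a Class.

```java
// A few things you can ask a Class instance.
class GetClassDemo {
    public static void main(String[] args) {
        Object o = "Hello";
        Class<?> c = o.getClass();
        System.out.println(c.getName());                           // java.lang.String
        System.out.println(c == String.class);                     // true: the same Class object
        System.out.println(c.getSuperclass().getName());           // java.lang.Object
        System.out.println(Class.class.getClass() == Class.class); // true
    }
}
```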

The methods notify(), notifyAll(), and wait() (in three forms) are all useful for doing multithreaded computation. Again, an advanced topic, but here's an overview. A thread can decide to pause for a while (or indefinitely) and wait for a signal to be delivered to potentially get going again. The thread calls a wait() method on some particular object of its choosing, and it must hold that object's lock (that is, be inside a synchronized block on it) when it does. Later, some other thread decides it wants to kick-start one or all threads in such a pause state, so it calls either notify() or notifyAll() on the same object, also while holding its lock, to wake one or all of them up. These methods are but one way to perform thread synchronization in Java.
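Here's a minimal sketch of that handshake, assuming a simple "ready" flag as the condition being waited for (WaitNotifyDemo, awaitSignal(), signal(), and isReady() are illustrative names). The waiting thread checks the flag in a loop, since a thread can wake up without being notified, and both sides hold the same lock while touching the flag.

```java
// A minimal wait()/notifyAll() handshake between two threads.
class WaitNotifyDemo {
    private final Object lock = new Object();
    private boolean ready = false;   // only read or written while holding lock

    // Blocks until signal() has been called.
    void awaitSignal() throws InterruptedException {
        synchronized (lock) {        // wait() requires holding the object's lock
            while (!ready) {         // re-check: wakeups can be spurious
                lock.wait();
            }
        }
    }

    // Sets the flag and wakes up every thread waiting on lock.
    void signal() {
        synchronized (lock) {
            ready = true;
            lock.notifyAll();
        }
    }

    boolean isReady() {
        synchronized (lock) {
            return ready;
        }
    }

    public static void main(String[] args) {
        final WaitNotifyDemo demo = new WaitNotifyDemo();
        Thread waiter = new Thread(new Runnable() {
            @Override public void run() {
                try {
                    demo.awaitSignal();
                    System.out.println("waiter: got the signal");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        waiter.start();
        demo.signal();   // fine even if the waiter hasn't started waiting yet
        try {
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        System.out.println("main: done");
    }
}
```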

Now let's talk about the Object methods you may want to override.

The equals() method is a big deal. It determines whether one object instance is equal to another, for some meaning of "equal". This is conceptually distinct from two objects being identical, that is, actually being the same bits in memory somewhere. You could have, for example, an Employee class with an ID field, and decide that every actual real-life employee has a unique ID. If that's the case, the equals() method for Employee can simply check the ID fields of two Employee objects to see if they match; if so, then you know they are "equal", regardless of whatever else they hold. That check is good enough.

The default implementation for equals() returns true only if two objects are identical. This is often overly restrictive, so you can override the method to be more intelligent about things.

The hashCode() method returns an integer value corresponding to an object. This is pretty much for supporting hash tables and similar data constructs. The rule is that two objects that are equal according to equals() must yield the same hash code. If this doesn't happen, then hash tables don't work right. The art of crafting equals() and hashCode() methods for your own classes is deserving of its own post, some other time.

The toString() method returns a string corresponding to an object. This is usually for debugging purposes, so the string should be human-readable. You can use it to render a more machine-readable specification, but in my opinion you should define a separate method for that, since you're getting into issues of parsing and such. The default implementation of toString() is not great, although it does tell you what class an object is. Definitely override this one.
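Here's what a simple override might look like on a hypothetical Point class. The format of the string is entirely up to you; this one just aims to be easy to read in a log or a debugger.

```java
// A hypothetical Point class with a human-readable toString().
class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override public String toString() {
        return "Point(" + x + ", " + y + ")";
    }

    public static void main(String[] args) {
        // Prints Point(3, 4) instead of the default class name, @, and hex hash code.
        System.out.println(new Point(3, 4));
    }
}
```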

Finally, let's talk about the black sheep methods of Object.

First, the finalize() method is sort of like a C++ destructor, except it turns out that you can't tell when or if it ever gets called. Prevailing opinion in the Java world is that you should never use this method for anything important. I don't recall ever using it for anything I've written. There are some cases where finalize() is useful, but think hard before you go there.

And now, lastly, the poor clone() method. The design of object cloning in Java is considered broken beyond hopes of repair, and you can tell why as I describe this method.

The clone() method creates a copy of an object. So, Employee.clone() creates a new Employee object that is often, but doesn't have to be (*sigh*), equal to the original object. The default implementation of the method does a "shallow copy" of the original, copying over field contents as is, which is sometimes OK but can cause trouble depending on what sort of fields your class has. So, if you want to support cloning of your own objects, you may need to override this thing, and since Object.clone() is protected, you'll also want to make your version public. Oh, and your class has to implement the interface Cloneable, because otherwise Object.clone() refuses to do its job and throws a CloneNotSupportedException. And Object doesn't implement Cloneable.
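Here's the shallow copy problem in action, using a hypothetical Roster class that follows the usual pattern: implement Cloneable, override clone() as public, and call super.clone(). The array field ends up shared between the original and the copy.

```java
// A hypothetical class showing that Object.clone() makes a shallow copy.
class Roster implements Cloneable {
    String name;
    int[] scores;

    Roster(String name, int[] scores) {
        this.name = name;
        this.scores = scores;
    }

    // Public override of the protected Object.clone(), with a covariant return type.
    @Override public Roster clone() {
        try {
            return (Roster) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);  // can't happen: Roster implements Cloneable
        }
    }

    public static void main(String[] args) {
        Roster original = new Roster("Team A", new int[] { 90, 85 });
        Roster copy = original.clone();
        System.out.println(copy != original);                // true: a new object
        System.out.println(copy.scores == original.scores);  // true: the array is shared
        copy.scores[0] = 0;
        System.out.println(original.scores[0]);              // 0: the original changed too
    }
}
```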

"What were they thinking?!", you might ask. "Why did they put a method in a base class that always fails? Why is its default implementation wrong for many classes I might write? Why do I need to deal with this interface?" Folks smarter than me have studied this tragedy at length, and it boils down to good intentions gone wrong. Fortunately, there are alternatives for doing object copying in Java, which I can cover at a later date.

Hopefully this posting has put you on a more solid basis with understanding the Java object hierarchy, by highlighting the root class over all of Java. Remember that its facilities are always available for you for fundamental, common tasks.

Tuesday, January 4, 2011

Stringing along

In my last couple of posts I gave some love to Java's primitive types. Now I want to talk about a single class, java.lang.String. There's a lot more to this class than just "Hello world".

First off, you already know how to define a string:
String s = "Hello world";
C and C++ veterans may be wondering about null termination, so I'll tell you now that the concept doesn't carry over. The string consists only of its constituent characters, plus the rest of the typical Object stuff. If you insert a null character ('\u0000') into a string, it just sits there in the string and doesn't mean anything particularly special.

There are lots of handy methods on the String class. Here is but a sample.
String s = "Hello world";
System.out.println (s.charAt (0));  // H
System.out.println (s.contains ("lo"));  // true
System.out.println (s.endsWith ("rld"));  // true
System.out.println (s.indexOf ('l'));  // 2
System.out.println (s.lastIndexOf ('l'));  // 9
System.out.println (s.length());  // 11
System.out.println (s.replace ("Hello", "Hi"));  // Hi world
System.out.println (s.startsWith ("Hel"));  // true
System.out.println (s.substring (3, 5));  // lo
System.out.println (s.toLowerCase());  // hello world
System.out.println (s.toUpperCase());  // HELLO WORLD
As you peruse the API for String, you may notice that there aren't any methods for changing the characters in a string, for modifying a string. This is intentional, because, like the wrapper classes, String is immutable. Once its characters are set they cannot be changed. For example, the replace() method above doesn't change s; it returns a new string with the replacement performed.

At some point I'll talk in depth about immutability, and then I can cover its benefits. For now, check out my post on the wrapper classes for some discussion on that.

A common mistake in Java involves string comparison. Check this out:
String s = "Hello world";
System.out.println (s == "Hello world");  // might not be true
System.out.println (s.equals ("Hello world"));  // true
It's usually wrong to compare strings for equality using the == comparison operator. How come?

In Java, objects are treated as references to data, like pointers in C or C++. So in the code above, s contains not the string "Hello world" itself, but sort of a reference to a memory location where the string lives. This is in contrast to primitive types, where the values are stored directly.

Depending on how a string is created, its variable may be pointing to a different copy of its characters than some other string variable referencing the same characters. They might or might not share the same copy. It's OK (great, actually) if they do, but it's not required. For that reason, using == can fail, because that compares the references to the strings, not the strings themselves.

In the code above, == happens to work, because I used a string literal on both ends, and Java keeps one "canonical" (interned) copy of identical string literals. But if I had read in the string for s from some file, say, then it probably wouldn't work.
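Here's a way to see the difference without reading from a file (StringCompareDemo is an illustrative name): building the string at runtime with a StringBuilder gives you a separate object with the same characters.

```java
// == compares references; equals() compares characters.
class StringCompareDemo {
    public static void main(String[] args) {
        String literal = "Hello world";
        String built = new StringBuilder("Hello").append(" world").toString(); // made at runtime
        System.out.println(literal == "Hello world");  // true: literals share one copy
        System.out.println(literal == built);          // false: different objects
        System.out.println(literal.equals(built));     // true: same characters
    }
}
```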

So, get in the habit of using equals() to compare strings. It's the convention anyway, and others looking at your code will thank you.

Combining strings is easy.
String s = "Hello" + " " + "world";
Yeah, Java lets you use the + operator for string concatenation. Java doesn't let you do operator overloading / overriding like C++, but they did hook us up with this, at least.

Now, while this is great, it can cause a lot of trouble. This loop, for example, will have pretty horrendous performance.
String s = "";
for (int i = 0; i < someBigNumber; i++) {
  s = s + Integer.toString (i);
  s = s + " ";
}
The problem lies in how Java does string concatenation using the + operator. It kind of goes like this:
  1. Make a new "buffer" for the concatenated string.
  2. Copy the first string into the buffer.
  3. Copy the second string into the buffer.
  4. Convert the buffer contents into a string.
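The steps above can be written out by hand like this (ConcatSketch and concat() are illustrative names). Compilers from the Java 5 through 8 era translated + on strings into StringBuilder calls much like these; newer ones (Java 9 and later) use a different mechanism, so treat this as a model of the cost rather than the literal bytecode. In the loop, each pass creates a fresh buffer and recopies everything built so far.

```java
// A hand-written model of what one string concatenation costs.
class ConcatSketch {
    static String concat(String first, String second) {
        StringBuilder buffer = new StringBuilder();  // 1. make a new buffer
        buffer.append(first);                        // 2. copy the first string in
        buffer.append(second);                       // 3. copy the second string in
        return buffer.toString();                    // 4. turn the buffer into a new String
    }

    public static void main(String[] args) {
        String s = "";
        for (int i = 0; i < 5; i++) {
            // Every call recopies all of s, which keeps getting longer.
            s = concat(s, Integer.toString(i));
            s = concat(s, " ");
        }
        System.out.println(s);  // prints 0 1 2 3 4 (with a trailing space)
    }
}
```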
While this is perfectly reasonable for one-off concatenations, you can see that in a loop it requires tons of buffer creations and tons of copies, and the copies get longer for each iteration. Normally you don't try to optimize your code until late in the game, but this is a noteworthy exception. The way to fix this is to use the java.lang.StringBuilder class, or java.lang.StringBuffer if you are using Java 1.4 or older.
StringBuilder buff = new StringBuilder();
for (int i = 0; i < someBigNumber; i++) {
  buff.append (Integer.toString (i));
  buff.append (" ");
}
String s = buff.toString();
Here, only one buffer is used, and it copies its contents only now and then when it needs to grow, instead of recopying the whole string on every iteration. It's way way faster. Any good static analysis tool (FindBugs is my fave) should detect these cases for you, but it's better never to write them.

So this is a pretty decent introduction to Java strings. There are other issues to consider, but knowing how to create them, compare them for equality, and properly concatenate them is more than enough to get you moving.