Tuesday, July 9, 2013

Wizpert Chronicles: Tricky Strings

Introduction: I recently was recruited by Wizpert as a knowledgeable resource in the realm of Java programming (due to this blog, for the most part). I figure this is a nice way to help out other folks with Java programming problems. It also gives me good ideas for this blog! Here's one to get started.

The problem this time had to do with padding a string with zeroes. The code was checking the length of the string to see if padding was necessary:

if (s.length() == 5) { s = "0" + s; }

For some reason the zero was not being added. I suggested printing out the string with pipe characters around it, to check for whitespace. Sure enough:

|12345 |

An errant space was making the string length 6, causing the if test to fail. Looking back, I could have also suggested just printing out the value of s.length(), but sometimes it's nice to really see the issue directly.

Java strings provide a method trim() which, as it sounds, trims whitespace from the front and back of a string. However, it's not enough to just say:

s.trim()

Why not? Because trim() doesn't modify the original string. It returns a new string, which needs to then be assigned to a variable if you want to do something with it later.

s = s.trim();

That's better.

Strings are immutable objects in Java, so methods on them cannot change them. Instead, they return new objects. This sounds like it is a waste of memory and performance, but the benefits of immutability are worth it, and the JVM is optimized for handling many short-lived objects.
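
To put the trim-then-pad fix in one place, here's a small sketch. The class name PadDemo and the method padToSix are made up for illustration (the asker's full code isn't shown here), so read it as one plausible shape for the fix rather than the actual program.

```java
class PadDemo {
    // Trims stray whitespace, then left-pads a five-character string with a zero.
    static String padToSix(String s) {
        String trimmed = s.trim();   // trim() returns a new string; s itself is untouched
        if (trimmed.length() == 5) {
            trimmed = "0" + trimmed;
        }
        return trimmed;
    }

    public static void main(String[] args) {
        String raw = "12345 ";
        String padded = padToSix(raw);
        System.out.println("|" + raw + "|");      // prints |12345 | (the original keeps its space)
        System.out.println("|" + padded + "|");   // prints |012345|
    }
}
```

Note that raw never changes, no matter what padToSix does with it. That's immutability at work.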

Saturday, October 29, 2011

Making Exceptions

The idea of exceptions goes back before Java, but Java embraced the idea more than any language before it.

The fundamental problem exceptions solve is how to send error information back to the caller of a function / method / whatever. Before exceptions were prevalent, there were a couple of preferred solutions (among others). First, the function could return an error value instead of a "normal" answer; for example, a function that normally returns a non-negative number could return -1 if there was some problem. Second, the function could set some reserved global variables with error codes and messages and so on. Both solutions have their own problems. Among them is that the human doing the programming has to remember to check for error conditions. And, because humans are often lazy, rushed, or sloppy, it's easy for those checks to not get done.

Java provides two kinds of exceptions, and one of them, called checked exceptions, cannot be ignored. If a method "throws" a checked exception back to its caller, the calling code must either deal with the exception or throw it on up to its caller. Either option requires the programmer to explicitly do something: either write some code to deal with the exception, or have the method declare that it throws the exception. The programmer is forced to confront the possibility of an error. Overall, this leads to more robust code.

Let's say we have a method for opening a file that can throw java.io.IOException, which is a checked exception. The method definition could look like this:

public java.io.File open (String name) throws java.io.IOException {
  // ...
}

Any code that calls the method must deal with the possibility of an IOException popping out. It can just pass the exception on, throwing it itself:

public void doSomethingWithAFile() throws java.io.IOException {
  // ...
  // ... create a FileOpener object with our open method in it ...
  java.io.File myFile = fileOpener.open (theFileName);
  // ...
}

The other option is to handle it itself. This involves a "try-catch" block.

public void doSomethingWithAFile() {
  // ...
  try {
    java.io.File myFile = fileOpener.open (theFileName);
    // ...
  } catch (java.io.IOException exc) {
    // ... do something here, like logging or recovering ...
  }
  // ...
}

The code that could throw exceptions goes into the code block after "try". After the try block comes a "catch" block for the exception that should be caught there. It's fine for a catch block to throw some other exception; in fact, that's a well-known tactic, and a good idea a lot of the time.
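
Since throwing a different exception from a catch block came up, here's a rough sketch of that tactic. SettingsException, SettingsLoader, and readRaw are all hypothetical names invented for this example, and readRaw only pretends to touch a file so that the sketch runs on its own.

```java
import java.io.IOException;

// Hypothetical higher-level checked exception, defined only for this sketch.
class SettingsException extends Exception {
    SettingsException(String message, Throwable cause) {
        super(message, cause);
    }
}

class SettingsLoader {
    // Stand-in for real file access: it fails for any name that starts with "missing".
    static String readRaw(String name) throws IOException {
        if (name.startsWith("missing")) {
            throw new IOException("no such file: " + name);
        }
        return "contents of " + name;
    }

    // Catches the low-level exception and throws a more meaningful one,
    // keeping the original as the cause so no information is lost.
    static String load(String name) throws SettingsException {
        try {
            return readRaw(name);
        } catch (IOException exc) {
            throw new SettingsException("could not load settings from " + name, exc);
        }
    }
}
```

Callers of load() only have to know about SettingsException, and the original IOException is still reachable through getCause() when someone needs to dig.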

I said Java provides two kinds of exceptions. One kind is checked, and the other is unchecked. The difference is that you don't need to explicitly deal with an unchecked exception; by default, if you don't catch it, it automatically propagates up to your code's caller, and so on up and possibly out of the JVM (which would then stop execution). Methods that throw unchecked exceptions don't even have to declare that they throw them.

Let's add a line to our "open" method to make sure the name passed in isn't null. If it is, we'll throw the unchecked IllegalArgumentException. Since that exception is unchecked, it doesn't have to be mentioned in the method definition.

public java.io.File open (String name) throws java.io.IOException {
  if (name == null) {
    throw new IllegalArgumentException ("null name");
  }
  // ...
}

Calling code can still catch IllegalArgumentException, but it doesn't have to in order to compile and run.

Unchecked exceptions obviously make code less safe, that is, more prone to stopping completely when something goes wrong. So why have them? Why aren't all Java exceptions checked?

Unchecked exceptions are supposed to be used for situations that are not "recoverable". A programmer who opts for throwing a checked exception is saying that calling code may be able to deal with the error that led to the exception, or should at least try to. On the other hand, throwing an unchecked exception implies that there isn't much calling code can do, usually, so it's expected to just let execution stop. Calling code can, of course, still catch unchecked exceptions, but it's not required.

Having explained all that, the majority of Java developers (myself not included) seem to hate checked exceptions. Historically they were overused, and forced developers to deal with a lot of conditions that are either really unrecoverable, or that aren't important to deal with at all. Prime example:

package java.io;

public class FileInputStream extends InputStream {
  // ...
  public void close() throws IOException {
    // ...
  }
  // ...
}

A FileInputStream is used to read binary data from a file. When you're done reading from the stream, you need to close it so that the system can release resources associated with the file. The close() method throws a checked exception, and you almost always don't care if you can't close a stream once you're through with it. Dealing with this exception leads to all sorts of unnecessary wacked-out try-catch gymnastics. (Java 7 promises to help, but still.)
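
The Java 7 help mentioned above is the try-with-resources statement, which closes the stream for you when the block exits. Here's a minimal sketch of it. TrackingStream and FirstByteReader are made-up names, and the stream is an in-memory one so the example doesn't depend on any file existing.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;

// A made-up in-memory stream that remembers whether close() was called.
class TrackingStream extends ByteArrayInputStream {
    boolean closed = false;

    TrackingStream(byte[] data) {
        super(data);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }
}

class FirstByteReader {
    // Reads one byte; the stream is closed automatically when the try block exits,
    // whether it exits normally or because of an exception.
    static int firstByte(TrackingStream source) throws IOException {
        try (TrackingStream in = source) {
            return in.read();
        }
    }
}
```

The checked exception from close() doesn't go away, which is why firstByte still declares IOException. What you lose is the nested finally block and its own try-catch.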

Unfortunately, some in the Java community overreacted and abandoned checked exceptions completely. For example, the Spring and Hibernate libraries mostly throw unchecked exceptions, even for conditions that warrant checked exceptions. The argument is that it makes programmers' jobs easier. No doubt you can write less code because you don't need try-catch blocks or "throws" clauses on your own methods, but the temptation to ignore the errors returns, and the code overall is less robust.

What's called for is a more balanced approach. Use checked and unchecked exceptions judiciously, balancing convenience with robustness.

Saturday, March 26, 2011

Immutability and builders

Once you get to a certain point in working with Java, you always keep in mind how your code will work in a multi-threaded environment. It isn't a strange thing to think about: servlets run with threads, as do EJBs and Swing applications. Even if you are not dealing with threads now, you could be in the future, or someone else might try to apply your code to threads.

One of the easiest ways to handle the question "Will this work with threads?" is to make your classes immutable. The state of an immutable object cannot be changed after its construction. If an object is immutable, then there is no chance that threads will see different state in the object when they shouldn't. Immutability also leads to a simpler API for your class and overall more predictable behavior.

Here's a typical immutable class.
public class USAddress {
  private final String streetAddress;
  private final String city;
  private final String state;

  public USAddress (String a, String c, String s) {
    streetAddress = a;
    city = c;
    state = s;
  }
  public String getStreetAddress() { return streetAddress; }
  public String getCity() { return city; }
  public String getState() { return state; }
}
The fields in this class are all final, which means two things: first, that they can only be assigned once; second, that they must be assigned by the time construction finishes. These properties of final help make the class immutable.

I didn't include implementations of equals() or hashCode(), but I'll just say that they would depend on the fields in the class. Immutability implies that the hash code of a USAddress object never changes, which is great because it will never get lost in a hash table. For some objects, if there are a lot of computations involved for generating a hash code, you could just calculate it once and cache it internally for speed.
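
For the curious, here's roughly what field-based equals() and hashCode() could look like. This is a sketch under my own assumptions: the class is renamed PlainAddress to keep it separate from the one above, and it leans on java.util.Objects, which needs Java 7 or later.

```java
import java.util.Objects;

final class PlainAddress {
    private final String streetAddress;
    private final String city;
    private final String state;

    PlainAddress(String a, String c, String s) {
        streetAddress = a;
        city = c;
        state = s;
    }

    // Two addresses are equal when all three fields are equal.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof PlainAddress)) {
            return false;
        }
        PlainAddress that = (PlainAddress) other;
        return Objects.equals(streetAddress, that.streetAddress)
            && Objects.equals(city, that.city)
            && Objects.equals(state, that.state);
    }

    // Built from the same fields as equals(), so equal objects get equal hash codes.
    // The fields never change, so neither does the result.
    @Override
    public int hashCode() {
        return Objects.hash(streetAddress, city, state);
    }
}
```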

One downside to immutable classes is that they must have all their data passed to them on construction. Let's expand the class above and see what happens. I'm going to leave out the getter methods.
public class USAddress {
  private final String streetAddressLine1;
  private final String streetAddressLine2;
  private final String city;
  private final String state;
  private final String zipCode;
  private final boolean isPostOfficeBox;
  private final boolean isAptOrCondo;

  public USAddress (String a1, String a2, String c, String s, String z,
      boolean p, boolean ac) {
    streetAddressLine1 = a1;
    streetAddressLine2 = a2;
    city = c;
    state = s;
    zipCode = z;
    isPostOfficeBox = p;
    isAptOrCondo = ac;
  }
  // ... getters ...
}
The problem is the constructor. It has five strings in a row and then two booleans in a row. It could be tricky to remember the right order of the parameters, which one is which.
USAddress a = new USAddress ("1234 Elm Street", "Apt. 56",
    "Springfield", "MA", "01103", false, true);
Quick, what do the two booleans mean again?

So you can imagine that some classes can have even more fields, with a variety of types, and working with their constructors gets ridiculous. Fortunately, there is a design pattern that can help you out: the Builder pattern.

A builder is a class that builds instances of another class. You use it instead of directly calling a constructor. Here is a builder example.
public class USAddressBuilder {
  final String streetAddressLine1;
  String streetAddressLine2 = null;
  final String city;
  final String state;
  String zipCode = null;
  boolean isPostOfficeBox = false;
  boolean isAptOrCondo = false;

  public USAddressBuilder (String a, String c, String s) {
    if (a == null) {
      throw new IllegalArgumentException ("null street address");
    }
    if (c == null) {
      throw new IllegalArgumentException ("null city");
    }
    if (s == null) {
      throw new IllegalArgumentException ("null state");
    }
    this.streetAddressLine1 = a;
    this.city = c;
    this.state = s;
  }
  public USAddressBuilder streetAddressLine2 (String a) {
    this.streetAddressLine2 = a; return this;
  }
  public USAddressBuilder zipCode (String z) {
    this.zipCode = z; return this;
  }
  public USAddressBuilder isPostOfficeBox (boolean p) {
    this.isPostOfficeBox = p; return this;
  }
  public USAddressBuilder isAptOrCondo (boolean ac) {
    this.isAptOrCondo = ac; return this;
  }

  public USAddress build() {
    return new USAddress (this);
  }
}
Let's tear this one down.
  • This builder has the same fields as the class it builds. Some of the fields—those that are required—are final, but the optional ones aren't. The optional ones even get default values.
  • The builder's constructor only takes in the fields that are required.
  • To set the optional fields, you call specific builder methods for them. These methods return the builder again, so you can chain calls together (see below).
  • The build() method constructs a USAddress object from the builder data. The constructor (not shown) simply copies the builder fields into the object.
Here's how the builder is used.
USAddress a =
    new USAddressBuilder ("1234 Elm St.", "Springfield", "MA")
    .streetAddressLine2 ("Apt. 56").zipCode ("01103")
    .isAptOrCondo (true).build();
This code is longer, but it's much easier to read. Other nice things you get:
  • You can leave out fields that aren't important (like isPostOfficeBox).
  • The logic for constructing the class is mostly moved over to the builder. This is nice for classes that have lots of other stuff in them.
  • The builder has the option of reusing objects that it already constructed. If you ask for an object with the same data, and those objects are immutable, you could just as well use a copy made earlier. This can save on memory usage. (For more, check out another design pattern, Flyweight.)
  • The builder has the option of sending back a subclass instance. Imagine an AddressBuilder that returns Address objects; if you use a builder and pass in US-style address information, the builder can send you back a specific subclass of Address that specializes in US address data.
There are some downsides.
  • You have to write a lot more code to implement this builder. It's a sacrifice you have to make for easier use later on.
  • It's another class. You can mitigate this downside a little by making the builder an inner class of what it builds (which is what I usually do).
  • Some libraries and frameworks can only use constructors for your classes. I see this more as a problem on their end and not a fault of this pattern, but it's a practical consideration. You might have to make allowances to work within the constraints imposed on you.
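
To show the piece that wasn't shown above (the constructor that copies from the builder) along with the nested-builder arrangement, here's a trimmed-down sketch. MailingAddress is a made-up name with fewer fields than USAddress, and the builder is a static nested class, which is what I'd reach for here.

```java
final class MailingAddress {
    private final String streetAddressLine1;
    private final String streetAddressLine2;   // optional, may be null
    private final String city;
    private final String state;
    private final String zipCode;              // optional, may be null

    // Private: the only way to get an instance is through the builder.
    private MailingAddress(Builder b) {
        streetAddressLine1 = b.streetAddressLine1;
        streetAddressLine2 = b.streetAddressLine2;
        city = b.city;
        state = b.state;
        zipCode = b.zipCode;
    }

    String getStreetAddressLine1() { return streetAddressLine1; }
    String getStreetAddressLine2() { return streetAddressLine2; }
    String getCity() { return city; }
    String getState() { return state; }
    String getZipCode() { return zipCode; }

    static final class Builder {
        private final String streetAddressLine1;
        private String streetAddressLine2 = null;
        private final String city;
        private final String state;
        private String zipCode = null;

        Builder(String a, String c, String s) {
            if (a == null || c == null || s == null) {
                throw new IllegalArgumentException(
                    "street address, city, and state are required");
            }
            streetAddressLine1 = a;
            city = c;
            state = s;
        }

        Builder streetAddressLine2(String a) { streetAddressLine2 = a; return this; }
        Builder zipCode(String z) { zipCode = z; return this; }

        MailingAddress build() { return new MailingAddress(this); }
    }
}
```

Usage looks the same as before, just with the nested name: new MailingAddress.Builder ("1234 Elm St.", "Springfield", "MA").zipCode ("01103").build().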

Monday, March 14, 2011

Using Guice for dependency injection

So last time I described what dependency injection is. In this post I'll run quickly through how you can use a dependency injection framework. I'm going to pick Guice since I'm familiar with it, and because it's really straightforward.

You instruct Guice on how to do injection using a "module".
public class FlamingoModule extends AbstractModule {
  @Override protected void configure() {
    bind (RouletteBall.class);
    bind (RouletteTable.class);
    bind (Integer.TYPE).annotatedWith (SeatsPerTable.class)
      .toInstance (8);
    bind (SpecialtyDrink.class).to (Margarita.class);
  }
  @Provides RouletteWheel provideRouletteWheel (RouletteBall ball) {
    return new RouletteWheel (ball, true);
  }
}
This module's configure() method sets up some bindings, which is how Guice gets from what you ask it for to what it gives you. The first two bindings just make Guice aware of the RouletteBall and RouletteTable classes. The last one tells Guice that whenever someone asks for a SpecialtyDrink from Guice, it should deliver an instance of the Margarita class.

The third binding tells Guice that whenever it's asked for an integer annotated with @SeatsPerTable, it should send back the value 8. Annotations are a way that you can have Guice inject different values for the same type. Generally you need to code up the annotations yourself.

The provideRouletteWheel() method illustrates a different way Guice can get you objects. When Guice is asked for a RouletteWheel instance, it will run the "provider" method to create one. Guice handles injecting a RouletteBall instance into the method's ball parameter.

In order to finish wiring up the code, we'll need to use the @Inject annotation. This tells Guice to inject a dependency. So, the RouletteTable constructor needs a little work.
@Inject public RouletteTable (RouletteWheel w,
                              @SeatsPerTable int n) {
  numberOfSeats = n;
  wheel = w;
}
Now, when Guice needs a RouletteTable, it knows about it (from the module) and knows to inject values into its constructor. The third binding in the module's configure() method lets it inject n, and the provider method takes care of w. Since Guice also knows about RouletteBall, it can inject an instance of that class into the provider method.

To finally tie it all together, you need a starting point, something that lets you talk to Guice. That's called an injector.
Injector injector = Guice.createInjector (new FlamingoModule());
RouletteTable t = injector.getInstance (RouletteTable.class);
You can see that I could define a new module for a different casino, say, one that uses European roulette wheels and seats ten per table, and use that module with Guice to get a differently constructed RouletteTable object. The knowledge of how to create objects is wrapped up nicely in the modules and doesn't interfere with the use of the objects.

So, after all this, you may wonder why this is a good idea. What does this buy you, besides the kind of abstract architectural benefits?

One thing you get is not having to call a bevy of constructors just to make a high-level object. Instead of new this and new that getting passed to new something else, the "rules" for creating the objects are laid out in a more declarative fashion.

Another thing you get is tighter control of object creation. For example, Guice lets you inject dependencies as singletons, so you only ever get one instance across all injections. As another example, a provider method gives you free rein to control exactly how objects are built.

Perhaps the most powerful thing you get is simple swapping of object creation systems. Suppose you want to perform some testing of the code that uses RouletteTable, and you need to be able to peek into and tweak the RouletteTable and RouletteWheel instances. No problem:
Injector injector = Guice.createInjector (new TestingModule());
Now you can create a module that generates objects designed for testing purposes. The code that uses those objects doesn't need to change at all.

That's quite enough about Guice. For more, check out its user's guide. Hopefully this quick tour of Guice has shown you how neat dependency injection is and what it can do for you.

Monday, March 7, 2011

What the heck is dependency injection?

In my travels through the Java world over the years, there have been some concepts which I've had trouble finding a simple, succinct definition for. One of them is "dependency injection" (DI), which is one of the hot Java concepts of the last few years. Let me try explaining what it is and why it's useful.

In the normal way of working with Java, you build your objects with constructors. A lot of the time, those constructors also build the other objects that the class needs.
class RouletteBall {}

class RouletteWheel {
  private RouletteBall ball;
  public RouletteWheel() {
    ball = new RouletteBall();
  }
}

class RouletteTable {
  private RouletteWheel wheel;
  private int numberOfSeats;
  public RouletteTable (int n) {
    numberOfSeats = n;
    wheel = new RouletteWheel();
  }
}
Very straightforward. When you ask for a new RouletteTable, its constructor creates the necessary RouletteWheel, which in turn creates the necessary RouletteBall. The higher-level classes create their own dependencies.

There's another way to do this.

class RouletteBall {}

class RouletteWheel {
  private RouletteBall ball;
  public RouletteWheel (RouletteBall b) {
    ball = b;
  }
}

class RouletteTable {
  private RouletteWheel wheel;
  private int numberOfSeats;
  public RouletteTable (RouletteWheel w, int n) {
    numberOfSeats = n;
    wheel = w;
  }
}

// then you do this
RouletteWheel wheel = new RouletteWheel (new RouletteBall());
RouletteTable table = new RouletteTable (wheel, 8);
Oh snap, guess what, we just added dependency injection. But keep reading though, there's more to this.

The constructors here have been changed to take in the dependencies from outside, instead of generating them internally. The dependencies are supplied, or injected, into the class instances. Hence, dependency injection.

The term "inversion of control" (or IoC) is often applied to this sort of thing. The "control" refers to control over object creation, and the location of that control has been "inverted" from inside the classes to outside of them.

This doesn't seem particularly earth-shattering, and really it isn't at this point. Like a lot of design patterns, it's just a good idea, one that comes from the experience of many smart people working with object-oriented languages. It turns out that employing dependency injection gives you lots of flexibility.

For example, say we augmented our RouletteWheel class to be either European style (single-zero) or American style (double-zero).
class RouletteWheel {
  private RouletteBall ball;
  private boolean doubleZero;
  public RouletteWheel (RouletteBall b, boolean dz) {
    ball = b;
    doubleZero = dz;
  }
}
The first form of the RouletteTable class has a problem now, because its constructor needs to specify a style for the table's wheel. You'd have to add another parameter to the constructor, or maybe a second constructor. There are several ways to cope, but it involves some changes.

The second form of RouletteTable has no trouble with this change, because it just takes in whatever RouletteWheel instance it's handed. No code changes! This is also great because it encapsulates the details of how a RouletteWheel is created; the RouletteTable class doesn't need to know or care about that.
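
Here's a sketch of that flexibility in action. The pocketCount() methods and the CasinoWiring class are my own additions so there's something to observe (37 pockets on a single-zero wheel, 38 on a double-zero one); they aren't part of the classes above.

```java
class RouletteBall {}

class RouletteWheel {
    private final RouletteBall ball;
    private final boolean doubleZero;

    RouletteWheel(RouletteBall b, boolean dz) {
        ball = b;
        doubleZero = dz;
    }

    // Hypothetical helper: 0 plus 1 through 36 makes 37 pockets,
    // and a double-zero wheel adds one more.
    int pocketCount() {
        return doubleZero ? 38 : 37;
    }
}

class RouletteTable {
    private final RouletteWheel wheel;
    private final int numberOfSeats;

    // Unchanged from the injected form: the table takes whatever wheel it is handed.
    RouletteTable(RouletteWheel w, int n) {
        numberOfSeats = n;
        wheel = w;
    }

    int pocketCount() { return wheel.pocketCount(); }
    int getNumberOfSeats() { return numberOfSeats; }
}

// All of the "how do I build this?" knowledge lives out here.
class CasinoWiring {
    static RouletteTable americanTable() {
        return new RouletteTable(new RouletteWheel(new RouletteBall(), true), 8);
    }

    static RouletteTable europeanTable() {
        return new RouletteTable(new RouletteWheel(new RouletteBall(), false), 10);
    }
}
```

Both tables come from the very same RouletteTable class. Only the wiring code knows which style of wheel went in.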

I'm going to stop here for now. There's more to talk about, but I'd rather let the basic concept of dependency injection sink in. Next time I'll discuss how some frameworks support dependency injection and can help you out.

Thursday, February 17, 2011

What's a bean?

Java evolved from simply a language into an entire ecosystem of APIs and libraries and frameworks, more than any mere mortal can comprehend. You just cannot be an expert in all of them, as more are created every day, it seems, and the ones that were all the rage five years ago are now ground into dust beneath the wheels of what's newer and slicker. Some concepts still pervade a good number of them though, and one of them is the Java "bean".

There are many specific kinds of "beans", such as Enterprise Java Beans and MBeans and Persistent Entity Beans, but there is also just this general concept of a bean. Its origin is way back before the proliferation of APIs and frameworks, and the idea was that if you made a Java class just so, then tools, or other things that got a hold of your class, could find out things about it. A very simple set of conventions was determined, and if your class follows those conventions, then it's a bean.

Your class can be a bean and something else too. It's not a restrictive concept, like most Java concepts became for a while there. It really is only a set of conventions, a small one at that, which you can often adopt, and if you do, it can help you out.

Here are the conventions.

First, your class needs to have a public "no-arg" constructor; that is, it must be possible for code anywhere to make an instance of it by just saying new MyClass(). Fortunately, if you don't write a constructor for your class, Java makes a no-arg constructor for you. If you do write some other constructors, then you can just add this.
public MyClass() {}
Second, and this is kind of optional, your class needs to have properties. The idea behind a property is that you have a public getter and (if you like) a public setter method named similarly. Here:
public String getLastName() { return ln; }
public void setLastName (String n) { ln = n; }
The methods above define a read-write property called "lastName". Note that the property name starts with a lowercase letter. The getter takes no arguments and returns the property value, while the setter takes in a new value and returns nothing. If you want the property to be read-only, omit the setter.

The type of the property can be anything you like. One wrinkle: if it's boolean, use a different naming convention for the getter.
public boolean isAdult() { return (age >= 18); }
Inside your class, the property doesn't have to be represented by a field in a one-to-one relationship. Do whatever you like in there, but only expose publicly a named property of some type.

The third convention is ... well, I'm going to be a little ornery here and say that there aren't any more conventions you need to follow. The Wikipedia entry for JavaBean says your class should be serializable, which means that instances of it need to be able to be saved off and restored, but I don't think that's strictly required. (The entry itself says it "should" be serializable, so there.) And the official JavaBean spec has bits about beans exchanging events, but again I see that as optional if you want it.

So, in my ornery opinion, any Java class with a public no-arg constructor, and maybe with some properties, is a bean. There.

I claim this because by following these two simple conventions your classes can be wired into all sorts of neat things. The Apache Commons BeanUtils library is one of them, and while that library maybe isn't all that jazzy when used directly, it's the foundation for some really powerful Java technologies—just read the overview to see some examples.
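
To get a feel for what such tools do with the two conventions, here's a toy version using plain reflection. PersonBean and TinyBeanTool are invented for this sketch, and it only handles String properties; the standard java.beans package and libraries like BeanUtils do a far more thorough job.

```java
import java.lang.reflect.Method;

// A bean: public no-arg constructor plus a read-write "lastName" property.
class PersonBean {
    private String ln;

    public PersonBean() {}

    public String getLastName() { return ln; }
    public void setLastName(String n) { ln = n; }
}

// A toy "tool" that knows nothing about PersonBean except the bean conventions.
class TinyBeanTool {
    // Turns ("set", "lastName") into "setLastName".
    static String accessorName(String prefix, String property) {
        return prefix + Character.toUpperCase(property.charAt(0)) + property.substring(1);
    }

    // Convention one: there is a no-arg constructor to call.
    static <T> T create(Class<T> type) throws ReflectiveOperationException {
        return type.getDeclaredConstructor().newInstance();
    }

    // Convention two: properties are reachable through get/set methods.
    static void set(Object bean, String property, String value)
            throws ReflectiveOperationException {
        Method setter = bean.getClass().getMethod(accessorName("set", property), String.class);
        setter.invoke(bean, value);
    }

    static Object get(Object bean, String property) throws ReflectiveOperationException {
        Method getter = bean.getClass().getMethod(accessorName("get", property));
        return getter.invoke(bean);
    }
}
```

With that in place, a configuration file that says "make a PersonBean and set lastName to Hopper" can be carried out without the tool ever being compiled against PersonBean.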

Another powerful framework drawing from the bean concept is Spring, whose most fundamental offerings are all about creating and configuring objects for you as long as they are beans (well, often even if they aren't, but it really really likes beans).

I said above that "for a while there" Java concepts got restrictive. For example, the Java Servlet specification (which I still use all the time) says that, in order for a Java class to respond to HTTP requests, it has to extend a particular class HttpServlet and implement specific methods and use certain other classes and so on and so on. On its own, this is fine. But for a working Java developer, having to know this specialized set of classes for servlets, plus another for Enterprise Java Beans, and another for some other framework, well, it's a real pain. The community called out for a return to simplicity.

And so, frameworks like Spring and new language features like annotations arrived, and mercifully, the primary goals for these endeavors included "making your life easier so you can get your work done". One big way of making that happen was to leave the restrictive concepts behind and instead let you work with simple ones, like the humble bean. Then, after some hints and nudges, the heavy lifting would be done for you.

(If this sounds like you're encouraged to be lazy, then stop using Java and go work in assembly language. :) )

So, next time you're slapping some new classes together, think about making some of them beans along the way. It might not help you right away, but down the line that decision might pay off. Even the mere fact that the bean concept is a foundation for all these regions of the Java ecosystem should indicate that it's not a bad idea to adopt it for yourself.

Friday, February 4, 2011

Taking shortcuts

Now that I finished going through all of the Java operators, I want to talk about the special way that two of them work. Those two are the logical AND and OR operators, also known as && and ||. These guys can be lazy and skip part of their evaluation; sometimes this is good for you, and sometimes not.

Let's take logical AND. If both its operands in an expression are true, then the whole thing evaluates to true. Otherwise, the expression evaluates to false. That's just AND.

Now, let's focus on that first operand. If it evaluates to true, what does that tell you about the expression's value? Well, not much, you still need to check on the second operand. But suppose the first operand evaluates to false? If that happens, you know that the whole expression is false already. The value of the second operand doesn't matter.

Java takes this lesson to heart. If the first operand for logical AND evaluates to false, then it doesn't evaluate the second operand at all. This is called a shortcut or short circuit Boolean operation. It is good for efficiency, especially for something like this:
if (debugging && timeConsumingMethod()) {
  doSomething();
}
Because logical AND shortcuts, that really expensive timeConsumingMethod won't be called unless you are debugging. Might as well not pay the price if you can help it.

Logical OR works the same way, except opposite sorta. If the first operand in its expression is true, then it shortcuts, because the entire expression must evaluate to true at that point no matter what. For example:
if (answer == null || !(answer.equals ("apple"))) {
  wrongAnswer();
}
Without short circuiting, the equals() call would throw a nasty NullPointerException when answer is null. Instead, though, because OR shortcuts, it's safe. This is a common trick for dealing with nulls in comparisons.
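
Here's the same check pulled out into a method so you can poke at it. AnswerCheck and isWrong are names I made up for the sketch.

```java
class AnswerCheck {
    // True when the answer is missing or is anything other than "apple".
    // If answer is null, the left side is true and equals() is never called.
    static boolean isWrong(String answer) {
        return answer == null || !answer.equals("apple");
    }

    public static void main(String[] args) {
        System.out.println(isWrong(null));      // prints true, with no NullPointerException
        System.out.println(isWrong("pear"));    // prints true
        System.out.println(isWrong("apple"));   // prints false
    }
}
```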

While shortcut operations are useful, you do have to be careful with them. Avoid having the second operand do something as a side effect that you really want to happen all the time.
int i = 0;
while (i < a.length) {
  if (a[i] == null || a[i++].equals ("")) {
    System.out.println ("I found a blank!");
  }
}
This all-too-clever loop tries to run through an array of strings, looking for either nulls or empty strings. The loop index gets incremented with that i++ bit. As long as the array contains no nulls, this loop will work, but as soon as the first null arrives, because logical OR shortcuts, the loop will get stuck, printing out "I found a blank!" forever and ever, or until you hit Control-C.

(I don't think I've covered the while loop yet. It's a basic loop that keeps going "while" the Boolean expression next to while evaluates to true. If the expression is false to begin with, the loop never executes even once. So, you have to make sure you either make that condition false at some point, or break out of the loop using either break or return.)

The example above is a bit contrived, but traps like it can and do happen more subtly in real code.
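
For comparison, here's one way to write that search so the increment can't be skipped. BlankFinder and countBlanks are hypothetical names, and the method counts blanks instead of printing so it's easy to check.

```java
class BlankFinder {
    // Counts the entries that are null or empty.
    static int countBlanks(String[] a) {
        int blanks = 0;
        for (int i = 0; i < a.length; i++) {   // the increment runs on every pass
            // The shortcut still protects equals() from nulls,
            // but the second operand no longer has a side effect.
            if (a[i] == null || a[i].equals("")) {
                blanks++;
            }
        }
        return blanks;
    }
}
```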

Two more things. First, the bitwise AND and OR operators, known as & and |, do not short circuit. They are more akin to mathematical operations, so this makes sense.
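
One detail worth seeing for yourself: & and | also accept boolean operands in Java, and in that role they act as logical AND and OR that always evaluate both sides. This sketch counts how many operands actually get evaluated; EvalCounter and touch are made-up names.

```java
class EvalCounter {
    static int calls = 0;

    // Records that it was evaluated, then hands back the value unchanged.
    static boolean touch(boolean value) {
        calls++;
        return value;
    }

    // && stops after the first operand turns out false.
    static int callsWithShortCircuit() {
        calls = 0;
        boolean result = touch(false) && touch(true);
        return calls;
    }

    // & evaluates both operands no matter what.
    static int callsWithoutShortCircuit() {
        calls = 0;
        boolean result = touch(false) & touch(true);
        return calls;
    }
}
```

The first method reports one call and the second reports two, even though both expressions come out false.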

Second, don't use the shortcut feature to implement flow control. This is something that is idiomatic in scripting languages, but it ain't the Java way. Here are some examples. First, the typical way you open a file in Perl.
open HANDLE, "<myfile.txt" or die "Cannot open myfile.txt";
The die command, which causes the script to exit, will not execute as long as the file is opened successfully.

Another example, from bash scripting:
[[ -n $DEBUG ]] && echo File is opened.
If the DEBUG variable isn't set, the AND short circuits and the debug message isn't printed out.

While this is pretty nifty and all, doing the same thing in Java will likely lead to confusion, although it may help your geek cred. Eh, it's not worth it. Really, use if statements instead.