Wednesday, September 09, 2009

Formatters and Regular expressions for parsing, as well as validation

Regular expressions are commonly used to validate content as it comes into your processing code. They are extremely powerful for quickly allowing only the content you're expecting through into the rest of your system. There are many examples - like dates found here and here:
(0[1-9]|[1-9]|[12][0-9]|3[01])-(0[1-9]|1[012]|[1-9])-(19|20)\\d{2}

However, RegEx can be used for parsing as well as validating - a use often overlooked within Java systems in favour of manually parsing data structures out of strings. Even better, a single RegEx can validate and parse in one step, through the use of groups, back-references or sub-matches depending on your terminology - they all refer to the same captured text. Java's java.util.regex package uses the "Group" name. Essentially, it all involves the use of brackets in your expressions. Note that brackets were used in the date expression above, and it would have accidentally worked as a parser, were it not for the fact that the \\d{2} is not inside any brackets: the third group only captures the century (19 or 20), making the year un-parseable.
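One way to make the date expression above parse as well as validate is to wrap the whole year in its own group. The following is a minimal sketch, assuming the pattern is meant as day-month-year; the class name DateParser and its parse method are illustrative and not part of the original examples.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateParser {

    // Same date pattern as above, but with the whole year captured as group 3.
    // (?:...) is a non-capturing group, so the century prefix does not add a group number.
    private static final Pattern DATE = Pattern.compile(
            "(0[1-9]|[1-9]|[12][0-9]|3[01])-(0[1-9]|1[012]|[1-9])-((?:19|20)\\d{2})");

    /** Returns {day, month, year}, or null if the input does not match the pattern. */
    static int[] parse(String input) {
        Matcher m = DATE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3))
        };
    }

    public static void main(String[] args) {
        int[] parts = parse("15-10-2009");
        System.out.println("Day " + parts[0] + ", month " + parts[1] + ", year " + parts[2]);
        System.out.println("32-01-2009 - " + (parse("32-01-2009") == null ? "rejected" : "accepted"));
    }
}
```

Note that the pattern only checks the shape of the string; it would still accept a date such as 31-02-2009, so a calendar check is needed on top if that matters.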

Taking a simpler example - HTML colour strings of the form #FFCC00. Here, we're allowing them to be optionally prefixed by the hash character, and will allow both upper and lower case for the hex digits.
String COLOR_REGEX = "^#{0,1}[0-9A-Fa-f]{2}[0-9A-Fa-f]{2}[0-9A-Fa-f]{2}$";

This expression states that at the start of the string being processed, we expect a hash character occurring either 0 or 1 times (thereby making it optional). Following this, we expect exactly two characters (the {2}), each of which is one of the digits 0-9 or the letters A-F or a-f. That two-character pattern is repeated three times, once for each of the colour components. Finally, we add the $ to state we don't want any more content after the last hex byte.

This pattern is sufficient to validate that a String is an HTML colour:
package regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexValidator {

    public static void main(String[] args) {
        String regex = "^#{0,1}[0-9A-Fa-f]{2}[0-9A-Fa-f]{2}[0-9A-Fa-f]{2}$";
        Pattern validator = Pattern.compile(regex);
        Matcher m;
        m = validator.matcher("#FFcc8A");
        System.out.println("#FFcc8A - " + (m.matches() ? "Validated" : "NOT A COLOUR!"));

        m = validator.matcher("#steve");
        System.out.println("#steve - " + (m.matches() ? "Validated" : "NOT A COLOUR!"));
    }
}

But we can do better. By adding brackets around each of the hex byte expressions, the Matcher will not only match, but also allow us to pull out the group content:
package regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexValidator {

    public static void main(String[] args) {
        // Each pair of hex digits is wrapped in brackets, giving groups 1, 2 and 3
        String regex = "^#{0,1}([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})$";
        Pattern parser = Pattern.compile(regex);
        Matcher m;
        m = parser.matcher("#FFcc8A");
        // Run the match once and reuse the result for both validation and parsing
        boolean valid = m.matches();
        System.out.println("#FFcc8A - " + (valid ? "Validated" : "NOT A COLOUR!"));
        if (valid) {
            System.out.println("Components - R: " + m.group(1) + " G: " + m.group(2) + " B: " + m.group(3));
        }

        m = parser.matcher("#steve");
        System.out.println("#steve - " + (m.matches() ? "Validated" : "NOT A COLOUR!"));
    }
}

In this way, we can save quite a bit of messing around with String#indexOf(), String#substring(), etc.
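Since the groups come back as strings, a natural next step is to turn them into numbers. Here is a small sketch of that, reusing the grouped colour expression; the ColourParser class and its parse method are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ColourParser {

    private static final Pattern COLOUR = Pattern.compile(
            "^#{0,1}([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})$");

    /** Validates the input and returns {red, green, blue}, each in the range 0-255. */
    static int[] parse(String input) {
        Matcher m = COLOUR.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an HTML colour: " + input);
        }
        int[] rgb = new int[3];
        for (int i = 0; i < rgb.length; i++) {
            // Group 0 is the whole match, so the components are groups 1 to 3; radix 16 for hex
            rgb[i] = Integer.parseInt(m.group(i + 1), 16);
        }
        return rgb;
    }

    public static void main(String[] args) {
        int[] rgb = parse("#FFcc8A");
        System.out.println("R: " + rgb[0] + " G: " + rgb[1] + " B: " + rgb[2]);
    }
}
```

Throwing on invalid input is just one design choice here; returning null or an Optional would work equally well depending on how the caller wants to handle bad data.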

We can use something similar for outputting data, without having to build up output messages using StringBuilders all the time: java.util.Formatter. A Formatter instance allows us to define how some content should be output, and apply the actual content separately. Taking the example of HTML colours again, having parsed out the hex components as strings, we will likely have converted them to integers for processing. If we wanted to output the values again (I'm holding them in a List here, but a simple array would be more convenient), we could use something like:
package regex;

import java.util.Arrays;
import java.util.Formatter;
import java.util.List;

public class OutputFormatting {

    public static void main(String[] args) {
        List<Integer> intComps = Arrays.asList(21, 243, 0);
        // %02X prints each value as upper-case hex, zero-padded to two digits
        String output = new Formatter().format("#%02X%02X%02X", intComps.toArray()).toString();
        System.out.println(output);
    }
}
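For one-off formatting, String.format applies the same format syntax without creating a Formatter explicitly. The sketch below assumes the components are held as plain ints; the ColourFormatter class and its toHtml method are illustrative names only.

```java
class ColourFormatter {

    /** Formats three 0-255 components as an HTML colour string such as #15F300. */
    static String toHtml(int red, int green, int blue) {
        for (int component : new int[] {red, green, blue}) {
            if (component < 0 || component > 255) {
                throw new IllegalArgumentException("Component out of range: " + component);
            }
        }
        // Same format string as the Formatter example: two upper-case hex digits per component
        return String.format("#%02X%02X%02X", red, green, blue);
    }

    public static void main(String[] args) {
        System.out.println(toHtml(21, 243, 0));
    }
}
```

The range check is there because %02X only sets a minimum width, so a value above 255 would otherwise produce more than two digits.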





Monday, June 15, 2009

ExtJS, DWR, Spring + Hibernate

More on the AJAX front - since the previous posts, new frameworks have been popping up which make it somewhat easier to produce professional intranet-type applications. Linking them all together was surprisingly easy, and allows for good separation of concerns ...

ExtJS is a very powerful JS framework, which comes with the most complete set of UI widgets out-of-the-box of any framework I've seen. There are also links with GWT, in the form of GXT (also known as Ext GWT, produced by the same company as the main framework), and GWT-Ext, which was the original link and continues as a result of the debate over the ExtJS licensing change from LGPL to GPL.

For our application, we've used the plain JS library, and written it as a single-page AJAX application (i.e. the browser only ever loads /index.html), along with our own JS classes to wrap up the business functionality, primarily panel-by-panel. More information on general ExtJS design principles can be found here. Specifically I would recommend a read of the Big Application, and Extending classes before you dive straight in, but there are many other good tutorials on the site.

DWR allows Java beans to be exposed straight to an HTML page as a JavaScript object. At first glance, this sounds dangerous, but it's actually quite safe if a few precautions are taken. The developers of DWR have clearly thought about the security implications of such a framework, and you can easily protect yourself:
  • Create an interface on your service bean, which only has the methods designed to be called from the UI. This is good practice anyway, but also prevents any accidental exposure of public methods on your service you didn't intend to expose.
  • DWR forces you to name/mark all the beans you intend to serialize over to the JS object, including the ability to work with generics, or explicitly specify the collection types if not available.
  • DWR works with Hibernate loaded objects, and specifically proxies, so that while you have to pre-load any lazy-loaded associations, it will only expose those associations that have actually been loaded. This saves network bandwidth, and also avoids information leakage if you don't actually want parts of the domain model exposed to the client.
  • Used in conjunction with Spring, use Spring Security (formerly Acegi) to wrap authorization around the various service methods to ensure that only users authorized to call them can.
Note that DWR also takes care to avoid a lot of the common AJAX security flaws, such as CSRF, but one thing to bear in mind is that it does not protect against XSS by default (since you might actually want your application to send HTML + JS back to the server, if it was, say, a CMS ...)

Hibernate is still used as the persistence layer. In particular, with its support for externalised named queries, we can have our DAOs (or a base class) expose a query(String, Map<String, Object>) method, saving us from adding a new method for each and every query, while still only allowing the queries we want to. If you need to, you can also use Hibernate filtering to further restrict the queries at runtime, based on the authenticated User for instance.

The Service can also provide a similar query(String, Map<String, Object>) which can delegate straight to the DAO. (Careful! You will need to whitelist the named queries you actually need the UI to execute if there are other queries defined on your Hibernate session factory that you wouldn't want run ...)
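The whitelisting idea can be sketched roughly as follows. This is not the code from our application: QueryDao, WhitelistedQueryService and the query names are hypothetical, and the DAO is stubbed with a lambda rather than calling Hibernate so that the example runs on its own.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical DAO contract; a real implementation would execute the Hibernate named query. */
interface QueryDao {
    List<?> query(String queryName, Map<String, Object> namedParameters);
}

/** Hypothetical service exposed to the UI: only whitelisted named queries reach the DAO. */
class WhitelistedQueryService {

    private final QueryDao dao;
    private final Set<String> allowedQueries;

    WhitelistedQueryService(QueryDao dao, Set<String> allowedQueries) {
        this.dao = dao;
        // Defensive copy so the whitelist cannot be changed after construction
        this.allowedQueries = Collections.unmodifiableSet(new HashSet<String>(allowedQueries));
    }

    public List<?> query(String queryName, Map<String, Object> namedParameters) {
        if (!allowedQueries.contains(queryName)) {
            throw new SecurityException("Query not available to the UI: " + queryName);
        }
        return dao.query(queryName, namedParameters);
    }
}

class WhitelistDemo {
    public static void main(String[] args) {
        // Stub DAO standing in for Hibernate: returns a single fake row naming the query
        QueryDao stubDao = (name, params) -> Collections.singletonList("row from " + name);
        WhitelistedQueryService service =
                new WhitelistedQueryService(stubDao, Collections.singleton("customer.byName"));

        Map<String, Object> params = new HashMap<String, Object>();
        params.put("name", "steve");
        System.out.println(service.query("customer.byName", params));

        try {
            service.query("admin.allUsers", params);
        } catch (SecurityException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Keeping the whitelist in the service rather than the DAO means other server-side code can still run the remaining named queries through the DAO directly.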

This allows the UI (via DWR) to make Service.query(queryName, namedParameterMap) calls straight from the JS without having to implement individual query methods in the various components. All the data loaded into the UI comes through this mechanism, primarily so that we can mark everything as lazy-loading, and then use queries, along with the "fetch join" for each use-case, to load only the data required for that particular user action.

Spring is used to tie all of these technologies together, and to configure the various parts such as Hibernate and DWR. All the requests go through a Spring DispatcherServlet, which also allows us to link in the DWR controller and service URLs with some very small Spring MVC usage for index.html and a few admin/test pages.

Hopefully more (probably non AJAX) posts to follow without such a gap :D