Tech Team Lead News: 2007

Saturday, December 29, 2007

Best of this Week Summary 17 December - 29 December 2007

Always interesting, a look inside "the kitchen": the people from 37signals.com have provided some inside info on the architecture of some of their sites: Backpack and Basecamp. For example, they are using RoR, MySQL, S3 and memcached.

A post (related to the ESB entry in my post last week) on the rise of Tomcat as an "application" server, in connection with the rise of Spring. Comparing Tomcat with WebLogic and WebSphere is more like comparing MySQL with Oracle. In certain areas, such as clustering and high availability, Tomcat is still not at the same level as WebLogic and WebSphere, though it is making progress there.

There's a new revised version of the free e-book from Microsoft on the most important security engineering activities that you should have in your development process: The Developer Highway Code. Written by Paul Maher (Microsoft UK) and Alex Mackman (CM Group Ltd).

Great news, version 2.0 of soapUI has just been released. Improvements include web service WSDL coverage and a completely refactored WS-Security implementation.

Sunday, December 23, 2007

Best of this Week Summary 16 December - 23 December 2007

Nice set of 10 lessons learned from designing and building a high transaction database (Microsoft SQL Server). The system was required to support 35K tps.

Reasonable comparison of 3 open source application servers: JBoss 4.2, Geronimo 2 and Tomcat 6. Geronimo comes out as the most complete. Tomcat is a bit of an odd one out in this comparison, since it is more a servlet/web container than a full application server. Glassfish, for example, would have fitted better here. Here's some feedback on it from TheServerSide.

In this article Paul Fremantle discusses the fundamentals of the Enterprise Service Bus concept. His point is that the ESB model can turn into an anti-SOA pattern: the conversion of formats happens in the ESB instead of at the service providers (the endpoints). Thus you would need a central ESB team that has to deal with every application, and with each format and protocol the ESB needs to interface with. Note that the author works on the WSO2 ESB at his own company and also on Apache Synapse. He claims these tools are designed from the ground up to match the original idea of SOA: the owners of the services take responsibility for defining a clean and simple interface.

Saturday, December 15, 2007

Best of this Week Summary 10 December - 15 December 2007

Good summary of somebody using GWT for three months and the pros and cons found.

Short intro to Amazon's new offering SimpleDB, which is its third offering besides S3 (Simple Storage Service) and EC2 (Elastic Compute Cloud). It provides "a simple web services interface to create and store multiple data sets, query your data easily, and return the results".

Nice set of best programming practices for Spring.

Sunday, December 9, 2007

Best of this Week Summary 03 December - 09 December 2007

Interview with Bruce Schneier, Internet security guru, on security (duh), privacy, electronic voting, encryption, passwords and more. One of the ways to attack identity theft is to rely not on authenticating the person but on authenticating the transaction, as credit card companies do. Another thing he mentions is to *write down your passwords*, which is contrary to what you read everywhere; but he says, just put the paper in a safe place like your wallet! And because you write it down, you are more likely to pick a strong password.

Summary of the Google Web Toolkit conference "Voices that matter", held this week.

Nice inside view on how BT uses social software like RSS, Wiki, Podcasts etc. on their intranet.

Pattern specification of the requester side caching pattern and its implementation.
The requester side caching pattern is one of mediating the interaction between one or more clients and one or more data providers. The mediation consists of holding data items that have been produced by the provider(s) and using them to support requests from the client(s).
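
To make the mediation idea concrete, here is a minimal sketch of a requester-side cache. The class name, the `fetch_from_provider` callback and the `slow_provider` stand-in are all hypothetical, chosen only for illustration; the pattern specification itself covers far more (expiry, consistency and so on).

```python
from typing import Any, Callable, Dict, Hashable


class RequesterSideCache:
    """Sits between clients and a data provider and holds items the provider produced."""

    def __init__(self, fetch_from_provider: Callable[[Hashable], Any]) -> None:
        self._fetch = fetch_from_provider      # hypothetical callback to the real provider
        self._items: Dict[Hashable, Any] = {}  # data items held on the requester side
        self.provider_calls = 0                # counter, only here to show the effect

    def get(self, key: Hashable) -> Any:
        # Serve the request from the held items when possible,
        # and go to the provider only on a miss.
        if key not in self._items:
            self.provider_calls += 1
            self._items[key] = self._fetch(key)
        return self._items[key]

    def invalidate(self, key: Hashable) -> None:
        # Drop a held item so that the next request goes to the provider again.
        self._items.pop(key, None)


def slow_provider(key: Hashable) -> str:
    # Stand-in for a remote data provider (assumption for this sketch).
    return f"data for {key}"


cache = RequesterSideCache(slow_provider)
first = cache.get("customer-42")
second = cache.get("customer-42")  # answered from the cache, no provider call
```

The client only ever talks to `cache.get()`, so the provider can be swapped or the caching policy changed without touching client code.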

Finally the OpenID 2.0 specifications have been made final and released!

Sunday, December 2, 2007

Best of this Week Summary 26 November - 2 December 2007

Nice article on designing to facilitate unit testing, e.g. using interfaces to decouple an implementation class from its dependency.
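
A small sketch of that idea, not taken from the article: the class under test depends on an interface (a `typing.Protocol` here, playing the role of a Java interface), so a test can hand it a trivial stand-in. All names (`RateSource`, `PriceConverter`, `FixedRates`) are made up for this example.

```python
from typing import Dict, Protocol


class RateSource(Protocol):
    """The interface the converter depends on, instead of a concrete class."""

    def rate(self, currency: str) -> float:
        ...


class PriceConverter:
    """Implementation class that only knows the interface of its dependency."""

    def __init__(self, rates: RateSource) -> None:
        self._rates = rates

    def to_currency(self, amount_usd: float, currency: str) -> float:
        return round(amount_usd * self._rates.rate(currency), 2)


class FixedRates:
    """Test double: satisfies RateSource without any network or database."""

    def __init__(self, table: Dict[str, float]) -> None:
        self._table = table

    def rate(self, currency: str) -> float:
        return self._table[currency]


# In a unit test the real rate service is replaced by the fixed table.
converter = PriceConverter(FixedRates({"EUR": 0.5}))
converted = converter.to_currency(10.0, "EUR")
```

In production code a class that calls a real rate service would be passed in instead; `PriceConverter` itself does not change.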

Here are a couple of reasons for when to use and when not to use stored procedures. Most of the time it is not such a good idea to use them, except for example when you have data-intensive/abstract computations or batch-oriented operations.

Interesting post on how PayPal is transacting 1500 USD per second(!) every day, and how their system is built completely in-house, running on thousands of single-rack-unit servers. By using these small chunks (instead of a mainframe approach) they can upgrade a lot more cheaply, because the servers are so cheap. This distributed, highly redundant Linux approach makes the system a lot less vulnerable to failures. A big benefit of using open source is that it is a lot cheaper to have a development environment that is exactly the same as the production environment, which reduces the chance of inconsistent results and bugs caused by differences between environments.

Sunday, November 25, 2007

Best of this Week Summary 19 November - 25 November 2007

To help software companies test the security skills of their IT employees, the Secure Programming Council created this "Essential Skills for Secure Programmers Using Java/Java JEE". The draft is now open for public review for 60 days. Other languages like C++ and PHP will follow. Ed Tracy is among the people on the Java steering committee, and they are assisted by, for example, people from OWASP. Definitely a good initiative.

In October the W3C wrote a proposal to allow cross-domain requests (relaxing the same-origin policy) for the XMLHttpRequest (XHR) object. This article summarizes it and also compares it with the JSONRequest proposal.

Great presentations that compare several Java web frameworks: JSF, Spring MVC, GWT, Seam, Stripes, Struts2, Tapestry and Wicket. Flex and Grails are also included. Definitely a good starting point if you're about to select a web framework. Note that the general trend is that Tapestry is being used less and less, while Wicket is on the rise. I would not readily use GWT for deployment on Internet sites, since it generates quite large Javascript code. For an intranet this constraint is a lot less serious. Note that soon you should be able to minimize the required Javascript libraries, thus improving download time. In this funny one-hour presentation, Matt Raible compares a bunch of web frameworks from a developer's point of view. He also asks who uses what (Hibernate, WebSphere, Struts, JSF, Wicket etc.). The presentations mentioned above are updated versions of the one used in this talk. His conclusion is that there is no real winner yet.

A nice how-to on how you can write RESTful web services in Java that conform to the JAX-RS: Java API for RESTful Web Services (JSR-311) specification. This JSR should be in Java EE 6.

Saturday, November 17, 2007

Best of this Week Summary 12 November - 18 November 2007

Ten tools to manage your SOA in the areas of governance, quality and management. For each tool it briefly describes what the tool does, who's behind it and which companies are using it.

Seven good tips and overview on how to make your Javascript unobtrusive.

Jilles van Gurp (who works at Nokia) describes his initial impressions of Google's Android SDK, a platform aimed at mobile devices, released earlier this week. The main thing to note is that Google has built its own VM, Dalvik, which makes it another Java spin-off; it is not JME. Here's why this is not such a good thing. And here are some points on why it should be better than JME. Note also that Nokia is not (yet?) part of the Open Handset Alliance.

Here's a Javascript library that "solves" the same-origin-policy problem I mentioned in a previous post on OpenSocial by using _IG_FetchContent, which goes via a proxy server. Check also his other 2 posts (First and Second step), providing a higher level wrapper library (API) on top of OpenSocial.

Sunday, November 11, 2007

Best of this Week Summary 6 Nov - 11 Nov 2007

This is a good blog to get you started on JavaFX and related technologies. See this post on what's been covered until now.

Madly interesting is this technical posting about Amazon's Dynamo, which is their internal distributed storage system in which the data is stored and looked up via a key, with a put() and get() interface. Sounds quite similar to the put() and get() for a Hashtable in Java, right? ;-) Actually Dynamo is built in Java, so I guess that's no coincidence! The posting gives you quite some details on Amazon's internal infrastructure, and introduces interesting new terms, such as Dynamo being an "eventually consistent storage system". What is also cool is that each of Amazon's internal applications can set up its own SLA with Dynamo. This SLA defines the amount of delay and data discrepancy the application will tolerate from Dynamo. The fact that Amazon is opening up its services (with S3 and EC2) sets it clearly apart from companies like Google and Microsoft, which don't open up their systems (some Google GFS info you can find here). Related to this is the recently coined term HaaS (Hardware as a Service). No time to read the whole paper? A summary you can find here. Compare it with Hadoop and CouchDB.

Related to my last week's post about OpenSocial, this week the (very) alpha version 0.5 of the Container API has been released.

Note: I turned on moderation for comments this week because of a big spamming "effort"... Thanks whoever you are...

Monday, November 5, 2007

OpenSocial: the harder technical outstanding questions

This week anybody following technical news cannot have missed the announcement of the Google OpenSocial initiative.

Here is a short introduction. A bunch of live examples can be found here.

Basically it is an API specification for widgets (gadgets), built in Javascript and XML, that anybody can plug into a social network page (like MySpace, Plaxo) to show relevant social information (e.g. who my friends are in this social network).
Most posts were positive and only looked at potential positive uses. It took about half a day before the first more critical posts showed up.
With this post I'd like to provide you the current outstanding technical questions and issues with the OpenSocial API. First I'll list the posts I found until now, then my own outstanding questions.

Note: don't misunderstand me, I like the initiative, trying to get the so-called social graph standardized. But a careful examination is definitely relevant if you want to introduce it on your site.

Good points (not only technical) on what is not yet so good about OpenSocial

This is a pretty good technical overview of the points of critique that the above post mentions. Btw: I definitely disagree with this statement: "if REST APIs are so simple why do developers feel the need to hide them behind object models?". An object model can still be needed/desired to reach the right level of abstraction.

Note that the container API itself is not there yet...

In the FAQ there is the question: "Can OpenSocial apps interact with other websites?". The answer is "Yes, social apps have the ability to fully interact with outside 3rd party applications using standard web protocols." But how does it then avoid the same origin policy for Javascript? Would the calls go via Google? How does it work with Google Gadgets (iGoogle), where you can specify any feed URL? My guess for this last one: via a Google proxy as provided in the Google Feed API... At the moment there exists no OpenSocial gadget that implements accessing data from multiple social networks.

Why aren't technologies like Microformats and APML used? My guess for now: those "standards" are still too much in an initial phase. An API with Javascript and XML is based on standardized technologies and is immediately available.

Really pay attention to the authentication mechanism described for the People Data, which all goes through a Google account (or you can always use an email address and password). Check the first details on this issue here.

Update: here's some sort of answer to some of the above questions.

Update: Google released version 0.5 of the OpenSocial Service Provider API (the container API). Very alpha.

Update: Here's a Javascript library that "solves" the same-origin-policy.

Sunday, October 28, 2007

Best of this Week Summary 22 Oct - 28 Oct 2007

Nice article (but quite high-level) on how HP is transforming its IT (hardware and software) into a Service Oriented Architecture using governance, Struts, SOAP, WSRP4J (portlets) and the Spring Portlet MVC. They are reducing the number of datacenters from 85 to 6, and each application that is to be released gets a review of whether it complies with the published standards for those datacenters.

Eoin Woods, one of the IASA Fellows, has written this article on the top ten software architecture mistakes. Most are quite obvious, but getting reminded of them can't do any harm! If you're in a hurry, here's a short summary.

Ten Java open source caching solutions, briefly described. Different types of caching mechanisms are shown: distributed, local, in-memory. Hmm, who was first...

Sunday, October 21, 2007

Best of this Week Summary 14 Oct - 21 Oct 2007

Very interesting post about the Fotolog architecture. Note that they re-implemented a PHP part of the site in Java to improve performance! And they have to be scalable regarding these numbers for example: "300 Million photos and over 500,000 photos are uploaded each day. Over 30,000 new members are added each day and attracts more than 4.6 million daily users."

This is a good comparison and overview of 4 open source version control systems: CVS, SVN, Bazaar and Mercurial. The last two take a distributed repository approach, in contrast to a central repository approach like CVS and SVN.

Good high-level overview of the history of software development, from the Waterfall Model to the current Agile (Iterative) Model. The main gain in all these years is that it is now possible to create quite complex systems with significantly fewer people, because of the progress modern programming languages and libraries have made. All of which you already knew of course, even if you didn't start in the 70's... ;-)

James Gosling made the (quite logical) announcement earlier this week that in the future (mobile) devices will get so powerful there won't be the need for a separate Java Micro Edition anymore; there will be only one Java SE (Standard Edition). Also a few interesting points about the upcoming Java 6 update N (preload of the Consumer Java Runtime at computer boot), which should be available a few months into 2008, preparing the desktops for JavaFX.

Saturday, October 13, 2007

Outstanding issues with OpenID and tips for improvements

In this fourth post about OpenID I'll try to give a complete overview of the outstanding issues with OpenID. At the end I'll give a couple of tips for improvements on the outstanding issues. You can find my previous posts here, here and here.

Not average-internet-joe ready
An OpenID is just not in the format an average user currently understands. My parents for example would not be able to "grasp" the idea of a URL being your identification. XRI is a work in progress to let people pick a username that is more or less like regular usernames like '=Paul.Smith'.

Many websites that support OpenID just point to OpenID.net (just renewed by the way; it's now a lot easier to find OpenID providers there). From there on the user is just "on her own" to figure out what to do. This is not really OpenID's fault, but if the sites that are OpenID-enabled (consumers) are not making it easy for the user to create an account, users will just give up and sign in the "old fashioned" way. These sites should provide direct pointers to solid OpenID providers.

Delegation is quite hard to explain and understand, and even harder to actually use. It is just too hard for the average internet user to set up. Yet delegation is a really important aspect of OpenID, because it allows you not to be dependent on one OpenID provider. With delegation your OpenID is a URL of your own, which you can point (delegate) to any OpenID provider you want. If you don't set it up from the start, you are out of luck if your OpenID provider ever goes bust.
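
To show what delegation amounts to technically: in OpenID 1.1 the page at your own URL carries two link tags in its head, `openid.server` and `openid.delegate`, and the consumer reads them to find the real provider. The sketch below extracts those two values from a page; the page content and the `provider.example.com` URLs are invented for the example, and a real consumer library does much more (Yadis discovery, validation).

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class OpenIDLinkParser(HTMLParser):
    """Collects the OpenID 1.1 delegation link tags from an HTML page."""

    WANTED = ("openid.server", "openid.delegate")

    def __init__(self) -> None:
        super().__init__()
        self.links: Dict[str, str] = {}

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href")
        # The rel attribute may hold several space-separated values.
        for rel in (attr.get("rel") or "").lower().split():
            if rel in self.WANTED and href:
                self.links[rel] = href


# Hypothetical page served at your own URL, e.g. http://yourname.example.org/
PAGE = """
<html><head>
  <title>My own page</title>
  <link rel="openid.server" href="https://provider.example.com/server" />
  <link rel="openid.delegate" href="https://yourname.provider.example.com/" />
</head><body>Hello</body></html>
"""

parser = OpenIDLinkParser()
parser.feed(PAGE)
delegation = parser.links
```

Switching providers then means editing only these two `href` values; the OpenID URL you hand out to consumer sites stays the same.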

Different OpenID providers show different, sometimes even confusing, messages when the user has to confirm the site they want to get access to. It gets even more difficult when you can assign multiple personas (see my second post for an explanation of personas). Which one to pick? And why?

Security
Of course there's the phishing issue (man-in-the-middle) that a malicious consumer site can just redirect the user to a fake OpenID provider. A solution some providers take is forcing the user to login via their regular login page first (and bookmarklet). Though it provides a small barrier, it makes the whole OpenID process just more confusing to the user. And it is not a full solution against phishing; on the phishing site, just tell the user the separate login-page has been fixed (just a bit of social engineering :-). Note that OpenID trusts DNS to direct the given URL to the correct machine; DNS servers are known for being hacked too.

There's also the replay-attack issue, where a sniffer can grab the authenticating response and replay it to the consumer. A partial barrier against that is the use of a nonce (number-used-once, see my third post for some references). Version 2.0 of OpenID should by default contain the nonce fix for replay attacks. This does not protect against the case where the man-in-the-middle is the first to use the response URL (more a "pre-play" attack).
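
A minimal sketch of how a consumer might use nonces against replays, under simplifying assumptions: the nonce and its issue time arrive as separate, already verified values, and everything is kept in memory. `NonceStore` and its parameters are hypothetical names, not part of any OpenID library.

```python
import time
from typing import Dict, Optional


class NonceStore:
    """Remembers recently seen nonces so a replayed response can be rejected."""

    def __init__(self, max_age_seconds: float = 300.0) -> None:
        self._max_age = max_age_seconds
        self._seen: Dict[str, float] = {}  # nonce -> time it was issued

    def accept(self, nonce: str, issued_at: float, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Forget nonces that are too old to be accepted anyway.
        self._seen = {n: t for n, t in self._seen.items() if now - t <= self._max_age}
        if now - issued_at > self._max_age:
            return False  # response is too old
        if nonce in self._seen:
            return False  # same nonce seen before: a replay
        self._seen[nonce] = issued_at
        return True


store = NonceStore(max_age_seconds=300)
first_use = store.accept("abc123", issued_at=1000.0, now=1010.0)
replayed = store.accept("abc123", issued_at=1000.0, now=1020.0)
stale = store.accept("zzz999", issued_at=1000.0, now=2000.0)
```

As noted above, this only stops the second use of a response. If the attacker gets to the consumer first, the attacker's request is the one that is accepted.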

If an attacker gains access to a user's OpenID login, he immediately has access to all sites that user can log in to with that same OpenID/password combination.

Since all OpenID providers have the option to stay logged-in to it (thus authenticating without providing a password), CSRF attacks become very easy: no password is required.

An XSS flaw on a trusted domain such as something.CNN.com or else.microsoft.com can be exploited to prevent an OpenID provider from knowing where the user is really signing in. For a full explanation see "[OpenID] What's broken in OpenID 2.0? (IIW session)".

How can a consumer use OpenID in an API it provides? The consumer cannot ask the user for credentials at each API call. It should ask via the OpenID provider. Work in this area is being done in the OAuth protocol, which I'll cover a bit more in my next post about OpenID.

Privacy
Since all authentication (proof of ownership of the OpenID) goes via the OpenID provider, the provider can track all the sites its users are accessing.

Improvement tips
Below I list a couple of ways OpenID can be improved to tackle the above mentioned problems:

- Integrate the flow of signing up with an OpenID provider into your consumer/relying party (OpenID-enabled) website.
- OpenID providers should be clear upfront to users about whether their service will always be free, whether it supports multiple personas etc.
- Consumer sites should implement OpenID more transparently. There is no need to make a distinction at registration between OpenID or not. If the user enters no password, it's probably an OpenID, so try OpenID authentication; otherwise it's probably a regular signup.
- Find a better solution to handle phishing and replay attacks. SSL client certificates could be a solution, but then you'd have to bring the public/private keys to every browser you use and delete them again afterwards. The solution could be cryptography using private/public keys (thus not using a password).
- OpenID providers should not recycle inactive accounts, or should at least use a nonce, which the consumers should also check.
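
The "no password means OpenID" tip can be sketched in a few lines. This is only an illustration of the decision, with a made-up function name and return values; the actual OpenID authentication and regular signup are left out.

```python
def choose_signup_flow(identifier: str, password: str) -> str:
    """Decide which registration flow to start from one shared form.

    Returns "openid" or "regular" (labels invented for this sketch).
    """
    identifier = identifier.strip()
    if not identifier:
        raise ValueError("an identifier (username or OpenID) is required")
    if password:
        # A password was entered, so treat it as a regular signup.
        return "regular"
    # No password: probably an OpenID, so try OpenID authentication first.
    return "openid"


flow_for_openid = choose_signup_flow("yourname.example.org", "")
flow_for_regular = choose_signup_flow("paul", "s3cret")
```

If the OpenID attempt fails (the identifier is not an OpenID after all), the site can still fall back to asking for a password.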

References
- The Identity Corner » The problem(s) with OpenID
- Beginner's guide to OpenID phishing
- Single Sign-On for the Internet: A Security Story

Sunday, October 7, 2007

Best of this Week Summary 30 Sept - 07 Oct 2007

Interesting and provocative point of view in this blog post, "Why most large-scale Web sites are not written in Java". Quite an extensive discussion can be followed in this TSS post referring to it. I think the main reason is that the choice of programming language and stack depends on the requirements of the application. Most of the example websites given do not have serious transactional requirements, such as transactions that run over multiple systems and require XA. For those example websites no real harm is done when a transaction occasionally fails. Note also that most Java/JEE implementations use at least some part of the LAMP stack, like Linux and Apache. See here for other reasons.

Danny Ayers is wondering whether JSON is missing a DTD or XSD. Because if you take an arbitrary JSON document from the Internet, you can't tell what it contains; and usually you can't find out either. Is that a good or a bad thing? I'd say you've got XML and its associated XSD or DTD for interfaces that need a well-defined contract that can be passed on to other, potentially external, 3rd parties without too much effort. For interfaces that stay internal to your system (for example an AJAX call from the browser to the server), a validation format like an XSD or DTD adds just too much overhead, losing the gain of the compactness of JSON. Data is validated anyway by the frontend and backend, though with a bit more programming effort.
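
As an illustration of that "bit more programming effort": without a schema, the receiving side checks by hand the few fields it relies on. The message shape below (`user`, `unread`) is invented for the example.

```python
import json
from typing import Any, Dict


def parse_status_reply(raw: str) -> Dict[str, Any]:
    """Parse and hand-validate a small JSON reply from an internal AJAX call."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    user = data.get("user")
    if not isinstance(user, str) or not user:
        raise ValueError("'user' must be a non-empty string")

    unread = data.get("unread")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(unread, bool) or not isinstance(unread, int) or unread < 0:
        raise ValueError("'unread' must be a non-negative integer")

    # Return only the fields that were checked.
    return {"user": user, "unread": unread}


reply = parse_status_reply('{"user": "paul", "unread": 3, "extra": "ignored"}')
```

The checks live in code on both ends rather than in a shared schema file, which is fine while both ends are yours and harder once third parties are involved.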

Saturday, September 29, 2007

Best of this Week Summary 23 Sept - 29 Sept 2007

Interesting article on TSS on how to integrate user "presence" (like the status of a user in instant messaging) into JEE (J2EE) environments. More uses besides IM come to mind for JEE applications. The article suggests using XMPP, also known as Jabber. Session Initiation Protocol (RFC-3856) could also have been used, but XMPP was chosen because of the maturity of its existing server and Java implementations. The example uses the open source software OpenFire together with Smack from JiveSoftware Inc. The solution shown uses JMS.

This guy, Derek Sivers, tried to rewrite his website (built with PHP) using Ruby on Rails. After 2(!) years he was only halfway. So he switched back to building it in PHP. The most interesting part of the post is the first 3 bullets in the "Inspired by Rails" part, where he lists his lessons learned:
- all logic is coming from the models, one per database table, like Martin Fowler’s Active Record pattern.
- no requires or includes needed, thanks to __autoload.
- real MVC separation: controllers have no HTML or business-logic, and only use REST-approved HTTP. (GET is only get. Any destructive actions require POST.)
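
The last rule (GET only gets, destructive actions require POST) can be sketched as a small check in front of the controllers. The action names and the function are hypothetical and not taken from his code.

```python
# Actions that change or remove data (names made up for this sketch).
DESTRUCTIVE_ACTIONS = {"create", "update", "delete"}


def check_request(method: str, action: str) -> int:
    """Return an HTTP status: 200 if the method may perform the action, else 405."""
    method = method.upper()
    if action in DESTRUCTIVE_ACTIONS and method != "POST":
        # A GET (or anything but POST) must never change state.
        return 405
    return 200


status_view = check_request("GET", "show")
status_bad_delete = check_request("GET", "delete")
status_good_delete = check_request("post", "delete")
```

With a check like this in one place, a crawler or a prefetching browser following plain links cannot trigger a delete by accident.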

Saturday, September 22, 2007

Hands-on experience implementing OpenID

This is my third post about OpenID. To get you started, see my previous posts here and here.
In this post I'll be providing an overview of programming libraries that implement the consumer (a site that enables an OpenID login) and/or identity provider (service/site where a user has registered her OpenID), and also my experiences with them.
For many programming languages an implementation exists (both consumer and server). Libraries are available for: Java, C#, C++, Perl, Python, Ruby, Coldfusion and PHP. For an extensive list of these libraries, see this list.

My experiences with PHP libraries
As I mentioned in my first post, I've been working on a project where I had to add OpenID to an existing site. This site was built in PHP, so I had to look for a consumer library to OpenID-enable the site.
At first I looked at the high-quality open source OpenID libraries provided by JanRain, Inc, which you can find here. These also support older PHP versions, from PHP 4.3.0 upwards. The site I had to add OpenID to still runs on an older PHP version, so this requirement was met. But after trying to integrate the library, I found out that it requires many PHP extensions, and the customer who owns the site did not want to install all of them. For example, PEAR::DB is needed if you use SQLite, PostgreSQL, or MySQL to store the OpenID data. (You might wonder: who doesn't use PEAR::DB? Well, this customer doesn't :-) Note that you might get away with a FileStore, as mentioned in this EasyOpenID implementation.

So to make the implementation more lightweight (I only needed a consumer supporting OpenID 1.1, preferably w/o any PEAR dependency), I started to look for alternatives. The most lightweight PHP library I could find was the Simple OpenID PHP Class. The only requirement it has is CURL. Basically it is only one PHP class file. It did contain some bugs originally; in the forum of the class you can find the most up-to-date code with a bunch of fixes.
Since the site I had to modify already had existing users, I had to come up with an implementation plan that handles migrating them too. This meant allowing existing users to have an OpenID too. One problem is: how do you associate them with an OpenID? We basically did the same thing as is elaborately described in this nice article from Joseph Smarr, who implemented OpenID for the Plaxo platform. A recommended read if you're about to do the same job!
All in all, adding OpenID to an existing site is a significant task. Your users will not see much change from the outside, but internally you will most likely have to modify your login flow, your forgot-password flow and your change-password flow. Still, allowing your users to register with an OpenID is definitely a step forward for the user-friendliness of your site.

Other libraries
My main interest lies in Java, so I was seriously interested in the Java versions of these libraries. I've looked at the code of the OpenId4Java implementation, which was originally created by Sxip and donated to Google Code.
It supports auto-detection of OpenID versions 1.1 and 2.0. If for example the consumer finds out that the OpenID version supported by the provider is not 2.0, it will create a client nonce and append it. Thus a really elaborate library. Btw, check here for a quick introduction to what a nonce is.
But sadly I've not yet been able to integrate or implement one of the Java libraries.
A few notes on the Java libraries listed at openid.net:

- The idprism.org link is dead.
- The NetMesh site gives quite a few warnings about the libraries being unsupported or in pre-release.
- As mentioned above, the Sxip library can now be found at Google Code.
- The "Informed Control Schemat Consumer, AX attribute metadata retriever" is not really a consumer/provider library, but a library for parsing and generating RDF.

Conclusion
I definitely recommend the mentioned PHP class if you only need to build a consumer with OpenID 1.1 support. If you need OpenID 2.0 support, I recommend one of the JanRain libraries. If you don't want to use PEAR::DB, you might be able to get away with this EasyOpenID implementation of a consumer. If you're using CakePHP, I'd recommend checking the OpenID module built for it (I've not tried this library). I've not been able to try out any of the Java classes yet, but I'd definitely recommend checking out the above mentioned quality implementation.

Sunday, September 16, 2007

Best of this Week Summary 09 Sept - 16 Sept 2007

Good discussion about the question whether CSS frameworks are useful.

Quite basic but still interesting free online chapter "Beautiful tests" for a forthcoming book named "Beautiful Code". Interesting in it is that it shows that even the shortest piece of code can contain bugs, like a Binary Search implementation. Via TSS.
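
The best-known example of such a bug is the integer overflow in the midpoint calculation of a binary search; I assume here that this is the bug the chapter has in mind. Python integers don't overflow, so the sketch below simulates Java's 32-bit int with a helper (`to_int32`, made up for this example) to show the difference between the two ways of computing the midpoint.

```python
from typing import List


def to_int32(value: int) -> int:
    """Wrap a Python int the way a 32-bit signed int would wrap on overflow."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def naive_mid(low: int, high: int) -> int:
    # (low + high) / 2 with 32-bit ints: the sum can overflow and turn negative.
    return to_int32(low + high) // 2


def safe_mid(low: int, high: int) -> int:
    # low + (high - low) / 2 never exceeds high, so it cannot overflow.
    return low + (high - low) // 2


def binary_search(items: List[int], target: int) -> int:
    """Return the index of target in the sorted list items, or -1 if absent."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = safe_mid(low, high)
        if items[mid] < target:
            low = mid + 1
        elif items[mid] > target:
            high = mid - 1
        else:
            return mid
    return -1


# Indexes that only occur in arrays with more than a billion elements.
broken = naive_mid(1_500_000_000, 2_000_000_000)  # negative: would index out of bounds
correct = safe_mid(1_500_000_000, 2_000_000_000)
found_at = binary_search([1, 3, 5, 7, 9], 7)
```

The bug only shows up with very large arrays, which is exactly why it can survive for years in code that looks obviously correct.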

Good tips (thoughts) on scalability of an application. Quite a large focus on threading and decoupling of tasks, and a bit on memory usage.

Simple flexible little Java framework (I'd call it a pattern) to decouple event production and consumption. Check also this Ph. D thesis for an elaborate staged event-driven architecture.
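
The basic idea of decoupling event production from consumption can be sketched with a queue between the two sides. This is not the framework from the post, only a minimal illustration using the standard library; the event names are invented.

```python
import queue
import threading
from typing import List, Optional

events: "queue.Queue[Optional[str]]" = queue.Queue()
handled: List[str] = []


def consumer() -> None:
    """Takes events off the queue; it knows nothing about who produced them."""
    while True:
        event = events.get()
        if event is None:  # sentinel: no more events will come
            break
        handled.append(event.upper())  # stand-in for real event handling


worker = threading.Thread(target=consumer)
worker.start()

# The producer only puts events on the queue and moves on.
for name in ("order-created", "order-paid"):
    events.put(name)
events.put(None)
worker.join()
```

Because the producer never calls the consumer directly, either side can be replaced, slowed down or multiplied without the other noticing, which is also the starting point of a staged event-driven architecture.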

There's now a nice introduction with example code to create a sample application for AIR (previously Apollo), Silverlight and JavaFX. I've also added it to the post where I compare these and Flash/Flex.

Saturday, September 8, 2007

Best of this Week Summary 03 Sept - 08 Sept 2007

Good interview with IBM VP about BPM and SOA and how they are related.

Microsoft released Silverlight 1.0 and is also going to build a version for Linux, together with Novell. That version, based on Mono, will be named Moonlight. Some more reports of this news here and here. See my post from May for a quick overview of Silverlight vs AIR vs JavaFX vs Flash/Flex.

6 interesting questions you should address when considering SOA. Questions range from security to ROI.

Sunday, September 2, 2007

Eight top OpenID providers comparison

This is my second post in a series on OpenID. See my previous post here.
For a project I did, I had to add OpenID to an existing website. One requirement of the project was that external OpenID providers should be used (thus the site would not also "be" an OpenID provider). To make sure the newly added code to support OpenID registration would work with most OpenID providers, I tested quite a few of these providers. This gave me quite a good overview of what functionality OpenID providers (should) provide, and how they compare to each other. The OpenID providers I used for testing and this comparison are a sub-list from here.

The comparison table below lists each OpenID provider and gives a comparison of the most important features these providers (should) support. To be part of this comparison, the provider has to provide all functionality at least in English.

- OpenID provider names the OpenID provider.
- Version shows which OpenID version is supported. Listed are 1.1 and/or 2.0 (still in draft) and/or XRDS and/or Yadis.
- HTTPs indicates whether HTTPs is enforced during the authentication, even if you type in the OpenID without the protocol (i.e. no leading http:// or https://).
- Login redirect indicates whether the OpenID provider will allow you to log in from a consumer (a regular website that provides an OpenID login) by redirecting the user to the OpenID provider's login page. A few providers already no longer allow this. They send the user to a very basic page, telling the user to first log in to the OpenID provider. This page usually does not even contain a link to the login page. The page mentions that leaving out the link is meant to prevent phishing. I don't see that. How does not showing a link prevent phishing? A user would only know there is no link on that page if she has ended up on that page before. And even if she has seen the page before, would she remember that when ending up on a phishing page with a link to the supposed login? I doubt that.
- Simple registration ext indicates whether the OpenID provider supports this extension, which allows very basic profile information to be passed back to the consumer. Examples are an email address and the nickname.
- Personas indicates whether you can assign multiple such profiles to the same OpenID (URL).
- Additional features lists any specific features worth mentioning.
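
Regarding the HTTPs column: when a user types an OpenID without a protocol, the consumer first has to turn it into a full URL, and the OpenID specification says to prepend http:// in that case. So it is up to the provider to move the authentication itself to HTTPs. A rough sketch of that normalization plus a check a consumer could do on the provider's endpoint; both function names are made up, and real libraries handle many more cases (XRIs, redirects).

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_openid(user_input: str) -> str:
    """Turn what the user typed into a full URL, defaulting to http://."""
    identifier = user_input.strip()
    if not identifier.lower().startswith(("http://", "https://")):
        identifier = "http://" + identifier
    parts = urlsplit(identifier)
    # Lower-case the host, give an empty path a "/", and drop any fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", parts.query, ""))


def endpoint_is_secure(endpoint_url: str) -> bool:
    """True if the provider endpoint the user gets redirected to uses HTTPs."""
    return urlsplit(endpoint_url).scheme.lower() == "https"


typed_without_protocol = normalize_openid("claimid.com/yourname")
typed_with_protocol = normalize_openid("  HTTPS://Example.com ")
```

A "Yes" in the HTTPs column means that even for the first form, the provider ends up doing the actual login over HTTPs.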

OpenID provider	Version	HTTPs	Login redirect	Simple registration ext	Personas	Additional features
WordPress	1.1	No	No. Shows after login whether you want to continue signing in.	Yes	1	N/a
LiveJournal	1.1, Yadis, XRDS.	No	No. But shows username + password fields on the landing page.	No, e.g. nickname is not passed back.	0. Could not find where to enter e.g nickname.	N/a
AOL	1.1, Yadis.	Yes	Yes	No	0. Could not find where to enter e.g nickname.	The OpenID takes the form of openid.aol.com/yourname instead of yourname.aol.com or similar.
VeriSign PIP	1.1, 2.0, Yadis, XRDS.	N/a	No. Does not show whether you want to continue after login.	Yes	1. At authentication you can indicate which fields should be passed back. You can also create new custom fields!	Still in beta. I do remember seeing multiple personas but it seems they dropped it. Very basic landing page if you go to the OpenID URL.
MyOpenID	1.1, 2.0, Yadis, XRDS.	Yes	Yes	Yes	Yes, many.	Very elaborate OpenID provider. Provides the most functionality. From JanRain, Inc, which also provides many libraries for implementing OpenID.
GetOpenID	1.0, maybe 1.1.	Yes	Yes	No	0. Could not find where to enter e.g nickname.	The OpenID takes the form of getopenid.com/yourname instead of yourname.getopenid.com
Videntity.org	1.0, maybe 1.1.	No	Yes	No. At least, you can fill it in on a profile page, but I noticed multiple sites not being able to find any nickname in the OpenID reply.	1	Strange that they seem to support a profile, but I couldn't get it to return for example a nickname when logging in with an OpenID. In any case, on the page where you have to allow/deny, it does NOT show any of the fields I filled in on the profile page.
ClaimID	1.0, maybe 1.1, Yadis, XRDS.	Yes when you specify the protocol in your OpenID	Yes	Yes	1. If you haven't filled in your profile, you can enter it there on the spot.	The OpenID takes the form of claimid.com/yourname instead of yourname.claimid.com.
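For the Simple registration ext column, a consumer can check what a provider passed back by looking for the openid.sreg.* parameters in the authentication response. The sketch below assumes the response parameters have already been parsed into a dictionary; the function name and the sample values are hypothetical.

```python
SREG_PREFIX = "openid.sreg."


def extract_sreg_fields(response_params: dict) -> dict:
    """Collect the simple registration fields (nickname, email, ...) from a response."""
    return {
        key[len(SREG_PREFIX):]: value
        for key, value in response_params.items()
        if key.startswith(SREG_PREFIX)
    }
```

An empty result is what I saw with the providers marked "No" above: the login succeeds, but no nickname or email address comes back.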

I was really surprised to find out that not all providers perform the authentication over HTTPs. It sounds like a basic security feature that an OpenID provider should enable by default. Also, all of the above OpenID providers seem to be run by a commercial company. Not many non-profit versions exist (like mijnopenid.nl). I did not include that one because it is in Dutch.

If you want a free anonymous OpenID, check this Anonymous OpenID server. Note that anybody can use that anonymous OpenID since it requires no authentication!
This service lets you use your Yahoo! account as an OpenID.

Conclusion
Based upon the above table and my experience, the most secure (i.e. HTTPs), solid (not in beta) and flexible (multiple profiles) OpenID provider is myOpenID.com. Of course you should try not to be dependent on one provider and therefore use delegation; see my previous posting for an explanation of delegation.

Friday, August 24, 2007

Best of this Week Summary 20 August - 25 Aug 2007

No real world shocking discoveries for me this week. Still quite interesting though were:

Spring Web Services 1.0 was announced this week. One of its major features is that it facilitates contract-first ("design by contract") webservices creation. This is where you create/generate the WSDL first, then build the implementation (closely related to the interface-based approach of the Spring framework). This is different from JAX-WS, where you generate the WSDL from the Java (implementation) classes. Definitely check the comments too, for example to get a feel for how these standards/frameworks relate to each other: JAX-RPC, JAX-WS, XFire, Axis2, Spring-WS and REST.
On an I-wonder-why-they-did-this side note: Sun has changed their Nasdaq symbol from SUNW to JAVA. I'm quite surprised they did it, Sun is a lot more than Java and one day Java will be replaced by another programming language... really... trust me ;-)
Interesting support from Yahoo! for the Apache project Hadoop. Quoting the About page:

"Hadoop is a framework for running applications on large clusters of commodity hardware. The Hadoop framework transparently provides applications both reliability and data motion. Hadoop implements a computational paradigm named map/reduce, where the application is divided into many small fragments of work, each of which may be executed or reexecuted on any node in the cluster. In addition, it provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both map/reduce and the distributed file system are designed so that node failures are automatically handled by the framework."
Finally, check this IBM developerWorks article about the new 2.0 release of Mylyn (formerly called Mylar), a task-driven management tool for Eclipse. It adds two facilities to Eclipse: integrated task management and automated context management.
"Task management integrates your task/bug/defect/ticket/story/issue tracker into Eclipse and provides advanced task-editing and task-scheduling facilities. Context management monitors your interactions with Eclipse, automatically identifies information relevant to the task at hand, and focuses structured views and editors to show only the relevant information."
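The map/reduce paradigm from the Hadoop quote above can be illustrated with a tiny single-machine sketch in Python. This is not Hadoop's API, only the idea: the work is split into fragments that are mapped independently (and could be re-executed on any node), after which the results are reduced. All function names are made up for the illustration.

```python
from collections import defaultdict


def map_fragment(fragment: str):
    """Map step: emit a (word, 1) pair for every word in one fragment of the input."""
    return [(word.lower(), 1) for word in fragment.split()]


def reduce_counts(pairs):
    """Reduce step: sum the counts per word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(fragments):
    """Run the map step per fragment, then reduce all emitted pairs."""
    pairs = []
    for fragment in fragments:
        pairs.extend(map_fragment(fragment))
    return reduce_counts(pairs)
```

Because map_fragment only looks at its own fragment, a framework is free to run it on any node and to run it again elsewhere if that node fails.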

Sunday, August 19, 2007

An introduction to OpenID

Recently I've been working on extending an existing site with OpenID. In the coming weeks I'll be going into details of OpenID in different ways. This week I'm going to give an overview of OpenID and the areas of the specification that can be improved for readability. In the following weeks I'll be addressing:

A comparison of OpenID authentication providers.
A description of my experience implementing OpenID, including a comparison of libraries.
Outstanding issues with OpenID (security).
Where's OpenID going to.

In this first post I'm not going to give a full detailed explanation of OpenID. There are many sources that provide quite a good description. Here's a list of what I am not going to describe, but can be a good starting point to learn about OpenID:

Wikipedia has a good definition:

"OpenID is a decentralized single sign-on system. Using OpenID-enabled sites,
web users do not need to remember traditional authentication tokens such as
username and password. Instead, they only need to be previously registered on a website with an OpenID "identity provider", sometimes called an i-broker. Since OpenID is decentralized, any website can employ OpenID software as a way for users to sign in; OpenID solves the problem without relying on any centralized website to confirm digital identity."
A good starting point is of course the home of the OpenID specification. You'll see there that the current version is 1.1 and 2.0 is in draft.
The difference between SAML and OpenID. Here's a good starting point.
Examples of major websites supporting OpenID are: WordPress, LiveJournal, AOL and Digg.

I'd like to focus on a few elements that I found not very well explained on the OpenID website. For example, not very well described (to me :-) is how the delegation of your OpenID provider works. It took me quite some investigation and looking at other sites to figure out exactly how it works. What it comes down to is that OpenID is all about you being able to prove that you are the owner of a URL. And a URL is basically just a webpage. The idea is that normally that page contains a <link> tag in its HTML, within the <head> tag, stating where your OpenID provider/authenticator is located. Say you have an OpenID account named 'mytest' at myopenid.com. Then you can actually go to the URL mytest.myopenid.com with your browser. When you look at the HTML you'll find this:

<link rel="openid.server" href="http://www.myopenid.com/server" />

Use <link rel="openid2.provider" href="http://www.myopenid.com/server" /> for version 2.0 providers

This tells you which OpenID server should be used to authenticate the URL mytest.myopenid.com. There are two major disadvantages to this:

Maybe you want to use a different URL, not with "myopenid.com" in it. E.g. mytest.com.
What if myopenid.com goes out of business? You can't log in anymore to *any* of the sites you registered with that OpenID!

The solution for this is to use delegation. In that case the <link> tag in the page returned when going to the URL mytest.com would look like this:

<link rel="openid.server" href="http://www.myopenid.com/server" />

<link rel="openid.delegate" href="http://mytest.myopenid.com/" />

Note the additional "openid.delegate" <link> tag. In the above example it points to the myopenid.com OpenID provider and uses mytest.myopenid.com for authentication: authentication of mytest.com is delegated to myopenid.com. If ever myopenid.com is not available anymore, you can just create an account at another OpenID provider and put that in the href attribute in the above <link> tag instead of mytest.myopenid.com. This is using the essential OpenID principle that you actually own the URL. Since you can change the OpenID provider that was contained in the URL, you must be owner of the URL!
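To see what a consumer does with these tags, here is a minimal Python sketch that reads the openid.server and openid.delegate <link> tags from a page. It only uses the standard library; the class and function names are my own for this illustration, and a real consumer library does much more (Yadis/XRDS discovery, fetching the page, error handling).

```python
from html.parser import HTMLParser


class OpenIDLinkParser(HTMLParser):
    """Collects the href of every <link> tag whose rel value starts with 'openid'."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # A rel attribute may hold several space-separated values.
        for rel in (attributes.get("rel") or "").lower().split():
            if rel.startswith("openid") and href:
                self.links[rel] = href


def discover(html: str) -> dict:
    """Return the OpenID server and, if present, the delegate found in the page."""
    parser = OpenIDLinkParser()
    parser.feed(html)
    return {
        "server": parser.links.get("openid.server"),
        "delegate": parser.links.get("openid.delegate"),
    }
```

When "delegate" is None the page itself is the identity at that provider; when it is filled in, the consumer asks the server to authenticate the delegate URL on behalf of the URL the user typed.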

Another thing to realize is that the OpenID protocol has no failover requirements defined for OpenID providers. You as owner of the URL will have to arrange that and make sure that you can still authenticate your URL in case the OpenID provider is down/out of business. The only way you can do that is via delegation. I find this one of the weaker points of OpenID. Delegation is quite hard to understand for "average" Internet users, so putting the responsibility for "failover" in their (own) hands can cause quite some surprises for them.

The third area where I found the OpenID home and this other good source of OpenID information lacking is a good visual diagram of the OpenID protocol. Much clearer is the very detailed description of the protocol flow in OpenID 2.0, including a nice sequence diagram, in this ServerSide article. It already discusses version 2.0, but it also applies to 1.x except for the XRI/XRDS parts. That diagram is shown below for easy reference:

Sunday, August 12, 2007

Best of this Week Summary 05 August - 12 August 2007

A bunch of tips on branching and merging in Subversion (svn).

There was quite a lot of security related news this week. Check this good short overview of what happened at Blackhat Ops)2007.

At the conference it was shown that many Web 2.0 sites are making the same mistakes as they were in Web 1.0. For example:
- Improper use of cookies (e.g. CSRF)
- Putting business logic only in the Javascript client
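As an illustration of the cookie/CSRF point: the browser sends the session cookie along with every request, so a cookie alone does not prove that the request came from your own page. A common countermeasure (my own example, not taken from the conference material) is to also require a token tied to the session. A minimal Python sketch, in which SECRET_KEY and the function names are hypothetical:

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; a real application would load this from configuration.
SECRET_KEY = secrets.token_bytes(32)


def issue_csrf_token(session_id: str) -> str:
    """Derive a token from the session id; it is embedded in each form the site renders."""
    return hmac.new(SECRET_KEY, session_id.encode("utf-8"), hashlib.sha256).hexdigest()


def is_valid_csrf_token(session_id: str, submitted_token: str) -> bool:
    """Accept a state-changing request only when the submitted token matches the session."""
    expected = issue_csrf_token(session_id)
    return hmac.compare_digest(expected, submitted_token)
```

A forged request from another site still carries the cookie, but it cannot supply the matching token, so it is rejected.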

If you want to dive into some more low level security details, here's a presentation from the conference which shows three security related issues. It gives ways to exploit these security issues and ways to prevent and/or detect them:
- DNS rebinding regarding Same Origin Policy in your browser. Also known as cross-IP scripting, also known as TCP relaying. It allows an external attacker to access your internal network, thus bypassing your firewall!
- Provider Hostility, i.e. Internet providers modifying content of data from websites you visit.
- Audio captchas, which is "speech, distorted and overlaid with a quieter speech".

Saturday, August 4, 2007

Best of this Week Summary 29 July - 04 August 2007

This is a very interesting overview (basically a summary) of YouTube's architecture and how it handles scalability. Python is used all over the place. Includes lessons learned! Of course YouTube does not need heavy business-logic nor a solid transaction-architecture as mentioned here too. This makes scaling a little less of a challenge; but since the numbers are so huge, the challenges are nevertheless large.
This (a bit older) article describes how Digg handles scalability. Definitely interesting too.
And here's the inside scoop on MySpace architecture and scalability.
Here are details on Google's file system GFS and how it helps them solve scalability.
This PDF shows how Japan's largest social networking site (SNS) Mixi is handling scalability with MySQL.
Finally, interesting regarding scalability and availability is this architecture "mashup": Java JVM is used to improve scalability in Drupal.
This looks like an interesting upcoming feature in HTMLUnit for unit testing AJAX-enabled web applications. It will actually re-synchronize AJAX calls that should run asynchronously.

Sunday, July 29, 2007

Best of this Week Summary 22 - 28 July 2007

Interesting conclusion made in a technical report from the Delft University of Technology regarding the use of push vs pull techniques in AJAX applications for web-based realtime notifications. They concluded that:

"In this paper we have compared pull and push solutions for achieving web-based real time event notification. The contributions of this paper include the experimental design, a reusable implementation of a sample application in push and pull style as well as a measurement framework, and the experimental results.
Our experiment shows that if we want high data coherence and high network performance, we should choose the push approach. However, push brings some scalability issues; the server application CPU usage is 7 times higher as in pull. According to our results, the server starts to saturate at 350-500 users. For larger number of users, load balancing and server clustering techniques are unavoidable."
Yahoo released this very cool extension to Firebug (the standard tool for webdevelopers) named YSlow. Here's a good introduction with screenshots and what it can do. Here's an example of it analyzing this blog (click on it for the large version):

On the funny side: a list of commonly used development methodologies that are "broken" to say the least :-)

Saturday, July 21, 2007

Best of this Week Summary 15 - 21 July 2007

Related to this posting which mentions a good article about Continuous Integration (CI), I found this interesting article which compares four open source CI tools: CruiseControl, Continuum, Luntbuild and Hudson. In short, it tells you:

CruiseControl:
+ Open source
+ Actively checks your source control system (SCM) for changes
+ Good notification options (including RSS)
- Bit more complex to setup
- Wrapper Ant build script/Maven SCM plugin needed
- Inter-project dependencies not a strong point

Continuum:
+ Open source
+ Easy setup
+ Strong Maven integration
+ Good notification options (though no RSS notification)
- Less extensive list of supported SCMs
- Checks SCM changes on user defined scheduled times (so not actively)
- Inter-project dependencies not a strong point

Luntbuild:
+ Open source
+ Easy setup
+ Most feature-rich of the OSS versions
+ Supports inter-project dependencies
+ Supports many SCMs
+ Separation of build scripts and schedulers
- Not so many notification options
- Checks SCM changes on user defined scheduled times (so not actively)

Hudson:
+ Easy setup (war deployment)
+ Scheduling and ability to poll SCM for changes
+ Good built-in JUnit support
+ Unique fingerprint per build so you can easily find out what was in that build
+ Supports inter-project dependencies
- Not so many notification options (but RSS is available)
- Supports not so many SCMs
- Lack of flexibility (e.g Ant build.xml has to be in your project's root directory)

Conclusion
Looking at the above +s and -s, my conclusion is that for smaller projects where inter-project dependencies are not important, CruiseControl or Continuum can be used. For larger projects with inter-project dependencies, Luntbuild and Hudson are a good option, though you really have to evaluate whether the new kid on the block Hudson is currently mature enough to be used in large projects.

Saturday, July 14, 2007

Best of this Week Summary 01 - 14 July 2007

Last week GPL v3 was released. This article gives a good overview of what has changed, with one of the major changes being the anti-"tivoization" provision. This ensures that the owner of a device that uses GPL software can change that software.
When you are going to use open source tools licensed under GPL v3, this can have a serious impact on your project and product.

This is a free, downloadable, handy Ruby tool that checks, for a given CSS file, whether its CSS statements are used in any of the supplied HTML pages. For large projects the CSS might have gotten so big that it is too hard to check by hand whether all selectors are actually used. CSS files are usually a significant part of any website page, so size does matter for CSS files. Removing any unused selectors is thus a very useful exercise.
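The idea behind such a check can be sketched in a few lines of Python. This is not the Ruby tool itself, only a simplified illustration that looks at class and id selectors and ignores element, attribute and pseudo selectors; all names are my own.

```python
import re
from html.parser import HTMLParser


class NameCollector(HTMLParser):
    """Collects every class name and id that occurs in one HTML page."""

    def __init__(self):
        super().__init__()
        self.classes = set()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            if name == "class":
                self.classes.update(value.split())
            elif name == "id":
                self.ids.add(value.strip())


def unused_selectors(css: str, html_pages) -> set:
    """Return the .class and #id selectors from the CSS that no page uses."""
    used_classes, used_ids = set(), set()
    for page in html_pages:
        collector = NameCollector()
        collector.feed(page)
        used_classes |= collector.classes
        used_ids |= collector.ids
    # Drop the declaration blocks so values like #fff are not mistaken for ids.
    selector_text = re.sub(r"\{[^}]*\}", " ", css)
    css_classes = set(re.findall(r"\.([A-Za-z_][\w-]*)", selector_text))
    css_ids = set(re.findall(r"#([A-Za-z_][\w-]*)", selector_text))
    unused = {"." + name for name in css_classes - used_classes}
    unused |= {"#" + name for name in css_ids - used_ids}
    return unused
```

Pages that build their DOM in Javascript would make selectors look unused when they are not, so treat the outcome as a list of candidates to review rather than a list to delete.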

Sunday, July 8, 2007

Service Oriented Architecture diagram template

In a past project I had to create a Service Oriented Architecture diagram. Before I created it, I searched the internet for a template I could re-use. I couldn't find one anywhere. As the next best solution, I used many diagrams as inspiration to come up with my own version.

Since I believe in re-use, I wanted to share my knowledge and give others a head start by putting the diagram I created online. I couldn't use the actual diagram from the project, so I recreated a new diagram from scratch. Below you see the final version and a link to the editable(!) actual .odg document you can download. It has been created in OpenOffice 2.1 Draw.

The diagram is free for you to use, just as long as you refer to where you got it from: http://ttlnews.blogspot.com/

And this is the link to the actual document.

Hope you find it useful. Comments/feedback/tips are of course welcome too!

Saturday, June 30, 2007

Best of this Week Summary 24 - 30 June 2007

Nice use of RSS: use it to publish your nightly build results. They then use the (commercial) product Confluence to build a wiki page out of it.
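Publishing build results as RSS takes little more than writing one <item> per build. Below is a minimal Python sketch using only the standard library; the function name, the layout of the result dictionaries and the example.com link are assumptions for this illustration and are not taken from the linked article.

```python
import xml.etree.ElementTree as ET


def build_results_feed(project: str, results) -> str:
    """Render a list of build results as an RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"{project} nightly builds"
    ET.SubElement(channel, "link").text = "http://example.com/builds"  # placeholder URL
    ET.SubElement(channel, "description").text = f"Nightly build results for {project}"
    for build in results:
        item = ET.SubElement(channel, "item")
        status = "SUCCESS" if build["passed"] else "FAILED"
        ET.SubElement(item, "title").text = f"Build {build['number']}: {status}"
        ET.SubElement(item, "description").text = build["summary"]
    return ET.tostring(rss, encoding="unicode")
```

Any feed reader, or a wiki that can render feeds, can then show the latest builds without knowing anything about the build server.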

Interesting write-up by one of the developers of the Digg comments system. A very AJAX- and PHP-oriented solution. Notice for example that the DOM is created mostly dynamically and there is 4x as much Javascript code as PHP code for the comments functionality! For interesting detailed design decisions made (main question to answer: how to make the commenting more complex and simpler at the same time), see this link, also mentioned in the article.

Saturday, June 23, 2007

Business process modeling interviews one-on-one

On page 10 of this whitepaper I read an interesting statement regarding interviewing a customer to figure out what business processes they have (with as ultimate goal integrating different back-end systems to provide an automated business process in an SOA environment).

The process-modeling interviews were done one-on-one, instead of meetings where all involved parties are present. The system integrator's experience showed that otherwise (too) much time would have been spent on educating everybody about the big picture. In such a "grand" meeting, roles would be ranging from frontline-workers to the CIO, for which each would be contributing very different process insights, causing a significant delay.

For me this is quite remarkable, since all the process modeling I was involved in over the years used a more consensus-oriented meeting structure where all relevant parties were put in one room. It does work, but indeed much time is spent on getting everybody on the same page. If time doesn't permit, the one-on-one interviewing technique is definitely a viable option. Still, it might provide less buy-in from the involved parties. Also some process optimization(s) might be missed because everybody still only sees his/her part of the process. And indeed, all involved did conclude that in the future a more participatory style and ongoing committee meetings will be most beneficial, especially in the area of SOA governance and the evolution of the business services layer. The paper also shows that iterative (agile) design/development with involvement of the business analysts from the start is still definitely a good way to go: it's just hard to get everything correct in the first iteration. As an SOA architect, you start constructing business services layers with BPEL, which future business applications can reuse (top-down). Then at almost the same time you start to webservice-enable existing systems (bottom-up). Note the combined effort: it was not implemented purely top-down or purely bottom-up, it was a mix. Here's a link to an elaborate description of the project.

Below is a screenshot of a great diagram in the paper which puts the above architecture terms SOA, webservices and BPEL in perspective:

Best of this Week Summary 17 - 23 June 2007

Good introduction to Continuous Integration on software projects. Excerpt from the book "Continuous Integration: Improving Software Quality and Reducing Risk". It includes:
- What is the value of CI?
- What prevents teams from using CI?
- How one gets to CI?
- When and how should a project implement CI?
- How does it relate to other development practices like coding standards, refactoring and iterative development?
Another excerpt, this time from the book "SOA Using Java Web Services". It walks you through code for building an Ajax application that consumes RESTful Java Web services endpoints. Mainly useful for creating mashups, since enterprises usually use WSDL and SOAP. Section 10.4 is the most interesting, with a list of conclusions. Regarding point 4 in that section, I'd rather recommend using soapUI as a tool to explore service interface data, instead of building your own AJAX frontend, which can be a significant effort.

Saturday, June 16, 2007

Best of this Week Summary 10 - 16 June 2007

Good list of 30 questions you can ask when interviewing medium and senior Java developers for your team.
Good comparison of three bugtracking systems: Bugzilla, Trac and Jira. Bugzilla and Trac are open source, Jira is commercial.
Bugzilla: exists the longest. It is the hardest one to install. Not an intuitive user interface.
Trac: open source with Subversion integration. Lightweight. Just like Bugzilla a bit harder to install. Uses Python, not as many features as Bugzilla. Wiki based.
Jira: commercial. Installation wizard.
Which one you really prefer is up to you and the specific project situation. Check also the comments for other suggestions of bugtracking tools like Eventum, Mantis, JTrac, Project Dune and Track+.
Apollo has just been renamed to Apollo Integrated Runtime, or AIR for short. And it has gone into beta (was previously released in developer preview).

Apple released Safari for Windows this week, as a public beta version. Ars Technica looked at it. In general, the conclusion is that the differences are so small that it's not worth switching to it (yet). Pros and cons found were:
+ A little bit faster page load.
+ The ability to resize textboxes on forms (just like the new, just released, Netscape browser which is based upon Mozilla).
+ Since it should run on the iPhone (soon to be released), it should be able to fully support AJAX on that mobile phone.
- On day one of the release 0 day security exploits were discovered.
- Unstable, sometimes even requiring a full system reset while it was being tested.
- Cross platform inconsistencies in for example the keyboard shortcuts to switch between tabs.
An update that fixes most of the exploits has been released already.

Saturday, June 9, 2007

Best of this Week Summary 03 May - 09 June 2007

An interview with the writers of a new book about REST. One free chapter is available for download. Interesting is that the writers explain why they think REST is great and when it is preferable to WS-* (SOAP).
An overview of nine free virtualization environments. Plus one free spreadsheet to compute the TCO of virtualization.
Interesting points made about what Google Gears does not yet support out-of-the-box, like synchronization strategies. But as already mentioned briefly in the article, the correct synchronization strategy depends on the application you are building. So maybe Gears could (should?) have already contained one or two simple syncing strategies. Also, I think saying it replaces one problem with another is not really correct. I'd say Gears is a starting point in providing a full offline library/framework solution, but it does not solve all problems at this point in time.
The writers of the book "GWT in Action" answered a couple of questions on what their impression of GWT is after a year.

Most important points made:
- Regarding debugging: most of the time you don't need to go into the generated Javascript,
- The great fact that you can re-use all your Java knowledge and tools,
- That for simple apps it would still be too heavy-weight (overkill),
- How GWT tries to make the generated Javascript as little bloated as possible,
- That the generated Javascript code can be output in pretty, more readable Javascript,
- A GWT application can mix Javascript and Java (so it is not that "crippled" as it seems),
- There are wrappers available for Scriptaculous, JSCalendar etc,
- Javascript files are provided per browser type, instead of one big file that handles all browsers,
- Changing locale can require you to switch to a dynamic approach, losing GWT stripping,
- Modularization can be improved; now you have to always download all modules that your application uses.

Saturday, June 2, 2007

Best of this Week Summary 27 May - 02 June 2007

This article shows why you probably shouldn't use subdomains to differentiate user accounts as in http://username.mydomain.com, giving a very good reason not to if you're using SSL. Use a REST style (like on Flickr, Del.icio.us) instead, like http://www.mydomain.com/username/. A reason you *could* be using it is to prevent XSS in user generated content sites...

How does OpenID work and how can you integrate it into your applications.
OpenID is an open, decentralized authentication approach. It offers functionality similar to SAML but is a slimmed-down and lighter alternative. The article gives a good overview and also goes into the protocol details. It also shows how to implement OpenID with OpenID4Java.

Firefox plugins are getting more and more attention from hackers as a vector for man-in-the-middle attacks. An effective way of implementing this kind of attack is to set up your own wifi access point, let your malicious code scan the traffic that comes by, and insert a modified Firefox plugin. Plugins downloaded over SSL connections, like on the official Firefox plugins website, do not run this risk (or, to be more precise, only to a very limited extent).

Of course the announcement of the availability of Google's Gears was a biggie.

It consists of three major components:
LocalServer:
Database:
and WorkerPool (async javascript):
Dojo Offline will be ported to move its API on top of Gears. Trying to make this an industry standard, Google is working with Adobe to get it integrated with Apollo.
You can also use the three components separately; you don't need to use them only for writing synchronization software.
Note that the workers in the WorkerPool are not threads, that is, they don't share anything with each other, so they should be considered separate processes rather than threads.
Here's a couple of nice examples with explanation on how you can use Gears.
The question you can ask yourself is whether we should go down this Javascript road. It is so hard to create rich controls which work well always on all browsers. Are the desktop solutions (Apollo, JavaFX) the better way to go?