Archive for the ‘Software Engineering’ Category

Everything you know about scaling is wrong

Friday, August 3rd, 2007

Scalability is one of the favourite argument topics of the technically inclined. Once you have developed some fantastic, all-singing, all-dancing, user-generated social networking Web 2.0 platform*, someone will invariably ask:

“Oh, you used {LANGUAGE X}?”
“Will it scale?”

You have to imagine the slightly derisive tone for yourself.

Trouble is, languages don’t scale, systems do.

Assessing scalability on the basis of a particular programming language is like saying to someone who is writing an Encyclopedia: “English won’t scale”.

Systems scale.

If you wrote your Encyclopedia the same way you wrote your product brochure, it wouldn’t scale – small page format, lots of pictures, minimal text with plenty of whitespace. If you write an Encyclopedia, you need to develop a system for handling the information – table of contents, indexing, cross referencing, multiple volumes, alphabetical organisation, thin paper, multiple columns of text, readable fonts.

The language used to write the Encyclopedia or Brochure is secondary to the system that delivers it.

The same is true of Web Applications.

Languages are irrelevant to scale and frameworks (being very close to languages and essentially an extension) only have a minimal impact.

I moved to Ruby on Rails from PHP. The standard argument against Rails from PHP developers is that “Rails won’t scale”**. However, this focus misses the point entirely. I have seen PHP developers argue for using single quotes (') rather than double quotes (") for strings, because single-quoted strings parse faster in PHP. This thinking is radically broken.

Scalability has nothing to do with processing performance at the language level.

Scalability is about the system as a whole.

At some point as an application scales, it will invariably require multiple servers and multiple databases, and there is very little any language or framework can do to mitigate this requirement. These requirements are system level, not language level – your only real option is to be ready to build your system out accordingly as it grows. The only real trouble comes when you have made decisions early on that limit the way your application can grow.
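To make the idea of “early decisions that don’t limit growth” a little more concrete, here is a minimal Python sketch of one such decision: sending every database lookup through a single routing function, so that moving from one database to several later is a change in one place rather than everywhere. The shard names, `SHARDS` and `shard_for` are hypothetical, invented purely for illustration; a real system would load its connection details from configuration.

```python
from __future__ import annotations

import hashlib
from collections.abc import Sequence

# Hypothetical shard names; a real system would read these from configuration.
SHARDS = ("users-db-0", "users-db-1")


def shard_for(user_id: int, shards: Sequence[str] = SHARDS) -> str:
    """Return the name of the database that owns this user's data."""
    # Use a stable hash so every app server agrees on the answer.
    # (Python's built-in hash() is randomised per process for strings.)
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


if __name__ == "__main__":
    # Today: two databases. Adding a third later means editing SHARDS and
    # migrating data, not hunting down every query in the codebase.
    for uid in (1, 2, 3, 42):
        print(uid, shard_for(uid))
```

Plain modulo routing is the simplest choice, but it moves most users to a different shard whenever the shard count changes; schemes such as consistent hashing reduce that churn. Either way, the point is that the routing decision lives in one function.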

As a corollary to the scaling issue, my final point is this:

You aren’t going to scale

Call me cynical, call me pessimistic, but lots of people build for scale prematurely.

The focus should always be on creating great user experience.

Unless you have the load to warrant a particular system decision you should not be creating that system (but always within the framework of sensible architectural decisions). Scaling issues are often unpredictable (you don’t know your load profile until you hit it, or are hit by it) but worrying about them before you have to wastes valuable developer resources on infrastructure rather than the interface.

In the Web 2.0 world, scaling problems are a sign of success, but the focus should be on the user, not the system.

* We don’t develop applications or, god forbid, sites anymore, we develop platforms.

** I am aware the argument is a bit confused here, because PHP is compared to Rails, when Rails is a framework built on the Ruby language and should be compared to a PHP framework like CakePHP.

Multitasking is evil

Wednesday, August 1st, 2007

Agile in Action has some very good reasons not to multitask.

I have a larger post on this issue, but I have some other stuff to do first.

Why a code review could save you money

Tuesday, July 24th, 2007

I’ve recently run into several clients who’ve been burnt by contract developers.

There are some obvious financial reasons to contract remote developers in different countries, but the practice can lead to problems.
The problems my clients have had break down into two areas:

  • communication issues
  • code quality

Today I am going to focus on Code Quality. I will write about documenting your requirements effectively another time.

One of my clients needed some very simple changes made to an application. I opened the code up and knew instantly that there had been some very average programmers involved. Files and folders everywhere, with names like “Accounts”, “AccountsOLD”, “AccountsBAK”. To a good programmer this kind of sight is an instant warning – good code is organised, clear and sensible.

I still had hope – code may be disorganised from having several developers work on it, but not necessarily be bad.

However, once I started delving into the code itself, the situation became much worse.

The code structure was just as disorganised and incredibly convoluted. Now, you may be thinking, “if the site is running, who cares if some programmer doesn’t think the code’s nice?”, but the crucial thing here is that bad code costs money. Just in case you missed my point:

Bad code costs money

Instead of having a few simple changes to make, I now had a few difficult and complex changes to make.

The bad code meant that I had to revise my estimate. So I revised my quote. Up. 1 hour of work became 5.

This is great for me, as a contract developer charging by the hour, but really bad if you’re trying to run a business on a budget.

Unfortunately, as I explored further things went from worse to really really worse.

The code was not just bad, but dangerous.

The code was wide open to a couple of well-known security problems called SQL Injection and Command Injection. These are the code equivalent of leaving the front door open. To a good programmer, they are too obvious to even really worry about – good practice avoids these issues in much the same way you lock your front door when you go out. You don’t really think of it as a security measure, it’s just what you do.

Good code is automatically protected from these obvious security holes.
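As an illustration of what that protection looks like in practice, here is a minimal Python sketch contrasting the vulnerable habit with the ordinary safe one, using the standard library’s `sqlite3` and `subprocess` modules. The function names and the `users` table are hypothetical (the client’s code isn’t reproduced here), and the shell example assumes a POSIX system where `echo` is available.

```python
import sqlite3
import subprocess


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # VULNERABLE: the user's input is pasted straight into the SQL text,
    # so a crafted name can rewrite the query itself (SQL injection).
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # SAFE: the ? placeholder passes `name` to the database as data, never as SQL.
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()


def echo_unsafe(message: str) -> str:
    # VULNERABLE: shell=True hands the whole string to /bin/sh, so input like
    # "hi; some-other-command" runs a second command (command injection).
    return subprocess.run("echo " + message, shell=True,
                          capture_output=True, text=True).stdout


def echo_safe(message: str) -> str:
    # SAFE: an argument list bypasses the shell; `message` reaches echo
    # as a single literal argument, semicolons and all.
    return subprocess.run(["echo", message], capture_output=True, text=True).stdout


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

    attack = "x' OR '1'='1"
    print(find_user_unsafe(conn, attack))  # both rows: every user leaks
    print(find_user_safe(conn, attack))    # []: treated as an odd but harmless name

    print(echo_unsafe("hi; echo pwned"))   # "hi" then "pwned": two commands ran
    print(echo_safe("hi; echo pwned"))     # "hi; echo pwned", printed literally
```

Neither safe version costs any extra effort to write, which is exactly why good code gets this protection more or less for free.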

These problems meant that anyone could gain access to the system and take complete control of the site.

The client’s code was riddled with these issues. In fact, all of the code was vulnerable – like someone had built a house with no doors at all, just holes in the walls. The unfortunate fact is that fixing the code will be a long and involved process because the code was so badly organised and written in the first place.

What can you do?

Using cheaper developers is a business reality. Not everyone can afford to hire the best software developers in town.

However, an independent code audit and review can help keep your project on track by providing an expert outside view of development progress. And if your code is complete, an audit can ensure that your product is rock-solid and production ready. A code audit gets you the knowledge of an expert without all the cost.

A code audit would consider one or all of the following issues:

  • Application Security
  • Scalability & Performance
  • Code Conventions
  • Code Quality
  • Test Coverage
  • Data Privacy
  • User Interaction
  • Information Architecture

A code audit can catch average code before it goes bad and it could save you a ton of money in the long run.

Trivial, Hard and not going to do it

Tuesday, July 17th, 2007

Charles Miller has posted a great article entitled Understanding Engineers: Feasibility that deals with classes of problems as approached by software engineers. Estimation is notoriously difficult at the best of times, but some classes of problems are more difficult than others.

In my very first development job out of university I was once asked to quote on how long it would take to develop some discussion forum software that would translate between English and Japanese in real-time to facilitate conversation between tourists and locals in an area near Brisbane.

My answer was “20 years and many millions of dollars”. The sales rep mostly hated me after that, but I thought she was joking at the time.

The big issue in this type of problem assessment boils down to the difference between Known Unknowns and Unknown Unknowns.

Some problems have Known Unknowns. Recently I was developing some code that relied on a solution to Subset-Sum (one of the NP-Complete family). We all know that solving NP-Complete problems efficiently is very hard – lots of mathematicians have been trying for a long time. However, there are some solutions that, while not being perfect from a theoretical standpoint (as in, not provable), are good enough for real-world software.
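For the curious, here is a minimal Python sketch of one well-known practical approach to Subset-Sum: the classic dynamic-programming solution. It is exact, but its running time grows with the size of the target value rather than just the number of inputs (it is pseudo-polynomial), so it is only practical when the numbers involved are modest. This is an illustration, not necessarily the approach used in the project mentioned above, and it assumes non-negative integer inputs.

```python
from __future__ import annotations

from typing import Optional


def subset_sum(values: list[int], target: int) -> Optional[list[int]]:
    """Return values (each used at most once) summing to target, or None.

    Dynamic programming over reachable sums: O(len(values) * target) time.
    Assumes all values and the target are non-negative integers.
    """
    # parent[s] = (previous_sum, value_used) for every sum s reachable so far
    parent: dict[int, tuple[int, int]] = {0: (-1, 0)}
    for v in values:
        # Iterate over a snapshot so each value is used at most once.
        for s in list(parent):
            t = s + v
            if t <= target and t not in parent:
                parent[t] = (s, v)
        if target in parent:
            break  # found it; no need to look at the remaining values
    if target not in parent:
        return None
    # Walk back through the parent links to recover the chosen values.
    subset: list[int] = []
    s = target
    while s != 0:
        prev, v = parent[s]
        subset.append(v)
        s = prev
    return subset
```

For example, `subset_sum([3, 34, 4, 12, 5, 2], 9)` returns `[5, 4]`, while a target of `30` returns `None`.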

Unknown Unknowns are a different matter. If you don’t know the things you don’t know … well, you’re in trouble. I’ve recently been involved in conceptual work with something that will involve natural language processing and sifting through massive amounts of data in real-time. I have no idea what would be involved in approaching this type of problem … and the problems we don’t know that we don’t know are the ones that prove very hard indeed.
And as Charles concludes:

Very Hard is the extreme of hard problems. You’ll often see both words capitalised for emphasis, even in the middle of a sentence. Indexing the entire World Wide Web and providing relevant search results in millisecond response times is a Very Hard problem. Breaking commercial-grade encryption within practical hardware and time limitations is a Very Hard problem. Peace in the Middle East is a Very Hard problem.

‘Very Hard’ is usually reserved for the class of problem that if you solved it, you could change the world. Or at least build a successful business on top of your solution.