Archive for the ‘Programming’ Category

REST: I don’t quite get it

Friday, August 10th, 2007

I’ve been playing with RESTful Rails on one of my projects. I must admit to being a bit perplexed.

You have to bend your code to get REST working properly, which smells to me.

For example, when editing a model, you need to push a hidden element into your form to spoof an HTTP PUT method. Rails automates some of this, but … why? What do you gain by forcing the system to accept only PUTs for particular actions, particularly when browsers need to be tricked into playing nicely? What is lost by having an update action accept POSTs?
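To make the trick concrete, here is a minimal sketch of the idea in Python rather than Rails itself. The form is submitted as a POST, and a hidden `_method` field (the name Rails uses) carries the verb the application should route on. The function name `effective_method` and the set of overridable verbs are my own assumptions for illustration, not Rails internals.

```python
# Sketch only: browsers submit forms as GET or POST, so the intended verb
# rides along as a hidden form parameter and is resolved on the server.
OVERRIDABLE = {"PUT", "DELETE"}  # assumption: verbs we allow a form to spoof


def effective_method(http_method, form_params):
    """Return the verb the application should route on."""
    override = form_params.get("_method", "").upper()
    # Only a real POST may be promoted to another verb.
    if http_method.upper() == "POST" and override in OVERRIDABLE:
        return override
    return http_method.upper()
```

Seen this way, the "PUT" is really just a POST with a label on it, which is more or less the source of my confusion.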

Anyway, I will keep playing …

2 things I most loved about Ruby on Rails this week

Thursday, August 9th, 2007
  1. attachment_fu
    From install to an Image Upload with Amazon S3 storage in about 10 minutes. See the attachment_fu tutorial.

  2. RESTful rails
    An XML API for nothing? Why thanks, Rails.

The REST stuff is really fun to play with … but I am finding it breaks a little in the real world with non-trivial use cases. Start adding auto-completes and other AJAX elements to your page and you start bending the RESTful model a little. I am figuring it’s best to have the core REST architecture and build on it, rather than bend my code to adhere mindlessly to some REST ideal.

I {heart} Rails.

Everything you know about scaling is wrong

Friday, August 3rd, 2007

Scalability is one of the favourite argument topics of the technically inclined. Having developed some fantastic, all-singing, all-dancing, user-generated social networking Web 2.0 platform*, someone will invariably ask:

“Oh, you used {LANGUAGE X}”
“Will it scale?”

You have to imagine the slightly derisive tone for yourself.

Trouble is, languages don’t scale, systems do.

Assessing scalability on the basis of a particular programming language is like saying to someone who is writing an Encyclopedia: “English won’t scale”.

Systems scale.

If you wrote your Encyclopedia the same way you wrote your product brochure, it wouldn’t scale - small page format, lots of pictures, minimal text with plenty of whitespace. If you write an Encyclopedia, you need to develop a system for handling the information - table of contents, indexing, cross referencing, multiple volumes, alphabetical organisation, thin paper, multiple columns of text, readable fonts.

The language used to write the Encyclopedia or Brochure is secondary to the system that delivers it.

The same is true of Web Applications.

Languages are irrelevant to scale and frameworks (being very close to languages and essentially an extension) only have a minimal impact.

I moved to Ruby on Rails from PHP. The standard argument against Rails from PHP developers is that “Rails won’t scale”**. However, this focus misses the point entirely. I have seen arguments from PHP developers who suggest using single quotes (‘) for strings rather than double quotes (“), because single-quoted strings parse faster in PHP. This thinking is radically broken.

Scalability has nothing to do with processing performance at the language level.

Scalability is about the system as a whole.

At some point as an application scales, it will invariably require multiple servers and multiple databases, and there is very little any language or framework can do to mitigate this requirement. These requirements are system level, not language level - your only real option is to be ready to build your system out accordingly as it grows. The only trouble you find is when you have made decisions early on that limit the way your application can grow out.

As a corollary to the scaling issue, my final point is this:

You aren’t going to scale

Call me cynical, call me pessimistic, but lots of people build for scale prematurely.

The focus should always be on creating great user experience.

Unless you have the load to warrant a particular system decision you should not be creating that system (but always within the framework of sensible architectural decisions). Scaling issues are often unpredictable (you don’t know your load profile until you hit it, or are hit by it) but worrying about them before you have to wastes valuable developer resources on infrastructure rather than the interface.

In the Web 2.0 world, scaling problems are a sign of success, but the focus should be on the user, not the system.

* We don’t develop applications or, god forbid, sites anymore, we develop platforms.

** I am aware the argument is a bit confused here, because PHP is compared to Rails, when Rails is a framework based on the Ruby language and should be compared to a PHP framework like CakePHP.

Multitasking is evil

Wednesday, August 1st, 2007

Agile in Action has some very good reasons not to multitask.

I have a larger post on this issue, but I have some other stuff to do first.

Lies, Damned Lies, and Equity

Monday, July 30th, 2007

I keep getting offers to work on projects.

This would be great, but most of these project offers are in exchange for equity: “We have a great idea, help us implement it and we’ll give you a piece of the profits down the track”.

Unfortunately, work for equity in practice turns out to mean work for nothing.

Maybe it’s just me, but I’ve had equity in a number of companies, mostly when I was young and inexperienced and Dot Com Exuberance was the style of the time. Unfortunately for me, none of this equity ever turned into anything useful. You know, like actual cold hard cash.  In fact, in at least one case the lure of equity kept me working long past the use-by date of the company.

You need to think very carefully before taking on projects for equity - your effort may never be rewarded.

The harsh reality of the startup world is that most companies will fail.

Startups that don’t fail will probably not be as big as they promised.

It’s hard to predict the next Google.

If you’re a software developer, it’s very easy to think that it’s all about having a great product - but if you build it they may not come. It’s important to remember that it’s not necessarily all about a great product: you also need great marketing and, most importantly of all, incredible luck.

So I may have just turned the next Google down, but I’m betting that I haven’t. And I have to focus on my own startup efforts. After all, maybe I’m building the next Google …

Why a code review could save you money

Tuesday, July 24th, 2007

I’ve recently run into several clients who’ve been burnt by contract developers.

There are some obvious financial reasons to contract remote developers in different countries, but the practice can lead to problems.
The problems my clients have had break down into two areas:

  • communication issues
  • code quality

Today I am going to focus on Code Quality. I will write about documenting your requirements effectively another time.

One of my clients needed some very simple changes made to an application. I opened the code up and knew instantly that there had been some very average programmers involved. Files and folders everywhere, with names like “Accounts”, “AccountsOLD”, “AccountsBAK”. To a good programmer this kind of sight is an instant warning - good code is organised, clear and sensible.

I still had hope - code may be disorganised from having several developers work on it, but not necessarily be bad.

However, once I started delving into the code itself, the situation became much worse.

The code structure was just as disorganised and incredibly convoluted. Now, you may be thinking, “if the site is running, who cares if some programmer doesn’t think the code’s nice?”, but the crucial thing here is that bad code costs money. Just in case you missed my point:

Bad code costs money

Instead of having a few simple changes to make, I now had a few difficult and complex changes to make.

The bad code meant that I had to revise my estimate. So I revised my quote. Up. 1 hour of work became 5.

This is great for me, as a contract developer charging by the hour, but really bad if you’re trying to run a business on a budget.

Unfortunately, as I explored further things went from worse to really really worse.

The code was not just bad, but dangerous.

The code was wide-open to a couple of well-known security problems called SQL Injection and Command Injection.  These are the code equivalent of leaving the front door open. To a good programmer, they are too obvious to even really worry about - good practice avoids these issues in much the same way you lock your front door when you go out. You don’t really think of it as a security measure, it’s just what you do.

Good code is automatically protected from these obvious security holes.
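For the technically curious, here is a minimal sketch of what SQL Injection looks like and how ordinary good practice closes the hole. It uses Python with an in-memory SQLite database purely for illustration; it is not the client's code, and the `users` table and function names are made up for the example.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Vulnerable: user input is pasted straight into the SQL text,
    # so a crafted value can rewrite the query itself.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterised: the value travels separately from the SQL text
    # and is only ever treated as data.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


# Hypothetical sample data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

malicious = "nobody' OR '1'='1"
leaked = find_user_unsafe(conn, malicious)   # the WHERE clause is now always true
blocked = find_user_safe(conn, malicious)    # treated as a (non-existent) literal name
```

The safe version is no harder to write than the unsafe one, which is the point: it is a habit, like locking the door.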

These problems meant that anyone could gain access to the system and take complete control of the site.

The client’s code was riddled with these issues. In fact, all of the code was vulnerable - like someone had built a house with no doors at all, just holes in the walls. The unfortunate fact is that fixing the code will be a long and involved process because it was so badly organised and written in the first place.

What can you do?

Using cheaper developers is a business reality. Not everyone can afford to hire the best software developers in town.

However, an independent code audit and review can help ensure your project is on track by providing an independent and expert view of development progress. And if your code is complete, an audit can ensure that your product is rock-solid and production ready. And a code audit gets you the knowledge of an expert without all the cost.

A code audit would consider one or all of the following issues:

  • Application Security
  • Scalability & Performance
  • Code Conventions
  • Code Quality
  • Test Coverage
  • Data Privacy
  • User Interaction
  • Information Architecture

A code audit can catch average code before it goes bad and it could save you a ton of money in the long run.

Note to self: server date and Amazon S3

Tuesday, July 24th, 2007

Just lost an hour of my life trying to work out why the test server received 403 errors when attempting to store an image using Amazon S3.

The server date was a day out and S3 uses the date as part of its authentication scheme.

Note to self: check server date when seeing inexplicable errors in Amazon S3.
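A small sanity check along these lines might have saved me the hour. This is only a sketch: the 15-minute tolerance is my assumption about how much skew S3 will accept, and where the trusted reference time comes from (an NTP query, or the Date header of any HTTP response) is left to the caller.

```python
from datetime import datetime, timedelta, timezone

# Assumption: requests are rejected once the clocks differ by more than ~15 minutes.
MAX_SKEW = timedelta(minutes=15)


def clock_skew_ok(local_now, reference_now, max_skew=MAX_SKEW):
    """Return True if the local clock is close enough to a trusted reference."""
    return abs(local_now - reference_now) <= max_skew


# Illustrative values: a server clock that is a full day behind.
reference = datetime(2007, 7, 24, 12, 0, tzinfo=timezone.utc)
a_day_out = reference - timedelta(days=1)
```

Running `clock_skew_ok(a_day_out, reference)` flags the problem immediately, which beats staring at 403s.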

Trivial, Hard and not going to do it

Tuesday, July 17th, 2007

Charles Miller has posted a great article entitled Understanding Engineers Feasibility that deals with classes of problems as approached by software engineers. Estimation is notoriously difficult at the best of times, but some classes of problems are more difficult than others.

In my very first development job out of university I was once asked to quote on how long it would take to develop some discussion forum software that would translate between English and Japanese in real-time to facilitate conversation between tourists and locals in an area near Brisbane.

My answer was “20 years and many millions of dollars”. The sales rep mostly hated me after that, but I thought she was joking at the time.

The big issue in this type of problem assessment boils down to the difference between Known Unknowns and Unknown Unknowns.

Some problems have Known Unknowns. Recently I was developing some code that relied on a solution to Subset-Sum (one of the NP-Complete family). We all know that solving NP-Complete problems is very hard - lots of mathematicians have been trying for a long time. However, there are some solutions that, while not being perfect from a theoretical standpoint (as in, not provable), are good enough for real-world software.
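As an illustration of "good enough for real-world software", here is one well-known practical approach to Subset-Sum: a dynamic-programming sketch in Python. It is not the code I wrote for that project. It assumes non-negative integers and a modest target, because its time and memory grow with the number of values multiplied by the target.

```python
def subset_sum(values, target):
    """Return a list of values that sums to target, or None if none exists.

    Assumes non-negative integers. Cost grows with len(values) * target,
    so this is practical only when the target is reasonably small.
    """
    # Maps each achievable sum to one subset that produces it.
    reachable = {0: []}
    for value in values:
        # Iterate over a snapshot so each value is used at most once.
        for total, subset in list(reachable.items()):
            new_total = total + value
            if new_total <= target and new_total not in reachable:
                reachable[new_total] = subset + [value]
    return reachable.get(target)
```

For example, `subset_sum([3, 34, 4, 12, 5, 2], 9)` finds a pair of values that add up to 9, while a target of 30 comes back as `None`. The hard part of the problem has not gone away; it has just been pushed into the size of the numbers.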

Unknown Unknowns are a different matter. If you don’t know the things you don’t know … well, you’re in trouble. I’ve recently been involved in conceptual work with something that will involve natural language processing and sifting through massive amounts of data in real-time. I have no idea what would be involved in approaching this type of problem … and the problems we don’t know that we don’t know are the ones that prove very hard indeed.
And as Charles concludes:

Very Hard is the extreme of hard problems. You’ll often see both words capitalised for emphasis, even in the middle of a sentence. Indexing the entire World Wide Web and providing relevant search results in millisecond response times is a Very Hard problem. Breaking commercial-grade encryption within practical hardware and time limitations is a Very Hard problem. Peace in the Middle East is a Very Hard problem.

‘Very Hard’ is usually reserved for the class of problem that if you solved it, you could change the world. Or at least build a successful business on top of your solution.

More on Triggers

Tuesday, July 10th, 2007

I realised this morning as I played with my unit tests, based on yesterday’s post on Transactional Full-Text Search in MySQL, that there is some potential for bad data. MySQL is smart enough to not create duplicates in the search table. If there’s an existing entry (in the case where a delete has not removed an old shadow-copy of the data), MySQL will simply update the values. However, if there is no entry (because of some error when the content was created), MySQL is not quite smart enough to create one.

We can, however, do this in our trigger:

-- Replaces yesterday's simple AFTER UPDATE trigger on the content table
DROP TRIGGER IF EXISTS update_content_search;
DELIMITER //
CREATE TRIGGER content_update_search AFTER UPDATE ON content
FOR EACH ROW BEGIN
  -- Update the shadow row if it exists, otherwise create it
  IF EXISTS (SELECT 1 FROM content_search WHERE content_id = NEW.id) THEN
    UPDATE content_search SET content = NEW.content WHERE content_id = NEW.id;
  ELSE
    INSERT INTO content_search (content_id, content) VALUES (NEW.id, NEW.content);
  END IF;
END//
DELIMITER ;

If we find an existing entry, we update, if no entry exists, we create one.

An anonymous reader (thanks) also pointed out that if data storage is an issue, you can strip the stopwords from the data - by default MySQL ignores words fewer than four characters long as well as a whole list of longer common words.
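If you wanted to try the reader's suggestion, the stripping could happen in application code before the text is written to the search table. A rough sketch in Python follows; the stopword set here is a tiny sample I picked for illustration, not MySQL's actual list, and the function name is hypothetical.

```python
import re

MIN_WORD_LEN = 4  # mirrors the default minimum word length described above
# Assumption: a tiny illustrative sample, NOT MySQL's real stopword list.
SAMPLE_STOPWORDS = {"about", "after", "been", "from", "have", "that", "this", "with"}


def strip_unindexed_words(text, stopwords=SAMPLE_STOPWORDS, min_len=MIN_WORD_LEN):
    """Drop words the full-text index would ignore, keeping the original order."""
    words = re.findall(r"[A-Za-z0-9_']+", text)
    kept = [w for w in words if len(w) >= min_len and w.lower() not in stopwords]
    return " ".join(kept)
```

The trade-off is that the search table no longer holds a readable copy of the content, so it is only worth doing if the storage saving matters.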

Transactional Full-Text Search in MySQL

Monday, July 9th, 2007

One of the issues with using full-text search in MySQL is that it requires the MyISAM table engine. In MySQL, tables need the InnoDB engine to use transactions.

This means that we can only ever have full-text search, or transactions, but not both. Given that we really want transactions all the time*, we should generally be running with the InnoDB engine. Not to mention scaling and other issues**.

Enter the Shadow MyISAM Table Pattern to enable (near) Transactional Full-Text Search in MySQL.

All of the following code assumes MySQL 5+.

Suppose you have a table with some content that you want to search using Full-Text, but also want to manage in a transactional environment:

CREATE TABLE content (
id int(10) unsigned NOT NULL auto_increment,
content text,
PRIMARY KEY (id)
) ENGINE=InnoDB;

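We create a shadow table using MyISAM with a full-text index. Its content_id column maps each row back to the content table (a foreign key by convention only, since MyISAM does not enforce foreign key constraints):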

CREATE TABLE content_search (
id int(10) unsigned NOT NULL auto_increment,
content_id int(10) unsigned NOT NULL,
content text,
PRIMARY KEY (id),
FULLTEXT KEY index_fulltext (content)
) ENGINE=MyISAM;

Then we add some triggers to update the shadow table automatically AFTER INSERT and AFTER UPDATE:

-- Copy each new row into the shadow table
CREATE TRIGGER insert_content_search AFTER INSERT ON content
FOR EACH ROW
INSERT INTO content_search (content_id, content) VALUES (NEW.id, NEW.content);

-- Keep the shadow copy in step with edits
CREATE TRIGGER update_content_search AFTER UPDATE ON content
FOR EACH ROW
UPDATE content_search SET content = NEW.content WHERE content_id = NEW.id;

Changes in the content table are now automatically reflected in the content_search table and this table gives you access to MySQL’s full-text search capability:

SELECT * FROM content_search s LEFT JOIN content c ON c.id = s.content_id WHERE MATCH (s.content) AGAINST ('lorem ipsum');

One possible drawback of this technique is that it essentially doubles your storage requirements. An alternative in this case is not duplicating the field(s), but moving them to the search table (so there is no content field in the content table - it sits in the content_search table and is joined as required). You would then lose the ability to use triggers to manage the data, as well as losing transactions on this data.

The End


* Seriously, in any non-trivial application, transactions should be a minimum requirement
** I need to find the reference, but there’s a LiveJournal scaling document that insists that the use of InnoDB is a minimum for scaling MySQL effectively.