Archive for July, 2007

Lies, Damned Lies, and Equity

Monday, July 30th, 2007

I keep getting offers to work on projects.

This would be great, but most of these project offers are in exchange for equity: “We have a great idea, help us implement it and we’ll give you a piece of the profits down the track”.

Unfortunately, work for equity in practice turns out to mean work for nothing.

Maybe it’s just me, but I’ve had equity in a number of companies, mostly when I was young and inexperienced and Dot Com Exuberance was the style of the time. Unfortunately for me, none of this equity ever turned into anything useful. You know, like actual cold hard cash.  In fact, in at least one case the lure of equity kept me working long past the use-by date of the company.

You need to think very carefully before taking on projects for equity - your effort may never be rewarded.

The harsh reality of the startup world is that most companies will fail.

Startups that don’t fail will probably not be as big as they promised.

It’s hard to predict the next Google.

If you’re a software developer, it’s very easy to think that it’s all about having a great product - but if you build it they may not come. It’s important to remember that it’s not necessarily all about a great product: you also need great marketing and, most importantly of all, incredible luck.

So I may have just turned the next Google down, but I’m betting that I haven’t. And I have to focus on my own startup efforts. After all, maybe I’m building the next Google …

Why a code review could save you money

Tuesday, July 24th, 2007

I’ve recently run into several clients who’ve been burnt by contract developers.

There are some obvious financial reasons to contract remote developers in different countries, but the practice can lead to problems.
The problems my clients have had break down into two areas:

  • communication issues
  • code quality

Today I am going to focus on Code Quality. I will write about documenting your requirements effectively another time.

One of my clients needed some very simple changes made to an application. I opened the code up and knew instantly that there had been some very average programmers involved. Files and folders everywhere, with names like “Accounts”, “AccountsOLD”, “AccountsBAK”. To a good programmer this kind of sight is an instant warning - good code is organised, clear and sensible.

I still had hope - code may be disorganised from having several developers work on it, but not necessarily be bad.

However, once I started delving into the code itself, the situation became much worse.

The code structure was just as disorganised and incredibly convoluted. Now, you may be thinking, “if the site is running, who cares if some programmer doesn’t think the code’s nice?”, but the crucial thing here is that bad code costs money. Just in case you missed my point:

Bad code costs money

Instead of having a few simple changes to make, I now had a few difficult and complex changes to make.

The bad code meant that I had to revise my estimate. So I revised my quote. Up. 1 hour of work became 5.

This is great for me, as a contract developer charging by the hour, but really bad if you’re trying to run a business on a budget.

Unfortunately, as I explored further things went from worse to really really worse.

The code was not just bad, but dangerous.

The code was wide-open to a couple of well-known security problems called SQL Injection and Command Injection.  These are the code equivalent of leaving the front door open. To a good programmer, they are too obvious to even really worry about - good practice avoids these issues in much the same way you lock your front door when you go out. You don’t really think of it as a security measure, it’s just what you do.

Good code is automatically protected from these obvious security holes.
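
To make the "locked front door" concrete, here is a minimal sketch of the difference between gluing user input into a query and passing it as a bound parameter. It is not the client's code: the accounts table, the function names and the use of an in-memory SQLite database are all assumptions made so the example runs on its own.

```python
import sqlite3


def find_account_unsafe(conn, name):
    # Vulnerable: the user-supplied value is pasted straight into the SQL text.
    query = "SELECT id, name FROM accounts WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_account_safe(conn, name):
    # Safe: the value travels as a bound parameter and is never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM accounts WHERE name = ?", (name,)
    ).fetchall()


# Hypothetical table, created in memory purely for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO accounts (name) VALUES (?)", [("alice",), ("bob",)])

malicious = "nobody' OR '1'='1"
leaked = find_account_unsafe(conn, malicious)  # the injected OR matches every row
blocked = find_account_safe(conn, malicious)   # no account has that literal name
```

The only difference between the two functions is who builds the final query: the programmer with string concatenation, or the database driver with a placeholder. The second is no more work to write, which is why it is simply "what you do".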

These problems meant that anyone could gain access to the system and take complete control of the site.

The client’s code was riddled with these issues. In fact, all of the code was vulnerable - like someone had built a house with no doors at all, just holes in the walls. The unfortunate fact is that fixing the code will be a long and involved process because the code was so badly organised and written in the first place.

What can you do?

Using cheaper developers is a business reality. Not everyone can afford to hire the best software developers in town.

However, an independent code audit and review can help ensure your project is on track by providing an independent and expert view of development progress. And if your code is complete, an audit can ensure that your product is rock-solid and production ready. And a code audit gets you the knowledge of an expert without all the cost.

A code audit would consider one or all of the following issues:

  • Application Security
  • Scalability & Performance
  • Code Conventions
  • Code Quality
  • Test Coverage
  • Data Privacy
  • User Interaction
  • Information Architecture

A code audit can catch average code before it goes bad and it could save you a ton of money in the long run.

Note to self: server date and Amazon S3

Tuesday, July 24th, 2007

Just lost an hour of my life trying to work out why the test server received 403 errors when attempting to store an image using Amazon S3.

The server date was a day out and S3 uses the date as part of its authentication scheme.

Note to self: check server date when seeing inexplicable errors in Amazon S3.
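
A quick way to check for this is to compare the local clock with the Date header that comes back on any HTTP response from the service. The sketch below is only that, a sketch: the function names are mine, and the 15 minute limit is my understanding of how much drift S3 tolerates, so treat it as an assumption.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumption: requests are rejected once the clocks differ by more than
# roughly 15 minutes.
MAX_SKEW = timedelta(minutes=15)


def clock_skew(date_header, local_now=None):
    """Return local time minus the time given in an HTTP Date header."""
    remote = parsedate_to_datetime(date_header)
    if local_now is None:
        local_now = datetime.now(timezone.utc)
    return local_now - remote


def clock_is_usable(date_header, local_now=None):
    """True if the local clock is close enough to the remote one."""
    return abs(clock_skew(date_header, local_now)) <= MAX_SKEW


# A server that is a whole day out, as in the case above.
header = "Tue, 24 Jul 2007 10:00:00 GMT"
a_day_out = datetime(2007, 7, 25, 10, 0, 0, tzinfo=timezone.utc)
skew = clock_skew(header, a_day_out)
```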

Code Audit & Review

Sunday, July 22nd, 2007

An independent code audit and review can help ensure your project is on track by providing an independent and expert view of development progress.

If your code is complete, an audit can ensure that your product is rock-solid and production ready.

Factors to consider:

  • Application Security
  • Scalability & Performance
  • Code Conventions
  • Code Quality
  • Test Coverage
  • Data Privacy
  • User Interaction
  • Information Architecture

Contact me for more information.

Trivial, Hard and not going to do it

Tuesday, July 17th, 2007

Charles Miller has posted a great article entitled Understanding Engineers Feasibility that deals with the classes of problems as approached by software engineers. Estimation is notoriously difficult at the best of times, but some classes of problems are more difficult than others.

In my very first development job out of university I was once asked to quote on how long it would take to develop some discussion forum software that would translate between English and Japanese in real-time to facilitate conversation between tourists and locals in an area near Brisbane.

My answer was “20 years and many millions of dollars”. The sales rep mostly hated me after that, but I thought she was joking at the time.

The big issue in this type of problem assessment boils down to the difference between Known Unknowns and Unknown Unknowns.

Some problems have Known Unknowns. Recently I was developing some code that relied on a solution to Subset-Sum (one of the NP-Complete family). We all know that solving NP-Complete problems is very hard - lots of mathematicians have been trying for a long time. However, there are some solutions that, while not being perfect from a theoretical standpoint (as in, not provable), are good enough for real-world software.
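
As an illustration of "good enough for real-world software", here is one well-known practical approach to Subset-Sum: dynamic programming over the reachable totals. It is not necessarily the solution I used, and it assumes non-negative integers and a target small enough that tracking every total up to it is affordable. The function name is made up for the example.

```python
def subset_sum(values, target):
    """Return a list of values that sums to target, or None if none exists.

    Assumes non-negative integers. Takes roughly len(values) * target steps,
    which is fine for small targets but not polynomial in the input size.
    """
    # reachable maps each achievable total to one subset that produces it.
    reachable = {0: []}
    for value in values:
        # Snapshot the items so each value is used at most once.
        for total, subset in list(reachable.items()):
            new_total = total + value
            if new_total <= target and new_total not in reachable:
                reachable[new_total] = subset + [value]
    return reachable.get(target)


found = subset_sum([3, 34, 4, 12, 5, 2], 9)
missing = subset_sum([3, 34, 4, 12, 5, 2], 30)
```

The catch, and the reason this does not make the problem easy in general, is that the work grows with the size of the target rather than just the number of values.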

Unknown Unknowns are a different matter. If you don’t know the things you don’t know … well, you’re in trouble. I’ve recently been involved in conceptual work with something that will involve natural language processing and sifting through massive amounts of data in real-time. I have no idea what would be involved in approaching this type of problem … and the problems we don’t know that we don’t know are the ones that prove very hard indeed.

And as Charles concludes:

Very Hard is the extreme of hard problems. You’ll often see both words capitalised for emphasis, even in the middle of a sentence. Indexing the entire World Wide Web and providing relevant search results in millisecond response times is a Very Hard problem. Breaking commercial-grade encryption within practical hardware and time limitations is a Very Hard problem. Peace in the Middle East is a Very Hard problem.

‘Very Hard’ is usually reserved for the class of problem that if you solved it, you could change the world. Or at least build a successful business on top of your solution.

Contact Toby

Wednesday, July 11th, 2007

More on Triggers

Tuesday, July 10th, 2007

I realised this morning as I played with my unit tests, based on yesterday’s post on Transactional Full-Text Search in MySQL, that there is some potential for bad data. MySQL is smart enough to not create duplicates in the search table. If there’s an existing entry (in the case where a delete has not removed an old shadow-copy of the data), MySQL will simply update the values. However, if there is no entry (because of some error when the content was created), MySQL is not quite smart enough to create one.

We can, however, do this in our trigger:

DELIMITER //
CREATE TRIGGER content_update_search AFTER UPDATE ON content
FOR EACH ROW BEGIN
  -- Update the shadow row if it exists, otherwise create it
  IF EXISTS (SELECT 1 FROM content_search WHERE content_id = NEW.id) THEN
    UPDATE content_search SET content = NEW.content WHERE content_id = NEW.id;
  ELSE
    INSERT INTO content_search (content_id, content) VALUES (NEW.id, NEW.content);
  END IF;
END//
DELIMITER ;

If we find an existing entry, we update, if no entry exists, we create one.
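
If you want to play with the update-or-create idea without a MySQL server handy, the following sketch reproduces it in an in-memory SQLite database driven from Python. This is a stand-in, not the MySQL trigger: SQLite triggers have no IF statement, so the same effect is assumed to come from an UPDATE followed by an INSERT that only fires when no shadow row exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content (id INTEGER PRIMARY KEY, content TEXT);
CREATE TABLE content_search (
    id INTEGER PRIMARY KEY,
    content_id INTEGER NOT NULL,
    content TEXT
);

CREATE TRIGGER content_update_search AFTER UPDATE ON content
FOR EACH ROW BEGIN
    -- Refresh the shadow row if there is one...
    UPDATE content_search SET content = NEW.content
     WHERE content_id = NEW.id;
    -- ...and create it if it is missing.
    INSERT INTO content_search (content_id, content)
    SELECT NEW.id, NEW.content
     WHERE NOT EXISTS (
        SELECT 1 FROM content_search WHERE content_id = NEW.id);
END;
""")

# Simulate the failure case: a content row that never got a shadow row
# (there is deliberately no insert trigger in this sketch).
conn.execute("INSERT INTO content (id, content) VALUES (1, 'first draft')")

# The first update creates the missing shadow row.
conn.execute("UPDATE content SET content = 'second draft' WHERE id = 1")
rows = conn.execute("SELECT content_id, content FROM content_search").fetchall()

# A further update refreshes the same shadow row instead of adding another.
conn.execute("UPDATE content SET content = 'third draft' WHERE id = 1")
rows_after = conn.execute("SELECT content_id, content FROM content_search").fetchall()
```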

An anonymous reader (thanks) also pointed out that if data storage is an issue, you can strip the stopwords from the data - MySQL ignores words less than four characters long as well as a whole list of longer common words.
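
A rough sketch of what that stripping might look like in application code, before the text is written to the shadow table. The function name is made up, and the stopword set is a tiny hand-picked sample for illustration only: MySQL's built-in list is far longer, so a real version would need the full list for your server.

```python
import re

# MySQL's default minimum length for an indexed word.
MIN_WORD_LEN = 4

# Hypothetical sample only; the real built-in stopword list is much longer.
SAMPLE_STOPWORDS = {"about", "that", "their", "this", "with"}


def strip_unindexed_words(text):
    """Drop words the full-text index would ignore anyway."""
    words = re.findall(r"[A-Za-z0-9_']+", text.lower())
    kept = [
        word for word in words
        if len(word) >= MIN_WORD_LEN and word not in SAMPLE_STOPWORDS
    ]
    return " ".join(kept)


slimmed = strip_unindexed_words("This is a post about triggers with MySQL")
```

Note that this lowercases and flattens the text, so it only makes sense for the search copy, never for the content you actually display.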

Transactional Full-Text Search in MySQL

Monday, July 9th, 2007

One of the issues with using full-text search in MySQL is that it requires the MyISAM table engine. In MySQL, however, tables need the InnoDB engine to use transactions.

This means that we can only ever have full-text search, or transactions, but not both. Given that we really want transactions all the time*, we should generally be running with the InnoDB engine. Not to mention scaling and other issues**.

Enter the Shadow MyISAM Table Pattern to enable (near) Transactional Full-Text Search in MySQL.

All of the following code assumes MySQL 5+.

Suppose you have a table with some content that you want to search using Full-Text, but also want to manage in a transactional environment:

CREATE TABLE content (
id int(10) unsigned NOT NULL auto_increment,
content text,
PRIMARY KEY (id)
) ENGINE=InnoDB;

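We create a shadow table using MyISAM, with a full-text index, that maps back to the content through a content_id column acting as a foreign key (MyISAM does not enforce foreign key constraints, so the link is by convention only):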

CREATE TABLE content_search (
id int(10) unsigned NOT NULL auto_increment,
content_id int(10) unsigned NOT NULL,
content text,
PRIMARY KEY (id),
FULLTEXT KEY index_fulltext (content)
) ENGINE=MyISAM;

Then we add some triggers to update the shadow table automatically AFTER INSERT and AFTER UPDATE:

CREATE TRIGGER insert_content_search AFTER INSERT ON content
FOR EACH ROW
INSERT INTO content_search (content_id, content) VALUES (NEW.id, NEW.content);

CREATE TRIGGER update_content_search AFTER UPDATE ON content
FOR EACH ROW
UPDATE content_search SET content = NEW.content WHERE content_id = NEW.id;

Changes in the content table are now automatically reflected in the content_search table, and this table gives you access to MySQL’s full-text search capability:

SELECT * FROM content_search s LEFT JOIN content c ON c.id = s.content_id WHERE MATCH (s.content) AGAINST ('lorem ipsum');

One possible drawback of this technique is that it essentially doubles your storage requirements. An alternative in this case is not duplicating the field(s), but moving them to a search table (so there is no content field in the content table - it sits in the content_search table and is joined as required). You would then lose the ability to use triggers to manage the data, as well as losing transactions on this data.

The End


* Seriously, in any non-trivial application, transactions should be a minimum requirement
** I need to find the reference, but there’s a LiveJournal scaling document that insists that the use of InnoDB is a minimum for scaling MySQL effectively.

Note to self

Friday, July 6th, 2007

A couple of things that have stumped me today.

A syntax error in database.yml caused the TextMate model generator to die silently. In case of doubt, check the config.

After installing rFacebook on Dreamhost, the application would not pick up the gem without unpacking it and requiring the explicit path in production.rb:

require "#{RAILS_ROOT}/vendor/rfacebook-0.6.2/lib/facebook_rails_controller_extensions"

Wrong Language

Monday, July 2nd, 2007

Don’t you hate it when you spend ages wondering why your code won’t work:

$queryString += $value;

And then realise that in PHP you use “.” to join strings:

$queryString .= $value;

Stupid PHP.