Note to self: server date and Amazon S3

July 24th, 2007

Just lost an hour of my life trying to work out why the test server received 403 errors when attempting to store an image using Amazon S3.

The server date was a day out and S3 uses the date as part of its authentication scheme.

Note to self: check server date when seeing inexplicable errors in Amazon S3.
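
A check along these lines is easy to script. The following is a minimal sketch, not the code used on the test server: it compares the local clock against the `Date` header that a remote server returns and flags any difference beyond a tolerance. The function names and the 15-minute default are assumptions made for illustration (S3 documents a tolerance of roughly that size and rejects requests outside it with a `RequestTimeTooSkewed` error).

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Hypothetical default tolerance; adjust to whatever the service documents.
MAX_SKEW = timedelta(minutes=15)


def clock_skew(date_header: str, local_now: datetime) -> timedelta:
    """Return local time minus the server time given in an HTTP Date header."""
    server_time = parsedate_to_datetime(date_header)
    return local_now - server_time


def is_clock_suspect(date_header: str, local_now: datetime,
                     max_skew: timedelta = MAX_SKEW) -> bool:
    """True when the local clock differs from the server's by more than max_skew."""
    return abs(clock_skew(date_header, local_now)) > max_skew


if __name__ == "__main__":
    # Example values only: a server header and a local clock that is a day ahead.
    header = "Tue, 24 Jul 2007 10:00:00 GMT"
    local = datetime(2007, 7, 25, 10, 0, 0, tzinfo=timezone.utc)
    print(clock_skew(header, local), is_clock_suspect(header, local))
```

In practice the header would come from any HTTP response the server can fetch, and `local_now` from `datetime.now(timezone.utc)`.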

Code Audit & Review

July 22nd, 2007

An independent code audit and review can help ensure your project is on track by providing an independent and expert view of development progress.

If your code is complete, an audit can ensure that your product is rock-solid and production ready.

Factors to consider:

  • Application Security
  • Scalability & Performance
  • Code Conventions
  • Code Quality
  • Test Coverage
  • Data Privacy
  • User Interaction
  • Information Architecture

Contact me for more information.

Trivial, Hard and not going to do it

July 17th, 2007

Charles Miller has posted a great article entitled Understanding Engineers Feasibility that deals with the classes of problems approached by software engineers. Estimation is notoriously difficult at the best of times, but some classes of problems are more difficult than others.

In my very first development job out of university I was once asked to quote on how long it would take to develop some discussion forum software that would translate between English and Japanese in real-time to facilitate conversation between tourists and locals in an area near Brisbane.

My answer was “20 years and many millions of dollars”. The sales rep mostly hated me after that, but I thought she was joking at the time.

The big issue in this type of problem assessment boils down to the difference between Known Unknowns and Unknown Unknowns.

Some problems have Known Unknowns. Recently I was developing some code that relied on a solution to Subset-Sum (one of the NP-Complete family). We all know that solving NP-Complete problems is very hard - lots of mathematicians have been trying for a long time. However, there are some solutions that, while not being perfect from a theoretical standpoint (as in, not provably efficient for every input), are good enough for real-world software.
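
To make "good enough" concrete, here is a minimal sketch of one well-known practical approach, not necessarily the one used in the code mentioned above: a dynamic-programming search over reachable sums. It is exact, but its running time grows with the size of the target rather than just the number of values, so it only helps when the numbers are modest. The function name and the restriction to non-negative integers are assumptions of this sketch.

```python
from typing import Dict, List, Optional, Sequence


def subset_with_sum(values: Sequence[int], target: int) -> Optional[List[int]]:
    """Return one subset of the non-negative ints in `values` summing to `target`, or None."""
    if target < 0:
        return None
    # reachable[s] records how sum s was first reached: (previous_sum, index_used).
    reachable: Dict[int, Optional[tuple]] = {0: None}
    for index, value in enumerate(values):
        if value < 0:
            raise ValueError("this sketch only handles non-negative integers")
        # Iterate over a snapshot so each value is used at most once.
        for current in list(reachable):
            new_sum = current + value
            if new_sum <= target and new_sum not in reachable:
                reachable[new_sum] = (current, index)
    if target not in reachable:
        return None
    # Walk back through the recorded steps to recover the chosen values.
    chosen: List[int] = []
    current = target
    while reachable[current] is not None:
        previous, index = reachable[current]
        chosen.append(values[index])
        current = previous
    return chosen[::-1]
```

For example, `subset_with_sum([3, 34, 4, 12, 5, 2], 9)` returns a list of values from the input that adds up to 9, while a target of 30 yields `None` because no combination reaches it.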

Unknown Unknowns are a different matter. If you don’t know the things you don’t know … well, you’re in trouble. I’ve recently been involved in conceptual work with something that will involve natural language processing and sifting through massive amounts of data in real-time. I have no idea what would be involved in approaching this type of problem … and the problems we don’t know that we don’t know are the ones that prove very hard indeed.

And as Charles concludes:

Very Hard is the extreme of hard problems. You’ll often see both words capitalised for emphasis, even in the middle of a sentence. Indexing the entire World Wide Web and providing relevant search results in millisecond response times is a Very Hard problem. Breaking commercial-grade encryption within practical hardware and time limitations is a Very Hard problem. Peace in the Middle East is a Very Hard problem.

‘Very Hard’ is usually reserved for the class of problem that if you solved it, you could change the world. Or at least build a successful business on top of your solution.

Contact Toby

July 11th, 2007

More on Triggers

July 10th, 2007

I realised this morning, as I played with my unit tests based on yesterday’s post on Transactional Full-Text Search in MySQL, that there is some potential for bad data. MySQL is smart enough to not create duplicates in the search table: if there’s an existing entry (in the case where a delete has not removed an old shadow copy of the data), MySQL will simply update the values. However, if there is no entry (because of some error when the content was created), MySQL is not quite smart enough to create one.

We can, however, do this in our trigger:

DELIMITER //
-- Takes the place of the plain AFTER UPDATE trigger from yesterday's post:
-- update the shadow row if it exists, otherwise create it.
CREATE TRIGGER content_update_search AFTER UPDATE ON content
FOR EACH ROW BEGIN
  IF EXISTS (SELECT 1 FROM content_search WHERE content_id = NEW.id) THEN
    UPDATE content_search SET content = NEW.content WHERE content_id = NEW.id;
  ELSE
    INSERT INTO content_search (content_id, content) VALUES (NEW.id, NEW.content);
  END IF;
END//
DELIMITER ;

If we find an existing entry, we update it; if no entry exists, we create one.

An anonymous reader (thanks) also pointed out that if data storage is an issue, you can strip the stopwords from the data before it goes into the search table - by default MySQL ignores words shorter than four characters, as well as a whole list of longer common words.
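
As a rough illustration of that idea, the sketch below strips short words and stopwords from a piece of text before it would be written to the search table. It is an assumption-heavy example: the function name is made up, the stopword set is a tiny stand-in for MySQL's much longer built-in list, and the minimum length of four is assumed to match the server's setting.

```python
import re
from typing import FrozenSet

# Illustrative stand-in only; MySQL's built-in stopword list is much longer.
SAMPLE_STOPWORDS: FrozenSet[str] = frozenset(
    {"about", "because", "there", "their", "which", "would"}
)

# Assumed to match the server's minimum indexed word length.
MIN_WORD_LENGTH = 4


def strip_unindexed_words(text: str,
                          stopwords: FrozenSet[str] = SAMPLE_STOPWORDS,
                          min_length: int = MIN_WORD_LENGTH) -> str:
    """Drop words the full-text index would ignore anyway and return the rest."""
    words = re.findall(r"[A-Za-z0-9_']+", text)
    kept = [
        word for word in words
        if len(word) >= min_length and word.lower() not in stopwords
    ]
    return " ".join(kept)
```

Note that the stripped text is only useful for matching: punctuation and word order context are lost, so anything shown to the user still has to come from the original content table.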

Transactional Full-Text Search in MySQL

July 9th, 2007

One of the issues with using full-text search in MySQL is that it requires the MyISAM table engine, while tables need the InnoDB engine to use transactions.

This means that we can only ever have full-text search, or transactions, but not both. Given that we really want transactions all the time*, we should generally be running with the InnoDB engine. Not to mention scaling and other issues**.

Enter the Shadow MyISAM Table Pattern to enable (near) Transactional Full-Text Search in MySQL.

All of the following code assumes MySQL 5+.

Suppose you have a table with some content that you want to search using Full-Text, but also want to manage in a transactional environment:

CREATE TABLE content (
id int(10) unsigned NOT NULL auto_increment,
content text,
PRIMARY KEY (id)
) ENGINE=InnoDB;

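We create a shadow table using MyISAM, with a full-text index and a content_id column that maps each row back to the content table like a foreign key (MyISAM does not enforce the constraint, so it is a reference by convention only):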

CREATE TABLE content_search (
id int(10) unsigned NOT NULL auto_increment,
content_id int(10) unsigned NOT NULL,
content text,
PRIMARY KEY (id),
FULLTEXT KEY index_fulltext (content)
) ENGINE=MyISAM;

Then we add some triggers to update the shadow table automatically AFTER INSERT and AFTER UPDATE:

CREATE TRIGGER insert_content_search AFTER INSERT ON content
FOR EACH ROW
INSERT INTO content_search (content_id, content) VALUES (NEW.id, NEW.content);

CREATE TRIGGER update_content_search AFTER UPDATE ON content
FOR EACH ROW
UPDATE content_search SET content = NEW.content WHERE content_id = NEW.id;

Changes in the content table are now automatically reflected in the content_search table, and this table gives you access to MySQL’s full-text search capability:

SELECT * FROM content_search s LEFT JOIN content c ON c.id = s.content_id WHERE MATCH (s.content) AGAINST ('lorem ipsum');

One possible drawback of this technique is that it essentially doubles your storage requirements. An alternative in this case is not duplicating the field(s), but pushing them to a search table (so no content field in the content table - it sits in the content_search table and is joined as required). You would then lose the ability to use triggers to manage the data, as well as losing transactions on this data.

The End


* Seriously, in any non-trivial application, transactions should be a minimum requirement
** I need to find the reference, but there’s a LiveJournal scaling document that insists that the use of InnoDB is a minimum for scaling MySQL effectively.

Note to self

July 6th, 2007

A couple of things that have stumped me today.

A syntax error in database.yml caused the TextMate model generator to die silently. In case of doubt, check the config.

After installing rFacebook on Dreamhost, the application would not pick up the gem without unpacking it and requiring the explicit path in production.rb:

require "#{RAILS_ROOT}/vendor/rfacebook-0.6.2/lib/facebook_rails_controller_extensions"

Wrong Language

July 2nd, 2007

Don’t you hate it when you spend ages wondering why your code won’t work:

$queryString += $value;

And then realise that in PHP you use “.” to join strings:

$queryString .= $value;

Stupid PHP.

Dear Blogosphere: Shut up about the iPhone

July 2nd, 2007

Dear Blogosphere,

Shut up about the iPhone.

I don’t want to hear it.

I am already totally over the iPhone, and won’t even see one until 2008, at the earliest. See, Dear Blogosphere, not all of us live in the US of A.

Not all of us are affected by the apparent revolution caused by some people strapping a cell phone onto an iPod and hooking it up to a network that barely works.

======================

It does seem apparent that Apple is going to do to the mobile phone market what they did to the MP3 player market, if they can survive the telecommunications companies: all reports are saying that AT&T are really dropping the ball. In the music market, Apple could get traction without playing directly with the Record Companies, because people already had music collections. Once the iPod took off, Apple had some leverage to play with. In the telecommunications world, things are very different … you can’t have an iPhone without the network, which means playing with the telcos. Australia’s equivalent of AT&T, Telstra, has a long history of customer abuse and mismanagement, but it’s the dominant player (read: only player in some areas of Australia) and Apple may be forced to deal with them.

Multimedia died for a reason

June 28th, 2007

The 37 Signals Blog has stirred up controversy again with a post about HTML, CSS, and JavaScript:

On the user experience side of things, we’re not even close to tapping out the potential of HTML. The majority of web sites and applications still suck.

Of course, the flame-war began immediately:

“Flex/Flash/Apollo is totally the future”
“No it isn’t”

I think it is definitely true that we have only scratched the surface of HTML. It’s only in the last couple of years that HTML, JavaScript and CSS have really become advanced, stable and widespread enough to be used for complex application development. Even features that have gained ubiquity, like auto-complete text fields that talk back to the server, are very recent additions to the developer’s toolkit.

Having worked with WebStart for several years now, I have found that the hardest problem to solve is the additional installation. Java makes this particularly hard, but when you use a runtime on top of the browser, you will always have this additional barrier to adoption. Rather than prospective customers being able to use your application straight away, you are placing an extra hurdle in front of them.

The problem is not insurmountable, but it is there.

On top of all of this, most RIAs I have seen don’t really do much more than a “vanilla” Web 2.0 application anyway. I come from a Multimedia background (I did a lot of CD Authoring with Director in the 90s) and I’ve done far too much Swing, so I’ve seen many, many fantastically bad applications. There is a reason why software tends toward a standard set of application principles - business applications don’t need much singing and dancing.

Auto-complete, edit-in-place, drag/drop, lists and options are all standard fare with HTML, CSS and JavaScript. About the only piece that is really missing from the stack is the ability to zoom effectively.

Whichever side of the debate you come down on, you have to admit that it’s going to be an interesting few years.