Archive for the ‘MySQL’ Category

backup_fu: making Amazon S3 backups simple

Sunday, February 17th, 2008

My favourite new plugin for Ruby on Rails is backup_fu.

backup_fu makes managing Amazon S3 backups really simple.

I have been running my own Rake Task for MySQL Backup to Amazon S3, but I’ve switched my projects over to backup_fu instead because it has much better control over your backups and will even backup stored files.

I’ve also created a patch for backup_fu that allows you to specify the mysqldump options. It should get rolled into an upcoming release, but if you would like it sooner, just contact me at toby@info-architects.net.

Amazon SimpleDB: death of the database?

Saturday, December 15th, 2007

With the announcement of Amazon SimpleDB Scoble has asked if Amazon Web Services are going to kill MySQL and Oracle.

I think the short answer is no, but the game is changing rapidly, and Amazon is at the vanguard.

SimpleDB basically acts like a big structured bucket.
The Model is represented by Domains. A Domain can have Attributes. Attributes are key/value pairs.

There is no schema attached to Domains, you can PUT any combination of attribute-values in the domain:
PUT (item, 123), (description, sweater), (color, blue), (color, red)
PUT (item, 456), (description, dress shirt), (color, white), (color, blue)
PUT (item, 789), (description, shoes), (color, black), (material, leather)

The real issue here is that developers are strongly tied to the RDBMS. Frameworks assume you are running on top of a relational database, so there is an instant barrier to entry. However, I suspect it won’t be long before the first set of libraries and drivers for SimpleDB are developed.

The much bigger issues are in the features that SimpleDB doesn’t have:

  • No ACID
  • No transactions (see ACID, above)
  • No relationships
  • No data integrity
  • No SQL
  • Latency

SQL is actually pretty incredible - it makes accessing structured data very simple. If you’ve ever played with other types of data store you miss SQL almost instantly (Prevayler, anyone?).

All of that said, Amazon continues to lead the way with Web Services. Consider SimpleDB a warning shot to the database incumbents.

Help Wanted: Advanced Slicehost with Capistrano

Monday, December 3rd, 2007

I’m looking for someone to help me with some advanced server setup adminstration.

I have configured a Slicehost VPS to deploy my applications, but I need to get MySQL Replication working, along with some Capistrano/Deprec Recipes to automate deployment and configuration.

If anyone is interested in some paid work, drop me a line:

Rake Task for MySQL Backup to Amazon S3

Saturday, August 25th, 2007

There are a couple of these around, but I rolled my own because I wanted something quite simple.

# S3 Backup Task for MySQL
# Assumes InnoDB tables
# Database User needs the "reload" permission on the database (for --flush-logs in mysqldump)
#
# Stores files in Amazon S3 using the excellent AWS Gem: http://amazon.rubyforge.org/
# For information about Amazon S3: http://aws.amazon.com/s3
#
# Installation
# 1) Install AWS Gem
# 2) Enter your S3 Bucket, access_key_id and secret_access_key in this file
# 3) Run (rake db:backup)
#
# Inspired by code from:
#   http://blog.craigambrose.com/articles/2007/03/01/a-rake-task-for-database-backups
#   http://www.rubyinside.com/advent2006/15-s3rake.html

namespace :db do
  require "aws/s3"

  desc "Backup database to Amazon S3"

  task :backup => :environment do
    BUCKET = "bucket"

    begin
      AWS::S3::Base.establish_connection!(
        :access_key_id     => "access_key_id ",
        :secret_access_key => "secret_access_key "
      )

      db_config = ActiveRecord::Base.configurations[RAILS_ENV]

      backup_path = "db/backup"
      File.makedirs(backup_path)

      datestamp = Time.now.strftime("%Y%m%d%H%M%S")
      file_name = "#{RAILS_ENV}_#{db_config['database']}_#{datestamp}.sql.gz"

      backup_file = File.join(backup_path, file_name)

      sh "mysqldump -u #{db_config['username']} -p#{db_config['password']} --single-transaction --flush-logs --add-drop-table --add-locks --create-options --disable-keys --extended-insert --quick #{db_config['database']} | gzip -c > #{backup_file}"
      puts "Created backup: #{file_name}"

      puts "Storing file in S3: #{BUCKET}"
      AWS::S3::S3Object.store(file_name, open(backup_file), BUCKET)

      puts "Backup Complete"
    rescue AWS::S3::ResponseError => error
      puts error.response.code.to_s + ": " + error.message
    end

  end

end

More on Triggers

Tuesday, July 10th, 2007

I realised this morning as I played with my unit tests, based on yesteday’s post on Transactional Full-Text Search in MySQL that there is some potential for bad data. MySQL is smart enough to not create duplicates in the search table. If there’s an existing entry (in the case where a delete has not removed an old shadow-copy of the data) - MySQL will simply update the values. However, if there is no entry (because of some error when the content was created) MySQL is not quite smart enough to create one.

We can, however, do this in our trigger:

DELIMITER //;
CREATE TRIGGER content_update_search AFTER UPDATE ON pages
FOR EACH ROW BEGIN
IF (SELECT content_id FROM content_search WHERE content_id = NEW.id) IS NOT NULL THEN
UPDATE content_search SET title = NEW.title, content = NEW.content WHERE content_id = NEW.id;
ELSE
INSERT INTO content_search (content_id, content) VALUES (NEW.id, NEW.content);
END IF;
END//

If we find an existing entry, we update, if no entry exists, we create one.

An anonymous reader (thanks) also pointed out that if data storage is an issue, you can strip the stopwords from the data - MySQL ignores words less than four characters long as well as whole list of longer common words.

Transactional Full-Text Search in MySQL

Monday, July 9th, 2007

One of the issues with using Full-text search in MySQL is that it requires the MyISAM table Engine. In MySQL tables need the InnoDB Engine to use transactions.

This means that we can only ever have full-text search, or transactions, but not both. Given that we really want transactions all the time*, we should generally be running with the InnoDB engine. Not to mention scaling and other issues**.

Enter the Shadow MyISAM Table Pattern to enable (near) Transactional Full-Text Search in MySQL.

All of the following code assumes assumes MySQL 5+.

Suppose you have a table with some content that you want to search using Full-Text, but also want to manage in a transactional environment:

CREATE TABLE content (
id int(10) unsigned NOT NULL auto_increment,
content text,
PRIMARY KEY (id)
) ENGINE=InnoDB;

We create a shadow table using MyISAM and a full-text index that maps the content using a foreign key:

CREATE TABLE content_search (
id int(10) unsigned NOT NULL auto_increment,
content_id int(10) unsigned NOT NULL,
content text,
PRIMARY KEY (id),
FULLTEXT KEY index_fulltext (content)
) ENGINE=MyISAM;

.Then we add some triggers to update the shadow table automatically AFTER INSERT and AFTER UPDATE:

CREATE TRIGGER insert_content_search AFTER INSERT ON content
FOR EACH ROW
INSERT INTO content_search (content_id, content) VALUES (NEW.id, NEW.content);
CREATE TRIGGER update_content_search AFTER UPDATE ON content
FOR EACH ROW
UPDATE content_search SET title = NEW.title, content = NEW.content WHERE content_id = NEW.id;

Changes in the content table are now automatically reflected in the content_search table and this table gives you access to MySQL’s full-text search capability:
SELECT * FROM content_search s LEFT JOIN content c ON c.id = s.content_id WHERE MATCH (content) AGAINST ('lorem ipsum')

One possible drawback of this technique is that it essentially doubles your storage requirements. An alternative in this case is not duplicating the field(s), but pushing them to a search table (so no content field in the content table - it sits in the content_search table and is joined as required. You would them lose the ability to use triggers to manage the data as well as losing transactions on this data.

The End


* Seriously, in any non-trivial application, transactions should be a minimum requirement
** I need to find the reference, but there’s a LiveJournal scaling document that insists that the use of InnoDB is a minimum for scaling MySQL effectively.


Close
E-mail It