Ruby on Rails - ActiveRecord: Handling DB races among workers
I have a Rails 3 project running on top of PostgreSQL 9.0.

Use case: Users can request to follow Artists by name. To do this, they submit a list of names to a REST resource. If I can't find the Artist by name in the local collection, I consult last.fm for information about them, and cache that information locally. This process can take some time, so it is delegated to a background job called IndexArtistJob.

Problem: IndexArtistJob will be run in parallel. Thus, it is possible that two users may request to add the same Artist at the same time. Both users should have the Artist added to their collection, but only one Artist should end up in the local database.

Relevant portions of the Artist model are:

    require 'services/lastfm'

    class Artist < ActiveRecord::Base
      validates_presence_of :name
      validates_uniqueness_of :name, :case_sensitive => false

      def self.lookup(name)
        artist = Artist.find_by_name(name)
        return artist if not artist.nil?

        info = LastFM.get_artist_info(name)
        return if info.nil?

        # Check local DB again for the corrected name.
        if name.downcase != info.name.downcase
          artist = Artist.find_by_name(info.name)
          return artist if not artist.nil?
        end

        Artist.new(
          :name => info.name,
          :image_url => info.image_url,
          :bio => info.bio
        )
      end
    end

The IndexArtistJob class is defined as:

    class IndexArtistJob < Struct.new(:user_id, :artist_name)
      def perform
        user = User.find(user_id)

        # May return a new, uncommitted Artist model, or an existing, committed one.
        artist = Artist.lookup(artist_name)
        return if artist.nil?

        # Presume the thread is pre-empted here for a long enough time such that
        # work done by another worker violates the DB's unique constraint.
        user.artists << artist
      rescue ActiveRecord::RecordNotUnique
        # We lost the race; defer to the winning model.
        user.artists << Artist.lookup(artist_name)
      end
    end

What I'm trying to do here is let each worker commit the new Artist it finds, hoping for the best. If a conflict does occur, I want the slower worker(s) to abandon the work they did in favor of the Artist that was just inserted, and add that Artist to the specified user.

I'm aware of the fact that Rails validators are no substitute for actual data integrity checking at the level of the database. To handle this, I added a unique index on the Artist table's lowercased name field (which I also use for searching). Now, if I understand the documentation correctly, AR's association collection commits changes to the item being added (the Artist in this case) and to the underlying collection in a transaction. But I can't be guaranteed that the Artist will be added.

Am I doing this correctly? If so, is there a nicer way to do it? I feel that structuring it around exceptions accentuates the fact that the problem is one of concurrency, and thus a bit subtle.
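
For anyone who wants to poke at the insert-then-rescue idea without a database, here is a minimal pure-Ruby sketch. Everything in it is a hypothetical stand-in: `ArtistTable` plays the role of a table with a unique index on the lowercased name, `RecordNotUnique` stands in for `ActiveRecord::RecordNotUnique`, and `find_or_insert` is an illustrative helper, not part of the code above.

```ruby
# Hypothetical stand-in for ActiveRecord::RecordNotUnique.
class RecordNotUnique < StandardError; end

# Hypothetical in-memory "table". Its insert enforces a case-insensitive
# unique key, roughly as a unique index on the lowercased name would.
class ArtistTable
  def initialize
    @rows = {}          # lowercased name => name as first stored
    @lock = Mutex.new
  end

  def find_by_name(name)
    @lock.synchronize { @rows[name.downcase] }
  end

  def insert(name)
    @lock.synchronize do
      key = name.downcase
      raise RecordNotUnique, name if @rows.key?(key)
      @rows[key] = name
    end
  end

  def count
    @lock.synchronize { @rows.size }
  end
end

# Look first, then try the insert. If another worker got there in between,
# the uniqueness violation is rescued and the winning row is returned.
def find_or_insert(table, name)
  existing = table.find_by_name(name)
  return existing if existing
  table.insert(name)
rescue RecordNotUnique
  table.find_by_name(name)
end

table = ArtistTable.new
names = ['Radiohead', 'radiohead', 'RADIOHEAD', 'Radiohead']
results = names.map { |n| Thread.new { find_or_insert(table, n) } }.map(&:value)
```

Whichever thread wins, the table ends up with a single row and every caller gets that same row back, which is the behaviour the job is aiming for.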
It sounds like you could use a simple queuing mechanism. You could do this using a database table:
When a "front-end" thread discovers a missing artist, have it write the artist name to the table with the status "waiting" (have a unique index on the artist name so this can only happen once).
Meanwhile, a background thread/process sits in a loop and queries the table for new jobs:
a) start a transaction
b) find the first artist with status = "waiting"
c) update that artist's status to "processing"
d) end the transaction
The background thread then indexes the artist. No one else will try, because they can see that the status is "processing".
When finished, the background thread deletes the artist from the table.
Using this method, you could run multiple background threads to increase concurrency on artist indexing.
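
The steps above can be sketched in plain Ruby to show the shape of the claim loop. This is only an illustration under stated assumptions: `ArtistQueue` and its methods are hypothetical names, the hash stands in for the jobs table, and the Mutex stands in for the transaction around steps a) to d).

```ruby
# Hypothetical in-memory stand-in for the jobs table.
class ArtistQueue
  def initialize
    @status = {}        # lowercased artist name => :waiting or :processing
    @lock = Mutex.new   # plays the role of the transaction
  end

  # Front end: record a missing artist once. A duplicate is ignored here,
  # where a unique index on the name would reject it.
  def enqueue(name)
    @lock.synchronize do
      key = name.downcase
      return false if @status.key?(key)
      @status[key] = :waiting
      true
    end
  end

  # Worker: atomically find the first waiting artist and mark it processing.
  def claim
    @lock.synchronize do
      name, _status = @status.find { |_, s| s == :waiting }
      @status[name] = :processing if name
      name
    end
  end

  # Worker: delete the row once indexing has finished.
  def finish(name)
    @lock.synchronize { @status.delete(name) }
  end

  def empty?
    @lock.synchronize { @status.empty? }
  end
end

jobs = ArtistQueue.new
%w[Radiohead radiohead Bjork Portishead].each { |n| jobs.enqueue(n) }

# Three workers drain the queue; each returns the names it indexed.
workers = 3.times.map do
  Thread.new do
    done = []
    while (name = jobs.claim)
      done << name      # the last.fm lookup and insert would happen here
      jobs.finish(name)
    end
    done
  end
end
indexed = workers.flat_map(&:value)
```

Against a real PostgreSQL table, wrapping the select and update in a transaction does not by itself stop two workers from picking the same "waiting" row. Taking a row lock with SELECT ... FOR UPDATE, or making the update conditional on the status still being "waiting", is one way to get the atomic claim that the Mutex provides in this sketch.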
Also take a look at beanstalk to manage this process. See http://railscasts.com/episodes/243-beanstalkd-and-stalker.