Faisal's Interactions


Beanstalk – Part 2

Posted in Uncategorized by f10i on November 3, 2009

Now that we have set up the beanstalk server, it’s time to start using it. The first thing we need to do is install a beanstalk client. Check the list of beanstalk client libraries for one that suits your programming language. I am using Ruby, so I installed my client using RubyGems:

gem install beanstalk-client

All of the following code is run in Ruby’s interactive environment, irb. Feel free to use a different environment, or even write a script file and run that. I prefer the interactive environment because it gives me immediate feedback on my work.

First, load the client library and connect to the beanstalk server:
require 'rubygems'
require 'beanstalk-client'
bs = Beanstalk::Pool.new(['127.0.0.1:11300'])

You can check the status of your beanstalk server:
bs.stats

To add a new message to the queue, you simply “put” it:
bs.put('Hello World!')

If, like me, you send complex data (anything other than a single string), you will probably want to use Hashes. The Ruby client offers the “yput” convenience method, which converts its input into a YAML string before putting it on the queue:
bs.yput({:name => 'Faisal', :blog => 'f10i.wordpress.com'})
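What travels through the queue is still just a string. The sketch below shows the YAML round-trip that “yput” and “ybody” amount to, using only Ruby’s standard yaml library (it does not call the beanstalk client, and the exact serialization calls inside the gem are an assumption here). One thing to watch: on recent Rubies (Psych 4), plain YAML.load refuses Symbol keys, so the sketch uses YAML.safe_load with permitted_classes: [Symbol].

```ruby
require 'yaml'

message = { :name => 'Faisal', :blog => 'f10i.wordpress.com' }

# Roughly what yput sends as the job body: a plain YAML string.
body = YAML.dump(message)
puts body

# Roughly what ybody hands back: the string parsed into a Hash again.
# Symbol keys must be explicitly permitted on newer Psych versions.
restored = YAML.safe_load(body, permitted_classes: [Symbol])
restored == message  # => true
```

Because the body is plain YAML, a consumer written in another language can read these messages too, as long as it can parse YAML (Ruby-specific tags such as symbol keys may need special handling).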

Queues are a First-In-First-Out (FIFO) data structure. That means that messages are fetched in the same order they were placed in the queue. You can modify that behavior by assigning priority to messages. A message with high priority will be fetched before messages with lower priority, regardless of its position in the queue.

Priority is indicated by an integer: the smaller the number, the higher the priority. By default, the client adds messages with priority 65536. Messages with the same priority are fetched in normal FIFO order. The priority is the second argument to put:
bs.put("Message with high priority", 100)
bs.put("Message with low priority", 500)
bs.put("Message with lowest priority")
bs.put("Message with highest priority", 1)
bs.put("Another message with low priority", 500)

Numbering the put calls above 1 to 5 from top to bottom, the messages would be fetched in the following order: 4 – 1 – 2 – 5 – 3
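The ordering rule can be checked with a few lines of plain Ruby. This is a toy model of beanstalk’s rule, not the client API: sort by priority first, and break ties by the order in which the messages were put.

```ruby
# Toy model of beanstalk's ordering rule (not the client API):
# smaller priority numbers come out first; equal priorities stay FIFO.
DEFAULT_PRIORITY = 65536

def fetch_order(messages)
  messages
    .each_with_index
    .sort_by { |(_, pri), index| [pri, index] } # priority, then put order
    .map { |_, index| index + 1 }               # report 1-based put numbers
end

messages = [
  ["Message with high priority", 100],
  ["Message with low priority", 500],
  ["Message with lowest priority", DEFAULT_PRIORITY],
  ["Message with highest priority", 1],
  ["Another message with low priority", 500]
]

fetch_order(messages)  # => [4, 1, 2, 5, 3]
```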

You can also delay a message so that it does not become available in the queue until a specified number of seconds has passed. By default, all messages have a delay of 0 and are immediately available for consumption. The delay is the third argument to put:
bs.put("This message will not be available until 20 seconds pass", 65536, 20)
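A delayed message is held back by the server until its delay expires, and only then can a reserve see it. The toy model below (a hypothetical struct, not part of the client) captures that rule with an explicit clock in seconds so it can be tested without waiting.

```ruby
# Toy model (hypothetical, not the client API): a delayed message only
# becomes ready once put_at + delay seconds have passed.
DelayedMessage = Struct.new(:body, :put_at, :delay) do
  def ready?(now)
    now >= put_at + delay
  end
end

msg = DelayedMessage.new("This message will not be available until 20 seconds pass", 0, 20)
msg.ready?(19)  # => false: reserve would not see it yet
msg.ready?(20)  # => true
```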

The final option when adding a message is its time-to-run (TTR), the fourth argument to put. Once a client “reserves” a message (explained below), it has TTR seconds to process the message and then delete, release or bury it (those actions are also explained below). If the client takes none of those actions within the TTR, the server automatically releases the job, making it available in the queue again. By default, all messages have a time-to-run of 120 seconds.
bs.put("Message with high priority (1), delayed 0 seconds, and a 1 hour time-to-run", 1, 0, 3600)
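To make the TTR rule concrete, here is a toy model (a hypothetical struct, not the client’s Job class) of the deadline the server keeps for a reserved job. It compares the default 120-second TTR with the one-hour TTR used above, for a job that takes ten minutes to process.

```ruby
# Toy model (hypothetical): a reserved job must be deleted, released or
# buried before reserved_at + ttr; after that the server releases it.
Reservation = Struct.new(:ttr, :reserved_at) do
  def deadline
    reserved_at + ttr
  end

  def auto_released?(now)
    now >= deadline
  end
end

default_ttr = Reservation.new(120, 0)
one_hour    = Reservation.new(3600, 0)

default_ttr.auto_released?(600)  # => true: 10 minutes exceeds 120 s
one_hour.auto_released?(600)     # => false: still inside the hour
```

The practical takeaway is to pick a TTR comfortably longer than your slowest expected processing time; otherwise a second worker may pick up a job that the first one is still working on.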

Now that we know how to add messages to the queue, let’s get that information out of the queue. You fetch a message by “reserving” it so that no other client can get that message while your client is processing it.
message = bs.reserve

Note that reserve is a BLOCKING method call: when you call reserve, it tries to fetch a message from the queue, and if no message is available, it blocks and keeps waiting until one arrives. Your code won’t progress until a message is retrieved. If you want to check for a message and return whether or not one exists, pass a timeout (in seconds) to reserve:
message = bs.reserve(2)

The code will now try to fetch a message from the queue. If no message becomes available within 2 seconds, it raises a timeout exception.
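The difference between reserve and reserve(2) is the familiar “blocking pop with a timeout” pattern. As a stand-in you can run without a beanstalk server, this sketch uses Ruby’s built-in Queue (whose pop blocks while the queue is empty) and the standard timeout library. Unlike the beanstalk client, which raises on timeout, the hypothetical helper below rescues the timeout and returns nil.

```ruby
require 'timeout'

# Blocking pop with an upper bound on the wait, in the spirit of bs.reserve(2).
# Returns nil instead of propagating the timeout (a choice of this sketch).
def pop_with_timeout(queue, seconds)
  Timeout.timeout(seconds) { queue.pop }
rescue Timeout::Error
  nil
end

queue = Queue.new
pop_with_timeout(queue, 0.2)  # => nil after about 0.2 s: nothing to fetch
queue << 'Hello World!'
pop_with_timeout(queue, 0.2)  # => "Hello World!"
```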

You can check the message status using the “stats” method:
message.stats

Once you have fetched a message from the queue, call the “body” method to get its contents. If you used “yput” to place a YAML representation of your message, call the “ybody” method to deserialize the YAML string:
bs.put("Hello There")
message = bs.reserve
message.body    # => "Hello There"
bs.yput({:name => 'Faisal', :blog => 'f10i.wordpress.com'})
message = bs.reserve
message.ybody   # => {:name => 'Faisal', :blog => 'f10i.wordpress.com'}

Once you reserve a message, you have time-to-run seconds (default 120) to process it and issue one of the following commands:
message.delete    # deletes the message
message.release   # releases the message back into the queue, where it can be fetched again
message.bury      # sets the message aside (see below)

I didn’t actually use bury, but according to the beanstalkd protocol a buried job stays on the server without being handed out by reserve until it is “kicked” back into the ready queue.

If you do not take one of those actions within the time-to-run, the server automatically releases the message.
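The three outcomes for a reserved message can be pictured as state changes. The toy tube below is a hypothetical in-memory class, not the beanstalk-client API, and it hands out jobs in plain FIFO order, ignoring priorities. It shows that delete removes a job, release makes it reservable again, and bury keeps it on the server while reserve skips it.

```ruby
# Toy in-memory model of a reserved job's possible fates (hypothetical
# class, not the beanstalk-client API). Only :ready jobs can be reserved.
class ToyTube
  Job = Struct.new(:id, :body, :state)

  def initialize
    @jobs = []
    @next_id = 0
  end

  def put(body)
    @next_id += 1
    @jobs << Job.new(@next_id, body, :ready)
    @next_id
  end

  # Hand out the oldest ready job, or nil if none is ready.
  def reserve
    job = @jobs.find { |j| j.state == :ready }
    job.state = :reserved if job
    job
  end

  def delete(job)
    @jobs.delete(job)    # gone for good
  end

  def release(job)
    job.state = :ready   # available to any client again
  end

  def bury(job)
    job.state = :buried  # kept on the server, but skipped by reserve
  end

  def states
    @jobs.map { |j| [j.id, j.state] }
  end
end

tube = ToyTube.new
tube.put("a"); tube.put("b"); tube.put("c")
tube.delete(tube.reserve)  # job 1 processed successfully
tube.bury(tube.reserve)    # job 2 failed, set aside for inspection
tube.release(tube.reserve) # job 3 handed back for another try
tube.states                # => [[2, :buried], [3, :ready]]
```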

If processing a message is taking too long and you are not yet ready to take any of those actions, you can “touch” the message to reset the time-to-run counter.
message.touch
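In terms of the deadline, touching simply restarts the countdown from the moment of the touch. The sketch below is a hypothetical toy class, not the gem’s Job class, with an explicit clock in seconds.

```ruby
# Toy model (hypothetical): touching a reserved job restarts its
# time-to-run countdown from the moment of the touch.
class ToyReservation
  attr_reader :deadline

  def initialize(ttr, reserved_at)
    @ttr = ttr
    @deadline = reserved_at + ttr
  end

  def touch(now)
    @deadline = now + @ttr
  end

  def expired?(now)
    now >= @deadline
  end
end

r = ToyReservation.new(120, 0)  # would expire at t=120
r.touch(100)                    # still working at t=100, so reset the clock
r.expired?(150)                 # => false: the new deadline is 220
```

A long-running worker can therefore touch the message periodically, for example after each chunk of work, instead of choosing a very large TTR up front.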

One last thing you can do with a message is “put_back” it into the queue:
message.put_back

This is similar to “release” in that it puts the message back in the queue. However, there are some major differences:

  1. Whereas the “release” method releases the message back into the queue, the “put_back” method creates a new, identical message and puts that new message in the queue. The old message is not modified at all.
  2. The “put_back” action is not one of the actions that must be taken within the time-to-run, so after calling it you still need to delete, release or bury the original message.
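The two differences are easiest to see side by side. In the toy model below (hypothetical helpers, not the beanstalk-client API), put_back adds a brand-new job with a new id and leaves the original reserved, while release only changes the state of the existing job.

```ruby
# Toy model (hypothetical): put_back copies the body into a new job,
# while release changes the state of the existing job.
ToyJob = Struct.new(:id, :body, :state)

def put_back(jobs, job)
  copy = ToyJob.new(jobs.map(&:id).max + 1, job.body, :ready)
  jobs << copy  # the original is untouched and still reserved
  copy
end

def release(job)
  job.state = :ready
end

jobs = [ToyJob.new(1, "send email", :reserved)]
put_back(jobs, jobs.first)
jobs.map { |j| [j.id, j.state] }  # => [[1, :reserved], [2, :ready]]

release(jobs.first)
jobs.map { |j| [j.id, j.state] }  # => [[1, :ready], [2, :ready]]
```

Note the last line: if you put_back a message and then also release the original, the queue ends up holding two copies, so after a put_back you will normally delete the original.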

And that’s about it for beanstalk clients. In my next article, I’ll explore beanstalk queues, or tubes, which allow you to have multiple separate queues at the same time for adding and consuming messages.

Beanstalk – Part 1

Posted in Uncategorized by f10i on October 29, 2009

As I mentioned in my previous post, I will write some articles about the Beanstalk queuing server to share what I find out about it. In this post I will cover installing and running Beanstalk.

One point worth mentioning is that Beanstalk is an in-memory queuing system: everything is stored in memory. So if the power goes out, the machine hangs, or beanstalk terminates for any reason, all jobs in the queue will be lost. This behavior can be changed with a startup option (explained below).

Beanstalk requires libevent, so let’s start by installing it. In a terminal on your Linux or Mac machine, run the following commands:


wget http://monkey.org/~provos/libevent-1.4.12-stable.tar.gz
tar -xzf libevent-1.4.12-stable.tar.gz
cd libevent-1.4.12-stable
./configure
make
sudo make install

Now that you have libevent installed, we need to install beanstalk:

wget http://xph.us/dist/beanstalkd/beanstalkd-1.4.2.tar.gz
tar -xzf beanstalkd-1.4.2.tar.gz
cd beanstalkd-1.4.2
./configure
make
sudo make install

That wasn’t too hard :)

To run beanstalk, all you have to do now is run the command “beanstalkd”. However, there are some interesting options you can pass to the command:

  • -d detaches the process, i.e. runs it as a daemon. The process leaves the foreground and you get your terminal back, but beanstalk keeps working in the background.
  • -b DIR: as mentioned above, beanstalk is a memory-based queue. To let your jobs survive power outages or server crashes, the -b option writes every job beanstalk receives to a binary log in DIR. If beanstalk terminates for any reason, start it again with the same -b option and your jobs will be restored.
  • -s BYTES limits each binary log file to BYTES. Default 10485760
  • -l ADDR specifies the address to listen on. Default 0.0.0.0
  • -p PORT specifies the port to listen on. Default 11300
  • -u USER specifies which user to run as
  • -z BYTES limits the maximum job size to BYTES. Default 65535

I am running the server with the following command (PATH_TO_BIN_DIR is the directory for the binary log):
beanstalkd -d -l 127.0.0.1 -p 11300 -b PATH_TO_BIN_DIR
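Since -d sends the server to the background, it is handy to confirm that something is actually listening before moving on to a client. The hypothetical helper below only checks that a TCP connection to the address and port passed with -l and -p succeeds; it does not prove that the listener is beanstalkd.

```ruby
require 'socket'

# Returns true if something accepts TCP connections on host:port.
# It only proves a listener exists, not that the listener is beanstalkd.
def beanstalkd_listening?(host = '127.0.0.1', port = 11300, timeout = 1)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError
  false
end

beanstalkd_listening?  # true once the server started above is up
```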

Now we have a running beanstalkd server. In my next article, I’ll talk about how to use a client to connect to the running beanstalkd instance, and interact with it.

Web Queues

Posted in Uncategorized by f10i on October 29, 2009

When writing a web application, developers usually think in terms of the “user request” -> “server processing” -> “response” cycle, and rightly so, since this is how HTTP works. However, sometimes you need to accomplish tasks outside the scope of this cycle, such as scheduled jobs, long-running jobs or callback jobs.

There are many ways to go about this, but the best way, IMO, is to have separate workers listening to a queue. You create a queue, and whenever you want to run a job outside the scope of the web cycle, you push a message onto it. Separate workers listen to that queue, and whenever there is a new job, they fetch it and run it.
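The pattern can be sketched in-process with Ruby’s built-in Queue standing in for the queuing server and threads standing in for separate worker processes. This is only an illustration of the shape of the design, under those assumptions; with a real queue server the producer and the workers would be different processes, possibly on different machines.

```ruby
# In-process stand-in for the web-queue pattern: Queue plays the queuing
# server, and worker threads play separate worker processes.
def run_jobs(jobs, worker_count: 2)
  queue = Queue.new
  results = Queue.new

  workers = Array.new(worker_count) do
    Thread.new do
      while (job = queue.pop) != :stop
        results << job.upcase  # placeholder for real work
      end
    end
  end

  jobs.each { |j| queue << j }           # the web request only enqueues
  worker_count.times { queue << :stop }  # one stop marker per worker
  workers.each(&:join)

  Array.new(results.size) { results.pop }
end

run_jobs(%w[welcome_email resize_avatar])  # results may arrive in any order
```

The web request only pays for the enqueue, while the slow work happens elsewhere, and adding capacity means starting more workers rather than changing the web code.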

So far, I have tried using three different systems for queuing jobs: ActiveMQ, Amazon SQS, and most recently Beanstalk. I have to say that my personal preference is Beanstalk. ActiveMQ was a complete pain to install and configure, and I actually gave up on it before getting it to run.

Amazon SQS has no setup or configuration since it is a service offered by Amazon. However, not having the queue locally means slower response times, and you don’t get that feeling of having full control over your jobs. More importantly, for some reason, some jobs would get completely lost and I couldn’t figure out where they went. Having to manually reinsert jobs into queues every so often became tedious quickly, and after a year of this we started looking for an alternative.

Beanstalk is an extremely fast and lightweight queuing server, has almost no setup overhead (you have to compile from source, but that’s easily done on Mac or Linux), and can take on some serious load. The only problem I found with beanstalk is that it desperately lacks good documentation online; I often find myself resorting to the source code to figure out what functionality it offers.

As such, I will write articles about Beanstalk whenever I find a new feature that I didn’t know about to help spread the knowledge about this amazing queuing server. Stay tuned.

