Monday, March 26, 2012

Poison Messages - 5 times's a charm

Hello again,
I have some poison message detection in place, based on the BOL sample. My problem is that after the 5th message retry my queue goes down - that is the fifth retry on any message. In actuallity, the first message is retried 3 times and it is taken off the queue [for real], the second message comes in and on the second retry - pooof - the queue is down.

I though the poison mechanism should work on a per message basis. It there a setting for the queue I missed? Is my only chance for to fix this: re-enable the queue upon BROKER_QUEUE_DISABLED event notification?

Thanks,

Lubomir

The poison message counter is based on RECEIVE verbs being rolled back. So five RECEIVEs rolled back in a row and your queue is disabled. The number of individual messages received/rolled back is irelevant. Any RECEIVE commited will reset the counter to 0.

Unfortunately this is not configurable in any way, you only have the BROKER_QUEUE_DISABLED notification as a way to react.

HTH,
~ Remus

|||Sorry to unmark the answer. I want to have a check after a ROLLBACK on the queue to see if the queue is DISBLED? i just could not figure out the TSQL for that?

if QUEUE is disabled
ALTER [dbo].[ProcessQueue] WITH STATUS = ON

EVENT NOTIFICATION does not allow for a direct TSQL exectuion from what I have seen so far, getting the status of queue and acting on it is not straight forward.

Thanks,

Lubomir|||

Event notifications are deliverd through normal Service Broker messages, so the way to execute TSQL on an event notification is to have an activated procedure on the queue that receives the event notification and execute the TSQL in it. The name of the queue being disabled will be part of the notification message.

To get the queue status use the is_receive_enabled or is_enqueue_enabled in sys.service_queues:

if (1 = (select is_receive_enabled from sys.service_queues where name = '...'))
begin
alter queue [...] with status = on;
end

Trying to enable back the queue from the activated procedure, after the rollback, however, will probably not work, because the queue disabling mechanism is asynchronous so you're verification check query will sometimes get the status disabled, sometimes not.

I recommend you try to avoid the rollbacks in the first place. If you try to incorporate rollback in the logic then you might hit a real poison message situation in production and you system will halt to a grind, spining on this one message.

HTH,
~ Remus

|||Thank you for the reply Remus,
My logic does take out the message after 3 retries, so I figure I should be safe, no?
I perform the rollbacks to freshly process the message in the hope conditions that caused the failure have disappeared - slim chance on that and probably something that will cause more trouble than good.
What do yo think?

Lubomir|||

If what you're looking for is a retry mechanism, then you should probably consider dialog timers, perhaps used in a fashion similar to this:

BEGIN TRANSACTION;
WAITFOR(RECEIVE ... FROM ...), TIMEOUT ...;
IF MESSAGE TYPE IS 'UNRELIABLE REQUEST'
BEGIN
INSERT REQUEST MESSAGE BODY INTO USER TABLE;
BEGIN DIALOG TIMER @.someretrytimeout;
END
ELSE IF MESSAGE TYPE IS 'TIMER'
BEGIN
SELECT REQUEST MESSAGE BODY FROM USER TABLE;
-- Re-arm the timer
BEGIN DIALOG TIMER @.someretrytimeout;
END
COMMIT;

DO UNRELIABLE WORK HERE BASED ON THE RECEIVED/SELECTED REQUEST

IF SUCCEEDED
BEGIN
BEGIN TRANSACTION
SEND BACK RESPONSE;
-- RESET TIMER TO 0 (or END CONVERSATION, as appropiate)
BEGIN DIALOG TIMER 0;
COMMIT
END

This way you separate the error prone processing outside the transaction and you don't need to rollback on failure. Also, you get to controll the retry details: number of retries, time between retries, backout policy etc.

HTH,
~ Remus

No comments:

Post a Comment