2015-12-10 22:15:13 |
Travis Tripp |
description |
Failures to index into Elasticsearch are not handled properly right now. [1] [2]
On success, the handler returns oslo_messaging.NotificationResult.HANDLED.
On failure, it just returns (an implicit None).
However, according to the oslo.messaging notification handling docs [3], None also acknowledges the message, so a failed message is dropped instead of retried:
"An endpoint method can explicitly return oslo_messaging.NotificationResult.HANDLED to acknowledge a message or oslo_messaging.NotificationResult.REQUEUE to requeue the message.
The message is acknowledged only if all endpoints either return oslo_messaging.NotificationResult.HANDLED or None."
In addition, we do not currently specify whether to allow requeue when getting the listener:
https://github.com/openstack/searchlight/blob/099df8875ef344f4b909e6673b3201c0c7efbc96/searchlight/listener.py#L108
We must address this, taking the following concerns into account:
* Plugins probably need to raise a "Fatal Exception" when a document failure will not benefit from a requeue. For example, a data mapping failure will fail again on retry, whereas a network failure may succeed.
* Completely failed requests should potentially go to an error queue with a time to live, for additional investigation.
* We should consider a blueprint (BP) for publishing the status of a particular resource and its current coherency (e.g. /plugins includes an error count or something to that effect).
* Finally, as the pipeline of publishers is worked through as part of the notification forwarding spec, we should consider how this should be done across publishers.
Reference:
[1] https://github.com/openstack/searchlight/blob/master/searchlight/elasticsearch/plugins/base.py#L469-L477
[2] https://github.com/openstack/searchlight/blob/master/searchlight/elasticsearch/plugins/designate/notification_handlers.py#L90-L92
[3] http://docs.openstack.org/developer/oslo.messaging/notification_listener.html |
Failures to index into Elasticsearch are not handled properly right now. [1] [2]
On success, the handler returns oslo_messaging.NotificationResult.HANDLED.
On failure, it just returns (an implicit None).
However, according to the oslo.messaging notification handling docs [3], None also acknowledges the message, so a failed message is dropped instead of retried:
"An endpoint method can explicitly return oslo_messaging.NotificationResult.HANDLED to acknowledge a message or oslo_messaging.NotificationResult.REQUEUE to requeue the message.
The message is acknowledged only if all endpoints either return oslo_messaging.NotificationResult.HANDLED or None."
In addition, we do not currently specify whether to allow requeue when getting the listener:
https://github.com/openstack/searchlight/blob/099df8875ef344f4b909e6673b3201c0c7efbc96/searchlight/listener.py#L108
We must address this, taking the following concerns into account:
* Plugins probably need to raise a "Fatal Exception" when a document failure will not benefit from a requeue. For example, a data mapping failure will fail again on retry, whereas a network failure may succeed.
* Completely failed requests should potentially go to an error queue with a time to live, for additional investigation.
* We should consider a blueprint (BP) for publishing the status of a particular resource and its current coherency (e.g. /plugins includes an error count or something to that effect).
* Finally, as the pipeline of publishers is worked through as part of the notification forwarding spec, we should consider how this should be done across publishers.
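A minimal sketch of how a handler could separate the two failure classes described in the concerns above. Everything here is illustrative: FatalIndexingError, TransientIndexingError, handle_notification and the in-memory dead_letters list are hypothetical names, not existing Searchlight code, and NotificationResult is a local stand-in for oslo_messaging.NotificationResult so the example runs without the library.

```python
import enum
import logging

LOG = logging.getLogger(__name__)


class NotificationResult(enum.Enum):
    """Local stand-in for oslo_messaging.NotificationResult (assumption)."""
    HANDLED = "handled"
    REQUEUE = "requeue"


class FatalIndexingError(Exception):
    """Hypothetical: a retry cannot fix this (e.g. a data mapping failure)."""


class TransientIndexingError(Exception):
    """Hypothetical: a retry may succeed (e.g. a network failure)."""


def handle_notification(index_fn, payload, dead_letters):
    """Index one payload and return an explicit result for every outcome.

    index_fn is whatever callable performs the indexing; dead_letters is a
    plain list standing in for the proposed error queue.
    """
    try:
        index_fn(payload)
    except FatalIndexingError as exc:
        # Retrying would fail the same way, so acknowledge the message
        # but keep a record of it for later investigation.
        LOG.error("Fatal indexing failure, not requeueing: %s", exc)
        dead_letters.append((payload, str(exc)))
        return NotificationResult.HANDLED
    except TransientIndexingError as exc:
        # A later attempt may succeed, so ask for redelivery.
        LOG.warning("Transient indexing failure, requeueing: %s", exc)
        return NotificationResult.REQUEUE
    return NotificationResult.HANDLED
```

For REQUEUE to have any effect, the listener would presumably also need to be created with allow_requeue=True in oslo_messaging.get_notification_listener, which is the setting the linked listener.py line currently leaves unspecified. The time to live on the error queue is not modelled here.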
Reference:
[1] https://github.com/openstack/searchlight/blob/master/searchlight/elasticsearch/plugins/base.py#L469-L477
[2] https://github.com/openstack/searchlight/blob/master/searchlight/elasticsearch/plugins/designate/notification_handlers.py#L90-L92
[3] http://docs.openstack.org/developer/oslo.messaging/notification_listener.html
[4] https://blueprints.launchpad.net/searchlight/+spec/document-es-failure-handling |
|