Bug #819604 “when an idle ssh transport is interrupted, bzrlib e...” : Series 2.4 : Bugs : Bazaar

Jelmer Vernooij (jelmer) on 2011-08-02

Changed in bzr:
status:	New → Triaged
status:	Triaged → Confirmed
importance:	Undecided → High

Jelmer Vernooij (jelmer) on 2011-08-02

Changed in bzr:
importance:	High → Critical
assignee:	nobody → canonical-bazaar (canonical-bazaar)

Revision history for this message

John A Meinel (jameinel) wrote on 2011-08-05: Re: [Bug 819604] Re: when an idle ssh transport is interrupted, bzrlib errors

#1

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 8/2/2011 12:45 PM, Jelmer Vernooij wrote:
> ** Changed in: bzr Importance: High => Critical
>
> ** Changed in: bzr Assignee: (unassigned) => canonical-bazaar
> (canonical-bazaar)
>

Is this actually critical?

Also, the initial report was "idle ssh transport". Monty's traceback is
certainly not 'idle' given that we are actively fetching content.

And while the transfer-of-content is roughly stateless, I'm not 100%
sure we want to default-reconnect SSH connections.

I suppose we could use a reasonable try-again-but-not-forever, or
try-once-but-fail-if-last-connect-failed. People with flaky connections
would get slow-but-useful connections. Restarting codehosting would
allow things to continue roughly interrupted, but the network going down
wouldn't cause us to spin indefinitely.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk48AXUACgkQJdeBCYSNAANUhQCgqCGM8zIhGfqife6E2TIaLwQx
Nc4AnRq18lIiBiyGM6t7/uy+SmXgQlXt
=8wVS
-----END PGP SIGNATURE-----

Revision history for this message

Jelmer Vernooij (jelmer) wrote on 2011-08-05:

#2

On 05/08/11 16:43, John A Meinel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 8/2/2011 12:45 PM, Jelmer Vernooij wrote:
>> ** Changed in: bzr Importance: High => Critical
>>
>> ** Changed in: bzr Assignee: (unassigned) => canonical-bazaar
>> (canonical-bazaar)
>>
> Is this actually critical?
I'm not sure if it is critical anymore. The timeout for codehosting
haproxy has now been increased, which prevents this from being a big
problem at the moment.
>
> Also, the initial report was "idle ssh transport". Monty's traceback is
> certainly not 'idle' given that we are actively fetching content.
I think the problem is that the connection was idle for a while - enough
for it to timeout. Trying to reuse it again later then triggered this
exception, at least that is my understanding of it.

> And while the transfer-of-content is roughly stateless, I'm not 100%
> sure we want to default-reconnect SSH connections.
>
> I suppose we could use a reasonable try-again-but-not-forever, or
> try-once-but-fail-if-last-connect-failed. People with flaky connections
> would get slow-but-useful connections. Restarting codehosting would
> allow things to continue roughly interrupted, but the network going down
> wouldn't cause us to spin indefinitely.
>
That sounds reasonable.

Cheers,

Jelmer

Revision history for this message

Robert Collins (lifeless) wrote on 2011-08-08:

#3

This is critical for LP: we're currently unable to deploy to the
codehosting service again, until this is fixed and pushed out to SRUs.

The logic being:
- we can't reasonably implement a server side 'please disconnect now'
trigger to the bzr server if it will cause all current clients to
error
- we can't deploy to the service if it needs an admin to baby sit it for hours
- and we can't set the idle time low enough to timeout on CI servers
if the timeout causes clients to error
- and so with no timeout and no explicit 'please stop now' deploys to
that service always end up having to interrupt a connection that is in
an unknown state.

Its also critical path for the faster ssh daemon stuff, because until
we can deploy reliably we can't half-deploy and experiment with
confidence.

-Rob

Revision history for this message

Martin Pool (mbp) wrote on 2011-08-09:

#4

This doesn't need to block bzr 2.4.0 release, so I'm going to
downgrade it to high, and also opening tasks for the backports. Now
the timeout is backed off enough it's not dropping all the time it's
not an operational incident.

> This is critical for LP: we're currently unable to deploy to the codehosting service again, until this is fixed and pushed out to SRUs.

I think you still can deploy, the same way you could deploy a month
ago. Clients will drop their connections and presumably recover in
some client-specific way, perhaps generating some noise.

Fixing this will get Launchpad closer to doing apparently-seamless
deployments, but we also have to fix things like bug 795025.

tags:

added: error-reporting hpss launchpad ssh

Martin Pool (mbp) on 2011-08-10

Changed in bzr:
importance:	Critical → High
assignee:	canonical-bazaar (canonical-bazaar) → Martin Pool (mbp)

Martin Pool (mbp) on 2011-08-10

description:

updated

Revision history for this message

Martin Pool (mbp) wrote on 2011-08-10: Re: when an idle ssh transport is interrupted, bzrlib errors

#5

The medium is a fairly thin wrapper around the socket/pipes for the connection. It seems it would be reasonable to either have the medium reconnect, or to throw away the medium and make a new one.

There are several _accept_bytes implementations; the most relevant ones here are SmartTCPClientMedium and (especially for Launchpad) SmartSSHClientMedium.

We can't just reconnect at arbitrary points in the protocol, only at the start of a request. The medium itself doesn't know about request boundaries so the higher-level client needs to be involved.

Retrying may be a bit complex if we're planning to send streaming data coming from a generator; so ideally we will find that the connection was closed and restart before we start taking anything from a generator. We could just leave this case unhandled and deal with it if it does come up: it wasn't here.

An alternative approach might be check if the pipe/socket has closed before we try to write the start of a request. In the case that the remote end has timed out and closed the socket and we are using an external ssh it will probably have already noticed this.

Revision history for this message

Andrew Bennetts (spiv) wrote on 2011-08-10: Re: [Bug 819604] Re: when an idle ssh transport is interrupted, bzrlib errors

#6

Martin Pool wrote:
> We can't just reconnect at arbitrary points in the protocol, only at the
> start of a request. The medium itself doesn't know about request
> boundaries so the higher-level client needs to be involved.

Actually, SmartClientMedium does know about requests: the are constructed via
its get_request[1] method, all the SmartClientStreamMedium classes track a
_current_request instance attribute. (This state has to be tracked so we can
generate TooManyConcurrentRequests errors.)

It's perhaps not clear in the API docs but _current_request is always set to
None when the current request/response cycle has been fully consumed, and I
think it would be reasonable to rely on this (and add a public API to expose
this fact cleanly, perhaps has_unfinished_request?). So you don't need to
involve the higher-level client necessarily.

> Retrying may be a bit complex if we're planning to send streaming data
> coming from a generator; so ideally we will find that the connection was
> closed and restart before we start taking anything from a generator. We
> could just leave this case unhandled and deal with it if it does come
> up: it wasn't here.

Also there's the tricky case of when you've fully sent a request but the
connection died before you got the response: did the request complete or not?
Some requests are read-only or might be idempotent, but not all of them, so you
can't retry a request without some higher-level knowledge about what can be
retried (or perhaps in some cases how to retry: e.g. if you don't know if your
lock_write call worked you can perhaps try a second lock_write and inspect the
LockContention details to see if it's actually your lock or not?).

And in the extreme there's no guarantee that the server hasn't started acting on
a request before receiving the complete request. Off the top of my head I don't
think we have any current requests where that's a practical issue (e.g.
insert_stream writes to a temporary file with a randomly generated name), but
probably worth quickly scanning the request implementations to make sure there's
no surprises here, and then documenting any new constraints this adds to the
protocol.

> An alternative approach might be check if the pipe/socket has closed
> before we try to write the start of a request. In the case that the
> remote end has timed out and closed the socket and we are using an
> external ssh it will probably have already noticed this.

At a guess we'll often first notice timed out connections when we start writing
a request (and the socket may even appear to be alive until we try that, e.g. if
a NAT table in a router has timed out our connection then our TCP stack may know
nothing about it). So perhaps a better alternative is “if the pipe/socket has
closed on our first attempt to write to it during a request, then try
reconnecting”?

[1] And at a glance I see a bug that SmartClientMedium doesn't actually define
get_request, although it's intended to define it as an abstract method. It
doesn't matter in practice because all subclasses do define it.

Martin Pool wrote:
> We can't just reconnect at arbitrary points in the protocol, only at the
> start of a request.  The medium itself doesn't know about request
> boundaries so the higher-level client needs to be involved.

Actually, SmartClientMedium does know about requests: the are constructed via
its get_request[1] method, all the SmartClientStreamMedium classes track a
_current_request instance attribute.  (This state has to be tracked so we can
generate TooManyConcurrentRequests errors.)

It's perhaps not clear in the API docs but _current_request is always set to
None when the current request/response cycle has been fully consumed, and I
think it would be reasonable to rely on this (and add a public API to expose
this fact cleanly, perhaps has_unfinished_request?).  So you don't need to
involve the higher-level client necessarily.

> Retrying may be a bit complex if we're planning to send streaming data
> coming from a generator; so ideally we will find that the connection was
> closed and restart before we start taking anything from a generator.  We
> could just leave this case unhandled and deal with it if it does come
> up: it wasn't here.

Also there's the tricky case of when you've fully sent a request but the
connection died before you got the response: did the request complete or not?
Some requests are read-only or might be idempotent, but not all of them, so you
can't retry a request without some higher-level knowledge about what can be
retried (or perhaps in some cases how to retry: e.g. if you don't know if your
lock_write call worked you can perhaps try a second lock_write and inspect the
LockContention details to see if it's actually your lock or not?).

And in the extreme there's no guarantee that the server hasn't started acting on
a request before receiving the complete request.  Off the top of my head I don't
think we have any current requests where that's a practical issue (e.g.
insert_stream writes to a temporary file with a randomly generated name), but
probably worth quickly scanning the request implementations to make sure there's
no surprises here, and then documenting any new constraints this adds to the
protocol.

> An alternative approach might be check if the pipe/socket has closed
> before we try to write the start of a request.  In the case that the
> remote end has timed out and closed the socket and we are using an
> external ssh it will probably have already noticed this.

At a guess we'll often first notice timed out connections when we start writing
a request (and the socket may even appear to be alive until we try that, e.g. if
a NAT table in a router has timed out our connection then our TCP stack may know
nothing about it).  So perhaps a better alternative is “if the pipe/socket has
closed on our first attempt to write to it during a request, then try
reconnecting”?

[1] And at a glance I see a bug that SmartClientMedium doesn't actually define
    get_request, although it's intended to define it as an abstract method.  It
    doesn't matter in practice because all subclasses do define it.

Revision history for this message

Martin Pool (mbp) wrote on 2011-08-10: Re: when an idle ssh transport is interrupted, bzrlib errors

#7

test-819604-disconnect.py Edit (706 bytes, text/x-python)

here's an external black box test that breaks the ssh connection to Launchpad

Revision history for this message

Martin Pool (mbp) wrote on 2011-08-11: Re: [Bug 819604] Re: when an idle ssh transport is interrupted, bzrlib errors

#8

Download full text (3.1 KiB)

On 10 August 2011 17:11, Andrew Bennetts
<email address hidden> wrote:
> Martin Pool wrote:
>> We can't just reconnect at arbitrary points in the protocol, only at the
>> start of a request. The medium itself doesn't know about request
>> boundaries so the higher-level client needs to be involved.
>
> Actually, SmartClientMedium does know about requests: the are constructed via
> its get_request[1] method, all the SmartClientStreamMedium classes track a
> _current_request instance attribute. (This state has to be tracked so we can
> generate TooManyConcurrentRequests errors.)

And of course we want to generate them ;-)

>
> It's perhaps not clear in the API docs but _current_request is always set to
> None when the current request/response cycle has been fully consumed, and I
> think it would be reasonable to rely on this (and add a public API to expose
> this fact cleanly, perhaps has_unfinished_request?). So you don't need to
> involve the higher-level client necessarily.

I see, so it's not exactly the medium that knows, but it's something
quite intimately related.

> Also there's the tricky case of when you've fully sent a request but the
> connection died before you got the response: did the request complete or not?
> Some requests are read-only or might be idempotent, but not all of them, so you
> can't retry a request without some higher-level knowledge about what can be
> retried (or perhaps in some cases how to retry: e.g. if you don't know if your
> lock_write call worked you can perhaps try a second lock_write and inspect the
> LockContention details to see if it's actually your lock or not?).
>
> And in the extreme there's no guarantee that the server hasn't started acting on
> a request before receiving the complete request. Off the top of my head I don't
> think we have any current requests where that's a practical issue (e.g.
> insert_stream writes to a temporary file with a randomly generated name), but
> probably worth quickly scanning the request implementations to make sure there's
> no surprises here, and then documenting any new constraints this adds to the
> protocol.
>
>> An alternative approach might be check if the pipe/socket has closed
>> before we try to write the start of a request. In the case that the
>> remote end has timed out and closed the socket and we are using an
>> external ssh it will probably have already noticed this.
>
> At a guess we'll often first notice timed out connections when we start writing
> a request (and the socket may even appear to be alive until we try that, e.g. if
> a NAT table in a router has timed out our connection then our TCP stack may know
> nothing about it). So perhaps a better alternative is “if the pipe/socket has
> closed on our first attempt to write to it during a request, then try
> reconnecting”?

Yes, something like that. In the case we care about typically there
will be an external ssh process in the line, so there's a bit more
buffering: we could write several things to ssh before it starts
writing to the network and on the other hand ssh may discover the
connection has closed even before we actually write anything. I think
some trials in realisti...

On 10 August 2011 17:11, Andrew Bennetts
<andrew-launchpad@bemusement.org> wrote:
> Martin Pool wrote:
>> We can't just reconnect at arbitrary points in the protocol, only at the
>> start of a request.  The medium itself doesn't know about request
>> boundaries so the higher-level client needs to be involved.
>
> Actually, SmartClientMedium does know about requests: the are constructed via
> its get_request[1] method, all the SmartClientStreamMedium classes track a
> _current_request instance attribute.  (This state has to be tracked so we can
> generate TooManyConcurrentRequests errors.)

And of course we want to generate them ;-)

>
> It's perhaps not clear in the API docs but _current_request is always set to
> None when the current request/response cycle has been fully consumed, and I
> think it would be reasonable to rely on this (and add a public API to expose
> this fact cleanly, perhaps has_unfinished_request?).  So you don't need to
> involve the higher-level client necessarily.

I see, so it's not exactly the medium that knows, but it's something
quite intimately related.

> Also there's the tricky case of when you've fully sent a request but the
> connection died before you got the response: did the request complete or not?
> Some requests are read-only or might be idempotent, but not all of them, so you
> can't retry a request without some higher-level knowledge about what can be
> retried (or perhaps in some cases how to retry: e.g. if you don't know if your
> lock_write call worked you can perhaps try a second lock_write and inspect the
> LockContention details to see if it's actually your lock or not?).
>
> And in the extreme there's no guarantee that the server hasn't started acting on
> a request before receiving the complete request.  Off the top of my head I don't
> think we have any current requests where that's a practical issue (e.g.
> insert_stream writes to a temporary file with a randomly generated name), but
> probably worth quickly scanning the request implementations to make sure there's
> no surprises here, and then documenting any new constraints this adds to the
> protocol.
>
>> An alternative approach might be check if the pipe/socket has closed
>> before we try to write the start of a request.  In the case that the
>> remote end has timed out and closed the socket and we are using an
>> external ssh it will probably have already noticed this.
>
> At a guess we'll often first notice timed out connections when we start writing
> a request (and the socket may even appear to be alive until we try that, e.g. if
> a NAT table in a router has timed out our connection then our TCP stack may know
> nothing about it).  So perhaps a better alternative is “if the pipe/socket has
> closed on our first attempt to write to it during a request, then try
> reconnecting”?

Yes, something like that.  In the case we care about typically there
will be an external ssh process in the line, so there's a bit more
buffering: we could write several things to ssh before it starts
writing to the network and on the other hand ssh may discover the
connection has closed even before we actually write anything.  I think
some trials in realistic cases will be needed.

Revision history for this message

Andrew Bennetts (spiv) wrote on 2011-08-11:

#9

Martin Pool wrote:
> > _current_request instance attribute. (This state has to be tracked so we can
> > generate TooManyConcurrentRequests errors.)
>
> And of course we want to generate them ;-)

Something like that ;)

Martin Pool (mbp) on 2011-08-12

summary:

- when an idle ssh transport is interrupted, bzrlib errors
+ when an idle ssh transport is interrupted, bzrlib errors; should
+ reconnect instead

Revision history for this message

John A Meinel (jameinel) wrote on 2011-09-13:

#10

Related to bug #824797

Changed in bzr:
assignee:	Martin Pool (mbp) → John A Meinel (jameinel)

John A Meinel (jameinel) on 2011-09-26

Changed in bzr:
status:	Confirmed → Triaged
status:	Triaged → In Progress

Revision history for this message

John A Meinel (jameinel) wrote on 2011-09-26:

#11

Download full text (6.0 KiB)

In my simple testing, I actually found that we don't discover we are disconnected until we *read* from the closed socket. I'm guessing the issue is that the write buffer is large enough vs the amount of content we are writing ('get', 'filename') that it just gets buffered until we wait to read back the response, and then the local socket notices that the remote one closed its end. I even tried an early socket.shutdown(SHUT_RD) on the server side, but still the only failure I got was during "_read_more."

This certainly makes it a bit more clumsy. Also, we default to buffering ~1MB (until the end of 'this write') before we actually write anything onto the socket.

All the layers and indirection, not to mention that sometimes we are writing to SSH, sometimes to 'simple pipes', sometimes to a socket, sometimes over HTTP, etc, etc.

I'm focusing on "(get)" as a first draft, because after doing "get_record_stream()" the first thing (at least older clients) do is probe the remote object for the real format, which involves reading .bzr/branch-format directly.

And I'm focusing on 'first thing after stream', because that is likely to be the bulk of the time spent, so it is the most likely time that we would end up interrupted. (And the most frustrating, since you'll finish 10min of data transfer, only to be interrupted checking if there are tags you might need.) An example traceback (bzr-2.1):

...
  File "C:\dev\bzr\bzr.dev\bzrlib\fetch.py", line 103, in __fetch
    self._fetch_everything_for_search(search_result)
  File "C:\dev\bzr\bzr.dev\bzrlib\fetch.py", line 134, in _fetch_everything_for_search
    stream = source.get_stream_for_missing_keys(missing_keys)
  File "C:\dev\bzr\bzr.dev\bzrlib\remote.py", line 2197, in get_stream_for_missing_keys
    self.from_repository._ensure_real()
  File "C:\dev\bzr\bzr.dev\bzrlib\remote.py", line 1089, in _ensure_real
    self.bzrdir._ensure_real()
  File "C:\dev\bzr\bzr.dev\bzrlib\remote.py", line 410, in _ensure_real
    self.root_transport, _server_formats=False)
  File "C:\dev\bzr\bzr.dev\bzrlib\bzrdir.py", line 858, in open_from_transport
    redirected)
  File "C:\dev\bzr\bzr.dev\bzrlib\transport\__init__.py", line 1683, in do_catching_redirections
    return action(transport)
  File "C:\dev\bzr\bzr.dev\bzrlib\bzrdir.py", line 845, in find_format
    transport, _server_formats=_server_formats)
  File "C:\dev\bzr\bzr.dev\bzrlib\controldir.py", line 769, in find_format
    return prober.probe_transport(transport)
  File "C:\dev\bzr\bzr.dev\bzrlib\bzrdir.py", line 1346, in probe_transport
    format_string = transport.get_bytes(".bzr/branch-format")
  File "C:\dev\bzr\bzr.dev\bzrlib\transport\remote.py", line 226, in get_bytes
    resp, response_handler = self._client.call_expecting_body('get', remote)
  File "C:\dev\bzr\bzr.dev\bzrlib\smart\client.py", line 145, in call_expecting_body
    method, args, expect_response_body=True)
  File "C:\dev\bzr\bzr.dev\bzrlib\smart\client.py", line 81, in _call_and_read_response
    expect_body=expect_response_body),
  File "C:\dev\bzr\bzr.dev\bzrlib\smart\message.py", line 299, in read_response_tuple
    self._wait_for_response_args()
  File "C:\dev\bzr\bzr.dev\bzrlib\...

In my simple testing, I actually found that we don't discover we are disconnected until we *read* from the closed socket. I'm guessing the issue is that the write buffer is large enough vs the amount of content we are writing ('get', 'filename') that it just gets buffered until we wait to read back the response, and then the local socket notices that the remote one closed its end. I even tried an early socket.shutdown(SHUT_RD) on the server side, but still the only failure I got was during "_read_more."

This certainly makes it a bit more clumsy. Also, we default to buffering ~1MB (until the end of 'this write') before we actually write anything onto the socket.

All the layers and indirection, not to mention that sometimes we are writing to SSH, sometimes to 'simple pipes', sometimes to a socket, sometimes over HTTP, etc, etc.

I'm focusing on "(get)" as a first draft, because after doing "get_record_stream()" the first thing (at least older clients) do is probe the remote object for the real format, which involves reading .bzr/branch-format directly.

And I'm focusing on 'first thing after stream', because that is likely to be the bulk of the time spent, so it is the most likely time that we would end up interrupted. (And the most frustrating, since you'll finish 10min of data transfer, only to be interrupted checking if there are tags you might need.) An example traceback (bzr-2.1):

...
  File "C:\dev\bzr\bzr.dev\bzrlib\fetch.py", line 103, in __fetch
    self._fetch_everything_for_search(search_result)
  File "C:\dev\bzr\bzr.dev\bzrlib\fetch.py", line 134, in _fetch_everything_for_search
    stream = source.get_stream_for_missing_keys(missing_keys)
  File "C:\dev\bzr\bzr.dev\bzrlib\remote.py", line 2197, in get_stream_for_missing_keys
    self.from_repository._ensure_real()
  File "C:\dev\bzr\bzr.dev\bzrlib\remote.py", line 1089, in _ensure_real
    self.bzrdir._ensure_real()
  File "C:\dev\bzr\bzr.dev\bzrlib\remote.py", line 410, in _ensure_real
    self.root_transport, _server_formats=False)
  File "C:\dev\bzr\bzr.dev\bzrlib\bzrdir.py", line 858, in open_from_transport
    redirected)
  File "C:\dev\bzr\bzr.dev\bzrlib\transport\__init__.py", line 1683, in do_catching_redirections
    return action(transport)
  File "C:\dev\bzr\bzr.dev\bzrlib\bzrdir.py", line 845, in find_format
    transport, _server_formats=_server_formats)
  File "C:\dev\bzr\bzr.dev\bzrlib\controldir.py", line 769, in find_format
    return prober.probe_transport(transport)
  File "C:\dev\bzr\bzr.dev\bzrlib\bzrdir.py", line 1346, in probe_transport
    format_string = transport.get_bytes(".bzr/branch-format")
  File "C:\dev\bzr\bzr.dev\bzrlib\transport\remote.py", line 226, in get_bytes
    resp, response_handler = self._client.call_expecting_body('get', remote)
  File "C:\dev\bzr\bzr.dev\bzrlib\smart\client.py", line 145, in call_expecting_body
    method, args, expect_response_body=True)
  File "C:\dev\bzr\bzr.dev\bzrlib\smart\client.py", line 81, in _call_and_read_response
    expect_body=expect_response_body),
  File "C:\dev\bzr\bzr.dev\bzrlib\smart\message.py", line 299, in read_response_tuple
    self._wait_for_response_args()
  File "C:\dev\bzr\bzr.dev\bzrlib\smart\message.py", line 264, in _wait_for_response_args
    self._read_more()
  File "C:\dev\bzr\bzr.dev\bzrlib\smart\message.py", line 286, in _read_more
    "Unexpected end of message. "
ConnectionReset: Connection closed: Unexpected end of message. Please check connectivity and permissions, and report a bug if problems persist.

I might focus on newer clients, but Lucid is still in support, and has bzr 2.1. Note that in the above case, I manually did a "SIGHUP" to the server as soon as get_stream started, about 150s before the stream stopped. Now, that particular code was before I did socket.shutdown(SHUT_RD). Which may act differently. However, in a test case, after sending a request, sending the shutdown, reading the request finish, seeing the server stopped serving the connection, and then trying another request, I *still* saw the failure during the read:
  File "C:\dev\bzr\work\bzrlib\tests\test_smart_transport.py", line 1279, in test_graceful_shutdown_waits_for_clients_to
_stop
    'bigfile')
  File "C:\dev\bzr\work\bzrlib\smart\client.py", line 145, in call_expecting_body
    method, args, expect_response_body=True)
  File "C:\dev\bzr\work\bzrlib\smart\client.py", line 81, in _call_and_read_response
    expect_body=expect_response_body),
  File "C:\dev\bzr\work\bzrlib\smart\message.py", line 299, in read_response_tuple
    self._wait_for_response_args()
  File "C:\dev\bzr\work\bzrlib\smart\message.py", line 264, in _wait_for_response_args
    self._read_more()
  File "C:\dev\bzr\work\bzrlib\smart\message.py", line 286, in _read_more
    "Unexpected end of message. "
ConnectionReset: Connection closed: Unexpected end of message. Please check connectivity and permissions, and report a bug if problems persist.

Maybe we are accidentally suppressing a write failure from:
class SmartTCPClientMedium(SmartClientStreamMedium):
...
    def _accept_bytes(self, bytes):
        """See SmartClientMedium.accept_bytes."""
        self._ensure_connection()
        osutils.send_all(self._socket, bytes, self._report_activity)

Anyway, this certainly complicates things (IMO). If we can't detect at write time that the connection is closed, then we don't actually know whether the server got the request, and we just failed to read the response. Which means that we can only retry requests which are idempotent. If we saw a write failure, then we could say "oh, we didn't send a complete message, so clearly we should retry".

I would be ok if we just couldn't retry "call_with_body_stream()". Though it is a shame that if the connection is closed just before we start streaming, then we start streaming only to find the connection closed, we can't just try again.

Suggestions welcome. Especially if I'm just measuring something wrong. Though especially for stuff like "bzr+ssh" there are too many layers of buffering for me to feel like we can reliably detect a write failure. (local buffering, pipe buffering, SSH buffering, network buffering, etc)

Revision history for this message

Martin Pool (mbp) wrote on 2011-09-27: Re: [Bug 819604] Re: when an idle ssh transport is interrupted, bzrlib errors; should reconnect instead

#12

I agree that there's a lot of buffering and indirection so it will be
hard to detect it for writes.

I do wonder if you could try a nonblocking select-for-read before we
start writing: if the remote end has closed, this may tell us "there's
something readable" and then the thing we read turns out to be an EOF.
I think that is the correct semantic for determining when the remote
end has closed. However, the layers may still trip us up, if it turns
out that ssh does not close the client's stdin until the entire
connection has closed. There is a complication also if we actually
get a non-eof byte, but that is unlikely and probably could be
handled. I think this would be worth a try.

Revision history for this message

John A Meinel (jameinel) wrote on 2011-10-03:

#13

Download full text (3.1 KiB)

There is an initial implementation in the linked branch: lp:~jameinel/bzr/2.1-client-reconnect-819604

I'm starting the work based on 2.1, since that is what is present in LTS Lucid. (Hardy only has 1.3, and I don't think we are going to bother backporting that far.)

ATM, it works only during the write phase. So if we get a ConnectionReset while writing out the request, we assume:

a) If we didn't finish writing before getting ConnectionReset, the server will throw away the request as being incomplete. Thus
b) It is always safe to just retry the request.

I tested it successfully by doing a branch from devpad (using bzr.dev on the server), and sending SIGHUP to the 'bzr serve --inet' process. It shutdown gracefully on devpad, the client logged "hey I got disconnected, I'll retry", and the 'bzr branch' finished.

I did observe that if I did the SIGHUP during the 'get_parent_map' discovery phase, it actually failed during 'read_response' rather than 'write_end'. I'm assuming that is because get_parent_map is happening fast enough the time between reading the previous response and writing the next one was insufficient for the local ssh client to notice that the connection hand been closed.

Compare that with "get_stream()" which has to spend some local time chewing on the final request stream (such as validating all the indexes refer to proper data, etc).

Now, I think the patch as it stands is already helpful, since most times if you are getting SIGHUP, it is going to occur during long-running requests.

The remaining concerns as I see it:

1) I can't re-try 'body_stream' requests at this level. I suppose at the lower level we could see if we got 'ConnectionReset' before we started consuming any of the body_stream. However, the way caching is implemented, we default to buffering 1MB of local content before we send any of it. Which sounds like it is pretty likely to get at least some of body_stream (possibly all).
a) If we know that we got all the request (we got all the way to _write_end), it seems like we could just re-send our local buffer. (If we also check that there is only one buffer that needs to be sent, etc.)
b) We change the logic so that ConnectionReset is handled much higher, at least for cases where body_stream is involved. I like handling it at this layer if we can, because it means I don't have to update tons of call sites.

2) Some requests might not be obviously incomplete if you only write some of the request header. However, I'm pretty sure that ProtocolThree prefixes all requests with a number-of-bytes header. I think it can send 'chunked' body content, but if it does so, I'm hoping it clearly identifies "end-of-chunks". I'll try to validate that.
That would validate my assumption that "if we fail during write, we are guaranteed the server will ignore the request".

3) I think some (many, in fact) of our requests are perfectly idempotent. Such as 'get' or 'get_parent_map', etc. I'm not sure yet how to tell whether a given request is ok to retry transparently. For example 'insert_stream' doesn't seem like it should be retried if we fail during reading the response. (Though here it is certainly much more likely to fail du...

There is an initial implementation in the linked branch: lp:~jameinel/bzr/2.1-client-reconnect-819604

I'm starting the work based on 2.1, since that is what is present in LTS Lucid. (Hardy only has 1.3, and I don't think we are going to bother backporting that far.)

ATM, it works only during the write phase. So if we get a ConnectionReset while writing out the request, we assume:

a) If we didn't finish writing before getting ConnectionReset, the server will throw away the request as being incomplete. Thus
b) It is always safe to just retry the request.

I tested it successfully by doing a branch from devpad (using bzr.dev on the server), and sending SIGHUP to the 'bzr serve --inet' process. It shutdown gracefully on devpad, the client logged "hey I got disconnected, I'll retry", and the 'bzr branch' finished.

I did observe that if I did the SIGHUP during the 'get_parent_map' discovery phase, it actually failed during 'read_response' rather than 'write_end'. I'm assuming that is because get_parent_map is happening fast enough the time between reading the previous response and writing the next one was insufficient for the local ssh client to notice that the connection hand been closed.

Compare that with "get_stream()" which has to spend some local time chewing on the final request stream (such as validating all the indexes refer to proper data, etc).

Now, I think the patch as it stands is already helpful, since most times if you are getting SIGHUP, it is going to occur during long-running requests.

The remaining concerns as I see it:

1) I can't re-try 'body_stream' requests at this level. I suppose at the lower level we could see if we got 'ConnectionReset' before we started consuming any of the body_stream. However, the way caching is implemented, we default to buffering 1MB of local content before we send any of it. Which sounds like it is pretty likely to get at least some of body_stream (possibly all).
  a) If we know that we got all the request (we got all the way to _write_end), it seems like we could just re-send our local buffer. (If we also check that there is only one buffer that needs to be sent, etc.)
  b) We change the logic so that ConnectionReset is handled much higher, at least for cases where body_stream is involved. I like handling it at this layer if we can, because it means I don't have to update tons of call sites.

2) Some requests might not be obviously incomplete if you only write some of the request header. However, I'm pretty sure that ProtocolThree prefixes all requests with a number-of-bytes header. I think it can send 'chunked' body content, but if it does so, I'm hoping it clearly identifies "end-of-chunks". I'll try to validate that.
That would validate my assumption that "if we fail during write, we are guaranteed the server will ignore the request".

3) I think some (many, in fact) of our requests are perfectly idempotent. Such as 'get' or 'get_parent_map', etc. I'm not sure yet how to tell whether a given request is ok to retry transparently. For example 'insert_stream' doesn't seem like it should be retried if we fail during reading the response. (Though here it is certainly much more likely to fail during writing, etc.)

Revision history for this message

John A Meinel (jameinel) wrote on 2011-10-07:

#14

Some thoughts about idempotent requests.

1) Any read-only request should be idempotent.

2) This leaves requests like put(_bytes) and locking calls. Also, append. Note that RemoteTransport.open_write_stream uses an AppendBasedFileStream structure. So if we are using VFS to write out a remote pack file, we will call 'append $PATH content' repeatedly. 'insert_stream_1.19' is also a concern here.

2a) put_bytes should be an all-or-nothing call. If we repeat it, the final content is going to be the same, we may just be uploading the same content twice.

2b) append seems to be the one that can cause the most problems. bzrlib.pack.ContainerWriter does have start and end records that are potentially written as different append calls. It has a '.begin()' which calls write_func then a '.add_bytes_record()' that creates a string with headers and then calls 'write_func', then a final '.end()' that adds the trailer.

So imagine that any one of those got interrupted via a real network disconnect after the server had gotten the message and completed the request, before the client got the 'ok'. You would end up writing 'begin' two times before an actual record, etc.

To make matters even worse, ContainerWriter tracks 'current_offset'. Which means that if any 'write()' call ended up double sent, the index would no longer line up with the real content on disk.

I had originally thought we might be ok, because the indexes only reference the actual content in the pack file. So we could have ended up with bogus data in the middle, but it wouldn't be referenced. However:
i) Pack files really do want to be readable without their associated index. They aren't 100% ok today, but leaving garbage in the middle wouldn't be a good way forward.
ii) Our 'write_func' doesn't have any concept of number-of-bytes written and offset-they-were-written to. At least not at the level of ContainerWriter. We *do* have that concept as part of 'Transport.append()' but it isn't exposed to the other locations such that we could make use of it.

3) So I'm thinking to just figure out a way to flag specific remote 'method' calls as being illegal to retry.

I'm thinking to try and hook it into bzrlib.smart.request.request_handlers. That is a server-side registry, but it is the only place that we know about what methods might be available. And even if a server implements different apis, a client won't ever try a request it doesn't know about. And I don't think a given request will change from being safe to unsafe.

Some thoughts about idempotent requests.

1) Any read-only request should be idempotent.

2) This leaves requests like put(_bytes) and locking calls. Also, append. Note that RemoteTransport.open_write_stream uses an AppendBasedFileStream structure. So if we are using VFS to write out a remote pack file, we will call 'append $PATH content' repeatedly. 'insert_stream_1.19' is also a concern here.

2a) put_bytes should be an all-or-nothing call. If we repeat it, the final content is going to be the same, we may just be uploading the same content twice.

2b) append seems to be the one that can cause the most problems. bzrlib.pack.ContainerWriter does have start and end records that are potentially written as different append calls. It has a '.begin()' which calls write_func then a '.add_bytes_record()' that creates a string with headers and then calls 'write_func', then a final '.end()' that adds the trailer.

So imagine that any one of those got interrupted via a real network disconnect after the server had gotten the message and completed the request, before the client got the 'ok'. You would end up writing 'begin' two times before an actual record, etc.

To make matters even worse, ContainerWriter tracks 'current_offset'. Which means that if any 'write()' call ended up double sent, the index would no longer line up with the real content on disk.

I had originally thought we might be ok, because the indexes only reference the actual content in the pack file. So we could have ended up with bogus data in the middle, but it wouldn't be referenced. However:
 i) Pack files really do want to be readable without their associated index. They aren't 100% ok today, but leaving garbage in the middle wouldn't be a good way forward.
 ii) Our 'write_func' doesn't have any concept of number-of-bytes written and offset-they-were-written to. At least not at the level of ContainerWriter. We *do* have that concept as part of 'Transport.append()' but it isn't exposed to the other locations such that we could make use of it.

3) So I'm thinking to just figure out a way to flag specific remote 'method' calls as being illegal to retry.

I'm thinking to try and hook it into bzrlib.smart.request.request_handlers. That is a server-side registry, but it is the only place that we know about what methods might be available. And even if a server implements different apis, a client won't ever try a request it doesn't know about. And I don't think a given request will change from being safe to unsafe.

Revision history for this message

Vincent Ladeuil (vila) wrote on 2011-10-07:

#15

I like the sound of it.

A useful first step (IMHO) will be to retry the write requests at the highest possible level.

From there we'll know we have a safety net and can try to refine this at a finer-grained level.

Revision history for this message

John A Meinel (jameinel) wrote on 2011-10-08:

#16

Download full text (8.4 KiB)

Analysis of whether any given RPC is safe to retry. Because of potential
corruption, I'm tempted to whitelist them instead of blacklist them. However,
99% of requests are safe, so whitelisting is a bit of a pain.

I'm thinking to use the 'info' parameter of register_lazy to signal the
retry-semantics of a given command.

'append', 'bzrlib.smart.vfs', 'AppendRequest')
Not safe. The current location in the file is tracked on the client for some
callers of this function. An append which succeeded but then got retried will
put bad data in the middle of the file.

'Branch.get_config_file', 'bzrlib.smart.branch',
'SmartServerBranchGetConfigFile')
safe, read-only

'Branch.get_parent', 'bzrlib.smart.branch', 'SmartServerBranchGetParent')
safe, read-only

'Branch.get_tags_bytes', 'bzrlib.smart.branch',
'SmartServerBranchGetTagsBytes')
safe, read-only

'Branch.set_tags_bytes', 'bzrlib.smart.branch',
'SmartServerBranchSetTagsBytes')
safe, this is an all-or-nothing set. so setting it two times to the same thing
still leaves you at the same content in the end.

'Branch.get_stacked_on_url', 'bzrlib.smart.branch', 'SmartServerBranchRequestGetStackedOnURL')
safe, read-only

'Branch.last_revision_info', 'bzrlib.smart.branch', 'SmartServerBranchRequestLastRevisionInfo')
safe, read-only

'Branch.lock_write', 'bzrlib.smart.branch', 'SmartServerBranchRequestLockWrite')
safe, will fail if first succeeded
safe in that it won't cause corruption. not safe in that if the first request
succeeds and the connection dies (so we don't get to see the lock tocken), the
second request will fail because the branch is already locked.
I'm still willing to call this 'safe', since it won't silently corrupt
anything. The worst case is that the action will get interrupted, but we
already had that problem because of ConnectionReset. (Retrying is strictly
better than not retrying.)

'Branch.revision_history', 'bzrlib.smart.branch', 'SmartServerRequestRevisionHistory')
safe, read-only

'Branch.set_config_option', 'bzrlib.smart.branch', 'SmartServerBranchRequestSetConfigOption')
I think this is safe. Setting an option to the same value twice should be safe.
The config code already should handle multiple concurrent requests.

'Branch.set_last_revision', 'bzrlib.smart.branch', 'SmartServerBranchRequestSetLastRevision')
deprecated function, since it uses branch._lefthand_history to determine the
revision number to store. However, still safe, as setting the tip revision to
the same value is safe.

'Branch.set_last_revision_info', 'bzrlib.smart.branch', 'SmartServerBranchRequestSetLastRevisionInfo')
safe

'Branch.set_last_revision_ex', 'bzrlib.smart.branch', 'SmartServerBranchRequestSetLastRevisionEx')
safe

'Branch.set_parent_location', 'bzrlib.smart.branch', 'SmartServerBranchRequestSetParentLocation')
safe

'Branch.unlock', 'bzrlib.smart.branch', 'SmartServerBranchRequestUnlock')
safe, will fail if first succeeded
same as the lock_write call. If the first one succeeds, the second one will
fail with a 'not locked' and cause us to spuriously interrupt what we are
doing. However, it is still 'safe' in that we would ha...

Analysis of whether any given RPC is safe to retry. Because of potential
corruption, I'm tempted to whitelist them instead of blacklist them. However,
99% of requests are safe, so whitelisting is a bit of a pain.

I'm thinking to use the 'info' parameter of register_lazy to signal the
retry-semantics of a given command.

'append', 'bzrlib.smart.vfs', 'AppendRequest')
Not safe. The current location in the file is tracked on the client for some
callers of this function. An append which succeeded but then got retried will
put bad data in the middle of the file.

'Branch.get_config_file', 'bzrlib.smart.branch',
    'SmartServerBranchGetConfigFile')
safe, read-only

'Branch.get_parent', 'bzrlib.smart.branch', 'SmartServerBranchGetParent')
safe, read-only

'Branch.get_tags_bytes', 'bzrlib.smart.branch',
    'SmartServerBranchGetTagsBytes')
safe, read-only

'Branch.set_tags_bytes', 'bzrlib.smart.branch',
    'SmartServerBranchSetTagsBytes')
safe, this is an all-or-nothing set. so setting it two times to the same thing
still leaves you at the same content in the end.