Snapshot action fails with keys-version=v2

Bug #1921797 reported by Paul Goins
This bug affects 1 person
Affects      Status    Importance   Assigned to   Milestone
Etcd Charm   Triaged   Medium       Unassigned

Bug Description

In a customer cloud, and replicated in a test environment, I've observed problems with running the snapshot action of the etcd charm.

Context:

* Deployed etcd charm: cs:etcd-546
* V3 users of etcd are: cs:~containers/kubernetes-master-926 and cs:~containers/kubernetes-worker-718
* V2 users of etcd are: cs:~containers/canal-748
* Overall environment is CDK 1.19, although I think etcd may have been upgraded to a CDK 1.20 version; I'm not totally sure.
* Version of etcdctl installed onto k8s workers/masters: 2.3.7 (installed to /usr/local/bin/etcdctl)
* Version of etcdctl installed onto etcd units: 3.4.5 (via etcd snap tracking 3.4/stable)

I am not 100% sure of the logic here, but I can clearly see via querying etcd that the V2 keys all seem related to Flannel, and thus I'm suspecting they're using the /usr/local/bin/etcdctl binary. The K8s master/worker units aren't storing their data via the V2 API, so I suspect they are using a library for interacting with etcd rather than the etcdctl binary. (This whole paragraph is admittedly one big guess based on the data I saw in the database, so take this with a grain of salt.)

Anyway, in this environment:
* Backing up V3 keys works, e.g.: juju run-action etcd/0 snapshot keys-version=v3
* Backing up V2 keys fails, e.g.: juju run-action etcd/0 snapshot keys-version=v2

Output of such failures looks like this:

$ juju run-action etcd/0 snapshot --wait keys-version=v2
unit-etcd-0:
  UnitId: etcd/0
  id: "16"
  message: exit status 1
  results:
    ReturnCode: 1
    Stderr: |
      ++ action-get target
      + ETCD_BACKUP_TARGET_DIR=/home/ubuntu/etcd-snapshots
      ++ action-get keys-version
      + ETCD_KEYS_VERSION=v2
      + UNIT_NAME=etcd
      + UNIT_NUM=0
      + ETCD_DATA_DIR=/var/snap/etcd/current/etcd0.etcd/
      + '[' '!' -d /var/snap/etcd/current/etcd0.etcd/ ']'
      + ETCD_DATA_DIR=/var/snap/etcd/current/
      ++ date +%Y-%m-%d-%H.%M.%S
      + DATE_STAMP=2021-03-29-22.13.16
      + ARCHIVE=etcd-snapshot-2021-03-29-22.13.16.tar.gz
      + mkdir -p /home/ubuntu/etcd-snapshots/16
      + '[' v2 == v2 ']'
      + /snap/bin/etcd.etcdctl backup --data-dir /var/snap/etcd/current/ --backup-dir /home/ubuntu/etcd-snapshots/16
      Error: unknown command "backup" for "etcdctl"
      Run 'etcdctl --help' for usage.
      Error: unknown command "backup" for "etcdctl"
  status: failed
  timing:
    completed: 2021-03-29 22:13:17 +0000 UTC
    enqueued: 2021-03-29 22:13:14 +0000 UTC
    started: 2021-03-29 22:13:16 +0000 UTC

Paul Goins (vultaire) wrote :

I'm unable to find a clear workaround for the v2 case.

If I copy the etcdctl from one of the flannel charms onto one of the etcd units for the sake of running "etcdctl backup", it fails:

$ sudo ./etcdctl backup --data-dir /var/snap/etcd/current/ --backup-dir /home/ubuntu/etcd-snapshots/$(date +%Y%m%d_%H%M%S)
2021-03-29 23:13:41.117268 W | snap: skipped unexpected non snapshot file db
2021-03-29 23:13:41.118151 W | wal: ignored file 1.tmp in wal
panic: runtime error: makeslice: len out of range

goroutine 1 [running]:
github.com/coreos/etcd/wal.(*decoder).decode(0xc0000f5590, 0xc00016f8f0, 0x0, 0x0)
        /etcd/gopath/src/github.com/coreos/etcd/wal/decoder.go:55 +0x14c
github.com/coreos/etcd/wal.(*WAL).ReadAll(0xc0001720d0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /etcd/gopath/src/github.com/coreos/etcd/wal/wal.go:237 +0x157
github.com/coreos/etcd/etcdctl/command.handleBackup(0xc000122a20)
        /etcd/gopath/src/github.com/coreos/etcd/etcdctl/command/backup_command.go:90 +0x551
github.com/coreos/etcd/Godeps/_workspace/src/github.com/codegangsta/cli.Command.Run(0xaf3d50, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0, 0xaff39d, 0x18, 0x0, ...)
        /etcd/gopath/src/github.com/coreos/etcd/Godeps/_workspace/src/github.com/codegangsta/cli/command.go:137 +0x709
github.com/coreos/etcd/Godeps/_workspace/src/github.com/codegangsta/cli.(*App).Run(0xc0001227e0, 0xc00001e1e0, 0x6, 0x6, 0x0, 0x0)
        /etcd/gopath/src/github.com/coreos/etcd/Godeps/_workspace/src/github.com/codegangsta/cli/app.go:175 +0x6e8
main.main()
        /etcd/gopath/src/github.com/coreos/etcd/etcdctl/main.go:69 +0x1d69

George Kraft (cynerva)
summary: - Snapshot action appears to not work in mixed v2/v3 environments
+ Snapshot action fails with keys-version=v2
George Kraft (cynerva) wrote :

I can reproduce this easily enough. It looks like the call to `etcdctl backup` is simply missing the ETCDCTL_API=2 environment variable that tells the client to use the v2 API.
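A minimal sketch of what that fix could look like in the action's v2 branch. This is not the charm's actual script: the function name `run_v2_backup` and the overridable `ETCDCTL` variable are hypothetical, and only the `backup` flags and the `/snap/bin/etcd.etcdctl` path come from the trace above.

```shell
#!/bin/bash
# Sketch only: not the charm's actual snapshot script.
# ETCDCTL is a hypothetical override so the logic can be exercised
# without etcd installed; it defaults to the path seen in the trace.
ETCDCTL="${ETCDCTL:-/snap/bin/etcd.etcdctl}"

run_v2_backup() {
  local data_dir="$1" backup_dir="$2"
  # The variable prefix applies to this single invocation only, so any
  # later v3 calls in the same script keep their default API version.
  ETCDCTL_API=2 "$ETCDCTL" backup --data-dir "$data_dir" --backup-dir "$backup_dir"
}
```

Scoping the variable to the one command, rather than exporting it for the whole script, keeps the v3 snapshot path unchanged.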

George Kraft (cynerva) wrote :

> I can clearly see via querying etcd that the V2 keys all seem related to Flannel, and thus I'm suspecting they're using the /usr/local/bin/etcdctl binary.

Flannel itself uses a Go library and only supports etcd v2. It will likely never support etcd v3. That etcdctl binary (which is woefully out of date) is only used by the charm for the initial configuration of Flannel.

> If I copy the etcdctl from one of the flannel charms onto one of the etcd units for the sake of running "etcdctl backup", it fails

You should be able to do this with the etcd 3.4 client by setting the ETCDCTL_API=2 environment variable: `ETCDCTL_API=2 etcdctl backup ...`
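As a rough sketch of that workaround run by hand on an etcd unit, reusing the paths from the earlier attempt. One assumption to flag: when the command goes through `sudo`, a variable set in front of `sudo` may be dropped by sudo's environment reset, so this passes it via `sudo env` instead. The `DRY_RUN` guard is hypothetical and defaults to printing the command rather than running it.

```shell
#!/bin/bash
# One-off v2 backup using the 3.4 client already installed on the etcd unit.
# DRY_RUN is a hypothetical guard: 1 (the default) only prints the command.
DRY_RUN="${DRY_RUN:-1}"
backup_dir="/home/ubuntu/etcd-snapshots/$(date +%Y%m%d_%H%M%S)"

# `sudo env VAR=value cmd` sets the variable inside the privileged process.
cmd=(sudo env ETCDCTL_API=2 /snap/bin/etcd.etcdctl backup
     --data-dir /var/snap/etcd/current/ --backup-dir "$backup_dir")

if [ "$DRY_RUN" = 1 ]; then
  echo "${cmd[*]}"
else
  "${cmd[@]}"
fi
```

Set `DRY_RUN=0` to actually run the backup once the printed command looks right.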

Changed in charm-etcd:
importance: Undecided → Medium
status: New → Triaged