[Bug 1649616] Re: Keystone Token Flush job does not complete in HA deployed environment

OpenStack Infra 1649616 at bugs.launchpad.net
Wed Jul 12 07:41:46 UTC 2017


Reviewed:  https://review.openstack.org/482552
Committed: https://git.openstack.org/cgit/openstack/puppet-keystone/commit/?id=c1bda5f81e35ec597f8d684a35c2c1446c8b1527
Submitter: Jenkins
Branch:    stable/newton

commit c1bda5f81e35ec597f8d684a35c2c1446c8b1527
Author: Juan Antonio Osorio Robles <jaosorior at redhat.com>
Date:   Tue Apr 18 13:13:27 2017 +0300

    Change keystone token flush to run hourly
    
    In a recent commit [1] the keystone token flush cron job was changed to
    run twice a day. However, this change was not enough for big
    deployments.
    
    After getting some customer feedback and looking at what other
    projects are doing [2][3][4], it seems that running this job hourly
    is the way to go.
    
    [1] Ia0b0fb422318712f4b0f4d023cbb3a61d40bb85d
    [2] https://www.ibm.com/support/knowledgecenter/en/SSB27U_6.4.0/com.ibm.zvm.v640.hcpo4/exptoken.htm
    [3] https://review.openstack.org/#/c/88670/8
    [4] https://github.com/openstack/charm-keystone/blob/master/templates/keystone-token-flush
    
    Conflicts:
    	manifests/cron/token_flush.pp
    	spec/acceptance/keystone_federation_identity_provider_spec.rb
    	spec/acceptance/keystone_federation_shibboleth_spec.rb
    	spec/acceptance/keystone_wsgi_apache_spec.rb
    	spec/classes/keystone_cron_token_flush_spec.rb
    
    (cherry picked from commit f694b5551f896042df6aeb751c65986ef3342f54)
    Change-Id: I6ec7ec8111bd93e5638cfe96189e36f0e0691d65
    Related-Bug: #1649616
    (cherry picked from commit 90ffc7f6008370e9d9893ab69f31683352c854c3)
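
For context, a minimal sketch of the crontab entry this change effectively
installs (the user, minute, and log path are assumptions; the exact values
come from the keystone::cron::token_flush class parameters):

    # Installed in the keystone user's crontab by puppet-keystone; runs
    # the flush at the top of every hour instead of twice a day.
    0 * * * * keystone-manage token_flush >>/var/log/keystone/keystone-tokenflush.log 2>&1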


** Tags added: in-stable-newton

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to keystone in Ubuntu.
https://bugs.launchpad.net/bugs/1649616

Title:
  Keystone Token Flush job does not complete in HA deployed environment

Status in Ubuntu Cloud Archive:
  In Progress
Status in Ubuntu Cloud Archive mitaka series:
  Fix Committed
Status in Ubuntu Cloud Archive newton series:
  Fix Committed
Status in Ubuntu Cloud Archive ocata series:
  Fix Committed
Status in OpenStack Identity (keystone):
  Fix Released
Status in OpenStack Identity (keystone) newton series:
  In Progress
Status in OpenStack Identity (keystone) ocata series:
  In Progress
Status in puppet-keystone:
  Triaged
Status in tripleo:
  In Progress
Status in keystone package in Ubuntu:
  In Progress
Status in keystone source package in Xenial:
  Fix Committed
Status in keystone source package in Yakkety:
  Fix Committed
Status in keystone source package in Zesty:
  Fix Committed

Bug description:
  [Impact]

   * The Keystone token flush job can get into a state where it will
  never complete because the delete transaction exceeds the MySQL Galera
  transaction size limit, wsrep_max_ws_size (default 1073741824 bytes).
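
  The limit can be confirmed directly on a Galera node (variable name
  taken from the mysqld.log output below; 1073741824 bytes is the
  default):

    mysql -e "SHOW GLOBAL VARIABLES LIKE 'wsrep_max_ws_size';"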

  [Test Case]

  1. Authenticate many times
  2. Observe that the keystone token flush job runs for a very long time
     (depends on disk speed; >20 hours in my environment)
  3. Observe errors in mysqld.log indicating a transaction that is too
     large
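
  A minimal sketch of steps 2 and 3, assuming default log locations:

    # run the flush by hand (as the keystone user) and time it
    time sudo -u keystone keystone-manage token_flush

    # check both logs for the outcome
    grep "Total expired" /var/log/keystone/keystone.log
    grep "transaction size limit" /var/log/mysqld.log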

  Actual results:
  Expired tokens are not actually flushed from the database, and no
  errors appear in keystone.log; the only errors appear in mysqld.log.

  Expected results:
  Expired tokens are removed from the database.

  [Additional Info]

  It is likely that this can be demonstrated with fewer than 1 million
  tokens, since a token table with >1 million rows is larger than 13 GiB
  while the maximum transaction size is 1 GiB; my token bench-marking
  Browbeat job creates more tokens than needed.
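
  The row count and on-disk size of the token table can be checked with
  a query like the following (a sketch; note that table_rows is only an
  estimate for InnoDB tables):

    mysql -e "SELECT table_rows,
                     ROUND((data_length + index_length)/1024/1024/1024, 1) AS size_gib
              FROM information_schema.tables
              WHERE table_schema = 'keystone' AND table_name = 'token';"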

  Once the token flush job can no longer complete, the token table never
  decreases in size and the cloud eventually runs out of disk space.
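
  As a possible stopgap until the flush job completes again, expired
  rows can be deleted in bounded batches so that each transaction stays
  well under wsrep_max_ws_size (a sketch only; the batch size and the
  expiry predicate are assumptions, not the upstream fix):

    # loop until no expired rows remain, deleting 10000 per transaction
    while :; do
      n=$(mysql -N keystone -e \
        "DELETE FROM token WHERE expires <= UTC_TIMESTAMP() LIMIT 10000;
         SELECT ROW_COUNT();")
      [ "$n" -eq 0 ] && break
    done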

  Furthermore, the flush job saturates disk I/O while it runs. This was
  demonstrated on slow disks (a single 7.2K SATA disk). Faster disks
  give you more capacity to generate tokens, so the number of tokens
  needed to exceed the transaction size can be reached even sooner.

  Log evidence:
  [root@overcloud-controller-0 log]# grep " Total expired" /var/log/keystone/keystone.log
  2016-12-08 01:33:40.530 21614 INFO keystone.token.persistence.backends.sql [-] Total expired tokens removed: 1082434
  2016-12-09 09:31:25.301 14120 INFO keystone.token.persistence.backends.sql [-] Total expired tokens removed: 1084241
  2016-12-11 01:35:39.082 4223 INFO keystone.token.persistence.backends.sql [-] Total expired tokens removed: 1086504
  2016-12-12 01:08:16.170 32575 INFO keystone.token.persistence.backends.sql [-] Total expired tokens removed: 1087823
  2016-12-13 01:22:18.121 28669 INFO keystone.token.persistence.backends.sql [-] Total expired tokens removed: 1089202
  [root@overcloud-controller-0 log]# tail mysqld.log
  161208  1:33:41 [Warning] WSREP: transaction size limit (1073741824) exceeded: 1073774592
  161208  1:33:41 [ERROR] WSREP: rbr write fail, data_len: 0, 2
  161209  9:31:26 [Warning] WSREP: transaction size limit (1073741824) exceeded: 1073774592
  161209  9:31:26 [ERROR] WSREP: rbr write fail, data_len: 0, 2
  161211  1:35:39 [Warning] WSREP: transaction size limit (1073741824) exceeded: 1073774592
  161211  1:35:40 [ERROR] WSREP: rbr write fail, data_len: 0, 2
  161212  1:08:16 [Warning] WSREP: transaction size limit (1073741824) exceeded: 1073774592
  161212  1:08:17 [ERROR] WSREP: rbr write fail, data_len: 0, 2
  161213  1:22:18 [Warning] WSREP: transaction size limit (1073741824) exceeded: 1073774592
  161213  1:22:19 [ERROR] WSREP: rbr write fail, data_len: 0, 2

  A graph of the disk utilization issue is attached. In that graph the
  entire job runs from the first disk-utilization spike (~5:18 UTC) and
  culminates in roughly 90 minutes of pegging the disk (between 1:09 UTC
  and 2:43 UTC).

  [Regression Potential]
  * None identified

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1649616/+subscriptions


