-- BeJones - 18 Nov 2013

Why do a PuppetDB intervention?

The PostgreSQL database under PuppetDB is bloated and its performance is degraded. As a result, commands sent to PuppetDB (replace facts, replace catalogs) wait in a queue until the data can be written to the database. Because of the degraded performance, the queue keeps growing (120 000+ commands at the time of writing), which translates into several hours of delay. New exported resources, or facts with changed values, cannot be queried from PuppetDB until this delay has elapsed.
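As a rough illustration of how queue depth turns into delay, the sketch below divides the number of queued commands by a processing rate. The function name and the rate of 10 commands per second are assumptions made for the example; only the 120 000 figure comes from the text above. Because new commands keep arriving, the result is a lower bound on the real delay.

```python
def estimated_delay_hours(queue_depth: int, commands_per_second: float) -> float:
    """Hours needed to drain the queue if no new commands arrived."""
    if commands_per_second <= 0:
        raise ValueError("commands_per_second must be positive")
    return queue_depth / commands_per_second / 3600


# 120 000 is the queue depth quoted in the text.
# 10 commands/s is a hypothetical processing rate, not a measured value.
print(f"{estimated_delay_hours(120_000, 10.0):.1f} h")  # prints "3.3 h"
```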

The bloat is partly caused by a bug in PuppetDB. When a catalog is sent to PuppetDB, a hash is calculated for it. If the stored catalog has the same hash as the new one, only a timestamp is updated. If the hashes differ, the new catalog is stored, and the old ones are wiped periodically by the garbage collection (hourly). The problem is that the current version of PuppetDB can generate a different hash for the very same catalog, which causes a lot of unnecessary writes and increases table bloat. The automatic maintenance jobs in the database cannot keep up with this bloat, so regular maintenance is required.
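The text does not say what makes PuppetDB produce different hashes for the same catalog. One plausible cause, assumed here purely for illustration, is that the catalog is serialized in a different order from run to run. The sketch below is not PuppetDB's code; the catalog layout and both function names are hypothetical. It shows how an order-sensitive hash differs for identical content while a canonicalized hash stays stable.

```python
import hashlib
import json


def naive_hash(catalog: dict) -> str:
    # Serializes the data in whatever order it happens to arrive in.
    return hashlib.sha1(json.dumps(catalog).encode("utf-8")).hexdigest()


def canonical_hash(catalog: dict) -> str:
    # Sort the resource list and the keys so ordering cannot affect the hash.
    normalized = dict(catalog, resources=sorted(catalog["resources"]))
    payload = json.dumps(normalized, sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


# Same content, different ordering (hypothetical catalog layout).
run_1 = {"certname": "node1.example.org",
         "resources": ["File[/etc/motd]", "Service[sshd]"]}
run_2 = {"resources": ["Service[sshd]", "File[/etc/motd]"],
         "certname": "node1.example.org"}

print(naive_hash(run_1) == naive_hash(run_2))          # False: looks "changed"
print(canonical_hash(run_1) == canonical_hash(run_2))  # True: recognized as duplicate
```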

The fraction of incoming catalogs whose hash matches the stored catalog is called the catalog duplication rate. Our rate is currently 5%, which is very low: it would mean that 95% of incoming catalogs contain changes. That obviously cannot be true.
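The store-or-touch logic and the resulting duplication rate can be modelled in a few lines. This is a minimal sketch under stated assumptions, not PuppetDB's implementation: the `CatalogStore` class, its methods and the sample hashes are all hypothetical.

```python
class CatalogStore:
    """Toy model: keeps one catalog hash per node and counts duplicates."""

    def __init__(self):
        self.current = {}    # certname -> (catalog_hash, timestamp)
        self.duplicates = 0  # submissions whose hash matched the stored one
        self.replaced = 0    # submissions that required a full write

    def submit(self, certname, catalog_hash, timestamp):
        stored = self.current.get(certname)
        if stored is not None and stored[0] == catalog_hash:
            self.duplicates += 1  # same hash: only the timestamp changes
        else:
            self.replaced += 1    # new or different hash: catalog is rewritten
        self.current[certname] = (catalog_hash, timestamp)

    def duplication_rate(self):
        total = self.duplicates + self.replaced
        return self.duplicates / total if total else 0.0


store = CatalogStore()
# 20 submissions from one node; only the second repeats the previous hash.
hashes = ["h0", "h0"] + [f"h{i}" for i in range(1, 19)]
for ts, h in enumerate(hashes):
    store.submit("node1.example.org", h, ts)

print(f"{store.duplication_rate():.0%}")  # prints "5%"
```

With a stable hash, an unchanged node would hit the cheap timestamp-only branch on nearly every run, and the rate would be close to 100% rather than 5%.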

PuppetDB dashboard

The above figure shows the low catalog duplication rate in our node population, as well as the high queue.

Lemon

This figure shows the growth of the queue in the last 3 days.

To do this maintenance with minimal or no downtime, PuppetDB is switched to a backup database that is restored from a backup created on the master database. Due to the amount of data, this procedure takes 2 hours. The command queue processing is not halted, so any changes that entered the database during those two hours are essentially lost until the next Puppet run. Keeping a consistent state would require taking PuppetDB offline for the duration of the backup and restore operations.

Topic revision: r3 - 2013-11-19 - AkosHencz
 