Discussion:
[ZODB-Dev] Optimizing RelStorage Packing for large DBs
Jens W. Klein
2013-11-16 01:01:00 UTC
Permalink
I started a new packing script for RelStorage (history-free,
PostgreSQL). It is based on incoming reference counting. Why do we need
this? First a bit of storytelling (fast-forward down to the section "New
Script" if you're bored).

The Problem
-----------

So we had a database with ~298,000,000 objects stored, grown over 4-5
months. The beast was never packed. We import/sync data on a weekly
basis: a complex Plone-based product catalog optimized for the web, in
27 languages, each with ~18,000 products, each with several variants
(stored in annotated BTrees). Lots of stuff is deleted and recreated on
import, so we had a lot of garbage in the DB.

We're using history-free RelStorage. RelStorage performs much better on
concurrent writes and speeds up our import a lot. First on MySQL, now on
PostgreSQL. Even if PostgreSQL does not solve all problems, it seems to
be the stable choice. At least once we had a frozen server. We blame
MySQL, but well, it's a CentOS box and there are several other problems
injecting side effects (old unpatched kernel, etc. ... we're dev, not
ops; it's enterprise level, and decisions made in the past are our
constraints, carved in stone - what more to say?)

Anyway: our Plone with ZODB/RelStorage performs very, very well, even
with this many objects stored, including all the garbage!

BUT... we want to get rid of the garbage left behind by every import.

RelStorage ships with a packing script. Running it on the ~300
million objects resulted in RAM consumption >20GB with an estimated 300
days to completion (we did not wait to find out if this was true). Phew.

So something had to be done. Storage usage grew about 15-20GB per week.
As a quick (took some days) fix we decided to try classical Data.fs
packing (zeopack).

We blocked content managers from editing, stopped imports, and cloned
a snapshot of the virtual machine with the DB.
At this point we had roughly 160GB in PostgreSQL.
On the clone we converted the RelStorage to a FileStorage. This
took 40 hours.
The storage shrank to a 55GB Data.fs.
Then we started the classical packing, and after some hours the DB had
shrunk to a 7.8GB Data.fs, still containing 44 million objects.
Next we converted the Data.fs back to RelStorage (~6 hours) and switched
the live site over to the packed DB.

Packing the 44-million-object RelStorage with the default script
would take several days (3-4) for the prepacking phase alone. It
consumes less memory (~5GB) but is still insane.

New Script
----------

After analyzing the current way to pack the DB (generic: MySQL,
PostgreSQL, Oracle; history and history-free) I found no good way to
optimize the code there. Others were there before me.

Trial 1:

So I tried to rethink the way packing is done and followed the
copy-and-switch method FileStorage uses: copy all in-use zoids from
object_state to a new table (traversing the graph starting at zoid 0),
drop the old object_state, and rename the new table to object_state. The
copy process can be distributed: several workers can run in parallel
until the database is at its maximum transactions/second.
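The traversal can be sketched in a few lines. This is a toy stand-in
using SQLite and a made-up `refs` column holding outgoing zoids as
text; a real RelStorage implementation would read the pickled state and
extract outgoing zoids with something like ZODB.serialize.referencesf:

```python
import sqlite3

def pack_copy_and_switch(conn):
    # Walk the object graph from the root zoid (0), copy every reachable
    # state into a fresh table, then swap the tables. Table and column
    # names here are hypothetical, not RelStorage's real schema.
    cur = conn.cursor()
    cur.execute("CREATE TABLE object_state_new (zoid INTEGER PRIMARY KEY, refs TEXT)")
    seen, queue = {0}, [0]
    while queue:
        zoid = queue.pop()
        row = cur.execute(
            "SELECT refs FROM object_state WHERE zoid = ?", (zoid,)).fetchone()
        if row is None:
            continue  # dangling reference; nothing stored for this zoid
        cur.execute("INSERT INTO object_state_new VALUES (?, ?)", (zoid, row[0]))
        for ref in (int(r) for r in row[0].split()):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    # the switch: unreachable rows simply never made it into the new table
    cur.execute("DROP TABLE object_state")
    cur.execute("ALTER TABLE object_state_new RENAME TO object_state")
    conn.commit()
```

Distributing this across workers means sharing `seen` and `queue`
through DB tables instead of Python objects, which is where the
coordination overhead comes from.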

I had this working: 4-6 workers running, a queue in the DB with zoids
to process, a master controlling the process, and so on. But even with
this approach, the total time for a pack of the whole DB is, after SQL
optimizations(!), roughly 8 days for a 44-million-object graph.

OK - Back to the drawing board.

Trial 2:

On Thursday evening I had a good conversation (and some beer) with a
buddy from the local Linux user group, and we discussed the problem. The
outcome was to try reference counting.

The idea is simple:

- iterate over all transactions, starting with the lowest
transaction id (tid)
- for each transaction, load the object states connected with that tid
- for each state, fetch its outgoing references and fill a table where
all incoming references of an object are stored as an array;
if a state has no outgoing references, write it to the table anyway
with an empty array

After reaching the highest tid it's easy: all entries with no incoming
references are garbage. Delete them and remove each deleted entry from
the incoming references of all other entries.
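As a minimal in-memory sketch of those steps (the real table lives in
PostgreSQL; names and data shapes here are illustrative only):

```python
def build_incoming(transactions):
    """transactions: iterable of (tid, zoid, outgoing_zoids), tid ascending.

    Returns a dict mapping zoid -> set of zoids that reference it.
    """
    incoming = {}
    for tid, zoid, outgoing in transactions:
        incoming.setdefault(zoid, set())  # record even reference-less objects
        for target in outgoing:
            incoming.setdefault(target, set()).add(zoid)
    return incoming

def sweep(incoming, root=0):
    """Repeatedly delete objects with no incoming references (except root)."""
    removed = set()
    while True:
        garbage = {z for z, refs in incoming.items()
                   if not refs and z != root}
        if not garbage:
            return removed
        removed |= garbage
        for z in garbage:
            del incoming[z]
        for refs in incoming.values():
            refs -= garbage  # drop incoming refs coming from deleted objects
```

Note the sweep has to iterate: deleting an object can strand the
objects it referenced, so garbage cascades until a pass finds nothing.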

When new transactions happen, it's enough to process only the delta
transactions (those newer than the last one processed into the
reference-counting table). This is pretty fast if the delta is small,
e.g. one day's worth. On the deltas, another check needs to detect
references removed in the meantime, but since the delta isn't that big,
this is probably not a problem.
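A hedged sketch of the delta step, using an in-memory dict mapping
zoid to the set of zoids referencing it (an illustrative stand-in for
the reference-counting table; in the real script this is SQL):

```python
def process_delta(incoming, transactions, last_tid):
    """Apply only transactions newer than last_tid; return the new high tid.

    incoming: dict mapping zoid -> set of referencing zoids (illustrative
    stand-in for the DB table).
    """
    new_last = last_tid
    for tid, zoid, outgoing in transactions:
        if tid <= last_tid:
            continue  # already accounted for in the table
        # a rewritten state replaces the object's outgoing references,
        # so references it contributed earlier must be dropped first
        for refs in incoming.values():
            refs.discard(zoid)
        incoming.setdefault(zoid, set())
        for target in outgoing:
            incoming.setdefault(target, set()).add(zoid)
        new_last = max(new_last, tid)
    return new_last
```

An object that lost its last incoming reference during the delta ends up
with an empty set afterwards, which is exactly the "references gone
meanwhile" check mentioned above.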

I have a rough implementation of this now. Building the incoming
reference-counting table from scratch takes less than 7 hours. The
deletion itself is not implemented yet, but that part is not that
difficult :-)

Code
----

The current code is at:
https://github.com/bluedynamics/relstorage_packer

Did I miss something? Any opinions much appreciated!

Expect updates in this thread :)

Jens Klein
--
Klein & Partner KG, member of BlueDynamics Alliance
Jan-Wijbrand Kolman
2013-11-18 11:13:01 UTC
Permalink
Post by Jens W. Klein
Did I miss something? Any opinions much appreciated!
Expect updates in this thread :)
We did experience relstorage problems that we think could have been
related to packing - we're not sure yet. So, I'm following this thread
and your updates with great interest!

regards, jw
Jim Fulton
2013-11-18 11:19:33 UTC
Permalink
I started a new packing script for Relstorage (history free, postgresql). It
is based on incoming reference counting.
Did you look at zc.zodbdgc? I think it implements something very close to
what you're proposing. It's been in production for a few years now at ZC.

Not sure if it would need to be updated for relstorage.

Jim
--
Jim Fulton
http://www.linkedin.com/in/jimfulton
Jan-Wijbrand Kolman
2013-11-18 13:43:00 UTC
Permalink
Post by Jim Fulton
I started a new packing script for Relstorage (history free, postgresql). It
is based on incoming reference counting.
Did you look at zc.zodbdgc? I think it implements something very close to
what you're proposing. It's been in production for a few years now at ZC.
Not sure if it would need to be updated for relstorage.
AFAICT it does not work against a RelStorage backend. Or at least
that's what I understand from:

http://www.zodb.org/en/latest/documentation/articles/multi-zodb-gc.html

[...This documentation does not apply to RelStorage which has the same
features built-in, but accessible in different ways. Look at the options
for the zodbpack script. The --prepack option creates a table containing
the same information as we are creating in the reference database[...]


regards, jw
Jim Fulton
2013-11-18 14:40:50 UTC
Permalink
On Mon, Nov 18, 2013 at 8:43 AM, Jan-Wijbrand Kolman
Post by Jim Fulton
I started a new packing script for Relstorage (history free, postgresql). It
is based on incoming reference counting.
Did you look at zc.zodbdgc? I think it implements something very close to
what you're proposing. It's been in production for a few years now at ZC.
Not sure if it would need to be updated for relstorage.
AFAICT it does not work against a relstorage backend. Or at least I think to
http://www.zodb.org/en/latest/documentation/articles/multi-zodb-gc.html
[...This documentation does not apply to RelStorage which has the same
features built-in, but accessible in different ways. Look at the options for
the zodbpack script. The --prepack option creates a table containing the same
information as we are creating in the reference database[...]
I didn't write that. I think zc.zodbdgc probably would work, possibly
with some modifications.
If nothing else, it should be consulted, but then again, writing
software is fun.

Note that the important aspect here isn't cross-database references,
but the garbage
collection algorithm, which is incremental and uses a linear scan of
the database.

Jim
--
Jim Fulton
http://www.linkedin.com/in/jimfulton
Jens W. Klein
2013-11-18 17:00:48 UTC
Permalink
Hi Jim,

thanks for the hint (also in the other post). I looked at zc.zodbdgc and
took some inspiration from it. As far as I understand, it stores the
incoming references in a separate FileStorage backend. So this works
similarly to my implementation but uses the ZODB infrastructure. I don't
see how to make zc.zodbdgc play with RelStorage, and since it works at
the abstracted ZODB level using pickles, I suspected it would not be
fast enough for so many objects - so I skipped this alternative.

Jens
Post by Jim Fulton
I started a new packing script for Relstorage (history free, postgresql). It
is based on incoming reference counting.
Did you look at zc.zodbdgc? I think it implements something very close to
what you're proposing. It's been in production for a few years now at ZC.
Not sure if it would need to be updated for relstorage.
Jim
--
Klein & Partner KG, member of BlueDynamics Alliance
Jim Fulton
2013-11-18 17:49:55 UTC
Permalink
Post by Jens W. Klein
Hi Jim,
thanks for the hint (also in the other post). I looked at zc.zodbdgc and
took some inspiration from it. As far as I understand it stores the incoming
references in a separate filestorage backend.
This is just a temporary file to avoid storing all the data in memory.
Post by Jens W. Klein
So this works similarly to my
implementation but uses the ZODB infrastructure. I don't see how to make
zc.zodbdgc play with RelStorage, and since it works at the abstracted ZODB
level using pickles
I don't know what you're saying, since I don't know what "it" refers to.

zodbdgc works with storages. relstorage conforms to the storage API.
It's possible some changes would be needed, but they should be minor.
Post by Jens W. Klein
I suspected it would not be fast enough for so many objects
No idea why, or what "fast enough" is. We use it on a database with
~200 million objects.
Post by Jens W. Klein
- so I skipped this alternative.
Good luck.

Jim
--
Jim Fulton
http://www.linkedin.com/in/jimfulton
Shane Hathaway
2013-11-18 16:29:04 UTC
Permalink
Post by Jens W. Klein
- iterate over all transactions starting with the lowest
transaction id (tid)
- for each transaction load the object states connected with tid
- for each state fetch its outgoing references and fill a table where
all incoming references of an object are stored as an array.
if a state has no outgoing references, write it to the table anyway
with an empty array
I would describe the RelStorage packing algorithm with the same words,
but since you reimplemented the algorithm from scratch, you found a more
optimal implementation for your database. Good work!

Shane
Jens W. Klein
2013-11-18 17:10:34 UTC
Permalink
Post by Shane Hathaway
Post by Jens W. Klein
- iterate over all transactions starting with the lowest
transaction id (tid)
- for each transaction load the object states connected with tid
- for each state fetch its outgoing references and fill a table where
all incoming references of an object are stored as an array.
if a state has no outgoing references, write it to the table anyway
with an empty array
I would describe the RelStorage packing algorithm with the same words,
but since you reimplemented the algorithm from scratch, you found a more
optimal implementation for your database. Good work!
Thanks for the feedback!

The only change I made regards the arrays - they are difficult to
handle in Postgres (no fun at all). It is much easier (and faster!) to
add one row per incoming reference (plus a reference to self).
Deleting is now easy: fetch all objects that have only the
self-reference, remove those objects together with their self-reference,
and delete the incoming references the deleted object contributed to
the objects it referenced.
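A small sketch of that layout, with SQLite standing in for PostgreSQL
(table and column names are made up): one row per incoming reference
plus a self-reference row, so garbage is simply "objects whose only
row is their self-reference".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inref (zoid INTEGER, from_zoid INTEGER)")
conn.executemany("INSERT INTO inref VALUES (?, ?)", [
    (0, 0),          # root's self-reference
    (1, 1), (1, 0),  # object 1: self-reference plus a reference from root
    (2, 2),          # object 2 has only its self-reference -> garbage
])

# garbage = objects whose single row is their own self-reference
garbage = [z for (z,) in conn.execute(
    "SELECT zoid FROM inref GROUP BY zoid"
    " HAVING COUNT(*) = 1 AND MIN(from_zoid) = zoid AND zoid != 0")]

for z in garbage:
    # drop the object's own rows and the references it contributed to others
    conn.execute("DELETE FROM inref WHERE zoid = ? OR from_zoid = ?", (z, z))
```

Compared with an array column, this keeps every delete a plain
row-level operation, which relational databases handle well.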

My SQL was a bit rusty after some years of ODBs. I'm sure a person with
deeper SQL knowledge could optimize the queries in some way.

regards Jens
--
Klein & Partner KG, member of BlueDynamics Alliance
Jens W. Klein
2013-11-21 22:57:32 UTC
Permalink
In case it's of interest, it's released here:
https://pypi.python.org/pypi/relstorage_packer/1.0

Thanks for your support! (even if my English was not always clear enough
to express my thoughts exactly)

regards Jens