jason.madden
2014-01-30 14:58:47 UTC
Hello ZODB dev,
I was recently trying to GC a large multi-database setup for the first time using zc.zodbdgc. The process wouldn't complete (or really even get started) because of an IndexError being thrown from `zc.zodbdgc.getrefs` (__init__.py line 287). As I traced through it, it began to look like the combination of `cPickle.Unpickler.noload` and multi-database persistent ids (which in ZODB are list objects) fails, generating an empty list instead of the expected [ref type, args] list documented in `ZODB.serialize`. This makes it impossible to correctly GC a multi-database.
I was curious if anyone else had seen this, or maybe I'm just doing something wrong? We solved our problem by using `load` instead of `noload`, but I wondered if there might be a better way?
Details:
I'm working under Python 2.7.6 and 2.7.3 with ZODB 4.0.0, zc.zodbdgc 0.6.1 and, later, zodbpickle 0.5.2. I reproduced most of these results on both Mac OS X and Linux.
Here is a sample record (class pickle followed by state pickle):

    >>> p = 'cBTrees.OOBTree\nOOBTree\nq\x01.((((X\x0c\x00\x00\x00Users_1_Prodq\x02]q\x03(U\x01m(U\x0cUsers_1_Prodq\x04U\x08\x00\x00\x00\x00\x00\x00\x00\x01q\x05czope.site.folder\nFolder\nq\x06tq\x07eQX\x0c\x00\x00\x00Users_2_Prodq\x08]q\t(U\x01m(U\x0cUsers_2_Prodq\nU\x08\x00\x00\x00\x00\x00\x00\x00\x01q\x0bh\x06tq\x0ceQX\x0b\x00\x00\x00dataserver2q\r(U\x08\x00\x00\x00\x00\x00\x00\x00\x10q\x0eh\x06tQttttq\x0f.'

Using `noload` for both pickles, the two multi-database references come back as empty lists:

    >>> import cPickle
    >>> import cStringIO
    >>> refs = []
    >>> u = cPickle.Unpickler(cStringIO.StringIO(p))
    >>> u.persistent_load = refs
    >>> u.noload()
    >>> u.noload()
    >>> refs
    [[], [], ('\x00\x00\x00\x00\x00\x00\x00\x10', None)]

Using `load` for the second (state) pickle, they are intact:

    >>> refs = []
    >>> u = cPickle.Unpickler(cStringIO.StringIO(p))
    >>> u.persistent_load = refs
    >>> u.noload()
    >>> u.load()
    >>> refs
    [['m', ('Users_1_Prod', '\x00\x00\x00\x00\x00\x00\x00\x01', <class 'zope.site.folder.Folder'>)],
     ['m', ('Users_2_Prod', '\x00\x00\x00\x00\x00\x00\x00\x01', <class 'zope.site.folder.Folder'>)],
     ('\x00\x00\x00\x00\x00\x00\x00\x10', <class 'zope.site.folder.Folder'>)]
The results are the same using zodbpickle or using an actual callback function instead of the append-directly-to-list shortcut.
If we fix the IndexError by checking the size of the list first, we miss all the cross-db references, meaning that a GC is going to be too aggressive. But using `load` is slower and requires access to all of the classes referenced. If anyone has run into this before or has other suggestions, I'd appreciate hearing them.
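One possible middle ground, sketched here only as an illustration: do a real `load`, but swap every global for a placeholder so that none of the referenced classes has to be importable. The sketch below uses Python 3's standard `pickle` module rather than `cPickle`, and builds its own small test pickle instead of reading a real ZODB record. The names `Ref`, `RefPickler`, `Stub`, `RefCollector` and `iter_refs` are hypothetical, `Fraction` is just an arbitrary importable class standing in for `Folder`, and the `'m'`/`'n'`/`'w'` reference codes follow my reading of `ZODB.serialize`. I have not run this against zc.zodbdgc or measured whether it is any faster than a plain `load`.

```python
import io
import pickle
from fractions import Fraction  # arbitrary class standing in for Folder


class Ref:
    """Hypothetical stand-in for a persistent object with a persistent id."""

    def __init__(self, pid):
        self.pid = pid


class RefPickler(pickle.Pickler):
    """Writes Ref instances as persistent ids (test-data helper only)."""

    def persistent_id(self, obj):
        return obj.pid if isinstance(obj, Ref) else None


class Stub:
    """Placeholder returned for every global; accepts and ignores anything."""

    def __new__(cls, *args, **kwargs):
        return super().__new__(cls)

    def __init__(self, *args, **kwargs):
        pass

    def __setstate__(self, state):
        pass


class RefCollector(pickle.Unpickler):
    """Loads a pickle without importing classes, recording persistent ids."""

    def __init__(self, file):
        super().__init__(file)
        self.pids = []

    def find_class(self, module, name):
        return Stub  # never import the real class

    def persistent_load(self, pid):
        self.pids.append(pid)
        return None


def iter_refs(pids, this_db):
    """Yield (database_name, oid) pairs; fail loudly on malformed ids."""
    for pid in pids:
        if isinstance(pid, tuple):        # (oid, class): same database
            yield this_db, pid[0]
        elif isinstance(pid, bytes):      # bare oid: same database
            yield this_db, pid
        elif isinstance(pid, list) and len(pid) == 2:
            kind, args = pid
            if kind in ('m', 'n'):        # (db_name, oid[, class])
                yield args[0], args[1]
            # anything else (e.g. 'w' weak references) is skipped
        else:
            raise ValueError('unexpected persistent id: %r' % (pid,))


OID_1 = b'\x00' * 7 + b'\x01'
OID_16 = b'\x00' * 7 + b'\x10'
state = {
    'Users_1_Prod': Ref(['m', ('Users_1_Prod', OID_1, Fraction)]),
    'Users_2_Prod': Ref(['m', ('Users_2_Prod', OID_1, Fraction)]),
    'dataserver2': Ref((OID_16, Fraction)),
}
buf = io.BytesIO()
RefPickler(buf).dump(state)

collector = RefCollector(io.BytesIO(buf.getvalue()))
collector.load()
refs = list(iter_refs(collector.pids, 'dataserver2'))
```

Raising on a malformed id (such as the empty lists `noload` produces) rather than skipping it means a broken unpickler stops the GC instead of silently dropping cross-database references. The `Stub` class here is deliberately minimal; records whose state relies on list or dict subclasses being rebuilt item by item would likely need a more capable placeholder.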
Thanks,
Jason