In Python, which is faster: a) building a set from a list of n items, or b) inserting the n items into a set one at a time?

I found this page (http://wiki.python.org/moin/timecomplexity), but it did not have enough information to conclude which is faster.

It seems that inserting items one at a time could in the worst case take O(n*n) time (given that sets use hashing, like dicts), and O(n*1) time in the average case. Does initializing the set from the list offer a performance improvement?
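For concreteness, the two approaches look like this (a minimal sketch, with items standing in for any list of n hashable objects):

    # a) build the set from the list in one call
    set_a = set(items)

    # b) insert the n items one at a time
    set_b = set()
    for item in items:
        set_b.add(item)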
In terms of O() complexity it's the same, because both approaches do the same thing: insert n items into a set.

The difference comes from the implementation: the one clear advantage of initialization from an iterable is that it saves a lot of Python-level function calls, because initialization from an iterable is done wholly at the C level (**).
Indeed, a test on a list of 5,000,000 random numbers shows that adding them one at a time is slower:

    import random

    lst = [random.random() for _ in xrange(5000000)]

    set1 = set(lst)    # takes 2.4 seconds

    set2 = set()       # takes 3.37 seconds
    for item in lst:
        set2.add(item)
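For anyone re-running this on Python 3 (the benchmark above is Python 2, hence xrange), a rough equivalent using timeit might look like the sketch below; absolute times will vary by machine, but set(lst) is typically the faster of the two:

    import random
    import timeit

    lst = [random.random() for _ in range(5000000)]

    def build_at_once():
        return set(lst)

    def add_one_by_one():
        s = set()
        for item in lst:
            s.add(item)
        return s

    # timeit accepts a callable; number=1 runs each approach once.
    print(timeit.timeit(build_at_once, number=1))
    print(timeit.timeit(add_one_by_one, number=1))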
(**) Looking inside the CPython source for sets (Objects/setobject.c), item insertion boils down to a call to set_add_key. When initializing from an iterable, this function is called in a tight C loop:
    while ((key = PyIter_Next(it)) != NULL) {
        if (set_add_key(so, key) == -1) {
            Py_DECREF(it);
            Py_DECREF(key);
            return -1;
        }
        Py_DECREF(key);
    }
On the other hand, each call to set.add invokes an attribute lookup, which resolves to the C set_add function, which in turn calls set_add_key. Since the item addition itself is relatively quick (Python's hash table implementation is very efficient), these extra per-item calls build up.
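A common micro-optimization that targets exactly this attribute-lookup cost is to hoist the bound method out of the loop; a minimal sketch:

    set3 = set()
    add = set3.add     # resolve the .add attribute once, outside the loop
    for item in lst:
        add(item)      # still one Python-level call per item, but no lookup

This only removes the repeated lookup, not the Python-level call itself, so set(lst) should still win.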