Performance comparison: insert vs. build for Python set operations


In Python, which is faster: (a) building a set from a list of n items, or (b) inserting n items into a set one at a time?

I found this page (http://wiki.python.org/moin/timecomplexity), but it did not have enough information to conclude which is faster.

It seems that inserting items one at a time could take O(n*n) time in the worst case (given that sets use hash tables, like dicts), and O(n*1) in the average case. Does initializing the set from a list offer a performance improvement?

In terms of O() complexity it's the same, because both approaches do the same thing: insert n items into a set.

The difference comes from the implementation: one clear advantage of initializing from an iterable is that it saves a lot of Python-level function calls, because initialization from an iterable is done wholly at the C level (**).

Indeed, a test on a list of 5,000,000 random numbers shows that adding them one at a time is slower:

import random

lst = [random.random() for i in xrange(5000000)]

set1 = set(lst)        # takes 2.4 seconds

set2 = set()           # the loop below takes 3.37 seconds
for item in lst:
    set2.add(item)
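For a more controlled comparison, the two approaches can also be timed with the timeit module. This is a minimal sketch, assuming Python 3 (range instead of the xrange above) and a smaller list so each run stays quick; absolute numbers will vary by machine:

import random
import timeit

lst = [random.random() for _ in range(1_000_000)]

def build_at_once():
    # the whole list is consumed in a single C-level loop inside set()
    return set(lst)

def add_one_by_one():
    # one attribute lookup and method call per item
    s = set()
    for item in lst:
        s.add(item)
    return s

print("set(lst):      ", timeit.timeit(build_at_once, number=10))
print("add one by one:", timeit.timeit(add_one_by_one, number=10))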

(**) Looking inside the code of sets (Objects/setobject.c), item insertion boils down to a call to set_add_key. When initializing from an iterable, this function is called in a tight C loop:

while ((key = PyIter_Next(it)) != NULL) {
    if (set_add_key(so, key) == -1) {
        Py_DECREF(it);
        Py_DECREF(key);
        return -1;
    }
    Py_DECREF(key);
}

On the other hand, each call to set.add invokes an attribute lookup, which resolves to the C set_add function, which in turn calls set_add_key. Since the item addition itself is relatively quick (Python's hash table implementation is very efficient), these extra Python-level calls add up.
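One way to see how much of that overhead comes from the repeated attribute lookup is to hoist the bound method out of the loop. This is a minimal sketch (not from the original answer; Python 3 range assumed): it removes the per-iteration lookup but still pays one Python-level call per item, so set(lst) remains the fastest option.

import random

lst = [random.random() for _ in range(1_000_000)]

set3 = set()
add = set3.add         # resolve the attribute lookup once, outside the loop
for item in lst:
    add(item)          # still one Python-level call per item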

