python - Best way (threads/event-driven) of fetching data from many web pages


I don't want to start a holy war over this; I just need advice so I can continue development.

I need to write a crawler that can fetch data from a list of URLs and parse it.

I'm going to use either Ruby (Mechanize + Nokogiri) or Python (mechanize + BeautifulSoup).

But I need to fetch and handle the data in parallel for efficiency, and that's my big problem right now.

Mechanize (in both languages) is not thread safe as far as I know, and many programmers say that using threads is not "good practice". On the other hand, I have no idea about event-driven programming techniques or how they could be applied in this case.
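To make the question concrete, this is the kind of threaded fetching I have in mind (a rough sketch only; the URL list and `parse()` are placeholders, and each worker would use its own browser/parser instance rather than sharing one, which is the part I'm unsure about):

```python
# Rough sketch: fetch a list of URLs in parallel with a thread pool.
# Nothing is shared between threads; each call creates its own request
# and its own parser, so the mechanize thread-safety concern is sidestepped.
from concurrent.futures import ThreadPoolExecutor, as_completed
import urllib.request

URLS = [
    "http://example.com/page1",   # placeholder URLs
    "http://example.com/page2",
]

def parse(html):
    # Placeholder: plug in BeautifulSoup/Nokogiri-style extraction here,
    # creating the parser object per call, not globally.
    return len(html)

def fetch_and_parse(url):
    # One connection per call; swap in a per-thread mechanize.Browser
    # here if form handling is needed.
    with urllib.request.urlopen(url, timeout=10) as resp:
        html = resp.read()
    return url, parse(html)

results = {}
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = {pool.submit(fetch_and_parse, u): u for u in URLS}
    for fut in as_completed(futures):
        url, data = fut.result()
        results[url] = data
```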

Any advice is appreciated. Thanks.

I've been using Scrapy with great success. It's quite straightforward and allows you to run multiple crawlers at once. It can output JSON, XML, etc., or write directly to a database. It's definitely worth a look.
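A minimal spider looks roughly like this (illustrative names, URLs, and selectors only). Scrapy schedules requests through its own asynchronous engine, so fetching is concurrent without you managing threads yourself:

```python
# Minimal Scrapy spider sketch.
# Run with:  scrapy runspider page_spider.py -o items.json
import scrapy

class PageSpider(scrapy.Spider):
    name = "pages"
    start_urls = [
        "http://example.com/page1",   # placeholder URLs
        "http://example.com/page2",
    ]

    def parse(self, response):
        # Called once per fetched page; requests are scheduled
        # asynchronously by Scrapy's engine, so many pages are
        # in flight at once.
        yield {
            "url": response.url,
            "title": response.css("title::text").get(),
        }
```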
