we working on project feature real-time statistics of actions (e.g. clicks). on every click, log information date, age , gender (these come facebook), location, etc.
we discussing best place store these information , use them real-time stats. display aggregate statistics: example, number of clicks, number of clicks made male/female, number of clicks divided age groups (e.g. 18-24, 24-30...).
since on site using mongodb everywhere, colleague thought should store statistics inside well. i, however, prefer sql-based database task, mysql (or maybe drizzle), because believe sql better when doing operations data aggregation. although there's overhead of parsing sql, think mysql/drizzle may faster no-sql databases here. , inserts not slow too, when using insert delayed queries.
please note not need perform joins or collect data multiple tables/collections. thus, don't care if database different. however, care scalability , reliability. building (hopefully) become big, , we've designed every single line of code scalability in mind.
what think this? there reason prefer mongodb on mysql/drizzle this? or indifferent? 1 use, if us?
thank you, alessandro
so buddymedia using of this. gilt groupe has done pretty cool hummingbird (node.js + mongodb).
having worked large online advertiser in social media space, can attest real-time reporting pain. trying "roll-up" 500m impressions day challenge, trying real time worked, carried significant limitations. (like delayed 5-minutes :)
frankly, type of problem 1 of reasons started using mongodb. , i'm not one. people using mongodb kinds of real-time analytics: server monitoring, centralized logging, dashboard reporting.
the real key when doing type of reporting understand data structure different mongodb, you're going avoid "aggregation" queries, queries , output charts going different. there's coding work on client side.
here's key may point in right direction doing mongodb. take @ following data structure:
{ date: "20110430", gender: "m", age: 1, // 1 bucket impression_hour: [ 100, 50, ...], // 24 of these impression_minute: [ 2, 5, 19, 8, ... ], // 1440 of these clicks_hour: [ 10, 2, ... ], ... } there tweaks here, appropriate indexes, maybe mushing data+gender+age _id. that's kind of basic structure of click analytics mongodb. it's easy update impression , clicks { $inc : { clicks_hour.0 : 1 } }. update whole document atomically. , it's pretty natural report on. have array containing hourly or minute-level data points.
hopefully that's points in right direction.
Comments
Post a Comment