Their explanation for the mathematical mismatch is that they only accept votes from regular voters. However, the raw data for this set of votes is not given, and IMHO their rankings don't seem to be of particularly higher quality as a result of this filtering. Bowling for Columbine (a very controversial and highly political, but nevertheless extremely popular movie) briefly ranked amongst the top 50 movies on IMDb, then suddenly disappeared, followed by an update to their FAQs indicating that documentaries were to be excluded from the top 250 list (prior to this, they had not been -- in particular I remember seeing Hoop Dreams amongst the top 250 at one point).
In the list below I have rectified these two major problems. Documentaries from their top 50 documentary list have been added, those that had too few votes eliminated, and all scores were recalculated from scratch using the exact formula as originally described by IMDb (but taking everyone's votes into account.) The results are startling to say the least.
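For readers who want to see what such a recalculation might look like, here is a minimal sketch. It assumes the formula in question is the weighted rating IMDb published for its top 250, WR = (v/(v+m))·R + (m/(v+m))·C, where R is a movie's mean vote, v its vote count, m the minimum number of votes required, and C the mean vote across all movies. The function names, the dictionary layout, and the values of m and C used below are illustrative assumptions, not the actual tool or IMDb's numbers.

```python
def weighted_rating(R, v, m, C):
    """Weighted rating: (v/(v+m))*R + (m/(v+m))*C.

    R: mean vote for the movie (here: over everyone's votes)
    v: number of votes for the movie
    m: minimum votes required to be listed (assumed value)
    C: mean vote across all movies (assumed value)
    """
    return (v / (v + m)) * R + (m / (v + m)) * C


def rank_movies(movies, m, C):
    """Drop titles with fewer than m votes, then sort best first."""
    eligible = [mv for mv in movies if mv["votes"] >= m]
    return sorted(
        eligible,
        key=lambda mv: weighted_rating(mv["mean"], mv["votes"], m, C),
        reverse=True,
    )


# Made-up sample data, for illustration only.
sample = [
    {"title": "A", "mean": 9.0, "votes": 500},   # too few votes, eliminated
    {"title": "B", "mean": 8.5, "votes": 2000},
    {"title": "C", "mean": 8.8, "votes": 1000},
]
ranked = rank_movies(sample, m=1000, C=6.0)
```

Note how the formula pulls a movie with few votes toward the overall mean C, so a slightly lower mean with many more votes can still rank higher.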
I should point out that I am not a movie industry shill, nor do I have any hidden agenda. The IMDb top 250 just did not seem very accurate to me, so I wanted to fix it.
Update: From the September 30, 2003 refresh, it seems likely from the numeric values that there should be a completely new set of movies beyond the rank of 240. I am now using an entry retention algorithm so that as I update the list, movies which fall out of IMDb's top 250 list are not forgotten. Hopefully over time, this along with manual entries will give more accurate entries for the lower end of this list.
So as it stands, I would say that the movies ranked up to about the top 235 or so are definitely more accurate in this list than in IMDb's in all respects, while those listed after that point are more accurately ordered here, but likely closer to their actual rank on IMDb (in theory -- since many of these are not even ranked by IMDb).
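One simple way to get this kind of retention is to keep a running union of every title ever seen and re-score all of them on each update. The sketch below assumes that approach; the function name and the URL keys are hypothetical and the real algorithm may differ.

```python
def merge_tracked(tracked, current_top):
    """Return previously tracked titles plus the current IMDb list.

    Both arguments map a title URL to its title. Entries in current_top
    overwrite older ones; entries that dropped off the current list are
    kept so they can still be re-scored.
    """
    merged = dict(tracked)       # copy, so the caller's dict is untouched
    merged.update(current_top)   # add new entries, refresh existing ones
    return merged


# Hypothetical URLs and titles, for illustration only.
tracked = {"/title/a": "Movie A", "/title/b": "Movie B"}
current_top = {"/title/b": "Movie B", "/title/c": "Movie C"}
merged = merge_tracked(tracked, current_top)
```

Here "Movie A" has fallen out of the current list but remains in the merged set.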
Update: IMDb changed their server configuration to limit the number of accesses per second that I or anyone else can make to their servers. Of course, it's not my intention at all to cause any kind of denial of service for them -- the tool I use to pull down their webpages observes "530 messages", which the server is supposed to issue when it's overloaded. The problem is that when IMDb gets tired of sending me 530 messages, they start just giving me a substitute HTML page (which is worse for them and me) with a message indicating that I'm hitting their site too often. There are much better ways of dealing with this -- my tool (and any proper tool that understands 530s) tries to do the right thing and decreases the retry frequency exponentially. So they really could just keep sending me 530s (which should be very low overhead for their server), and it would work out for both sides.
This is the primary reason that there was such a lag in the updating of this page. In the absence of a fix on their side, I implemented a workaround to slow down the downloads from my side (to once every 10 seconds as they suggest), which seems to have worked just fine.
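For the curious, exponential backoff on a "busy" response can be sketched roughly as follows. This is not the actual tool: the function name, the `fetch` callable, and the delay values are assumptions. The text above speaks of "530 messages"; the standard HTTP status for an overloaded server is 503 Service Unavailable, so the sketch treats both codes as busy by default.

```python
import time


def fetch_with_backoff(fetch, url, busy_codes=(503, 530), base_delay=1.0,
                       max_delay=600.0, max_tries=8, sleep=time.sleep):
    """Call fetch(url) -> (status, body), doubling the wait after each
    busy response until a non-busy status arrives or tries run out."""
    delay = base_delay
    for attempt in range(max_tries):
        status, body = fetch(url)
        if status not in busy_codes:
            return status, body
        if attempt < max_tries - 1:
            sleep(delay)                       # back off before retrying
            delay = min(delay * 2, max_delay)  # exponential, but capped
    return status, body                        # still busy: give up


# Simulated server: busy three times, then fine (no network involved).
responses = [(503, ""), (503, ""), (503, ""), (200, "ok")]
slept = []
result = fetch_with_backoff(lambda u: responses.pop(0),
                            "http://example.invalid/page",
                            sleep=slept.append)
```

Passing `sleep` in as a parameter is only there so the waiting can be observed without actually pausing.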
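A fixed minimum gap between requests can be enforced with a small throttle like the one sketched here. The class name and the injectable clock are assumptions made for illustration; only the 10-second interval comes from the text.

```python
import time


class Throttle:
    """Ensure at least min_interval seconds pass between wait() calls."""

    def __init__(self, min_interval=10.0, clock=time.monotonic,
                 sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous request, if any

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)  # pause only for the missing part
                now = self.clock()
        self.last = now


# Demonstration with a fake clock so nothing actually sleeps.
fake_time = [0.0]
slept = []


def fake_sleep(seconds):
    slept.append(seconds)
    fake_time[0] += seconds


throttle = Throttle(10.0, clock=lambda: fake_time[0], sleep=fake_sleep)
throttle.wait()        # first request goes out immediately
fake_time[0] += 3.0    # pretend the download took 3 seconds
throttle.wait()        # waits out the remaining 7 seconds
```

In real use one would call `Throttle().wait()` before each download and leave the clock and sleep arguments at their defaults.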
Update: Just for those geeks out there who are interested: I have significantly revamped the tool I use to pull down IMDb. The primary fix is to redownload any URL that cannot be parsed (sometimes IMDb returns completely corrupted HTML). This fixes an issue where movies would sometimes "disappear" off the chart. I also now retain the URL <-> title mapping, so that I can bypass the main pages of each movie and just fetch the ratings pages directly (this just makes it faster for me, which means I will be less disinclined to update this page). Finally, the tool now operates incrementally -- so if a run fails (possibly because I am too impatient and stop it because I need to use my computer for something else) I can rerun it and it will pick up where it left off.
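Two of these ideas, re-downloading pages that fail to parse and resuming an interrupted run, can be sketched together. Everything here is an assumption for illustration: the function names, the JSON state file, and the convention that `parse` returns `None` for corrupted HTML are not taken from the actual tool.

```python
import json
import os


def load_state(path):
    """Load results saved by an earlier run, or start empty."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    return {}


def save_state(path, state):
    """Write to a temp file, then rename, so a kill mid-write is harmless."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def fetch_parsed(fetch, parse, url, max_tries=3):
    """Re-download url until parse() succeeds; None means unparseable."""
    for _ in range(max_tries):
        result = parse(fetch(url))
        if result is not None:
            return result
    return None


def run(urls, fetch, parse, state_path):
    """Process urls, skipping any already stored in the state file."""
    state = load_state(state_path)
    for url in urls:
        if url in state:
            continue                       # done in an earlier run
        result = fetch_parsed(fetch, parse, url)
        if result is not None:
            state[url] = result
            save_state(state_path, state)  # checkpoint after each page
    return state
```

Because the state is saved after every page, stopping the run partway loses at most the page in progress, and a rerun with the same state file skips everything already fetched.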
This site has been noticed!
The corrected list. Last updated: 04-11-2013