Today, even though Redfin.com looks exactly the same, it’s actually a little bit more precise in nearly 14 million different ways.
As Arthur Patterson explains on our devblog, the new version of Redfin.com that we began rolling out last week is another milestone in our Freakish Depth strategy – to provide more and better data about every home for sale in the markets we serve, and to build an online application that takes consumers deeper into the home-buying process. We set out on this strategy in January 2008, after we realized that real estate consumers’ appetite for data is very high and their tolerance for quality glitches is low.
Since then, our quest to show every home for sale, in exactly the right location, has taken us to some obscure places. For example, over the past two weeks, we’ve been re-analyzing the position of each of 17,643,511 properties on Redfin’s map, getting a second and sometimes a third opinion on the location of each. In about 80% of cases, we adjusted a property’s location on our map a little or a lot, so that each property would be placed dead-center on the parcel of land defined by its deed.
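The post doesn't say how Redfin defines the "dead-center" of a parcel. One common choice is the polygon's area-weighted centroid, computed with the shoelace formula. The sketch below assumes the deed's parcel boundary is available as an ordered list of (x, y) vertices in a planar, projected coordinate system (for example, meters). The function name `parcel_centroid` is hypothetical. For strongly concave parcels (an L-shaped or flag lot, say), the centroid can fall outside the parcel, so a production system might need a different "center" rule.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def parcel_centroid(vertices: Sequence[Point]) -> Point:
    """Area-weighted centroid of a simple (non-self-intersecting) polygon.

    Hypothetical helper: vertices are (x, y) pairs in a planar projection,
    listed in order around the parcel boundary; the first vertex is not repeated.
    Works for both clockwise and counter-clockwise vertex order.
    """
    n = len(vertices)
    if n < 3:
        raise ValueError("a parcel polygon needs at least 3 vertices")
    twice_area = 0.0  # signed 2A from the shoelace formula
    cx = 0.0
    cy = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0       # shoelace term for this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # C = sum((p_i + p_{i+1}) * cross_i) / (6A), and 6A == 3 * twice_area.
    return (cx / (3.0 * twice_area), cy / (3.0 * twice_area))
```

For a rectangular 40 m × 30 m lot with a corner at the origin, this places the pin at the middle of the lot, (20, 15).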
Mapping 95%+ of Properties In Our Markets
As a result, the new version of Redfin has improved the percentage of properties we can map from 94.4% to 95.4%, decreasing unmapped properties by nearly 18%. More important, we’ve increased the percentage of for-sale listings we can map to an exact point from 55.4% last week to 70.2% now. (And this is just listings; considering all properties on Redfin, which includes both listings and properties that have sold in the past 20 – 30 years, we can point-map well over 80%.) In some places, like San Francisco, one of our partners expects to get significantly better point data over the next few months.
We now map only 17.4% of listings by interpolation, using the address just to guess where the listing is on a block (an example of an interpolation would be “about two-thirds down, on the left-hand side”). We also use location data provided by assessors’ offices and broker databases (Multiple Listing Services, or MLSs) to locate another 6.1% of listings on a map.
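To make "about two-thirds down, on the left-hand side" concrete, here is a minimal sketch of address-range interpolation. It is not Redfin's or any partner's actual geo-coder. It assumes a hypothetical street-block record with planar endpoint coordinates and the range of house numbers the block covers. It also assumes odd numbers sit on one side and even numbers on the other, a common US convention that does not hold everywhere. The `setback` distance that pushes the pin off the street centerline is likewise an assumed parameter.

```python
import math
from dataclasses import dataclass


@dataclass
class StreetSegment:
    """Hypothetical block record: centerline endpoints plus its address range."""
    start: tuple[float, float]  # (x, y) in a planar projection, e.g. meters
    end: tuple[float, float]
    first_number: int           # lowest house number on the block
    last_number: int            # highest house number on the block
    odd_on_left: bool = True    # assumed parity convention; varies by city


def interpolate_address(seg: StreetSegment, house_number: int,
                        setback: float = 10.0) -> tuple[float, float]:
    """Guess a position by assuming numbers are spread evenly along the block."""
    if not seg.first_number <= house_number <= seg.last_number:
        raise ValueError("house number outside this segment's range")
    span = seg.last_number - seg.first_number
    t = 0.0 if span == 0 else (house_number - seg.first_number) / span
    (x0, y0), (x1, y1) = seg.start, seg.end
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("zero-length segment")
    # Point a fraction t of the way down the block, on the centerline.
    px, py = x0 + t * dx, y0 + t * dy
    # Unit normal pointing to the left of the start-to-end direction.
    nx, ny = -dy / length, dx / length
    is_odd = house_number % 2 == 1
    side = 1.0 if is_odd == seg.odd_on_left else -1.0
    return (px + side * setback * nx, py + side * setback * ny)
```

On a block numbered 100 to 200 that runs along the x-axis, house 167 lands about two-thirds of the way down the block on the left side. This also shows why interpolation is only a guess: real lots are rarely spaced evenly, and numbering can skip.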
Of the 6.3% of listings that remain totally unmapped, nearly 6 percentage points are unmapped because the listing agent chose to withhold the address or entered a nonsensical one such as “0 Main Street.” We are thus near the theoretical maximum of listings that we can map. If you include properties that aren’t for sale, only 4.6% of the total are unmapped, which is also near the theoretical maximum.
Getting a Second Opinion for Every Unmapped Property
How did we increase our precision to these new heights? Well, for starters, we shelled out for two different geo-coders. A geo-coder is a service that locates addresses on a map; each of ours is run by a different Redfin partner. I like to think – and it is probably true – that a geo-coder is a descendant of the Cold-War technology the U.S. once developed to drop a nuclear bomb directly on the bathroom of the Moscow apartment AND the country dacha of every member of the Politburo.
Using two geo-coders allows us to pump every address through both to see which one can come up with a match. And on top of that, we run listings through our special human-powered PITA geo-coder; whenever a home-owner emails us to complain about the location of a property on our map, we move it by hand, and add a note to our database so we remember the correct spot.
Yikes. Does this scale? No. Was I initially opposed to moving properties by hand? Yes. But it is hard to tell employees to care less about quality, particularly when you start from the premise that your website is the beginning of a trusting relationship between Redfin and the consumer. Here’s the full breakdown of how we map listings, in order of priority:
| Mapping Technique (in priority order) | Old Way | New Way |
|---|---|---|
| Point-Mapped by Hand | 1.1% | 1.1% |
| Point-Mapped by Geo-coder #1 | 54.3% | 63.1% |
| Point-Mapped by Geo-coder #2 | 0% | 6.0% |
| Interpolated by Geo-coder #2 | 0% | 16.4% |
| Interpolated by Geo-coder #1 | 28.3% | 1.0% |
| Located by MLS or Tax Assessor | 7.5% | 6.1% |
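The priority order in the table amounts to a fall-through ladder: take the first technique that produces a location, and treat the listing as unmapped only if every rung fails. The sketch below shows that idea under stated assumptions. It is not Redfin's code. Each geo-coder is modeled as a hypothetical function that returns `(lat, lon, precision)` or `None`, where `precision` is `"point"` or `"interpolated"`. The hand-corrected locations are a plain dictionary keyed by address, standing in for the database notes described above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical geo-coder interface: (lat, lon, precision) or None on no match,
# where precision is "point" (exact match) or "interpolated" (a guess on the block).
GeocodeResult = Optional[tuple[float, float, str]]
Geocoder = Callable[[str], GeocodeResult]


@dataclass(frozen=True)
class Location:
    lat: float
    lon: float
    method: str  # which rung of the ladder produced this location


def locate_listing(address: str,
                   manual_fixes: dict[str, tuple[float, float]],
                   geocoder_1: Geocoder,
                   geocoder_2: Geocoder,
                   mls_or_assessor: Optional[tuple[float, float]]) -> Optional[Location]:
    # 1. Hand-corrected locations always win, so a fixed mistake stays fixed.
    if address in manual_fixes:
        lat, lon = manual_fixes[address]
        return Location(lat, lon, "hand")
    # Ask each geo-coder once and reuse the answers below.
    r1, r2 = geocoder_1(address), geocoder_2(address)
    # 2-5. Prefer point matches over interpolations, in the table's order.
    ladder = [(r1, "point", "geocoder1-point"),
              (r2, "point", "geocoder2-point"),
              (r2, "interpolated", "geocoder2-interp"),
              (r1, "interpolated", "geocoder1-interp")]
    for result, wanted, label in ladder:
        if result is not None and result[2] == wanted:
            return Location(result[0], result[1], label)
    # 6. Fall back to coordinates supplied by the MLS or tax assessor.
    if mls_or_assessor is not None:
        return Location(mls_or_assessor[0], mls_or_assessor[1], "mls-or-assessor")
    return None  # unmapped, e.g. a withheld or nonsensical address
```

Querying both geo-coders up front is what makes the second opinion possible. A point match from geo-coder #2 can override a mere interpolation from geo-coder #1, which is how the "Interpolated by Geo-coder #1" share fell from 28.3% to 1.0%.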
Was running each location through a 6-point check worth it? Only you can be the judge. For the Freak in Freakish Depth is none other than you, the gentle reader, who is typically about to plunk down $700,000 for a house. All our professional lives, software people have dreamed in vain of finding people like you, who care as much as we do about data quality, and who will take as much data as we can publish.
The Freak Behind The Freak
And the Freak behind the Freak is Arthur Patterson and his colleagues, who developed this new algorithm for precisely locating properties.
For every place and time in history there is a man or woman uniquely suited to what the age demands: Winston Churchill had been rattling sabers since Gallipoli but finally found an adversary worth fighting in Nazi Germany. Joe Montana had had his arm strength questioned since high school, but fit right into an offense that prized finesse over fifty-yard bombs.
Arthur and his colleagues bring the same enthusiasm to the vast, untamed wilderness of unmapped listings, fat-fingered addresses and freshly bulldozed cul-de-sacs. It’s a problem so daunting that most real estate websites have ignored it, instead publishing the good alongside the bad.
But we have decided to invest in data quality because it is an area where we really can be different: unlike other online real estate companies, Redfin as a broker has access to the broker databases, so we have much better data. But unlike the other brokers, we as Internet software folks are more willing and able to publish all of the data available to us.
If you find a property on Redfin that isn’t mapped right – annoyingly, there are still thousands – email us at support (at) Redfin (dot) com. We may not get every property every time, but we’ll try. And, at least where property location is concerned, we’ll never make the same mistake twice.
Finally, if there are other areas where you’d like to see Redfin improve its data quality or gather new data, let us know in the comments below. We’re always eager to hear your ideas.