Redfin Gets Better in Almost 14 Million Different Places

Today, even though Redfin.com looks exactly the same, it’s actually a little bit more precise in nearly 14 million different ways.

As Arthur Patterson explains on our devblog, the new version of Redfin.com that we began rolling out last week is another milestone in our Freakish Depth strategy – to provide more and better data about every home for sale in the markets we serve, and to build an online application that takes consumers deeper into the home-buying process. We set out on this strategy in January 2008, after we realized that real estate consumers’ appetite for data is very high, and their tolerance for quality glitches is low.

Since then, our quest to show every home for sale, in exactly the right location, has taken us to some obscure places. For example, over the past two weeks, we’ve been re-analyzing the position of each of 17,643,511 properties on Redfin’s map, getting a second and sometimes a third opinion on the location of each. In about 80% of cases, we adjusted a property’s location on our map a little or a lot, so that each property would be placed dead-center on the parcel of land defined by its deed.
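
For the technically curious: “dead-center” here means, roughly, the geometric centroid of the parcel polygon. Here is a minimal sketch of the standard shoelace-based centroid calculation – illustrative only, with made-up coordinates, and not necessarily how our partners compute it:

```python
# Sketch: pin a property at the centroid of its parcel polygon.
# Coordinates are invented (lon, lat) pairs, purely for illustration.

def parcel_centroid(vertices):
    """Centroid of a simple (non-self-intersecting) polygon.

    vertices: ordered list of (x, y) corners, first corner not repeated.
    """
    area2 = 0.0          # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # The sign of area2 cancels in the division, so winding order doesn't matter.
    return (cx / (3.0 * area2), cy / (3.0 * area2))

# A made-up rectangular parcel:
parcel = [(-122.4000, 47.6100), (-122.4000, 47.6105),
          (-122.3995, 47.6105), (-122.3995, 47.6100)]
print(parcel_centroid(parcel))  # -> approximately (-122.39975, 47.61025)
```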

Mapping 95%+ of Properties In Our Markets
As a result, the new version of Redfin has improved the percentage of properties we can map from 94.4% to 95.4%, decreasing unmapped properties by 17%. More important, we’ve increased the percentage of for-sale listings we can map to an exact point from 55.4% last week to 70.2% now. (And this is just listings; considering all properties on Redfin, which includes both listings and properties that have sold in the past 20 – 30 years, we can point-map well over 80%.) In some places, like San Francisco, one of our partners expects to get significantly better point data over the next few months.

We now map only 17.4% of listings by interpolation, using the address just to guess where the listing is on a block (an example of an interpolation would be “about two-thirds down, on the left-hand side”). We also use location data provided by assessors’ offices and broker databases (Multiple Listing Services, or MLSs) to locate another 6.1% of listings on a map.
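
For the curious, the interpolation itself is simple proportional math. Here’s a toy sketch – the addresses and coordinates are invented, and real geo-coders also correct for street side, setbacks, and uneven lot sizes:

```python
# Sketch: estimate where a house number falls along a block, given the
# house-number range the geo-coder knows for that block segment.
# All numbers here are invented for illustration.

def interpolate_along_block(house_number, low, high, start, end):
    """Linear position of house_number on a block whose addresses run
    from `low` at the `start` corner to `high` at the `end` corner."""
    fraction = (house_number - low) / (high - low)
    return (start[0] + fraction * (end[0] - start[0]),
            start[1] + fraction * (end[1] - start[1]))

# "About two-thirds down the block": #166 on a block numbered 100-198.
print(interpolate_along_block(166, 100, 198,
                              start=(-122.3301, 47.6050),
                              end=(-122.3301, 47.6060)))
# fraction = 66/98, roughly 0.67, so the pin lands about two-thirds along.
```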

[Image: parcelscreenshot.jpg]

Of the remaining 6.3% of listings that are totally unmapped, nearly 6% are unmapped because the real estate agent chose to withhold the listing’s address or entered a nonsensical address such as “0 Main Street.” We are thus near the theoretical maximum of listings that we can map. If you include properties that aren’t for sale, only 5.4% of the total are unmapped, also near the theoretical maximum.

Getting a Second Opinion for Every Unmapped Property
How did we increase our precision to these new heights? Well, for starters, we shelled out for two different geo-coders. A geo-coder is a service for locating addresses on a map; ours are run by two different Redfin partners. I like to think – and it is probably true – that a geo-coder is a descendant of the Cold-War technology the U.S. once developed to drop a nuclear bomb directly on the bathroom of the Moscow apartment AND the country dacha of every member of the Politburo.

Using two geo-coders allows us to pump every address through both to see which one can come up with a match. And on top of that, we run listings through our special human-powered PITA geo-coder; whenever a home-owner emails us to complain about the location of a property on our map, we move it by hand, and add a note to our database so we remember the correct spot.

Yikes. Does this scale? No. Was I initially opposed to moving properties by hand? Yes. But it is hard to tell employees to care less about quality, particularly when you start from the premise that your website is the beginning of a trusting relationship between Redfin and the consumer. Here’s the full breakdown of how we map listings by order of priority:

Mapping Technique Old Way New Way
Point-Mapped by Hand 1.1% 1.1%
Point-Mapped by Geo-coder #1 54.3% 63.1%
Point-Mapped by Geo-coder #2 0% 6.0%
Interpolated by Geo-coder #2 0% 16.4%
Interpolated by Geo-coder #1 28.3% 1.0%
Located by MLS or Tax Assessor 7.5% 6.1%
Unmapped 9.9% 6.3%
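
In code terms, that priority order amounts to a simple cascade: try each source in turn and take the first hit. Here’s a hypothetical sketch reconstructed from the table above – the function names, signatures, and toy data are all illustrative, not our actual implementation:

```python
# Hypothetical sketch of the mapping cascade. Each source returns a
# (lon, lat) pair or None; the first source to answer wins.

# Hand corrections collected via the human-powered "PITA geo-coder".
HAND_MAPPED = {"123 Example Ave, Seattle, WA": (-122.33, 47.61)}

def hand_mapped(addr):
    return HAND_MAPPED.get(addr)

def geocoder1_point(addr):
    return None  # stand-in for an exact-point call to geo-coder #1

def geocoder2_point(addr):
    return None  # stand-in for an exact-point call to geo-coder #2

def geocoder2_interpolated(addr):
    return None  # position guessed along the block by geo-coder #2

def geocoder1_interpolated(addr):
    return None  # position guessed along the block by geo-coder #1

def mls_or_assessor(addr):
    # Lat/long straight from the MLS or tax assessor: a last resort,
    # used only when the source has tested as reasonably accurate.
    return None

PRIORITY = [hand_mapped, geocoder1_point, geocoder2_point,
            geocoder2_interpolated, geocoder1_interpolated,
            mls_or_assessor]

def map_listing(addr):
    for source in PRIORITY:
        hit = source(addr)
        if hit is not None:
            return hit, source.__name__
    return None, "unmapped"  # withheld or nonsensical address

print(map_listing("123 Example Ave, Seattle, WA"))
# -> ((-122.33, 47.61), 'hand_mapped')
```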

Was running each location through a 6-point check worth it? Only you can be the judge. For the Freak in Freakish Depth is none other than you, the gentle reader, who is typically about to plunk down $700,000 for a house. All our professional lives, software people dreamed in vain of finding people like you, who care as much as we do about data quality, and who will take as much data as we can publish.

The Freak Behind The Freak
And the Freak behind the Freak is Arthur Patterson and his colleagues, who developed this new algorithm for precisely locating properties.

For every place and time in history there is a man or woman uniquely suited to what the age demands: Winston Churchill had been rattling sabers since Gallipoli but finally found an adversary worth fighting in Nazi Germany. Joe Montana’s arm strength had been questioned since high school, but he fit right into an offense that prized finesse over fifty-yard bombs.

Arthur and his colleagues bring the same enthusiasm to the vast, untamed wilderness of unmapped listings, fat-fingered addresses and freshly bulldozed cul-de-sacs. It’s a problem so daunting that most real estate websites have ignored it, instead publishing the good alongside the bad.

But we have decided to invest in data quality because it is an area where we really can be different: unlike other online real estate companies, Redfin as a broker has access to the broker databases, so we have much better data. But unlike the other brokers, we as Internet software folks are more willing and able to publish all of the data available to us.

If you find a property on Redfin that isn’t mapped right – annoyingly, there are still thousands – email us at support (at) Redfin (dot) com. We may not get every property every time, but we’ll try. And, at least where property location is concerned, we’ll never make the same mistake twice.

Finally, if there are other areas where you’d like to see Redfin improve its data quality or gather new data, let us know in the comments below. We’re always eager to hear your ideas.

Discussion

  • http://www.consultantninja.com Consultant Ninja

    Your post describes a rare brilliance in businesses.

    Many organizations are driven by either grandiose strategic visions (led by MBAs with strategic differentiating blue-ocean leveraging advantages in their starry eyes), or by inward-looking bureaucratic organizations that change by moving people and roles.

    The rare trait you show is the power of starting with a basic but extensible product architecture and then continually finding small, incremental improvements to the customer value proposition.

    At IBM’s e-business group in the 90s, I remember this being called “start small, grow fast.”

    At another client I worked for, we kept measuring changes by estimating the % improvement delivered. We never wanted a 50% or 100% improvement to their product; we just wanted to keep finding lots of tweaks and adds that really made the service better. Every week we thought of, and implemented, 2%, 4%, 5% improvements. This rapid iteration (enabled by the flexible product architecture) quickly added up to a truly kick-ass product.

    Which is what you have done here at Redfin. Well done sirs, well done.

  • http://www.diversesolutions.com Andrew Mattie

    I find it really interesting that you have a measurable percentage of your properties mapped by the MLS. I assume that, in some cases, it means you’re using the latitude/longitude pairs that the MLS provides in their data? Our internal testing showed that the data from the MLSs is almost always horribly inaccurate, and so we usually ditch it in favor of the three geocoding sources we use. It’s a really tough decision to make the call in each case, just because we’re concerned that the listing agents themselves actually had input into where they wanted the property to be located, but we make the decision to sacrifice for the greater good.

    Do you actually do hand-mapping in-house? Did you know that Proxix (http://www.proxix.com/) has a “hand-mapping” service? It’s expensive, but it might be something you’re interested in.

    Overall, your percentage of “ungeocodables” is surprisingly low. I’m really proud of our own system and the geocoding tips we learned in our travels (which I shared with Arthur via email last week), but our percentage of ungeocodables is definitely higher than yours. My suspicion is that a bunch of our smaller MLSs have less mapping data available, and so they bring down our percentage of geocodable properties. In large MLSs though, like SoCal, our geocoding percentage is very high (95.7%).

  • arthur.patterson

    @Andrew:

    Yes, we definitely use the lat/long pairs provided by the MLS in some cases. I say some because, like you mentioned, the data can be horribly inaccurate, so we only take it from the sources that our testing found to be reasonably close most of the time, and then only if we are unable to get a geocode of any reasonable accuracy through any other means. This helps primarily with vacant land; its other impacts are minimal.

    Yes, we do very little hand-mapping in house, usually as a customer support request. We’re investigating whether we can get the user-supplied mappings from Google Maps.

    Thank you for the insights you shared! Your theory about smaller MLSs is probably right, and hopefully as the mergers increase the data quality will go up.

    It looks like you’ve actually got us beat (slightly!) for SoCalMLS. Do you map the undisclosed address listings there? I’m pretty sure you’re allowed to by MLS rules, but it’s not something we’ve tackled yet.

  • Asela Dahana

    Brilliant! Wow, I had no idea. As another person said, small incremental improvements are always better than grandiose percentages. I am definitely buying a house via Redfin. The data you guys provide is incredible.

  • http://www.consultantninja.com Consultant Ninja

    Glenn-

    Along this vein of continually finding small improvements, I have a suggestion on refining your listing page.

    Currently you have a $/sq ft line chart on every listing page. You’ve set the baseline of the chart axis at zero, which is normally a good practice for line charts, since variance is properly put into the context of the overall value.

    However, in this case, I don’t believe that’s what people are looking for. Customers are comparing that trend to the current $/sqft price of the listing they are looking at.

    For example, a customer is interested in seeing that average prices in that zip code have varied between $200 and $250/sqft over the last few years, and that they’re currently at $218/sqft but trending downwards, relative to the current listing’s price of $228/sqft.

    Thus, a better way to show this graph would be to set the y-axis min a little below the minimum $/sqft relevant for the region, rather than 0.

  • John Kim

    @Ninja,

    We’ll be changing that in the next release. If you’d like the whole story…

    Those graphs are used in multiple places. In their first release, they were scaled so that the axis ran from a little below the min to a little above the max. However, this brought up a strange problem. Regions with small changes looked like they were having massive price fluctuations, because the small range was being scaled up to fill the graph. It’s like when you see a magnified picture of your skin. It looks like the Grand Canyon. People were having a hard time telling areas with small fluctuations from areas with big fluctuations, because they weren’t really bothering to read the scale.

    So we went to the data-display books, and many of them said that the only “honest” way to display data like this was to show it all at the same scale. Due to the wide range of areas we cover, that meant starting the scale at 0. In reality, we’re not using the exact same scale, because the max value still varies. But it did cut down on people thinking there were huge fluctuations when in fact there were often very small ones.

    But it also made it harder to pick out relevant details. So now we’re kinda going back to a middle ground where the scale does not start at 0, but doesn’t just cover the max and min values either. We pad on either side of the graphs to smooth things out but still let people pick out details.
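
    Roughly, the new bounds come from something like this sketch (the padding fraction here is illustrative, not our actual number):

    ```python
    # Sketch: pad the y-axis beyond the observed min and max, instead of
    # anchoring it at zero or hugging the data exactly.

    def y_axis_bounds(values, pad_fraction=0.20):
        lo, hi = min(values), max(values)
        span = (hi - lo) or hi or 1.0   # guard against a flat series
        pad = span * pad_fraction
        return max(0.0, lo - pad), hi + pad   # $/sqft never goes negative

    # Average $/sqft in a zip code over recent quarters (made-up numbers):
    print(y_axis_bounds([231, 244, 250, 238, 218]))  # -> roughly (211.6, 256.4)
    ```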

    In the future, we’re thinking of changing the graphs even more to allow user interaction and let you zoom in and out.

    John Kim | Product Manager

  • http://www.consultantninja.com Consultant Ninja

    John-

    I was looking at a house on redfin today, and wondered to myself, “hmm, what is their $/sqft rate relative to the local market” and scrolled down to the graph. I had two reactions in sequence:

    1) Holy cow, they actually listened to their blog, and implemented it! That’s unbelievable!

    2) Of course! These redfin guys really get it, I should have expected them to implement it that quickly.

    I think you nailed the solution in the short-to-medium term. I’m so happy that you didn’t do the tick marks in random ranges (e.g. 237, 261, 285, etc.) like some sites.

    So, thank you.