Archive for the ‘Maps’ Category
January 16, 2009
Those of us on the Redfin data team see every data problem that gets reported by our users. Of course, if we were to respond directly to each of them, we’d never get anything else done (okay, our data’s not that bad), but luckily the product management team takes care of that for us. One of the most bothersome kinds of report, partly because we get at least a dozen of them each week, is where:
- The user is reporting we mapped something wrong;
- We know it’s because our mapping software isn’t perfect;
- No-of-course-we-don’t-check-each-one-of-our-450,000-listings-by-hand; and
- Hopefully the user provided enough information that we can correct that one listing
Well, we decided it was time to do something about it. In particular, we upgraded the geocoding algorithm we use to place listings on the map so that, for the listings in our system:
- Approximately 1.1% have been hand-mapped
- Our automated “point-level” (that is, mapped to the rooftop of the given address by a geocoder) mapping percentage went from about 53.3% to 69.1%
- Our percentage of listings that are mapped, but not necessarily to the exact rooftop, went from 35.7% to 23.5%
- Our unmapped percentage went from 9.9% to 6.3%
- Our percentage of listings that will never be mappable (due to the agent choosing to not disclose the address) is 5.7%, so this is a pretty big improvement.
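To put those numbers side by side, here is a small arithmetic check in Python. It assumes the four categories are mutually exclusive and that the hand-mapped share (1.1%) is the same before and after the upgrade; both are assumptions, though the percentages do add up to 100% under them. The helper name `fixable_unmapped` is ours for illustration.

```python
# Shares of listings by mapping outcome, in percent, as quoted above.
# Assumption: the hand-mapped share (1.1%) is the same before and after.
before = {"hand": 1.1, "point": 53.3, "approximate": 35.7, "unmapped": 9.9}
after = {"hand": 1.1, "point": 69.1, "approximate": 23.5, "unmapped": 6.3}
never_mappable = 5.7  # address withheld by the agent


def fixable_unmapped(shares):
    """Unmapped listings that could still be mapped in principle."""
    return round(shares["unmapped"] - never_mappable, 1)


total_before = round(sum(before.values()), 1)
total_after = round(sum(after.values()), 1)

print(total_before, total_after)
print(fixable_unmapped(before), fixable_unmapped(after))
```

Read this way, the unmapped listings that have a disclosed address drop from roughly 4.2% of listings to roughly 0.6%.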
How’d we do it? As I’m sure you’ve already noticed, we made the switch to Google Maps in mid-December, much to the dismay of our Birds-Eye loving users. One of the benefits of this was access to Google’s web-based geocoder. We investigated a wholesale replacement of our existing (super-secret) geocoder with the Google geocoder, but decided instead to enhance our geocoding rate and accuracy by integrating Google’s geocoder into our current system and using a feedback algorithm when we knew we weren’t getting the best result possible.
This seems fairly straightforward, but unfortunately there were a few classes of gotchas that we ran into. Many of these stem from us relying on our geocoder not only for geocoding, but for address parsing and normalization as well. Here are the biggest problems we found with the current version of the Google HTTP geocoder.
- If possible, don’t pass unit information (the “Unit 33” part of the address) to Google’s geocoder
Currently it discards the unit information; it’s not returned in the parsed, corrected address, so you get no benefit from inputting it in the first place. The real problem, though, is that it may find what it considers a better match by replacing the street number with the unit number. Given 3000 Federal Avenue, Unit 33, for example, it can decide you live at 33 Federal Avenue when really you live at 3000 North Federal Avenue, Unit 33, and that is what it will return.
- Google’s geocoder doesn’t warn you when it drastically changes an address – for example Google could do something that seems totally oddball like changing “822 Country Avenue, Quincy, Washington” to “822 North Quincy Street, Arlington, Virginia”, without telling you. Sometimes you’ll see these suggestions as a consumer when Google Maps asks, “Did you mean: 822 North Quincy Street, Arlington, Virginia?” But when you’re dealing with them programmatically, they don’t even ask.
We solved this in a couple of ways:
- If the state code changes, we disregard the results. If your input is clean you could actually be stricter about when to disregard the results, but unfortunately when dealing with real estate data it’s common to see a zip code fat-fingered, different city names for the same actual city, a mistyped street name, a missing directional, or the wrong street type.
- We use the string distance between the input address and our possible results to determine which result is best. Sometimes Google’s geocoder will provide multiple results, and we always have the results from our other geocoder, so this tends to filter out the most erroneous outliers. For example, let’s say our input was “Quincey Road” and we ended up with two results, “Quincey Street” and “Quincy Road”. We would take the second result, because there’s only a one-character difference rather than differing on an entire word.
- Sometimes Google’s geocoder over-simplifies complex street numbers – reducing “1421-1423 Hayes Street” to just the first address “1421 Hayes Street” (compound addresses like this are somewhat common for tenancy in common listings in San Francisco)
We got around this simply by checking that the street number of the input corresponds to the street number of the result, and disregarding the result if they don’t match. But it’s important to bear in mind the cases where a simplified version of your input address might be a valid address, albeit not the result you wanted. For the Hayes Street example, it’s very important for us to maintain that we’re talking about both 1421 and 1423 Hayes Street, not just one of them, even though each is a valid address taken separately. Another great example of this in real estate is when we’re dealing with the historical form of Chicago addresses (which I only learned about because of this problem, and I must say, are completely sweet in their functionality).
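The checks described above can be sketched together in a few lines of Python. This is a minimal illustration under stated assumptions, not our production code (which is Java): the `Candidate` type, the unit-designator list in `UNIT_RE`, and all function names are hypothetical, and `difflib`'s similarity ratio stands in for whatever string distance you prefer.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional

# Hypothetical, deliberately short list of unit designators.
UNIT_RE = re.compile(r",?\s*(?:\b(?:unit|apt|suite)\b\.?|#)\s*\w+", re.IGNORECASE)
# Leading street number, keeping compound forms such as "1421-1423" whole.
NUMBER_RE = re.compile(r"\s*(\d+(?:-\d+)?)")


@dataclass
class Candidate:
    """One geocoder result (hypothetical shape)."""
    address: str
    state: str


def strip_unit(address: str) -> str:
    """Drop unit information before the address is sent to the geocoder."""
    return UNIT_RE.sub("", address).strip()


def street_number(address: str) -> Optional[str]:
    match = NUMBER_RE.match(address)
    return match.group(1) if match else None


def similarity(a: str, b: str) -> float:
    """Stand-in string similarity: 1.0 means identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pick_best(address: str, state: str,
              candidates: List[Candidate]) -> Optional[Candidate]:
    query = strip_unit(address)
    # Disregard results whose state or street number changed.
    plausible = [c for c in candidates
                 if c.state == state
                 and street_number(c.address) == street_number(query)]
    if not plausible:
        return None
    # Among what is left, keep the result closest to the input.
    return max(plausible, key=lambda c: similarity(query, c.address))
```

With an input of “100 Quincey Road” and candidates “100 Quincey Street” and “100 Quincy Road” in the same state, `pick_best` returns the second; a candidate in another state, or one that reduces “1421-1423” to “1421”, is dropped before the comparison.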
There were also a few lessons that we learned and proved to be important:
- Google’s address level geocodes (indicated in their system as having accuracy code “8”) can be either point-level or street-interpolated. Their street level geocodes are only on the best-matching street, and don’t appear to take the street number into account when placing the coordinates on the given block. This also means that the street number is not returned as part of the normalized address at this accuracy.
- Their geocoder can return multiple results, and it’s not always the case that the first result returned is the one you want. Having a good filtering algorithm for choosing the best result is incredibly important.
- java.util.concurrent has some incredibly powerful and easy-to-use utility classes for multi-threading applications that can be broken up into independent units of work.

- Just because snow shuts down your city doesn’t mean you don’t have to work. Yes, that’s right, we cranked this out over the holidays!
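On the multi-threading point: geocoding a batch of listings breaks up naturally into independent units of work. The sketch below shows that shape in Python with `concurrent.futures`, purely as an analogue; in Java the same pattern is usually written with an `ExecutorService` from java.util.concurrent, and which classes were actually used isn't covered here. The `geocode` function is a hypothetical stand-in for a slow, network-bound call.

```python
from concurrent.futures import ThreadPoolExecutor


def geocode(address):
    """Hypothetical stand-in for a slow, network-bound geocoder call."""
    return {"input": address, "normalized": address.upper()}


def geocode_all(addresses, workers=4):
    # Each address is an independent unit of work, so a fixed pool of
    # threads can process them in parallel; map() yields results in
    # the same order as the inputs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(geocode, addresses))


results = geocode_all(["710 2nd Avenue", "1421 Hayes Street"])
```

Keeping the pool size fixed also gives you one obvious knob to turn when the remote service caps request rates.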
Overall, we’ve been happy with the results of integrating Google’s geocoder (clearly, otherwise we wouldn’t be talking so much about it). Many of our initial concerns ended up being non-issues, partly because Google was so helpful when we brought them up. That being said, there’s still a short list of things we’d like to see added (who knows, maybe they’ll read this!):
- The ability to distinguish between point-level and street-interpolated geocodes. This is one of our largest remaining issues, since we take our data quality so seriously and we like to be able to measure it.
- More point-level data. All of the listings that are mapped directly onto a street rather than over a specific house are placed there because Google didn’t know exactly which house to put them over, only approximately how far down the street they are. We would love to see these directly over houses in the future.
- Componentized address parsing. Instead of just telling us the result is “710 2nd Avenue” we’d like to know that “710” is the street number, “2nd” is the street name, and “Avenue” is the street type, without having to do any post-processing on our end.
- Less latency. Since it’s a web-based service, the network latency can add considerably to the time it takes us to get results back. A batch geocoder would be one possible solution.
- More throughput. Currently there are caps of 10 requests per second per IP address (so Google can protect against denial-of-service attacks). It would be nice if they could raise or eliminate these caps for customers.
- A pony. A big, shiny one.
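To make the componentized-parsing wish concrete, here is a rough Python sketch of the kind of post-processing we would rather not have to do. It is an illustration only: the `StreetAddress` type, the tiny `STREET_TYPES` set, and the function name are all hypothetical, and a real parser would need to handle directionals, abbreviations, and much more.

```python
import re
from typing import NamedTuple, Optional


class StreetAddress(NamedTuple):
    number: str
    name: str
    street_type: str


# Hypothetical, deliberately tiny list of street types.
STREET_TYPES = {"avenue", "street", "road", "boulevard", "drive", "lane"}

# number, then the shortest possible name, then one final word as the type
LINE_RE = re.compile(r"\s*(\d+(?:-\d+)?)\s+(.+?)\s+(\w+)\s*$")


def parse_street_line(line: str) -> Optional[StreetAddress]:
    """Split a street line into number, name, and type, or return None."""
    match = LINE_RE.match(line)
    if not match or match.group(3).lower() not in STREET_TYPES:
        return None
    return StreetAddress(*match.groups())


parsed = parse_street_line("710 2nd Avenue")
```

Note that this naive version folds a directional into the street name (“North Quincy” for “822 North Quincy Street”), which is exactly the sort of detail a geocoder is better placed to get right.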
There are still a few areas where we know our geocoding could be improved. One of the biggest remaining problems is with vacant land, which might not have a complete address yet, and neither of our geocoders supports partial addresses (such as “123XX Main St”). To take care of these cases, we’ll be looking to integrate a geocoder that can geocode by APN (essentially, a locally unique ID that every property has).
Have you found any other bugs in Google’s geocoder that we might not have caught? Or know of any other cities with quirks that make geocoding difficult? Perhaps you know of a good APN geocoder? Let us know!
(Photo credits: tympsy and cheukiecfu on Flickr, respectively)
October 22, 2007
Last week, Microsoft released a beta version of a tool called MapCruncher. MapCruncher allows you to take a rendered image (GIF, PNG, JPG) and create a set of map tile layers that can be then drawn over a Virtual Earth (VE) map with the correct projection.
Since Redfin’s map is based on VE’s platform, I spent a couple hours using MapCruncher on a test project to see what we could potentially do with it. My project was to take an image of the BART routes in the Bay Area and place it on the map.
It worked out pretty well. I took a screenshot of the resulting mashup map, or you can just check out the mashup yourself.
What did I learn about MapCruncher?
- It’s really easy to use. It provides a side-by-side view of your desired overlay image and a live VE map view. You simply line up landmarks in the cross hairs of each view and mark it as the same location.
- It’s pretty quick. I had the first version generated in a little over an hour, and it worked.
- Your overlay images need to be very basic and have no text. The original route map image I tried had all the location names and some other features drawn on it. When it was re-projected, the text was completely warped out of shape and unreadable. I used an image editing tool to remove all the text, background colors, etc., leaving only the routes themselves. That simplified route map image considerably cleaned up the resulting mashup.
- More landmarks will definitely improve your results. This may seem obvious, but MapCruncher says it only needs around 10 waypoints to render the mashup. I set 10 along the west edge of the map and two on the right. It did great alignment on that left set of points, but the projection was way out of whack along the right in Oakland and Pleasanton. I ended up setting 21 points with an even mix across the different quadrants of the map.
- Zoom levels will make you want two source images. I first set the render to go only to zoom level 11, but didn’t like how soon the route map disappeared when I zoomed in further. So I cranked it up to render all the way down to 13. Since everything is rendered from one image, the routes were drawn gigantically at the deeper zoom levels. I’m guessing you need different source images for different sets of zoom levels to account for the size differences.
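The zoom-level effect is easy to quantify. Virtual Earth uses a tile pyramid in which the whole map is 256 × 2^level pixels wide, so every extra zoom level doubles the on-screen size of anything drawn from a single source image. The Python sketch below works through that arithmetic; the function names are ours, and the ground-resolution formula is the standard one for this kind of Mercator tile scheme rather than anything specific to MapCruncher.

```python
import math

TILE_SIZE = 256            # pixels per tile edge in the tile pyramid
EARTH_RADIUS_M = 6378137   # WGS84 equatorial radius, in meters


def map_width_px(level):
    """Width of the whole world map, in pixels, at a zoom level."""
    return TILE_SIZE * 2 ** level


def ground_resolution(latitude_deg, level):
    """Meters of ground covered by one pixel at a latitude and zoom level."""
    circumference = 2 * math.pi * EARTH_RADIUS_M
    return math.cos(math.radians(latitude_deg)) * circumference / map_width_px(level)


def scale_factor(from_level, to_level):
    """How much larger a fixed ground feature appears after zooming."""
    return 2 ** (to_level - from_level)


print(scale_factor(11, 13))
print(ground_resolution(37.8, 11), ground_resolution(37.8, 13))
```

Going from level 11 to level 13 is a factor of 4 in each direction, so a route line that looks right at 4 pixels wide at level 11 is stretched to about 16 pixels at level 13, which is why a second, finer source image for the deeper levels seems worth trying.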
Here are the components of my project if you want to try it out yourself:
MapCruncher seems like it’s great for either a simple image overlay or a quick and dirty project. If all we had was a rendered image of some information, I could definitely see it as a way to get that data integrated. Honestly, I can’t imagine now trying to generate a set of projected tile layers without MapCruncher.
You’ll probably have to spend a fair amount of time getting the overlay images just right so that they work at all zoom levels. You’ll probably want to play with transparency as well to make sure it doesn’t completely obscure the map. At some point, I’ll have to try rendering the image from raw GIS data in a tool like ESRI to see how that improves the results.
You’re probably wondering when this will show up on Redfin’s map. I’m not sure it will. We already display all the BART stops on the map thanks to VE. Adding the route overlays actually adds a lot of clutter to the map without providing a lot more information.
We could certainly be wrong on that though. Drop us a comment if you believe this is very valuable info to add to the Bay Area map or if you have suggestions on what other data overlays we should add to help in your house search.
January 12, 2007

There’s a pretty big technology change on Redfin.com today: the integration of Microsoft Virtual Earth as our underlying map platform. Redfin pioneered the use of satellite maps to display information about for-sale homes. We built an in-house mapping solution using imagery acquired from various sources (mostly the USGS). Since then, a few other major players have come on the scene in the mapping world, and that caused us to reconsider how we want to move Redfin forward.
We evaluated our top options: Google Maps, Microsoft Virtual Earth, or continuing to develop our own mapping technology. An “arms race” is happening in the mapping space, and it was clear that we didn’t want to compete with Google and Microsoft in the map platform arena. We had to exit the map technology business and switch to a web mapping platform that met our needs.
In the end the race was close as the platforms are very similar in many respects, but Microsoft Virtual Earth was the best fit for our requirements. The table below shows how they compare.
| | Google Maps | Microsoft Virtual Earth |
| --- | --- | --- |
| Speed of development | API is easy to use | API is easy to use |
| Ability to support Redfin feature set | Overall, our features were supported; parcel outlines were a potential problem. | Overall, our features were supported; parcel outlines were a potential problem. Easier to customize for a Redfin experience. |
| Map imagery (today & future). Photos are the most critical component to searching for homes online. Other than “can we make it work?”, this was the most important component for us. | Consistent imagery with reliable updates, but Google’s goal is to be fast, not deep; additional imagery is outside their focus. Example: Houses in San Jose, CA on Google Maps | Imagery nationwide is spotty, but for Redfin-supported metros the aerials are good. Example: Houses in San Jose, CA on Virtual Earth. Virtual Earth wins by adding “bird’s-eye” imagery for the same location. If as a home shopper, you are trying to decide on whether you liked a house well enough to tour, this view is a significant improvement over the 2-D views. (We didn’t get the Bird’s Eye view in this release, but we will soon.) Microsoft also has some interesting future concepts for street-level imagery that could take our home searching experience even further. |
| Additional data layers and web services | Support for: address/location lookup; geocoding; cross-street location lookup; business/Yellow Page listings; driving directions; direct integration into Redfin site | Support for: address/location lookup; geocoding and batch geocoding; business/Yellow Page listings; driving directions; real-time traffic incidents/congestion; points of interest near a location; direct integration into Redfin site; getting a list of geographic entities for a particular geographic latitude/longitude |
| Browser support | Firefox, Internet Explorer, Safari | Firefox, Internet Explorer. No support for the Safari browser on the Mac; this was a hard decision for us, as many of our San Francisco users are Safari users. Fortunately, Firefox 2 does work great on the Mac and is free, but still, not the result we wanted. |
| Cost | (Service costs were similar.) | (Service costs were similar.) |
FWIW, I originally had a third “Do-It-Ourselves” column, but the negatives were so obvious I dropped it. The data acquisition and integration development costs were just too high to be a sensible choice for Redfin moving forward.
Having completed the transition from our Flash-based Redfin map to Microsoft Virtual Earth, our principal engineer, Michael, had the following observations about developing for a web mapping platform:
Drawbacks developing with a web mapping platform:
- No support for clickable or hoverable polygons for our property lot outlines, an important piece of the Redfin map experience. We use the VE API to paint polygons and pushpins on the map, but there is no VE facility for making those items clickable (i.e. when a user clicks on the parcel, the property information balloon should pop up). Adding clickability was a MAJOR pain, since VE does not expose access to the underlying drawing primitives or tag the primitives with the ID of the corresponding VE object. It doesn’t look like Google Maps has this support either, so we’d probably have been equally pained with them.
- Asynchronous pans and zooms make the programming model difficult. We rolled our own pop-up balloons for property information, so we are responsible for placing them in the correct location on the screen. VE pans and zooms asynchronously, so the map will sometimes move after we draw and place the balloon, but the balloon won’t move with it because VE doesn’t natively know to move it.
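One general way to make a painted polygon clickable when the platform won't do it for you is to hit-test the click position against the parcel outlines yourself. The sketch below shows the classic ray-casting test in Python for illustration; it is not a description of how the Redfin site does it (in the browser this would be JavaScript working in screen pixel coordinates), and the function names and the `parcels` mapping are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def parcel_at(point, parcels):
    """Return the id of the first parcel containing the point, else None."""
    for parcel_id, polygon in parcels.items():
        if point_in_polygon(point, polygon):
            return parcel_id
    return None


# Hypothetical parcel outline in screen coordinates.
parcels = {"lot-1": [(0, 0), (4, 0), (4, 4), (0, 4)]}
clicked = parcel_at((2, 2), parcels)
```

Because the outlines move whenever the map pans or zooms, a test like this has to run against the polygons' current screen positions, which ties it to the same asynchronous-movement problem described above.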
Pros developing with Virtual Earth:
- API is very straight-forward and well documented. Getting a demo up and running was trivial, and the object model is clear.
- Technical support exists and has been very responsive. We usually received a response within a few hours.
Cons developing with Virtual Earth:
- Interoperability with other JavaScript libraries is compromised by VE’s approach to the onunload handler. VE replaces the onunload handler with its own function so that it can clean up any items it has allocated. It appears to throw away any pre-existing onunload handler, so other libraries that also require cleanup will be “orphaned” (their onunload handlers will not be called). This is poor form, and can cause massive memory leaks in Internet Explorer 6.
- It was difficult to get VE to show up in the correct location on the screen. Redfin uses absolute positioning and sizing in CSS to set the location of the VE map. Getting this to work correctly with VE took too many hours (and it broke again numerous times).
- Microsoft doesn’t offer a “debug” version of their library. The library that they deliver is compressed and somewhat obfuscated, which makes it harder to reverse engineer and debug. We understand why they did that (customers shouldn’t need to be looking under the covers), but it does make debugging a pain for our developers.
- Microsoft doesn’t support tying your app to a particular version of Virtual Earth. You can specify a major version (e.g. version 4) but you can’t specify a minor version (e.g. 4.0.2.3). Microsoft reserves the right to swap versions out from under their customers at their discretion. In general, this isn’t a problem, but it becomes one when you depend on the internal workings of VE. We do hack VE a little (since it doesn’t support all the features we need, as mentioned above), so we’re vulnerable to any VE changes: our site could break without notice due to a Microsoft change, which is… less than ideal, but was a necessary trade-off for the Redfin experience to continue.
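Returning to the onunload point in the list above, the difference between replacing a handler and chaining it is small in code but large in effect. The sketch below models it in Python as a language-neutral illustration; in the browser this would be JavaScript operating on the window's onunload slot, and the `Page` class and both `install_*` functions are hypothetical.

```python
class Page:
    """Hypothetical stand-in for a browser window with one onunload slot."""

    def __init__(self):
        self.onunload = None

    def unload(self):
        if self.onunload is not None:
            self.onunload()


def install_replacing(page, handler):
    # Overwrites whatever was there: earlier handlers are orphaned.
    page.onunload = handler


def install_chaining(page, handler):
    # Keeps the previous handler and calls it before the new one.
    previous = page.onunload

    def chained():
        if previous is not None:
            previous()
        handler()

    page.onunload = chained


replaced_log, chained_log = [], []

page = Page()
install_replacing(page, lambda: replaced_log.append("other library cleanup"))
install_replacing(page, lambda: replaced_log.append("map cleanup"))
page.unload()

page = Page()
install_chaining(page, lambda: chained_log.append("other library cleanup"))
install_chaining(page, lambda: chained_log.append("map cleanup"))
page.unload()
```

In the replacing case only the map's cleanup runs; in the chaining case both do, which is the behavior a library sharing a page with others should aim for.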
Would we make the same decision to go with Virtual Earth if we were starting Redfin today knowing what we know now?
The answer is definitely “Yes.”
With Virtual Earth, Redfin is able to continue to deliver the same Redfin search experience with an immediate ability to double our geographic coverage and improve our road maps, quickly expand to more metropolitan areas, and add more features. You’ll see more of the features Virtual Earth provides integrated in future versions of Redfin.com and I’m sure we’ll write more as we go.
We’d love to hear how your experience developing with Microsoft Virtual Earth or Google Maps has compared to ours.
Note: originally posted in the Redfin Corporate Blog here.