Getting started with Mechanical Turk for data cleanup

In the months leading up to the launch of Redfin Open Book, we embarked on an ambitious data cleanup project. We had 7,500+ free-form text fields from which we needed to extract structured vendor information. We ended up with 3,000+ cleaned database records. In this post I’ll walk you through how we used MTurk and share the lessons I learned from using it.

JSON in Postgres

Since December 2011, we’ve been storing a small portion of our data as JSON in Postgres. This blog post gives a quick overview of why we decided to do this, how it works, and what we’ve learned so far. Why JSON in Postgres? Redfin’s basic stack includes a Java-based web app on top of a…

Prefetching Web Content: Trials and Tribulations

Stoyan is totally right and I’m totally wrong (see his comment below, which reads “The thing about google maps you load is that it’s an html page. When you load html page in object tag it’s as if you put it in an iframe. It includes all markup and extra css/js/img resources.”) My test was…

Use dojo.hash instead of dojo.back

In Dojo 1.4, the Dojo Toolkit team introduced a new “dojo.hash” library for managing the back button in AJAX applications. It’s a replacement for “dojo.back,” which was available in Dojo 1.0. If you’re deciding whether to use dojo.hash vs. dojo.back for your next web application, you should use dojo.hash. Background: Back Button in AJAX AJAX…

IE9’s Viewport Code is Broken

Internet Explorer 9 Beta sometimes gives the wrong answer when we ask for the size of the viewport (the viewable area of the browser window). On an HTML document with “X-UA-Compatible” set to IE=7 and Windows font DPI set to 125% or 150% with the browser window maximized, IE9 claims the viewport is a few…