Archive for the ‘Uncategorized’ Category

November 16, 2009

One Week After The Outage

We launched a major new version of Redfin.com a week and a half ago. The headliner was the addition of near-real-time “solds” data through our MLS-based virtual office website (VOW) data feeds. On launch day, we had a 3-hour outage, followed by intermittent “brownouts” for another 2 days. We wanted to give people an idea of what happened and what we’re doing to make sure this kind of outage doesn’t happen again.

Better and… Bigger
For 14 days prior to launch, we ran data imports day and night. We added 1.4 million records, 9 million photos, and revamped our internal database schema. As a result, the disk space used by our Postgres database grew by 30%. Way more disk was needed to store photos.

By Thursday morning, we were not able to go live with all our slave databases as planned. We use Slony replication for our slave databases. Errors in scripts can cause a Slony slave to require a complete re-sync, and that is, unfortunately, what happened to us. We launched believing that our single master database would handle the load. We were wrong.

By 9am PST on Thursday, our site was maxing out. First it was slow, then it was non-responsive. The problem wasn’t a rush of traffic from the press coverage. The problem was our single master database. The increase in database size and new schema overloaded it. We ended up throttling our database to allow most people to access Redfin.com, but this just caused intermittent issues and “brownouts,” where the site would be overwhelmed with requests and become non-responsive for a minute or two at a time.

Many engineers spent all Thursday and Friday looking at code, looking at the database, and looking at the traffic. Everyone was looking for some magical bug that was causing the problem. In the end, the solution was very simple. Once the slave databases were synced up and put into production on Friday at 8pm PST, the problem mostly went away. We’re still investigating the root cause, but all indicators are strongly pointing to the idea that we just didn’t have enough RAM to avoid disk I/O slowness and thrashing.

Lessons Learned
Redfin learned that the scalability & performance testing that we do before every release isn’t good enough. This outage hurt our professional pride, and we are newly dedicated to fixing this. We need to know every new release is going to run well against expected load and existing hardware.

For our next major release in December, we had been planning to upgrade our master database from Postgres 8.3 on 32GB of RAM to Postgres 8.4 on 72GB of RAM. The database servers are over two years old now. Too bad we didn’t do it sooner, but we’ve accelerated the hardware upgrade to have it ready this week. We’re also intrigued by the idea of using Fusion-IO SSDs at some point.

We also plan to spend more time looking at ways we can streamline the code to run the site more efficiently on the hardware. Hardware is relatively cheap these days, but smart engineers can often find places in the code that can be made 10x faster!

And as the site grows, we’ll also look at more scalable database solutions like partitioning or switching at least some parts to Hadoop HBase. We use Hadoop for log analysis, but it’s very promising as a high-scale query engine.

I know there are a lot of folks in technology who use Redfin. What do you think? Did we learn the right lessons?


September 30, 2009

How to Set Up Hot Code Replacement with Tomcat and Eclipse

This blog post will guide you through setting up Tomcat hot code replacement (also called hotswap debugging) in Eclipse.

  • What Is “Hot Code Replace”?
  • What’s the Catch?
  • What About JavaRebel?
  • Configuring Your Web Application in Eclipse
    1. Download Eclipse “JEE” Edition
    2. Switch to the “Java EE” Perspective
    3. Configure Your WAR Project
    4. Create a New Server
    5. Magic Setting: Disable “Auto Reloading” on Each Project in the Server
    6. Performance Tip: Adjust Memory Settings
    7. Start Your Tomcat Server in Debug Mode
  • Why Disable Auto Reloading?
  • Disable Auto Reloading but Enable Auto Publishing
  • Finding the tmp0 Fake Tomcat Directory
  • Exorcising the tmp0 Directory
  • Troubleshooting: What Do I Do If I Still Can’t Get It to Work?

What Is “Hot Code Replace”?

“Hot Code Replace” (HCR) lets you make modifications to a Java class and see the effect immediately in a running JVM, without restarting your application. HCR is part of the Java Platform Debugger Architecture (JPDA) and is available on all modern JVMs.

Consider this ordinary application:

public class Sample {
  public static void main(String[] args) {
    String foo = "unchangeable";
    foo += blah();
    System.out.println(foo);
  }

  public static String blah() {
    String bar = "bar";
    bar += "blah";
    return bar;
  }

}

If you debug this class in Eclipse, you can make changes to it, on the fly, without restarting the JVM. For example, try setting a breakpoint on the second line of blah(). Next, change the literal blah to quz. Save the file and the program will continue running with the new code.
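If stepping through a breakpoint feels fiddly, a looping variant can make the swap easier to watch. This is only a sketch: HotSwapDemo and message() are made-up names, not part of the sample above. Run it under the Eclipse debugger, edit the string literal in message(), save, and later iterations should print the new text.

```java
// Illustrative sketch: HotSwapDemo and message() are hypothetical names.
class HotSwapDemo {
  // Edit this literal while the program runs under the debugger, then save.
  static String message() {
    return "original";
  }

  public static void main(String[] args) throws InterruptedException {
    // Optional first argument: how many times to loop (defaults to 30).
    int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 30;
    for (int i = 0; i < iterations; i++) {
      // Each iteration is a fresh call, so it can pick up a replaced method body.
      System.out.println(i + ": " + message());
      Thread.sleep(1000);
    }
  }
}
```

Editing the body of message() changes only a method body, which is the kind of change HCR is described as supporting.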

You can do this with Tomcat web applications in Eclipse, but it’s a lot trickier.

What’s the Catch?

There are a few limitations when using hot code replace. You can’t use JPDA HCR to change the signature of a class (add/remove methods or fields) or to add new classes on the fly. Additionally, some method calls (“stack frames”) can’t be modified, including the main method or any method invoked via reflection, that is, by using java.lang.reflect.Method.invoke().

Here’s what happens if you try to replace the unchangeable variable in the main method of Sample.java above.

[Screenshot: unchangeable]
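For readers who haven’t met reflective invocation, here is a small sketch of the kind of call the limitation above refers to. The class and method names are made up for illustration; the only real API used is java.lang.reflect.Method.

```java
import java.lang.reflect.Method;

// Illustrative sketch: class and method names are hypothetical.
class ReflectiveCallDemo {
  static String greet(String name) {
    return "hello, " + name;
  }

  // Calls greet() through Method.invoke() instead of calling it directly.
  static String greetReflectively(String name) {
    try {
      Method greet = ReflectiveCallDemo.class.getDeclaredMethod("greet", String.class);
      // The first argument is null because greet() is static.
      return (String) greet.invoke(null, name);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("reflective call failed", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(greetReflectively("world")); // prints "hello, world"
  }
}
```

While greet() is on the stack here, its frame was entered through Method.invoke(), which is the situation the text above says JPDA can’t hotswap.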

What About JavaRebel?

JavaRebel is a hot code replacement system that’s a little better than JPDA HCR. (Maybe a lot better.)

With JavaRebel you can add/remove methods and classes without restarting Tomcat. However, JavaRebel costs $149 per developer per year, so it may or may not be worthwhile for you.

Configuring Your Web Application in Eclipse

  1. Download Eclipse “JEE” edition

    Most developers already use this, since it’s the first option available on the Eclipse download page, but if you’re using “Eclipse IDE for Java Developers” (92MB), you’ll need to go back and download “Eclipse IDE for Java EE Developers” (189MB). It’s worth it, I promise!

    [Screenshot: download]

    Note: The difference between the regular Java IDE and the Java EE IDE is that the JEE edition comes with the Eclipse “Web Tools Platform” (WTP), which includes “Web Standard Tools” (WST). The terms are sometimes used interchangeably; keep an eye out for this if you need to search for them.

  2. Switch to the “Java EE” Perspective

    Make sure you’re in the “Java EE” perspective, not the “Java” perspective. If it’s not in the upper-right corner of your Eclipse window, you might need to enable it manually (Window menu -> Open Perspective -> Other…). If “Java EE” doesn’t appear on this list, you’ve probably downloaded the wrong package of Eclipse; go back and download the Java EE version.

    [Screenshot: jee]

  3. Configure Your WAR Project

    From scratch: From the New menu, select “Dynamic Web Project”. Configure your source and output folders, as well as your “Content directory”, which will contain your JSPs, your WEB-INF directory, etc.

    Via Maven: Use Maven to create a WAR project. For example:

    mvn archetype:create -DarchetypeArtifactId=maven-archetype-webapp -DartifactId=YOURNAMEHERE -DgroupId=YOURNAMEHERE

    Modify your pom.xml to include an explicit reference to maven-eclipse-plugin, like this:

    <!-- ... -->
    <build>
        <!-- ... -->
        <plugins>
            <!-- ... -->
            <plugin>
                <artifactId>maven-eclipse-plugin</artifactId>
                <configuration>
                    <wtpversion>2.0</wtpversion>
                </configuration>
            </plugin>
        </plugins>
    </build>
    <!-- ... -->
    

    Now generate an Eclipse project from the command line, like this:

    mvn eclipse:eclipse

    Here’s an example Maven project you can use. Just download it, extract it, and run mvn eclipse:eclipse to generate your Eclipse project. (If this is your first time using Maven with Eclipse, you’ll also need to add an M2_REPO classpath variable in your Eclipse workspace preferences that points to your Maven repository, typically $HOME/.m2/repository or %USERPROFILE%\.m2\repository.)

  4. Create a New Server

    From the New menu, select Other… -> Server -> Server. For your server type, expand the “Apache” folder and select the version of Tomcat you intend to use. Choose “Next” and then specify the path to your Tomcat installation directory, e.g. c:\tools\tomcat-6.0. Add your web project as a “resource” to this server.

  5. Magic Setting: Disable “Auto Reloading” on Each Project in the Server

    You now have a weird pseudo-project called “Servers” in your Project Explorer. In the explorer, your server looks like a folder called something like “Tomcat v6.0 Server at localhost-config” …but that’s not what you want. You need to interact with your server using the “Servers” tab. (Eclipse calls these tabs “Views,” but everybody else just calls them “tabs.”)

    The “Servers” tab should be exposed by default, but in case it isn’t, you can expose it via Window -> Show View -> Servers. From there you can double-click on your server to configure it.

    Note that the configuration panel for your server has two tabs, “Overview” and “Modules”, down at the bottom of the window. Click on the “Modules” tab to bring up the list of projects associated with the server.

    Select your project in the list and click on “Edit”. You’ll see the magic secret checkbox: “Auto reloading enabled”. It’s checked by default. For the love of Pete, uncheck it!

    (It’s interesting to note that JavaRebel also requires disabling auto reloading to work properly in Eclipse.)

    [Screenshot: magic checkbox]

  6. Performance Tip: Adjust Memory Settings

    Before you start up your server, I strongly recommend adjusting your server’s -Xmx memory settings; otherwise, it will constrain itself to the default value, 64 MB, which is just not enough!

    Double-click on your server in the “Servers” tab and switch to the “Overview” tab. Click on the “Open launch configuration” link. Switch to the Arguments tab; there you can add relevant memory settings to the “VM Arguments” section. For example, you might add -Xmx512m to the end of the existing arguments.

    [Screenshot: memory]

  7. Start Your Tomcat Server in Debug Mode

    Now you can right-click on the Server in your Servers tab and choose “Debug”. Changes you make to your JSPs or Java classes will be instantly hotswapped into your running WAR.
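To confirm that the -Xmx value from step 6 actually reached the JVM, one option is to print the maximum heap from inside the process. A minimal sketch, assuming nothing beyond the standard Runtime API; HeapCheck is a made-up name, and in a webapp you would call the same method from a JSP or servlet rather than from main.

```java
// Illustrative sketch: HeapCheck is a hypothetical name.
class HeapCheck {
  // Maximum heap the JVM will attempt to use, in megabytes.
  static long maxHeapMegabytes() {
    return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
  }

  public static void main(String[] args) {
    System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
  }
}
```

The reported number may come out a little lower than the -Xmx value you set, so look for the right ballpark rather than an exact match.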

Why Disable Auto Reloading?

Auto reloading is a feature of Tomcat that allows you to replace Java classes at runtime without using JPDA. In this mode, Tomcat uses Java classloaders to try to unload classes and reload them; whenever it reloads, it also tries to reinitialize your application, re-launching any servlets that are marked load-on-startup in your web.xml file.

As a result, Tomcat auto reloading may not save you any time at all if your webapp has a lot of startup code. For example, if your load-on-startup code needs to warm up Hibernate database caches, Spring/Guice dependency injection configuration, etc., it may take almost as long to restart your webapp as it does to restart Tomcat.

Worse, an app that has been auto reloaded can behave strangely, and can quickly run out of PermGen memory due to frequent unloading/reloading of classes. When this happens, restarting Tomcat typically fixes the problem. If you spend even five minutes investigating a weird auto reloading problem, you’ve just wasted all the time you were hoping to save by avoiding a restart. (Not to mention your stress and frustration!)

By disabling auto reloading and using JPDA hot code replace instead, you get a more reliable code replacement system.

Disable Auto Reloading but Enable Auto Publishing

In the screenshot above you can see how to disable auto reloading on the “Modules” tab of the Tomcat server; auto reloading is bad for JPDA debugging. But there’s another setting called “Automatically publish when resources change” on the “Overview” tab of the Tomcat server. It’s hidden by default, collapsed under the “Publishing” section. You can see it if you expand that section; you want to make sure auto publishing is enabled while auto reloading is disabled.

[Screenshot: autopublish]

To understand the difference between auto publishing and auto reloading, we’ll have to go into how exactly Eclipse WTP works. When you create a “Server” in Eclipse, the IDE creates a fake Tomcat directory, complete with the conf, logs, temp, webapps and work directories. When you configured the server, you told Eclipse where to find Tomcat, but it doesn’t use any of your settings files or any data from your webapps directory. Instead, Eclipse launches Tomcat with special command-line arguments, indicating where to find the fake Tomcat directory.

“Publishing” means populating the fake Tomcat directory with all of your code. Eclipse copies your JSPs, JARs up your Java code, regenerates settings files, etc.

If you turn off auto publishing, then you have to right-click on your Server and “Publish” your changes manually every time you save your Java code. Additionally, auto publishing allows Tomcat to reload JSPs automatically, regardless of whether you use JPDA or not.

[Screenshot: server menu]

Finding the tmp0 Fake Tomcat Directory

Sometimes it can be helpful to look inside the fake Tomcat directory and see what’s going on in there. Eclipse tells you where it put the Tomcat directory in the “Server Locations” section of your “Tomcat” server configuration panel. (Double-click on your Server in the “Servers” tab to open the configuration panel.) Typically, Eclipse says that your server is in .metadata/.plugins/org.eclipse.wst.server.core/tmp0; for this reason I typically call it the tmp0 directory (pronounced “tempo”).

The .metadata folder is inside your Eclipse workspace directory. (You can find your Eclipse workspace directory by going to File -> Switch Workspace; the default value is your current workspace directory.) In the worst case, you can always just search your hard drive for tmp0. It’s there somewhere!

Inside, you can see all the folders Eclipse has created. Check out the generated server.xml file in tmp0/conf. Examine generated .java files in tmp0/work. Your tmp0/webapps directory is probably empty; Eclipse has probably generated your webapp in wtpwebapps.
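If you’d rather not click around, a few lines of Java can print what’s inside tmp0. This is a sketch under stated assumptions: Tmp0Locator is a made-up name, the workspace directory is passed as the first argument, and the relative path is the typical one quoted above, which may differ on your machine.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative sketch: Tmp0Locator is a hypothetical name.
class Tmp0Locator {
  // Builds the typical tmp0 location under an Eclipse workspace directory.
  static Path tmp0Under(Path workspace) {
    return workspace.resolve(".metadata/.plugins/org.eclipse.wst.server.core/tmp0");
  }

  public static void main(String[] args) throws IOException {
    // First argument: your workspace directory (defaults to the current directory).
    Path workspace = Paths.get(args.length > 0 ? args[0] : ".");
    Path tmp0 = tmp0Under(workspace);
    if (!Files.isDirectory(tmp0)) {
      System.out.println("No tmp0 directory at " + tmp0);
      return;
    }
    // Print the top-level entries (conf, logs, wtpwebapps, and so on).
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(tmp0)) {
      for (Path entry : entries) {
        System.out.println(entry.getFileName());
      }
    }
  }
}
```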

Exorcising the tmp0 Directory

Unfortunately, sometimes Eclipse gets a little confused about what to put in your WAR file, and you need to perform various stages of exorcism depending on how badly your tmp0 directory is messed up.

  • Try republishing your tmp0 directory. Open the “Servers” tab, right-click on your server and select “Clean…” (not “Clean Tomcat Work Directory…”). Then select “Publish.” That should completely rebuild your tmp0 directory.
  • Try restarting Eclipse. This works more often than I’d like to admit.
  • Try completely deleting and recreating your server. Follow this ritual:
    1. Open the “Servers” tab, right-click on the server and select “Delete”.
    2. Make sure “Delete unused server configuration(s)” is checked, then click OK.
    3. Look at your “Servers” pseudo-project; make sure the folder for your server is gone. If it isn’t, right-click on it and Delete it.
    4. Quit Eclipse.
    5. Go find your tmp0 directory (if it’s still present) and delete it from your file system.
    6. Launch Eclipse and recreate your server from scratch.
  • Try creating a new workspace. Go to File -> Switch Workspace and specify an empty directory, then create your server from scratch.

Troubleshooting: What Do I Do If I Still Can’t Get It to Work?

  • My project works in regular Tomcat, but doesn’t work in Tomcat under Eclipse

    Try using Eclipse to generate a WAR file for Tomcat. Right-click on your web project and select Export -> WAR file, and install it in Tomcat by dropping the exported WAR into your Tomcat webapps directory.

    If the exported WAR file doesn’t work, then you now have two WARs: one working WAR generated by your build script, and one non-working WAR generated by Eclipse. WAR files are just zips; extract them both, find the difference, and fix it! Right-click on your web project and select “Properties”. The problem is somewhere in here.

    On the other hand, if the exported WAR file does work, then you know that the problem has to do with the way Eclipse is launching Tomcat. Find your tmp0 directory (described above) and poke around. Does everything look OK in there? Be sure to check your server.xml file, as well as your webapp itself in wtpwebapps. Make sure to note your WEB-INF/lib directory, typically in tmp0/wtpwebapps/YOURAPP/WEB-INF/lib.

  • Tomcat is throwing NoClassDefFoundError or ClassNotFoundException

    First, double-check whether this problem happens in regular Tomcat. See My project works in regular Tomcat, but doesn’t work in Tomcat under Eclipse above.

    If this problem occurs in the exported WAR file under regular Tomcat, then your webapp is probably missing JARs. See My exported WAR is missing JARs below.

    If the exported WAR works but your webapp is still broken under Tomcat, you may need to perform an exorcism. (See Exorcising the tmp0 Directory above.) If this happens to you often, double-check that you haven’t accidentally disabled auto publishing. (See Disable Auto Reloading but Enable Auto Publishing above.)

  • My exported WAR is missing JARs

    Right-click on your web project and select “Properties.” The problem is somewhere in here. Make sure you see your JAR listed under “Java Build Path” in the Properties dialog.

    Beware: not every JAR in “Java Build Path” gets exported to the WAR. The list of JARs for the WAR is under “Java EE Module Dependencies.” If a JAR/project is unchecked on that list, it won’t appear in your WAR file.

  • My tmp0 directory is missing an important configuration file

    Eclipse will publish files that it finds in the “Tomcat” folder of the “Servers” pseudo-project to your tmp0/conf directory; if you’re missing files, you can add them here.

  • My server.xml file is messed up

    That file is copied from the “Tomcat” folder in your “Servers” pseudo-project to your tmp0/conf directory. But note that the server.xml file is at least partially autogenerated by Eclipse. If you make direct changes to the file, Eclipse will do its best to try to incorporate your changes, but it often gets confused and does the wrong thing. When possible, it’s better to find the appropriate Eclipse settings and change them there instead of modifying the server.xml file directly.

    Note that one of the most common problems with server.xml is an incorrect path attribute on your webapp’s <Context> element, causing your webapp to appear on a non-standard URL. See the following question about 404 errors for more details about this problem.

  • Tomcat is giving me strange 404 errors

    First, double-check whether this problem happens in regular Tomcat. (See My project works in regular Tomcat, but doesn’t work in Tomcat under Eclipse above.) If it happens in regular Tomcat too, then it’s probably a bug in your code.

    If the problem only happens in Eclipse, then it’s probably a server.xml configuration problem. Check your tmp0/conf/server.xml file’s <Context> element; check the path attribute. The path attribute indicates the virtual directory of your webapp. For example, if your Context/path is “examplePath”, then your webapp will appear at http://localhost:8080/examplePath. If it’s misconfigured, your webapp may not be available at the URL you expect.

    The path attribute is auto-generated based on settings in the properties of your WAR project. Right-click on your web project, select “Properties” and go to the “Web Project Settings” section. There’s only one setting here; it’s called “Context root”. Specify the context you intend to use here. If you want your project to appear in the root directory, you’ll need to put / as your context root (since you aren’t allowed to leave it blank).

    [Screenshot: context root]

  • Tomcat times out when starting under Eclipse (“Server [...] was unable to start within 45 seconds”)

    The Eclipse developers, in their infinite wisdom, have added a timeout to Tomcat startup. If Tomcat doesn’t declare a successful startup within 45 seconds, Eclipse kills it automatically. (Gee, thanks, guys!)

    You can increase that timeout. Open the “Servers” tab and double-click on your server to open the server configuration panel. Make sure the panel’s “Overview” tab is selected. Expand the “Timeouts” section and increase the Start timeout to something reasonable for your server.

  • I had Tomcat working under Eclipse, but now it’s broken and I can’t figure out why

    You may need to perform an exorcism. (See Exorcising the tmp0 Directory above.)

  • My web project starts up fine, but when I save .java files in Eclipse, it doesn’t take effect until I restart
    • Did you make sure to launch the server in Debug mode, as opposed to Run mode? JPDA only works when you Debug the server.
    • Is your server configured to auto publish? (See Disable Auto Reloading but Enable Auto Publishing above.)
    • Did you change something that broke JPDA? (See What’s the Catch? above.) If you make large changes to your classes, JPDA may be unable to replace the code; if you choose to “Continue” past that point, further changes will have no effect.
  • My web project starts up fine, but when I save .jsp files in Eclipse, it doesn’t take effect until I restart

    This is typically due to disabled auto publishing. Double-check that your server is configured to auto publish. (See Disable Auto Reloading but Enable Auto Publishing above.)

    If that doesn’t work, examine your tmp0 directory to make sure Tomcat is using the correct JSP. It should automatically begin consuming new JSPs as they are installed in the tmp0/wtpwebapps directory.

  • Whenever I save a .java file in Eclipse, Tomcat restarts

    This is typically due to Tomcat auto reloading. Double-check that you correctly disabled auto reloading. (See Magic Setting above.)

  • Tomcat in Eclipse is much slower than regular Tomcat

    Try increasing your memory settings as described above, if you haven’t already.

    Try running Tomcat in “Run” mode (as opposed to “Debug” mode). If that fixes the problem, then there may be nothing you can do about it. JPDA does have some overhead; you can turn it off, but while it’s off you won’t have access to hot code replacement.
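The first troubleshooting item above suggests extracting both WARs and finding the difference. Since WAR files are just zips, a rough first pass is to compare their entry names without extracting anything. The sketch below assumes you pass the two WAR paths on the command line; WarDiff and its methods are made-up names, and it compares file names only, not file contents.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Illustrative sketch: WarDiff is a hypothetical name.
class WarDiff {
  // Collects the names of every entry in a WAR (zip) file.
  static SortedSet<String> entryNames(String warPath) throws IOException {
    SortedSet<String> names = new TreeSet<>();
    try (ZipFile war = new ZipFile(warPath)) {
      Enumeration<? extends ZipEntry> entries = war.entries();
      while (entries.hasMoreElements()) {
        names.add(entries.nextElement().getName());
      }
    }
    return names;
  }

  // Returns the names present in left but missing from right.
  static SortedSet<String> onlyIn(Set<String> left, Set<String> right) {
    SortedSet<String> difference = new TreeSet<>(left);
    difference.removeAll(right);
    return difference;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 2) {
      System.err.println("usage: java WarDiff working.war broken.war");
      return;
    }
    SortedSet<String> working = entryNames(args[0]);
    SortedSet<String> broken = entryNames(args[1]);
    System.out.println("Only in " + args[0] + ": " + onlyIn(working, broken));
    System.out.println("Only in " + args[1] + ": " + onlyIn(broken, working));
  }
}
```

A JAR that shows up under WEB-INF/lib in only one of the two lists is a good place to start looking.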


September 29, 2009

Installing Beta Builds on iPhone

Probably the hardest part of learning to code an iPhone app is figuring out how to get your app installed on a phone. Even after you’ve installed your app on your personal development phone, installing it on other people’s phones for beta testing is especially tricky.

Here’s the rough outline:

  1. Create an “Ad Hoc” Provisioning Profile (a .mobileprovision file) for beta testing.
  2. Find yourself a friendly beta tester.
  3. Each beta tester needs to give you the Unique Device Identifier (UDID) of his/her device. (This number is not easily visible to users.)
  4. Submit the UDID as a named “Device” to Apple.
  5. Attach the device to your Provisioning Profile. Apple will generate (or regenerate) a .mobileprovision file for you to download. Always use the latest .mobileprovision file.
  6. Build (or rebuild) your software in Xcode using the latest Provisioning Profile.
  7. Zip up your application.
  8. Deliver your finished .app together with the .mobileprovision to your beta tester.
  9. The beta tester installs your provisioning profile, either by dragging it into iTunes or by using the iPhone Configuration Utility (iPCU).
  10. Finally, the beta tester installs your app, using either iTunes or iPCU.

That’s a lot of steps; there’s a lot of room for things to go wrong. Here are a few gotchas we encountered on our way to releasing the Redfin iPhone app.

Some Zip Tools Don’t Work; Use ditto

Zipping your .app file is surprisingly tricky business. For example, if you use Apache Ant’s <zip> task to create your zipfile, it will ignore the UNIX permissions of the files it zips. When unsuspecting beta testers try to extract the app, they’ll find that it installs successfully under Windows (which doesn’t honor UNIX permissions) but if they install from OS X, the app will immediately crash on startup, with no crash log.

FYI, if you observe this permissions crash via the device console logs, it looks a little like this:

Tue Sep 29 17:47:04 unknown com.apple.launchd[1] : (UIKitApplication:com.redfin.redfin[0x43d3]) posix_spawn("/var/mobile/Applications/2B5E0CE3-362F-4FF0-80A2-C45DE3923C86/Redfin.app/Redfin", ...): Permission denied
Tue Sep 29 17:47:04 unknown com.apple.launchd[1] : (UIKitApplication:com.redfin.redfin[0x43d3]) Exited with exit code: 1
Tue Sep 29 17:47:04 unknown SpringBoard[24] : Failed to spawn Redfin. Unable to obtain a task name port right for pid 13432: (os/kern) failure
Tue Sep 29 17:47:04 unknown com.apple.launchd[1] : (UIKitApplication:com.redfin.redfin[0x43d3]) Throttling respawn: Will start in 2147483647 seconds
Tue Sep 29 17:47:04 unknown SpringBoard[24] : Application 'Redfin' exited abnormally with exit status 1

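One way to catch the permissions problem before a beta tester does is to extract your zip somewhere and check that the app’s main binary is still executable. A minimal sketch: ExecutableCheck is a made-up name, and the default path simply mirrors the Redfin.app/Redfin binary from the log above, so substitute your own.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative sketch: ExecutableCheck is a hypothetical name.
class ExecutableCheck {
  // True only for a regular file that the current user may execute.
  static boolean isExecutableFile(Path file) {
    return Files.isRegularFile(file) && Files.isExecutable(file);
  }

  public static void main(String[] args) {
    // First argument: path to the binary inside the extracted .app bundle.
    Path binary = Paths.get(args.length > 0 ? args[0] : "Redfin.app/Redfin");
    if (isExecutableFile(binary)) {
      System.out.println(binary + " is executable");
    } else {
      System.out.println(binary + " is missing or not executable");
    }
  }
}
```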

Even the standard command-line zip tool from Info-ZIP that comes with OS X doesn’t quite do the job. Insidiously, zip archives created with zip work fine for beta testing, but the App Store will reject them, claiming that the app signature is invalid.

You can check the validity of your app signature with codesign -vvvv YourAppName.app. (Yes, all four “v”s are necessary.) Valid apps should look like this:

$ codesign -vvvv Redfin.app
Redfin.app: valid on disk
Redfin.app: satisfies its Designated Requirement

It is very easy to disturb this signature. For example, if you copy the app with cp -r Redfin.app /tmp and then check the signature, you’ll get a message like this:

$ codesign -vvvv Redfin.app
Redfin.app: a sealed resource is missing or invalid
/private/tmp/Redfin.app/CodeResources: resource added

Unfortunately, cp isn’t the only command-line tool that can invalidate the signature. If you create your zip archive using zip -r yourappname YourAppName.app and then unzip it (either with Finder or with unzip), you’ll invalidate the signature.

You don’t get this problem if you use Finder to create the zip archive: put YourAppName.app in a folder called YourAppName, right-click on the YourAppName folder, and select the “Compress” option. To duplicate this functionality automatically, you’ll have to use ditto, which comes with OS X. ditto -c -k YourAppName YourAppName.zip should do the trick.
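If your build tooling is written in Java, you can shell out to ditto rather than using a zip library. The sketch below runs the same command quoted above and does nothing more; DittoZip is a made-up name, it only works on OS X where ditto is installed, and the folder name is an assumption you pass in.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: DittoZip is a hypothetical name. Requires OS X's ditto.
class DittoZip {
  // Builds "ditto -c -k <folder> <folder>.zip", the command from the text above.
  static List<String> dittoCommand(String folder) {
    return Arrays.asList("ditto", "-c", "-k", folder, folder + ".zip");
  }

  public static void main(String[] args) throws IOException, InterruptedException {
    // First argument: the folder that contains YourAppName.app.
    String folder = args.length > 0 ? args[0] : "YourAppName";
    Process ditto = new ProcessBuilder(dittoCommand(folder)).inheritIO().start();
    int exitCode = ditto.waitFor();
    System.out.println("ditto exited with " + exitCode);
  }
}
```

After zipping, it is still worth extracting the archive and re-running codesign -vvvv on the result.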

Don’t Use iTunes; Use the iPhone Configuration Utility

Some people are able to successfully install apps and provisioning profiles using iTunes; if iTunes works for you, great. But a lot of our beta testers weren’t so lucky. iTunes would often fail mysteriously, sometimes with no useful error messages.

The iPhone Configuration Utility (iPCU) allows users to generate log files with error messages; these error messages are critical for diagnosing problems with application installation. More generally, iPCU includes better error messages than iTunes all around; in many cases, switching to iPCU gave our beta testers enough information to solve the problem without the aid of developers.

There’s another good reason to use iPCU instead of iTunes for app installation: you can sync an iPhone with only one iTunes machine. If you want to install beta apps using any other machine, you have to use the iPhone Configuration Utility. In most cases, you can work around even the trickiest app installation problems just by using iPCU to install from another machine.

To install an app using iPCU:

  1. Download and install the iPhone Configuration Utility from Apple.
  2. Connect your iPhone/iPod to your computer via USB cable.
  3. Close iTunes, if it is open, before launching iPCU.
  4. Launch iPCU. You should see your device appear in the “Devices” section on the left side of the window.
  5. Add the provisioning profile (mobile provision) to the iPCU Library: Click on “Provisioning Profiles”, then drag the .mobileprovision file into the Provisioning Profiles pane.
  6. Add the application to the iPCU Library: Click on “Applications”, then drag the app into the Applications Pane. On Windows, the app will be a folder, named something like YourAppName.app. On Mac OS X, the app will be a single application file.
  7. Access your device: Click on your device in the list of devices on the left. You should see five tabs appear in the main window: “Summary”, “Configuration Profiles”, “Provisioning Profiles”, “Applications”, and “Console”.
  8. Install the Provisioning Profile: Click on the “Provisioning Profiles” tab of your device. You should see your Ad Hoc Provisioning Profile in the list of profiles. In the “Install” column you should see either a button that says “Install” or a button that says “Remove” (if you’ve installed this profile previously).
  9. Install the app: Click on your device’s “Applications” tab. You should see your app in the list of applications. In the “Install” column you should see either a button that says “Install” or a button that says “Uninstall” (if you’ve already installed the app).

That’s all there is to it, assuming that nothing goes wrong.

Error Messages: What Could Go Wrong?

  • iPhone Config Utility: “Could not start session with device. Error: kAMDSessionActiveError”

    Your device has turned off / fallen asleep. Push the power button, slide to unlock, and try it again.

  • iPhone Config Utility: I tried to install, and I got an error like “Could not install application on device. Error: -402620395.”

    This is almost certainly because the .mobileprovision didn’t install correctly. Clear out all provisioning profiles from your phone and from the iPCU Library (following the directions below), drag your .mobileprovision file into the Library / Provisioning Profiles section, then click on Devices / Your Device / Provisioning Profiles. Make sure you see only one profile on the list, and make sure it’s installed.

  • iPhone Config Utility: I tried to install, and I got an error like “Could not install application on device. Error: -402620393.”

    We saw this error several times but never figured out the root cause. Installing from another machine worked around the problem.

  • iPhone Config Utility: I tried to install, and I got an error like “Could not transfer application to device. Error: kAMDUndefinedError.”

    A number of our beta users had this error, but we never deduced the root cause. In at least one case, re-installing the latest version of iPCU resolved the problem. In all known cases, installing from another machine worked around the problem.

  • iTunes: I tried to sync, and I got an error like “The application ‘YourAppName’ was not installed on the iPhone because an unknown error occurred (0xE8008017).”

    This is the same as error -402620393 above. (Note that E8008017 is hexadecimal for the signed integer -402620393.) We saw this error several times but never figured out the root cause. Installing from another machine without iTunes (using iPCU) worked around the problem.

  • iTunes: I tried to sync, and I got an error like “The application ‘YourAppName’ was not installed on the iPhone because an unknown error occurred (0xE8008015).”

    This is the same as error -402620395 above. (Note that E8008015 is hexadecimal for the signed integer -402620395.) This is almost certainly because the .mobileprovision didn’t install correctly. Try dragging the .mobileprovision again to the Applications pane in iTunes and syncing. It might also help to delete all of your existing provisioning profiles (see “Clear Out the Phone Using the Phone” below).
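
Because iTunes reports these errors in hexadecimal and iPCU reports them as signed integers, it can help to convert between the two when searching for an error. The following is a minimal Java sketch of that conversion; the class and method names are made up for illustration and are not part of any Apple tool.

```java
// Sketch: convert between the hex codes iTunes shows and the signed
// 32-bit integers iPCU shows. Names here are illustrative only.
final class InstallErrorCodes {
    /** Parses a 32-bit hex code such as "0xE8008017" into a signed int. */
    static int toSigned(String hex) {
        boolean prefixed = hex.startsWith("0x") || hex.startsWith("0X");
        String digits = prefixed ? hex.substring(2) : hex;
        // parseUnsignedInt accepts values above Integer.MAX_VALUE and
        // returns the same 32 bits reinterpreted as a signed int.
        return Integer.parseUnsignedInt(digits, 16);
    }

    /** Formats a signed int such as -402620393 as an upper-case hex code. */
    static String toHex(int code) {
        return "0x" + Integer.toHexString(code).toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(toSigned("0xE8008017")); // -402620393
        System.out.println(toSigned("0xE8008015")); // -402620395
        System.out.println(toHex(-402620393));      // 0xE8008017
    }
}
```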

Troubleshooting

Still not working? Try this.

  1. Using iTunes? Try the iPhone Configuration Utility.
  2. Try installing from another machine. It’s often especially helpful to try installing from a Mac if you’re failing on Windows, or to try installing from a Windows box if you’re failing on a Mac. (If you’re on Windows and you don’t have access to a Mac, try switching to a Windows machine that does not have iTunes installed.)
  3. Close iTunes. The iPhone Configuration Utility doesn’t work so well if iTunes is open. Note that iTunes auto-launches by default when you plug in an iPhone/iPod.
  4. Try restarting your phone. Press and hold the power button at the top of the phone, slide to power off. Wait until the screen turns off, then press and hold the power button again to start it up again. No, seriously. This actually works, more often than you’d think.
  5. Clear out the app and provisioning profiles correctly, as described below.
  6. Delete your entire iPCU Library, as described below.
  7. Using iPhone Configuration Utility? Try iTunes.
  8. Still not working? Examine the iPCU Log. Directions below.

Clear Out the Phone Using the Phone

Both iTunes and iPCU provide the ability to remove apps and provisioning profiles from the phone, but they don’t work very reliably, especially when you have multiple versions of the same app/profile bouncing around.

The most reliable way to remove an app from the iPhone is to do it the normal way: on the Home screen, press and hold your finger on the app; an X will appear. Tap on the X to delete the app.

You can remove provisioning profiles using the iPhone “Settings” app. In the “General” section, scroll down to the “Profiles” section, select it, tap on the old provisioning profile, and tap “Remove”. (Ordinarily you shouldn’t need to delete provisioning profiles, but the best way to be sure that you’ve installed a .mobileprovision file is to remove all profiles and then install just the provision you need.)


Deleting the iPhone Configuration Utility Library

Removing the app and provisioning profile from your iPCU library should just be a simple matter of deleting them in iPCU. But if something goes really wrong, you may have to purge the iPCU library. (In at least one recorded case, purging the iPCU library not only fixed iPCU, but also fixed installing the app via iTunes.)

On OS X, the iPCU library is stored in your Home directory, in ~/Library/MobileDevice. You can delete the entire folder.

On Windows, the library is stored in your local application data folder. To find your local app data folder:

  • On Vista, press the Windows key, type %localappdata% in the search box, and press Enter.

  • On XP, press the Windows key, select “Run” and type %userprofile%\Local Settings\Application Data

Go to the “Apple Computer” folder. (There may also be an “Apple” folder and an “Apple_Inc” folder; ignore those.) You should see an “iPhone Configuration Utility” folder here and a “MobileDevice” folder here; delete both of them.

Extracting the iPCU Log

To access your device console, click on your device in the “Devices” section on the left pane of iPCU. There is a “Console” tab on the far right side. Click on that and you should start seeing console messages. (It sometimes takes a few seconds for the messages to appear.) You can click “Save Log As…” to generate a text file that you can send out as an attachment.

September 28, 2009

Maven Reactor Tricks

Not many people know this, but you can use Maven to resume a failed build from the middle, or to build just a subset of projects in a multi-module (a.k.a. “reactor”) build.

Example Reactor Project

Consider this complex multi-module reactor build:

my-root-project
|-- pom.xml
|-- barBusinessLogic
|   `-- pom.xml
|-- bazDataAccess
|   `-- pom.xml
|-- quz
|   |-- pom.xml
|   |-- quzAdditionalLogic
|   |   `-- pom.xml
|   `-- quzUI
|       `-- pom.xml
`-- fooUI
    `-- pom.xml

Suppose project fooUI depends on project barBusinessLogic, which depends on project bazDataAccess.

fooUI --> barBusinessLogic --> bazDataAccess

Furthermore, quzUI depends on quzAdditionalLogic, which depends on barBusinessLogic.

quzUI --> quzAdditionalLogic --> barBusinessLogic --> bazDataAccess

Ordinarily, when you run mvn install from my-root-project, you’ll build the projects in this order:

  1. my-root-project (parent project)
  2. bazDataAccess
  3. barBusinessLogic
  4. fooUI
  5. quz (parent project)
  6. quzAdditionalLogic
  7. quzUI

Resuming the Build with --resume-from

Suppose you’re working on your code and you attempt to run mvn install from my-root-project, but you encounter a test failure in fooUI. You fix the problem in fooUI without changing bazDataAccess or barBusinessLogic; you know those two are fine, so there’s no need to rebuild or retest them. You can then use the --resume-from argument, like this:

mvn install --resume-from=fooUI

That will skip over bazDataAccess and barBusinessLogic and pick up the build where you left off in fooUI. If fooUI succeeds, it will go on to build quz, quzAdditionalLogic, and quzUI.

Specify a Subset of Projects with --projects

Suppose you’ve made some changes to fooUI and bazDataAccess and would like to rebuild just those two projects. You can use the --projects argument, like this:

mvn install --projects fooUI,bazDataAccess

That will automatically build just those two projects, saving you the trouble of running Maven in each directory separately.

Making fooUI Without Building quz Using --also-make

Suppose you’re a developer working on fooUI; you don’t want to work on quz right now, but just want to get a working build of fooUI. You can use --also-make, like this:

mvn install --projects fooUI --also-make

--also-make will examine fooUI and walk down its dependency tree, finding all of the projects that it needs to build. In this case, it will automatically build bazDataAccess, barBusinessLogic and fooUI without building quz.
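
To make the “walk down its dependency tree” idea concrete, here is a minimal Java sketch of collecting a module plus everything it depends on, dependencies first. It is not Maven’s implementation: the dependency map is typed in by hand from the example reactor above, parent POMs are ignored, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the idea behind --also-make, not Maven's actual code.
final class AlsoMakeSketch {
    /** Direct dependencies of each module in the example reactor. */
    static Map<String, List<String>> dependsOn() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("bazDataAccess", Collections.<String>emptyList());
        deps.put("barBusinessLogic", Arrays.asList("bazDataAccess"));
        deps.put("fooUI", Arrays.asList("barBusinessLogic"));
        deps.put("quzAdditionalLogic", Arrays.asList("barBusinessLogic"));
        deps.put("quzUI", Arrays.asList("quzAdditionalLogic"));
        return deps;
    }

    /** Returns the selected module and all of its dependencies, dependencies first. */
    static List<String> alsoMake(String selected) {
        Set<String> ordered = new LinkedHashSet<>();
        visit(selected, dependsOn(), ordered);
        return new ArrayList<>(ordered);
    }

    private static void visit(String module, Map<String, List<String>> deps, Set<String> ordered) {
        for (String dep : deps.getOrDefault(module, Collections.<String>emptyList())) {
            visit(dep, deps, ordered);
        }
        ordered.add(module); // added only after everything it needs
    }

    public static void main(String[] args) {
        System.out.println(alsoMake("fooUI")); // [bazDataAccess, barBusinessLogic, fooUI]
    }
}
```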

Changing barBusinessLogic and Verifying You Didn’t Break Anything Using --also-make-dependents

Suppose you’ve made a change to barBusinessLogic; you want to make sure you didn’t break any of the projects that depend on you. You also want to avoid rebuilding/testing projects that you know you haven’t changed. In this case, you want to avoid building bazDataAccess. You can use --also-make-dependents, like this:

mvn install --projects barBusinessLogic --also-make-dependents

--also-make-dependents will examine all of the projects in your reactor to find projects that depend on barBusinessLogic; it will automatically build those and anything that depends on them.
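
The reverse walk can be sketched the same way. The Java below is only an illustration of the idea, not Maven’s code: it uses a hand-typed copy of the example reactor’s dependencies, ignores parent POMs, and uses hypothetical names. Starting from the changed module, it repeatedly collects every module that lists an already-collected module as a dependency.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the idea behind --also-make-dependents, not Maven's actual code.
final class AlsoMakeDependentsSketch {
    /** Direct dependencies of each module in the example reactor. */
    static Map<String, List<String>> dependsOn() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("bazDataAccess", Collections.<String>emptyList());
        deps.put("barBusinessLogic", Arrays.asList("bazDataAccess"));
        deps.put("fooUI", Arrays.asList("barBusinessLogic"));
        deps.put("quzAdditionalLogic", Arrays.asList("barBusinessLogic"));
        deps.put("quzUI", Arrays.asList("quzAdditionalLogic"));
        return deps;
    }

    /** Returns the changed module followed by everything that depends on it. */
    static List<String> alsoMakeDependents(String changed) {
        Map<String, List<String>> deps = dependsOn();
        Set<String> result = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(changed);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (!result.add(current)) {
                continue; // already collected
            }
            for (Map.Entry<String, List<String>> entry : deps.entrySet()) {
                if (entry.getValue().contains(current)) {
                    queue.add(entry.getKey()); // a direct dependent of 'current'
                }
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        // bazDataAccess is absent: it is a dependency, not a dependent.
        System.out.println(alsoMakeDependents("barBusinessLogic"));
    }
}
```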

Resuming a make Build

When you use --also-make or --also-make-dependents, you run a subset of projects, but that doesn’t mean stuff won’t fail halfway through the build. You can resume an --also-make build from the project that stopped the build by using --resume-from together with --also-make, like this:

mvn install --projects quz/quzUI --also-make --resume-from barBusinessLogic

At Redfin, our Maven reactor has 86 projects; avoiding unnecessary rebuilds is essential to our productivity!


March 6, 2009

Cleaning Whitespace in Postgres

As part of our recent release, we added every survey submitted about our agents to our agent profiles. To prep for this, we needed to do a mail merge with each survey response, which we exported from a Postgres database. The problem was, when our users submitted their answers, they used the skills their grade 3 English teacher taught them and wrote in paragraphs. But all the common export formats indicate new records by using line breaks, so we needed a way to clean the whitespace from these surveys.

As it turns out, Postgres has some very useful regex functions that make string operations a breeze. But since no one wants to have to reconstruct the appropriate syntax every time they need to clean whitespace, you can make a Postgres function that wraps the functionality:

CREATE OR REPLACE FUNCTION clean_whitespace(to_clean text) RETURNS text AS $$
    BEGIN
        RETURN regexp_replace(to_clean, E'[ \t\n\r]+', ' ', 'g');
    END;
$$ LANGUAGE plpgsql IMMUTABLE;

This replaces each run of whitespace in the argument with a single space. The IMMUTABLE flag declares that the function always returns the same result for the same argument and never modifies the database, which is what allows it to be used in an index expression. Also notice that we match only runs of length at least 1 (by using “+” rather than “*”), because “*” also matches the empty string, and you end up with a space between every character!
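
If you need the same cleanup outside the database, the pattern carries over to other regex engines. Here is a minimal Java sketch (the class and method names are invented for illustration) that applies the same character class and also shows what goes wrong with “*”:

```java
// Sketch: the same whitespace-collapsing pattern, applied in Java.
final class WhitespaceSketch {
    /** Collapses each run of spaces, tabs, and line breaks into one space. */
    static String cleanWhitespace(String toClean) {
        return toClean.replaceAll("[ \\t\\n\\r]+", " ");
    }

    /** Same pattern with "*": it also matches the empty string between characters. */
    static String cleanWithStar(String toClean) {
        return toClean.replaceAll("[ \\t\\n\\r]*", " ");
    }

    public static void main(String[] args) {
        // Prints: Great agent! Would use again.
        System.out.println(cleanWhitespace("Great agent!\r\n\r\nWould  use\tagain."));
        // Prints " a b " (a space at every position, including both ends).
        System.out.println("[" + cleanWithStar("ab") + "]");
    }
}
```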

Thanks to Thomas Kellerer on the postgres message board for pointing us in the right direction with regard to the arguments we needed.

(Photo credits: jamesdale10 on Flickr)


February 3, 2009

Announcing SitemapGen4j 1.0

Redfin is happy to announce SitemapGen4j 1.0. SitemapGen4j is a library to generate XML sitemaps in Java.

Download SitemapGen4j 1.0

What’s an XML sitemap?

Quoting from sitemaps.org:

Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.

Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of crawling your site.

Sitemap 0.90 is offered under the terms of the Attribution-ShareAlike Creative Commons License and has wide adoption, including support from Google, Yahoo!, and Microsoft.

Getting started

The easiest way to get started is to just use the WebSitemapGenerator class, like this:

WebSitemapGenerator wsg = new WebSitemapGenerator("http://www.example.com", myDir);
wsg.addUrl("http://www.example.com/index.html"); // repeat multiple times
wsg.write();

Configuring options

But there are a lot of nifty options available for URLs and for the generator as a whole. To configure the generator, use a builder:

WebSitemapGenerator wsg = WebSitemapGenerator.builder("http://www.example.com", myDir)
    .gzip(true).build(); // enable gzipped output
wsg.addUrl("http://www.example.com/index.html");
wsg.write();

To configure the URLs, construct a WebSitemapUrl with WebSitemapUrl.Options.

WebSitemapGenerator wsg = new WebSitemapGenerator("http://www.example.com", myDir);
WebSitemapUrl url = new WebSitemapUrl.Options("http://www.example.com/index.html")
    .lastMod(new Date()).priority(1.0).changeFreq(ChangeFreq.HOURLY).build();
// this will configure the URL with lastmod=now, priority=1.0, changefreq=hourly
wsg.addUrl(url);
wsg.write();

Configuring the date format

One important configuration option for the sitemap generator is the date format. The W3C datetime standard allows you to choose the precision of your datetime (anything from just specifying the year like “1997” to specifying the fraction of the second like “1997-07-16T19:20:30.45+01:00”); if you don’t specify one, we’ll try to guess which one you want, and we’ll use the default timezone of the local machine, which might not be what you prefer.

// Use DAY pattern (2009-02-07), Greenwich Mean Time timezone
W3CDateFormat dateFormat = new W3CDateFormat(Pattern.DAY);
dateFormat.setTimeZone(TimeZone.getTimeZone("GMT"));
WebSitemapGenerator wsg = WebSitemapGenerator.builder("http://www.example.com", myDir)
    .dateFormat(dateFormat).build(); // actually use the configured dateFormat
wsg.addUrl("http://www.example.com/index.html");
wsg.write();
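
To see why the timezone matters even at day precision, here is a small sketch that uses only the JDK’s java.time classes, not SitemapGen4j’s W3CDateFormat. The class name, the sample instant, and the two patterns are assumptions chosen for illustration. The same moment falls on different calendar days depending on the zone used to format it.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Sketch: one instant rendered at two W3C-style precisions in two zones.
final class W3cDatePrecisionSketch {
    static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy-MM-dd");
    static final DateTimeFormatter SECOND = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssXXX");

    /** Formats the instant in the given zone at the given precision. */
    static String format(Instant when, DateTimeFormatter precision, ZoneId zone) {
        return precision.format(when.atZone(zone));
    }

    public static void main(String[] args) {
        Instant lastMod = Instant.parse("2009-02-08T02:30:00Z");
        ZoneId pacific = ZoneOffset.ofHours(-8); // fixed offset standing in for a local zone
        System.out.println(format(lastMod, DAY, ZoneOffset.UTC));    // 2009-02-08
        System.out.println(format(lastMod, DAY, pacific));           // 2009-02-07
        System.out.println(format(lastMod, SECOND, ZoneOffset.UTC)); // 2009-02-08T02:30:00Z
        System.out.println(format(lastMod, SECOND, pacific));        // 2009-02-07T18:30:00-08:00
    }
}
```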

Lots of URLs: a sitemap index file

One sitemap can contain a maximum of 50,000 URLs. (Some sitemaps, like Google News sitemaps, can contain only 1,000 URLs.) If you need to put more URLs than that in a sitemap, you’ll have to use a sitemap index file. Fortunately, WebSitemapGenerator can manage the whole thing for you.

WebSitemapGenerator wsg = new WebSitemapGenerator("http://www.example.com", myDir);
for (int i = 0; i < 60000; i++) wsg.addUrl("http://www.example.com/doc"+i+".html");
wsg.write();
wsg.writeSitemapsWithIndex(); // generate the sitemap_index.xml

That will generate two sitemaps for 60K URLs: sitemap1.xml (with 50K urls) and sitemap2.xml (with the remaining 10K), and then generate a sitemap_index.xml file describing the two.
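
The arithmetic behind that split is simple enough to sketch. The Java below is not SitemapGen4j’s code, and its names are hypothetical; it only shows how a URL count divides into files under a per-file cap.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: how many URLs end up in each sitemap file under a per-file cap.
final class SitemapSplitSketch {
    static final int MAX_URLS_PER_SITEMAP = 50_000;

    /** Returns the number of URLs in each file, in order. */
    static List<Integer> fileSizes(int totalUrls, int maxPerFile) {
        List<Integer> sizes = new ArrayList<>();
        for (int remaining = totalUrls; remaining > 0; remaining -= maxPerFile) {
            sizes.add(Math.min(remaining, maxPerFile));
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(fileSizes(60_000, MAX_URLS_PER_SITEMAP)); // [50000, 10000]
    }
}
```

With a 1,000-URL cap, as mentioned above for Google News sitemaps, the same call would return sixty files of 1,000 URLs each.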

It’s also possible to carefully organize your sub-sitemaps. For example, it’s recommended to group URLs with the same changeFreq together (have one sitemap for changeFreq “daily” and another for changeFreq “yearly”), so you can modify the lastMod of the daily sitemap without modifying the lastMod of the yearly sitemap. To do that, just construct your sitemaps one at a time using the WebSitemapGenerator, then use the SitemapIndexGenerator to create a single index for all of them.

WebSitemapGenerator wsg;
// generate foo sitemap
wsg = WebSitemapGenerator.builder("http://www.example.com", myDir)
    .fileNamePrefix("foo").build();
for (int i = 0; i < 5; i++) wsg.addUrl("http://www.example.com/foo"+i+".html");
wsg.write();
// generate bar sitemap
wsg = WebSitemapGenerator.builder("http://www.example.com", myDir)
    .fileNamePrefix("bar").build();
for (int i = 0; i < 5; i++) wsg.addUrl("http://www.example.com/bar"+i+".html");
wsg.write();
// generate sitemap index for foo + bar
SitemapIndexGenerator sig = new SitemapIndexGenerator("http://www.example.com", myFile);
sig.addUrl("http://www.example.com/foo.xml");
sig.addUrl("http://www.example.com/bar.xml");
sig.write();

You could also use the SitemapIndexGenerator to incorporate sitemaps generated by other tools. For example, you might generate some sitemaps with Google’s official Python sitemap generator and others with WebSitemapGenerator, then use SitemapIndexGenerator to make an index of all of them.

Validate your sitemaps

SitemapGen4j can also validate your sitemaps using the official XML Schema Definition (XSD). If you used SitemapGen4j to make the sitemaps, you shouldn’t need to do this unless there’s a bug in our code. But you can use it to validate sitemaps generated by other tools, and it provides an extra level of safety.

It’s easy to configure the WebSitemapGenerator to automatically validate your sitemaps right after you write them (but this does slow things down, naturally).

WebSitemapGenerator wsg = WebSitemapGenerator.builder("http://www.example.com", myDir)
    .autoValidate(true).build(); // validate the sitemap after writing
wsg.addUrl("http://www.example.com/index.html");
wsg.write();

You can also use the SitemapValidator directly to validate sitemaps. It has two methods: validateWebSitemap(File f) and validateSitemapIndex(File f).

Google-specific sitemaps

Google can understand a wide variety of custom sitemap formats that they made up, including Mobile sitemaps, Geo sitemaps, Code sitemaps (for Google Code search), Google News sitemaps, and Video sitemaps. SitemapGen4j can generate any/all of these different types of sitemaps.

To generate a special type of sitemap, just use GoogleMobileSitemapGenerator, GoogleGeoSitemapGenerator, GoogleCodeSitemapGenerator, GoogleNewsSitemapGenerator, or GoogleVideoSitemapGenerator instead of WebSitemapGenerator.

You can’t mix-and-match regular URLs with Google-specific sitemaps, so you’ll also have to use a GoogleMobileSitemapUrl, GoogleGeoSitemapUrl, GoogleCodeSitemapUrl, GoogleNewsSitemapUrl, or GoogleVideoSitemapUrl instead of a WebSitemapUrl. Each of them has unique configurable options not available to regular web URLs.


January 16, 2009

Improved Geocoding, Or: How I Learned to Stop Worrying and Love the Map

Those of us on the Redfin data team see every data problem that gets reported by our users. Of course, if we were to respond directly to each of them, we’d never get anything else done (okay, our data’s not that bad), but luckily the product management team takes care of that for us. One of the most bothersome kinds of report, partly because we get at least a dozen each week, is where:

  • The user is reporting we mapped something wrong;
  • We know it’s because our mapping software isn’t perfect;
  • No-of-course-we-don’t-check-each-one-of-our-450,000-listings-by-hand; and
  • Hopefully the user provided enough information that we can correct that one listing

Well, we decided it was time to do something about it. In particular, we upgraded the geocoding algorithm we use to place listings on the map so that, for the listings in our system:

  • Approximately 1.1% have been hand-mapped
  • Our automated “point-level” (that is, mapped to the rooftop of the given address by a geocoder) mapping percentage went from about 53.3% to 69.1%
  • Our percentage of listings that are mapped, but not necessarily to the exact rooftop, went from 35.7% to 23.5%
  • Our unmapped percentage went from 9.9% to 6.3%
  • Our percentage of listings that will never be mappable (due to the agent choosing to not disclose the address) is 5.7%, so this is a pretty big improvement.

How’d we do it? As I’m sure you’ve already noticed, we made the switch to Google Maps in mid-December, much to the dismay of our Birds-Eye-loving users. One of the benefits of this was access to Google’s web-based geocoder. We investigated a wholesale replacement of our existing (super-secret) geocoder with the Google geocoder, but decided instead to improve our geocoding rate and accuracy by integrating Google’s geocoder into our current system and using a feedback algorithm when we knew we weren’t getting the best result possible.

This seems fairly straightforward, but unfortunately there were a few classes of gotchas that we ran into. Many of these stem from us relying on our geocoder not only for geocoding, but for address parsing and normalization as well. Here are the biggest problems we found with the current version of the Google HTTP geocoder.

  • If possible, don’t pass unit information (the “Unit 33” part of the address) to Google’s geocoder

    Currently it discards the unit information; it’s not returned in the parsed, corrected address, so you get no benefit from inputting it in the first place. The real problem, though, is that the geocoder may find a better match to your input (say, 3000 Federal Avenue, Unit 33) by replacing the street number with the unit number. It might say you live at 33 Federal Avenue when really you live at 3000 North Federal Avenue, Unit 33, and that’s the result it will return.

  • Google’s geocoder doesn’t warn you when it drastically changes an address – for example Google could do something that seems totally oddball like changing “822 Country Avenue, Quincy, Washington” to “822 North Quincy Street, Arlington, Virginia”, without telling you. Sometimes you’ll see these suggestions as a consumer when Google Maps asks, “Did you mean: 822 North Quincy Street, Arlington, Virginia?” But when you’re dealing with them programmatically, they don’t even ask.

    We solved this in a couple of ways:

    • If the state code changes, we disregard the results. If your input is clean you could actually be stricter about when to disregard the results, but unfortunately when dealing with real estate data it’s common to see a zip code fat-fingered, different city names for the same actual city, a mistyped street name, a missing directional, or the wrong street type.
    • We use the string distance between the input address and our possible results to determine which result is best. Sometimes Google’s geocoder will provide multiple results, and we always have the results from our other geocoder, so this tends to filter out the most erroneous outliers. For example, let’s say our input was “Quincey Road” and we ended up with two results, “Quincey Street” and “Quincy Road”. We would take the second result, because there’s only a one-character difference rather than differing on an entire word.

  • Sometimes Google’s geocoder over-simplifies complex street numbers, reducing “1421-1423 Hayes Street” to just the first address, “1421 Hayes Street” (compound addresses like this are somewhat common for tenancy-in-common listings in San Francisco)

    We got around this simply by checking that the street number of the input corresponds to the street number of the result, and disregarding the result if they don’t match. But it’s important to bear in mind the cases where a simplified version of your input address might be a valid address, albeit not the result you wanted. For the Hayes Street example, it’s very important for us to maintain that we’re talking about both 1421 and 1423 Hayes Street, not just one of them, even though each is a valid address taken separately. Another great example of this in real estate is the historical form of Chicago addresses (which I only learned about because of this problem and which, I must say, are completely sweet in their functionality).
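
The three checks described above (same state code, same street number, closest street string) can be sketched in a few lines of Java. This is a simplified illustration, not our production code. The Candidate type and all names are hypothetical, and Levenshtein edit distance is an assumption standing in for “string distance”.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Sketch: filter geocoder results, then keep the one closest to the input.
final class GeocodeResultFilterSketch {
    /** Hypothetical, simplified parsed address. */
    static final class Candidate {
        final String state;
        final String streetNumber;
        final String street;

        Candidate(String state, String streetNumber, String street) {
            this.state = state;
            this.streetNumber = streetNumber;
            this.street = street;
        }
    }

    /** Levenshtein distance: inserts, deletes, and substitutions each cost 1. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Drops results whose state or street number changed; returns the closest survivor. */
    static Optional<Candidate> best(Candidate input, List<Candidate> results) {
        Candidate best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Candidate c : results) {
            if (!c.state.equals(input.state) || !c.streetNumber.equals(input.streetNumber)) {
                continue; // drastic change or simplified street number: disregard
            }
            int d = editDistance(input.street.toLowerCase(), c.street.toLowerCase());
            if (d < bestDistance) {
                bestDistance = d;
                best = c;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        Candidate input = new Candidate("WA", "822", "Quincey Road");
        List<Candidate> results = Arrays.asList(
                new Candidate("VA", "822", "North Quincy Street"),
                new Candidate("WA", "822", "Quincey Street"),
                new Candidate("WA", "822", "Quincy Road"));
        System.out.println(best(input, results).get().street); // Quincy Road
    }
}
```

A character-level distance is only one option. A word-level comparison would penalize “Street” versus “Road” even more heavily, which may be closer to the “entire word” intuition in the example above.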

There were also a few lessons we learned that proved to be important:

  • Google’s address-level geocodes (indicated in their system as having accuracy code “8”) can be either point-level or street-interpolated. Their street-level geocodes are only on the best-matching street, and don’t appear to take the street number into account when placing the coordinates on the given block. This also means that the street number is not returned as part of the normalized address at this accuracy.
  • Their geocoder can return multiple results, and it’s not always the case that the first result returned is the one you want. Having a good filtering algorithm for choosing the best result is incredibly important.
  • java.util.concurrent has some incredibly powerful and easy-to-use utility classes for multi-threading applications that can be broken up into independent units of work.
  • Just because snow shuts down your city doesn’t mean you don’t have to work. Yes, that’s right, we cranked this out over the holidays!
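
As an illustration of the java.util.concurrent point, here is a minimal sketch of fanning independent lookups out to a fixed thread pool. It is not our production code: the class name is hypothetical, and the geocoder is passed in as a plain function so that the example runs without any network access.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Sketch: run one independent task per address on a fixed-size pool.
final class ParallelGeocodeSketch {
    /** Applies the geocoder to every address in parallel; results keep input order. */
    static <R> List<R> geocodeAll(List<String> addresses, Function<String, R> geocoder, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<R>> tasks = new ArrayList<>();
            for (String address : addresses) {
                tasks.add(() -> geocoder.apply(address));
            }
            List<R> results = new ArrayList<>();
            // invokeAll blocks until every task finishes and returns futures in task order.
            for (Future<R> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while geocoding", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a geocoding task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> addresses = Arrays.asList("710 2nd Avenue", "1421-1423 Hayes Street");
        // Stand-in for a real geocoder call: just upper-cases the address.
        System.out.println(geocodeAll(addresses, String::toUpperCase, 4));
    }
}
```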

Overall, we’ve been happy with the results of integrating Google’s geocoder (clearly, otherwise we wouldn’t be talking so much about it). Many of our initial concerns ended up being non-issues, partly because Google was so helpful when we brought them up. That being said, there’s still a short list of things we’d like to see added (who knows, maybe they’ll read this!):

  • The ability to distinguish between point-level and street-interpolated geocodes. This is one of our largest remaining issues, since we take our data quality so seriously and we like to be able to measure it.
  • More point-level data. All of the listings that are mapped directly onto a street rather than over a specific house are placed there because Google didn’t know exactly which house to put them over, only approximately how far down the street they are. We would love to see these directly over houses in the future.
  • Componentized address parsing. Instead of just telling us the result is “710 2nd Avenue” we’d like to know that “710” is the street number, “2nd” is the street name, and “Avenue” is the street type, without having to do any post-processing on our end.
  • Less latency. Since it’s a web-based service, the network latency can add considerably to the time it takes us to get results back. A batch geocoder would be one possible solution.
  • More throughput. Currently there are caps of 10 requests per second per IP address (so Google can protect against denial-of-service attacks). It would be nice if they could raise or eliminate these caps for customers.
  • A pony. A big, shiny one.

There are still a few areas where we know our geocoding could be improved. One of the biggest remaining problems is vacant land, which might not have a complete address yet; neither of our geocoders supports partial addresses (such as “123XX Main St”). To take care of these cases, we’ll be looking to integrate a geocoder that can geocode by APN (essentially, a locally unique ID that every property has).

Have you found any other bugs in Google’s geocoder that we might not have caught? Or know of any other cities with quirks that make geocoding difficult? Perhaps you know of a good APN geocoder? Let us know!

(Photo credits: tympsy and cheukiecfu on Flickr, respectively)


December 22, 2008

Deciphering Apache error messages and other pleasant pastimes

There comes a dreaded time in every developer’s life when the inevitable happens – you are forced to switch to a new machine, and re-configure the entire dev environment you spent months (or years, in my case) tweaking and perfecting.  For me, that time unexpectedly came last week when the hard disk on my laptop decided it was time for a winter holiday.

Armed with a reimaged laptop and Redfin’s internal developer machine setup guide, I was making decent progress until I hit a stumbling block: Apache was crashing. No matter what I did – on the first browser hit, I was greeted with a friendly error:

[Screenshot: Apache error]

The Event Log had a slightly more descriptive message:

Faulting application Apache.exe, version 2.0.63.200, faulting module ntdll.dll, version 5.1.2600.2180, fault address 0x00011e58.

Apache’s error log was mysterious in its own way:

[notice]  Parent: child process exited with status 3221225477 -- Restarting.

To save you the hours of debugging, hair pulling, reinstalling, and commenting out parts of httpd.conf that I went through, I’ll just point you directly to the root cause of the problem:  https://issues.apache.org/bugzilla/show_bug.cgi?id=44338: mod_deflate crashes and does not return response.  If you have Apache 2.0.63 on Windows, are loading the mod_deflate.so module, and are using it by including AddOutputFilterByType/SetOutputFilter directives in your httpd.conf, Apache will crash.  Currently there is no fix for this (although the bug above may be updated with a fix in the future), so you have a choice of taking mod_deflate out of your Apache config or upgrading to Apache 2.2.

Since mod_deflate is not essential in our developer setup, I happily chose the former option and got my Apache up and running.
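
As an aside, the exit status in the Apache error log above is easier to search for once it is written in hexadecimal: 3221225477 is 0xC0000005, the Windows status code for an access violation. Here is a tiny Java sketch of the conversion; the class name is made up for illustration.

```java
// Sketch: show a Windows process exit status in the hex form used in documentation.
final class ExitStatusSketch {
    /** Formats an exit status such as 3221225477 as an upper-case hex code. */
    static String toHex(long status) {
        return "0x" + Long.toHexString(status).toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(toHex(3221225477L)); // 0xC0000005
    }
}
```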


December 11, 2008

A Virtual Earth to Google Maps Transition: From Idea to Deployment In a Few Weeks

Whenever I out myself as a member of the Redfin search team to someone who has used Redfin, one of the first questions I get is, “So why do you guys use the Microsoft Map?  Why didn’t you choose Google?”  The full answer is a bit long, but the short answer is easy: speed.

Every few months, we’ve tested Google Maps against Virtual Earth, and the result was always the same: Google’s script and tiles loaded considerably faster, but Virtual Earth was as much as four times faster at adding a ton of items to the map.  Since our user interface can add up to 500 houses at a time to a map, we just felt like Google wasn’t able to give us the performance we needed.  To be fair, part of VE’s speed was due to a bulk add feature that we had lobbied for them to put in, but it worked well, and so we put Google aside, wistfully looking at those speedy script and tile load times.

A few months ago, though, we started contract renegotiations with Microsoft, and we decided to give Google Maps a closer look.  One of my colleagues, the brilliant Dan Fabulich of Selenium fame, figured out that we could code our own custom GOverlay to make Google Maps display items much faster than it had previously.  The question then became: how hard would it be to port our site from one platform to the other?  And would it be worth it to do so?


May 6, 2008

Fun with generate_series

A few months ago I attended the PostgreSQL conference in Portland, OR. There were a lot of talks ranging from hard-core stuff like Neil Conway’s talk about the internals of query execution, to random fun stuff like David Fetter’s discussion of procedural languages, including LOLCODE.


During their talks, a few people mentioned a handy function called generate_series. It took me a while to discover how useful this function really is. I thought I’d post an example. Here goes…

Let’s say that you have a table with sales information:

postgres=# select * from sales order by date;
    date    | sales_person | part_number
------------+--------------+-------------
 2008-05-05 | Glenn        |           1
 2008-05-05 | Shahaf       |           1
 2008-05-06 | Mike         |           1
 2008-05-06 | Mike         |           2
 2008-05-08 | Glenn        |           1
 2008-05-08 | Shahaf       |           1
 2008-05-08 | Mike         |           2
 2008-05-09 | Mike         |           1
 2008-05-09 | Glenn        |           1
(9 rows)

You might want to get an idea of how many sales happened on each day. You could try to do it with a query like this:

postgres=# select date, count(*) from sales group by date order by date;
    date    | count
------------+-------
 2008-05-05 |     2
 2008-05-06 |     2
 2008-05-08 |     3
 2008-05-09 |     2
(4 rows)

This basically works, but it hides one important fact — no sales happened on May 7.

To fix this, we can use generate_series. When you run this function normally, it just generates a series of numbers:

postgres=# select generate_series(0,4);
 generate_series
-----------------
               0
               1
               2
               3
               4
(5 rows)

However, you can easily change it to generate a series of dates:

postgres=# select generate_series(0,4) + date '2008-05-05' as date;
    date
------------
 2008-05-05
 2008-05-06
 2008-05-07
 2008-05-08
 2008-05-09
(5 rows)

Once you have this, you can now join against the sales table to generate the report:

postgres=# select series.date, count(sales.date)
from (select generate_series(0,4) + date '2008-05-05' as date) as series
left outer join sales on series.date=sales.date group by series.date
order by series.date;
    date    | count
------------+-------
 2008-05-05 |     2
 2008-05-06 |     2
 2008-05-07 |     0
 2008-05-08 |     3
 2008-05-09 |     2
(5 rows)

The trick here is to do a left outer join between the date sequence and the sales, and to count the rows that have a non-null sales date.
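
If it helps to see the same logic outside SQL, here is a minimal Java sketch of what the join computes. The class and method names are hypothetical. It starts from a complete series of days, each with a count of zero, and then tallies the sales that fall on those days.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: per-day sales counts that keep the days with no sales.
final class DailySalesSketch {
    /** The sale dates from the example table above. */
    static final List<LocalDate> SALES = Arrays.asList(
            LocalDate.parse("2008-05-05"), LocalDate.parse("2008-05-05"),
            LocalDate.parse("2008-05-06"), LocalDate.parse("2008-05-06"),
            LocalDate.parse("2008-05-08"), LocalDate.parse("2008-05-08"),
            LocalDate.parse("2008-05-08"),
            LocalDate.parse("2008-05-09"), LocalDate.parse("2008-05-09"));

    /** Counts sales for each of the given number of days starting at start. */
    static Map<LocalDate, Integer> countPerDay(LocalDate start, int days, List<LocalDate> saleDates) {
        Map<LocalDate, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < days; i++) {
            counts.put(start.plusDays(i), 0); // the generate_series side of the join
        }
        for (LocalDate sale : saleDates) {
            counts.computeIfPresent(sale, (day, n) -> n + 1); // sales outside the range are ignored
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints one line per day, including 2008-05-07 with a count of 0.
        countPerDay(LocalDate.parse("2008-05-05"), 5, SALES)
                .forEach((day, n) -> System.out.println(day + " | " + n));
    }
}
```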

This is by no means the only use of generate_series; it’s just the most recent use I found. If you know of other ways to use this function, or if you know of other handy functions, drop a note below.

Image credits: Electric Vehicle Guide

