<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Redfin Developers&#039; Blog &#187; Uncategorized</title>
	<atom:link href="http://blog.redfin.com/devblog/category/uncategorized/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.redfin.com/devblog</link>
	<description>Redfin Developers&#039; Blog</description>
	<lastBuildDate>Mon, 23 Jan 2012 04:49:17 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Prefetching Web Content: Trials and Tribulations</title>
		<link>http://blog.redfin.com/devblog/2011/03/prefetching_web_content_trials_and_tribulations.html</link>
		<comments>http://blog.redfin.com/devblog/2011/03/prefetching_web_content_trials_and_tribulations.html#comments</comments>
		<pubDate>Thu, 17 Mar 2011 21:43:12 +0000</pubDate>
		<dc:creator>Michael Smedberg</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.redfin.com/devblog/?p=532</guid>
		<description><![CDATA[Stoyan is totally right and I&#8217;m totally wrong (see his comment below, which reads &#8220;The thing about google maps you load is that it&#8217;s an html page. When you load html page in object tag it&#8217;s as if you put it in an iframe. It includes all markup and extra css/js/img resources.&#8221;) My test was [...]]]></description>
			<content:encoded><![CDATA[<p><a href='http://www.phpied.com/preload-cssjavascript-without-execution/'>Stoyan</a> is totally right and I&#8217;m totally wrong (see his comment below, which reads &#8220;The thing about google maps you load is that it&#8217;s an html page. When you load html page in object tag it&#8217;s as if you put it in an iframe. It includes all markup and extra css/js/img resources.&#8221;)  My test was incorrect.  I was testing with a <a href='http://maps.google.com/maps?oe=utf-8&amp;file=api&amp;v=2.193&amp;client=gme-redfin&amp;key=ABQIAAAAs8xBAcNJJYewz0__DWutVBTRtbDPQN51lWuTk2cjlljNVeQwDRSCrVHNxjeGU_CaVw6wCLKru3fHXA'>Google Maps URL</a>, but I should have been testing with a <a href='http://maps.google.com/maps/api/js?sensor=false&amp;oe=utf-8&amp;file=api&amp;v=3.3&amp;client=gme-redfin'>Google Maps API URL</a>.  I can&#8217;t explain how I used the wrong URL- I THOUGHT I copied that URL directly from our <a href='http://www.redfin.com/homes-for-sale'>web site</a>, but apparently not.</p>
<p>I&#8217;m sorry for the mistake and any confusion it may have created.</p>
<p><em>[Click below for the full content of the original post.]</em><br />
<span id="more-532"></span></p>
<p><s>There has been <a href='http://www.adequatelygood.com/2010/1/Preloading-JS-and-CSS-as-Print-Stylesheets' target='_new'>a</a> <a href='https://developer.mozilla.org/en/Link_prefetching_FAQ' target='_new'>lot</a> <a href='http://www.petefreitag.com/item/312.cfm' target='_new'>of</a> discussion of resource <a href='http://en.wikipedia.org/wiki/Link_prefetching' target='_new'>prefetching</a> for HTTP, but unfortunately all of the alternatives I&#8217;ve seen have problems.</s></p>
<h2><s>What Is It?</s></h2>
<p><s>Prefetching resources can greatly enhance the perceived performance of your web apps.  A common use of prefetching is to download images that will be used later.  For instance, the Aardvark page might link to the Bonobo page, and the Bonobo page might display a large image of Jerry the Bonobo.  If a user were reading the Aardvark page, it might be good for the browser to download Jerry&#8217;s image in preparation for when the user clicks on the link to the Bonobo page.  When the user clicks through to the Bonobo page, the browser can render the image without having to download it- Jerry&#8217;s photo can be rendered very quickly.</s></p>
<p><s>Prefetching resources can also HURT performance when done wrong.  In the worst case, the browser might download Jerry&#8217;s photo before downloading content for the Aardvark page, so a user that wants to see aardvarks is delayed.  And if the photo isn&#8217;t properly cached, the browser might have to go get Jerry&#8217;s photo AGAIN when the user goes to the Bonobo page.</s></p>
<h2><s>Approaches</s></h2>
<p><s><strong>Firefox Native Support</strong></s></p>
<p><s><a href='https://developer.mozilla.org/en/Link_prefetching_FAQ' target='_new'>Firefox</a> has a special feature just for this- including &lt;link rel=&#8217;prefetch&#8217; href=&#8217;http://www.animals.net/jerry.jpg&#8217;&gt; in your html tells Firefox to download the photo of Jerry when it gets some free time.  Firefox will NOT make the user wait for any content on the CURRENT page.  Pretty nifty.</s></p>
<p><s>Unfortunately, it doesn&#8217;t work in other browsers.</s></p>
<p><s>Worse, it triggers a <a href='http://statichtml.com/2011/link-prefetching-broken-in-chrome.html' target='_new'>BAD bug</a> in Chrome 9.  Chrome 9 will prefetch the resource, but then refuse to use it on the next page AND NOT TRY TO GET A NEW COPY.  The second page will just barf.  At least that&#8217;s what happens when the resource is Javascript.  NOTE: This seems to be fixed in Chrome 10 (and users are auto-upgraded), but wow, what a nasty bug.</s></p>
<p><s>One workaround would be to detect the User Agent on the server, and return browser specific HTML.  For Firefox, you could use link prefetching, and for other browsers you could skip it.  This is bad for a couple reasons.  First and obviously, you don&#8217;t get the benefits of resource prefetching in other browsers.  Second, if the content varies by browser, it&#8217;s hard to cache the page well (e.g. in a <a href='http://en.wikipedia.org/wiki/Content_delivery_network' target='_new'>CDN</a>.)</s></p>
<p><s><strong>Custom Javascript</strong></s></p>
<p><s><a href='http://www.phpied.com/' target='_new'>Stoyan Stefanov</a> describes a better workaround <a href='http://www.phpied.com/preload-cssjavascript-without-execution/' target='_new'>on his blog</a>.  He advocates including some client-side Javascript that&#8217;ll include the resources in a browser specific manner- via dynamic Images in IE, and via dynamic Objects in other browsers.</s></p>
<p><s>Others have jumped on this bandwagon.  The estimable <a href='http://stevesouders.com/' target='_new'>Steve Souders</a> uses this approach in his <a href='http://www.stevesouders.com/blog/2010/12/15/controljs-part-1/' target='_new'>ControlJS</a> library.  Further, it&#8217;s been adopted by <a href='https://github.com/caridy/yui3-gallery/blob/master/src/gallery-preload/js/gallery-preload.js' target='_new'>YUI</a>.</s></p>
<p><s>Unfortunately, this approach is (slightly) broken.</s></p>
<p><s>Stoyan wrote a great test case which you can see here: <a href='http://www.phpied.com/files/object-prefetch/page1.php?id=1' target='_new'>http://www.phpied.com/files/object-prefetch/page1.php?id=1</a>.  His code works perfectly for his test case.  In particular, his &#8220;page 1&#8221; will download BUT NOT EXECUTE a Javascript file (1.sleep.expires.js).</s></p>
<p><s>I extended his test case slightly here: <a href='http://blog.redfin.com/devblog/files/2011/03/page_one_a.html' target='_new'>http://blog.redfin.com/devblog/files/2011/03/page_one_a.html</a>.  I added an additional resource- a link to Google Maps.  Unfortunately, including Google Maps breaks this code.  The code from Google Maps is executed when prefetched (in Firefox and Chrome.)  This is worse than it might seem.  First, Google Maps hasn&#8217;t been properly initialized, etc., so it throws errors (and also has other bizarre effects- in some cases it creates a hidden iframe, and sets focus to that frame!)  Second, the parsing and execution of Google Maps code takes a while, and the browser is frozen during that time.  It can be hard to tell that the code is executed, but you can verify it with a debugging proxy like <a href='http://www.charlesproxy.com/' target='_new'>Charles</a>.  If you <a href='http://blog.redfin.com/devblog/files/2011/03/page_one_a.html' target='_new'>hit the page</a>, you&#8217;ll see that your browser downloads a LOT of content from Google- a lot more than we intended to prefetch.</s></p>
<h2><s>Our Approach</s></h2>
<p><s>This is tricky- how do you get prefetching on multiple browsers without execution and without server-side logic?  We ended up using Stoyan&#8217;s approach for IE, using the &#8216;native&#8217; approach on Firefox (via a dynamic iframe), and giving up on other browsers.</s></p>
<p><s>The Javascript looks like this (using the <a href='http://dojotoolkit.org/' target='_new'>Dojo library</a>):</s></p>
<p><code><br />
var prefetchURLs = [<br />
&nbsp;&nbsp;&nbsp;&nbsp;'url1',<br />
&nbsp;&nbsp;&nbsp;&nbsp;...<br />
];<br />
dojo.addOnLoad(function() {<br />
&nbsp;&nbsp;&nbsp;&nbsp;redfin.util.prefetchMapURLs(prefetchURLs);<br />
});</p>
<p>...</p>
<p>redfin.util.prefetchMapURLs = function(URLs) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;if (dojo.isIE) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;setTimeout(<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;function() {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for (var i = 0; i &lt; URLs.length; ++i) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;new Image().src = URLs[i];<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;},<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;500<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;);<br />
&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;else if (dojo.isFF) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;var iframe = document.createElement(&#039;iframe&#039;);<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;iframe.id = &#039;ifrm_prefetch&#039;;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;iframe.width=&#039;0&#039;;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;iframe.height=&#039;0&#039;;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;iframe.src=&#039;/prefetch-urls&#039;;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;dojo.body().appendChild(iframe);<br />
&nbsp;&nbsp;&nbsp;&nbsp;}<br />
}<br />
</code></p>
<p><s>For Firefox, the /prefetch-urls link will return HTML that uses the standard Firefox technique (&lt;link rel=&#8217;prefetch&#8217;&gt;, as discussed above.)  Firefox properly defers download until it is not busy, it doesn&#8217;t run the Javascript, etc.  For Internet Explorer, we wait a bit after onload and then download the content using Stoyan&#8217;s approach.  For other browsers, we don&#8217;t prefetch.  As other browsers start to support &lt;link rel=&#8217;prefetch&#8217;&gt; it&#8217;s easy to add support for them.</s></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.redfin.com/devblog/2011/03/prefetching_web_content_trials_and_tribulations.html/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Use dojo.hash instead of dojo.back</title>
		<link>http://blog.redfin.com/devblog/2010/11/use_dojohash_instead_of_dojoback.html</link>
		<comments>http://blog.redfin.com/devblog/2010/11/use_dojohash_instead_of_dojoback.html#comments</comments>
		<pubDate>Thu, 04 Nov 2010 18:53:06 +0000</pubDate>
		<dc:creator>Dan Fabulich</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.redfin.com/devblog/?p=503</guid>
		<description><![CDATA[In Dojo 1.4, the Dojo Toolkit team introduced a new &#8220;dojo.hash&#8221; library for managing the back button in AJAX applications. It&#8217;s a replacement for &#8220;dojo.back,&#8221; which was available in Dojo 1.0. If you&#8217;re deciding whether to use dojo.hash vs. dojo.back for your next web application, you should use dojo.hash. Background: Back Button in AJAX AJAX [...]]]></description>
			<content:encoded><![CDATA[<p>In Dojo 1.4, the Dojo Toolkit team introduced a new &#8220;dojo.hash&#8221; library for managing the back button in AJAX applications.  It&#8217;s a replacement for &#8220;dojo.back,&#8221; which was available in Dojo 1.0.  If you&#8217;re deciding whether to use dojo.hash vs. dojo.back for your next web application, you should use dojo.hash.</p>
<h2>Background: Back Button in AJAX</h2>
<p>AJAX applications can update a page inline without navigating to a new page.  This makes the page much more responsive, but since everything happens in one page, it makes the back button much less useful.</p>
<p>Way back in 2005, someone figured out a clever trick that would allow AJAX applications to support the back button.  If a page at http://www.example.com/#beds=4 changed the URL to http://www.example.com/#beds=3, the &#8220;#&#8221; in the URL would guarantee that the page wouldn&#8217;t reload, but the user could still click the back button to go back to &#8220;#beds=4&#8221;.  The user could then click the forward button to go forward to &#8220;#beds=3&#8221;, navigating back and forth through the browser&#8217;s history.</p>
<p>In general, modifying the URL of a page would add an entry to the browser&#8217;s history, even if the change was in the URL&#8217;s &#8220;hash fragment&#8221;, the part of the URL that appears after the &#8220;#&#8221; sign.  Since hash fragment changes don&#8217;t cause the page to reload, AJAX applications could use the hash fragment to add browser history entries dynamically.</p>
<p>To detect these changes, the page would use a &#8220;setInterval&#8221; timer to automatically poll the URL for updates every 100ms or so.  (In IE8, Microsoft introduced the &#8220;onhashchange&#8221; event,  eliminating the need to poll for changes in the hash fragment; all modern browsers now support &#8220;onhashchange&#8221;.)</p>
<p>It wasn&#8217;t quite that easy, of course.  On some browsers, the page must use a hidden iframe to add entries to the browser&#8217;s history.  In 2007, Brad Neuberg worked out most of the thorny details and wrote the <a href="http://code.google.com/p/reallysimplehistory/">Really Simple History</a> library, which is now in wide use across many JavaScript libraries.  His insights were incorporated into the dojo.back library, which was released with Dojo v1.0.</p>
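<p>The polling approach described above can be sketched in a few lines. This is a minimal illustration, not code from dojo.back or Really Simple History: <code>createHashPoller</code>, <code>readHash</code> and <code>onChange</code> are hypothetical names, and a fake location object stands in for <code>window.location</code> so the sketch runs outside a browser.</p>

```javascript
// Minimal sketch of hash polling. `readHash` and `onChange` are hypothetical
// names; in a browser, readHash would return window.location.hash.
function createHashPoller(readHash, onChange) {
  var last = readHash();
  return function check() {
    var current = readHash();
    if (current !== last) {
      var previous = last;
      last = current;
      onChange(current, previous);
    }
  };
}

// Browser wiring (an assumption, not run here):
//   var check = createHashPoller(
//     function () { return window.location.hash; },
//     function (hash) { /* update the page */ });
//   setInterval(check, 100);

// Simulated run with a fake location object.
var fakeLocation = { hash: '#beds=4' };
var seen = [];
var check = createHashPoller(
  function () { return fakeLocation.hash; },
  function (hash, previous) { seen.push(previous + ' to ' + hash); });

check();                         // nothing has changed yet
fakeLocation.hash = '#beds=3';
check();                         // change detected and reported
check();                         // same hash, so nothing is reported
console.log(seen);
```

<p>On browsers that support &#8220;onhashchange&#8221;, the same <code>check</code> function could be attached to that event and the timer dropped.</p>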
<h2>What&#8217;s Wrong with dojo.back</h2>
<p>There are two problems with dojo.back: first, dojo.back loses history information when the user refreshes the page, and second, dojo.back uses document.write, which makes it difficult to use dojo.back correctly.</p>
<p><b>1) Refreshing the Page and Bookmarking the URL</b></p>
<p>dojo.back allows us to pass it a JavaScript object, storing the object in a hashmap in memory.  dojo.back modifies the URL to include a random unique string. When the user clicks the back button, dojo.back fires a &#8220;back&#8221; event; when the user clicks the &#8220;forward&#8221; button, dojo.back fires a &#8220;forward&#8221; event.</p>
<p>So, if a user navigates to our site at http://www.example.com/ and performs some action that the user should be allowed to undo, we can pass a memento to dojo.back, e.g. {beds: 4, baths: 2}.  dojo.back will modify the URL to http://www.example.com/#1288732596876 and keep a record in memory that &#8220;#1288732596876&#8221; corresponds to {beds: 4, baths: 2}.  If the user clicks the back button, the URL will revert to http://www.example.com/, and dojo will notify our code.</p>
<p>But that introduces a problem: what if the user refreshes the page?  The hashmap in memory is then erased, and all of those entries in the browser&#8217;s history are now useless.</p>
<p>A similar problem occurs if the user wants to bookmark your URL for later, or share the URL with another user over email.  Since the URL doesn&#8217;t contain the data we need to re-create the original object, the data is lost when the current window closes.</p>
<p>There seems to be an obvious fix for this problem: couldn&#8217;t we just use &#8220;#beds=4&amp;baths=2&#8221; in the URL instead of a random string?  Instead of purely unique hash values, we could use a hash value that has meaning, i.e. a &#8220;semantically-named&#8221; hash value.</p>
<p>That is the correct fix, but there&#8217;s no way to do this correctly with dojo.back. dojo.back allows us to configure the &#8220;unique&#8221; string of the hash value, but we can&#8217;t instruct dojo.back to reconstruct mementos from the hash.  If the user refreshes the page, dojo.back will erase its in-memory hashmap; it&#8217;s not smart enough to read &#8220;#beds=4&amp;baths=2&#8221; and re-inflate that into {beds: 4, baths: 2}.</p>
<p>Worse, if we use &#8220;#beds=4&amp;baths=2&#8221; as our semantically-named hash value, there&#8217;s a good chance that it won&#8217;t be unique.  So if the user starts at &#8220;#beds=4&amp;baths=2&#8221; and then goes on to &#8220;#beds=4&amp;baths=1&#8221; and then finally to &#8220;#beds=4&amp;baths=2&#8221;, dojo.back has no way to know whether the user went FORWARD to &#8220;baths=2&#8221; or BACKWARD to &#8220;baths=2&#8221;.</p>
<p>For this reason, dojo.back includes this cryptic warning at the top of its documentation:</p>
<blockquote><p><b>NOTE:</b> There are problems with using dojo.back with semantically-named fragment identifiers (&#8220;hash values&#8221; on an URL). In most browsers it will be hard for dojo.back to know distinguish a back from a forward event in those cases. For back/forward support to work best, the fragment ID should always be a unique value (something using new Date().getTime() for example). If you want to detect hash changes using semantic fragment IDs, then consider using dojo.hash instead (in Dojo 1.4+).</p></blockquote>
<p>In other words, if we use dojo.back, our hash values need to be unique values such as timestamps, e.g. &#8220;#1288732596876&#8221;, not meaningful strings like &#8220;#beds=4&amp;baths=2&#8221;, or Dojo will confuse “back” navigation with “forward” navigation.</p>
<p>So, if you want your back button to keep working after refresh, or if you want users to share your URLs with each other, you should use dojo.hash instead of dojo.back.</p>
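<p>The back-versus-forward ambiguity can be reproduced with a toy model of the browser&#8217;s history. This is a simplified sketch and not dojo.back&#8217;s actual implementation: <code>guessDirection</code> is a hypothetical function that looks only at the entries on either side of the current position.</p>

```javascript
// Simplified model of session history, NOT dojo.back's real implementation.
// `entries` lists the hashes visited so far; `index` is the current position.
function guessDirection(entries, index, newHash) {
  var matchesPrevious = entries[index - 1] === newHash;
  var matchesNext = entries[index + 1] === newHash;
  if (matchesPrevious === matchesNext) {
    // Either both neighbours match (cannot tell) or neither does.
    return matchesPrevious ? 'ambiguous' : 'unknown';
  }
  return matchesPrevious ? 'back' : 'forward';
}

var semantic = ['#beds=4&baths=2', '#beds=4&baths=1', '#beds=4&baths=2'];
var unique = ['#1288732596876', '#1288732596901', '#1288732596955'];

var results = [
  guessDirection(semantic, 1, '#beds=4&baths=2'),
  guessDirection(unique, 1, '#1288732596876'),
  guessDirection(unique, 1, '#1288732596955')
];
console.log(results);
```

<p>With semantic hashes the user sitting on &#8220;baths=1&#8221; has &#8220;baths=2&#8221; on both sides, so the direction cannot be inferred from the hash alone; with unique values each neighbour is distinguishable.</p>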
<p><b>2) dojo.back is harder to deploy because it uses document.write</b></p>
<p>Most of what we now know about back-button support was figured out in 2005, when the web was younger.  At that time, no browser supported the onhashchange event; the only way to get AJAX back button to work properly in some browsers was to include the embedded iframe before the onload event.</p>
<p>Brad Neuberg figured out that the magic trick was to add the iframe using document.write; this technique was incorporated into dojo.back.  document.write is a very dangerous technique on modern browsers, because the specified behavior is for document.write to modify the page if it&#8217;s called before the onload event, or to <b>completely erase the entire page</b> if it&#8217;s called after the onload event.</p>
<p>As a result, dojo.back has another cryptic note in its documentation:</p>
<blockquote><p><b>WARNING:</b> dojo.back.init() must be called before the page&#8217;s DOM is finished loading. Otherwise it will not work. Be careful with xdomain loading or djConfig.debugAtAllCosts scenarios, in order for this method to work, dojo.back will need to be part of a build layer.</p></blockquote>
<p>Fortunately, this document.write hack is not required in any of Dojo&#8217;s currently supported browsers; notably, it&#8217;s not required in IE6+ or Safari 3+.  In fact, if you&#8217;re reading this blog post in IE, you&#8217;re probably using IE8 or IE9, which has &#8220;onhashchange&#8221; built-in and requires no iframe at all.</p>
<p>dojo.hash uses an iframe, but does not attempt to create it using document.write; as a result, it&#8217;s a lot safer to use than dojo.back.</p>
<h2>dojo.hash Is the Way of the Future</h2>
<p>dojo.hash is a very simple library compared to dojo.back.  In fact, if the browser supports the &#8220;onhashchange&#8221; event, then it does almost nothing more than attach the &#8220;onhashchange&#8221; event to the &#8220;/dojo/hashchange&#8221; topic.  (You can also use dojo.hash to get/set the hash fragment in a convenient way.)</p>
<p>dojo.hash makes no attempt to store state objects in memory; instead, anyone who uses dojo.hash must serialize their memento into a hash string before passing it to dojo.hash.  (dojo.objectToQuery is particularly useful for this serialization.)  When you subscribe to the &#8220;/dojo/hashchange&#8221; topic, Dojo will invoke your callback function on every change, passing it the current hash fragment.</p>
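<p>To make the serialization step concrete, here is a rough sketch that round-trips a memento through a hash string. It deliberately avoids Dojo so that it runs anywhere: <code>mementoToHash</code> and <code>hashToMemento</code> are hypothetical stand-ins for dojo.objectToQuery and dojo.queryToObject, and they handle only flat objects.</p>

```javascript
// Plain-JavaScript stand-ins for dojo.objectToQuery / dojo.queryToObject.
// Assumption: mementos are flat objects with string or number values.
function mementoToHash(memento) {
  return Object.keys(memento).sort().map(function (key) {
    return encodeURIComponent(key) + '=' + encodeURIComponent(memento[key]);
  }).join('&');
}

function hashToMemento(hash) {
  var memento = {};
  hash.replace(/^#/, '').split('&').forEach(function (pair) {
    if (!pair) { return; }
    var parts = pair.split('=');
    memento[decodeURIComponent(parts[0])] = decodeURIComponent(parts[1] || '');
  });
  return memento;
}

// With Dojo 1.4+ loaded, the wiring would look roughly like this (not run here):
//   dojo.require('dojo.hash');
//   dojo.subscribe('/dojo/hashchange', null, function (hash) {
//     render(hashToMemento(hash));   // `render` is hypothetical
//   });
//   dojo.hash(mementoToHash({ beds: 4, baths: 2 }));

var hash = mementoToHash({ beds: 4, baths: 2 });
var restored = hashToMemento('#' + hash);
console.log(hash, restored);
```

<p>Note that values come back as strings, so code that needs numbers has to convert them after parsing. Because the state lives entirely in the URL, it survives a refresh or a bookmark.</p>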
<p>We think you will find dojo.hash more useful than dojo.back for almost all situations.  If dojo.back works for you today, you may not need to upgrade, but if you&#8217;re building something new and deciding which library to use, we strongly recommend dojo.hash.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.redfin.com/devblog/2010/11/use_dojohash_instead_of_dojoback.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ryan Dahl Introduces Node.JS</title>
		<link>http://blog.redfin.com/devblog/2010/07/ryan_dahl_introduces_nodejs.html</link>
		<comments>http://blog.redfin.com/devblog/2010/07/ryan_dahl_introduces_nodejs.html#comments</comments>
		<pubDate>Thu, 29 Jul 2010 16:50:14 +0000</pubDate>
		<dc:creator>Glenn Kelman</dc:creator>
				<category><![CDATA[Engineer-to-Engineer Series]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.redfin.com/devblog/?p=442</guid>
		<description><![CDATA[Ryan Dahl, the creator of a high-performing web server written in JavaScript, came by Redfin’s San Francisco office to talk about his creation, Node.JS. It was a very funny, thoughtful talk, particularly because Ryan is somehow both opinionated and careful with the truth. He is the latest in a long line of speakers for Engineer-to-Engineer, [...]]]></description>
			<content:encoded><![CDATA[<p><em>Ryan Dahl, the creator of a high-performing web server written in JavaScript, came by Redfin’s San Francisco office to talk about his creation, Node.JS. It was a very funny, thoughtful talk, particularly because Ryan is somehow both opinionated and careful with the truth. He is the latest in a long line of speakers for Engineer-to-Engineer, <a href="http://www.facebook.com/#!/pages/Engineer-to-Engineer-San-Francisco-Tech-Talks/119013258113825?ref=sgm">a series of technical talks hosted by Redfin, Digg, Pandora and Greylock</a> on topics such as <a href="http://blog.redfin.com/devblog/2010/06/evolving_a_new_analytical_platform_with_hadoop.html">Hadoop</a>, <a href="http://blog.redfin.com/devblog/2010/05/how_and_why_twitter_uses_scala.html">Scala</a>, <a href="http://blog.redfin.com/blog/2010/07/html_5_vs_native_applications.html">HTML5</a>, <a href="http://www.facebook.com/#!/event.php?eid=103357636377435&amp;index=1">Cassandra</a> and <a href="http://www.facebook.com/#!/event.php?eid=117015878337811&amp;ref=mf">Clusto</a>.</em></p>
<p><em>Ryan’s presentation is <a href="http://drop.io/dpqbat2/asset/talk-final-pdf">here</a></em><em>, and below is a summary of what he said. </em></p>
<p>This is going to be very introduction-level, with apologies to anyone who has dived deeper.</p>
<p>The goal of Node is to do easy network programming, to be able to create servers and clients that can be thrown together in a fairly simple way, using JavaScript.</p>
<p>Node.JS is a set of C++ bindings for network I/O and socket I/O. The strong focus is on putting together network servers.<img class="alignright" src="http://jsconf.eu/2009/assets_c/RDahl_sw.jpg" alt="RDahl sw Ryan Dahl Introduces Node.JS" width="510" height="383" title="Ryan Dahl Introduces Node.JS" /></p>
<p>Node is a command-line tool. You need to compile it. There are no binaries available. It’s something that runs from your terminal. It doesn’t have any dependencies other than Python to build it.</p>
<p>Let’s understand it by example… The first example is a program that prints <em>Hello</em> and then, 2 seconds later, prints <em>World</em>.</p>
<p><strong>setTimeout(function () {<br />
&nbsp;&nbsp;&nbsp;&nbsp;console.log('world');<br />
}, 2000);<br />
console.log('hello');</strong></p>
<p>Node has a lot of browser-like APIs. When you’re in JavaScript, you expect it to be Browser-ey, even if it’s not Browser…ish, that is, even if it doesn’t run in the browser.</p>
<p>Node exits automatically. The program drops out when there’s nothing else to do. If there’s a callback pending it keeps running. In the example, after <em>World</em>, the program exits.</p>
<p>Now let’s make this more complicated. What if we want <em>Hello</em> every half second, and then on an interrupt signal we want the program to print <em>Bye</em>?</p>
<p><strong>setInterval(function () {<br />
&nbsp;&nbsp;&nbsp;&nbsp;console.log('hello');<br />
}, 500);<br />
<br />
process.on('SIGINT', function () {<br />
&nbsp;&nbsp;&nbsp;&nbsp;console.log('bye');<br />
&nbsp;&nbsp;&nbsp;&nbsp;process.exit(0);<br />
});</strong></p>
<p>In the browser your central object is the window; in node it’s a process. This global variable exists always.</p>
<p>It’s like a browser listening for a click event. And it’s also like a UNIX program in that you have to end the program. The process object emits an event when it receives a signal; you only have to listen for it. You can get the pid, the program arguments, you can grab memory usage, you can get the executable path.</p>
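<p>As a quick illustration of the process properties just mentioned, the sketch below collects them into one object. It assumes a current Node release; the <code>info</code> object and its field names are only for illustration.</p>

```javascript
// Each value comes from a standard member of Node's global `process` object.
var info = {
  pid: process.pid,                        // numeric process id
  args: process.argv.slice(2),             // arguments after the script name
  executable: process.execPath,            // absolute path of the node binary
  heapUsedBytes: process.memoryUsage().heapUsed
};
console.log(info);
```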
<p>A TCP server emits a connection event whenever a client connects.</p>
<p>Now let’s create a server and listen for that event…</p>
<p><strong>1. var net = require('net');<br />
2.<br />
3. var s = net.createServer();<br />
4.<br />
5. s.on('connection', function (c) {<br />
6. c.end('hello!');<br />
7. });<br />
8.<br />
9. s.listen(8000);<br />
</strong></p>
<p>You can load a module; browser-based JavaScript doesn’t support this. You create a server in line 3, in line 5 – 7, we add an event listener, and then finally on line 9 you set up a port so the server is actually listening.</p>
<p>File I/O is non-blocking too. Node does File I/O. Here’s a program that outputs the last time /etc/passwd was modified.</p>
<p><strong>var stat = require('fs').stat;<br />
<br />
stat('/etc/passwd', function (err, s) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;if (err) throw err;<br />
&nbsp;&nbsp;&nbsp;&nbsp;console.log('modified: %s', s.mtime);<br />
});<br />
</strong></p>
<p>If you’re on a server being hit by thousands of people, you can’t just wait for the disk to spin, so Node takes the pragmatic view that you should never wait for something to happen. Set up the action to occur, but don’t wait for this action to occur. Give a callback and then drop back. There are two parameters. There’s an error object if the file is not there. Otherwise, you print out the time modified.</p>
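<p>The two-parameter callback described above can be exercised for both outcomes. This is a sketch under assumptions: <code>describeStat</code> is a hypothetical helper, and <code>/no/such/file</code> is simply a path that is assumed not to exist.</p>

```javascript
var fs = require('fs');

// Hypothetical helper: turn the (err, stats) pair into a printable line.
function describeStat(path, err, stats) {
  if (err) {
    return path + ': ' + err.code;
  }
  return path + ' modified: ' + stats.mtime.toISOString();
}

// fs.stat returns immediately; each callback runs once the disk has answered.
['/etc/passwd', '/no/such/file'].forEach(function (path) {
  fs.stat(path, function (err, stats) {
    console.log(describeStat(path, err, stats));
  });
});

// This line is printed first, because nothing above waits for the disk.
console.log('stat requested, not waiting');
```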
<p>Node can do HTTP too. If it was just TCP and file stuff, that would be very limiting. Load the HTTP module and create a server with a callback; the callback is called every time you have a request, and it writes to the response the header and <em>Hello</em> and <em>World</em>.</p>
<p><strong>var http = require('http');<br />
<br />
var server = http.createServer(function (req, res) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;res.writeHead(200, {'Content-Type': 'text/plain'});<br />
&nbsp;&nbsp;&nbsp;&nbsp;res.write('Hello\r\n');<br />
&nbsp;&nbsp;&nbsp;&nbsp;res.end('World\r\n');<br />
});<br />
<br />
server.listen(8000);</strong></p>
<p>The HTTP response is chunked because we don&#8217;t know how long it will end up being, so we can&#8217;t put a Content-Length header at the top.  Node is very good at streaming: we’re not limited to “Here’s this movie, buffer it all.” Node streams up to memory, down to disk.</p>
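<p>For readers who have not seen chunked transfer encoding on the wire, the sketch below frames the two writes from the example above by hand. It illustrates the HTTP/1.1 framing rules (size in hexadecimal, CRLF, payload, CRLF, then a zero-length chunk to finish) and is not Node&#8217;s internal code; <code>encodeChunk</code> and <code>LAST_CHUNK</code> are hypothetical names.</p>

```javascript
// Frame one chunk the way HTTP/1.1 chunked transfer encoding does:
// payload size in hexadecimal, CRLF, payload, CRLF.
function encodeChunk(data) {
  var size = Buffer.byteLength(data, 'utf8');
  return size.toString(16) + '\r\n' + data + '\r\n';
}

// A zero-length chunk marks the end of the body.
var LAST_CHUNK = '0\r\n\r\n';

var body = encodeChunk('Hello\r\n') + encodeChunk('World\r\n') + LAST_CHUNK;
console.log(JSON.stringify(body));
```

<p>Because every chunk carries its own length, the server can start sending before it knows how large the whole response will be.</p>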
<p>Here’s a streaming HTTP server… it can stream responses without introducing a large amount of weight, you don’t use a thread for each of these. If you curl it, you get <em>Hello</em>, then two seconds later, you get <em>World</em>.</p>
<p><strong>var http = require('http');<br />
var server = http.createServer(function (req, res) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;res.writeHead(200, {'Content-Type': 'text/plain'});<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;res.write('Hello\r\n');<br />
<br />
&nbsp;&nbsp;&nbsp;&nbsp;setTimeout(function () {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;res.end('World\r\n');<br />
&nbsp;&nbsp;&nbsp;&nbsp;}, 2000);<br />
});<br />
<br />
server.listen(8000);<br />
</strong></p>
<p>This is low-level. It allows streaming requests, and requests can be hung while waiting for other things. With AJAX polling, clients are continually asking &#8220;Do you have anything new?&#8221;, which can be very taxing on the server. Long polling, on the other hand, only involves asking once and then getting a response when the server wants to send you one.</p>
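<p>The long-polling idea can be modeled without any networking. In this sketch, which is an in-memory simplification rather than anything from the talk, each <code>respond</code> callback stands in for an HTTP response that the server is holding open; <code>createChannel</code>, <code>poll</code> and <code>publish</code> are hypothetical names.</p>

```javascript
// In-memory model of long polling. Each `respond` callback stands in for
// an HTTP response that is held open until there is something to send.
function createChannel() {
  var waiting = [];
  return {
    poll: function (respond) { waiting.push(respond); },
    publish: function (message) {
      var toNotify = waiting;
      waiting = [];                        // clients must poll again afterwards
      toNotify.forEach(function (respond) { respond(message); });
      return toNotify.length;
    }
  };
}

var channel = createChannel();
var received = [];
channel.poll(function (msg) { received.push('alice got ' + msg); });
channel.poll(function (msg) { received.push('bob got ' + msg); });

var delivered = channel.publish('new listing');
var deliveredAgain = channel.publish('price drop');  // nobody has re-polled yet
console.log(delivered, deliveredAgain, received);
```

<p>Holding a request this way is cheap only if an idle connection costs the server very little, which is the point the talk turns to next.</p>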
<p>Node&#8217;s HTTP server is enabled by the HTTP parser. You can check out http://github.com/ry/http-parser</p>
<p>You might be thinking: &#8220;HTTP, Jeez, how hard could it be, it’s a simple protocol.&#8221; You’re wrong. HTTP in the real world is extremely complicated. It’s difficult to be able to parse the headers and be able to expose this streaming nature without buffering. This HTTP server buffers nothing. It’s totally callback-based.</p>
<p>The HTTP server only uses 28 bytes per HTTP connection, which is important when you have 1,000 people chatting on a server. 28 bytes is acceptable for overhead; 4 megabytes isn’t.</p>
<p>Now let’s do inter-process communication with other processes. In this example, you pull out the child process. This is something that can spin the disk. Your CPU is much, much faster than the disk. Don’t wait for the disk.</p>
<p><strong>var exec = require('child_process').exec;<br />
exec('ls /', function (err, output) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;if (err) throw err;<br />
&nbsp;&nbsp;&nbsp;&nbsp;console.log(output);<br />
});<br />
</strong></p>
<p>It’s worth noting that Node never forces you to buffer output. You can also stream data through the standard in and out of a child process.</p>
<p>Now we spawn the program <em>cat</em>, and we get a reference to that program. Whatever you send to <em>cat</em>, it sends back. You type in <em>Hello</em>, wait 2 seconds, then type <em>Bye</em>. You get <em>Hello</em>, then wait 2 seconds, then get <em>Bye</em>.</p>
<p><strong>var spawn = require('child_process').spawn;<br />
<br />
var cat = spawn('cat');<br />
<br />
cat.stdin.write('hello\n');<br />
<br />
setTimeout(function () {<br />
&nbsp;&nbsp;&nbsp;&nbsp;cat.stdin.end('bye\n');<br />
}, 2000);<br />
<br />
cat.stdout.on('data', function (d) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;console.log(d.toString());<br />
});</strong></p>
<p>Connecting streams is common. Where I want to go with Node is thinking of everything in terms of streams. There’s standard in and out, there’s file streams, HTTP connections. But mainly we deal a lot with streams. Generally we’re proxying streams and modifying them in the middle.</p>
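<p>One way to picture &#8220;modifying a stream in the middle&#8221; is a transform stream. The sketch below assumes a current Node release, whose <code>stream.Transform</code> class is newer than this talk; <code>shout</code> and <code>upperCase</code> are hypothetical names.</p>

```javascript
var stream = require('stream');

// Pure function applied to every chunk that passes through.
function shout(chunk) {
  return chunk.toString().toUpperCase();
}

// A Transform sits between a readable side and a writable side.
var upperCase = new stream.Transform({
  transform: function (chunk, encoding, callback) {
    callback(null, shout(chunk));
  }
});

// Feed it by hand here; whatever is written comes out upper-cased.
upperCase.on('data', function (chunk) {
  console.log(chunk.toString());
});
upperCase.write('hello\n');
upperCase.end('bye\n');

// With real streams the wiring would be, for example:
//   process.stdin.pipe(upperCase).pipe(process.stdout);
```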
<p>So this is JavaScript outside the browser. Yes! That’s almost what everybody wants. We’re interacting with the OS in a browser-like way.<br />
We have an HTTP library for streaming. But wait there’s more…  here’s a contrived but interesting web-server benchmark. We’ve set up four web servers. They’re all going to respond with a 1 megabyte file. 100 concurrent clients connect.</p>
<ul>
<li>Node can handle 822 requests per second</li>
<li>Nginx (web server written in C, popular with the Ruby crowd, consider this as good as it gets): 708</li>
<li>Thin: 85</li>
<li>Mongrel: 4</li>
</ul>
<p>This should be shocking to you. You should be urinating right now. Or getting angry. It shocks me.</p>
<p>There are some caveats. NGINX peaked at 4 MB of memory, and Node at 60 MB. I also didn’t sit down for hours and try to make NGINX fast, as I did with Node.</p>
<p>There are a lot of places in Node where the opposite is true, where it sucks while everything else is good. SSL for example.</p>
<p>Node is written on Google’s V8, the JavaScript engine in Chrome. V8 is a masterpiece of engineering. Google took the 14 best VM engineers and locked them in a closet in Denmark. They were given the JavaScript spec and then told to make it fast.</p>
<p>It’s an amazing VM. Much better than Ruby or Python. Incomparable. Or comparable I guess&#8230; All these callbacks must seem weird to you but that is where our speed increase comes from.</p>
<p><strong>var result = query('select * from T'); // use result</strong></p>
<p>If you&#8217;ve done traditional web programming, you&#8217;ve probably used ActiveRecord to access some record. You use a function to do the I/O, but what does your software do while it’s accessing the database? In many cases, nothing. It’s the year 2010, we’re using Rails, and when you access a database, the world stops for who knows how long. The database might be in LA, and it takes 2 seconds to respond.</p>
<p>To mitigate that, we load balance with multiple processes, all waiting 2 seconds. That’s a form of concurrency to be sure, I guess that’s what processes are for.</p>
<p>When you access stuff in the CPU, it’s very fast. You can assume any operation to take zero amount of time, until you access the disk or the network. It’s not appropriate to treat operations in the CPU in the same way as operations on disk or I/O. Abstracting I/O as a function doesn’t make sense when the time-frames are so different.</p>
<ul>
<li><strong>3 cycles for L1</strong></li>
<li><strong>14 cycles for L2</strong></li>
<li><strong>250 cycles for RAM</strong></li>
<li><strong>41M cycles for disk</strong></li>
<li><strong>240M cycles for network</strong></li>
</ul>
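<p>To put those cycle counts on a human scale, here is a small sketch that converts them to wall-clock time. The 3 GHz clock rate and the helper name <em>cyclesToMs</em> are assumptions made for illustration; they are not figures from the talk.</p>

```javascript
// Approximate latencies from the list above, in CPU cycles.
var latencyCycles = {
  L1: 3,
  L2: 14,
  RAM: 250,
  disk: 41e6,
  network: 240e6
};

// Assumed clock rate for illustration: 3 GHz = 3e9 cycles per second.
var CLOCK_HZ = 3e9;

// Convert a cycle count to milliseconds at the assumed clock rate.
function cyclesToMs(cycles) {
  return (cycles / CLOCK_HZ) * 1000;
}

Object.keys(latencyCycles).forEach(function (name) {
  console.log(name + ': ' + cyclesToMs(latencyCycles[name]).toFixed(6) + ' ms');
});
```

<p>Under that assumption, an L1 hit costs about a nanosecond, a disk access roughly 13.7 ms, and a network round trip roughly 80 ms, which is the gap the talk is pointing at.</p>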
<p>It’s unacceptable to wait for the database when you’re serving many clients. You can fork a thread – it’s hard in Ruby because its threading system is utterly crap, but Java can – so when one thread blocks while accessing the database, you can start new threads. That’s fine. But you can’t use an OS thread for each socket when you want good concurrency. Threads have weight to them, and context-switching is costly too. Each thread takes 4 meg of memory, which is a lot when you have 1,000 concurrent users.</p>
<p>The alternative to using threads is to structure your code like this:</p>
<p><strong>query('select..', function (result) { /* use result */ });</strong></p>
<p>Node is fast because it never blocks on I/O. And JavaScript is great for this. In Ruby there’s EventMachine, in Python there’s Twisted, but somehow it doesn’t jibe: you sit down to write the code and it doesn’t work the way that programming language is meant to work, and it doesn’t work with all the modules out there that do I/O, like a MySQL library. But the browser was already set up to be an event loop. <a href="http://en.wikipedia.org/wiki/Brendan_Eich">Brendan Eich</a> was a genius. Yes it does one thing at a time, but also many things very quickly, because you never block on I/O.</p>
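<p>A minimal sketch of what "never blocks" means in practice. The <em>query</em> function here is a hypothetical stand-in for a database client, not a real library call; it simply completes on a later turn of the event loop so that the ordering is visible.</p>

```javascript
// Hypothetical stand-in for a database client: the "query" completes on a
// later turn of the event loop instead of blocking the caller.
var order = [];

function query(sql, callback) {
  setImmediate(function () {
    callback({ sql: sql, rows: [] });
  });
}

order.push('query issued');
query('select * from T', function (result) {
  order.push('result ready');
  console.log(order.join(' -> '));
});
// Execution reaches this line before the callback runs.
order.push('still serving other requests');
```

<p>When run, this prints "query issued -> still serving other requests -> result ready": the program keeps working while the result is pending, and the callback runs only once the result is available.</p>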
<p>And there’s a culture of JavaScript, an entire generation of programmers who grew up programming browsers, and now they can code on a server, without forking a thread and blocking on <em>except</em>. Java people on the other hand find this callback concept difficult to grasp. “What do you mean? What is it doing while it’s doing nothing?”</p>
<p>Node jails you into this evented-style programming. You can’t do things in a blocking way, you can’t write slow programs.</p>
<p>Node consists of 3 C libraries: V8; event loop (libev) so you don’t have to write something for every OS; a thread pool (libeio), which is necessary for file I/O. There’s a layer for bindings, C++ glue, then the standard library is written in JavaScript. It’s not a thin binding to a C web server, it actually goes through a lot of JavaScript – that’s impressive – V8 is up to the task. I used to write web servers in Ruby, it was awful, every line of Ruby hurts performance; it’s a beautiful language, but a crappy virtual machine. V8 is not that way.</p>
<p>JavaScript can only access the main thread, the C layer has access to blocking functions – we don’t want to have a global interpreter lock – let’s let the experts have access to the threads. To use the threads, program in C.</p>
<p>I wouldn’t use Node.JS to make big websites, but it is one of the only solutions for making real-time, long-polling things. You’ll probably have a bunch of Rails servers and one Node server for a specialized function. As frameworks mature, you can use Node to build the whole website. You won’t have to load-balance it because it’s very fast but you’ll probably have to put it behind a web server, because you don’t trust it, or because SSL support still sucks. The bottleneck will be your gigabit connection into that machine, not memory or anything else.</p>
<p><em>And that was it! Many thanks to <a href="http://twitter.com/ryah">Ryan</a></em><em> for a dazzling talk, and to everyone who came. Thanks too to Greylock, Digg and Pandora for helping us put on the event&#8230;</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.redfin.com/devblog/2010/07/ryan_dahl_introduces_nodejs.html/feed</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Introducing MultiMarker: The fastest way to add many hundreds or thousands of markers on Google Maps</title>
		<link>http://blog.redfin.com/devblog/2010/07/introducing_multimarker_the_fastest_way_to_add_many_hundreds_or_thousands_of_markers_on_google_maps.html</link>
		<comments>http://blog.redfin.com/devblog/2010/07/introducing_multimarker_the_fastest_way_to_add_many_hundreds_or_thousands_of_markers_on_google_maps.html#comments</comments>
		<pubDate>Thu, 08 Jul 2010 17:02:24 +0000</pubDate>
		<dc:creator>Dan Fabulich</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.redfin.com/devblog/?p=418</guid>
		<description><![CDATA[At Google IO 2009, our fearless leader Sasha Aickin (my boss) demonstrated our high-performance Google Maps utility library to the world; we provided directions explaining how to code it, but we didn&#8217;t actually ship the code. Today, I&#8217;m proud to announce that we&#8217;ve made our MultiMarker utility library code (formerly known as SuperMarker) available to [...]]]></description>
			<content:encoded><![CDATA[<p>At Google IO 2009, our fearless leader Sasha Aickin (my boss) demonstrated our <a href="http://www.youtube.com/watch?v=zI8at1EmJjA&amp;feature=player_embedded#t=11m12s">high-performance Google Maps utility library</a> to the world; we provided directions explaining how to code it, but we didn&#8217;t actually ship the code.</p>
<p>Today, I&#8217;m proud to announce that we&#8217;ve made our MultiMarker utility library code (formerly known as SuperMarker) available to everyone under the Apache License 2.0.  As far as we know, it&#8217;s <a href="http://code.google.com/p/multimarker/">the fastest way to add many hundreds or thousands of markers on Google Maps</a>.</p>
<p>HOW FAST IS IT?</p>
<p>Below, we compare the time required to add 1000 markers in four ways:</p>
<ol>
<li>V2 map using the default GMarker</li>
<li>V2 map using Pamela Fox&#8217;s <a href="http://code.google.com/p/gmaps-samples/source/browse/trunk/youtube/markerlight.js">MarkerLight</a></li>
<li>V3 map using the default google.maps.Marker</li>
<li>Using our MultiMarker library</li>
</ol>
<p>Note that the GMarker timings you see here were recorded by stopwatch, NOT by automated timer.  Also note that these browsers were running on different machines, so you shouldn&#8217;t use this table to compare browsers with each other; just compare the columns within each row.</p>
<p><b>UPDATE: This table was updated on April 22nd to reflect the faster markers available in Google Maps v3.4 nightly.</b></p>
<table border="1">
<tr>
<td></td>
<td>V2 GMarker</td>
<td>V2 MarkerLight</td>
<td>V3 Marker</td>
<td>V3 MultiMarker</td>
</tr>
<tr>
<td>IE6</td>
<td>44 seconds</td>
<td>9.0 seconds</td>
<td>56 seconds</td>
<td>1.5s</td>
</tr>
<tr>
<td>IE7</td>
<td>42 seconds</td>
<td>6 seconds</td>
<td>3.5 seconds</td>
<td>1s</td>
</tr>
<tr>
<td>IE8</td>
<td>32 seconds</td>
<td>3.1 seconds</td>
<td>1.5 seconds</td>
<td>&lt;1s</td>
</tr>
<tr>
<td>IE9</td>
<td>3 seconds</td>
<td>0.9 seconds</td>
<td>&lt;1s</td>
<td>&lt;1s</td>
</tr>
<tr>
<td>FF4</td>
<td>5.1 seconds</td>
<td>1.2 seconds</td>
<td>&lt;1s</td>
<td>&lt;1s</td>
</tr>
<tr>
<td>GC10</td>
<td>2.4 seconds</td>
<td>&lt;1s</td>
<td>&lt;1s</td>
<td>&lt;1s</td>
</tr>
<tr>
<td>iPhone 4</td>
<td>40 seconds</td>
<td>6 seconds</td>
<td>3 seconds</td>
<td>2 seconds</td>
</tr>
</table>
<p>&nbsp;</p>
<p>As you can see, a speedup of 10-100x is possible using the MultiMarker technique, depending on which version of GMaps you&#8217;re using.</p>
<p>Try the examples for yourself:</p>
<p><a href="http://multimarker.googlecode.com/svn/trunk/fast-marker-overlay/maps-v2/example/comparison-test.html">Google Maps V2 Comparison Test</a><br />
<a href="http://multimarker.googlecode.com/svn/trunk/fast-marker-overlay/maps-v3/example/comparison-test.html">Google Maps V3 Comparison Test</a></p>
<p>Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.redfin.com/devblog/2010/07/introducing_multimarker_the_fastest_way_to_add_many_hundreds_or_thousands_of_markers_on_google_maps.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Service Oriented Architecture with Varnish and Edge Side Includes</title>
		<link>http://blog.redfin.com/devblog/2010/06/service_oriented_architecture_with_varnish_and_edge_side_includes.html</link>
		<comments>http://blog.redfin.com/devblog/2010/06/service_oriented_architecture_with_varnish_and_edge_side_includes.html#comments</comments>
		<pubDate>Tue, 15 Jun 2010 00:02:26 +0000</pubDate>
		<dc:creator>Michael Smedberg</dc:creator>
				<category><![CDATA[Performance]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ESI]]></category>
		<category><![CDATA[Service Oriented Architecture]]></category>
		<category><![CDATA[SOA]]></category>
		<category><![CDATA[Varnish]]></category>
		<category><![CDATA[VCL]]></category>

		<guid isPermaLink="false">http://blog.redfin.com/devblog/?p=385</guid>
		<description><![CDATA[The Varnish HTTP Accelerator can be used to implement a Service Oriented Architecture by using Edge Side Includes]]></description>
			<content:encoded><![CDATA[<p>As <a href='http://blog.redfin.com/devblog/2010/05/esi_and_caching_trickery_in_varnish.html'>we talked about before</a>, Redfin uses <a href='http://www.varnish-cache.org/' target='_new'>Varnish</a> to implement <a href='http://en.wikipedia.org/wiki/Edge_Side_Includes' target='_new'>Edge Side Includes</a> (ESI.)  This involved breaking a single big (and expensive) page into individual chunks; each chunk would be generated by separate code, and would be cached on a different schedule.</p>
<p>Once we broke our expensive page into chunks that could be individually cached, it seemed pretty easy to have those chunks served up by different backend servers.  Voilà, a monolithic app became &#8220;<a href='http://en.wikipedia.org/wiki/Service-oriented_architecture' target='_new'>service oriented</a>&#8220;!  This would let us run the different software components on different machines (with different performance characteristics, different <a href='http://en.wikipedia.org/wiki/Service_level_agreement' target='_new'>SLAs</a>, even implementations in different languages/environments!)</p>
<p>Of course, nothing is actually that easy, and we made a number of mis-steps before we figured out how to do it.</p>
<p><a href='http://www.flickr.com/photos/dnorman/3732851541/' target='_new'><img src='http://blog.redfin.com/devblog/files/2010/06/soa_with_esi_difficult.jpg' title='more difficult than you would think' alt="soa with esi difficult Service Oriented Architecture with Varnish and Edge Side Includes" /></a></p>
<h1>How To</h1>
<p>Varnish allows you to define multiple backends in your <a href='http://www.varnish-cache.org/wiki/Introduction#TheVarnishConfigurationLanguage' target='_new'>VCL</a>.  And in your vcl_recv function, you can decide which backend should handle a particular request.  At Redfin, we added a new Varnish backend for each of our ESI endpoints, and we added logic to choose the relevant backend by URI.  In practice, we actually only have one pool of machines handling our ESI requests, so all of our Varnish backends actually point to the same place.</p>
<p>So the first piece of the puzzle is on our main web servers.  On the main web servers, requests go through Varnish.  Requests for &#8220;normal&#8221; pages are sent through to Tomcat, but requests for ESIs are sent to one of the SOA backends.  Here&#8217;s an example of what the VCL file might look like:</p>
<p><code><br />
backend default {<br />
&nbsp;&nbsp;.host = "localhost";<br />
&nbsp;&nbsp;.port = "8080";<br />
}<br />
backend similars {<br />
&nbsp;&nbsp;.host = "similars.redfin.com";<br />
&nbsp;&nbsp;.port = "6081";<br />
}<br />
backend relevantlinks {<br />
&nbsp;&nbsp;.host = "relevantlinks.redfin.com";<br />
&nbsp;&nbsp;.port = "6081";<br />
}</p>
<p>...</p>
<p>sub vcl_recv {<br />
&nbsp;&nbsp;if (req.url ~ "^/esi-listing-similars" || req.url ~ "^/esi-property-similars") {<br />
&nbsp;&nbsp;&nbsp;&nbsp;set req.backend = similars;<br />
&nbsp;&nbsp;}<br />
&nbsp;&nbsp;else if (req.url ~ "^/esi-listing-trackbacks") {<br />
&nbsp;&nbsp;&nbsp;&nbsp;set req.backend = relevantlinks;<br />
&nbsp;&nbsp;}<br />
}<br />
</code></p>
<p>You might have noticed that the &#8220;localhost&#8221; backend is associated with port 8080 (where Tomcat is running), but the ESI backends are associated with port 6081 (where Varnish is running on those remote machines.)</p>
<p>We want the instance of Varnish on the main web server to cache content from the main web server, and the instances of Varnish on the ESI backends to cache the content from those backends.  This has a few benefits:</p>
<ul>
<li>Our effective cache is bigger, since we have caches on multiple machines, each of which has fixed memory</li>
<li>Having independent caches prevents one set of items from pushing another set out of the cache.  If all the data were in a single cache, then cache entries holding similars information (which is small, but expensive to recreate) could be pushed out of the cache by cache entries of &#8220;main page&#8221; content (which is big and relatively cheap to recreate, but we&#8217;d still like to cache.)</li>
<li>It&#8217;s easy to flush individual caches without having to worry about performance problems with other parts of the site</li>
</ul>
<p>We have another design goal: we&#8217;d like to have a single distribution of our software.  We&#8217;d like to have a single WAR that we can put on any machine; we do NOT want to have to deal with multiple builds, with figuring out which build has been installed on which machine, etc.  We&#8217;d like to be able to switch a single machine from being a standard web server to being an ESI endpoint without having to redeploy or reconfigure.</p>
<p>This creates a conundrum.  We want our main web servers and our ESI servers to be identical, but we also want them to act differently.  In particular, when an instance of Varnish on a web server gets a request for an ESI fragment, it should redirect that request to an ESI server (more precisely: to the Varnish instance running on an ESI server.)  But when an instance of Varnish on an ESI server gets a request for an ESI fragment, it should forward the request to the local Tomcat instance.  It should NOT forward the request to ITSELF.  Forwarding port 6081 to port 6081 creates an infinite loop- not good.</p>
<p>We want to break the symmetry between the standard web servers and the ESI servers, and we do that by messing with the URIs.</p>
<p>We prepend our ESI URIs with a known prefix, which means &#8220;forward this to the ESI server.&#8221;  But when we process the URI (while forwarding it), we strip off that prefix, so that the ESI server does not also forward it to itself.  That&#8217;s harder to say than it is to code.  The VCL code looks like this:</p>
<p><code><br />
sub vcl_recv {<br />
&nbsp;&nbsp;if (req.url ~ "^/backend/") {<br />
&nbsp;&nbsp;&nbsp;&nbsp;set req.url = regsub(req.url, "^/backend/", "/"); </p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;if (req.url ~ "^/esi-listing-similars" || req.url ~ "^/esi-property-similars") {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;set req.backend = similars;<br />
&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;else if (req.url ~ "^/esi-listing-trackbacks") {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;set req.backend = relevantlinks;<br />
&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;}<br />
}<br />
</code></p>
<p>This breaks the circularity.  The path of requests looks like:</p>
<ol>
<li>A request comes into Varnish on the standard web server for /path/to/a/page</li>
<li>Varnish forwards the request to the local Tomcat instance</li>
<li>Tomcat responds with HTML that includes &lt;esi:include src=&quot;/backend/esi-listing-similars&quot; /&gt;</li>
<li>Varnish processes the ESI, and must make a request for /backend/esi-listing-similars</li>
<li>The Varnish instance on the standard web server strips off &#8220;/backend&#8221;, and sends a request for &#8220;/esi-listing-similars&#8221; to the ESI server</li>
<li>The Varnish instance on the ESI server gets the request for &#8220;/esi-listing-similars&#8221;</li>
<li>Since there&#8217;s no &#8220;/backend&#8221; prefix, the Varnish instance on the ESI server forwards the request to its local Tomcat instance</li>
<li>The Tomcat instance on the ESI server processes the request, and responds with the relevant HTML fragment</li>
<li>The Varnish instance on the ESI server caches the HTML fragment and returns it</li>
<li>The Varnish instance on the standard web server splices the HTML fragment into the main page content and returns it to the browser</li>
</ol>
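<p>The two Varnish hops in the path above can be modeled in a few lines. This is a JavaScript sketch of the routing decision, not VCL; the function name <em>route</em> and the returned object shape are assumptions made for illustration.</p>

```javascript
// Hypothetical model of the vcl_recv logic above: returns the backend name
// and the (possibly rewritten) URL that would be sent to it.
function route(url) {
  var prefix = '/backend/';
  if (url.indexOf(prefix) !== 0) {
    // No prefix: hand the request to the local Tomcat instance.
    return { backend: 'default', url: url };
  }
  // Strip the prefix so the next Varnish instance does not forward it again.
  var stripped = '/' + url.slice(prefix.length);
  if (stripped.indexOf('/esi-listing-similars') === 0 ||
      stripped.indexOf('/esi-property-similars') === 0) {
    return { backend: 'similars', url: stripped };
  }
  if (stripped.indexOf('/esi-listing-trackbacks') === 0) {
    return { backend: 'relevantlinks', url: stripped };
  }
  return { backend: 'default', url: stripped };
}

// Hop 1: Varnish on the standard web server.
var firstHop = route('/backend/esi-listing-similars');
// Hop 2: Varnish on the ESI server sees the stripped URL.
var secondHop = route(firstHop.url);
console.log(firstHop, secondHop);
```

<p>The first hop selects the similars backend and drops the prefix; the second hop sees no prefix and falls through to its local Tomcat, so the request cannot loop.</p>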
<p>This example points out another tricky bit- how do we ensure that the HTML fragment is cached by the Varnish service on the ESI server, but not by the Varnish service on the standard web server?  To handle this correctly, we add a header to the response which indicates whether it&#8217;s already been cached:</p>
<p><code><br />
sub vcl_fetch {<br />
&nbsp;&nbsp;if (req.url ~ "^/esi-") {<br />
&nbsp;&nbsp;&nbsp;&nbsp;if (obj.http.X-RF-Cached ~ "true") {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;pass;<br />
&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;set obj.http.X-RF-Cached = "true";<br />
&nbsp;&nbsp;}<br />
}<br />
</code></p>
<p>This code says &#8220;If there&#8217;s an X-RF-Cached header present, then don&#8217;t attempt to cache.  If there is NOT an X-RF-Cached header present, then add one, and attempt to cache.&#8221;  With this addition, the HTML fragments will only be cached on the first Varnish instance they pass through, which is on the ESI server in our case.</p>
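<p>The same rule can be traced through both Varnish instances with a short JavaScript sketch. The function name <em>decideCaching</em> is hypothetical, and it uses an exact string comparison where the VCL uses a regex match.</p>

```javascript
// Hypothetical model of the vcl_fetch rule above for an ESI fragment:
// reports whether this Varnish instance should cache the response.
function decideCaching(headers) {
  if (headers['X-RF-Cached'] === 'true') {
    // An upstream Varnish already cached this fragment: pass it through.
    return { cache: false, headers: headers };
  }
  // First Varnish to see the fragment: mark it, then cache it.
  var marked = Object.assign({}, headers, { 'X-RF-Cached': 'true' });
  return { cache: true, headers: marked };
}

// The ESI server sees the fragment first, then the standard web server.
var atEsiServer = decideCaching({ 'Content-Type': 'text/html' });
var atWebServer = decideCaching(atEsiServer.headers);
console.log(atEsiServer.cache, atWebServer.cache);
```

<p>This prints "true false": the fragment is cached where it is first seen and only passed through on the second hop.</p>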
<h1>How NOT To</h1>
<p>The solution described above works, and meets our requirements.  But we also tried some solutions that did NOT work.  Perhaps you can learn from our failures&#8230;</p>
<h2>Putting Absolute URIs into ESI Includes</h2>
<p>Our first thought was that we&#8217;d put absolute URIs into our ESI includes in the HTML.  For instance, we tried to put &lt;esi:include src=&quot;http://similars.redfin.com:6081/esi-listing-similars&quot; /&gt; into the main HTML of our page.  Varnish simply (and correctly, I think) ignores the host name and port.  Including http://similars.redfin.com:6081/esi-listing-similars will cause Varnish to act as if you included /esi-listing-similars, and Varnish will use whichever backend it thinks is relevant, regardless of the host name or port in the URI.</p>
<h2>Using a Single Server as both a Standard Web Server and an ESI Server</h2>
<p>When doing testing, or when some of our servers were unavailable, we were tempted to use a single server as both the standard web server and the ESI server.  It seemed like this should work- the trick with the &#8220;/backend&#8221; prefix should prevent infinite circularity.  However, it didn&#8217;t work.  It seems that Varnish is doing its own checks for circularity, and noticing that a single request passed through the same Varnish instance multiple times (which NORMALLY would be a problematic example of circularity, but we&#8217;ve got our clever symmetry breaker in there!)  Anyway, Varnish doesn&#8217;t allow it, and causes those semi-circular requests to fail.</p>
<p><b>P.S.</b></p>
<p>Thanks to <a href='http://www.flickr.com/photos/dnorman/' target='_new'>D&#8217;Arcy Norman</a> for the photo!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.redfin.com/devblog/2010/06/service_oriented_architecture_with_varnish_and_edge_side_includes.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Synchronous/Asynchronous Switching with Varnish</title>
		<link>http://blog.redfin.com/devblog/2010/05/synchronousasynchronous_switching_with_varnish.html</link>
		<comments>http://blog.redfin.com/devblog/2010/05/synchronousasynchronous_switching_with_varnish.html#comments</comments>
		<pubDate>Fri, 07 May 2010 21:55:36 +0000</pubDate>
		<dc:creator>Michael Smedberg</dc:creator>
				<category><![CDATA[Performance]]></category>
		<category><![CDATA[UI Design]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.redfin.com/devblog/?p=302</guid>
		<description><![CDATA[When your webapp is serving up content that&#8217;s expensive to generate, you may want to serve it up asynchronously- via AJAX calls. This is particularly appealing when content is &#8220;below the fold.&#8221; However when that content is cached, you want to serve it up as quickly as possible. If you&#8217;ve already calculated the content, you&#8217;d [...]]]></description>
			<content:encoded><![CDATA[<p>When your webapp is serving up content that&#8217;s expensive to generate, you may want to serve it up asynchronously- via <a href="http://en.wikipedia.org/wiki/AJAX" target="_new">AJAX</a> calls.  This is particularly appealing when content is &#8220;below <a href="http://en.wikipedia.org/wiki/Above_the_fold" target="_new">the fold</a>.&#8221;</p>
<p>However when that content is cached, you want to serve it up as quickly as possible.  If you&#8217;ve already calculated the content, you&#8217;d like to include it inline in the page, without requiring an AJAX roundtrip.  That way, you avoid the <a href="http://en.wikipedia.org/wiki/Latency_(engineering)" target="_new">latency</a> of an unnecessary round-trip.  You also allow the page to be fully rendered (so content doesn&#8217;t jump around), etc.</p>
<p>You can optimize for the empty cache, or you can optimize for the full cache, but it seems hard to optimize both experiences.</p>
<p>Redfin faces exactly this conundrum with our listing pages (e.g. <a href="http://www.redfin.com/CA/San-Francisco/830-El-Camino-Del-Mar-94121/home/604622" target="_new">http://www.redfin.com/CA/San-Francisco/830-El-Camino-Del-Mar-94121/home/604622</a>.)  Calculating the Similar Listings and Similar Sales is expensive and performed in real time.  We cut this <a href="http://en.wikipedia.org/wiki/Gordian_Knot" target="_new">Gordian Knot</a> through the use of the <a href="http://www.varnish-cache.org/" target="_new">Varnish</a> caching reverse proxy, along with clever use of ESI (<a href="http://en.wikipedia.org/wiki/Edge_Side_Includes" target="_new">Edge Side Includes</a>.)  For an overview of how we use Varnish at Redfin, see our <a href="http://blog.redfin.com/devblog/2010/05/esi_and_caching_trickery_in_varnish.html" target="_new">previous post</a>.</p>
<p><a href="http://en.wikipedia.org/wiki/Gordian_knot" target="_new"><img src="http://upload.wikimedia.org/wikipedia/commons/b/bb/Alexander_cuts_the_Gordian_Knot.jpg" title="Synchronous/Asynchronous Switching with Varnish" alt="Alexander cuts the Gordian Knot Synchronous/Asynchronous Switching with Varnish" /></a></p>
<p>We want to say &#8220;if there&#8217;s a cache miss, then do AJAX, but if there&#8217;s a cache hit, then just include the content.&#8221;  We have to make sure that the AJAX calls will fill the cache, such that subsequent requests will see cache hits, of course!</p>
<p>I&#8217;ll outline what the requests/responses look like for us, then I&#8217;ll include some pseudocode that supports this.</p>
<p>At the beginning of time, the cache is empty, and the browser requests information on a Listing.</p>
<table border="1" cellspacing="0" cellpadding="4">
<tr bgcolor="#e7e7e3">
<th>
			Step
		</th>
<th>
			Browser
		</th>
<th>
			Varnish
		</th>
<th>
			Backend Server
		</th>
</tr>
<tr bgcolor="#f7f7f3">
<td>
			1
		</td>
<td>
			Requests <a href="http://www.redfin.com/CA/San-Francisco/830-El-Camino-Del-Mar-94121/home/604622" target="_new">http://www.redfin.com/&#8230;/home/604622</a>
		</td>
<td>
		</td>
<td>
		</td>
</tr>
<tr>
<td>
			2
		</td>
<td>
		</td>
<td>
			Passes request to server
		</td>
<td>
		</td>
</tr>
<tr bgcolor="#f7f7f3">
<td>
			3
		</td>
<td>
		</td>
<td>
		</td>
<td>
			Returns HTML including an ESI like <em>&lt;esi:include src=&quot;/similars?property_id=604622&quot; /&gt;</em>
		</td>
</tr>
<tr>
<td>
			4
		</td>
<td>
		</td>
<td>
			Lookup <em>/similars?property_id=604622</em> in cache
		</td>
<td>
		</td>
</tr>
<tr bgcolor="#f7f7f3">
<td>
			5
		</td>
<td>
		</td>
<td>
			Cache lookup fails
		</td>
<td>
		</td>
</tr>
<tr>
<td>
			6
		</td>
<td>
		</td>
<td>
			Makes request to <em>/similars?property_id=604622</em>
		</td>
<td>
		</td>
</tr>
<tr bgcolor="#f7f7f3">
<td>
			7
		</td>
<td>
		</td>
<td>
		</td>
<td>
			Returns HTML for AJAX for Similars (e.g. a &lt;script&gt; block with a reference to <em>http://www.redfin.com/extranet-similars?property_id=604622</em>)<br />
			Response includes &#8220;no cache&#8221; headers
		</td>
</tr>
<tr>
<td>
			8
		</td>
<td>
		</td>
<td>
			Injects the &lt;script&gt; block into the HTML to be returned<br />
			Does NOT cache the server response
		</td>
<td>
		</td>
</tr>
<tr bgcolor="#f7f7f3">
<td>
			9
		</td>
<td>
		</td>
<td>
			Returns HTML to Browser
		</td>
<td>
		</td>
</tr>
<tr>
<td>
			10
		</td>
<td>
			Displays HTML
		</td>
<td>
		</td>
<td>
		</td>
</tr>
<tr bgcolor="#f7f7f3">
<td>
			11
		</td>
<td>
			Executes &lt;script&gt; block
		</td>
<td>
		</td>
<td>
		</td>
</tr>
<tr>
<td>
			12
		</td>
<td>
			Requests <em>http://www.redfin.com/extranet-similars?property_id=604622</em>, including a special header saying &#8220;gimme the real content&#8221;
		</td>
<td>
		</td>
<td>
		</td>
</tr>
<tr bgcolor="#f7f7f3">
<td>
			13
		</td>
<td>
		</td>
<td>
			Passes <em>/extranet-similars?property_id=604622</em> request to server
		</td>
<td>
		</td>
</tr>
<tr>
<td>
			14
		</td>
<td>
		</td>
<td>
		</td>
<td>
			Returns HTML including an ESI like <em>&lt;esi:include src=&quot;/similars?property_id=604622&quot; /&gt;</em>
		</td>
</tr>
<tr bgcolor="#f7f7f3">
<td>
			15
		</td>
<td>
		</td>
<td>
			Lookup <em>/similars?property_id=604622</em> in cache
		</td>
<td>
		</td>
</tr>
<tr>
<td>
			16
		</td>
<td>
		</td>
<td>
			Cache lookup fails
		</td>
<td>
		</td>
</tr>
<tr bgcolor="#f7f7f3">
<td>
			17
		</td>
<td>
		</td>
<td>
			Makes request to <em>/similars?property_id=604622</em>, passing along special &#8220;gimme the real content&#8221; header
		</td>
<td>
		</td>
</tr>
<tr>
<td>
			18
		</td>
<td>
		</td>
<td>
		</td>
<td>
			Examines request, sees special &#8220;gimme the real content&#8221; header
		</td>
</tr>
<tr bgcolor="#f7f7f3">
<td>
			19
		</td>
<td>
		</td>
<td>
		</td>
<td>
			Calculates correct HTML to display Similar Listings and Similar Sales
		</td>
</tr>
<tr>
<td>
			20
		</td>
<td>
		</td>
<td>
		</td>
<td>
			Returns HTML including &#8220;please cache this&#8221; headers
		</td>
</tr>
<tr bgcolor="#f7f7f3">
<td>
			21
		</td>
<td>
		</td>
<td>
			Injects the Similars block into the HTML to be returned<br />
			DOES cache the server response
		</td>
<td>
		</td>
</tr>
<tr>
<td>
			22
		</td>
<td>
		</td>
<td>
			Returns HTML to Browser
		</td>
<td>
		</td>
</tr>
<tr bgcolor="#f7f7f3">
<td>
			23
		</td>
<td>
			Client side Javascript injects Similars HTML into page
		</td>
<td>
		</td>
<td>
		</td>
</tr>
</table>
<p>That&#8217;s all great, but we still haven&#8217;t used the cache!  The cache entry will get used for subsequent requests for the same page, like this:</p>
<table border="1" cellspacing="0" cellpadding="4">
<tr bgcolor="#e7e7e3">
<th>
			Step
		</th>
<th>
			Browser
		</th>
<th>
			Varnish
		</th>
<th>
			Backend Server
		</th>
</tr>
<tr bgcolor="#f7f7f3">
<td>
			1
		</td>
<td>
			Requests <a href="http://www.redfin.com/CA/San-Francisco/830-El-Camino-Del-Mar-94121/home/604622" target="_new">http://www.redfin.com/&#8230;/home/604622</a>
		</td>
<td>
		</td>
<td>
		</td>
</tr>
<tr>
<td>
			2
		</td>
<td>
		</td>
<td>
			Passes request to server
		</td>
<td>
		</td>
</tr>
<tr bgcolor="#f7f7f3">
<td>
			3
		</td>
<td>
		</td>
<td>
		</td>
<td>
			Returns HTML including an ESI like <em>&lt;esi:include src=&quot;/similars?property_id=604622&quot; /&gt;</em>
		</td>
</tr>
<tr>
<td>
			4
		</td>
<td>
		</td>
<td>
			Lookup <em>/similars?property_id=604622</em> in cache
		</td>
<td>
		</td>
</tr>
<tr bgcolor="#f7f7f3">
<td>
			5
		</td>
<td>
		</td>
<td>
			Cache lookup SUCCEEDS
		</td>
<td>
		</td>
</tr>
<tr>
<td>
			6
		</td>
<td>
		</td>
<td>
			Injects the Similars block into the HTML to be returned
		</td>
<td>
		</td>
</tr>
<tr bgcolor="#f7f7f3">
<td>
			7
		</td>
<td>
		</td>
<td>
			Returns HTML to Browser
		</td>
<td>
		</td>
</tr>
<tr>
<td>
			8
		</td>
<td>
			Displays HTML including Similars (no AJAX calls)
		</td>
<td>
		</td>
<td>
		</td>
</tr>
</table>
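<p>The two exchanges above can be condensed into a toy model. This is a sketch only: the header name <em>X-Real-Content</em>, the function names, and the placeholder response bodies are assumptions, since the post does not name the real header.</p>

```javascript
// Hypothetical model of the two exchanges above. The header name
// X-Real-Content is an assumption; the post does not name the real header.
var cache = {};

// Backend: real HTML only when the special header is present.
function backend(url, headers) {
  if (headers['X-Real-Content'] === 'true') {
    return { body: 'SIMILARS HTML', cacheable: true };
  }
  return { body: 'AJAX SCRIPT STUB', cacheable: false };
}

// Varnish: answer the ESI include from cache, else ask the backend.
function resolveInclude(url, headers) {
  if (Object.prototype.hasOwnProperty.call(cache, url)) {
    return cache[url];
  }
  var response = backend(url, headers);
  if (response.cacheable) {
    cache[url] = response.body;
  }
  return response.body;
}

var url = '/similars?property_id=604622';
// First page view: cache miss, so the page gets the AJAX stub.
var firstPageView = resolveInclude(url, {});
// The AJAX call carries the special header and fills the cache.
var ajaxCall = resolveInclude(url, { 'X-Real-Content': 'true' });
// Second page view: cache hit, so the content is included inline.
var secondPageView = resolveInclude(url, {});
console.log(firstPageView, '|', ajaxCall, '|', secondPageView);
```

<p>The first page view gets the stub, the AJAX call gets the real content and fills the cache, and every later page view gets the real content inline with no AJAX round trip.</p>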
<p>There are two things worth noting about this exchange.</p>
<p><strong>First</strong>, when the backend server gets a request for <em>/similars?property_id=604622</em>, it has to decide if it should be returning the real HTML, or should be returning Javascript that will retrieve the HTML via AJAX.  It makes this decision based on the value of a header passed in by the client.  When the client is making an AJAX request, it knows it better NOT get back a response that generates AJAX requests (that&#8217;d be a death spiral.)  Therefore, when it makes the AJAX request, it includes the special header.  In all other cases, the special header is NOT included.  When the header is included in a request, the server will generate the real HTML.  When the header is not included, Varnish may answer the request from cache, or it may pass through to the backend server.  If the request is fulfilled by the Varnish cache, then it&#8217;s the real HTML, but if it&#8217;s fulfilled by the backend server, it&#8217;ll be the AJAXy HTML.</p>
<p><strong>Second</strong>, there are two URLs that have to do with similars.</p>
<p><em>/similars?property_id=604622</em> is an internal-use-only URL that returns the content (either the proper HTML or the AJAX code.)</p>
<p><em>/extranet-similars?property_id=604622</em> is an externally facing URL that only returns an ESI fragment (which will subsequently be filled in by Varnish).  This way, the ESI endpoints are never available to the extranet; Varnish can get to them, but extranet clients have no need for them.  This lets us be lazy with the ESI URLs.  For example, URLs that are exposed to the extranet do extra validation to check if the user is logged in, etc.  URLs for internal use only, such as the ESI URLs, can skip that work.  This also lets us change the URLs when the property changes, to facilitate cache busting (see the &#8220;Cache busting&#8221; section in <a href="http://blog.redfin.com/devblog/2010/05/esi_and_caching_trickery_in_varnish.html" target="_new">ESI and Caching Trickery in Varnish</a> for more information).</p>
<p><strong>Pseudocode</strong></p>
<p>OK, so we know what we want the interaction to look like.  What code will make this happen?  Here&#8217;s some Javaish pseudocode that illustrates how it might work:</p>
<p><code><br />
/*<br />
Invoked for requests like http://www.redfin.com/[address]/home/[property id]<br />
*/<br />
public void handlePropertyRequest(Request request, Response response, long propId) {<br />
&nbsp;&nbsp;&nbsp;Property property = getProperty(propId);<br />
&nbsp;&nbsp;&nbsp;response.write("<em>&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;</em>" +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;...<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"<em>&lt;esi:include src='/extranet-similars?property_id=</em>&quot; +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;propId +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&quot;<em>&amp;last_mod=</em>&quot; +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;property.getLastModified() +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"<em>'/&gt;</em>" +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;...<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"<em>&lt;/body&gt;&lt;/html&gt;</em>");<br />
}<br />
</code></p>
<p><code><br />
/*<br />
Invoked for (extranet) requests like /extranet-similars?property_id=[property id]&amp;last_mod=[date]<br />
*/<br />
public void handleExtranetSimilarsRequest(Request request, Response response, long propId) {<br />
&nbsp;&nbsp;&nbsp;Property property = getProperty(propId);<br />
&nbsp;&nbsp;&nbsp;response.write("<em>&lt;esi:include src='/similars?property_id=</em>&quot; +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;propId +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&quot;<em>&amp;last_mod=</em>&quot; +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;property.getLastModified() +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"<em>'/&gt;</em>");<br />
}<br />
</code></p>
<p><code><br />
/*<br />
Invoked for (intranet) requests like /similars?property_id=[property id]&amp;last_mod=[date]<br />
*/<br />
public void handleSimilarsRequest(Request request, Response response, long propId) {<br />
&nbsp;&nbsp;&nbsp;if (null == request.getHeader("full_html")) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//This request does NOT demand that we return the actual HTML.<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//&nbsp;We will return a script block that will fetch the HTML via AJAX.<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;response.write("<em>&lt;script&gt;</em>" +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"<em>dojo.addOnLoad(</em>" +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"<em>function() {</em>" +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"<em>dojo.xhrGet({</em>" +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"<em>url: 'http://www.redfin.com/extranet-similars?property_id=</em>" + propId + "<em>',</em>" +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"<em>load: function(response, ioArgs){</em>" +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"<em>dojo.byId('similar_homes').innerHTML = response;</em>" +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"<em>return response;</em>" +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"<em>},</em>" +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"<em>headers: {'full_html': 'true'},</em>" +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"<em>handleAs: 'text'</em>" +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"<em>});</em>" +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"<em>}</em>" +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"<em>);</em>" +<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"<em>&lt;/script&gt;</em>");<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//Do NOT cache the script<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;response.setCacheable(false);<br />
&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;else {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//This request wants the actual HTML for similars<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;response.write(getSimilarsHTML(propId));<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//The similars HTML is cacheable- that's the whole point!<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;response.setCacheable(true);<br />
&nbsp;&nbsp;&nbsp;}<br />
}<br />
</code></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.redfin.com/devblog/2010/05/synchronousasynchronous_switching_with_varnish.html/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>ESI and Caching Trickery in Varnish</title>
		<link>http://blog.redfin.com/devblog/2010/05/esi_and_caching_trickery_in_varnish.html</link>
		<comments>http://blog.redfin.com/devblog/2010/05/esi_and_caching_trickery_in_varnish.html#comments</comments>
		<pubDate>Tue, 04 May 2010 15:40:38 +0000</pubDate>
		<dc:creator>Michael Smedberg</dc:creator>
				<category><![CDATA[Performance]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.redfin.com/devblog/?p=269</guid>
		<description><![CDATA[Varnish is a high performance, flexible, open source HTTP accelerator. We started using Varnish at Redfin in our last major release, a few weeks ago. It&#8217;s pretty much invisible to our end users, but we&#8217;re so happy with it that we wanted to give the folks who made Varnish their props in public. It has [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.varnish-cache.org/">Varnish</a> is a high performance, flexible, open source HTTP accelerator.</p>
<p>We started using Varnish at Redfin in our last major release, a few weeks ago.  It&#8217;s pretty much invisible to our end users, but we&#8217;re so happy with it that we wanted to give the folks who made Varnish their props in public.  It has really been great!</p>
<p>Varnish combines three technologies that are really useful at Redfin:</p>
<ol>
<li>A <a href="http://en.wikipedia.org/wiki/Reverse_proxy">caching reverse proxy</a> to reduce load on our backend servers</li>
<li>ESI (<a href="http://en.wikipedia.org/wiki/Edge_Side_Includes">Edge Side Includes</a>) to break a page into snippets of HTML which can each have their own caching strategy</li>
<li>VCL (<a href="http://www.varnish-cache.org/wiki/Introduction#TheVarnishConfigurationLanguage">Varnish Configuration Language</a>) which enables fine grained control of Varnish</li>
</ol>
<p>We use Varnish to accelerate the delivery of home details pages.  When you visit the page for a home (e.g. <a href="http://www.redfin.com/CA/San-Francisco/830-El-Camino-Del-Mar-94121/home/604622">http://www.redfin.com/CA/San-Francisco/830-El-Camino-Del-Mar-94121/home/604622</a>), parts of that page are cacheable but other parts can&#8217;t be easily cached.  For example, the description of the home may be available to all users, but <a href="http://en.wikipedia.org/wiki/Multiple_Listing_Service">MLSs</a> require us to hide some historical information from users who aren&#8217;t logged in.  Further, while most of the page might be highly cacheable, the &#8220;Sites Linking to 830 El Camino Del Mar&#8221; section isn&#8217;t as easy to cache- a blog post that refers to our page (via a <a href="http://en.wikipedia.org/wiki/Trackback">trackback</a>) may come in at any time.</p>
<p>ESI nesting makes it easy to accommodate these vagaries.<br />
<a href="http://www.flickr.com/photos/odalaigh/2320059944/" target="_new"><img src="http://farm4.static.flickr.com/3207/2320059944_47b4a99f23.jpg" title="ESI and Caching Trickery in Varnish" alt="2320059944 47b4a99f23 ESI and Caching Trickery in Varnish" /></a></p>
<p>Conceptually, here&#8217;s what the HTML for our main page looks like:<br />
<code><br />
&lt;html&gt;<br />
  &lt;body&gt;<br />
    Some notes about this home</p>
<p>    Sites Linking to 830 El Camino Del Mar:<br />
    &lt;esi:include src="/esi-listing-trackbacks?listing-id=123" /&gt;</p>
<p>    Median House Values:<br />
    &lt;esi:include src="/esi-listing-regions?listing-id=123" /&gt;<br />
  &lt;/body&gt;<br />
&lt;/html&gt;<br />
</code></p>
<p>Varnish will fill in the details of each of the esi:include sections with results from the &#8220;src&#8221; URL.  In this example, a single HTTP request from the browser to Varnish will cause Varnish to make three HTTP requests to the backend server (one for the main page, one for the trackbacks, and one for the regions.)</p>
<p>Turning a single request into three requests doesn&#8217;t really help per se, but it does enable caching.  Before ESI, we were unable to cache the page as a whole since the &#8220;Sites Linking to&#8221; section was uncacheable.  By breaking the page into three sections, we can support caching for some of the sections, while disallowing caching of the other sections.</p>
<p>The workflow of a request that&#8217;s partially answered from cache might look something like this:</p>
<p>1. The browser requests http://www.redfin.com/CA/San-Francisco/830-El-Camino-Del-Mar-94121/home/604622<br />
2. Varnish receives that request, and looks up the URL in its cache<br />
3. Varnish finds a match in the cache, so it doesn&#8217;t send the request for /CA/San-Francisco/830-El-Camino-Del-Mar-94121/home/604622 through to the backend.  Instead it retrieves the content from the cache, and searches it for ESI tags.<br />
4. Varnish finds the ESI include for /esi-listing-trackbacks?listing-id=123<br />
5. Varnish looks up /esi-listing-trackbacks?listing-id=123 in the cache.  There&#8217;s no entry, so Varnish requests /esi-listing-trackbacks?listing-id=123 from the backend.<br />
6. The backend calculates the content for /esi-listing-trackbacks?listing-id=123 and returns it (along with <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9">cache control headers</a> specifying that the results should not be cached)<br />
7. Varnish likewise retrieves the results for /esi-listing-regions?listing-id=123<br />
8. Varnish knits the three HTML snippets together and returns the results to the browser</p>
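<p>To make those steps concrete, here&#8217;s a toy Java sketch of the cache-then-expand loop.  This is NOT how Varnish is implemented- the names (<code>EsiSketch</code>, <code>Fragment</code>, <code>render</code>, <code>backendHits</code>) are made up for illustration, the backend is just a function from URL to content, and the tag matching is deliberately naive.</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy model of an ESI-aware cache; every name here is hypothetical. */
final class EsiSketch {
    /** What the backend hands back: a body plus whether it may be cached. */
    static final class Fragment {
        final String body;
        final boolean cacheable;

        Fragment(String body, boolean cacheable) {
            this.body = body;
            this.cacheable = cacheable;
        }
    }

    // Naive matcher for self-closing include tags with a double-quoted src.
    private static final Pattern INCLUDE =
        Pattern.compile("<esi:include src=\"([^\"]+)\"\\s*/>");

    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, Fragment> backend;
    int backendHits = 0;

    EsiSketch(Function<String, Fragment> backend) {
        this.backend = backend;
    }

    /** Look the URL up, go to the backend on a miss, then expand any includes. */
    String render(String url) {
        String body = cache.get(url);
        if (body == null) {
            backendHits++;
            Fragment fragment = backend.apply(url);
            body = fragment.body;
            if (fragment.cacheable) {
                // The cached copy still contains the include tags, not their content.
                cache.put(url, body);
            }
        }
        Matcher m = INCLUDE.matcher(body);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(body, last, m.start());
            out.append(render(m.group(1))); // nested includes are expanded recursively
            last = m.end();
        }
        out.append(body.substring(last));
        return out.toString();
    }
}
```

<p>In this sketch, rendering a cacheable page twice only hits the backend for the page once, while an uncacheable fragment inside it is fetched on every render.</p>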
<p>The big win here is that ESI allows us to cache the main body of the page, even though the trackbacks cannot be cached.  This is a tricky bit, so I&#8217;ll repeat it.  The &#8220;outer&#8221; HTML, which is the main body of the page, is cached.  But the &#8220;inner&#8221; HTML, the HTML for trackbacks, is NOT cached.  The cache of the outer content doesn&#8217;t include the inner content- it just includes a token saying &#8220;fill in this inner content before you use this cache entry.&#8221;</p>
<p>Of course, that&#8217;s just the simplest case.  In practice, we faced a number of minor challenges while implementing this.</p>
<p>1. Recording every hit</p>
<p>We have two conflicting goals.  On the one hand, we&#8217;d like to serve content up from cache as often as reasonable- users get the content faster, and our backend systems scale better.  On the other hand, we&#8217;d like to record every page hit.  Whenever a user views a page describing a listing, we record various information.  We would like every request to get through Varnish and into our backend, so that we can record this information.<br />
As with nearly every problem in Computer Science, this is solved by adding a layer of code.  In this case, the &#8220;outer&#8221; request is NEVER cached, but all it does is record the hit and generate an ESI include.  The &#8220;inner&#8221; request does the heavy lifting, but responses are cached.  For example, the user might request <a href="http://www.redfin.com/CA/San-Francisco/830-El-Camino-Del-Mar-94121/home/604622">http://www.redfin.com/CA/San-Francisco/830-El-Camino-Del-Mar-94121/home/604622</a> which would result in this &#8220;outer&#8221; response:<br />
<code><br />
Cache-Control: max-age=0</p>
<p>&lt;esi:include src="/esi-display-listing?cache-for-logged-out&amp;listing-id=604622" /&gt;<br />
</code><br />
which would in turn generate a cache lookup for /esi-display-listing?cache-for-logged-out&amp;listing-id=604622.  If that&#8217;s cached, it&#8217;s fast.  If it&#8217;s not cached, we gotta do all the work.</p>
<p>2. Caching public content without caching user-specific content</p>
<p>The main page content for a home (e.g. <a href="http://www.redfin.com/CA/San-Francisco/830-El-Camino-Del-Mar-94121/home/604622">http://www.redfin.com/CA/San-Francisco/830-El-Camino-Del-Mar-94121/home/604622</a>) is the same for all anonymous users.  However, users that are logged in will see additional details, such as whether or not that home is a &#8220;favorite.&#8221;  Thus, it&#8217;s easy to cache for anonymous users, but harder to cache for logged in users (we don&#8217;t cache the main page content for logged in users.)  It&#8217;s easy enough to set the cache-control response headers such that Varnish won&#8217;t cache content for logged in users.  But we wanted to optimize a bit more- we wanted to avoid even attempting cache lookups when the user is logged in.  We did this by adding VCL which examines the incoming request.  If the request includes cookies that indicate the user is logged in, we skip the cache lookup.  We also put a special token into the URL to make it easy for the VCL logic to know that it should do this magic for the request (since the URLs are ESI URLs, they&#8217;re not visible to the extranet.)  Here&#8217;s what the VCL looks like:<br />
<code><br />
sub vcl_recv {<br />
&nbsp;&nbsp;&nbsp;&nbsp;...<br />
&nbsp;&nbsp;&nbsp;&nbsp;if (req.http.Cookie ~ "RF_AUTH") {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;set req.http._rf_login = regsub( req.http.Cookie, "^.*?RF_PARTY_ID=([^;]*?);*.*$", "\1" );<br />
&nbsp;&nbsp;&nbsp;&nbsp;}</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;# cookies by default make requests in Varnish uncacheable<br />
&nbsp;&nbsp;&nbsp;&nbsp;unset req.http.Cookie;<br />
&nbsp;&nbsp;&nbsp;&nbsp;...<br />
&nbsp;&nbsp;&nbsp;&nbsp;if (req.url ~ "cache-for-logged-out") {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;#Directive says to use cache for logged out users, but not for logged in users<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (req.http._rf_login) {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;#Since there's an RF_AUTH, the user is logged in- do not use cache<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;pass;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;else {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;#The user is NOT logged in- use cache (but do not look up based on cookies)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;lookup;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;&nbsp;&nbsp;...<br />
}<br />
</code></p>
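<p>If VCL isn&#8217;t your thing, the same decision can be written as a few lines of Java.  This is a simplified sketch, not code we run: the class and enum names are hypothetical, and it just checks for the RF_AUTH cookie name rather than extracting a value into a header the way the VCL does.</p>

```java
/** Mirrors the routing decision in the VCL; all names are hypothetical. */
final class CacheRouting {
    enum Action { PASS, LOOKUP, DEFAULT }

    static Action route(String url, String cookieHeader) {
        if (!url.contains("cache-for-logged-out")) {
            // No directive in the URL: leave the decision to other rules.
            return Action.DEFAULT;
        }
        // Assumption: the presence of RF_AUTH in the cookies means "logged in".
        boolean loggedIn = cookieHeader != null && cookieHeader.contains("RF_AUTH");
        return loggedIn ? Action.PASS : Action.LOOKUP;
    }
}
```

<p>A request for a &#8220;cache-for-logged-out&#8221; URL with an RF_AUTH cookie maps to PASS (skip the cache); the same URL without that cookie maps to LOOKUP.</p>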
<p>3. Cache busting</p>
<p>We&#8217;d like to cache HTML describing a listing for a long time (24 hours), but when we get new listing data, we want to show that to users immediately.</p>
<p>One approach is to explicitly invalidate any cache entries that refer to the listing.  We could identify all Varnish instances that might cache the data and individually invalidate the content in each one.  However, that&#8217;s a little difficult to do from Java, it may be unreliable (it requires that we keep good records about all Varnish instances), and it&#8217;s generally a PITA.</p>
<p>Instead, we include the last modified time of the listing in the URL.  Again, the ESI URLs are internal, so this doesn&#8217;t dirty our extranet URLs.  My earlier example was incomplete.  A request for http://www.redfin.com/CA/San-Francisco/830-El-Camino-Del-Mar-94121/home/604622 might generate a response that looks like this:<br />
<code><br />
&lt;esi:include src="/esi-display-listing?cache-for-logged-out&amp;listing-id=604622&amp;<strong>last-mod=1272651333452</strong>" /&gt;<br />
</code><br />
(note the &#8220;last-mod&#8221; argument, which represents the last modification date of the Listing.)  That way, whenever the listing changes, the URL to the main ESI fragment will change- stale cache entries will be orphaned.</p>
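<p>Building that URL is about as simple as it sounds.  Here&#8217;s a minimal sketch, assuming the last modification time is available as a long (the value in the example looks like epoch milliseconds, but that is an assumption); <code>EsiUrls</code> and <code>displayListing</code> are hypothetical names.</p>

```java
/** Builds versioned ESI URLs; class and method names are hypothetical. */
final class EsiUrls {
    static String displayListing(long listingId, long lastModified) {
        // A new lastModified value yields a new URL, and therefore a new cache key.
        return "/esi-display-listing?cache-for-logged-out"
            + "&listing-id=" + listingId
            + "&last-mod=" + lastModified;
    }
}
```

<p>Because the cache is keyed by URL, nothing has to be told that the old entry is stale- it is simply never requested again and eventually expires.</p>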
<p>4. Tuning Varnish</p>
<p>When we initially deployed Varnish, it was returning 503 Service Unavailable errors.  Michael Young (our intrepid CTO) changed many of the Varnish settings, including connect_timeout, sess_workspace, thread_pool_min, and thread_pool_max.  The most important thing he did was match the Varnish threads to our expected traffic, and the 503 errors went away (pretty much.)</p>
<p>P.S.  Thanks to <a href="http://www.flickr.com/photos/odalaigh/">Odalaigh</a> for the gorgeous image</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.redfin.com/devblog/2010/05/esi_and_caching_trickery_in_varnish.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Laziness (in proxies) is a virtue</title>
		<link>http://blog.redfin.com/devblog/2009/12/laziness_in_proxies_is_a_virtue.html</link>
		<comments>http://blog.redfin.com/devblog/2009/12/laziness_in_proxies_is_a_virtue.html#comments</comments>
		<pubDate>Tue, 29 Dec 2009 16:52:06 +0000</pubDate>
		<dc:creator>Michael Smedberg</dc:creator>
				<category><![CDATA[Performance]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.redfin.com/devblog/?p=218</guid>
		<description><![CDATA[In Hibernate, when you indicate that a domain object should not support lazy proxies, you make it hard for DAO writers to get their code to perform well. Worse, you disable a capability that they may be counting on, and they may not notice until there are major performance problems. Unless you have a good reason to, use “@Proxy(lazy = true)” on your domain objects.]]></description>
			<content:encoded><![CDATA[<p>We use <a href="https://www.hibernate.org/" target="_new">Hibernate</a> for object-relational mapping (ORM) and organize our code into <a href="http://en.wikipedia.org/wiki/Business_object_%28computer_science%29" target="_new">domain objects</a> and <a href="http://java.sun.com/blueprints/corej2eepatterns/Patterns/DataAccessObject.html" target="_new">data access objects</a>.</p>
<p>Historically many of our domain objects have been marked with the &#8220;<a href="http://docs.jboss.org/hibernate/stable/annotations/api/org/hibernate/annotations/Proxy.html" target="_new">@Proxy(lazy = false)</a>&#8221; Hibernate annotation.  This annotation tells Hibernate that it should NOT create lazy proxies for the annotated class.</p>
<p>At Redfin, these were almost all bugs.  We should never use &#8220;@Proxy(lazy = false)&#8221; without a big comment explaining why it&#8217;s necessary.  Our default should be &#8220;@Proxy(lazy = true)&#8221;.  Laziness is good!</p>
<p><a href="http://www.carloneworld.it/" target="_new"><img src="http://blog.redfin.com/devblog/files/2009/12/lazy-cat5.jpg" alt="Lazy" width="700" height="525" class="size-full wp-image-249" title="Laziness (in proxies) is a virtue" /></a></p>
<p>Here&#8217;s my quick understanding of the effects of the @Proxy annotation.  As with everything in Hibernate, each individual piece seems simple, but when you consider all the features that Hibernate exposes, and how they interact, it can become pretty complicated.</p>
<p><strong>Hibernate Load Options</strong></p>
<p>When Hibernate loads objects that refer to other objects (i.e. have member objects), it needs to do something about the associated objects.  For example, suppose that Cat objects contain (optional) references to Owner objects.  When Hibernate is loading a Cat object into memory, it has to decide what to do about the Owner member variable.  There are a number of things it COULD do:</p>
<ol>
<li>When it constructs SQL to load the Cat, it could include the Owner table and columns in the SELECT clause, so that all the data is loaded at once</li>
<li>It could load the Cat object, and subsequently load the Owner object (via a second SQL statement)</li>
<li>It could load the Cat object, and set the Owner member to a placeholder (a proxy), which can be filled in later when the Owner information is needed</li>
</ol>
<p>Note that it CANNOT simply do nothing about the Owner- if it instantiates a Cat and leaves the Owner member null when the DB says that the Cat DOES have an Owner, then consumers of the Cat will be misinformed- they&#8217;ll think that the Cat has no Owner, which is false.</p>
<p>Option 1 (get all the info in 1 SQL statement) is efficient when loading multiple Cats for which the Owner information is needed.  For example, if some code needed to iterate over 1000 Cats, and get Owner information for each one, this approach would be efficient.</p>
<p>However, option 1 is inefficient in cases where the secondary information is not needed.  E.g. if some code needed to iterate over 1000 Cats but did NOT need to get Owner information, then loading the Owner information is an obvious inefficiency.</p>
<p>Worse, taking option 1 to the extreme can cause an explosion in the data load.  For example, a Cat might have an Owner, the Owner might have a Home, the Home might have an Address, which might have a City, which might have a State, etc.  Loading the whole object graph into memory via SQL could be very inefficient.  Further, every change to domain objects could cause many SQL statements to get hairier (e.g. adding a Country member to the State object would effectively add to the SQL needed to load Cat objects.)</p>
<p>Option 2 (load the Cat, then load the Owner) is simple, and often not bad, but never optimal (if you know you&#8217;ll need the Owner info, it&#8217;s more efficient to load it in a single SQL statement; if you know you won&#8217;t use it you should never load it; if you won&#8217;t know until later, delaying the load is better.)</p>
<p>However, option 2 is particularly bad when bulk operations are being performed.  For instance, if some code were to load up every Cat object in the database to do some processing, this could be accomplished via a single SQL statement (though it&#8217;d probably be better to break it into chunks of, say, 10,000 Cat objects.)  However, Hibernate would run a &#8220;SELECT * FROM owners&#8221; type statement for every Cat object that has an Owner- potentially millions of SQL statements.</p>
<p>Option 3 (load the Cat and set its Owner member variable to a proxy- load the Owner info on demand) is a compromise.  It allows code to do bulk operations without loading the ancillary information (e.g. load all Cat objects without ever loading any Owner objects.)  However, it requires additional SQL statements to load the secondary information IF that info is needed (e.g. if code loaded all Cat objects, then accessed the Owner for each Cat, option 3 would result in potentially millions of SQL statements.)  Note that if the Owner information is never needed, then option 3 is most efficient- the information is never loaded.</p>
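<p>The trade-off in option 3 is easy to see with a counter.  The sketch below is plain Java, not Hibernate: <code>Lazy</code> stands in for a proxy, <code>queries</code> counts simulated SQL statements, and every name in it is made up for illustration.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Plain-Java stand-in for lazy proxies; this is not Hibernate code. */
final class LazySketch {
    /** Counts simulated SQL statements. */
    static int queries = 0;

    /** Defers a load until get() is first called, then remembers the result. */
    static final class Lazy<T> {
        private Supplier<T> loader;
        private T value;

        Lazy(Supplier<T> loader) {
            this.loader = loader;
        }

        T get() {
            if (loader != null) {
                value = loader.get();
                loader = null; // loaded once; later calls reuse the value
            }
            return value;
        }
    }

    static final class Cat {
        final long id;
        final Lazy<String> owner;

        Cat(long id, Lazy<String> owner) {
            this.id = id;
            this.owner = owner;
        }
    }

    /** One simulated SELECT loads every Cat; each Owner is only a placeholder. */
    static List<Cat> loadCats(int count) {
        queries++;
        List<Cat> cats = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            final long id = i;
            cats.add(new Cat(id, new Lazy<String>(() -> loadOwner(id))));
        }
        return cats;
    }

    /** One simulated SELECT per Owner that is actually touched. */
    static String loadOwner(long catId) {
        queries++;
        return "owner-of-" + catId;
    }
}
```

<p>Loading 1000 Cats costs one simulated statement.  Touching no Owners keeps it at one; touching every Owner takes it to 1001, which is the case where option 1 would have been the better choice.</p>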
<p>Hibernate allows programmers to influence which strategy it will take.  It offers (at least) two types of control: direct control over the SQL it generates, and control over the proxies.</p>
<p>See <a href="https://www.hibernate.org/315.html" target="_new">https://www.hibernate.org/315.html</a> and <a href="https://www.hibernate.org/162.html" target="_new">https://www.hibernate.org/162.html</a> for information on Hibernate fetching strategies and lazy loading.</p>
<p><strong>Controlling SQL</strong></p>
<p>When you&#8217;re implementing a DAO method, you can tell Hibernate whether it should proactively fetch information about member objects.</p>
<p>Under the <a href="http://docs.jboss.org/hibernate/core/3.3/reference/en/html/querycriteria.html" target="_new">Criteria API</a>, Hibernate lets you call <a href="http://docs.jboss.org/hibernate/core/3.3/reference/en/html/querycriteria.html#querycriteria-dynamicfetching" target="_new">criteria.setFetchMode</a> to tell Hibernate that it should load the additional info immediately, or should defer it.  Hibernate uses the term &#8220;eager&#8221; to mean &#8220;load immediately&#8221;, and &#8220;lazy&#8221; to mean &#8220;defer loading.&#8221;</p>
<p>When using <a href="http://docs.jboss.org/hibernate/core/3.3/reference/en/html/queryhql.html" target="_new">HQL</a>, you can use the <a href="http://docs.jboss.org/hibernate/core/3.3/reference/en/html/queryhql.html#queryhql-joins" target="_new">FETCH</a> keyword to specify the fetch mode, which is equivalent.</p>
<p>When using SQL, you can use the <a href="http://docs.jboss.org/hibernate/core/3.3/reference/en/html/querysql.html#d0e13732" target="_new">query.addJoin</a> method to tell Hibernate that you&#8217;ve written SQL which retrieves information for member objects.  In this case, you&#8217;ll be responsible for writing the joins, etc., yourself.</p>
<p><strong>Controlling Proxies</strong></p>
<p>Hibernate also lets you control the existence and behavior of proxies via the @Proxy annotation mentioned above.  Annotating a class with &#8220;@Proxy(lazy = false)&#8221; tells Hibernate to NOT support lazy proxies for that type of object (of course &#8220;@Proxy(lazy = true)&#8221; tells Hibernate to support lazy proxies.)  This allows the writer of the domain object to essentially override the wishes of the writer of the DAO.  If the DAO writer would like to load members in a lazy manner, but the domain object in question doesn&#8217;t support lazy loading, then Hibernate will NOT lazy load the object (since it cannot.)</p>
<p>If you&#8217;re writing a class for which lazy loading would be dangerous, then you SHOULD disallow lazy proxies, since DAO writers probably won&#8217;t understand the detailed load requirements of your class.  However, this is unusual.  In most cases, lazy proxies are safe.</p>
<p>Since the writer of the domain object can control what choices are available to the writer of the DAO object, they need to use that power judiciously.  You CAN code all of your domain objects to disallow lazy loading, which will force all writers of DAOs to use load options 1 or 2 (load all members via fancy SQL, or load all members via secondary SQL statements.)  But you generally should not.  DAO writers often rely on option 3 (lazy loading), particularly when they know that the member objects will never be accessed (or when they&#8217;re not sure.)  If you specify &#8220;@Proxy(lazy = false)&#8221;, you&#8217;ve made it impossible for DAO writers to use option 3, which means it may be difficult for them to get their code to perform well.  Worse, the writer of the DAO may not realize that you did that, or may not understand the implications.  Hibernate queries are actually kinda hard to view, so the writer of the DAO may have created a huge performance problem and not even known it (until you go into production.)</p>
<p><strong>Only the client really knows</strong></p>
<p>Even the writer of the DAO doesn&#8217;t know how the client will use the objects it returns.  If you&#8217;re implementing the CatDAO, you might add a method like <a href="http://en.wikipedia.org/wiki/Basement_Cat#Ceiling_Cat_and_Basement_Cat" target="_new">getBasementCatsAndOwners</a>, which would return all black cats and pre-fetch the corresponding owners.  You think you&#8217;re clever because you&#8217;ve avoided a major performance problem, but a caller might try to get the Home for each Owner, defeating your pre-fetching strategy.  The DAO writer should do their best to anticipate the needs of their callers, and to name and document their methods such that callers can understand what they do, but ultimately the caller is in control, and can (unintentionally) defeat the optimizations of the DAO writer.  If your database were large and you knew that you had clients that sometimes needed Owners, sometimes needed Owners and Homes, etc., you might make three methods: CatDAO.getBasementCats, CatDAO.getBasementCatsAndOwners, and CatDAO.getBasementCatsAndOwnersAndHomes.</p>
<p><strong>Conclusion: <a href="http://docs.jboss.org/hibernate/stable/annotations/api/org/hibernate/annotations/Proxy.html" target="_new">@Proxy(lazy = false)</a> is generally evil</strong></p>
<p>As mentioned above, when you indicate that a domain object should not support lazy proxies, you make it hard for DAO writers to get their code to perform well.  Worse, you disable a capability that they may be counting on, and they may not notice until there are major performance problems.  Unless you have a good reason to, use &#8220;<a href="http://docs.jboss.org/hibernate/stable/annotations/api/org/hibernate/annotations/Proxy.html" target="_new">@Proxy(lazy = true)</a>&#8221; on your domain objects.</p>
<p><strong>P.S.</strong></p>
<p>Lazy proxies do have some known problems.</p>
<p>First, the lazy proxy is NOT the same as the actual object.  If you depend on the exact datatype of the object, <a href="http://blog.xebia.com/2008/03/08/advanced-hibernate-proxy-pitfalls/" target="_new">you may have problems</a>, since the type of the proxy isn&#8217;t the same as the type of the actual object (e.g. the runtime class of a proxy for an Owner is not Owner itself- it&#8217;s a generated subclass of Owner.)</p>
<p>Second, you may have to think carefully about methods like equals() or hashCode(), since the proxies <a href="http://windhood.spaces.live.com/blog/cns!452FEE7EB6C195AD!146.entry" target="_new">may not do what you expect</a>.</p>
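<p>Both problems come down to the same thing: code that compares exact classes.  Here&#8217;s a plain-Java illustration with no Hibernate involved; <code>OwnerProxy</code> is a hand-written, hypothetical stand-in for the subclass a proxying library would generate.</p>

```java
/** Plain-Java illustration of the proxy type pitfall; all names are hypothetical. */
final class ProxyTypeSketch {
    static class Owner {
        private final String name;

        Owner(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    /** Stand-in for a generated proxy class: a subclass of Owner. */
    static class OwnerProxy extends Owner {
        OwnerProxy(String name) {
            super(name);
        }
    }

    /** Brittle: false for any subclass, including a proxy. */
    static boolean isExactlyOwner(Object o) {
        return o != null && o.getClass() == Owner.class;
    }

    /** Tolerant: true for Owner and every subclass of it. */
    static boolean isOwner(Object o) {
        return o instanceof Owner;
    }
}
```

<p>An equals() method written in the style of <code>isExactlyOwner</code> would treat a proxy and the real object as different even when they represent the same row, which is one way equals() can surprise you.</p>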
<p><strong>P.P.S.</strong><br />
Thanks to <a href="http://www.carloneworld.it/" target="_new">carloneworld</a> for the great lazy kitty photo!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.redfin.com/devblog/2009/12/laziness_in_proxies_is_a_virtue.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>One Week After The Outage</title>
		<link>http://blog.redfin.com/devblog/2009/11/one_week_after_the_outage.html</link>
		<comments>http://blog.redfin.com/devblog/2009/11/one_week_after_the_outage.html#comments</comments>
		<pubDate>Mon, 16 Nov 2009 20:22:30 +0000</pubDate>
		<dc:creator>Michael Young</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.redfin.com/devblog/?p=190</guid>
		<description><![CDATA[We launched a major new version of Redfin.com a week and a half ago. The headliner was the addition of near-real-time &#8220;solds&#8221; data through our MLS-based virtual office website (VOW) data feeds. On launch day, we had a 3 hour outage and intermittent &#8220;brownouts&#8221; for another 2 days after. We wanted to give people an [...]]]></description>
			<content:encoded><![CDATA[<p>We launched a <a href="http://blog.redfin.com/blog/2009/11/theres_going_to_be_a_whole_lot_of_rubber-necking_going_on.html">major new version</a> of <a href="http://www.redfin.com">Redfin.com</a> a week and a half ago. The headliner was the addition of near-real-time &#8220;solds&#8221; data through our MLS-based virtual office website (VOW) data feeds. On launch day, we had a 3 hour outage and intermittent &#8220;brownouts&#8221; for another 2 days after. We wanted to give people an idea of what happened and what we&#8217;re doing to make sure this kind of outage doesn&#8217;t happen again.</p>
<p><strong>Better and&#8230; Bigger</strong><br />
For 14 days prior to launch, we ran data imports day and night. We added 1.4 million records, 9 million photos, and revamped our internal database schema. As a result, the disk space used by our Postgres database grew by 30%. Way more disk was needed to store photos.</p>
<p>By Thursday morning, we were not able to go live with all our slave databases as planned. We use Slony replication for our slave databases. Errors in scripts can cause a Slony slave to require a complete re-sync, and that is, unfortunately, what happened to us. We launched believing that our single master database would handle the load. We were wrong.</p>
<p>By 9am PST on Thursday, our site was maxing out. First it was slow, then it was non-responsive. The problem wasn&#8217;t a rush of traffic from the press coverage. The problem was our single master database. The increase in database size and new schema overloaded it. We ended up throttling our database to allow most people to access Redfin.com, but this just caused intermittent issues and &#8220;brownouts,&#8221; where the site would be overwhelmed with requests and become non-responsive for a minute or two at a time.</p>
<p>Many engineers spent all Thursday and Friday looking at code, looking at the database, and looking at the traffic. Everyone was looking for some magical bug that was causing the problem. In the end, the solution was very simple. Once the slave databases were synced up and put into production on Friday at 8pm PST, the problem mostly went away. We&#8217;re still investigating the root cause, but all indicators point strongly to one explanation: we just didn&#8217;t have enough RAM to avoid disk I/O slowness and thrashing.</p>
<p><strong>Lessons Learned</strong><br />
Redfin learned that the scalability &amp; performance testing that we do before every release isn&#8217;t good enough. This outage hurt our professional pride, and we are newly dedicated to fixing this. We need to know every new release is going to run well against expected load and existing hardware.</p>
<p>For our next major release in December, we had been planning to upgrade our master database from Postgres 8.3 on 32GB of RAM to Postgres 8.4 on 72GB of RAM. The database servers are over two years old now. Too bad we didn&#8217;t do it sooner, but we&#8217;ve accelerated the hardware upgrade to have it ready this week. We&#8217;re also intrigued by the idea of using <a href="http://www.fusionio.com/Default.aspx">Fusion-IO SSDs</a> at some point.</p>
<p>We also plan to spend more time looking at ways we can streamline the code to run the site more efficiently on the hardware. Hardware is relatively cheap these days, but smart engineers can often find places in the code that can be made 10x faster!</p>
<p>And as the site grows, we&#8217;ll also look at more scalable database solutions like partitioning or switching at least some parts to <a href="http://hadoop.apache.org/hbase/">Hadoop HBase</a>. We already use Hadoop for log analysis, and it also looks very promising as a high-scale query engine.</p>
<p>I know there are a lot of folks in technology who use Redfin. What do you think? Did we learn the right lessons?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.redfin.com/devblog/2009/11/one_week_after_the_outage.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>How to Set Up Hot Code Replacement with Tomcat and Eclipse</title>
		<link>http://blog.redfin.com/devblog/2009/09/how_to_set_up_hot_code_replacement_with_tomcat_and_eclipse.html</link>
		<comments>http://blog.redfin.com/devblog/2009/09/how_to_set_up_hot_code_replacement_with_tomcat_and_eclipse.html#comments</comments>
		<pubDate>Wed, 30 Sep 2009 19:28:32 +0000</pubDate>
		<dc:creator>Dan Fabulich</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.redfin.com/devblog/?p=170</guid>
		<description><![CDATA[This blog post will guide you through setting up Tomcat hot code replacement (also called hotswap debugging) in Eclipse. What Is &#8220;Hot Code Replace&#8221;? What&#8217;s the Catch? What About JavaRebel? Configuring Your Web Application in Eclipse Download Eclipse &#8220;JEE&#8221; Edition Switch to the &#8220;Java EE&#8221; Perspective Configure Your WAR Project Create a New Server Magic [...]]]></description>
			<content:encoded><![CDATA[<p>This blog post will guide you through setting up Tomcat hot code replacement (also called hotswap debugging) in Eclipse.</p>
<ul>
<li>What Is &#8220;Hot Code Replace&#8221;?</li>
<li>What&#8217;s the Catch?</li>
<li>What About JavaRebel?</li>
<li>Configuring Your Web Application in Eclipse
<ol>
<li>Download Eclipse &#8220;JEE&#8221; Edition</li>
<li>Switch to the &#8220;Java EE&#8221; Perspective</li>
<li>Configure Your WAR Project</li>
<li>Create a New Server</li>
<li>Magic Setting: Disable &#8220;Auto Reloading&#8221; on Each Project in the Server</li>
<li>Performance Tip: Adjust Memory Settings</li>
<li>Start Your Tomcat Server in Debug Mode</li>
</ol>
</li>
<li>Why Disable Auto Reloading?</li>
<li>Disable Auto Reloading but <em>Enable</em> Auto Publishing</li>
<li>Finding the <kbd>tmp0</kbd> Fake Tomcat Directory</li>
<li>Exorcising the <kbd>tmp0</kbd> Directory</li>
<li>Troubleshooting: What Do I Do If I Still Can&#8217;t Get It to Work?</li>
</ul>
<h2>What Is &#8220;Hot Code Replace&#8221;?</h2>
<p>&#8220;<a href="http://wiki.eclipse.org/FAQ_What_is_hot_code_replace%3F">Hot Code Replace</a>&#8221; (HCR) lets you make modifications to a Java class and see the effect immediately in a running JVM, without restarting your application.  HCR is part of the Java Platform Debugger Architecture (JPDA) and is available on all modern JVMs.</p>
<p>Consider this ordinary application:</p>
<pre><code>public class Sample {
  public static void main(String[] args) {
    String foo = "unchangeable";
    foo += blah();
    System.out.println(foo);
  }

  public static String blah() {
    String bar = "bar";
    bar += "blah";
    return bar;
  }

}</code></pre>
<p>If you debug this class in Eclipse, you can make changes to it, on the fly, without restarting the JVM.  For example, try setting a breakpoint on the second line of <code>blah()</code>. Next, change the literal <code>blah</code> to <code>quz</code>.  Save the file and the program will continue running with the new code.</p>
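<p>Because <code>Sample</code> exits almost immediately unless it is paused at a breakpoint, a looping variant can make the effect easier to watch.  The following is only a minimal sketch; the class name <code>HotSwapLoop</code> and its methods are hypothetical and not part of the original example.  Run it under the Eclipse debugger, edit the string inside <code>message()</code>, and save; later iterations should print the new text.</p>

```java
class HotSwapLoop {

  // Hypothetical helper: editing only the body of this method is the kind
  // of change that hot code replace is meant to handle.
  static String message(int tick) {
    return "tick " + tick;
  }

  // Prints one message per iteration, pausing between iterations so there
  // is time to edit and save message() while the loop is still running.
  // Returns the number of messages printed.
  static int run(int iterations, long pauseMillis) {
    int printed = 0;
    for (int i = 0; i < iterations; i++) {
      System.out.println(message(i));
      printed++;
      try {
        Thread.sleep(pauseMillis);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and stop early.
        Thread.currentThread().interrupt();
        break;
      }
    }
    return printed;
  }

  public static void main(String[] args) {
    // Roughly one minute of output, one line per second.
    run(60, 1000L);
  }
}
```

<p>The loop lives in <code>run()</code> rather than in <code>main</code> on purpose: the code you edit is in a small method that is re-entered on every iteration.</p>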
<p>You can do this with Tomcat web applications in Eclipse, but it&#8217;s a lot trickier.</p>
<h2><a name="catch"></a>What&#8217;s the Catch?</h2>
<p>There are a few limitations when using hot code replace.  You can&#8217;t use JPDA HCR to change the signature of a class (add/remove methods or fields) or to add new classes on the fly.  Additionally, some method calls (&#8220;stack frames&#8221;) can&#8217;t be modified, including the <code>main</code> method or any method invoked via reflection, that is, by using <kbd>java.lang.reflect.Method.invoke()</kbd>.</p>
<p>Here&#8217;s what happens if you try to change the <code>unchangeable</code> string literal in the main method of <kbd>Sample.java</kbd> above.</p>
<p><img src="http://blog.redfin.com/devblog/files/2009/09/unchangeable-300x115.png" alt="unchangeable 300x115 How to Set Up Hot Code Replacement with Tomcat and Eclipse" width="300" height="115" class="alignnone size-medium wp-image-180" title="How to Set Up Hot Code Replacement with Tomcat and Eclipse" /></p>
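<p>For reference, this is the kind of reflective call that the limitation above refers to.  It is a minimal sketch with hypothetical names (<code>ReflectiveCall</code>, <code>greet</code>); treat it as an illustration of the call pattern only, since how a particular debugger reacts to a change here may vary.</p>

```java
import java.lang.reflect.Method;

class ReflectiveCall {

  // Hypothetical target method; it is called through reflection below.
  public static String greet() {
    return "hello";
  }

  // Looks up greet() by name and invokes it via
  // java.lang.reflect.Method.invoke(), the call style named above.
  static Object callReflectively() {
    try {
      Method method = ReflectiveCall.class.getMethod("greet");
      return method.invoke(null); // null receiver because greet() is static
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("reflective call failed", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(callReflectively());
  }
}
```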
<h2>What About JavaRebel?</h2>
<p><a href="http://www.zeroturnaround.com/javarebel/">JavaRebel</a> is a hot code replacement system that&#8217;s a little better than JPDA HCR.  (<a href="http://www.zeroturnaround.com/jrebel/comparison/">Maybe a lot better</a>.)</p>
<p>With JavaRebel you can add/remove methods and classes without restarting Tomcat.  However, JavaRebel costs $149 per developer per year, so it may or may not be worthwhile for you.</p>
<h2>Configuring Your Web Application in Eclipse</h2>
<ol>
<li>Download Eclipse &#8220;JEE&#8221; Edition
<p>Most developers already use this, since it&#8217;s the first option available on the <a href="http://www.eclipse.org/downloads/">Eclipse download page</a>, but if you&#8217;re using &#8220;Eclipse IDE for Java Developers&#8221; (92MB), you&#8217;ll need to go back and download &#8220;Eclipse IDE for Java EE Developers&#8221; (189MB).  It&#8217;s worth it, I promise!</p>
<p><img src="http://blog.redfin.com/devblog/files/2009/09/download-screenshot-300x188.png" alt="download screenshot 300x188 How to Set Up Hot Code Replacement with Tomcat and Eclipse" width="300" height="188" class="alignnone size-medium wp-image-175" title="How to Set Up Hot Code Replacement with Tomcat and Eclipse" /></p>
<p>Note: The difference between the regular Java IDE and the Java EE IDE is that the JEE edition comes with the Eclipse &#8220;Web Tools Platform&#8221; (WTP), which includes &#8220;Web Standard Tools&#8221; (WST).  The terms are sometimes used interchangeably; keep an eye out for this if you need to search for them.</p>
</li>
<li>Switch to the &#8220;Java EE&#8221; Perspective
<p>Make sure you&#8217;re in the &#8220;Java EE&#8221; perspective, not the &#8220;Java&#8221; perspective.  If it&#8217;s not in the upper-right corner of your Eclipse window, you might need to enable it manually (Window menu -&gt; Open Perspective -&gt; Other&#8230;).  If &#8220;Java EE&#8221; doesn&#8217;t appear on this list, you&#8217;ve probably downloaded the wrong package of Eclipse; go back and download the Java EE version.</p>
<p><img src="http://blog.redfin.com/devblog/files/2009/09/jee-screenshot.png" alt="jee screenshot How to Set Up Hot Code Replacement with Tomcat and Eclipse" width="253" height="141" class="alignnone size-full wp-image-176" title="How to Set Up Hot Code Replacement with Tomcat and Eclipse" /></p>
</li>
<li>Configure Your WAR Project
<p>From scratch: From the New menu, select &#8220;Dynamic Web Project&#8221;.  Configure your source and output folders, as well as your &#8220;Content directory&#8221;, which will contain your JSPs, your WEB-INF directory, etc.</p>
<p>Via Maven: Use <a href="http://maven.apache.org/">Maven</a> to create a WAR project.  For example:</p>
<p><kbd>mvn archetype:create -DarchetypeArtifactId=maven-archetype-webapp -DartifactId=YOURNAMEHERE -DgroupId=YOURNAMEHERE</kbd></p>
<p>Modify your <kbd>pom.xml</kbd> to include an explicit reference to <code>maven-eclipse-plugin</code>, like this:</p>
<pre><code>&lt;!-- ... --&gt;
&lt;build&gt;
    &lt;!-- ... --&gt;
    &lt;plugins&gt;
        &lt;!-- ... --&gt;
        &lt;plugin&gt;
            &lt;artifactId&gt;maven-eclipse-plugin&lt;/artifactId&gt;
            &lt;configuration&gt;
                &lt;wtpversion&gt;2.0&lt;/wtpversion&gt;
            &lt;/configuration&gt;
        &lt;/plugin&gt;
    &lt;/plugins&gt;
&lt;/build&gt;
&lt;!-- ... --&gt;</code></pre>
<p>Now generate an Eclipse project from the command line, like this:</p>
<p><kbd>mvn eclipse:eclipse</kbd></p>
<p>Here&#8217;s an <a href="http://blog.redfin.com/devblog/files/2009/09/hotswaptest.zip">example Maven project</a> you can use.  Just download it, extract it, and run <code>mvn eclipse:eclipse</code> to generate your Eclipse project.  (If this is your first time using Maven with Eclipse, you&#8217;ll also need to add an <kbd>M2_REPO</kbd> classpath variable in your Eclipse workspace preferences that points to your Maven repository, typically <kbd>$HOME/.m2/repository</kbd> or <kbd>%USERPROFILE%\.m2\repository</kbd>.)</p>
</li>
<li>Create a New Server
<p>From the New menu, select Other&#8230; -&gt; Server -&gt; Server.  For your server type, expand the &#8220;Apache&#8221; folder and select the version of Tomcat you intend to use.  Choose &#8220;Next&#8221; and then specify the path to your Tomcat installation directory, e.g. <kbd>c:\tools\tomcat-6.0</kbd>.  Add your web project as a &#8220;resource&#8221; to this server.</p>
</li>
<li><a name="magic-setting"></a>Magic Setting: Disable &#8220;Auto Reloading&#8221; on Each Project in the Server
<p>You now have a weird pseudo-project called &#8220;Servers&#8221; in your Project Explorer.  In the explorer, your server looks like a folder called something like &#8220;Tomcat v6.0 Server at localhost-config&#8221; &#8230;but that&#8217;s not what you want.  You need to interact with your server using the &#8220;Servers&#8221; tab.  (Eclipse calls these tabs &#8220;Views,&#8221; but everybody else just calls them &#8220;tabs.&#8221;)</p>
<p>The &#8220;Servers&#8221; tab should be exposed by default, but in case it isn&#8217;t, you can expose it via Window -&gt; Show View -&gt; Servers.  From there you can double-click on your server to configure it.</p>
<p>Note that the configuration panel for your server has two tabs, &#8220;Overview&#8221; and &#8220;Modules&#8221;, down at the bottom of the window.  Click on the &#8220;Modules&#8221; tab to bring up the list of projects associated with the server.</p>
<p>Select your project in the list and click on &#8220;Edit&#8221;. You&#8217;ll see the magic secret checkbox: &#8220;Auto reloading enabled&#8221;.  It&#8217;s checked by default.  For the love of Pete, uncheck it!</p>
<p>(It&#8217;s interesting to note that <a href="http://www.zeroturnaround.com/news/javarebel-and-wtp-configuration/">JavaRebel also requires disabling auto reloading</a> to work properly in Eclipse.)</p>
<p><img src="http://blog.redfin.com/devblog/files/2009/09/magic-checkbox-screenshot-300x216.png" alt="magic checkbox screenshot 300x216 How to Set Up Hot Code Replacement with Tomcat and Eclipse" width="300" height="216" class="alignnone size-medium wp-image-177" title="How to Set Up Hot Code Replacement with Tomcat and Eclipse" /></p>
</li>
<li><a name="memory"></a>Performance Tip: Adjust Memory Settings
<p>Before you start up your server, I strongly recommend adjusting your server&#8217;s <kbd>-Xmx</kbd> memory settings; otherwise, it will constrain itself to the default value, 64 MB, which is just not enough!</p>
<p>Double-click on your server in the &#8220;Servers&#8221; tab and switch to the &#8220;Overview&#8221; tab.  Click on the &#8220;Open launch configuration&#8221; link.  Switch to the Arguments tab; there you can add relevant memory settings to the &#8220;VM Arguments&#8221; section.  For example, you might add <kbd>-Xmx512m</kbd> to the end of the existing arguments.</p>
<p><img src="http://blog.redfin.com/devblog/files/2009/09/memory-screenshot-300x273.png" alt="memory screenshot 300x273 How to Set Up Hot Code Replacement with Tomcat and Eclipse" width="300" height="273" class="alignnone size-medium wp-image-178" title="How to Set Up Hot Code Replacement with Tomcat and Eclipse" /></p>
</li>
<li>Start Your Tomcat Server in Debug Mode
<p>Now you can right-click on the Server in your Servers tab and choose &#8220;Debug&#8221;.  Changes you make to your JSPs or Java classes will be instantly hotswapped into your running WAR.</p>
</li>
</ol>
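<p>If you want to confirm that the <kbd>-Xmx</kbd> value from the memory step above actually reached the JVM, a small check along these lines can help.  It is a minimal sketch; the class name <code>HeapCheck</code> is hypothetical, and to test the Tomcat JVM itself you would need to run the same <code>Runtime</code> call from code inside your webapp rather than from a standalone <code>main</code>.</p>

```java
class HeapCheck {

  // Reports the maximum heap the current JVM will try to use, in megabytes.
  // Runtime.maxMemory() reflects the effective -Xmx limit.
  static long maxHeapMegabytes() {
    return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
  }

  public static void main(String[] args) {
    System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
  }
}
```

<p>The reported number is often a little lower than the <kbd>-Xmx</kbd> value you typed, so look for the right ballpark rather than an exact match.</p>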
<h2>Why Disable Auto Reloading?</h2>
<p>Auto reloading is a feature of Tomcat that allows you to replace Java classes at runtime without using JPDA.  In this mode, Tomcat uses Java classloaders to try to unload classes and reload them; whenever it reloads, it also tries to reinitialize your application, re-launching any servlets that are marked <code>load-on-startup</code> in your <kbd>web.xml</kbd> file.</p>
<p>As a result, Tomcat auto reloading may not save you any time at all if your webapp has a lot of startup code.  For example, if your load-on-startup code needs to warm up Hibernate database caches, Spring/Guice dependency injection configuration, etc., it may take almost as long to restart your webapp as it does to restart Tomcat.</p>
<p>Worse, an app that has been auto reloaded can behave strangely, and can quickly run out of PermGen memory due to frequent unloading/reloading of classes.  When this happens, restarting Tomcat typically fixes the problem.  If you spend even five minutes investigating a weird auto reloading problem, you&#8217;ve just wasted all the time you were hoping to save by avoiding a restart.  (Not to mention your stress and frustration!)</p>
<p>By disabling auto reloading and using JPDA hot code replace instead, you get a more reliable code replacement system.</p>
<h2><a name="auto-publish">Disable Auto Reloading but <em>Enable</em> Auto Publishing</a></h2>
<p>In the screenshot above you can see how to disable auto reloading on the &#8220;Modules&#8221; tab of the Tomcat server; auto reloading is bad for JPDA debugging.  But there&#8217;s another setting called &#8220;Automatically publish when resources change&#8221; on the &#8220;Overview&#8221; tab of the Tomcat server.  It&#8217;s hidden by default, collapsed under the &#8220;Publishing&#8221; section.  You can see it if you expand that section; you want to make sure auto publishing is enabled while auto reloading is disabled.</p>
<p><img src="http://blog.redfin.com/devblog/files/2009/09/autopublish-screenshot-300x205.png" alt="autopublish screenshot 300x205 How to Set Up Hot Code Replacement with Tomcat and Eclipse" width="300" height="205" class="alignnone size-medium wp-image-173" title="How to Set Up Hot Code Replacement with Tomcat and Eclipse" /></p>
<p>To understand the difference between auto publishing and auto reloading, we&#8217;ll have to go into how exactly Eclipse WTP works.  When you create a &#8220;Server&#8221; in Eclipse, the IDE creates a fake Tomcat directory, complete with the <kbd>conf</kbd>, <kbd>logs</kbd>, <kbd>temp</kbd>, <kbd>webapps</kbd> and <kbd>work</kbd> directories.  When you configured the server, you told Eclipse where to find Tomcat, but it doesn&#8217;t use any of your settings files or any data from your <kbd>webapps</kbd> directory.  Instead, Eclipse launches Tomcat with special command-line arguments, indicating where to find the fake Tomcat directory.</p>
<p>&#8220;Publishing&#8221; means populating the fake Tomcat directory with all of your code.  Eclipse copies your JSPs, JARs up your Java classes, regenerates settings files, etc.</p>
<p>If you turn off auto publishing, then you have to right-click on your Server and &#8220;Publish&#8221; your changes manually every time you save your Java code.  Additionally, auto publishing allows Tomcat to reload JSPs automatically, regardless of whether you use JPDA or not.</p>
<p><img src="http://blog.redfin.com/devblog/files/2009/09/server-menu-screenshot-300x279.png" alt="server menu screenshot 300x279 How to Set Up Hot Code Replacement with Tomcat and Eclipse" width="300" height="279" class="alignnone size-medium wp-image-179" title="How to Set Up Hot Code Replacement with Tomcat and Eclipse" /></p>
<h2>Finding the <kbd>tmp0</kbd> Fake Tomcat Directory</h2>
<p>Sometimes it can be helpful to look inside the fake Tomcat directory and see what&#8217;s going on in there.  Eclipse tells you where it put the Tomcat directory in the &#8220;Server Locations&#8221; section of your &#8220;Tomcat&#8221; server configuration panel.  (Double-click on your Server in the &#8220;Servers&#8221; tab to open the configuration panel.)  Typically, Eclipse says that your server is in <kbd>.metadata/.plugins/org.eclipse.wst.server.core/tmp0</kbd>; for this reason I typically call it the <kbd>tmp0</kbd> directory (pronounced &#8220;tempo&#8221;).</p>
<p>The <kbd>.metadata</kbd> folder is inside your Eclipse workspace directory.  (You can find your Eclipse workspace directory by going to File -&gt; Switch Workspace; the default value is your current workspace directory.)  In the worst case, you can always just search your hard drive for <kbd>tmp0</kbd>.  It&#8217;s there somewhere!</p>
<p>Inside, you can see all the folders Eclipse has created.  Check out the generated <kbd>server.xml</kbd> file in <kbd>tmp0/conf</kbd>.  Examine generated <kbd>.java</kbd> files in <kbd>tmp0/work</kbd>.  Your <kbd>tmp0/webapps</kbd> directory is probably empty; Eclipse has probably generated your webapp in <kbd>wtpwebapps</kbd>.</p>
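<p>If you&#8217;d rather not click through the file system, a few lines of Java can report which of the folders mentioned above exist.  This is a minimal sketch that assumes the typical <kbd>tmp0</kbd> location described in this section; the class name <code>Tmp0Finder</code> is hypothetical, and your workspace may use a different directory (for example, if you have more than one server).</p>

```java
import java.io.File;

class Tmp0Finder {

  // Folder names taken from the description of the fake Tomcat directory.
  static final String[] EXPECTED =
      {"conf", "logs", "temp", "webapps", "work", "wtpwebapps"};

  // Resolves the typical tmp0 location inside an Eclipse workspace.
  static File tmp0Dir(File workspace) {
    return new File(workspace,
        ".metadata/.plugins/org.eclipse.wst.server.core/tmp0");
  }

  // Builds a short report saying which expected folders are present.
  static String report(File workspace) {
    File tmp0 = tmp0Dir(workspace);
    StringBuilder out = new StringBuilder(tmp0.getPath()).append('\n');
    for (String name : EXPECTED) {
      boolean present = new File(tmp0, name).isDirectory();
      out.append("  ").append(name)
         .append(present ? ": found" : ": missing").append('\n');
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // Pass the workspace directory as the first argument; defaults to ".".
    File workspace = new File(args.length == 0 ? "." : args[0]);
    System.out.print(report(workspace));
  }
}
```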
<h2><a name="exorcism">Exorcising the <kbd>tmp0</kbd> Directory</a></h2>
<p>Unfortunately, sometimes Eclipse gets a little confused about what to put in your WAR file, and you need to perform various stages of exorcism depending on how badly your <kbd>tmp0</kbd> directory is messed up.</p>
<ul>
<li>Try republishing your <kbd>tmp0</kbd> directory.  Open the &#8220;Servers&#8221; tab, right-click on your server and select &#8220;Clean&#8230;&#8221; (<em>not</em> &#8220;Clean Tomcat Work Directory&#8230;&#8221;).  Then select &#8220;Publish.&#8221;  That should completely rebuild your <kbd>tmp0</kbd> directory.</li>
<li>Try restarting Eclipse.  This works more often than I&#8217;d like to admit.</li>
<li>Try completely deleting and recreating your server.  Follow this ritual:
<ol>
<li>Open the &#8220;Servers&#8221; tab, right-click on the server and select &#8220;Delete&#8221;.</li>
<li>Make sure &#8220;Delete unused server configuration(s)&#8221; is checked, then click OK.</li>
<li>Look at your &#8220;Servers&#8221; pseudo-project; make sure the folder for your server is gone.  If it isn&#8217;t, right-click on it and Delete it.</li>
<li>Quit Eclipse.</li>
<li>Go find your <kbd>tmp0</kbd> directory (if it&#8217;s still present) and delete it from your file system.</li>
<li>Launch Eclipse and recreate your server from scratch.</li>
</ol>
</li>
<li>Try creating a new workspace.  Go to File -&gt; Switch Workspace and specify an empty directory.  Then create your server from scratch.</li>
</ul>
<h2>Troubleshooting: What Do I Do If I Still Can&#8217;t Get It to Work?</h2>
<ul>
<li><a name="regular"></a>My project works in regular Tomcat, but doesn&#8217;t work in Tomcat under Eclipse
<p>Try using Eclipse to generate a WAR file for Tomcat.  Right-click on your web project and select Export -&gt; WAR file, and install it in Tomcat by dropping the exported WAR into your Tomcat <kbd>webapps</kbd> directory.</p>
<p>If the exported WAR file doesn&#8217;t work, then you now have two WARs: one working WAR generated by your build script, and one non-working WAR generated by Eclipse.  WAR files are just zips; extract them both, find the difference, and fix it!  Right-click on your web project and select &#8220;Properties&#8221;.  The problem is somewhere in here.</p>
<p>On the other hand, if the exported WAR file <em>does</em> work, then you know that the problem has to do with the way Eclipse is launching Tomcat.  Find your <kbd>tmp0</kbd> directory (described above) and poke around.  Does everything look OK in there?  Be sure to check your <kbd>server.xml</kbd> file, as well as your webapp itself in <kbd>wtpwebapps</kbd>.  Make sure to note your <kbd>WEB-INF/lib</kbd> directory, typically in <kbd>tmp0/wtpwebapps/YOURAPP/WEB-INF/lib</kbd>.</p>
</li>
<li>Tomcat is throwing <kbd>NoClassDefFoundError</kbd> or <kbd>ClassNotFoundException</kbd>
<p>First, double-check whether this problem happens in regular Tomcat.  See <a href="#regular">My project works in regular Tomcat, but doesn&#8217;t work in Tomcat under Eclipse</a> above.</p>
<p>If this problem occurs in the exported WAR file under regular Tomcat, then your webapp is probably missing JARs.  See <a href="#missing-jars">My exported WAR is missing JARs</a> below.</p>
<p>If the exported WAR works but your webapp is still broken under Tomcat in Eclipse, you may need to perform an exorcism.  (See <a href="#exorcism">Exorcising the <kbd>tmp0</kbd> Directory</a> above.) If this happens to you often, double-check that you haven&#8217;t accidentally disabled auto publishing.  (See <a href="#auto-publish">Disable Auto Reloading but <em>Enable</em> Auto Publishing</a> above.)</p>
</li>
<li><a name="missing-jars"></a>My exported WAR is missing JARs
<p>Right-click on your web project and select &#8220;Properties.&#8221;  The problem is somewhere in here.  Make sure you see your JAR listed under &#8220;Java Build Path&#8221; in the Properties dialog.</p>
<p>Beware: not every JAR in &#8220;Java Build Path&#8221; gets exported to the WAR.  The list of JARs for the WAR is under &#8220;Java EE Module Dependencies.&#8221;  If a JAR/project is unchecked on that list, it won&#8217;t appear in your WAR file.</p>
</li>
<li>My <kbd>tmp0</kbd> directory is missing an important configuration file
<p>Eclipse will publish files that it finds in the &#8220;Tomcat&#8221; folder of the &#8220;Servers&#8221; pseudo-project to your <kbd>tmp0/conf</kbd> directory; if you&#8217;re missing files, you can add them here.</p>
</li>
<li>My <kbd>server.xml</kbd> file is messed up
<p>That file is copied from the &#8220;Tomcat&#8221; folder in your &#8220;Servers&#8221; pseudo-project to your <kbd>tmp0/conf</kbd> directory.  But note that the <kbd>server.xml</kbd> file is at least partially autogenerated by Eclipse.  If you make direct changes to the file, Eclipse will do its best to try to incorporate your changes, but it often gets confused and does the wrong thing.  When possible, it&#8217;s better to find the appropriate Eclipse settings and change them there instead of modifying the <kbd>server.xml</kbd> file directly.</p>
<p>Note that one of the most common problems with <kbd>server.xml</kbd> is an incorrect <code>path</code> attribute on your webapp&#8217;s <code>&lt;Context&gt;</code> element, causing your webapp to appear on a non-standard URL.  See the following question about 404 errors for more details about this problem.</p>
</li>
<li>Tomcat is giving me strange 404 errors
<p>First, double-check whether this problem happens in regular Tomcat.  (See <a href="#regular">My project works in regular Tomcat, but doesn&#8217;t work in Tomcat under Eclipse</a> above.)  If it happens in regular Tomcat too, then it&#8217;s probably a bug in your code.</p>
<p>If the problem only happens in Eclipse, then it&#8217;s probably a <kbd>server.xml</kbd> configuration problem.  Check your <kbd>tmp0/conf/server.xml</kbd> file&#8217;s <code>&lt;Context&gt;</code> element; check the <code>path</code> attribute.  The <code>path</code> attribute indicates the virtual directory of your webapp.  For example, if your <kbd>Context/path</kbd> is <kbd>/examplePath</kbd>, then your webapp will appear at <kbd>http://localhost:8080/examplePath</kbd>.  If it&#8217;s misconfigured, your webapp may not be available at the URL you expect.</p>
<p>The <kbd>path</kbd> attribute is auto-generated based on settings in the properties of your WAR project.  Right-click on your web project, select &#8220;Properties&#8221; and go to the &#8220;Web Project Settings&#8221; section.  There&#8217;s only one setting here; it&#8217;s called &#8220;Context root&#8221;.  Specify the context you intend to use here.  If you want your project to appear in the root directory, you&#8217;ll need to put <kbd>/</kbd> as your context root (since you aren&#8217;t allowed to leave it blank).</p>
<p><img src="http://blog.redfin.com/devblog/files/2009/09/context-root-screenshot-300x272.png" alt="context root screenshot 300x272 How to Set Up Hot Code Replacement with Tomcat and Eclipse" width="300" height="272" class="alignnone size-medium wp-image-174" title="How to Set Up Hot Code Replacement with Tomcat and Eclipse" /></p>
</li>
<li>Tomcat times out when starting under Eclipse (&#8220;Server [...] was unable to start within 45 seconds&#8221;)
<p>The Eclipse developers, in their infinite wisdom, have added a timeout to Tomcat startup.  If Tomcat doesn&#8217;t declare a successful startup in 45 seconds, Eclipse kills Tomcat automatically.  (Gee, thanks, guys!)</p>
<p>You can increase that timeout.  Open the &#8220;Servers&#8221; tab and double-click on your server to open the server configuration panel.  Make sure the panel&#8217;s &#8220;Overview&#8221; tab is selected.  Expand the &#8220;Timeouts&#8221; section and increase the Start timeout to something reasonable for your server.</p>
</li>
<li>I had Tomcat working under Eclipse, but now it&#8217;s broken and I can&#8217;t figure out why
<p>You may need to perform an exorcism.  (See <a href="#exorcism">Exorcising the <kbd>tmp0</kbd> Directory</a> above.)</p>
</li>
<li>My web project starts up fine, but when I save <kbd>.java</kbd> files in Eclipse, it doesn&#8217;t take effect until I restart
<ul>
<li>Did you make sure to launch the server in Debug mode, as opposed to Run mode?  JPDA only works when you Debug the server.</li>
<li>Is your server configured to auto publish?  (See <a href="#auto-publish">Disable auto reloading but <em>Enable</em> auto publishing</a> above.)</li>
<li>Did you change something that broke JPDA?  (See <a href="#catch">What&#8217;s the catch?</a> above.)  If you make large changes to your classes, JPDA may be unable to replace the code; if you choose to &#8220;Continue&#8221; past that point, further changes will have no effect.</li>
</ul>
</li>
<li>My web project starts up fine, but when I save <kbd>.jsp</kbd> files in Eclipse, it doesn&#8217;t take effect until I restart
<p>This is typically due to disabled auto publishing.  Double-check that your server is configured to auto publish.  (See <a href="#auto-publish">Disable auto reloading but <em>Enable</em> auto publishing</a> above.)</p>
<p>If that doesn&#8217;t work, examine your <kbd>tmp0</kbd> directory to make sure Tomcat is using the correct JSP.  It should automatically begin consuming new JSPs as they are installed in the <kbd>tmp0/wtpwebapps</kbd> directory.</p>
</li>
<li>Whenever I save a <kbd>.java</kbd> file in Eclipse, Tomcat restarts
<p>This is typically due to Tomcat auto reloading.  Double-check that you correctly disabled auto reloading.  (See <a href="#magic-setting">Magic Setting</a> above.)</p>
</li>
<li>Tomcat in Eclipse is <i>much</i> slower than regular Tomcat
<p>Try increasing your <a href="#memory">memory settings</a> as described above, if you haven&#8217;t already.</p>
<p>Try running Tomcat in &#8220;Run&#8221; mode (as opposed to &#8220;Debug&#8221; mode).  If that fixes the problem, then there may be nothing more you can do about it.  JPDA does have some overhead; you can turn it off, but while it&#8217;s off you won&#8217;t have access to hot code replacement.</p>
</li>
</ul>
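<p>The first troubleshooting item above suggests extracting both WARs and finding the difference.  Since WAR files are just zips, one way to narrow that down is to compare their entry lists first.  This is a minimal sketch under that assumption; the class name <code>WarDiff</code> and the file names in the usage message are hypothetical, and it compares entry names only, not file contents.</p>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class WarDiff {

  // Collects the entry names of a WAR (a zip file) in sorted order.
  static Set<String> entryNames(String warPath) {
    try (ZipFile war = new ZipFile(warPath)) {
      Set<String> names = new TreeSet<>();
      for (ZipEntry entry : Collections.list(war.entries())) {
        names.add(entry.getName());
      }
      return names;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Returns the names that appear in the first set but not in the second.
  static Set<String> onlyIn(Set<String> first, Set<String> second) {
    Set<String> result = new TreeSet<>(first);
    result.removeAll(second);
    return result;
  }

  public static void main(String[] args) {
    if (args.length != 2) {
      System.err.println("usage: java WarDiff working.war exported.war");
      return;
    }
    Set<String> working = entryNames(args[0]);
    Set<String> exported = entryNames(args[1]);
    for (String name : onlyIn(working, exported)) {
      System.out.println("only in " + args[0] + ": " + name);
    }
    for (String name : onlyIn(exported, working)) {
      System.out.println("only in " + args[1] + ": " + name);
    }
  }
}
```

<p>A JAR that shows up under <kbd>WEB-INF/lib</kbd> in only one of the two WARs is usually a good place to start looking.</p>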
]]></content:encoded>
			<wfw:commentRss>http://blog.redfin.com/devblog/2009/09/how_to_set_up_hot_code_replacement_with_tomcat_and_eclipse.html/feed</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
	</channel>
</rss>

