As part of our recent release, we added every survey submitted about our agents to our agent profiles. To prep for this, we needed to do a mail merge with each survey response, which we exported from a Postgres database. The problem was, when our users submitted their answers, they used the skills their grade 3 English teacher taught them and wrote in paragraphs. But all the common export formats indicate new records by using line-breaks, so we needed a way to clean the whitespace from these surveys.
As it turns out, Postgres has some very useful regex functions that make string operations a breeze. But since no one wants to have to reconstruct the appropriate syntax every time they need to clean whitespace, you can make a Postgres function that wraps the functionality:
CREATE OR REPLACE FUNCTION clean_whitespace(to_clean text) RETURNS text AS $$ BEGIN RETURN regexp_replace(to_clean, E'[ tnr]+', ' ', 'g'); END; $$ LANGUAGE plpgsql IMMUTABLE;
This replaces each group of whitespace in the argument with a single space. The immutable flag indicates that the function will have no side-effects, and thus allows it to be used in indices. Also notice that we only want to match occurrences of length at least 1 (by using “+” rather than “*”), because otherwise you end up with a space between every character!
Thanks to Thomas Kellerer on the postgres message board for pointing us in the right direction with regards to the arguments we needed.
(Photo credits: jamesdale10 on Flickr)