The Limitations of Redshift

I use Redshift every day.  We replicate records from our Cassandra and Postgresql databases there. That being said, I’d like to take a minute to talk about what Redshift is and how we as product managers can make the best use of it.  My usage is primarily through a reporting interface (Yellowfin or Tableau), but I also spend a few hours a week digging through SQL, looking for patterns in customer utilization and behavior.

So what IS Redshift? Don’t think of it as being a Postgresql database, it’s not.  Sure it has some parity with Postgresql commands, but it’s a very different creature underneath.  You also have to keep in mind that it “acts” like Postgresql 8.0.2 while version 9.6 of Postgresql is being released as I write this.

Staying with the theme of differences, you should keep in mind that there are also a large number of limitations on redshift's functionality.  Let's go through those...

No procedural programming

One of my chief complaints about Redshift from a data investigation standpoint is that it only supports SQL.  It does not support any procedural programming.  SQL Server has TSQL, Postgresql has pl/pgsql, heck, even MySQL supports some procedural programming.  This may not seem like a big deal initially; but then you try to do something as simple as declaring a variable in a query and you find out how limiting pure SQL is.  Similarly, while Redshift does support UDFs, they have to written in Python and run externally.   

Pivots

Everyone in the business world loves pivots (thank you Excel) and they're amazingly powerful.  In Postgresql 8.4 and up you can use tablefunc to write a crosstab function that dynamically generates a pivoted view.  SQL Server and MySQL can do the same thing with dynamic sql.  Redshift can't do it, so if you use pivoted data for your analysis, I hope your reporting solution can perform that operation.

JSON

Redshift also doesn’t support the JSON or ARRAY data types.  I work with JSON data quite often and none of the newer JSON functions in Postgresql 9.4 have been imported into Redshift.  Yes, you can import JSON directly into Redshift if you have a table structure that matches it, but what if your application dumps metadata into Redshift in one field?  You have no way of accessing it.  

No materialized views or triggers

When dealing with large sets of data, I’ve often used materialized views to speed up data retrieval.  It’s an old trick and one that Oracle DBAs have been using forever.  I often use triggers to keep the views updated.  That way, when you’re pulling back a report it can be quick and snappy, even if the underlying query has 15 joins and a handful of unions. 

Unfortunately, Redshift doesn’t support materialized views OR triggers.  So every time you pull up that report the database has to re-do all 15 joins and all the unions.  As a result it’s difficult to have snappy and responsive reporting.

In Closing

Please don’t think I’m trying to be completely negative on redshift.  I’m not.  It can be amazingly useful (as a number of companies have found out - AirBnB, Periscope, DonorsChoose, and more). My point is that Redshift isn’t your end-all-be-all.  It’s great for situations where you have an application that can handle all of the business logic and can translate the data to a form that redshift can accommodate.  As a result of this feature deficiency, Redshift can be hard to use if you don’t have a strong application layer.  If you're like me and you're a Product Manager who wants to figure out how often customers are using feature X, or what the churn rate is on a portion of your user base, you might run into some problems.

Coming up soon, an entry on how Redshift works with Yellowfin for lightweight reporting.