Stop Using Excel For Everything, Just Stop It.

Stop using Excel for everything, just stop it.

Excel needs to be lead round to the back of the stable block and put out of its misery.

Don’t get me wrong, Excel is an amazing piece of software. Many a fantastic innovation has started life as a spreadsheet, but so many people are comfortable and proficient with Excel that it gets used for things it simply isn’t good for.

Businesses get stuck with systems and processes that depend on spreadsheets that someone built years ago. That person has either left or forgotten how it all works, and now it’s all getting slower and slower as more and more rows get added. More and more versions of the spreadsheet are having to be maintained for different people and different software. There’s an excel plugin in there somewhere that’s no longer supported and the next big version update is going to bring the whole thing down in a heap.

“If your only tool is a hammer then every problem looks like a nail.”
This quote (probably wrongly) attributed to Mark Twain sums up the problem nicely; if you’re not a software person but you can do a pivot table then you will naturally reach for Excel whenever you have a problem to solve. And that’s fine. I use Excel to deal with one off tasks and prototyping but if you need to deploy a solution beyond your own desktop then stop.

What happens all too often is that the company’s Excel ninja puts together something that does the job; but then as the business tries to scale the solution to more users or more data things go pear shaped.

Excel Is Not a Database

Spreadsheets really aren’t a good place to keep any significant amount of data, you can’t denormalise it properly and the relational capabilities of Excel are basic. Officially, Excel can handle just over a million rows of data, but unless you’ve got a seriously powerful computer you’ll start seeing performance issues at less than a 10th of that, especially if you’ve got formulae.
Use Excel to get your data structure right and then get someone to build you a database.

Excel is Not a Data Visualisation Tool

A freelance friend of mine got asked recently to work on a project to produce “advanced BI dashboards” in Excel. He said no. If you want “advanced BI dashboards”, or any sort of dashboard for that matter look beyond Excel. Yes, Excel will produce graphs and pie charts etc, but sharing the dashboard becomes a nuisance. Emailing people a new version of a dashboard every few days isn’t ideal; what if you’re not there to send it? What if it they open an old version by mistake? What if they accidently make an edit and end up looking at a different set of numbers to everyone else?

A single source of truth – preferably a database – feeding data to a single online data visualisation that all your users can view is a much nicer solution. What’s more, this stuff isn’t anywhere near as complex or expensive as it once was. Google Data Studio is free and will let you produce professional dashboards from a wide variety of data sources and share them with the rest of your organisation. Or if you really want to smash it out of the park then spend the money on Tableau. No, it’s not cheap but it’s awesome.

If it’s a place to keep data that you need then there’s a million and one options that you might consider depending on what type of data you’ve got. If it’s customer data then use a CRM system, if it’s product data then use a proper PIM like Akeneo.

If you really must use a spreadsheet then consider Google Sheets, at least you can share a single version with all your colleagues and it will connect seamlessly to Data Studio.

Excel has its place, but know where that place is and more importantly where it isn’t.

Connecting Google Data Studio to Amazon Redshift

It’s now possible to connect Data Studion To Amazon Redshift.

Google and Amazon both have complete suites of analytics tools that seamlessly connect within their own ecosystems, but getting them to work with each other can be a pain.

Amazon’s data warehouse solution – Redshift – is widely used and works brilliantly with Amazon Quicksight, but in my opinion at least Google Data Studio is a better data vis tool, and it’s free.

I already used Redshift to power various reports in Tableau but we needed a way to share a few basic KPIs without having to pay for a load of extra Tableau licences. Data Studio isn’t even close to be as full featured and powerful as Tableau yet but it’s improving constantly.

Amazon changed the Redshift Postgress connector recently, so now it’s possible to connect Data Studio to a Redshift database using the standard Data Studio PostgreSQL connector (Redshift is based on Postgres).

It’s really simple to set up, provided you remember 2 things:

  1. Your Redshift cluster must be publicly accessible
  2. The username and password you enter in to Data Studio need to be for an IAM user with full access to the Redshift cluster

Here’s a screenshot of my Data Studio configuration:

Data Studio setup Redshift

And here’s one of my Redshift cluster setup:

Redshift set up Data Studio

Once you’ve got them connected Redshift functions like any other data source in Data Studio.

Context in Data Presentation

Sales of Widget Z are up 20 percent!

That’s what the headline figure on a chart in a meeting I was in the other week said. The presenter was thrilled and speculating about what could have caused such a jump in sales.

But that 20% tells you very little. 20% of what? Compared to when? I was unfamiliar with the client’s products and sales figures so I asked.

As it turns out, Widget Z had sold 10 units the previous week and 12 this week. Combined sales were in the tens of thousands of units and best selling products were selling in the thousands of units.

So an extra 2 sales of Widget Z was totally irrelevant. In fact it’s average weekly sales rate for the previous 6 months was just over 15 units, so it was actually performing below average.

This got me thinking about the importance of providing context when presenting figures. You might know the numbers inside out and have an intuitive understanding of how they all relate and change over time but don’t assume the person you’re giving the numbers to does. Even if they do, work on the assumption that they will give your figures to someone who doesn’t.

Dashboard design is something I’m doing a lot of at the moment, using Tableau and Google Data Studio so I got thinking about relating this idea to designing data dashboards.

Let’s say you’re doing a simple dashboard showing some performance metrics for a website, you’ll probably want to show it’s conversion rate:

Ok, so far so good, but is that good or bad?  Lets add a comparison:

Better, but what’s the timeframe?

That’s much better, but it’s not quite enough, this figure could be skewed by a couple of extreme values, so lets add a time series:

The scorecard and time series work well together. We can tell it’s increased fairly significantly and if we look over at the time series we can tell that there was a dip but that we’ve now recovered and there seems to be a slight upwards trend.

Now we could drill deeper in to this, looking at a longer timeline, breaking out other dimensions like traffic sources etc but we can see the headline figure at a glance. Whether or not you need to drill deeper depends on who the dashboard is for, I plan to cover this in another post