“Content” Eugh.

Before I get to why the idea of “content” needs to be thrown out of a helicopter, let me take you on a whistle-stop tour of SEO. Back in the good/bad old days, you could get your site to rank by simply copying and pasting the site that was number one for the term you wanted and buying a few more links than they had. Allegedly.

As Google got smarter and the stakes got higher, the tricks got more and more complex. Then, in April 2012, Google rolled out their “Penguin” update. Penguin, in a nutshell, meant that it was possible to damage your site by building crappy links. This was significant because, up to then, the worst a poor-quality link could do was not help you.
Building links still worked, and it was still a vital part of SEO, but it now carried risk.
This left a lot of companies in a quandary: if building crappy links is out, and there’s not enough budget to build decent ones, what do we do?

The answer came out of the now unbearably smug whitehat SEO community – “just write great content”.

Okay. Great is subjective. And hard. So that bit got dropped and the mantra became “just write any old rubbish as long as it hits the word count”.

Which brings us to where we are now. Companies are writing content for the sake of it, not because they have anything to say. Armies of content creation executives are pumping out word-paste to a brief, a word count and a deadline, often about subjects they know precisely nothing about. But as long as the company website gets two five-hundred-word articles each and every week, everyone is happy.
Well, everyone except anyone who has the misfortune to read any of this bland, soulless mulch.
You know the type of article I mean, you’ve read them:

18 surprising new ways to say the same thing we said last week!
Expert opinions on something bloody obvious!
You won’t believe this slight rephrasing of this same, tired point!

Anyone who lands on content like this will bounce. Trust me, I’ve seen enough Google Analytics accounts to know.
If your business’s footprint on social media is a daily stream of tedious articles, people will just tune you out. They probably won’t unfollow you; they’ll just mentally screen out your posts, and the one time you do write something worth reading they’ll skim over it.

So stop it. Stop rubber-stamping out content for the sake of it. Write one decent article, not fifty half-arsed ones. Write about what you know. Write something useful.
Quality > quantity.

Stop Using Excel For Everything, Just Stop It.

Stop using Excel for everything, just stop it.

Excel needs to be led round to the back of the stable block and put out of its misery.

Don’t get me wrong, Excel is an amazing piece of software. Many a fantastic innovation has started life as a spreadsheet, but so many people are comfortable and proficient with Excel that it gets used for things it simply isn’t good for.

Businesses get stuck with systems and processes that depend on spreadsheets that someone built years ago. That person has either left or forgotten how it all works, and now it’s all getting slower and slower as more and more rows get added. More and more versions of the spreadsheet have to be maintained for different people and different software. There’s an Excel plugin in there somewhere that’s no longer supported, and the next big version update is going to bring the whole thing down in a heap.

“If your only tool is a hammer then every problem looks like a nail.”
This quote, (probably wrongly) attributed to Mark Twain, sums up the problem nicely: if you’re not a software person but you can do a pivot table, then you will naturally reach for Excel whenever you have a problem to solve. And that’s fine. I use Excel for one-off tasks and prototyping, but if you need to deploy a solution beyond your own desktop then stop.

What happens all too often is that the company’s Excel ninja puts together something that does the job, but then, as the business tries to scale the solution to more users or more data, things go pear-shaped.

Excel Is Not a Database

Spreadsheets really aren’t a good place to keep any significant amount of data: you can’t normalise it properly, and the relational capabilities of Excel are basic. Officially, Excel can handle just over a million rows of data, but unless you’ve got a seriously powerful computer you’ll start seeing performance issues at less than a tenth of that, especially if you’ve got formulae.
Use Excel to get your data structure right and then get someone to build you a database.

Excel is Not a Data Visualisation Tool

A freelance friend of mine was asked recently to work on a project to produce “advanced BI dashboards” in Excel. He said no. If you want “advanced BI dashboards”, or any sort of dashboard for that matter, look beyond Excel. Yes, Excel will produce graphs and pie charts and so on, but sharing the dashboard becomes a nuisance. Emailing people a new version of a dashboard every few days isn’t ideal; what if you’re not there to send it? What if they open an old version by mistake? What if they accidentally make an edit and end up looking at a different set of numbers to everyone else?

A single source of truth – preferably a database – feeding data to a single online data visualisation that all your users can view is a much nicer solution. What’s more, this stuff isn’t anywhere near as complex or expensive as it once was. Google Data Studio is free and will let you produce professional dashboards from a wide variety of data sources and share them with the rest of your organisation. Or if you really want to smash it out of the park then spend the money on Tableau. No, it’s not cheap but it’s awesome.

If it’s a place to keep data that you need, then there are a million and one options to consider depending on what type of data you’ve got. If it’s customer data, use a CRM system; if it’s product data, use a proper PIM like Akeneo.

If you really must use a spreadsheet then consider Google Sheets; at least you can share a single version with all your colleagues, and it will connect seamlessly to Data Studio.

Excel has its place, but know where that place is and more importantly where it isn’t.

Fast Uploads to AWS Redshift with Pentaho Kettle

Although it’s blisteringly fast at large aggregate queries, Amazon Redshift is horribly slow at insert/update transactions. Even the new, faster instances only manage three or four writes a second. If you’re only uploading a few hundred rows at a time then Kettle’s table output step is probably fine, but if you’re trying to add 500,000 rows to a Redshift table from a Pentaho Kettle job then you’re looking at around 36 hours to write all the rows.

The Redshift documentation suggests that for bulk uploads you should upload your data as CSV files to S3 and then use Redshift’s COPY command to load to your table.

It’s actually quite easy to do all of this in one go in a job in Kettle.

First, create a simple transform to set a system variable to today’s date. We’ll use this later to set file names.

The next thing to do is to build whatever transformation you need to get your data into the correct schema for your Redshift table, using the S3 output step. Append the system variable from the previous transform to the file name. Don’t set a header row.
You need to set the fields in the same order as the columns in the table you’ll be copying to.

If you’re going to be uploading more than a million or so rows, select the split every…rows option and set it to 500,000. This will speed things up later: the COPY command loads files in parallel, so ten files of 500,000 rows will load faster than one file of 5,000,000 rows.

At this point you should test that your output files import to your Redshift table correctly; run the transform on a small data set (use a LIMIT clause if you’re getting the data from a query) and inspect the file that gets written to S3. Then you can try manually running the COPY command to import the data to Redshift.

Once you’ve successfully copied a file to your table, you can go ahead and build the whole thing into a job, using the Execute SQL Script step to copy the files to Redshift. Use the system variable from the first step in the SQL query to copy only the files created during this run. Old files can just be left in S3 as an archive and deleted periodically.

New data gets added to the end of the table, so if your new data is ordered by the Sortkey of your table you can add a VACUUM command to the end of the SQL script.
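Put together, the contents of the Execute SQL Script step might look something like this. A sketch only: the table, bucket and IAM role names are placeholders, and it assumes ${TODAY} is the system variable set in the first transform with variable substitution enabled on the step:

```sql
-- Load every file written during this run; COPY treats the FROM value as a
-- prefix, so all the split files are picked up and loaded in parallel.
COPY analytics.sales
FROM 's3://my-bucket/kettle-out/sales_${TODAY}'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
CSV;

-- Worth doing here if the new rows arrived ordered by the table's Sortkey
VACUUM analytics.sales;
```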

In the job above you can see I’ve added success/failure emails. I’ve also added a check database connection step at the beginning to make sure the Redshift connection and any other database connections are available before doing anything else.

Connecting Google Data Studio to Amazon Redshift

It’s now possible to connect Data Studio to Amazon Redshift.

Google and Amazon both have complete suites of analytics tools that seamlessly connect within their own ecosystems, but getting them to work with each other can be a pain.

Amazon’s data warehouse solution – Redshift – is widely used and works brilliantly with Amazon Quicksight, but in my opinion at least Google Data Studio is a better data vis tool, and it’s free.

I was already using Redshift to power various reports in Tableau, but we needed a way to share a few basic KPIs without having to pay for a load of extra Tableau licences. Data Studio isn’t even close to being as full-featured and powerful as Tableau yet, but it’s improving constantly.

Amazon changed the Redshift Postgres connector recently, so now it’s possible to connect Data Studio to a Redshift database using the standard Data Studio PostgreSQL connector (Redshift is based on Postgres).

It’s really simple to set up, provided you remember two things:

  1. Your Redshift cluster must be publicly accessible
  2. The username and password you enter into Data Studio need to be for an IAM user with full access to the Redshift cluster

Here’s a screenshot of my Data Studio configuration:

[Screenshot: Data Studio setup for Redshift]

And here’s one of my Redshift cluster setup:

[Screenshot: Redshift cluster setup for Data Studio]

Once you’ve got them connected Redshift functions like any other data source in Data Studio.

Analytics and Tag Manager for Single Page Applications

Single page applications are becoming more and more common. They’re a great way to provide a fast, responsive user experience, but they can be a pain for analytics.

A standard implementation of Google Analytics will record a page view when the SPA first loads and nothing else, which is useless.

There are a few approaches to this; exactly how you deal with it will depend in some ways on the underlying framework your single page application is built on. I’m going to go through a specific example with Vue.js later, but before that let’s look at a few basic concepts.

Google Analytics records page views, and we only have one page, so we need to record something else. I’ve seen SPAs tracked using Google Analytics events, and that works for small, simple SPAs, but my preferred approach is to treat every significant change in the page’s content as a page load and send dummy page views to GA.

This works well for a few reasons, the main one being that the data and reports you generate will be immediately familiar to anyone who’s used Analytics before.

Another reason I like this approach is that the dummy page views become available in GA for functions like goal funnels, content flow and content grouping, and you get stats like exits, time on page and so on, giving you a lot more insight than if you went the event tracking route. I wrote a bit about using dummy page views for goal funnels here:

Another advantage to this approach is that if you have an SPA within a more traditional multi page site the data will fit in with the rest and work with any custom dashboards or reporting you have.

I suggest using Google Tag Manager for setting up your SPA analytics. You can do it without, but the example below assumes you’re using GTM.

The example below is for Vue.js, but the principles are the same whatever framework you’re using.

You need to get your framework to emit a data layer event at each significant change in the content. Along with the event, you need a variable containing a short (one or two word) description of the content.

Say you’ve got a three-part sign-up process. Each time a part is displayed, you want to emit an event called something like “content change” and a data layer variable called something like “content name” with the value “signup-part1”, “signup-part2” and so on.

Then you can use the event as a trigger in GTM and use the content name variable to create a dummy page view.
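In code, the push to the data layer for that sign-up process might look something like this. A minimal sketch: the event and variable names are just the ones from the example above, so use whatever matches your GTM setup:

```javascript
// GTM reads from the global dataLayer array; in the browser this is
// window.dataLayer. The guard lets the snippet run outside a browser too.
const dataLayer =
  typeof window !== 'undefined' ? (window.dataLayer = window.dataLayer || []) : [];

// Push a "content change" event with a short content name.
function trackContentChange(contentName) {
  dataLayer.push({
    event: 'content change',
    'content name': contentName,
  });
}

// Call it whenever a new chunk of content is shown:
trackContentChange('signup-part1');
```

In GTM, a custom event trigger listening for “content change” plus a data layer variable reading “content name” is then all you need to fire a page view tag.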

Vue.js Example

If you’re using Vue.js for your SPA, this plugin will make setting up Analytics and Tag Manager a whole lot easier, especially if you’re using Vue to manipulate the router to give the impression of a multi-page site.

With the plugin installed, Vue will emit a data layer event called content-view each time the content changes and a data layer variable called content-name. We can stitch this all together in GTM really quickly and start sending pageviews to Analytics.
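If you’d rather wire this up by hand than use the plugin, the equivalent is a hook on the router that pushes the same content-view event and content-name variable. An illustrative sketch, not the plugin’s actual source; the route-to-name mapping is an assumption:

```javascript
// GTM reads from the global dataLayer; the guard lets this run outside a browser.
const dataLayer =
  typeof window !== 'undefined' ? (window.dataLayer = window.dataLayer || []) : [];

// Push the content-view event the GTM trigger will listen for.
function contentView(contentName) {
  dataLayer.push({ event: 'content-view', 'content-name': contentName });
}

// With vue-router you'd call this after every navigation, e.g.:
// router.afterEach((to) => contentView(to.name || to.path));
contentView('signup-part1');
```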

The first thing to do is pull the data layer variable into a GTM variable so that it’s available to use as a page name:

Next, we need a trigger that listens for the content-view event so that we can send the page view each time the content changes:

Finally, we need a Google Analytics tag to send the page view:

That’s pretty much it with Vue.js and the plugin. You will need to find a way that works with your SPA, but the principles above should help: send an event and a name for the content to the data layer each time the content changes, then use GTM to send a dummy page view to Google Analytics.


Don’t Build Toasters

I saw a TED talk the other day about a guy who built a toaster from scratch. I mean really from scratch: he mined his own iron ore, made plastic from oil and pretty much did everything using pre-industrial tools and technology. He had a great time doing it and it’s a fascinating talk; link below.

Mr Thwaites’ project was a labour of love; his goal was to see if it could be done, he succeeded, and all credit to him. However, had his goal been to produce toast, this would have been a truly terrible idea:

  • It made bad toast
  • It cost £1187.54
  • It took nine months

Amazon will deliver a Russell Hobbs toaster to your door, today, for £16.99. Failing that you could hold bread over your cooker or light a fire or use a blowtorch. In fact pretty much any method of toasting bread you care to name would be faster, cheaper and better in every way than making your own toaster.

And yet this is precisely what many businesses do when it comes to software. Rather than use products that are available now, reasonably priced and someone else’s responsibility to maintain, they build or commission something bespoke that costs a fortune, takes forever, isn’t very good and diverts attention and resources from actually running the business.

Usually this happens when a company has internal processes that don’t align with the way off-the-shelf software does the job. For example, a lead management process that doesn’t fit easily into off-the-shelf lead management tools like Pipedrive, or a project management workflow that doesn’t fit with tools like Jira.

Think about this for a moment: these tools are built to do a single job well, the processes they are built around have been researched and optimised by experts, and they’ve been honed by thousands of existing users providing feedback. If your process doesn’t fit, it’s entirely possible that it’s better than the combined wisdom of industry experts and thousands of other businesses like yours, but before you go and build “The Homer”, be honest about it. Could you conceivably change your processes to work with existing tools? Might it be better?

Another reason bespoke software happens is when someone wants features from various tools in one place.

This spawns monstrosities that do lots of things less well than the single-purpose solutions the features came from. It ends up like a Swiss army knife: yes, there are scissors and tweezers, but compared to a proper pair of scissors and a proper pair of tweezers they’re crap.

Could you use two or more separate tools? Most modern SaaS solutions integrate with each other anyway, either directly or via services like Zapier. Even if you can’t find tools that will integrate, the chances are they’ll have APIs and that you can have an integration built for a tiny fraction of the cost of bespoke software.

Bespoke isn’t always bad. Sometimes there’s a genuine need for a completely new piece of software and a bespoke solution is the only or best way. I’m not saying never go bespoke, but do exhaust all the other possibilities first. Don’t build a toaster.

Link to the TED talk:



Context in Data Presentation

Sales of Widget Z are up 20 percent!

That’s what the headline figure on a chart in a meeting I was in the other week said. The presenter was thrilled and speculating about what could have caused such a jump in sales.

But that 20% tells you very little. 20% of what? Compared to when? I was unfamiliar with the client’s products and sales figures, so I asked.

As it turns out, Widget Z had sold 10 units the previous week and 12 this week. Combined sales were in the tens of thousands of units, and the best-selling products were selling in the thousands.

So an extra two sales of Widget Z was totally irrelevant. In fact, its average weekly sales rate for the previous six months was just over 15 units, so it was actually performing below average.
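The arithmetic is trivial, which is exactly the point: the same numbers tell opposite stories depending on the context you put around them. Using the figures from that meeting:

```javascript
// Week-on-week, the headline looks great...
const lastWeek = 10;
const thisWeek = 12;
const change = (thisWeek - lastWeek) / lastWeek; // 0.2, i.e. "up 20%!"

// ...but against the six-month weekly average the picture flips.
const sixMonthAvg = 15;
const belowAverage = thisWeek < sixMonthAvg; // true: a below-par week
```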

This got me thinking about the importance of providing context when presenting figures. You might know the numbers inside out and have an intuitive understanding of how they all relate and change over time, but don’t assume the person you’re giving the numbers to does. Even if they do, work on the assumption that they will give your figures to someone who doesn’t.

Dashboard design is something I’m doing a lot of at the moment, using Tableau and Google Data Studio, so I got thinking about how this idea relates to designing data dashboards.

Let’s say you’re doing a simple dashboard showing some performance metrics for a website. You’ll probably want to show its conversion rate:

OK, so far so good, but is that good or bad? Let’s add a comparison:

Better, but what’s the timeframe?

That’s much better, but it’s not quite enough. This figure could be skewed by a couple of extreme values, so let’s add a time series:

The scorecard and time series work well together. We can tell it’s increased fairly significantly, and if we look over at the time series we can tell that there was a dip but we’ve now recovered, and there seems to be a slight upward trend.

Now we could drill deeper into this, looking at a longer timeline, breaking out other dimensions like traffic sources and so on, but we can see the headline figure at a glance. Whether or not you need to drill deeper depends on who the dashboard is for; I plan to cover this in another post.