Analytics solutions for everyone

Google Data Studio is a basic, yet functional report building and dashboard service for when you don’t need or want the power of something like RStudio Shiny, Power BI or Tableau. It has many data connectors and is tightly integrated into Google’s authentication ecosystem. I recently used it to build a simple yet effective business intelligence dashboard report for a client that saved them significant annual spend on a content analytics service.

Underlying report data comes from a Google Sheet updated nightly with data from a Go program I wrote using Google’s Admin SDK API for Google Drive activity reporting. I massage that data in R after extracting it with the Go program and before pushing it to Google Sheets using the extremely useful googlesheets package. The Google Data Studio report I designed and built does the rest and satisfies the client’s basic needs for analytics on their content stored on Google Drive and served on their Google site.

There are several parts to this overall solution and it took a while to fully complete mostly because it was my first time writing a program in the Go language. This is just the R script that cleans the data and appends it to the Google Sheet.

library(dplyr)
library(stringr)
library(googlesheets)

setwd("c:/Projects/Go/")

# Read in the nightly file produced by the Go program that extracts 
# one day of Drive activity
  gDriveLogData <- read.csv("C:/Projects/Go/gDriveLogData.csv", header=FALSE, sep=","
  ,stringsAsFactors=FALSE)
  names(gDriveLogData) <- c("DateTime","Actor","Action","Document")

# Trim leading and trailing spaces from the Drive filenames
  gDriveLogData <- apply(gDriveLogData,2,str_trim)
  gDriveLogData <- as.data.frame(gDriveLogData, stringsAsFactors = FALSE)

# Convert the time stamp to a date / time class and change time zone from UTC to PST
  gDriveLogData$DateTime <- as.POSIXct(gDriveLogData$DateTime,
  "%Y-%m-%dT%H:%M:%S", tz="UTC")
  attr(gDriveLogData$DateTime, "tzone") <- "America/Los_Angeles"

# Get unique entries with  hours minutes seconds - deduplicate
  gDriveLogData <- unique(gDriveLogData)

# Trim hours minutes seconds
  gDriveLogData$DateTime <- strptime(gDriveLogData$DateTime, "%Y-%m-%d")
  gDriveLogData$DateTime <- as.POSIXct(gDriveLogData$DateTime)

# Filter out all other Google Drive events except views and downloads for the report
  target <- c("view", "preview", "download")
  gDriveLogData <- filter(gDriveLogData, Action %in% target)

# Write output for recordkeeping
  write.csv(gDriveLogData, "gDriveLogDataSiteUpdate.csv", row.names = FALSE)

# Rename the file from the nightly Go program output that we read in at the start
# in case the GO job fails - we won't create duplicate entries in the Google sheet
  file.rename("gDriveLogData.csv","last_gDriveLogData.csv")

# Google Sheet update (assumes the gs_auth steps already completed)
  gDriveLogDataAppend <- gs_key("<target Google Sheet GUID")
  gs_add_row(gDriveLogDataAppend, ws = 1, input = gDriveLogData, verbose = FALSE)