When to Stop Using Excel

Without a doubt, Microsoft Excel is the most widely used “business intelligence tool.” Yes, those quotes are there for a reason. Excel is a great tool, and one that most every analyst of any kind has used in excess and been heavily dependent upon. However, there does come a point where a business should begin evaluating if it is time to grow up and move beyond spreadsheets for analysis, reporting and data visualization. Over the years, I’ve worked on many projects where the final deliverable is some kind of automated spreadsheet. And, through this experience, I’ve seen many things that make me think, “okay, Excel is NOT the best way to meet this business requirement in the long term, because of X.” I just thought I’d share a few of the things I’ve seen being done in Excel that indicate it’s time to start looking for another way to analyze, report and visualize your data:

  • Ownership – Could a new person jump right into the spreadsheet and continue the work?
    • Many times, I’ve seen (or created) spreadsheets that would be next to impossible to follow all the way from how the data is inserted, how it’s later transformed, how some data may be tied together and then eventually reported upon. This makes maintaining a spreadsheet solution very difficult in the long-term. When a spreadsheet’s functionality becomes so complex that no one aside from the original creator can edit it, it’s time to consider other solutions. This becomes even more true when you start to introduce things such as VBA scripting, large pivot tables (these in general are an indicator that you need a real BI solution) and complex macros into the mix.
  • Size – Plain and simple, Excel is not meant for large datasets
    • This one is pretty much a no-brainer. Any single Excel tab can only hold around 1.5 million rows of data. But, good luck not pulling your hair out trying to manipulate (or even just open) an Excel file with that much data in it. Even though you can have 1.5 million rows in Excel, it takes far less data than that to essentially make a spreadsheet useless (or a nightmare to use). Even inserting a new, blank column into such a spreadsheet means sitting there for a couple of minutes waiting on your machine’s CPU to chug away trying to complete this task.
  • Geographic Data – It’s a spreadsheet not a mapping tool
    • When dealing with geographic data, your visualizations in Excel are pretty limited. There is no integrated way to visualize geographic data on any kind of map via Excel (at least out of the box). Visualizing geographic data on a map is great for the consumers of the data as it allows them to immediately put the data into proper context without having to first look down all of the rows (or maybe even only the top 10 rows,etc) of a ranked list of locations or at the entire bar chart of locations (neither of which visualize where the actual location is).
  • Data Integration – It’s a spreadsheet, not a database
    • A “vlookup” is not not really joining multiple data sets. Excel can be used to tie data from multiple datasets together if there are common keys, but you have to ask yourself if Excel is the best way to maintain a complex set of data where integration and data relationships are critical. To put this simply, if multiple data sets are critical to analysis, then you will eventually run into the ownership and size issues mentioned above. At some point, your business requirements  will be better served by using a true relational database (even Access would scale better, and MySQL is a free relational database).

Even though spreadsheets are very useful in the right situation, my primary argument against them in certain cases conform to the points above. So in summary, a spreadsheet isn’t the best long-term solution for business intelligence, analysis and data visualization if; a) it’s too complex for anyone else to quickly update or expand upon, b) you have a large amount of data, c) you need to effectively analyze geographic data and d) if you have many data sets that are related and need to be joined (or even visualized aside one another).


Data Visualization Best Practices

Below is a presentation deck from Tableau which was presented during one of their recent “roadshow” events. Regardless of the tools you have at hand to visualize your data, the best practices presented here are worth your time. Of especial interest within the deck is the information and research on how humans can best leverage data visually as well as how certain relationships in data are best visualized for efficient consumption.

Visualization of Tennessee Elementary School Performance

Below shows what can be accomplished by applying a simple visualization to any data set (using Tableau public in this case). Through this visualization I’ve been able to see that the elementary school of which my own children attend is one of the top in our school district. On the other side, I can also see that the elementary school of which I attended as a child is currently the worst performing elementary school within that different district within the state of Tennessee.

The visualization below is not yet complete, but will provide additional context as to the different profiles of the schools presented in the above analysis by providing:

  • size of student population
  • number of minorities
  • number of students that are economically disadvantaged
  • male/female makeup of the student population

State of Tennessee K-5 School Profiles: