Hope you are well.
In this tutorial I want to show you how to clean up a data set with a view to analysing it. During the year I participated in my first 8k race and I found myself curious as to how people did etc. So I asked the organisers for the raw data which they duly provided. I ended up doing a dashboard for the whole data set and you can download the Finished Dashboard here.
I had a number of specific questions…
- How many people participated in all? How many clubs?
- How many people of the various categories participated i.e. Senior Men/Senior Women (as per the Category column)
- What was the breakdown of entries across clubs/people from no clubs
- What was the average time across each sector and club?
- What was the fastest, average (and slowest!) time across each sector and club?
- I also wanted to check the median (most frequent time analysis) but I chose not to add it to the final sheet – but it is included in the file (Median Time Analysis)
I decided in the end just to present the summary data on a sheet called Dashboard in the file. The top levels give the over all numbers. For me what was really interesting was the following:
- The Senior Women has the highest number of participants (690)
- The largest “club” category was those with “no club” was 1811.
- 134 people came from a club with fewer than 5 entries.
But you can check the max, min and average times across categories and across clubs and identify how many participated in each category and club.
However if you want to just go straight and see the finished product – Finished Dashboard
As always if you have any questions or comments…please feel free to ask 🙂