DataFest-2024

Past DataFests

DataFest 2021 Edinburgh

Thirty-six teams participated in the ASA DataFest @ EDI 2021 from 26 to 28 March 2021.

ASA DataFest is a data analysis competition where teams of up to five students attack a large and complex surprise dataset over a weekend to find and communicate insights into the data. The teams that impress the judges win prizes, and the event is a great opportunity for them to gain some data analysis experience.

Undergraduate students from 10 different schools across the University of Edinburgh and Heriot-Watt University participated in this event, with academic staff, postgraduate students, and data scientists from industry attending as consultants to guide participants.

For this competition the Rocky Mountain Poison and Drug Safety Center was interested in discovering and identifying patterns of drug use, with particular attention paid to identifying misuse. These could include patterns that might describe demographic profiles within a given category of drug or combinations of drugs that frequently appear together. One goal they were keen to achieve with the data was to predict future drug misuse cases. The data sets came from participants who responded to an on-line request and were paid to participate.

For this challenge the participants were asked to produce two products:

a six-minute video detailing their primary findings; and
a one-page summary covering the primary questions investigated (or the general goal of the project), the methods used, and a quick description of the findings.

Over the weekend participants worked together, and with consultants, via Gather Town. Multiple virtual get-togethers were held on Gather Town’s virtual rooftop where everybody involved could meet and get to know each other! The award ceremony was held over Zoom on the second of April.

Judging

Our judges for this year’s DataFest @ EDI were

Ksenia Aleksankina – Data Scientist, Mirador Analytics
Nicole Augustin – Faculty, University of Edinburgh
Philip Darke – Actuary and PhD Researcher, Mercer and Newcastle University
Joanna Faulds – Senior Data Scientist, BBC
Ruth King – Faculty, University of Edinburgh
John MacInnes – Emeritus Professor of Sociology and Statistics, University of Edinburgh

The judges first reviewed the submissions from participants independently and scored them, and then deliberated on a Zoom call to make final decisions on winners.

The judges remarked that “it’s so hard to pick a winner, everybody did so well, but it was honestly really true that every team did something great. That could have been a slick presentation, novel visualisation or just some general comments that showed a really deep understanding of the data or the problem. So, you should all be really proud about what you’ve achieved over just a weekend, it’s quite incredible.”

Awards

🏆 Best insight: The Bayes-sic Team

Cannabis Usage and Prediction

Zeno Kujawa, Greig Rowe and Lee Suddaby

📹 Video presentation ✨ Shiny app

The aim of this project was to investigate the factors that are correlated with an individual’s cannabis use. In doing so, the team wished to find which variables would be most useful for developing a questionnaire to predict cannabis misuse in the USA. This analysis was based on the 2019 survey results.

The judges were very impressed with the level of the statistical analysis the team presented. They liked how the team picked a single question to answer and went into great depth with it. They also appreciated the Shiny app the team built to summarize their results, and noted how the team considered the ethics of data collection. Very impressive work, well done!

🏆 Best visualisation: Hippopotamus Testing

Demographics and Geography of drug use in the UK

Michael Renfrew, Michał Kobiela, Kaiya Raby and Stanislaw Szcześniak

📹 Video presentation

The team used the Survey of Non-Medical Use of Prescription Drugs (NMURx) Program to analyse trends in drug use in the UK. Their analysis was performed using the R language, and they mostly focused on geographical and demographic tendencies.

The judges felt that the geographic focus was successful and the team’s presentation included some brilliant visualisations, for example the postcode heat maps and comparisons across age and by sex. Weightings were allowed for in confidence intervals, and the team presented possible explanations for some of the patterns identified but also highlighted important limitations, e.g. sample sizes across postcode groups. The presentation was professional and engaging. The judges emphasized that the team should be proud of what they achieved over the weekend. They also mentioned that old people having more drugs lying around was a new insight for them, and they’re curious what’s going on in the Highlands?!

🏆 Best use of outside data: TheThreeMusketeers

Recreational drug predilections

Benjamin Gardner and Matthew Reidy

📹 Video presentation

The project goal was to determine the recreational drug predilections by demographic in Scotland and the rest of the UK (RUK), to compare these, and to explore the potential reasons underlying Scotland’s high drug deaths. A higher consumption of MDMA appears to be correlated with the higher Scottish death rate, and this should be researched further. Furthermore, there are a number of significant differences in the preferences.

The judges were very impressed with the focus the team picked and how the team brought in external data on Scottish drug deaths. This is a topic of huge importance and social relevance, and they really liked that the team were able to tie this in to their project. They also liked the team’s heatmaps comparing Scotland to the rest of the UK.

🏆 Judge’s pick - FlyingPenguins

Drug usage and mental health disorders

Arnav Bhargava, Purvi Harwani, Arjun Nanning Ramamurthy, Laura O’Sullivan, Pablo Ortuno Floria

📹 Video presentation ✨ Shiny app

The team explored the relationship between drug usage and mental health disorders. To do this, they created a Shiny app. They focused primarily on three facets: non-illicit drug use, illicit drug use, and demographics relating these two together. They added interactivity to their app by allowing users to apply filters such as gender, mental health disorders, substance abuse, and illicit drug use. They looked at the number of users for each non-illicit drug and what percentage of those cases involved non-medical use, along with relevant statistics relating to these findings.

The judges were very impressed with how much functionality the team packed into their dashboard. The team’s presentation was professional and highlighted important limitations of the sampling approach, sample sizes, and the complexities of mental health, which the judges thought also made the project strong!

🏆 Judge’s pick - Team Schoffee

Understanding drug misuse among healthcare workers in the United Kingdom

Syaqilah Farihah Binti Akmal Hisham, Serena Inez Binti Rafizal, Siti Rohmah Binti Satitan, Nurul Binti Yazid, Nicholas Goguen-Camponi

📹 Video

How do healthcare workers take care of themselves in terms of drug misuse? Team Schoffee defines the misuse of drugs as taking drugs without a doctor’s prescription or for a reason not recommended by a doctor. This does not imply any dependency on the drug. As the data show, a large proportion of healthcare workers, 66%, have misused drugs at least once. The team found this result surprising because they expected healthcare professionals to understand the consequences of misusing drugs.

The judges loved how professional the team’s slides and presentation were. They were impressed with the particular focus the team picked on drug use among medical professionals as well as how they communicated their modeling results via a confusion matrix. Very well done!

🏅 Honourable mention - Best insight: JGGL

Misuse and severity

Gabrielle Gaudeau, Jai Karayi, Gareth Lamb, Luca Terry

📹 Video

Team JGGL focused on three things: finding a way of measuring the breadth of drugs a person has misused, with a weighting for the severity of each drug; examining how representative the data set is of the UK as a whole; and seeing how this score changes across different demographics.

The judges commended the team on their specific focus on the maximum possible prison sentence, as well as on how they compared the data provided with other data sources to get a sense of representativeness. They were also impressed with the team’s clear presentation.

Best team name: Abraca-data

Dave Diaper, Adam Henderson, Jonah Ramponi, Lyndon Scott Humphris, Robin Weersma

Other Past DataFests

2021 - Rocky Mountain Poison and Drug Safety Center

Goal: For this competition the Rocky Mountain Poison and Drug Safety Center was interested in discovering and identifying patterns of drug use, with particular attention paid to identifying misuse. These could include patterns that might describe demographic profiles within a given category of drug or combinations of drugs that frequently appear together. One goal they were keen to achieve with the data was to predict future drug misuse cases. The data sets came from participants who responded to an on-line request and were paid to participate. Click here to read more about the submissions from the winning teams at ASA DataFest 2021 @ EDI.

2020 - COVID-19

Goal: For this competition, we challenged participants to explore the societal impacts of the COVID-19 pandemic other than its direct health outcomes. Participants were allowed to explore everything from the effects on pollution levels, transportation levels, or working from home. They could investigate changes in the number of people posting on TikTok with their families or do an analysis on online education. We left the focus up to them and urged them to be thoughtful and creative as they analyzed data and communicated their insights about some of the pandemic’s impacts on society. Click here to read more about the submissions from the winning teams at ASA DataFest 2020 @ EDI.

2019 - Canadian National Women’s Rugby Team

Goal: How do we quantify the role of fatigue and workload in a team’s performance in Rugby 7s? How reliable are the subjective wellness data? Should the quality of the opponent or the outcome of the game be considered when examining fatigue during a game? Can widely used measurements of training load and fatigue be improved? How reliable are GPS data in quantifying fatigue?

2018 - Indeed

Goal: What advice would you give a new high school student about what major to choose in college? How does Indeed’s data compare to official government data on the labor market? Can it be used to provide good economic indicators?

2017 - Expedia

Goal: How do visitors' searches relate to the choices of hotels booked or not booked? What role do external factors play in hotel choice?

Expedia provided DataFesters with data from search results from millions of visitors around the world who were interested in traveling to destinations all over the world. The data were in two files, one of which included data collected on search results from visitors' sessions, and another which contained detailed information about the destinations that visitors searched for.

2016 - Ticketmaster

Goal: How can site visits be converted to ticket sales, and how can Ticketmaster identify “true fans” of an artist or band?

Data consisted of three sets. One included events from the last 12 months that tracked customer travel through the website. Another provided information about advertising campaigns on Google, and the third included data on the events themselves.

2015 - Edmunds.com

Goal: Detect insights into the process of car shopping that can help make the process easier for customers.

Data consisted of visitor ‘pathways’ through a website that helps customers configure car features and shop for cars. Five data files were linked by a customer key and included data about the customer, about his or her visits to the webpage, and, when applicable, about the car purchased and the dealership where the car was purchased.

2014 - GridPoint

Goal: Help understand how customers can best save money and energy.

Data consisted of a random sample of customers, with five-minute aggregates over a year of energy consumption that was then aggregated across important features of the commercial properties, as well as supporting climate and location data.

2013 - eHarmony

Goal: Help understand what qualities people look for in prospective dates.

The DataFest students worked with a large sample of prospective matches. For each customer, data were provided on his or her preferences, as well as four matches, their preferences, and information about whether parties contacted one another.

2012 - Kiva.com

Goal: Help understand what motivates people to lend money to developing-nation entrepreneurs and what factors are associated with paying these loans.

Several data sets were provided, including characteristics of lenders and borrowers and loan pay-back data.

2011 - Los Angeles Police Department

Goal: Make a data-based policy proposal to reduce crime.

Data consisted of arrest records for every arrest in Los Angeles from 2005-2010, including time, location, and weapons involved.