The first team data analysis marathon took place last Saturday. Almost 60 participants turned up (representing various levels of proficiency in the art of data analysis and different regions of Poland – most were from Warsaw, but there were also people from Krakow, Poznan and Biala Podlaska). They tackled three problems under the supervision of 8 coordinators. The marathon lasted 11 hours, so only the toughest persevered until the end, although there were quite a lot of them (over half).
We managed to find interesting solutions to each problem (the presentation of the results took two hours even though the talks were cut short). The results that made the biggest impression on me were those of the teams working on cancer data from The Cancer Genome Atlas project. It is a very difficult problem, and it had to be analyzed in depth by both molecular biologists and data scientists. The available data was quite large (RNAseq data) and required a lot of preliminary processing (although much of that work was already done by the RTCGA package). Data cleaning itself was amazingly time-consuming, and loading the data almost killed the WiFi. In the middle of the marathon it seemed that we had barely managed to prepare the data for analysis, and the purpose of that analysis was still a little vague.
Yet it sometimes happens that difficulties only make us stronger (if we struggle to overcome them). In the end, the cancer teams discovered some incredible relationships showing something really… surprising. We have not managed to fully explain these results (yet), and my personal guess is that they are connected with various molecular subtypes of breast cancer. The participants were so absorbed in the topic that they are still analyzing the data even though the marathon has ended.
As befits people powered by data, we carried out a survey after our meeting. Those who responded were mostly very satisfied (they stressed the great atmosphere, the opportunity to meet interesting people and the chance to learn new tricks/methods of data analysis). They also declared their willingness to take part in further marathons (some suggested holding them monthly). Some organizational issues will have to be improved (more coffee, bigger rooms better suited to teamwork), but we are already convinced that we want to organize another marathon after the summer holidays.
The survey showed that the most important reasons for coming to the marathon were: (1) willingness to face an interesting and important challenge, (2) willingness to improve one’s skills in working with R, (3) willingness to work in a team, meet new people and make new contacts.
We would like to thank our coordinators (in alphabetical order): Artur Kalinowski, Katarzyna Potega (CNK), Tymoteusz Wolodzko, Tomek Zozlak (IBE), Marcin Herok, Maciej Olszewski, Bartosz Wawrzynow (IIMCB); the people involved in the organizational preparations: Paulina Auguscik, Marcin Kosinski, Katarzyna Fak, Barbara Sozanska (MiNI); as well as our sponsor (deepsense.ai) and, finally, all the participants – this event would not have taken place if it weren’t for you.