Now is an incredibly exciting time for you to participate in bioinformatic research, with new methods for generating and analyzing large genomic data sets emerging almost daily. As bioinformaticians, we need a large, comprehensive, current, established, well-supported, open-source, community-developed collection of software to address our leading-edge needs. This presentation describes the _Bioconductor_ project, which provides exactly these resources for the _R_ bioinformatics community. We’ll learn a little about the history, philosophy, and approach of the _Bioconductor_ project. We’ll then walk through essential steps in getting started with _Bioconductor_, tackling real analytic challenges, and contributing to the _Bioconductor_ community. With luck, the presentation will inspire and empower you to tackle new and innovative challenges in your own bioinformatic research.
This workshop will introduce you to the Bioconductor collection of R packages for statistical analysis and comprehension of high-throughput genomic data. The emphasis is on data exploration, using RNA-seq gene expression experiments as a motivating example. How can I access common sequence data formats from R? How can I use information about gene models or gene annotations in my analysis? How do the properties of my data influence the statistical analyses I should perform? What common workflows can I perform with R and Bioconductor? How do I deal with very large data sets in R? These are the sorts of questions that will be tackled in this workshop.
Key words: Bioinformatics; R; gene expression; annotation; data management
Requirements: You will need to bring your own laptop. The workshop will use cloud-based resources, so your laptop will need a web browser and WiFi capabilities. Participants should have used R and RStudio for tasks such as those covered in introductory workshops earlier in the week. Some knowledge of the biology of gene expression and of concepts learned in a first course in statistics will be helpful.
Relevance: This workshop is relevant to anyone eager to explore genomic data in R. The workshop will help connect ‘core’ R concepts for working with data (e.g., data management via data.frame(), statistical modelling with lm() or t.test(), visualization using plot() or ggplot()) to the special challenges of working with large genomic data sets. It will be especially helpful to those who have or will have their own genomic data, and are interested in more fully understanding how to work with it in R.
Director, R/Bioconductor Project, Professor of Oncology, Roswell Park Comprehensive Cancer Center
Martin earned his undergraduate and Master’s degrees in Botany at the University of Toronto. Martin’s PhD studies at the University of Chicago involved the evolutionary consequences of frequency-dependent selection, and of multilocus deleterious mutation. Martin is currently at the Roswell Park Comprehensive Cancer Center in Buffalo.
Martin leads the core team that maintains the Bioconductor project. He is the author of many Bioconductor packages and a renowned biostatistician.