ECOL 453R/553R
Computing Concepts for Bioinformatics
Fall 2005

What this class is about

Class Notes (HTML/PowerPoint)

This class provides a hands-on overview of the tools and resources needed to execute a typical bioinformatics project. The intention is to introduce, and give a flavor of, the practical programming skills and design considerations involved. To achieve this we will cover PERL, BioPERL, R, MySQL and XML, along with the EMBOSS application suite, BLAST and machine learning tools (WEKA).

  • PERL has several features that make it the language of choice for bioinformatics applications. Because PERL is an interpreted language, writing and modifying programs is quick and easy, which makes it well suited to developing small "quick fix" analysis programs. It also makes PERL a natural "glue" for tying together and automating applications written in different languages (such as C, C++, FORTRAN).
  • To avoid reinventing the wheels of "bioinformatics" we will use BioPERL modules to read and write sequence data and to parse output from analysis programs (BLAST, EMBOSS).
  • Sequence data and associated information are usually stored in relational databases. We will cover the creation and use of databases using MySQL, a powerful Open Source database system, and along the way learn how to retrieve data from public repositories like NCBI/GenBank and store it in MySQL using PERL and BioPERL. Microsoft Access is a popular desktop database; we will use Access as a front end to generate reports and data entry forms, with MySQL as the back end database.
  • Statistical analysis, aggregation and visualization of data are an integral part of any sequence analysis pipeline. We will use R to carry out some rudimentary analysis and create plots.
  • XML is fast becoming the default format for data exchange. We will cover the use of SOAP and XML-RPC to obtain and parse data from public repositories like NCBI.
  • With the vast amount of data being generated, the ability to mine data using machine learning techniques is crucial. We will explore the use of WEKA, a popular machine learning environment.
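
As a rough illustration of the parse-then-store workflow described in the list above, here is a minimal sketch. It is written in Python rather than PERL, and it uses the standard-library sqlite3 module as a stand-in for MySQL so that it runs without a database server. The sample sequences, the table name `sequences`, and the helper names `parse_fasta`, `_make_record` and `load_records` are all assumptions made up for illustration; in class this kind of task is done with BioPERL and MySQL.

```python
import sqlite3

# Hypothetical sample data; these sequences are invented for illustration.
FASTA_TEXT = """>seq1 example record one
ATGCGTACGTTAG
ACGT
>seq2 example record two
GGGCCCATAT
"""


def _make_record(header, chunks):
    """Split a FASTA header into id and description, and join sequence lines."""
    ident, _, desc = header.partition(" ")
    return ident, desc, "".join(chunks)


def parse_fasta(text):
    """Yield (id, description, sequence) tuples from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield _make_record(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield _make_record(header, chunks)


def load_records(conn, records):
    """Create the (hypothetical) sequences table and insert the records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences "
        "(id TEXT PRIMARY KEY, description TEXT, residues TEXT)"
    )
    conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", records)
    conn.commit()


# In-memory SQLite database, standing in for a MySQL back end.
conn = sqlite3.connect(":memory:")
load_records(conn, parse_fasta(FASTA_TEXT))

# Once the data is in a relational table, summaries are a single query.
rows = conn.execute(
    "SELECT id, LENGTH(residues) FROM sequences ORDER BY id"
).fetchall()
print(rows)  # [('seq1', 17), ('seq2', 10)]
```

The point of the sketch is the division of labor: a small script does the parsing and loading, and the database handles storage and querying.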

The final project will draw on the skills and knowledge acquired using the tools and resources mentioned above.

Students are encouraged to bring real case scenarios to class as possible final projects.

 

For non-Univ. of Arizona users

For students/users not enrolled in the class:

  • You are free to browse through the slides and use them in your class.
  • If you are looking to learn basic PERL, there are many "other" amazing resources on the net that you can use (instead of this).
  • To follow my sessions, all you need is access to PERL, BioPERL, MySQL and R, all of which can be downloaded and installed on a local machine (MS Windows, OS X, Linux). Your mileage may vary, but I recommend any Linux, BSD or OS X based system.
  • I cannot give you an account on our machines.
  • If you need the sample data sets or examples I use in class, please e-mail me at:
    nirav at arl arizona edu
    and I will send you the files.

 

Books

Programming Perl
Larry Wall, Tom Christiansen, Jon Orwant
(The UA library has both books available through netlibrary.com for online use.)

Perl in a Nutshell : A Desktop Quick Reference
Ellen Siever, Stephen Spainhour, Nate Patwardhan

If you do want to buy a book, check:
Perl CD Bookshelf 2.0 (CD-ROM; includes 5 books for about US $60)

Purchase is not necessary for this class. We will be using documentation from various websites.

Instructors:
    • Susan Miller
    • Nirav Merchant

For contact info, see the first slide of the relevant class.