Module 1: Sequence Analysis
Due: Wednesday, 13 September 2017, 11:55 PM
Module 1: Sequence Statistics
Objective: Find and download real genomes from internet repositories and calculate basic statistics for these genomes.
Script:
- Download the complete genome sequence of bacteriophage lambda from GenBank, accession number NC_001416.
- Write a program to analyze the GC content using windows of different sizes.
- Download from GenBank the complete mitochondrial genome sequences of human and chimpanzee (NC_001807 and NC_001643, respectively).
- Using the program from item 2, compare the two DNAs. Plot both GC distributions on the same graph using any plotting program (Excel, gnuplot, etc.). Repeat this for different window sizes and observe how the plots differ.
- Construct a multinomial model for each of the DNAs from item 3 by calculating the probabilities of G, C, A, and T, and compare them.
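The windowed GC analysis asked for in the script could be approached roughly as follows. This is only a minimal sketch, not a reference solution. It assumes the genome has already been saved from GenBank as a single-record FASTA file, that windows are non-overlapping unless a step is given, and that ambiguous bases such as N are ignored. The function names (read_fasta, gc_content, windowed_gc) and the demo sequence are made up for illustration.

```python
def read_fasta(path):
    """Return the sequence of a single-record FASTA file as an uppercase string."""
    with open(path) as handle:
        return "".join(
            line.strip() for line in handle if not line.startswith(">")
        ).upper()


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) in seq."""
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else 0.0


def windowed_gc(seq, window, step=None):
    """Yield (start, gc_fraction) for each full window.

    Windows are non-overlapping by default (step == window); a trailing
    partial window is dropped. Both choices are assumptions of this sketch.
    """
    step = step or window
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_content(seq[start:start + window])


# Tiny made-up sequence so the sketch runs without downloading anything.
demo = "GGCCAATTGCGCATAT"
profile = list(windowed_gc(demo, window=4))
```

Writing each (start, gc_fraction) pair to a text file, one pair per line, gives a table that Excel or gnuplot can plot directly. Running the same function with several window sizes on both mitochondrial genomes produces the curves to overlay.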
Note: The programs have to be written in Python.
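For the multinomial model, one possible reading is that each position is drawn independently with fixed probabilities for A, C, G, and T, and that these probabilities are estimated by the relative frequency of each base. The sketch below follows that assumption. The function names (base_probabilities, compare_models) and the toy sequences are hypothetical and merely stand in for the two mitochondrial genomes.

```python
from collections import Counter


def base_probabilities(seq, alphabet="ACGT"):
    """Estimate multinomial parameters as the relative frequency of each base.

    Characters outside the alphabet (for example N) are ignored; this is an
    assumption of the sketch, not a requirement of the assignment.
    """
    counts = Counter(base for base in seq.upper() if base in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no bases from the alphabet")
    return {base: counts[base] / total for base in alphabet}


def compare_models(p, q):
    """Per-base difference p - q between two multinomial models."""
    return {base: p[base] - q[base] for base in p}


# Made-up toy sequences, not real genome data.
model_a = base_probabilities("AACCGGTT")
model_b = base_probabilities("AAAACCGT")
diff = compare_models(model_a, model_b)

for base in "ACGT":
    print(f"{base}: {model_a[base]:.3f} vs {model_b[base]:.3f} (diff {diff[base]:+.3f})")
```

The per-base differences are one simple way to compare the two models. A side-by-side table of the four probabilities for each genome conveys the same information.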