CHAPTER 1 Retrieval of Sequence(s) from the NCBI Nucleotide Database

CS Mukhopadhyay and RK Choudhary

School of Animal Biotechnology, GADVASU, Ludhiana

1.1 INTRODUCTION

The NCBI Nucleotide database (http://www.ncbi.nlm.nih.gov/nucleotide/) is an archive of gene, transcript, and genomic DNA fragment sequences. It draws on several public online repositories, including GenBank (the genetic sequence database of the NIH), RefSeq (annotated, non‐redundant reference sequences for genomes, transcripts, and proteins), TPA (third‐party annotation of nucleotide sequences), and PDB (Protein Data Bank: a repository of 3D structures of proteins and nucleic acids). The International Nucleotide Sequence Database Collaboration (INSDC) coordinates the three major molecular data repositories, namely NCBI, DDBJ, and EMBL, so that nucleotide data deposited in any one of these databanks is shared with the other two.

A brief description of the NCBI databases is given in Appendix A, “NCBI Database: A Brief Account”, at the end of this book.
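
Records in the Nucleotide database can also be retrieved programmatically rather than through the web page. The following is a minimal sketch, not the authors' procedure: it assumes the NCBI E-utilities EFetch service with `db=nuccore` (the Entrez name of the Nucleotide database), and the function names `build_efetch_url` and `fetch_sequences` are hypothetical helpers introduced only for illustration. The accession NM_000518 (human beta-globin mRNA) is used purely as an example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities EFetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accessions, rettype="fasta"):
    """Build an EFetch URL for one or more Nucleotide accessions.

    `accessions` may be a single accession string or a list of them.
    `rettype` is the record format, e.g. "fasta" or "gb" (GenBank flat file).
    """
    if isinstance(accessions, str):
        accessions = [accessions]
    params = {
        "db": "nuccore",               # the Nucleotide database
        "id": ",".join(accessions),    # comma-separated accession list
        "rettype": rettype,
        "retmode": "text",             # plain text instead of XML
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_sequences(accessions, rettype="fasta"):
    """Download the records as text (requires network access)."""
    with urlopen(build_efetch_url(accessions, rettype)) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_sequences("NM_000518")` would return the record in FASTA format when a network connection is available. Building the URL in a separate function keeps the request parameters easy to inspect before anything is downloaded.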

1.2 COMPONENTS OF THE NCBI NUCLEOTIDE DATABASE

  • GenBank: An annotated collection of all publicly available nucleotide and in silico translated protein sequences.
  • EST database: Maintains expressed sequence tags (ESTs), that is, short, single‐pass reads from mRNA (cDNA). A single‐pass read is a sequence fragment obtained by loading the sequencing reaction in a lane only once, so the input sequence is analyzed by the sequencer only once.
  • GSS database: A database of genome survey ...
