Dear lady/sir,

I would be grateful if you could answer the following questions regarding the HRAS Homo sapiens exon/intron gene sequence found on the web page:

http://www.ncbi.nlm.nih.gov/entrez/batchseq.cgi?dopt=graph&extrafeat=504&out=on&list_uids=51493057&_from=472130&_sfrom=472130&_to=474729&_slen=5000&_phrap=off

1. At the above address the gene is located on chromosome 11, approximately between nts 472,150 and 475,570. According to Ensembl, this gene can be found on chromosome 11 at location 522,243-525,550 (from the web page: http://www.ensembl.org/Homo_sapiens/geneseqview?db=core&gene=ENSG00000174775&flank5_display=600&flank3_display=600&exon_display=core&exon_ori=all&snp_display=off&line_numbering=sequence&submit=Update). Can you explain this discrepancy?

2. Is there a text version (that can be copied and pasted) of this same sequence, with exons and introns indicated?

3. Is there a reference to this sequence (with exons and introns indicated) on any other web site (such as NCBI)?

Thank you,
Uri Moran
MSc. Student
Tel Aviv University

First reply:

Dear Uri,

1) The difference between the two databases arises because each group has its own assembly.

2) You may search the Gene database and see this gene's record:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&dopt=full_report&list_uids=3265
On the Full Report page you may choose "Gene Table" from the "Display" pull-down menu to see the text.

3) Links on the right side of the full report will show you where you may obtain more information about this record.

Regards,
Simin

Second reply:

My guess is that the sequence you are looking for has been derived from ESTs (Expressed Sequence Tags) and converted to complementary DNA (cDNA). This is the case for many genes, in a vast number of species, that were retrieved from mRNA. The RNA has undergone splicing before the reverse transcription, so only the coding regions are in GenBank.

Yuval

Third reply:

Dear Uri!
Unfortunately I am not able to present the sequence to you, but I am sure you can find what you need. If you BLAST your mRNA against the human EST library and the human WGS library and align the results, you should be able to find the sequence you need. I did that with several genes in sticklebacks.

Cheers,
Sascha

Fourth reply (complete correspondence):

Hi Uri,

I poked around the NCBI site some more, and I never was able to really see a gene sequence annotated with the exons. I can see a line drawing of the sequence, but not the actual sequence. If I am missing something, please let me know.

I was able to download the gene sequence, unannotated, plus the sequences of the mRNAs (alternative splicing). So I aligned them (see attached file). You can view this with JalView, BioEdit or any other multiple sequence alignment editor. All of the introns begin with GT and end with AG, which is the rule for most eukaryotic genes.

Brian

On 6/5/06 9:25 AM, "moranuri@post.tau.ac.il" wrote:

> Dear Brian
>
> Thank you very much for your reply.
>
> In case you are interested, see correspondence with NCBI help desk below.
>
> Regards,
> Uri

----- End forwarded message -----

Correspondence with Brian T. Foley

Hi,

Note that in both cases, the Ras oncogene is on the other strand (you will need the reverse complement of the GenBank entry), and no genes are annotated on either of these chromosome 11 sequences. I found exon 1, so now you need to BLAST exon 2 against chromosome 11. Let me know if you need help.
Brian

BLAST of RAS exon 1:

>gi|29650323|gb|AC137894.5| Homo sapiens chromosome 11, clone RP13-46H24, complete sequence
Length=165000

 Score = 119 bits (60), Expect = 2e-25
 Identities = 60/60 (100%), Gaps = 0/60 (0%)
 Strand=Plus/Minus

Query  1       GGCAGGAGACCCTGTAGGAGGACCCCGGGCCGCAGGCCCCTGAGGAGCGATGACGGAATA  60
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  156059  GGCAGGAGACCCTGTAGGAGGACCCCGGGCCGCAGGCCCCTGAGGAGCGATGACGGAATA  156000

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=Nucleotide&list_uids=29650323&dopt=GenBank

LOCUS       AC137894     165000 bp    DNA     linear   PRI 09-APR-2003
DEFINITION  Homo sapiens chromosome 11, clone RP13-46H24, complete sequence.
ACCESSION   AC137894
VERSION     AC137894.5  GI:29650323
KEYWORDS    HTG.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 165000)
  AUTHORS   Birren,B., Nusbaum,C. and Lander,E.
  TITLE     Homo sapiens chromosome 11, clone RP13-46H24
  JOURNAL   Unpublished

>gi|28975150|gb|AC138374.2| Homo sapiens chromosome 11, clone CTD-2647G13, complete sequence
Length=196424

 Score = 119 bits (60), Expect = 2e-25
 Identities = 60/60 (100%), Gaps = 0/60 (0%)
 Strand=Plus/Minus

Query  1      GGCAGGAGACCCTGTAGGAGGACCCCGGGCCGCAGGCCCCTGAGGAGCGATGACGGAATA  60
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  50302  GGCAGGAGACCCTGTAGGAGGACCCCGGGCCGCAGGCCCCTGAGGAGCGATGACGGAATA  50243

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=Nucleotide&list_uids=28975150&dopt=GenBank

LOCUS       AC138374     196424 bp    DNA     linear   PRI 16-MAR-2003
DEFINITION  Homo sapiens chromosome 11, clone CTD-2647G13, complete sequence.
ACCESSION   AC138374
VERSION     AC138374.2  GI:28975150
KEYWORDS    HTG.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 196424)
  AUTHORS   Birren,B., Nusbaum,C. and Lander,E.
  TITLE     Homo sapiens chromosome 11, clone CTD-2647G13
  JOURNAL   Unpublished

On 6/5/06 9:25 AM, "moranuri@post.tau.ac.il" wrote:

> Dear Brian
>
> Thank you very much for your reply.
>
> In case you are interested, see correspondence with NCBI help desk below.

Yes! That is very interesting. I have often wondered about the lack of annotation on these complete chromosomes. Now I see I am missing a lot.

http://www.ncbi.nlm.nih.gov/mapview/maps.cgi?taxid=9606&chr=11&query=uid(14221652)&QSTR=3265%5Bgene%5Fid%5D&maps=gene_set&cmd=focus

has a Download/view sequence link where you can get the complete gene sequence:

http://www.ncbi.nlm.nih.gov/sutils/evv.cgi?taxid=9606&contig=NT_035113.6&from=472510&to=474578

So, there is a lot of annotation done, and there are good links from the annotation back to the complete chromosome entry, but I don't see any links in the other direction, from the chromosome 11 entry to its annotation.

Brian

>
> Regards,
> Uri

Hi Uri,

I poked around the NCBI site some more, and I never was able to really see a gene sequence annotated with the exons. I can see a line drawing of the sequence, but not the actual sequence. If I am missing something, please let me know.

I was able to download the gene sequence, unannotated, plus the sequences of the mRNAs (alternative splicing). So I aligned them (see attached file). You can view this with JalView, BioEdit or any other multiple sequence alignment editor. All of the introns begin with GT and end with AG, which is the rule for most eukaryotic genes.

Brian

> Dear Brian
>
> Type the address below:
>
> http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&dopt=full_report&list_uids=3265
>
> into the web site address rectangle.
>
> Make sure you get the whole address, ending with '265'.
>
> Then go to the Display rectangle (option) and scroll to 'Gene Table'.
>
> There you will find exons (non-coding and coding) and introns of 2 HRAS
> isoforms. Click on any exon/intron and you get the complete sequence.
>
> Triple checked by me to match with the chromosome 11 sequence.
>
> Good luck,
>
> Uri
>

OK. I can see this:

But it is still not simple to get the entire gene, all six exons, with annotation or "marked up" in some way to show introns and exons on the sequence. It looks to me as if I would have to learn a whole new method of interacting with GenBank/NCBI if I wanted, for example, to compile an alignment of all intron/exon boundaries in the human genome, to see if they all begin with GT and end with AG.

I had always assumed that if there was annotation for a sequence, it would end up in the GenBank entry's FEATURE table, and we would then write scripts to parse that. Right now, it is not clear to me where the annotation of chromosome 11 really is. How do I ask a question such as "how many genes are on the short arm?" or "What is the average gene length?"

Brian

moranuri@post.tau.ac.il
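
Editorial note: the GT...AG check Brian describes can be sketched in a few lines of Python once the exon coordinates are in hand, for example copied from the Gene Table view or from a GenBank FEATURE table. This is only a minimal sketch under stated assumptions: the function names (reverse_complement, introns_between, check_gt_ag) and the toy sequence are hypothetical, coordinates are taken to be 1-based and inclusive, and the strand argument stands in for the "other strand" case mentioned above. It is not NCBI's or Brian's actual procedure.

```python
# Minimal sketch: derive introns from exon coordinates and test the GT...AG rule.
# All names and the toy data below are illustrative assumptions.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def introns_between(exons):
    """Return the (start, end) gaps between consecutive exons.

    Exons are (start, end) pairs, 1-based and inclusive, in genomic coordinates.
    """
    ordered = sorted(exons)
    return [
        (a_end + 1, b_start - 1)
        for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])
        if b_start - a_end > 1  # skip exons that touch each other
    ]


def check_gt_ag(genomic, exons, strand="+"):
    """Return (start, end, donor, acceptor, is_canonical) for each intron.

    For a gene on the minus strand, each intron is reverse-complemented
    before its first and last two bases are inspected.
    """
    results = []
    for start, end in introns_between(exons):
        intron = genomic[start - 1:end]
        if strand == "-":
            intron = reverse_complement(intron)
        donor, acceptor = intron[:2].upper(), intron[-2:].upper()
        results.append((start, end, donor, acceptor,
                        donor == "GT" and acceptor == "AG"))
    return results


# Toy example (made-up sequence, not HRAS): three exons and two introns.
TOY_GENOMIC = "ATGGCC" + "GTAAGTCCCTTTCAG" + "GGATTT" + "GTGAGAAACAG" + "TAA"
TOY_EXONS = [(1, 6), (22, 27), (39, 41)]

for row in check_gt_ag(TOY_GENOMIC, TOY_EXONS):
    print(row)
```

For a real gene, the genomic string would be the downloaded region and the exon list would come from the annotation; for a gene on the other strand, as with HRAS in the clone entries above, pass strand="-" together with coordinates on the GenBank entry as given.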