Database Statistics
R14, November 2009

The current version of the database is R14, released in November 2009.

The R14 release contains 26597 somatic mutations, 535 germline mutations, functional data on 2314 mutant proteins, and the TP53 gene status of 1993 cell-lines.

Here are some statistics on the database contents:

Somatic mutations (1)

Publication trends

Tumor site distribution of mutations

TP53 mutation prevalence

Type of mutations

Codon distribution of mutations

Prognostic value of somatic mutations


Cell-lines (included in dataset of somatic mutations)

Tumor site distribution

Type of mutations

Codon distribution of mutations


Germline mutations (2)

Publication trends

Type of mutations

Codon distribution

Tumors associated with TP53 germline mutations

Prevalence of TP53 germline mutations in selected cohorts



Functional properties

Dataset contents




(1) Statistics on successive releases of the somatic dataset:

Database version  Release date  Mutations count  References count  Last Ref_ID  Added references  Added mutations  Deleted mutations*  PubMed search**
R3                -             10411            1048              1075         -                 -                -                   -
R4                July 2000     14050            1320              1369         294               3798             159                 Jan 1998 to Apr 2000
R5                June 2001     15121            1412              1480         111               1459             388                 May to Dec 2000
R6                Jan 2002      16285            1485              1571         91                1549             385                 Jan to June 2001
R7                Sept 2002     17689            1599              1715         144               1477             73                  July 2001 to June 2002
R8                June 2003     18585            1680              1810         95                924              28                  July 2002 to Feb 2003
R9                July 2004     19809            1769              1921         111               1196             40                  March to Dec 2003
R10               July 2005     21587            1876              2055         107               1788             10                  Jan to Dec 2004
R11               Nov 2006      23544            1995              2221         120               2014             57                  Jan to Dec 2005
R12               Nov 2007      24810            2081              2349         86                1331             65                  Jan to Dec 2006
R13***            Nov 2008      24806            2081              2349         -                 -                4                   -
R14               Nov 2009      26597            2179              2483         98                1814             -                   Jan to Dec 2007

* Data may be deleted if they are (1) duplicate entries or (2) errors. Publication of the same set of samples in different papers by the same authors is a serious problem that has led to duplicate entries in the database in the past. We now systematically search the database under the authors' names to identify earlier entries that may correspond to the same dataset. We have also extensively reviewed the entire dataset to find and eliminate such duplicates. Despite these efforts, some duplicates may remain in the database, and identifying them is an ongoing task.

** Papers published in PubMed during the indicated periods were searched with selected keywords and reviewed to extract relevant data.

*** The dataset of somatic mutations was not updated in this release.
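The Mutations count, Added mutations and Deleted mutations columns of table (1) can be cross-checked: if a release only adds and deletes entries, its count should equal the previous release's count plus the mutations added minus those deleted. The short Python sketch below performs that check on the figures transcribed from the table. The names `RELEASES` and `reconcile` are illustrative, and treating a "-" cell as zero is an assumption, not something the table states.

```python
# Somatic dataset figures transcribed from table (1):
# (version, mutations_count, added_mutations, deleted_mutations).
# None stands for a "-" cell and is treated as 0 below (an assumption).
RELEASES = [
    ("R3", 10411, None, None),
    ("R4", 14050, 3798, 159),
    ("R5", 15121, 1459, 388),
    ("R6", 16285, 1549, 385),
    ("R7", 17689, 1477, 73),
    ("R8", 18585, 924, 28),
    ("R9", 19809, 1196, 40),
    ("R10", 21587, 1788, 10),
    ("R11", 23544, 2014, 57),
    ("R12", 24810, 1331, 65),
    ("R13", 24806, None, 4),
    ("R14", 26597, 1814, None),
]


def reconcile(releases):
    """Return (version, expected, reported) for every release after the first.

    expected = previous release's count + added - deleted
    """
    rows = []
    # Pair each release with the one before it.
    for (_, prev_count, _, _), (version, count, added, deleted) in zip(releases, releases[1:]):
        expected = prev_count + (added or 0) - (deleted or 0)
        rows.append((version, expected, count))
    return rows


if __name__ == "__main__":
    for version, expected, reported in reconcile(RELEASES):
        status = "ok" if expected == reported else f"differs by {reported - expected:+d}"
        print(f"{version:>4}: expected {expected}, reported {reported} ({status})")
```

On these figures the identity holds for every release except R9 (reported count 68 higher than expected) and R14 (23 lower); the table itself does not record what accounts for those differences.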

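Footnote * describes screening new papers by author name to find earlier entries that may describe the same samples. A minimal Python sketch of such a screen follows, assuming each reference has a numeric ID and a list of author names written as "Surname Initials". The names `author_key` and `AuthorIndex` and the surname-plus-first-initial matching rule are hypothetical; the sketch only produces candidates, which would still need the manual review the footnote describes.

```python
from collections import defaultdict


def author_key(name: str) -> str:
    # Hypothetical normalisation: names are assumed to look like "Surname AB";
    # the key is the lowercased surname plus the first initial, if present.
    surname, _, initials = name.strip().partition(" ")
    return surname.lower() + (" " + initials[0].lower() if initials else "")


class AuthorIndex:
    """Index of earlier references by author, used to flag possible duplicates."""

    def __init__(self):
        self._refs_by_author = defaultdict(set)

    def candidates(self, authors):
        """Return IDs of indexed references sharing at least one author key."""
        found = set()
        for name in authors:
            # .get() does not create empty entries in the defaultdict.
            found |= self._refs_by_author.get(author_key(name), set())
        return found

    def add(self, ref_id, authors):
        """Record a reference under each of its authors."""
        for name in authors:
            self._refs_by_author[author_key(name)].add(ref_id)


if __name__ == "__main__":
    index = AuthorIndex()
    index.add(101, ["Smith JA", "Lee K"])
    index.add(102, ["Garcia M"])
    # A new paper by "Lee KM" is flagged against reference 101 for review.
    print(sorted(index.candidates(["Lee KM", "Dupont P"])))
```

Matching on surname plus first initial trades precision for recall: it will flag some unrelated authors who share a name, which is acceptable when every candidate is checked by hand.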


(2) Statistics on successive releases of the germline dataset:

Database version  Release date  Mutations count  References count  Last Ref_ID  PubMed search**
R4                July 2000     144              71                74           Jan 1998 to Apr 2000
R5                June 2001     195              84                91           May to Dec 2000
R6                Jan 2002      213              90                99           Jan to June 2001
R7                Sept 2002     225              97                106          July 2001 to June 2002
R8                June 2003     225              98                108          -
R9                July 2004     264              112               125          July 2002 to Dec 2003
R10               July 2005     283              123               137          Jan to Dec 2004
R11               Nov 2006      376              142               156          Jan to Dec 2005
R12               Nov 2007      399              159               173          Jan 2006 to June 2007
R13               Nov 2008      423              164               181          July 2007 to July 2008
R14               Nov 2009      535              196               211          Aug 2008 to Aug 2009

** Papers published in PubMed during the indicated periods were searched with selected keywords and reviewed to extract relevant data.