Plant resistance genes
These genes have mostly been isolated from the Solanaceae family (33 genes) (7,13), although others have been studied in other plants, such as Arabidopsis thaliana (21 R-genes) (14), Oryza sativa (rice, four R-genes) (15,16), Phaseolus vulgaris (bean, one R-gene) (17), Glycine max (soybean, two R-genes) (18), Zea mays (maize, two R-genes) (19), Hordeum vulgare (barley, three R-genes) (8,20,21), Cucumis melo (melon, two R-genes) (22), Lactuca sativa (lettuce, one R-gene) (23), Beta vulgaris (beet, one R-gene) (24) and Linum usitatissimum (flax, three R-genes) (25–27). Data related to these genes, such as nucleotide and protein sequences, genomic location, known genetic markers and relevant information about resistance to specific diseases and pathogens, were gathered from the literature and from several publicly available resources, such as NCBI nucleotide, NCBI taxonomy (28) and the SOL network databases (29), and manually inserted into the PRG database through a web-based system. This dataset was used both to retrieve all putative R-gene sequences from the NCBI database and to build an R-gene prediction system.
In this way, a set of 6308 annotated R-genes from 161 plants was obtained automatically using an NCBI query (see ‘Methods’ section) (Figure 1B). Information such as nucleotide and protein sequences, genomic locations and structural information was automatically retrieved and imported into the PRG database. Because these genes may have been annotated in NCBI as R-genes by other predictive tools, we refer to them hereafter as ‘putative R-Genes collected from NCBI’.
Furthermore, we computationally predicted novel ‘putative’ R-genes from the UniGene dataset using an in-house bioinformatic pipeline, Disease Resistance Analysis and Gene Orthology (DRAGO; see ‘Methods’ section) (Figure 1C). A total of 604 981 non-redundant UniGene transcript sequences expressed in 33 different plants were translated into 488 250 potential protein sequences. Finally, a total of 10 463 sequences were identified as ‘putative R-Genes predicted from NCBI UniGene’ based on their sequence similarity and protein domain composition, and were imported into the PRG database.
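The translation step mentioned above can be illustrated with a minimal sketch. The text does not state how the UniGene transcripts were translated, so the approach shown here (taking the longest methionine-initiated open reading frame across all six reading frames) is an assumption made for illustration only, not the DRAGO implementation. The function names and the `min_len` parameter are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): AMINO_ACIDS[i]
    for i, codon in enumerate(product(BASES, repeat=3))
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate_frame(seq, offset):
    """Translate one reading frame; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(offset, len(seq) - 2, 3)
    )


def longest_orf(seq, min_len=1):
    """Longest Met-initiated peptide over six frames (hypothetical helper).

    A peptide that runs to the end of the transcript without a stop codon
    is still accepted, since transcript fragments are often incomplete.
    Returns None if no peptide of at least min_len residues is found.
    """
    seq = seq.upper()
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for offset in range(3):
            for segment in translate_frame(strand, offset).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best if len(best) >= min_len else None
```

For example, `longest_orf("ATGGCCTAA")` yields the dipeptide `"MA"`, while a transcript with no start codon in any frame yields `None`.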
These three distinct approaches yielded a total of 16 844 protein sequences annotated in our database as potential plant resistance genes. Of the 194 plant species analyzed, 172 contained sequences related to resistance genes. A complete list of the retrieved plants is available on the PRG web site under the ‘plant search’ section, where all putative resistance genes are grouped by plant species to allow species-specific searches.
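As a consistency check, the total of 16 844 can be reproduced from the counts reported in this section. The short sketch below sums the manually curated genes per taxon and adds the two automatically derived sets; the variable names are illustrative only.

```python
# Manually curated reference R-genes, as listed per taxon in the text.
curated = {
    "Solanaceae": 33,
    "Arabidopsis thaliana": 21,
    "Oryza sativa": 4,
    "Phaseolus vulgaris": 1,
    "Glycine max": 2,
    "Zea mays": 2,
    "Hordeum vulgare": 3,
    "Cucumis melo": 2,
    "Lactuca sativa": 1,
    "Beta vulgaris": 1,
    "Linum usitatissimum": 3,
}

curated_total = sum(curated.values())  # manually inserted genes
ncbi_collected = 6308                  # putative R-genes collected from NCBI
unigene_predicted = 10463              # putative R-genes predicted from UniGene

total = curated_total + ncbi_collected + unigene_predicted
print(curated_total, total)
```

The per-taxon counts sum to 73 curated genes, and 73 + 6308 + 10 463 gives the 16 844 sequences reported above.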