Abstract:
The database contains FASTA sequences from UniProt, with associated metadata, for molluscan shell matrix proteins (SMPs). It only contains SMPs that have been experimentally validated as present in molluscan shell matrices (based on the publication(s) attached to the UniProt ID). The metadata include the functional domains detected in each sequence by InterProScan.
With the advent of Next Generation Sequencing technologies, the volume of published sequence data makes it computationally intensive to run sequence similarity algorithms against all of it. Moreover, it is impractical to sort through hundreds of sequence similarity search results when working with non-model organisms, since pre-established functional annotations of sequences are generally not available. This database was therefore created to provide a targeted molluscan biomineralization dataset for sequence similarity algorithms (such as BLAST).
Database created as part of doctoral research, funded under Marie Curie Innovative Training Networks (ITN) - Calcium in the Changing Environment (CACHE - Grant agreement 605051).
Keywords:
Biomineralization, Molluscs, SMPs, Shell Matrix Proteins, Shell formation
Yarra, T. (2019). Molluscan Shell Matrix Proteins (Version 1.0) [Data set]. UK Polar Data Centre, British Antarctic Survey, Natural Environment Research Council, UK Research & Innovation. https://doi.org/10.5285/c42314b9-089e-48e7-b08e-8f664f5dc71c
Use Constraints: | Data released under Open Government Licence V3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/. |
---|
Creation Date: | 2019-01-24 |
---|---|
Dataset Progress: | Complete |
Dataset Language: | English |
ISO Topic Categories: |
|
Parameters: |
|
Personnel: | |
Name | UK PDC |
Role(s) | Metadata Author |
Organisation | British Antarctic Survey |
Name | Dr Tejaswi Yarra |
Role(s) | Investigator |
Organisation | British Antarctic Survey |
Parent Dataset: | N/A |
Reference: | Yarra T (2018) Transcriptional profiling of shell calcification in bivalves, Doctor of Philosophy, University of Edinburgh, Edinburgh, UK. | |
---|---|---|
Quality: | Database is based on UniProt ID entries uploaded up to July 2018. No major publications matching the inclusion criteria have been released since. Missing values are only present in the domain columns: a missing value indicates that no functional domains were detected in the sequence, based on InterProScan results from the InterPro databases. | |
Lineage: | Data was gathered by mining metadata from the TrEMBL database for various biomineralization-specific keywords. Example keywords include: Molluscs, shell, bivalve, aragonite, calcite, prismatic, foliated, mantle, mantle edge, central mantle, pallial mantle, etc. Only SMPs uploaded to UniProt were included in this database. The database only contains SMPs that have been experimentally validated to be present in molluscan shell matrices (based on the publication(s) attached to the UniProt ID). UniProt entries for sequences known only from the mantle were not included, because such sequences were inferred to be biomineralization-related from sequence similarity to proteins identified in shell matrices, rather than being experimentally validated. |
Data Collection: | Data was collected from UniProt (https://www.uniprot.org/). |
---|
Data Storage: | Two files are included in the Molluscan SMP database: 1. shell_matrix_proteins.fasta - FASTA file with UniProt IDs, which may be used directly for sequence similarity searches. Please be advised that E-values from sequence similarity search algorithms, such as BLAST, depend on the size of the target database. As this database is small, E-values will be highly inflated and should not be relied upon. Instead, the alignment length and percent identity to the target shell matrix protein sequence should be considered. 2. shell_matrix_proteins_metadata.xlsx - information on the sequences in the database: UniprotID, Protein Name, Taxa, SMP database nomenclature (as described in Yarra 2018), and InterProScan domain information. |
---|
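
The advice above, to judge hits by alignment length and percent identity rather than by E-value, can be sketched as a small filter over BLAST results. This is a minimal illustration, not part of the dataset. It assumes the search was run with BLAST+ tabular output (`-outfmt 6`, default column order). The function name `filter_hits`, the thresholds (50% identity, 100 aligned positions) and the identifiers in the example rows are hypothetical and should be adapted to the study at hand.

```python
import csv
import io
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str      # qseqid: the query sequence ID
    subject: str    # sseqid: ID of the matched database sequence
    pident: float   # percent identity of the alignment
    length: int     # alignment length


def filter_hits(lines: Iterable[str],
                min_pident: float = 50.0,
                min_length: int = 100) -> List[Hit]:
    """Keep hits passing both thresholds; the E-value column is ignored.

    Expects BLAST+ '-outfmt 6' rows: qseqid, sseqid, pident, length, ...
    The default thresholds are illustrative assumptions only.
    """
    kept = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = Hit(row[0], row[1], float(row[2]), int(row[3]))
        if hit.pident >= min_pident and hit.length >= min_length:
            kept.append(hit)
    return kept


# Two made-up result rows (hypothetical IDs, not taken from the database).
example = (
    "contig_1\tSMP_A\t82.5\t240\t42\t0\t1\t240\t5\t244\t1e-120\t410\n"
    "contig_2\tSMP_B\t35.0\t60\t39\t0\t1\t60\t10\t69\t1e-08\t45.1\n"
)
hits = filter_hits(io.StringIO(example))
for hit in hits:
    print(hit.query, hit.subject, hit.pident, hit.length)
```

In practice the same function can be given an open file handle for the tabular results file instead of the in-memory example.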