Developing a data management and sharing plan
We expect the researchers we fund to make their research data available with as few restrictions as possible.
These guidelines give an overview of what to consider as you develop your data management and sharing plan.
When a plan is required
A data management and sharing plan is required when the data outputs from your proposed research are likely to be of value to other researchers and users. This includes:
- all proposals where the main goal is to create a database resource
- research that generates significant datasets that could be shared, eg where the data could be used to address research questions other than those for which it was collected.
You will need to provide a data sharing plan if you apply to our Biomedical Science, Innovations and Humanities and Social Science funding schemes.
Types of data a plan covers
Examples of applications that require a data management and sharing plan include:
- large-scale genetic association and sequencing studies of common diseases
- genome-wide or large-scale functional genomic studies in a specific organism
- longitudinal studies of patient and population cohorts
- large-scale neuro-imaging studies.
As noted in the Toronto statement, community resources will typically have the following attributes:
- large-scale (requiring significant resources over time)
- broad utility
- creating reference datasets
- associated with community buy-in.
Read our policy on data management and sharing for more information.
When a plan is not required
A data management and sharing plan is not usually required for studies that generate small-scale and limited data outputs.
These studies are expected to make data available to other researchers on publication and, where possible, to deposit it in a recognised community repository.
You don’t need to supply a data sharing plan if you apply to our Public Engagement schemes, but if we fund your research we expect you to make outputs of wider value available to potential users in a timely and appropriate manner.
What to include in your plan
There is no set template for your plan. It should be clear and concise. Don’t repeat the methodological detail included elsewhere in your grant application.
Your plan should be proportionate to the scale of the datasets generated and their likely level of value to the research community.
Your plan should focus specifically on how data outputs will be managed and shared. Timely publication of results in peer-reviewed journals and presentations at conferences are key forms of dissemination but they’re not equivalent to data sharing and don’t constitute a data management and sharing plan.
Your data sharing plan should address the following:
The data outputs your research will generate
Any data that is shared should be of a sufficiently high quality and in a format that enables it to be used effectively.
We recognise that in some cases it may not be appropriate for researchers to share data outputs. If you don't intend to share your data, you must justify your reasons.
Data should be shared in accordance with recognised data standards, where these exist, and in a way that maximises opportunities for data linkage and interoperability. BioSharing is one directory of available data standards. You should also:
- provide sufficient metadata to enable the dataset to be discovered, interpreted and used by others
- adopt agreed best practice standards for metadata provision where these are in place.
When developing a data management and sharing plan, you should consider and briefly describe:
- the types of data the proposed research will generate
- which data will have value to other research users and could be shared
- the data formats and quality standards that will be applied to enable the data to be shared effectively.
When you intend to share your data
You must state the timescale for sharing datasets of value. This should take account of any recognised standards of good practice in your research field.
We recognise that researchers have the right to a reasonable (but not unlimited) period of exclusive use for the research data they produce.
As a minimum, you should make the data underpinning research papers available to other researchers at the time of publication, provided this is consistent with:
- any ethics approvals and consents that cover the data
- any valid restrictions relating to intellectual property.
We encourage researchers to make this data openly available wherever feasible via recognised subject data repositories, or general community repositories (eg Dryad, Zenodo and FigShare). Please read our requirements for publishing Wellcome-funded research papers for more information.
We encourage researchers to increase opportunities for timely and responsible pre-publication sharing of datasets. Where appropriate, you may use publication moratoria to facilitate pre-publication data sharing with other researchers, while protecting your right to first publication.
Any such restrictions on data use should be reasonable, transparent and in line with established best practice in the respective field.
Where your data will be made available
You should deposit data in recognised data repositories for particular data types where they exist, unless there’s a compelling reason not to do so. Find out which repositories may be appropriate.
If you intend to create a tailored database resource or to store data locally, you should ensure that you have the resources and systems in place to curate, secure and share the data in a way that maximises its value and guards against any associated risks.
You need to consider how data held in this way can be effectively linked to and integrated with other datasets to enhance its value to users.
How your data will be accessible to others
Where a managed access process is required - eg where a study involves potentially identifiable data about research participants - the access mechanisms you set up should be proportionate to the risks associated with the data. They must not unduly restrict or delay access.
You must describe any managed access procedures you’re proposing in your data management and sharing plan.
Depending on the study, you may want to establish a graded access procedure. For example, less sensitive data (eg anonymised and aggregate data) is made readily available, whereas applications for access to more sensitive datasets are subject to a more stringent assessment process.
Any managed access procedures should be consistent and transparent.
In cases where a Data Access Committee is required to assess applications to access data, the committee should include individuals with appropriate expertise who are independent of the project.
The Expert Advisory Group on Data Access has set out key principles for developing data access and governance mechanisms, to which applicants should refer.
Citing data outputs
We encourage all researchers to obtain digital object identifiers (DOIs) or other persistent identifiers for their data outputs, so that reuse of the data can be cited and tracked.
The DataCite initiative provides a key route through which DOIs are assigned to datasets. Many repositories assign DOIs on deposition.
Where appropriate, you may also publish a ‘data paper’ or other form of publication, so data users can formally cite their use of the resource.
Where a database resource is being developed as part of a funded activity, you should take reasonable steps to ensure that potential users are made aware of its availability. Your plan should outline your approach for enabling discoverability of your data.
Whether limits to data sharing are required
For some research, delays or limits on data sharing may be necessary to safeguard research participants or to ensure you can gain intellectual property protection.
But restrictions should be minimised as far as possible and set out clearly in data management and sharing plans where these are required.
Safeguarding research participants
For research involving human subjects, data must be managed and shared in a way that’s fully consistent with the terms of the consent under which samples and data were provided by the research participants.
For prospective studies, consent procedures should include provision for data sharing in a way that maximises the value of the data for wider research use while providing adequate safeguards for participants. Proposed procedures for data sharing should be set out clearly, and current and potential future risks explained to participants.
When designing studies, you must ensure that you protect the confidentiality and security of human subjects through appropriate anonymisation procedures and managed access processes.
Systems should safeguard participants but also be proportionate to the data’s level of sensitivity and associated risk. They should not unduly inhibit responsible data sharing for legitimate research uses.
Intellectual property
As a funded researcher, you need to ensure that any intellectual property that comes from your research is suitably protected and managed, in line with our intellectual property and patenting policy.
Delays or restrictions on data sharing which may be appropriate to gain intellectual property protection or to further develop a technology for public benefit should be minimised as far as possible.
How key datasets will be preserved
You need to consider how datasets that have long-term value will be preserved and curated beyond the lifetime of the grant.
If your proposal is to create a bespoke data resource or to store data locally, rather than to use a recognised data repository, your data management plan should state how you expect to preserve and share the dataset when your funding ends.
We’re happy to discuss issues relating to longer-term preservation and sustainability to maximise the long-term value of key research datasets.
What resources you will need to deliver your plan
You should carefully consider what resources you may need to deliver your plan and outline where dedicated resources are required.
Examples of resources you can request include:
People and skills
- support for one or more dedicated data managers or data scientists (on a full- or part-time basis)
- specific data management or analysis training for research or support staff that is needed to deliver the proposed research.
We don’t usually consider costs for occasional or routine support from institutional data managers or other support staff.
Data storage and computation
- any dedicated hardware or software that is required to deliver your proposed research
- the cost of accessing a supercomputer or other shared facilities.
We would usually expect costs associated with routine data storage to be met by the institution. We will only consider storage costs associated with large or complex datasets which exceed standard institutional allowances.
Data access and sharing
- the reasonable costs of operating a data access committee or other form of managed access mechanism over the lifetime of the award
- the costs of ingesting secondary data from users
- costs associated with accessing data from other researchers that you need to take forward your proposed research.
Data deposition and preservation
- the data ingestion costs for recognised subject repositories
- costs for deposition in unstructured repositories (eg FigShare, Dryad and Zenodo) where no recognised subject repository exists.
If no repository is suitable, we may consider ingestion costs for institutional repositories.
We don’t usually consider estimated costs for data curation that extend beyond the lifetime of the grant. But we’re willing to discuss how we can help support the long-term preservation of high-value data resources on a case-by-case basis.
Useful resources
- BioSharing - a curated and searchable portal of data standards, databases and policies in the life, environmental and biomedical sciences.
- Digital Curation Centre - the UK's leading centre of expertise in data curation. The DCC provides a range of resources and training opportunities for the UK higher education sector, and has developed a checklist of issues that should be considered in developing data management plans.
- Medical Research Council guidance and resources - in support of its data preservation and sharing policy, the MRC has developed detailed practical guidance for researchers on data sharing. The MRC data and tissues toolkit provides a visual guide through regulatory requirements for use of personal information and human tissue samples in healthcare research.
- re3data.org - a global registry of research data repositories across different academic disciplines.
- Wellcome Trust Sanger Institute data sharing guidance - the Wellcome Trust Sanger Institute has a policy setting out the principles that underlie data sharing at the Institute, with associated guidance for researchers.