Moving your research from Milou to Bianca
This page is directed at life science researchers with sensitive personal data who have been using Milou. Sensitive personal data (e.g. any data that is directly or indirectly associated with a living person) must be processed on Bianca, a secure cluster. Read this page for more information and contact email@example.com if you have questions about what to do with your sensitive personal data.
If your research does not involve human data, please see this page instead.
First: a quick overview of the moving process
We're working to let you continue your work with as few interruptions as possible. The basic process for Milou users moving their research to Bianca is as follows:
- Users read this webpage.
- PIs submit proposals for new projects on Bianca. Deadline: December 31st.
- UPPMAX approves projects and places old projects' data in a queue. Members can start using Bianca, but old data is not immediately available.
- UPPMAX starts moving the projects' data to Bianca. The PI and proxy are contacted, but members can continue working.
- UPPMAX freezes the projects and finalises the move. The PI and proxy are contacted at the start and end of this phase.
- Old Milou projects are closed and data is deleted from Pica.
IMPORTANT #1: After the end of 2017, we can no longer guarantee that we can keep Pica running. We will do our best to keep it available until everything has been moved, but be advised that if the hardware breaks there is a risk of data loss.
IMPORTANT #2: If you have data in a project on Milou/Pica and need continued access to it at UPPMAX, you must tell us which projects' data should be moved to your new project. Unless you request otherwise, we will move the specified projects' data for you. This information is best provided in your new project proposals. All active projects marked as having data to be moved will remain available until the data is moved. All other projects on Milou will be closed and their data removed in early 2018.
UPPMAX is a supercomputing facility hosted by Uppsala University and is a part of the Swedish National Infrastructure for Computing (SNIC). As a SNIC center, we provide computational resources for a wide variety of researchers all over Sweden. Access to our resources is granted through the SNIC project application portal, SUPR.
In order to do any kind of sequence analysis, you need two resources:
- Computations. It takes time for a computer to run programs. Computational resources are measured in core-hours. Allocations are granted in core-hours per month.
- For example, if you have a hundred samples and it takes a single core one week to run a pipeline on one sample, then the total core-hours needed is 100 samples * 7 days/sample * 24 hours/day * 1 core = 16800 core-hours. If you're planning to do this analysis over the course of 6 months, then you'll need a project that provides about 16800/6 = 2800 core-hours/month.
- The sensitive personal data compute cluster is called Bianca.
- If a project exceeds its allocation of CPU time, you can keep working but at a lower priority in the queue. We call this the bonus queue.
- Data storage. It takes disk space to store sequences and related data. Space is usually measured in GB.
- The sequencing facility should be able to tell you roughly how much space the raw samples will take. When you're working with the data, it usually grows by 50% to 300%, and you will need to account for this when you apply for a storage project.
- The storage system attached to Bianca is named Castor.
- If a project exceeds its storage quota, no one can write any more data to its directory.
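The core-hour arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration: the function name and parameters are made up for this page, and the numbers are the ones from the example (100 samples, one week per sample on a single core, spread over 6 months).

```python
HOURS_PER_DAY = 24


def core_hours_needed(samples, days_per_sample, cores_per_sample, months):
    """Return (total core-hours, core-hours per month) for a batch of samples.

    Hypothetical helper for illustration; it assumes every sample takes
    the same wall-clock time on the same number of cores.
    """
    total = samples * days_per_sample * HOURS_PER_DAY * cores_per_sample
    return total, total / months


# Numbers from the example: 100 samples, 7 days each on 1 core, over 6 months.
total, monthly = core_hours_needed(
    samples=100, days_per_sample=7, cores_per_sample=1, months=6
)
print(total, monthly)  # 16800 2800.0
```

If your pipeline uses several cores per sample, increase `cores_per_sample` accordingly; the core-hour total scales linearly with it.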
On Bianca, projects have both compute and storage resources bundled together, as they were on Milou.
Project types: SNIC SENS Small, SNIC SENS Medium.
- SNIC SENS Small: Up to 5000 core-hours/month and 20 TB of data storage.
- SNIC SENS Medium: The PI must be a permanently employed researcher. Up to 100,000 core-hours/month (100 kch/month) and 150 TB of data storage. Subject to a technical evaluation.
Below are step-by-step instructions for getting a project for sensitive-data life science research.
Submitting a proposal for a SNIC SENS project
- Go through your project on Milou and make an inventory of your data.
- Moving data from Pica to Castor takes time. We will all be done quicker if you remove data you don't actively need. What are you going to analyse in the coming year, and what can be archived?
- Write down how many GB of active data you have right now.
- Estimate how much new raw data you're expecting in the upcoming year, in GB.
- If you're going to work from existing databases, this is relatively straightforward.
- If an NGS platform is producing data for you, they can provide an estimate.
- Estimate the "expansion factor", i.e. how much additional data you'll produce when analysing the raw data. This number is usually 1.5x-3x, sometimes more.
- Calculate a final estimate of your total storage needs. This is "GB of raw data" times "expansion factor" plus your current usage.
- Figure out how much computation time you will need, in core-hours per month.
- Add up the average monthly usage of your current Milou projects.
- If your usage history doesn't reflect the future you expect, try to make an estimate. A rule of thumb is that you consume on average 1000 core-hours per month for every TB of data (1 TB = 1000 GB) in your project.
- Go to SUPR. Log in.
- Go to the SNIC SENS Small round (< 20 TB total, < 10 kch/month) or the SNIC SENS Medium round (> 20 TB total, > 10 kch/month). Create a new proposal.
- Complete the proposal and submit.
- Project Title should be the topic of your activity.
- Edit Basic Information.
- Abstract should summarise your research plan.
- Resource Usage should describe the data you are going to store. Show how you arrived at your total storage estimate (the calculation from the earlier steps).
- In either field (or both), include the names of the current Milou projects (bYYYYXXX) from which you want to move data.
- Add co-investigators (if any).
- If someone other than the PI needs control over the project, assign a co-investigator the role of proxy.
- Add resources to your proposal:
- Add the /proj resource and set the Requested Capacity to your amount of raw data. You may ignore the other fields.
- Add the /proj/nobackup resource and set the Requested Capacity to your estimated total minus your raw data. You may ignore the other fields.
- Add the Bianca resource and set the Requested Capacity to your estimated compute need, in thousands of core-hours per month.
- Submit the Proposal.
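The estimates in the steps above can be sketched in Python. This is a minimal illustration, not an official UPPMAX tool: the function names and the example numbers are hypothetical, the storage formula is "GB of raw data" times "expansion factor" plus current usage as described above, and the compute estimate uses the rule of thumb of 1000 core-hours per month per TB, which only applies when your usage history is not representative.

```python
def storage_request_gb(current_gb, new_raw_gb, expansion_factor):
    """Split a total storage estimate into /proj and /proj/nobackup.

    total          = new raw data * expansion factor + current usage
    /proj          = amount of raw data
    /proj/nobackup = total minus raw data
    """
    total = new_raw_gb * expansion_factor + current_gb
    proj = new_raw_gb
    nobackup = total - proj
    return {"total": total, "proj": proj, "nobackup": nobackup}


def compute_request_kch(total_gb, core_hours_per_tb=1000):
    """Rule-of-thumb compute need in thousands of core-hours per month."""
    total_tb = total_gb / 1000  # 1 TB = 1000 GB
    return total_tb * core_hours_per_tb / 1000


# Hypothetical project: 1000 GB in use today, 2000 GB of new raw data
# expected in the coming year, and an expansion factor of 2.
request = storage_request_gb(current_gb=1000, new_raw_gb=2000, expansion_factor=2.0)
kch = compute_request_kch(request["total"])

print(f"/proj: {request['proj']} GB")
print(f"/proj/nobackup: {request['nobackup']} GB")
print(f"Bianca: {kch} kch/month")
```

With these example numbers the total is 5000 GB, of which 2000 GB would go under /proj and 3000 GB under /proj/nobackup, and the rule of thumb gives 5 kch/month for the Bianca resource. Replace the inputs with your own inventory before filling in the proposal.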
A decision is typically made within a few days to a week. We will move the data from the Milou projects you named in the proposal as soon as possible. Feel free to contact firstname.lastname@example.org with questions.