Moving your research from Milou to Rackham

This page is directed at life science researchers who have been using Milou. As you know, Milou and its storage system, Pica, are old and are being decommissioned. As of the end of 2017, there will be no more service on Pica's hardware. New hardware is to be installed on Rackham, which is going to be a SNIC resource for data-intensive research. You are invited to move to Rackham and continue your work on a new system with better CPUs and bigger, faster storage. 

This page contains detailed instructions for applying for compute and storage resources on Rackham. Note that sensitive data (e.g. any data that is directly or indirectly associated with a living person) must be processed on Bianca, a secure cluster. Read this page for more information and contact support@uppmax.uu.se if you have questions about what to do with your sensitive data. Also see the page that explains how to move your research from Milou to Bianca.

First: a quick overview of the moving process

We're working to let you continue your work with as few interruptions as possible. The basic process for Milou users moving their research to Rackham is as follows:

  1. Users read this webpage.
  2. PIs submit proposals for new projects on Rackham/Crex. Deadline: December 31st.
  3. UPPMAX approves projects and places old projects' data in a queue. Members can start using Rackham, but old data is not immediately available.
  4. UPPMAX starts moving the projects' data to Crex. The PI and proxy are contacted, but members can continue working.
  5. UPPMAX freezes the projects and finalises the move. PI and proxy are contacted at start and end of this phase.
  6. Old Milou projects are closed and data is deleted from Pica.

IMPORTANT #1: After the end of 2017, we can no longer guarantee that we can keep Pica running. We will do our best to keep it available until everything has been moved, but be advised that if the hardware breaks there is a risk of data loss. 

IMPORTANT #2: If you have data in a project on Milou/Pica and need continued access to it at UPPMAX, you must tell us which projects' data should be moved to your new project. We will then move the specified projects' data for you, unless you request otherwise. This information is best provided in your new project proposals. All active projects with data to be moved to Rackham will remain available until the data is moved. All other projects on Milou will be closed and their data removed in early 2018. 

To skip the background information, go directly to the actual instructions under "Getting your projects". 

Background:

UPPMAX is a supercomputing facility hosted by Uppsala University and is a part of the Swedish National Infrastructure for Computing (SNIC). As a SNIC center, we provide computational resources for a wide variety of researchers all over Sweden. Access to our resources is granted through the SNIC project application portal, SUPR.

In order to do any kind of sequence analysis, you need two resources:

  1. Computations. It takes time for a computer to run programs. Computational resources are measured in core-hours. Allocations are granted in core-hours per month.
    • For example, if you have a hundred samples and it takes a single core one week to run a pipeline on one sample, then the total core-hours needed is 100 samples * 7 days/sample * 24 hours/day * 1 core = 16,800 core-hours. If you're planning to do this analysis over the course of 6 months, then you'll need a project that provides about 16,800/6 = 2,800 core-hours/month. 
    • Our current SNIC-funded compute cluster is called Rackham.
    • If a project exceeds its allocation of CPU time, you can keep working but at a lower priority in the queue. We call this the bonus queue.
  2. Data storage. It takes disk space to store sequences and related data. Space is usually measured in GB.
    • The sequencing facility should be able to tell you roughly how much space the raw sample will take. When you're working with the data, it usually expands by 50% to 300%, and you will need to account for this when you apply for a storage project.
    • The storage system attached to Rackham is named Crex.
    • If a project exceeds its storage quota, no one can write any more data to its directory.
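
The core-hour arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration of the calculation: the function name and its parameters are made up for this page and are not part of any UPPMAX tool.

```python
def core_hours_per_month(samples, hours_per_sample, cores_per_job, months):
    """Return (total core-hours, core-hours per month) for a batch of samples.

    Hypothetical helper: it only reproduces the arithmetic from the example.
    """
    total = samples * hours_per_sample * cores_per_job
    return total, total / months


# Numbers from the example: 100 samples, one week on a single core per sample,
# with the analysis spread over 6 months.
total, monthly = core_hours_per_month(
    samples=100,
    hours_per_sample=7 * 24,  # one week, in hours
    cores_per_job=1,
    months=6,
)
print(f"{total} core-hours in total, about {monthly:.0f} core-hours/month")
```

If your pipeline uses several cores per sample, set cores_per_job accordingly; the total scales linearly with it.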

On Rackham, we have divided projects into two types: Compute Projects and Storage Projects. 

Compute Projects: SNIC SMALL, SNIC MEDIUM, SNIC LARGE. These are called "SNAC" projects while in the proposal phase. They also come with 128 GB of storage, which is enough for many projects.

  • SMALL: Anyone can apply. Default limit of 2000 core-hours/month, can go up to 5000 upon request.
  • MEDIUM: PI must be a permanently employed researcher. Up to 100,000 core-hours/month (100 kch/m). Subject to a technical evaluation.
  • LARGE: For very large projects involving large groups or multiple groups. Proposals are accepted twice per year and undergo a scientific evaluation. 

Storage Projects: There are two storage project types. The SciLifeLab Storage area is backed up and can store about 1 PB of data (this area is currently fully booked). The UPPMAX Storage area is not backed up and is currently 0.5 PB, but will be expanded to 4.5 PB in Spring 2018. 

Getting your projects

Below are step-by-step instructions for getting a project for data-intensive life science research. As described above, you will need a compute project and probably a storage project. It is best to submit both proposals at once. 

FAQ: How many projects should I have?
Answer: It is generally better to have one large project than several small ones. However, if you head several research projects with little overlap in terms of personnel or data, or need to restrict data access to a particular group, then it may be appropriate to split your storage and/or compute projects. 

Submitting a proposal for storage project

  1. We encourage you to collect multiple Milou projects into one storage project on Crex.
  2. Go through all your projects on Milou and make an inventory of your data. 
    • Moving data from Pica to Crex takes time. The move will finish sooner for everyone if you remove data you don't actively need. What are you going to analyse in the coming year, and what can be archived? 
    • Write down how many GB of active data you have right now.
  3. Estimate how much new raw data you're going to get in the upcoming year, in GB.
    • If you're going to work from existing databases, this is relatively straightforward.
    • If an NGS platform is producing data for you, they can provide an estimate.
  4. Estimate the "expansion factor", i.e. how much additional data you'll produce when analysing the raw data. This number is usually 1.5x-3x, sometimes more.
  5. Calculate a final estimate of your total storage needs. This is "GB of raw data" times "expansion factor" plus your current usage.
  6. Go to SUPR. Create an account if you do not already have one. Log in.
  7. Go to the UPPMAX Storage round. Create a new proposal.
  8. Complete the proposal and submit.
    1. Project Title should be the topic of your activity.
    2. Edit Basic Information.
      • Abstract should summarise your research plan.
      • Resource Usage should describe the data you are going to store. Show how you estimated your projected needs (in steps 2-5).
      • In either field (or both), include the names of the current Milou projects (bYYYYXXX) from which you want to move data.
    3. Add co-investigators (if any).
    4. If someone other than the PI needs control over the project, assign a co-investigator the role of proxy.
    5. Add the Crex resource to the proposal and set the Requested Capacity to your total storage needs. You may ignore the other fields.
    6. Submit the Proposal. 
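
The storage estimate from steps 2-5 can be sketched as follows. The formula is the one given in step 5; the function name and the example figures (2000 GB of active data, 1000 GB of new raw data) are purely hypothetical and only show how the numbers combine.

```python
INCLUDED_STORAGE_GB = 128  # storage that comes with a compute project


def storage_estimate_gb(current_gb, new_raw_gb, expansion_factor):
    """Total need = "GB of raw data" * "expansion factor" + current usage."""
    return new_raw_gb * expansion_factor + current_gb


# Hypothetical inventory: 2000 GB active data today, 1000 GB new raw data.
current_gb = 2000
new_raw_gb = 1000

# The expansion factor is usually 1.5x-3x, so compute both ends of the range.
low = storage_estimate_gb(current_gb, new_raw_gb, 1.5)
high = storage_estimate_gb(current_gb, new_raw_gb, 3)
print(f"Estimated need: {low:.0f}-{high:.0f} GB")

# If even the low estimate exceeds the included 128 GB, a storage project is needed.
needs_storage_project = low > INCLUDED_STORAGE_GB
print("Storage project needed:", needs_storage_project)
```

Stating both ends of the range in the Resource Usage field makes it easier to see how the Requested Capacity was derived.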

Submitting a proposal for a compute project

  1. Figure out how much computation time you will need, in core-hours per month. 
    • Sum together your average monthly usage in your current Milou projects.
    • If your usage history doesn't reflect the future you expect, try to make an estimate. A rule of thumb is that you consume on average 1000 core-hours per month for every TB of data (1 TB = 1000 GB) in your project.
  2. Go to SUPR. Log in.
  3. Select a round.
    • If you need less than 10,000 core-hours per month, choose SNAC SMALL UPPMAX. This is appropriate for most projects. 
    • If you need between 10,000 and 100,000 core-hours per month, choose SNAC MEDIUM.
  4. Create a new proposal. Complete and submit.
    1. Project Title should be the topic of your activity.
    2. Edit Basic Information.
      • Abstract should summarise your research plan.
      • Resource Usage should describe the computations (analysis) you are going to do, which software you will use, etc. Show how you estimated your projected needs.
    3. Add co-investigators (if any).
    4. If someone other than the PI needs control over the project, assign a co-investigator the role of proxy.
    5. (MEDIUM:) Add the Rackham resource to the proposal. Set the Requested Capacity to your compute needs. You may ignore the other fields. 128 GB of storage on Crex will be allocated for you automatically.
    6. Submit the Proposal. 
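
As a rough illustration of steps 1 and 3, the rule of thumb and the round thresholds can be written down like this. The function names are hypothetical, and the fallback for needs above 100,000 core-hours per month is an assumption based on the description of LARGE projects in the background section, not something the round list states.

```python
CORE_HOURS_PER_TB_PER_MONTH = 1000  # rule of thumb: 1000 core-hours/month per TB
GB_PER_TB = 1000


def estimate_core_hours_per_month(data_gb):
    """Apply the rule of thumb to the amount of data in the project (in GB)."""
    return data_gb / GB_PER_TB * CORE_HOURS_PER_TB_PER_MONTH


def suggest_round(core_hours_per_month):
    """Map a monthly compute need to the round named in step 3."""
    if core_hours_per_month < 10_000:
        return "SNAC SMALL UPPMAX"
    if core_hours_per_month <= 100_000:
        return "SNAC MEDIUM"
    # Assumption: larger needs fall under LARGE; contact support to confirm.
    return "SNAC LARGE"


# Hypothetical project holding 3500 GB of data.
need = estimate_core_hours_per_month(3500)
print(f"About {need:.0f} core-hours/month -> {suggest_round(need)}")
```

If you have a usage history on Milou that reflects your future work, prefer the sum of your average monthly usage over the rule of thumb.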

After you have submitted BOTH proposals, a decision is typically made within a few days to a week. Feel free to contact support@uppmax.uu.se with questions.