Quick Start

Registration of datasets to MSD

In this tutorial we go through the steps you need to take to upload your dataset (e.g., 16S sequences) to MSD. Follow the steps in order.

Creating Account

MSD is a platform dedicated to the needs of the CRC1371 consortium, so you need to be a member of the consortium in order to use it.

To create your account at MSD, follow the steps below:

1. Go to MSD.
2. Click on Register.
3. Fill in the Sign up form and click Create an account.
4. Activate your account by clicking on the link sent to your email address.
5. Wait until an MSD administrator approves your account. You will be notified by email once it is approved, and you can then start using MSD.

Now you have your account activated, and you can log in.

Note

At MSD, all kinds of registration happen under the Submit tab. Under this tab you can register Protocols, Projects, Organisms, Samples, and Datasets (16S, 18S, ITS, Metabolomics, Transcriptomics, Proteomics, Metagenomics).

Defining Project

The MSD database schema follows the usual course of a research project, which starts with defining the project. Accordingly, the first step at MSD is to create a project to which all your samples will be assigned. To do that, follow the steps below.

Under Submit tab:

Click on Project.
Give your project a Name.
If your project is already registered at SRA [1] and has an accession assigned, you can enter it in the Accession field.
You can also give your project an Acronym for ease of use. Leaving it empty means your project has no acronym.
Creator is the owner of the project. You have to select your username.
Give your project a Description. The more descriptive you are, the more easily your project will show up when you search for datasets across your projects.
Ticking the Availability checkbox lets other MSD users see the description of your project in their Dashboard View and ask for permission to access your project’s datasets.
Click on Create Project to finalize the project creation.

Project Register Form — An example of the project creation form. After the new project is created you will be redirected to 16S Datasets View.

Note

If your metadata is stored at DIS (Data Integration System) and you have a patient ID given by DIS, you can skip these steps and continue with Submitting Datasets as a DIS User.

Defining Protocols

Different projects might have different protocols for Sampling, Sequencing, Analysis, Preparation, etc. Each sample and dataset submitted to MSD must have a protocol assigned. Before registering organisms, samples, and datasets at MSD, you need to have defined the protocols you used for sample preparation and sampling, as well as the protocols used for Sequencing and Analysis.

Some common protocols are already available at MSD; you can view and download them in the Protocols View tab.

The definitions of the different protocol types below help you define your protocols.

Preparation Protocol: The steps you have taken until your organism is prepared for sampling.
Sampling Protocol: The steps taken to get samples for measurement (e.g., sequencing).
Sequencing Protocol: The steps taken after sampling until a library is prepared for sequencing. If you have had your samples sequenced by CFM [2], you don’t need to define any Sequencing protocol and can use the one provided by MSD: Sequencing_protocol_Default
Analysis Protocol: The steps taken for processing the sample you have uploaded to MSD. As all your 16S amplicon sequences get analyzed at MSD, you don’t need to define any Analysis protocol and can use the one provided by MSD: Analysis_protocol_Default

Protocol Register Form — An example of protocol creation form.

If your protocol is an extension of another protocol, you can relate the two by clicking on Extension and choosing one of the protocols already submitted to MSD.

Protocol Extension Register Form — An example of extending a protocol.

Defining Organisms

So far we have submitted the prerequisites for registering Organisms, Samples, and Datasets. Registration of Organisms, and then Samples, always comes before Datasets. We recommend defining your organisms at MSD at the start of your project and defining your samples at each sampling. Doing so helps not only with tracking and documenting your project but also with the later registration of datasets.

To register your organisms you need to go through three major steps: Create Template, Fill in the Template, and Register Template. The same steps are followed for the registration of Samples and Datasets.

Note

To submit your organisms, go to Submit tab -> Organisms subtab.

I. Create Template

You will see the various metadata tabs including General Required Metadata, Human Required Metadata, and Mouse Required Metadata.

General Required Metadata: Under this tab there are general metadata required for each organism getting registered at MSD
Human Required Metadata: Metadata specific to human organisms.
Mouse Required Metadata: Metadata specific to mouse organisms.

Note

Currently, all metadata attributes are preselected. We continue with the preselected attributes and later fill in only the metadata relevant to each organism.

By clicking on **Create Organism Template** you download an *Excel template*.

II. Fill in the Template

To introduce your organisms to MSD, fill in the rows of the downloaded Excel file, one row per organism, with the selected metadata as columns.

Note

Please make sure to open the Excel file with the English version of Excel and NOT the German version.

You can find a description of each column below:

General Required Metadata

External_ID: If the organism you’re submitting has been registered on any other platform and has an ID there, you can fill this cell with that ID. This field is not required.
MSD_ID: If a value is provided for this cell, MSD tries to find the organism with the given MSD_ID and updates its metadata with the metadata given in the Excel file. You can find information about your registered organisms at Organisms View.
Name: The name you want to give to your organism.
Description: Extra information about your organism. It will help you filter your organisms later.
Project_ID: The MSD ID of the project this organism belongs to. You can find information about your projects at Projects View.
Species: The scientific name of the type of organism you are defining. You have three options: Mus Musculus, Sus Scrofa, Homo sapiens. Note: Currently, pig organisms are not supported.
Sex: The sex of your organism: Male or Female

Human Required Metadata

Depending on the type of organism you are submitting, you need to provide the related metadata. If you are defining human organisms, fill in the following metadata:

Place of Birth: Choose related regions from the drop-down menu.
Medical History: If there is specific information about the medical history of your organism then add it here. No more than 100 characters.
IBD: Whether your organism has been diagnosed with IBD: Yes or No
Cancer: Whether your organism has been diagnosed with cancer: Yes or No

Mouse Required Metadata

If you are submitting mouse organisms then fill the following only.

General Genotype: Choose genotype of your organism from the drop-down list.
Genetic Modification: Choose type of genetic modification from the drop-down list.

Organism Submit - Mouse Metadata — An example of filled row for these metadata.

The figures below show an example of defining 3 mice and 2 human organisms in the project I created in Defining Project. After looking up the Project_ID of that project in Projects View, I fill in 5 rows for the 5 organisms. As they belong to different species, the rows are filled differently, as shown below.

Note

Note that, for readability, not all columns are shown in each figure.

Organism Submit - Mouse Metadata - Example — Columns A to G contain the general metadata and have values for every type of organism you are uploading. The first three rows are *mice* organisms: the *mice-specific metadata* columns are filled only for them and **left blank** for *human* organisms. Columns H to K are not shown in this figure.

Organism Submit - Human Metadata - Example — Columns A to G contain the general metadata and have values for every type of organism you are uploading. The last two rows are *human* organisms: the *human-specific metadata* columns are filled only for them and **left blank** for *mice* organisms. Columns L and M are not shown in this figure.
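The filling rules described above can be checked before uploading. The following is a minimal sketch, assuming each template row has already been read into a Python dictionary keyed by column name. The function name, the example values (`P1`, `WT`), and the rules themselves are illustrative assumptions drawn from the column descriptions, not MSD's own validation.

```python
# Column names follow the template description; the checks are an
# illustrative assumption, not MSD's actual validation logic.
HUMAN_COLUMNS = ["Place of Birth", "Medical History", "IBD", "Cancer"]
MOUSE_COLUMNS = ["General Genotype", "Genetic Modification"]
SPECIES = {"Mus Musculus", "Sus Scrofa", "Homo sapiens"}


def check_organism_row(row):
    """Return a list of problems found in one organism template row (a dict)."""
    problems = []
    species = row.get("Species", "")
    if species not in SPECIES:
        problems.append(f"unknown Species: {species!r}")
    if row.get("Sex") not in ("Male", "Female"):
        problems.append("Sex must be Male or Female")
    if not row.get("Project_ID"):
        problems.append("Project_ID is missing")
    # Columns that belong to the other species should stay blank.
    if species == "Mus Musculus":
        must_be_blank = HUMAN_COLUMNS
    elif species == "Homo sapiens":
        must_be_blank = MOUSE_COLUMNS
    else:
        must_be_blank = []
    for column in must_be_blank:
        if row.get(column):
            problems.append(f"{column} should be blank for {species}")
    if len(row.get("Medical History") or "") > 100:
        problems.append("Medical History is longer than 100 characters")
    return problems


# Hypothetical example row: a mouse with a human-only column filled by mistake.
mouse = {"Name": "mouse_1", "Project_ID": "P1", "Species": "Mus Musculus",
         "Sex": "Female", "General Genotype": "WT", "IBD": "Yes"}
print(check_organism_row(mouse))
```

An empty list means the row passed these basic checks; MSD may still apply further rules on upload.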

III. Uploading Template

Now that our organism template is filled with the related values, it’s time to upload it to MSD. To do so, go to Submit tab -> Organisms -> Register Template. Click on Browse, choose the filled organism_template.xlsx, and then click on Upload Organisms.

After clicking on Upload Organisms you’ll be shown a message and redirected to Dataset Register. By clicking on the Organisms tab you can see your newly uploaded organisms.

Organism Table — For explanation of the table see Organisms View.

Defining Samples

So far we have some organisms, like the ones below, registered at MSD. It’s now time to define the samples we have taken from these organisms. Sample registration follows the general registration approach in MSD.

To register your samples you need to go through three major steps: Create Template, Fill in the Template, and Register Template, the same as for the registration of organisms in the previous section.

Note

In order to submit your samples and make the relation to their corresponding organisms, you need to go to Submit tab in top bar -> Samples subtab.

I. Create Template

Under the Samples subtab you will see several tabs named Sample Required Metadata, Optional Human Metadata, and Optional Mouse/Pig Metadata. Each is described below:

Sample Required Metadata: Under this tab you can see all metadata required for each sample to be registered at MSD. They are preselected because they are required.
Optional Human Metadata: Under this tab you can see all metadata relevant to samples derived from humans. You can select which of these metadata you want to store in the database.
Optional Mouse/Pig Metadata: Under this tab you can see all metadata relevant to samples derived from mice or pigs. You can select which of these metadata you want to store in the database.

Note

For metadata whose value you don’t provide, a default value will be assigned in the database.

Note

Also, please note that for all of these optional metadata the Excel template offers a list of choices to pick from.

After you have selected your desired metadata, it’s time to create an Excel template with desired columns representing your chosen metadata. To do so click on Create Sample Template button.

As an example in the figure below, I am creating a template to submit both human and mouse samples to organisms I defined in Defining Organisms part.

Create Organism Template - Required Metadata — All *required* sample metadata are already selected.

Create Organism Template - Human Metadata — You can select any number of human-related metadata and create an Excel containing these metadata as columns to fill.

Create Organism Template - Mouse Metadata — You can select any number of mouse-related metadata and create an Excel containing these metadata as columns to fill.

Now that I have selected the metadata I want to provide for the samples I am about to upload, I click on the Create Sample Template button to download the Excel template with the desired metadata to fill in.

II. Fill in the Template

Now you have an Excel template whose columns are the metadata you chose in the previous step. Each row represents one of your samples, and you need to provide the corresponding values for it.

Note

Please make sure to open the Excel file with the English version of Excel and NOT the German version.

You can find the definition of each metadata below:

Required Sample Metadata

Note

All sample metadata refer to the time of sampling. For example, if the organism (human) smoked regularly at the time the sampling took place, then the value of the “smoking” column for the samples taken at that time should be “Yes”.

External_ID: The external ID of your sample if it is registered on other platforms such as SRA [1]. If it’s not registered on any platform, leave it blank.
MSD_ID: If you want to modify the metadata of samples that are already registered, put their MSD ID here and fill in the column values. This tells MSD to update the metadata of the sample with the provided MSD ID with the new values you provide in this Excel template.
Name: The name you want to give your sample. It should be unique among all samples registered in the project to which the origin organism belongs.
Description: A description of each sample. It makes it easier to search through your samples using the Advanced Search feature.
ORID: ORID stands for “Origin ID”. This ID tells MSD where your sample originates from. To get this ID, use the search box in Origin View. Once you have found the ORID of your sample, copy it to this cell. For example, 1.3.7 is the ID of saliva (material) taken from Salivary Gland (localization) in mouth (organ). You can also choose the ORID from the drop-down menu.
Organism_ID: The MSD ID of the organism to which the sample belongs. You can view the organisms of your project at Organism View. You can choose the organism’s MSD ID from the drop-down menu.
Weight: Weight of your sample.
Weight_Unit: The unit of Weight of your sample.
Age: Age of the organism at time of sampling.
Age_Unit: The unit of Age.
Preservation: Type of sample preservation you have used for preserving your taken samples. Choose from drop-down menu.
Sampling_Protocol_ID: The sampling protocol that you used for sampling and registered in the Defining Protocols step.
Collection_Date: Date of sampling. The format YYYY-MM-DD is preferred.
Collection_Time: Time of sampling. The format HH:MM is preferred.
Collection_Country: The country where the sampling has taken place. It should be a two-letter standard code of the country according to ISO_3166.
Collection_Location_(GPS): The coordinates of the sample collection location. Please watch this tutorial video on how to find the latitude and longitude on Google Maps: video. The format is: Latitude, Longitude. For example: 48.39814451278265, 11.737600673415221
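The four Collection_* formats described above can be checked with the standard library before you upload. This is a minimal sketch: the function name is hypothetical, the country check only tests the shape of the code (two uppercase letters) rather than looking it up in ISO 3166, and MSD itself may accept other formats, since the documented date and time formats are only stated as preferred.

```python
import re
from datetime import datetime


def check_collection_fields(date, time, country, gps):
    """Return a list of format problems for the four Collection_* cells."""
    problems = []
    # Note: strptime also tolerates missing zero padding, e.g. 2021-5-3.
    try:
        datetime.strptime(date, "%Y-%m-%d")
    except ValueError:
        problems.append("Collection_Date is not YYYY-MM-DD")
    try:
        datetime.strptime(time, "%H:%M")
    except ValueError:
        problems.append("Collection_Time is not HH:MM")
    # Shape check only; this does not confirm the code exists in ISO 3166.
    if not re.fullmatch(r"[A-Z]{2}", country):
        problems.append("Collection_Country is not a two-letter code")
    parts = [p.strip() for p in gps.split(",")]
    try:
        lat, lon = (float(p) for p in parts)
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError("coordinates out of range")
    except ValueError:
        problems.append("Collection_Location_(GPS) is not 'Latitude, Longitude'")
    return problems


print(check_collection_fields(
    "2021-05-03", "09:30", "DE", "48.39814451278265, 11.737600673415221"))
```

The date, time, and country in the usage line are made-up values; only the coordinates come from the example above.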

Human Sample Metadata

cancer_related_symptoms: “Yes”, “No”, or not assigned (“NA”). Choose from the drop-down menu.
arterial_hypertension: “Yes”, “No”, or not assigned (“NA”). Choose from the drop-down menu.
hypercholesterolemia: “Yes”, “No”, or not assigned (“NA”). Choose from the drop-down menu.
smoking: “Yes”, “No”, or not assigned (“NA”). Choose from the drop-down menu.
alcohol_dependance: “Yes”, “No”, or not assigned (“NA”). Choose from the drop-down menu.
physical_activity: “Yes”, “No”, or not assigned (“NA”). Choose from the drop-down menu.
regular_medication: “Yes” or “No”. Choose from the drop-down menu.
regular_medication_categories: If the value of the regular_medication column is “Yes”, choose one option here. Otherwise, leave it blank.
antibiotics: “Yes” or “No”. Choose from the drop-down menu.
probiotics: “Yes” or “No”. Choose from the drop-down menu.
supplements: “Yes” or “No”. Choose from the drop-down menu.
bristol_score: The bristol score for stool samples. If the sample is not stool, leave it blank.
tissue_available: Whether some of the tissue from which the samples were taken is still stored. “Yes” or “No”. Choose from the drop-down menu.
tissue_type: Which method was used for taking the tissue: “Biopsy” or “Resection”.
human_diet_category: The diet category that best describes the organism’s (human) diet at the time of sampling.
coffee: Whether the organism (human) was drinking coffee at the time of sampling. “Yes”, “No”, or not assigned (“NA”).
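Several of the human metadata columns depend on each other or on the sample type. The sketch below shows one way to check those dependencies for a single row, assuming the row is a dictionary keyed by column name and that the caller knows whether the sample is stool (the `is_stool` parameter and the function name are hypothetical, and the medication category in the usage line is made up). Columns you did not select in the template are simply absent and are not checked.

```python
YES_NO_NA = ("cancer_related_symptoms", "arterial_hypertension",
             "hypercholesterolemia", "smoking", "alcohol_dependance",
             "physical_activity", "coffee")
YES_NO = ("regular_medication", "antibiotics", "probiotics",
          "supplements", "tissue_available")


def check_human_sample(row, is_stool):
    """Return a list of problems in the optional human metadata of one row."""
    problems = []
    for column in YES_NO_NA:
        value = row.get(column)
        if value and value not in ("Yes", "No", "NA"):
            problems.append(f"{column} must be Yes, No, or NA")
    for column in YES_NO:
        value = row.get(column)
        if value and value not in ("Yes", "No"):
            problems.append(f"{column} must be Yes or No")
    has_categories = bool(row.get("regular_medication_categories"))
    if row.get("regular_medication") == "Yes" and not has_categories:
        problems.append("regular_medication_categories needs a value")
    if row.get("regular_medication") != "Yes" and has_categories:
        problems.append("regular_medication_categories should be blank")
    if not is_stool and row.get("bristol_score"):
        problems.append("bristol_score should be blank for non-stool samples")
    return problems


row = {"smoking": "NA", "regular_medication": "Yes",
       "regular_medication_categories": "Antihypertensives"}
print(check_human_sample(row, is_stool=False))
```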

Mouse/Pig Sample Metadata

feed_provider: Type of feed provider: “Sniff”, “Altromin”, or “Other”.
mouse_diet_category: Type of diet your organism (mouse) was on at the time of sampling.
animal_facility: The animal facility within CRC from which your organism comes. Choose from the drop-down menu.
housing_hygiene_level: Choose from the drop-down menu.
caging: Type of caging. Choose from the drop-down menu.
basal_microbiota: Choose from the drop-down menu.
biotic_challenge: Choose from the drop-down menu.
abiotic_challenge: Choose from the drop-down menu.

Custom Sample Metadata

After all your selected metadata you can add any number of columns with names of your choice as custom metadata and provide a value for each of your samples. These custom metadata are stored, and you can view and export them for downstream analysis.

Custom_1: You can rename this default column to the metadata name you want.
Custom_2: You can rename this default column to the metadata name you want.

You can also add any number of further columns after all MSD standard metadata and provide values for them.

You can see an example of a filled sample template below:

Filled Template - Until ORID — Eight new samples with no External_ID are going to be uploaded. Values are shown up to the ORID column. The first four rows are mouse samples and the rest are human samples.

Filled Template - From Organism ID to Preservation Type — The same samples as in the previous figure, filled from *Organism_ID* to *Preservation*. The first four rows are mouse samples and the rest are human samples.

Filled Template - from Sampling_Protocol_ID to *Collection_Location_(GPS)* — The same samples as in the previous figure, filled from *Sampling_Protocol_ID* to *Collection_Location_(GPS)*. The first four rows are mouse samples and the rest are human samples.

Filled Template - from cancer_related_symptoms to alcohol_dependance — The same samples as in the previous figure, filled from *cancer_related_symptoms* to *alcohol_dependance*. The first four rows are mouse samples and the rest are human samples. Since the first four samples are mouse samples, we leave their cells empty for human metadata.

Filled Template - from physical_activity to *probiotics* — The same samples as in the previous figure, filled from *physical_activity* to *probiotics*. The first four rows are mouse samples and the rest are human samples. Since the first four samples are mouse samples, we leave their cells empty for human metadata.

Filled Template - from supplements to coffee — The same samples as in the previous figure, filled from *supplements* to *coffee*. The first four rows are mouse samples and the rest are human samples. Since the first four samples are mouse samples, we leave their cells empty for human metadata.

Filled Template - from feed_provider to caging — The same samples as in the previous figure, filled from *feed_provider* to *caging*. The first four rows are mouse samples and the rest are human samples. Since the last four samples are human samples, we leave their cells empty for mouse metadata.

Filled Template - from basal_microbiota to Custom_2 — The same samples as in the previous figure, filled from *basal_microbiota* to *Custom_2*. The first four rows are mouse samples and the rest are human samples. Since the last four samples are human samples, we leave their cells empty for mouse metadata.

III. Uploading Template

Now that we have filled the excel template it’s time to upload it and register our samples to MSD. In order to upload your filled excel you need to go to Submit tab -> Samples sub-tab -> Register Template. There you can Browse your computer for your filled excel template and by clicking Upload Samples button you introduce your samples to MSD.

When the upload is finished you will be shown a message like “Your samples have been successfully uploaded!” and you will be redirected to 16S Datasets View.

16S Dataset Registration

So far, organisms and the samples taken from them are registered at MSD, and it’s time to register the datasets produced from those samples. As explained in MSD Database Structure, several datasets can be produced from each sample. For example, you can take a sample by biopsy, produce a 16S rRNA gene amplicon dataset by sending part of it for sequencing, and produce metabolomics data from the same sample.

In this part we explain the last step, dataset registration, for 16S rRNA amplicon sequences.

As with sample and organism registration, the steps are creating a template, filling in the template, preparing the fastq files, and uploading the template. For dataset uploading, we also upload, together with the template, the raw files that need to be processed.

I. Create Template

The Excel template for 16S rRNA datasets can be created by going to Submit tab -> Datasets subtab -> 16S -> Create Template. Click on Create Dataset Template to download the Excel template.

II. Fill in the Template

Now that we have downloaded the Excel template, we need to fill in one row for each dataset produced from our samples.

Note

The first two columns of the dataset Excel template (DIS_Sampling_ID and Sample_ID) are important. For each dataset you are uploading, you can provide one or both of them:

- Providing only DIS_Sampling_ID: the metadata is retrieved automatically from DIS and the related organisms and samples are registered. Therefore, there is no need to follow the Defining Organisms and Defining Samples steps.
- Providing only Sample_ID: the dataset you are uploading belongs to the sample with the provided MSD Sample ID (e.g., P1O34S3).
- Providing both: a sample already registered at MSD (having an MSD Sample ID) is updated with metadata derived automatically from DIS.
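The three cases in this note can be written as a small decision function. This is only a sketch of the rule as described here: the function name and the returned labels are hypothetical, `MTABC1234` is a made-up ID that merely follows the DIS format, and the sketch says nothing about how MSD implements the rule internally.

```python
def registration_mode(dis_sampling_id, sample_id):
    """Describe what happens for a dataset row, given its two ID cells."""
    has_dis = bool(dis_sampling_id and dis_sampling_id.strip())
    has_sample = bool(sample_id and sample_id.strip())
    if has_dis and has_sample:
        return "update existing sample with DIS metadata"
    if has_dis:
        return "create organism and sample from DIS metadata"
    if has_sample:
        return "attach dataset to existing sample"
    # Neither cell is filled: the row cannot be linked to any sample.
    raise ValueError("provide DIS_Sampling_ID, Sample_ID, or both")


print(registration_mode("MTABC1234", ""))
print(registration_mode("", "P1O34S3"))
print(registration_mode("MTABC1234", "P1O34S3"))
```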

Below you find the description of each column and their valid values:

DIS_Sampling_ID: If your dataset belongs to a human patient whose metadata is already stored at the Data Integration System (DIS), you can provide the DIS Sampling ID in this column. If you don’t have one, leave it empty. A DIS ID looks like MTXXX1234: it starts with MT, followed by three other letters and 4 digits at the end.
Sample_ID: The MSD Sample ID, if this dataset belongs to a sample already registered at MSD. Either this value, DIS_Sampling_ID, or both must be provided.
Name: Name you want to give to your dataset. It helps you finding your dataset in 16S Datasets View.
Target_Gene: From drop-down menu choose 16S.
Accession: If your dataset is already registered at some public repositories such as SRA [1] you can provide it here. Otherwise, leave it blank.
Sequencer: Choose the type of sequencing machine from the drop-down menu.
Preparation_Protocol: In this drop-down menu you can see a list of preparation protocols you submitted at Defining Protocols.
Sequencing_Protocol: In this drop-down menu you can see a list of sequencing protocols you submitted at Defining Protocols.
Paired_Sequencing: If your sequencing was done paired-end, choose Yes. It means you have two files, of forward and reverse reads, which you need to provide later.
Forward_Filename: If your sequencing layout is paired-end, the name of your forward reads file goes here. Provide the exact and full name of your file. If your samples were not sequenced paired-end, you will have a single file, whose full name you need to provide here.
Backward_Filename: If your sequencing layout is paired-end, provide the full name of your reverse reads file here. Otherwise, there is no need to provide a file name.
Target_Region: Which region of Target_Gene you have targeted for creating amplicon. For example, for 16S rRNA gene any choice of nine variable regions (V1 to V9) could go here. You can ask for this information from the sequencing facility.
DNA_Isolation: Choose the DNA Isolation methods used for your samples before sequencing from the drop-down menu.
Forward_Primer: Choose the forward primer used for sequencing library preparation. You can ask for this information from the sequencing facility.
Forward_Primer_Seq: This will be the sequence of your chosen forward primer. It gets selected according to Forward_Primer value automatically.
Reverse_Primer: Choose the reverse primer used for sequencing library preparation. You can ask for this information from the sequencing facility.
Reverse_Primer_Seq: This will be the sequence of your chosen reverse primer. It gets selected according to Reverse_Primer value automatically.
Run_Length: Run length of your sequencing run. Choose from drop-down menu. You can ask for this information from the sequencing facility.
Amplification_Steps: Valid values here are 1 or 2.
First_Step: The number of PCR cycles for the first step (or only step if you have had only 1 step of amplification) of PCR amplification.
Second_Step: The number of PCR cycles for the second step of PCR amplification, if you have had two steps of amplification.
Reads_Number: Total number of reads for your dataset. If you don’t know it you can leave it blank.
Spike_Amount(ng): If your dataset has been spiked, enter the amount of spike in nanograms here. Otherwise enter 0.
Sample_Weight(g): Weight of the sample taken for library preparation, in grams. You can ask the sequencing facility for this information. If you don’t know it, just enter a positive number. For example: 1
Sample_Type: Type of sample sent for sequencing.
Custom_1: After Sample_Type column you can add your desired columns and corresponding values to each of your dataset and have them stored at MSD.
Custom_2: After Sample_Type column you can add your desired columns and corresponding values to each of your dataset and have them stored at MSD.
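Some of the column rules above lend themselves to a quick pre-upload check. The sketch below is an assumption-laden illustration: the function name is hypothetical, the MSD Sample ID pattern is inferred only from the examples in this guide (such as P1O34S3), the DIS pattern assumes uppercase letters, numeric cells are assumed to be numbers already, and Paired_Sequencing = Yes is read as requiring both file names, as the Paired_Sequencing description states.

```python
import re

DIS_ID = re.compile(r"MT[A-Z]{3}\d{4}")        # e.g. MTXXX1234
MSD_SAMPLE_ID = re.compile(r"P\d+O\d+S\d+")    # inferred from e.g. P1O34S3


def check_dataset_row(row):
    """Return a list of problems found in one 16S dataset template row."""
    problems = []
    dis_id = row.get("DIS_Sampling_ID") or ""
    sample_id = row.get("Sample_ID") or ""
    if not dis_id and not sample_id:
        problems.append("DIS_Sampling_ID or Sample_ID is required")
    if dis_id and not DIS_ID.fullmatch(dis_id):
        problems.append("DIS_Sampling_ID does not look like MTXXX1234")
    if sample_id and not MSD_SAMPLE_ID.fullmatch(sample_id):
        problems.append("Sample_ID does not look like an MSD Sample ID")
    if not row.get("Forward_Filename"):
        problems.append("Forward_Filename is required")
    paired = row.get("Paired_Sequencing") == "Yes"
    if paired and not row.get("Backward_Filename"):
        problems.append("Backward_Filename is required for paired-end data")
    steps = row.get("Amplification_Steps")
    if steps not in (1, 2):
        problems.append("Amplification_Steps must be 1 or 2")
    if steps == 2 and not row.get("Second_Step"):
        problems.append("Second_Step is required for two amplification steps")
    if (row.get("Spike_Amount(ng)") or 0) < 0:
        problems.append("Spike_Amount(ng) must be 0 or more")
    if (row.get("Sample_Weight(g)") or 0) <= 0:
        problems.append("Sample_Weight(g) must be positive")
    return problems


# Hypothetical row; the file names are made up for illustration.
row = {"Sample_ID": "P1O273S155", "Paired_Sequencing": "Yes",
       "Forward_Filename": "s1_R1.fastq.gz", "Backward_Filename": "s1_R2.fastq.gz",
       "Amplification_Steps": 2, "First_Step": 15, "Second_Step": 10,
       "Spike_Amount(ng)": 6, "Sample_Weight(g)": 1}
print(check_dataset_row(row))
```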

III. Preparation of fastq files.

Now that your template is ready, it’s time to prepare a zip file of your fastq files for uploading. The zip file should contain all fastq files whose names you entered in the Excel template (Forward_Filename and Backward_Filename). The zip file should NOT contain any folders: when you open it, you should see only the fastq (or fastq.gz) files.
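A flat zip file of this kind can be produced with Python's standard `zipfile` module. The sketch below is one possible way to do it, not a tool provided by MSD; the function names and the file names in the usage lines are hypothetical. Passing only the file name as `arcname` is what keeps folders out of the archive.

```python
import zipfile
from pathlib import Path


def build_flat_zip(fastq_paths, zip_path):
    """Write the given fastq files into zip_path with no folders inside."""
    names = [Path(p).name for p in fastq_paths]
    if len(set(names)) != len(names):
        raise ValueError("two files share the same name; names must be unique")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path, name in zip(fastq_paths, names):
            # arcname=name drops the directory part, keeping the archive flat.
            archive.write(path, arcname=name)


def missing_from_zip(zip_path, expected_names):
    """Return expected file names that are not at the top level of the zip."""
    with zipfile.ZipFile(zip_path) as archive:
        present = set(archive.namelist())
    return sorted(set(expected_names) - present)
```

For example, `build_flat_zip(["run1/s1_R1.fastq.gz", "run1/s1_R2.fastq.gz"], "upload.zip")` stores both files at the top level of `upload.zip`, and `missing_from_zip("upload.zip", names_from_template)` lists any Forward_Filename or Backward_Filename entries you forgot to include.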

Below you can find an example of a filled dataset Excel template:

Filled Template - from DIS ID to Accession — The first three datasets have an MSD Sample ID (e.g., P1O273S155) and will be assigned to already registered samples. The last three do not have an MSD Sample ID, which means that they come from human organisms whose metadata is already stored at DIS. They will be created after their data is retrieved from DIS and the related MSD organisms and samples have been created.

Filled Template - from Sequencer to Paired Sequencing — All datasets have been sequenced with Illumina MiSeq machine, same preparation protocol, same sequencing protocol and all of them are paired-end.

Filled Template - from Forward File name to DNA Isolation — Forward file name and Reverse file name provided. Note that the **full** name of files is given. The sequencing has targeted V3-V4 region.

Filled Template - from forward primer to run length — As all datasets have been sequenced with the same protocol and same facility, the forward and reverse primer used are the same. Note that there is no need to choose primers sequences as they would be automatically chosen according to your chosen primers names.

Filled Template - from amplification step to spike amount — Two amplification steps for library preparation (PCR), with 15 and 10 cycles respectively. The number of reads is not known. The first three datasets were spiked and the rest were not, so 6 nanograms is entered for the first three and 0 nanograms for the non-spiked ones.

Filled Template - from Sample Weight to Custom 2 — Sample type and weight taken for sequencing for all datasets is provided (ask for this information from the sequencing facility). After **Sample_Type** column you can add your own columns with desired names and values for each dataset to have them stored at MSD. In this example I did not provide **additional metadata**, but you can provide yours after **Sample_Type** column.

IV. Uploading Template

It’s time to upload the Excel template and your zip file containing all your fastq (or fastq.gz) files.

Metabolomics Dataset Registration

Defining a Metabolomic Run

Note

This section applies to metabolomics facility operators or members. Here the analysis run performed at the metabolomics facility gets registered and given an ID. If you are a scientist using MSD and need to upload metabolomics datasets for your samples, please read Registration of Metabolites (Targeted Metabolomics).

In order to submit your metabolomics runs to MSD you need to follow the steps as below:

1. Make sure that your account has been added to metabolomics group operators. If you are not added to the group please contact MSD administrator to add you to the group.

2. Log in with your user account and navigate to Submit/Datasets/Metabolomics as shown below:

View of Metabolomics Run Submission Tab — This figure shows the navigation path to the metabolomics run submission tab.

3. Click on Metabolomics Run tab. You will see a table with all the metabolomics runs submitted to MSD. You can see the status of each run in the table including RUN_ID, Name, Wiff, Scan, Submitter, Upload Date, and Action.

By clicking on the delete icon beside the Wiff file you can delete the uploaded file. You can also use download, delete, and edit to manage the run entry.

To add a new run, fill in the form shown below and click on the Save Metabolomics Run button.

Metabolomics Run Submission Form — This figure shows the form for submitting a new metabolomics run. Adding **Wiff** and **Scan** output will help you manage your run and store your files at MSD.

Registration of Metabolites (Targeted Metabolomics)

As explained here, you can assign different types of dataset to one specific sample (See Sample Submission). In order to add measured metabolites to your samples you need to follow the steps as below:

1. Submit your samples and assign them to the related organisms in one of your projects. After registration, you will be given a sample ID for each registered sample (See view samples).

2. Send your sample material for metabolite measurement.

Note

It’s recommended to submit your samples before sending their material to metabolomics lab and use MSD Sample IDs (e.g: P10O252S134) as identifier of your samples in metabolomics runs.

3. Receive the metabolites Excel files for your measurement requests from the metabolomics facility. Below you can find two example files of the kind you should expect from the facility.

Example 1

Example 2

Metabolites Excel files should have columns described as below and an extra row below column headers containing units of measurements for each metabolite.

Excel Template Columns:

Sample_ID: This column holds the MSD IDs of your samples, which are used to assign the metabolites in the file to the proper samples of yours at MSD. MSD knows your samples by these IDs, so if you provide a wrong MSD ID, the metabolites in this Excel file won’t be assigned to your registered sample at MSD. NOTE: the second row of this column is empty.
Normalization: The normalization method that the metabolomics facility used. NOTE: the second row of this column is empty.
Metabolites Columns: From column C onward you should have the metabolite names in the first row (i.e., the header) and the unit of the values in the next row. The rows for each of your samples should contain the values of the corresponding metabolites. If there is no value for a specific metabolite in a sample, the value N/A should be entered. (See the figure of the second metabolites example Excel.)
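To make the expected layout concrete, the sketch below reads one metabolites sheet that has already been loaded as a list of rows (header row, units row, then one row per sample). The function name, the metabolite names, the unit, and the normalization value are made up for illustration; only the column layout and the N/A convention come from the description above.

```python
def parse_metabolites(rows):
    """Turn a metabolites sheet (list of rows) into one record per value."""
    header, units, *samples = rows
    if list(header[:2]) != ["Sample_ID", "Normalization"]:
        raise ValueError("first two columns must be Sample_ID and Normalization")
    metabolites = header[2:]
    # The units row is empty under Sample_ID and Normalization.
    unit_of = dict(zip(metabolites, units[2:]))
    records = []
    for row in samples:
        sample_id, normalization, *values = row
        for name, value in zip(metabolites, values):
            if value == "N/A":
                continue  # no measurement for this metabolite in this sample
            records.append({
                "Sample_ID": sample_id,
                "Normalization": normalization,
                "metabolite": name,
                "value": float(value),
                "unit": unit_of[name],
            })
    return records


sheet = [
    ["Sample_ID", "Normalization", "Acetate", "Butyrate"],
    [None, None, "umol/g", "umol/g"],
    ["P10O2S3", "wet weight", 12.5, "N/A"],
]
print(parse_metabolites(sheet))
```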

An example of metabolites Excel you will receive from metabolomic facility — This figure shows an example of a typical metabolites Excel you will receive from metabolomics center.

An example of metabolites Excel with added custom metabolites — Another example of metabolites Excel with custom added metabolites in the last column.

4. Compress all metabolites Excels you want to upload into a zip file. You can download an example here: Metabolites Zip

Note

Make sure that you have used your samples MSD ID in the first column of your metabolites Excel. MSD will use those IDs to relate your metabolites to proper samples of your project.

5. Download metabolomics data submission template. You can follow the steps as shown in the picture to download it.

How to download metabolomics data to MSD

6. Fill out the metabolomics data submission template. The template has three main columns explained as below:

Excel Template Columns:

Dataset_Name: This name will be prepended to the names of the samples listed in the metabolites Excel file given as File_Name. Imagine you have given the dataset the name “Measurement-1-Project-1” (as shown in the figure above) and the content of “MetabolitesExample1.xlsx” is as shown in Metabolites Excel 1. When you submit your dataset, MSD takes the name of the first sample (the sample with ID P10O2S3) and prepends the value given as Dataset_Name to it. If the name of the sample (P10O2S3) is TM7258_B3, then the name of the corresponding metabolomics dataset for this sample will be Measurement-1-Project-1_TM7258_B3. That is, you will see a row in the metabolomics dataset table named Measurement-1-Project-1_TM7258_B3, which includes all the metabolites assigned to the sample with ID P10O2S3 in Metabolites Excel 1.

RUN_ID: This cell should be a drop-down list containing the Run IDs submitted to MSD by the metabolomics facility (refer to Submission of Run IDs). Ask the metabolomics facility that did your measurements for this ID, then choose the correct ID for your dataset. This ID is used to relate your dataset to the proper raw run files submitted by the metabolomics facility.

Note

If you are using Excel with a default language other than English, the drop-down might not work because the formulas get translated. In this case, you can refer to Sheet 2 of the Excel file and find the valid Run IDs under a column named Raw Sources ID.

File_Name: This column establishes the relation between the metabolites Excel files, which contain metabolites and sample IDs, and your Dataset_Name and RUN_ID. MSD looks in the zip file containing your metabolites Excel files and tries to find the file name given in this column. It then parses the metabolites in that Excel file and assigns them to the proper metabolomics run (i.e., RUN_ID) and metabolomics dataset name (i.e., Dataset_Name).
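The naming rule and the file lookup described for these three columns can be illustrated with two small helpers. Both function names are hypothetical, the underscore separator is inferred from the Measurement-1-Project-1_TM7258_B3 example, and the RUN_ID value in the usage lines is a made-up placeholder.

```python
def dataset_display_name(dataset_name, sample_name):
    """Build the dataset name: Dataset_Name prepended to the sample name."""
    return f"{dataset_name}_{sample_name}"


def unmatched_files(mapping_rows, zip_names):
    """Return File_Name values from the template that are not in the zip."""
    return [row["File_Name"] for row in mapping_rows
            if row["File_Name"] not in zip_names]


print(dataset_display_name("Measurement-1-Project-1", "TM7258_B3"))

# Hypothetical submission template rows and zip contents.
mapping = [
    {"Dataset_Name": "Measurement-1-Project-1", "RUN_ID": "RUN-PLACEHOLDER",
     "File_Name": "MetabolitesExample1.xlsx"},
    {"Dataset_Name": "Measurement-1-Project-1", "RUN_ID": "RUN-PLACEHOLDER",
     "File_Name": "MetabolitesExample2.xlsx"},
]
print(unmatched_files(mapping, {"MetabolitesExample1.xlsx"}))
```

In practice the set of zip contents can be taken from `zipfile.ZipFile(path).namelist()`; any name reported by `unmatched_files` is a file MSD would not be able to find in your upload.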

Metabolomics Dataset Submission Template

7. Upload your compressed metabolites Excel files and your metabolomics dataset submission template.

As shown in the screenshot below, you need to upload the zip file containing your metabolites Excel files and the mapping Excel for submission, as described above.

Metabolomics Datasets Upload — There are two fields you need to give files. **Dataset template**: here you give the filled template mapping metabolites Excel files to *RUN_ID* and *Dataset_Name* Metabolomics Dataset Template. **Dataset raw**: Here you upload the zip file containing all metabolites Excel (e.g: Example of metabolites Excel)

When the upload is finished you can view your metabolites datasets in Metabolomics View.

Footnotes