Raw Beans

Overview

Raw Beans is a quality control tool for assessing data from mass spectrometry DDA (Data-Dependent Acquisition) experiments. It is identification-free and can process either Thermo RAW or mzML files. The result is an HTML report that provides key metrics for a quick impression of data quality and helps detect obvious issues.

This tool is available for Linux and Windows:

Downloads section

The Git repository is located here: https://bitbucket.org/incpm/prot-qc/src/master/README.md


A tutorial for running RawBeans:

Input Types

The tool accepts three input types:

  • Vendor files - currently, only Thermo RAW files are supported.
  • mzML files - files converted from RAW to mzML format with the ProteoWizard MSConvert tool. When converting, make sure to select the 64-bit option for "binary encoding precision".
  • JSON files - the tool generates a JSON file for each processed sample, containing the required data. This file can be reused for faster follow-up analyses (such as combining multiple files).
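If you convert RAW files to mzML from a script rather than the MSConvert GUI, the 64-bit precision requirement maps to msconvert's `--64` option. The sketch below assumes `msconvert` is on your PATH (or that you pass its full path) and uses the standard msconvert options `--mzML`, `--64` and `-o`; check `msconvert --help` for your ProteoWizard version. The helper names `build_msconvert_command` and `convert_to_mzml` are hypothetical and not part of Raw Beans.

```python
import subprocess
from pathlib import Path


def build_msconvert_command(raw_file, out_dir, msconvert="msconvert"):
    """Build an msconvert command that writes mzML with 64-bit binary encoding.

    Hypothetical helper for illustration; not part of Raw Beans.
    """
    return [
        str(msconvert),
        str(raw_file),
        "--mzML",            # write mzML output
        "--64",              # 64-bit binary encoding precision
        "-o", str(out_dir),  # output directory
    ]


def convert_to_mzml(raw_file, out_dir, msconvert="msconvert"):
    """Run msconvert; raises CalledProcessError if the conversion fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_msconvert_command(raw_file, out_dir, msconvert), check=True)


if __name__ == "__main__":
    # Only prints the command; call convert_to_mzml() to actually run it.
    print(" ".join(build_msconvert_command("sample01.raw", "mzml_out")))
```

The command is built as a list rather than a single string so that paths containing spaces (such as those under C:\Program Files) are passed to msconvert unchanged.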

The input types are ordered by processing stage as follows: RAW file → mzML file → JSON file.

You can put any of these file types in your input folder. For each sample, the tool takes the most advanced file type available, so it does not redo any processing that has already been done.
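The selection rule can be sketched as follows. This is an illustration of the rule, not the tool's actual code: it assumes that files belonging to the same sample share a base name (for example "S1.raw", "S1.mzML" and "S1.json"), which may not match how Raw Beans pairs files internally. The function name `select_inputs` is hypothetical.

```python
from pathlib import Path

# Processing stage of each input type: a higher rank means more work already done.
# Assumption: a sample is identified by its file name without the extension.
STAGE_RANK = {".raw": 0, ".mzml": 1, ".json": 2}


def select_inputs(filenames):
    """Return {sample_name: filename}, keeping the most advanced file per sample."""
    best = {}
    for name in filenames:
        path = Path(name)
        rank = STAGE_RANK.get(path.suffix.lower())
        if rank is None:
            continue  # not a supported input type
        current = best.get(path.stem)
        if current is None or rank > current[0]:
            best[path.stem] = (rank, name)
    return {sample: name for sample, (_rank, name) in best.items()}


if __name__ == "__main__":
    files = ["S1.raw", "S1.mzML", "S2.raw", "S3.json", "S3.raw", "notes.txt"]
    print(select_inputs(files))
    # {'S1': 'S1.mzML', 'S2': 'S2.raw', 'S3': 'S3.json'}
```

Here S1 is read from its mzML file and S3 from its JSON file, so neither is reconverted, while S2 still has to be processed from the RAW file.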

Vendor Support

Thermo - RAW files can be read directly instead of being converted to mzML first. To do so:

  1. Install MSFileReader_x86_Standalone.zip (only the 32-bit version is supported; it is best not to install both the 32-bit and 64-bit versions).
  2. Run the tool on Windows.
  3. Make sure the ProteoWizard msconvert.exe field is empty; otherwise, the tool will run msconvert.
  4. If you are running from source code, you will also need to install 32-bit Python 3 in order to use the 32-bit DLL.
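When running from source, a quick way to confirm that step 4 is satisfied is to check the pointer size of the running interpreter: a 32-bit DLL can only be loaded into a 32-bit process. This sketch checks only the interpreter and operating system; it does not verify that MSFileReader itself is installed. The helper names are hypothetical.

```python
import platform
import struct
import sys


def python_bits():
    """Return the pointer size of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def can_load_32bit_dll():
    """True only for a 32-bit interpreter on Windows, as the 32-bit DLL requires."""
    return sys.platform == "win32" and python_bits() == 32


if __name__ == "__main__":
    print(f"Python {platform.python_version()}, {python_bits()}-bit, platform={sys.platform}")
    if not can_load_32bit_dll():
        print("Reading Thermo RAW files directly requires 32-bit Python 3 on Windows.")
```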

Windows Software

For Windows, there is an executable that opens the following view:

The tool options are as follows:

ProteoWizard msconvert.exe - The location of msconvert.exe in the ProteoWizard installation folder, for example: C:\Program Files\ProteoWizard\ProteoWizard 3.0.10738\msconvert.exe. This value needs to be entered only once; after that, the software remembers the last location you entered.

Input Directory - The folder containing your input files (of the types described above).

Output Directory - The folder to which the HTML report will be written.

Mass List - A list of masses (one per line) to track during processing; these are used in the mass deviation section of the report.

Batch - If True, a separate web report is created for each sample (file); if False, a single web report is created for all samples (files).

Num Cores - The number of cores to use for parallel processing. This number should not exceed the number of logical cores on your computer.
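Two of these options have simple constraints that can be checked before a run: the mass list holds one mass per line, and the core count should not exceed the number of logical cores. The sketch below illustrates both checks under stated assumptions: it parses the mass list as plain text with blank lines skipped (whether Raw Beans also accepts comments or headers is not specified here), and it uses Python's `os.cpu_count()`, which reports logical CPUs. `read_mass_list` and `validate_num_cores` are hypothetical helpers, and the masses in the example are arbitrary values.

```python
import os


def read_mass_list(text):
    """Parse a mass list with one mass per line; blank lines are skipped."""
    masses = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        try:
            masses.append(float(line))
        except ValueError:
            raise ValueError(f"line {line_no}: not a mass: {line!r}") from None
    return masses


def validate_num_cores(requested):
    """Clamp the requested core count to the range 1..number of logical cores."""
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(requested, available))


if __name__ == "__main__":
    print(read_mass_list("445.12003\n\n371.10124\n"))  # [445.12003, 371.10124]
    print(validate_num_cores(64))  # 64, or fewer on machines with fewer logical cores
```

Clamping, rather than rejecting, an oversized core count is a design choice made for this sketch; reporting an error to the user would be an equally reasonable alternative.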


Useful notes:

  • It is best to keep the input files on your local PC; processing will be faster and you will avoid network problems.