Lambda

Lambda: The Local Aligner for Massive Biological Data

Overview

Lambda is a BLAST-compatible local aligner optimized for NGS and protein searches. It is faster than most comparable tools while maintaining high sensitivity. It is actively developed and free to use, study, share and improve.

 

Frequently Asked Questions

What kind of data can I use Lambda with?

Lambda is optimized for searches in protein space, so whenever your reads / query sequences represent proteins (transcriptome, exome, RNA data …) and/or your database is a protein database (UniProt, NCBI NR …), Lambda will perform very well.

Does that mean I cannot use it to replace BlastN or Megablast?

Lambda has a BlastN mode that shares many features with the protein modes (BlastP, BlastX, TBlastN, TBlastX) but does not benefit from some optimizations that are specific to them. Our tests show that Lambda is still a big improvement over BlastN while staying in a comparable sensitivity range for e-values below 0.1. Since less time has been spent on tuning the BlastN default parameters, we recommend that you try it on one of your data sets and compare the results to BlastN before relying on it with the same confidence. We are interested in your feedback on this mode and on whether we should invest more resources in it!

What kind of speed-ups can I expect over NCBI Blast?

For protein modes we have measured speed-ups of 150-300x over NCBI Blast. Different options are available to increase speed further (at the cost of system memory or sensitivity of the results). The speed-up depends on the data and is higher for larger data sets.

What kind of sensitivity can I expect compared to NCBI Blast?

For short reads, sensitivity was measured to be over 96%. For reads of length over 900, sensitivity dropped to a little over 80% in the default mode, but this is still better than with other Blast competitors. Different parameters are available to increase sensitivity at the cost of speed.

How much memory does Lambda require?

In its default mode, Lambda requires approximately the following amount of RAM:

size(queryFile) + 2 * size(dbFile)

The indexer additionally requires around 20 * size(dbFile) of free disk space in your temporary directory ($TMPDIR). This will change in the future to require less and be much faster!
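
The two rules of thumb above can be turned into a quick estimate before starting a run. The following is a minimal sketch in POSIX shell, not part of Lambda: the function names are hypothetical, the fallback to /tmp when $TMPDIR is unset is an assumption, and the results are only as accurate as the approximations quoted above.

```shell
#!/bin/sh
# Rough resource estimate based on the approximations quoted above.
# All function names are hypothetical helpers, not part of Lambda.

# Size of a file in bytes (reading from stdin keeps the file name out of the output).
file_size_bytes() {
    wc -c < "$1" | tr -d ' '
}

# RAM in the default mode: size(queryFile) + 2 * size(dbFile)
estimate_ram_bytes() {
    echo $(( $1 + 2 * $2 ))
}

# Temporary disk space for the indexer: around 20 * size(dbFile)
estimate_tmp_bytes() {
    echo $(( 20 * $1 ))
}

# Example usage, only if both input files exist in the current directory.
if [ -f query.fasta ] && [ -f db.fasta ]; then
    q=$(file_size_bytes query.fasta)
    d=$(file_size_bytes db.fasta)
    echo "Approx. RAM needed: $(estimate_ram_bytes "$q" "$d") bytes"
    # Assumption: fall back to /tmp if TMPDIR is not set.
    echo "Approx. free space needed in ${TMPDIR:-/tmp}: $(estimate_tmp_bytes "$d") bytes"
fi
```

For example, by these formulas a 1 GiB query file and a 10 GiB database file would need roughly 21 GiB of RAM, and the indexer would need around 200 GiB of temporary disk space.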

 

Downloads

Latest Version [2014/11/10]:

Lambda Binaries 64bit (v0.4.1) - regular 64bit Linux
Lambda Binaries Sandybridge (v0.4.1) - faster, requires a modern Intel CPU and recent OpenMP
Lambda Source Code .zip (v0.4.1) - see build instructions below
Lambda Source Code .tar.gz (v0.4.1) - see build instructions below

Latest changes:

  • Speed-ups of up to 100% compared to the published version
  • much lower memory consumption
  • BlastN mode fixed and tuned
  • several bugs fixed (including support for FastQ)
  • many parameters have changed or been renamed; please look at bin/lambda --help
  • 0.4.1 corrected lambda_indexer to also default to FM as index type

Older versions and detailed changes:

See the CHANGELOG and the release history on git.

Pre-indexed Databases

NR Database (v2014-10) - Prebuilt FMIndex of reduced NR [26GiB]

 

Usage

These examples assume that you have the files query.fasta and db.fasta, with the query and the subject sequences respectively.

If you have sufficient memory, please store your files in /dev/shm/ so that they are read from RAM.
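
Before copying files to /dev/shm/, it can help to check that they fit. The following is a small sketch, assuming a `df` that supports the POSIX options `-P` and `-k`; the helper names are hypothetical, and the check does not account for any files created later in the same directory.

```shell
#!/bin/sh
# Check whether a set of files fits into the free space of a directory.
# free_kib, file_kib and fits_in are hypothetical helper names.

# free_kib DIR: free space of the file system holding DIR, in KiB.
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# file_kib FILE: size of FILE in KiB, rounded up.
file_kib() {
    bytes=$(wc -c < "$1" | tr -d ' ')
    echo $(( (bytes + 1023) / 1024 ))
}

# fits_in DIR FILE...: succeeds if all files together fit into DIR.
fits_in() {
    dir=$1
    shift
    needed=0
    for f in "$@"; do
        needed=$(( needed + $(file_kib "$f") ))
    done
    [ "$needed" -le "$(free_kib "$dir")" ]
}

# Example usage, only if /dev/shm and both input files exist.
if [ -d /dev/shm ] && [ -f query.fasta ] && [ -f db.fasta ]; then
    if fits_in /dev/shm query.fasta db.fasta; then
        cp query.fasta db.fasta /dev/shm/
    else
        echo "query.fasta and db.fasta do not fit into /dev/shm" >&2
    fi
fi
```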

Default profile

Optionally mask the database (dustmasker supported for BLASTN):

% /path/to/segmasker -infmt fasta -in db.fasta -outfmt interval -out db.seg

Run the indexer:

% bin/lambda_indexer -d db.fasta [-s db.seg]

Run lambda:

% bin/lambda -q query.fasta -d db.fasta
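
The three steps above can be chained in a small script. This is a sketch only: the names run, run_lambda_pipeline, DRY_RUN and SEGMASKER are hypothetical, and only the commands and options shown above are used. The segmasker path is the same placeholder as in the example above and has to be adjusted.

```shell
#!/bin/sh
# Chain masking, indexing and searching as described above.
# run, run_lambda_pipeline, DRY_RUN and SEGMASKER are hypothetical names.
SEGMASKER=${SEGMASKER:-/path/to/segmasker}   # placeholder path, adjust it

# Print a command; execute it unless DRY_RUN is set to 1.
run() {
    printf '%% %s\n' "$*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# run_lambda_pipeline QUERY DB [mask]
# With "mask" as third argument, the database is masked before indexing.
run_lambda_pipeline() {
    query=$1
    db=$2
    seg="${db%.*}.seg"   # db.fasta -> db.seg
    if [ "${3:-}" = "mask" ]; then
        run "$SEGMASKER" -infmt fasta -in "$db" -outfmt interval -out "$seg" || return 1
        run bin/lambda_indexer -d "$db" -s "$seg" || return 1
    else
        run bin/lambda_indexer -d "$db" || return 1
    fi
    run bin/lambda -q "$query" -d "$db"
}

# Example: only show the commands, without running them.
( DRY_RUN=1; run_lambda_pipeline query.fasta db.fasta mask )
```

Each step stops the pipeline if it fails, so the search is never started against a missing or incomplete index.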

Other parameters

Please have a look at:

% bin/lambda --help

 

Build instructions

Please make sure that the build requirements are met:

  • CMake
  • GCC 4.8.* or GCC >= 4.9.1 (4.9.0 has a bug)
  • or Clang >= 3.3 (however, no multi-threading with Clang -> much slower!)
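
The GCC requirement above can be checked with a short script. This is a sketch under assumptions: gcc_version_ok is a hypothetical helper, it reads ">= 4.9.1" literally (so newer major versions pass, which does not guarantee that this release builds with them), and a bare "4.9" without patch level is rejected because 4.9.0 cannot be ruled out.

```shell
#!/bin/sh
# gcc_version_ok VERSION: succeeds if VERSION (e.g. "4.9.1") matches the
# requirement listed above: 4.8.* or >= 4.9.1. Hypothetical helper name.
gcc_version_ok() {
    case $1 in
        4.8|4.8.*)      return 0 ;;  # any 4.8 release
        4.9.0|4.9.0.*)  return 1 ;;  # 4.9.0 has a bug
        4.9.[1-9]*)     return 0 ;;  # 4.9.1 and later 4.9 releases
        [5-9]|[5-9].*|[1-9][0-9]|[1-9][0-9].*)
                        return 0 ;;  # newer major versions
        *)              return 1 ;;  # older, ambiguous or unparsable
    esac
}

# Example usage: check the default gcc, if there is one.
if command -v gcc >/dev/null 2>&1; then
    v=$(gcc -dumpversion)
    if gcc_version_ok "$v"; then
        echo "GCC $v meets the stated requirement"
    else
        echo "GCC $v does not meet the stated requirement" >&2
    fi
fi
```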

Then execute the following commands to extract the source and build the two binaries in seqan-lambda-build/release/bin:

% tar xzf seqan-lambda-v0.4.1.tar.gz
% mkdir -p seqan-lambda-build/release
% cd seqan-lambda-build/release
% cmake ../../seqan-lambda-v0.4.1 \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_CXX_FLAGS:STRING="-march=native"
% make -j2 lambda lambda_indexer

Warnings concerning a lack of C++14 can be ignored. Please be aware that, due to heavy use of templating and compile-time optimizations, the build might take well over 10 minutes. If your most recent version of GCC is not the default compiler, you have to give its path with -DCMAKE_C_COMPILER and -DCMAKE_CXX_COMPILER.
 

Contact

For questions, comments, or suggestions feel free to contact Hannes Hauswedell or Knut Reinert.

References

  • Lambda: the local aligner for massive biological data
    Hannes Hauswedell; Jochen Singer; Knut Reinert
    Bioinformatics 2014 30 (17): i349-i355
    doi: 10.1093/bioinformatics/btu439

Last Update 23. November 2014
