
Lambda: The Local Aligner for Massive Biological Data

Overview

Lambda is a BLAST-compatible local aligner optimized for NGS and protein searches. It is faster than most other tools while maintaining high sensitivity. It is actively developed and free to use, study, share and improve.

 

Frequently Asked Questions

What kind of data can I use Lambda with?

Lambda is optimized for searches in protein space. Whenever your reads or query sequences represent or encode proteins (transcriptome, exome, RNA data, ...) and/or your database is a protein database (UniProt, NCBI nr, ...), Lambda will perform very well.

Does that mean I cannot use it to replace BlastN or Megablast?

Lambda has a BlastN mode that shares many features with the protein searches (BlastP, BlastX, TBlastN, TBlastX) but does not benefit from some optimizations that are specific to those modes. Our tests show that Lambda is still a big improvement over BlastN while achieving comparable sensitivity for e-values below 0.1. Since less time has been spent tuning the BlastN default parameters, we recommend that you first try it on one of your data sets and compare the results with BlastN before relying on it with the same confidence. We are also interested in hearing your feedback on this mode and whether we should invest more resources into it!
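If you want to run such a comparison yourself, the following Python sketch shows one simple way to do it. It assumes (our assumption, not something stated above) that both BlastN and Lambda wrote BLAST tabular output (the 12-column "m8" format, with query ID, subject ID and e-value in columns 1, 2 and 11). It then measures which fraction of the query/subject pairs that BlastN reports below an e-value of 0.1 are also found by Lambda. The helper names hit_pairs and recall are hypothetical, and this is not necessarily how the sensitivity figures on this page were computed.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]


def hit_pairs(lines: Iterable[str], max_evalue: float = 0.1) -> Set[Pair]:
    """Collect (query, subject) pairs with e-value < max_evalue from
    BLAST tabular ("m8") output lines (assumed format, see above)."""
    pairs: Set[Pair] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue < max_evalue:
            pairs.add((query, subject))
    return pairs


def recall(reference: Set[Pair], candidate: Set[Pair]) -> float:
    """Fraction of the reference (BlastN) pairs also reported by the candidate (Lambda)."""
    if not reference:
        return 1.0  # nothing to find; treat as complete
    return len(reference & candidate) / len(reference)
```

To use it, pass the opened BlastN and Lambda output files to hit_pairs and give the two resulting sets to recall. The output file names are up to you.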

What kind of speed-ups can I expect over NCBI Blast?

In protein modes we have measured speed-ups of 150-300x over NCBI Blast. Further options trade system memory or sensitivity of the results for additional speed; the recommended fast mode achieves speed-ups of over 2000x. The speed-up also depends on the data and grows with the size of the dataset.

What kind of sensitivity can I expect compared to NCBI Blast?

For short reads, sensitivity was measured at over 96%. For reads of length over 900, the sensitivity dropped to a little over 80% in the default mode, which is still better than that of other Blast competitors. Several parameters can increase sensitivity at the cost of speed.

How much memory does Lambda require?

In its default mode, Lambda requires approximately the following amount of RAM:

size(queryFile) + 2 * size(dbFile)

The indexer has additional memory requirements; please see `lambda_indexer --help`. Future releases will contain algorithms with much lower memory requirements and higher speed.
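As a worked example of the formula above, a 2 GiB query file searched against a 10 GiB database needs roughly 2 + 2 * 10 = 22 GiB of RAM in the default mode. The small Python sketch below computes the same estimate from the file sizes on disk. The function names are hypothetical, and the result is only the rough default-mode figure given above; it does not include the indexer's additional requirements.

```python
import os


def estimate_lambda_ram(query_path: str, db_path: str) -> int:
    """Rough default-mode RAM estimate in bytes:
    size(queryFile) + 2 * size(dbFile). Excludes the indexer's needs."""
    return os.path.getsize(query_path) + 2 * os.path.getsize(db_path)


def human_readable(n_bytes: int) -> str:
    """Format a byte count with binary units (B, KiB, MiB, GiB, TiB)."""
    size = float(n_bytes)
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    for unit in units[:-1]:
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} {units[-1]}"
```

Calling human_readable(estimate_lambda_ram("query.fasta", "db.fasta")) prints the estimate for the example files used in the Usage section.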

 

Downloads

Latest Version [2014/12/05]:

Linux Binaries 64bit (v0.4.7) - generic 64 bit
Linux Binaries Sandybridge (v0.4.7) - faster; requires a modern Intel CPU and a recent OpenMP
MacOS X Binaries 64bit (v0.4.7) - built on Darwin 13.4 / MacOS X Mavericks
FreeBSD Binaries 64bit (v0.4.7) - built on FreeBSD 10.0 with lang/gcc49
Source Code (v0.4.7) - see build instructions below

Latest changes:

  • Speed-ups of up to 100% compared to the published version
  • Much lower memory consumption
  • BlastN mode fixed and tuned
  • Several bugs fixed (including support for FastQ)
  • Many parameters have been changed or renamed; please see bin/lambda --help
  • 0.4.1: lambda_indexer corrected to also default to FM as the index type
  • 0.4.5: new, faster and more reliable construction algorithms added to lambda_indexer
  • 0.4.7: fixed the build on Mac OS X

Older versions and detailed changes:

See the CHANGELOG and the release history on git.

Pre-indexed Databases

NR Database (v2014-10) - Prebuilt FMIndex of reduced NR [26GiB] [ftp link]

 

Usage

These examples assume that you have the files query.fasta and db.fasta, containing the query and the subject sequences, respectively.

If you have sufficient memory, please store your files in /dev/shm/ (on Linux, typically a RAM-backed tmpfs) for faster access.
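The Python sketch below illustrates this advice under a few assumptions of ours: /dev/shm exists and is a RAM-backed tmpfs (typical on Linux, not guaranteed on Mac OS X or FreeBSD), and the files should only be copied if that file system reports enough free space. The function name stage_in_shm is hypothetical. Keep in mind that files in /dev/shm occupy RAM themselves and therefore compete with the memory Lambda needs.

```python
import os
import shutil
from typing import List


def stage_in_shm(paths: List[str], shm_dir: str = "/dev/shm") -> List[str]:
    """Copy input files into shm_dir if it exists and has room for all of
    them; otherwise return the original paths unchanged."""
    needed = sum(os.path.getsize(p) for p in paths)
    if not os.path.isdir(shm_dir) or shutil.disk_usage(shm_dir).free < needed:
        return list(paths)  # fall back to the files where they are
    staged = []
    for p in paths:
        target = os.path.join(shm_dir, os.path.basename(p))
        shutil.copy2(p, target)  # copies content and metadata
        staged.append(target)
    return staged
```

You would then pass the returned paths to lambda_indexer and lambda instead of the original ones.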

Default profile

Optionally, mask the database (for BlastN, dustmasker is supported instead of segmasker):

% /path/to/segmasker -infmt fasta -in db.fasta -outfmt interval -out db.seg

Run the indexer:

% bin/lambda_indexer -d db.fasta [-s db.seg]

Run lambda:

% bin/lambda -q query.fasta -d db.fasta
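If you run these steps often, they can be scripted. The following Python sketch chains exactly the commands shown above via subprocess. It assumes that segmasker is on your PATH and that the Lambda binaries live in bin/, and the helper names lambda_commands and run_all are hypothetical. It uses no options beyond the ones listed in this section; see bin/lambda --help for everything else.

```python
import os
import subprocess
from typing import List, Optional


def lambda_commands(query: str, db: str, seg: Optional[str] = None,
                    bindir: str = "bin",
                    segmasker: str = "segmasker") -> List[List[str]]:
    """Return the default-profile steps as argv lists: optional masking
    with segmasker, indexing with lambda_indexer, searching with lambda."""
    cmds: List[List[str]] = []
    index_cmd = [os.path.join(bindir, "lambda_indexer"), "-d", db]
    if seg is not None:
        cmds.append([segmasker, "-infmt", "fasta", "-in", db,
                     "-outfmt", "interval", "-out", seg])
        index_cmd += ["-s", seg]  # pass the masking intervals to the indexer
    cmds.append(index_cmd)
    cmds.append([os.path.join(bindir, "lambda"), "-q", query, "-d", db])
    return cmds


def run_all(cmds: List[List[str]]) -> None:
    """Run each step in order; check=True raises on the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

Calling run_all(lambda_commands("query.fasta", "db.fasta", seg="db.seg")) masks, indexes and searches in one go and stops at the first step that fails.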

Other parameters

Please have a look at:

% bin/lambda --help

 

Build instructions

Please make sure that the build requirements are met:

  • CMake
  • GCC 4.8.x or GCC >= 4.9.1 (4.9.0 has a bug)
  • or Clang >= 3.3 (but no multi-threading with Clang, so it is much slower!)
  • Only 64-bit Linux, 64-bit FreeBSD and 64-bit Mac OS X are supported as platforms
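To check a compiler against these requirements before starting a long build, the version rules can be written as a small Python function. This is a hypothetical helper, not part of Lambda's build system; you pass it the version number reported by your compiler (for example by gcc --version). It only encodes the compiler versions listed above, not the platform requirement.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '4.9.1' into (4, 9, 1); missing components count as 0."""
    parts = [int(x) for x in text.strip().split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def compiler_supported(name: str, version: str) -> bool:
    """GCC 4.8.x or >= 4.9.1 (4.9.0 is excluded), or Clang >= 3.3."""
    v = parse_version(version)
    if name == "gcc":
        return v[:2] == (4, 8) or v >= (4, 9, 1)
    if name == "clang":
        return v >= (3, 3, 0)  # works, but without multi-threading
    return False
```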

Then run the following commands to extract the source and build the two binaries, which are placed in seqan-lambda-build/release/bin:

% tar xzf seqan-lambda-v0.4.7.tar.gz
% mkdir -p seqan-lambda-build/release
% cd seqan-lambda-build/release
% cmake ../../seqan-lambda-v0.4.7 \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_CXX_FLAGS:STRING="-march=native"
% make -j2 lambda lambda_indexer

Warnings about missing C++14 support can be ignored. Please be aware that, due to extensive use of templates and compile-time optimizations, the build may take well over 10 minutes. If your most recent version of GCC is not the default compiler, you have to pass its path to cmake with -DCMAKE_C_COMPILER and -DCMAKE_CXX_COMPILER.
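For a non-default GCC, here is a Python sketch that assembles the same cmake call as above and optionally adds the two compiler variables. The helper names and the compiler paths in the usage example are hypothetical. As a general CMake rule, the compiler should be chosen on the first configure run, so use a fresh, empty build directory when switching compilers.

```python
import os
import subprocess
from typing import List, Optional


def cmake_command(source_dir: str, cc: Optional[str] = None,
                  cxx: Optional[str] = None) -> List[str]:
    """Same cmake call as in the build instructions, plus optional
    CMAKE_C_COMPILER / CMAKE_CXX_COMPILER for a non-default GCC."""
    cmd = ["cmake", source_dir,
           "-DCMAKE_BUILD_TYPE=Release",
           "-DCMAKE_CXX_FLAGS:STRING=-march=native"]
    if cc:
        cmd.append(f"-DCMAKE_C_COMPILER={cc}")
    if cxx:
        cmd.append(f"-DCMAKE_CXX_COMPILER={cxx}")
    return cmd


def configure_and_build(source_dir: str, build_dir: str,
                        cc: Optional[str] = None, cxx: Optional[str] = None,
                        jobs: int = 2) -> None:
    """Configure in build_dir, then build the two Lambda targets."""
    os.makedirs(build_dir, exist_ok=True)
    # source_dir may be relative to build_dir, as in "../../seqan-lambda-v0.4.7"
    subprocess.run(cmake_command(source_dir, cc, cxx), cwd=build_dir, check=True)
    subprocess.run(["make", f"-j{jobs}", "lambda", "lambda_indexer"],
                   cwd=build_dir, check=True)
```

For example, configure_and_build("../../seqan-lambda-v0.4.7", "seqan-lambda-build/release", cc="/opt/gcc-4.9.2/bin/gcc", cxx="/opt/gcc-4.9.2/bin/g++") would mirror the shell commands above, using a GCC installed under a hypothetical /opt prefix.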
 

Contact

For questions, comments, or suggestions feel free to contact Hannes Hauswedell or Knut Reinert.

References

  • Lambda: the local aligner for massive biological data
    Hannes Hauswedell; Jochen Singer; Knut Reinert
    Bioinformatics 2014 30 (17): i349-i355
    doi: 10.1093/bioinformatics/btu439

Last update: 12 December 2014
