Introduction to Kelvin

Kelvin is a program suite for the analysis of genetic data. It is based on the PPL framework [1, 2, 3] and reports its results on the posterior probability scale (0 to 1).

Kelvin comes in two primary forms:

Kelvin-original is stable software and has been tested on a number of hardware platforms. Kelvin-LKS is experimental, alpha-quality software that currently supports only linkage analysis (linkage disequilibrium analysis is being implemented). Kelvin-LKS is under active development, and we endeavor to provide prompt support, but it can and will break under various circumstances.

A discussion of the guiding philosophy of Kelvin and details on the underlying statistical methods can be found in the following reference:

Vieland, V.J., et al. Kelvin: a Software Package for Rigorous Measurement of Statistical Evidence in Human Genetics. Hum Hered 2011;72(4):276-88. Epub 2011 Dec 23. PMID:22189470

Prerequisites

Kelvin-original

Kelvin-original has been tested and run on several platforms, but the reference and development platform is CentOS 6 (or any other Linux distribution of similar vintage).

To install Kelvin, you will need a working C compiler (GCC and ICC, the Intel C Compiler, have both been tested). You will almost certainly also want libgsl (the GNU Scientific Library); compiling without GSL is an option, but it is not supported by default.

Running Kelvin requires libgsl (if Kelvin was compiled with it) and Perl 5.8 or any later version.
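The prerequisites above can be checked before attempting a build. The following is a minimal sketch and is not part of Kelvin: the function name missing_tools is hypothetical, and the command names probed (gcc, perl, and gsl-config, the helper script installed with GSL) are assumptions about a typical setup. It checks only that the commands exist on $PATH, not that Perl is version 5.8 or later.

```python
import shutil


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on $PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Assumed command names: gcc (C compiler), perl, gsl-config (ships with GSL).
    # A site that builds with ICC would probe "icc" instead of "gcc".
    missing = missing_tools(["gcc", "perl", "gsl-config"])
    if missing:
        print("Not found on $PATH: " + ", ".join(missing))
    else:
        print("All probed prerequisites were found.")
```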

Kelvin-LKS

Kelvin-LKS, in addition to the requirements for Kelvin-original, requires an Open Grid Scheduler cluster (or Sun Grid Engine, or another descendant thereof). The cluster must also meet the following requirements:

Build requirements are:

Additional software requirements include:

Database nodes need a prebuilt binary distribution of MySQL Server, version 5 or later, that has NOT been set up. It can be downloaded from mysql.com (look for the "Linux - Generic", "Compressed TAR Archive" download of "MySQL Community Server").

Installation

For Kelvin-LKS and Kelvin-original together, installation requires several steps:

  1. Select which nodes in your cluster are to be "database" nodes. Unpack your MySQL prebuilt distribution into the same directory on each node.

  2. Add a "database" INT resource to the scheduler and add it to each DB node's complex values. A Perl script (LKS_setupSGEDB.pm) is provided that can do this for you; run it with the --help option for guidance.

  3. Edit the Makefile as follows:

     BINDIR: This should point to where Kelvin and related modules and utility scripts should be located. The default is /usr/local/share/kelvin.

     PATHDIR: This should point to a directory on your $PATH where the Kelvin program will be linked. The default is /usr/local/bin.

     OWNER and GROUP: These should be the owner and group IDs for the Kelvin programs and utility files. The defaults are root for both.

     LKSPE_MYSQL_BASE: This is the directory where the MySQL Server binary distribution is located on each "database" node.

     LKSPE_JOB_SUBMITS_JOBS: This should be set if your cluster requires some sort of additional command-line option(s) for qsub to indicate jobs that submit other jobs.

  4. Run make install-lks. Kelvin-LKS, Kelvin-original, and associated programs will be built, assembled, and installed in the location you specified in the Makefile.

  5. (optional) Verify that setup worked by running one or both of the two Acid Tests. These are preassembled analyses stored as tarballs in PATHDIR, named sa_dt-acid-test.tar.gz and merlin-only-sadt-acid-test.tar.gz. The former tests the whole installation; the latter verifies that Merlin integration is working. (It is normal for sa_dt-acid-test to report a small difference in Bayes ratios at the end of the test; this is due to the stochastic nature of Markov chain Monte Carlo (MCMC) analysis.)
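As an illustration of the optional last step, an Acid Test tarball can be unpacked into a scratch directory before it is run. This is a sketch under stated assumptions: the function name unpack_acid_test and the scratch directory name are hypothetical, and how a test is launched after unpacking is not covered here and should be taken from the contents of the archive itself.

```python
import tarfile
from pathlib import Path


def unpack_acid_test(tarball, workdir):
    """Extract a .tar.gz archive into `workdir`; return the top-level names found there."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball, "r:gz") as archive:
        archive.extractall(workdir)
    return sorted(entry.name for entry in workdir.iterdir())
```

With the default PATHDIR, a call might look like unpack_acid_test("/usr/local/bin/sa_dt-acid-test.tar.gz", "acid-test-scratch"), where acid-test-scratch is an arbitrary directory name chosen for this example.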

Uninstallation may be done by running make uninstall-lks; this simply deletes all files that were installed.

If you are only interested in Kelvin-original, installation is simpler:

  1. Edit the Makefile as per Step 3 above, ignoring those variables that start with LKSPE_.

  2. Run make install. Kelvin will be built, assembled, and installed in the location you specified in the Makefile.

  3. (optional) Verify the build worked by running make check (for a quick check) or make test (for a more involved one).

Uninstallation may be done by running make uninstall; this simply deletes all files that were installed.
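Both installation procedures begin by editing variables in the Makefile. For sites that script their installs, the edit can be sketched as below. The variable names (BINDIR, PATHDIR) come from the steps above; everything else is an assumption: the function set_makefile_vars is hypothetical, the /opt/kelvin paths are examples only, and the code assumes each variable is assigned on a single line of the form NAME = value (or NAME := value, NAME ?= value), which may not match the Makefile shipped with Kelvin.

```python
import re


def set_makefile_vars(makefile_text, values):
    """Return makefile_text with the first assignment of each name in `values` rewritten."""
    lines = makefile_text.splitlines()
    for name, value in values.items():
        # Capture everything up to and including the assignment operator.
        pattern = re.compile(r"^(\s*" + re.escape(name) + r"\s*[:?]?=\s*).*$")
        for index, line in enumerate(lines):
            match = pattern.match(line)
            if match:
                lines[index] = match.group(1) + value
                break
        else:
            # No assignment found: fail rather than silently append one.
            raise KeyError(f"{name} is not assigned in the Makefile text")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "BINDIR=/usr/local/share/kelvin\nPATHDIR = /usr/local/bin\n"
    print(set_makefile_vars(sample, {"BINDIR": "/opt/kelvin/share",
                                     "PATHDIR": "/opt/kelvin/bin"}), end="")
```

Raising an error when a variable is not found is a deliberate choice in this sketch: a silently appended assignment could be overridden by a later line in the real Makefile.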

Input Files for Kelvin

Kelvin, in both forms, requires four input data files, inspired by the de facto standard formats employed by the LINKAGE program [8]. Examples are given here showing an affected sib-pair family with three markers:

Kelvin-original also requires a configuration file. Details on creating the configuration file and on using Kelvin-original can be found in the Kelvin-original usage documentation.

Kelvin-LKS may also require a loop-breaker file. This file is a list of individuals (each specified by family ID followed by individual ID) that indicates where loops in pedigrees should be broken. It is necessary for any pedigree that contains loops. Loops should NOT be broken in advance!
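To make the description of the loop-breaker file concrete, the sketch below writes and reads such a list. The exact file layout is not specified here, so the layout used is an assumption: one individual per line, with the family ID first and the individual ID second, separated by whitespace. The function names and the IDs in the example are hypothetical.

```python
def format_loop_breakers(individuals):
    """Render (family_id, individual_id) pairs, one per line, family ID first."""
    return "".join(f"{family} {individual}\n" for family, individual in individuals)


def parse_loop_breakers(text):
    """Parse text in the assumed layout back into (family_id, individual_id) pairs."""
    pairs = []
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected 2 fields, got {len(fields)}")
        pairs.append((fields[0], fields[1]))
    return pairs


if __name__ == "__main__":
    # Hypothetical IDs: individual 3 in family 1, individual 12 in family 7.
    print(format_loop_breakers([("1", "3"), ("7", "12")]), end="")
```

IDs are kept as strings so that non-numeric identifiers survive a round trip unchanged.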

Using Kelvin

Kelvin-original, in addition to its input files, requires a configuration file. Once the configuration file has been created, Kelvin-original is invoked as Kelvin <configfile>. Details on creating the configuration file and on using Kelvin-original can be found in the Kelvin-original usage documentation.
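When Kelvin-original is driven from a script, the invocation above can be wrapped as follows. This is a sketch, not part of Kelvin: the function names are hypothetical, kelvin.conf is a placeholder configuration file name, and the code assumes the installed program is reachable on $PATH under the name Kelvin.

```python
import shutil
import subprocess


def kelvin_command(config_file, executable="Kelvin"):
    """Build the argument list for `Kelvin <configfile>`."""
    return [executable, str(config_file)]


def run_kelvin(config_file, executable="Kelvin"):
    """Run Kelvin-original; raise if it is not on $PATH or exits with a non-zero status."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable} was not found on $PATH")
    return subprocess.run(kelvin_command(config_file, executable), check=True)
```

A call such as run_kelvin("kelvin.conf") then raises an exception if the program is missing or the analysis fails, rather than letting the failure pass unnoticed.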

Kelvin-LKS has a multi-step startup and invocation process:

Final results of a Kelvin-LKS analysis will be in the file pooled/pooled.ppl.out.

Visualizing Results

Kelvin-formatted PPL output can be visualized using our graphing application, Kelviz. Kelviz is distributed separately; information about it can be found in the Kelviz documentation, and downloads are available on the Kelvin website.

References

  1. Smith, C.A.B. Some comments on the statistical methods used in linkage investigations. Am J Hum Genet 11, 289-304 (1959).
  2. Vieland, V.J. Bayesian linkage analysis, or: how I learned to stop worrying and love the posterior probability of linkage. Am J Hum Genet 63, 947-54 PMID: 9758634 (1998).
  3. Vieland, V.J. Thermometers: something for statistical geneticists to think about. Hum Hered 61, 144-56 PMID: 16770079 (2006).
  4. Elston, R.C. & Stewart, J. A general model for the genetic analysis of pedigree data. Hum Hered 21, 523-42 (1971).
  5. Thomas, A., Gutin, A., Abkevich, V. & Bansal, A. Multilocus linkage analysis by blocked Gibbs sampling. Stat Comput 10, 259-269 (2000).
  6. Lander, E.S. & Green, P. Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci U S A 84, 2363-7 (1987).
  7. Abecasis, G.R., Cherny, S.S., Cookson, W.O. & Cardon, L.R. Merlin-rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30, 97-101 (2002).
  8. Ott, J. A computer program for linkage analysis of general human pedigrees. Am J Hum Genet 28, 528-529 (1976).