Kelvin is a program suite for analysis of genetic data. It is based on the PPL framework [1, 2, 3] and produces output on the posterior probability scale (0 to 1).
Kelvin comes in two primary forms:
Kelvin-original is stable software, and has been tested on a number of hardware platforms. Kelvin-LKS is experimental alpha-quality software and currently only supports linkage analysis (linkage disequilibrium analysis is presently being implemented). It's under active development, and we endeavor to provide prompt support, but it can and will break under various circumstances.
A discussion of the guiding philosophy of Kelvin and details on the underlying statistical methods can be found in the following reference:
Vieland, V.J., et al. Kelvin: a Software Package for Rigorous Measurement of Statistical Evidence in Human Genetics. Hum Hered 2011;72(4):276-88. Epub 2011 Dec 23. PMID:22189470
Kelvin-original has been tested and run on several platforms, but the reference and development platform is CentOS 6 (or any other Linux distribution of similar vintage).
To install Kelvin, you will also need a working C compiler; both GCC and ICC (the Intel C Compiler) have been tested. You will also almost certainly want libgsl (the GNU Scientific Library); compiling without GSL is an option but not supported by default.
Running Kelvin requires libgsl (if Kelvin was compiled with it) and Perl 5.8 or later.
Kelvin-LKS, in addition to the requirements for Kelvin-original, requires an Open Grid Scheduler (or Sun Grid Engine, or another descendant thereof) cluster. The cluster must also meet the following requirements:
Build requirements are:
Additional software requirements include:
Database nodes will need a prebuilt binary distribution of MySQL Server version 5 or later, NOT SET UP. This can be downloaded from mysql.com: under "MySQL Community Server", look for the "Linux - Generic" platform and the "Compressed TAR Archive" download.
For Kelvin-LKS and Kelvin-original together, installation requires several steps:
Select which nodes in your cluster are to be "database" nodes. Unpack your MySQL prebuilt distribution into the same directory on each node.
Add a "database" INT resource to the scheduler and add it to each DB node's complex values. A Perl script (LKS_setupSGEDB.pm) is provided that can do this for you; run it with the --help option for guidance.
Edit the Makefile as follows:
BINDIR: This should point to where Kelvin and related modules and utility scripts should be located. The default is
PATHDIR: This should point to a directory on your $PATH where the Kelvin program will be linked. The default is
GROUP: These should be the owner and group IDs for the Kelvin programs and utility files. The defaults are root for both.
LKSPE_MYSQL_BASE: This is the directory where the MySQL Server binary distribution is located on each "database" node.
LKSPE_JOB_SUBMITS_JOBS: This should be set if your cluster requires some sort of additional command-line option(s) for qsub to indicate jobs that submit other jobs.
Run make install-lks. Kelvin-LKS, Kelvin-original, and associated programs will be built, assembled, and installed in the location you specified in the Makefile.
(optional) Verify that setup worked by running one or both of the two Acid Tests - these are preassembled analyses stored in tarballs in
merlin-only-sadt-acid-test.tar.gz. The former tests the whole installation; the latter verifies that Merlin integration is working. (It is normal for sa_dt-acid-test to report a small difference in Bayes ratios at the end of the test; this is due to the stochastic nature of MCMC analysis.)
Uninstallation may be done by running make uninstall-lks; this simply deletes all files that were installed.
If you are only interested in Kelvin-original, installation is simpler:
Edit the Makefile as per Step 3 above, ignoring those variables that start with
Run make install. Kelvin will be built, assembled, and installed in the location you specified in the Makefile.
(optional) Verify the build worked by running make check (for a quick check) or make test (for a more involved one).
Uninstallation may be done by running make uninstall; this simply deletes all files that were installed.
Kelvin - in both forms - requires four input data files, inspired by the de facto standard formats employed by the LINKAGE program [8]. Examples are given here showing an affected sib-pair family with three markers:
Pedigree File - This contains phenotypic and genotypic information. It will nearly always be in pre-MAKEPED format. (There are some cases where Kelvin-original will require post-MAKEPED format.)
    fam1 papa 0 0 1 1 2 2 1 2 1 1
    fam1 mama 0 0 2 1 1 1 1 2 1 1
    fam1 kid1 papa mama 2 2 1 2 2 2 1 1
    fam1 kid2 papa mama 1 2 2 1 1 1 1 1
Locus File (also called Data File) - Describes marker column order in the pedigree file, starting with the position of the trait locus.
    T Trait
    M MRK_1
    M MRK_2
    M MRK_3
Frequency File - Gives the allele frequencies for the markers.
    M MRK_1
    F 0.3 0.7
    M MRK_2
    F 0.35 0.65
    M MRK_3
    F 0.7 0.3
Map File - Gives the chromosomal position of the markers.
    CHROMOSOME MARKER POSITION
    3 MRK_1 0.33
    3 MRK_2 0.66
    3 MRK_3 0.99
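The relationship between the four files can be illustrated with a short Python sketch. It is not part of Kelvin and makes several assumptions: fields are whitespace-separated, the trait occupies a single pedigree column, each marker occupies two columns (one per allele), and the map file starts with a header row. The function names (parse_locus, parse_frequency, parse_map, parse_pedigree, check_markers) are hypothetical, and the embedded data mirrors the three-marker sib-pair example.

```python
"""Sketch: parse Kelvin-style input files and cross-check their markers."""

PEDIGREE = """\
fam1 papa 0 0 1 1 2 2 1 2 1 1
fam1 mama 0 0 2 1 1 1 1 2 1 1
fam1 kid1 papa mama 2 2 1 2 2 2 1 1
fam1 kid2 papa mama 1 2 2 1 1 1 1 1
"""
LOCUS = "T Trait\nM MRK_1\nM MRK_2\nM MRK_3\n"
FREQUENCY = "M MRK_1\nF 0.3 0.7\nM MRK_2\nF 0.35 0.65\nM MRK_3\nF 0.7 0.3\n"
MAP = "CHROMOSOME MARKER POSITION\n3 MRK_1 0.33\n3 MRK_2 0.66\n3 MRK_3 0.99\n"


def parse_locus(text):
    """Return (kind, name) pairs in file order, e.g. ('T', 'Trait')."""
    return [tuple(line.split()) for line in text.splitlines() if line.strip()]


def parse_frequency(text):
    """Map each marker name to its list of allele frequencies."""
    freqs, current = {}, None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "M":
            current = fields[1]
        elif fields[0] == "F":
            freqs[current] = [float(x) for x in fields[1:]]
    return freqs


def parse_map(text):
    """Map each marker name to (chromosome, position); skips the header row."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    return {name: (chrom, float(pos)) for chrom, name, pos in rows[1:]}


def parse_pedigree(text, loci):
    """Parse rows laid out as: family, id, father, mother, sex, then loci.

    Assumes one column for the trait and two columns per marker.
    """
    people = []
    for line in text.splitlines():
        f = line.split()
        if not f:
            continue
        person = {"family": f[0], "id": f[1], "father": f[2],
                  "mother": f[3], "sex": f[4], "genotypes": {}}
        col = 5
        for kind, name in loci:
            if kind == "T":
                person["trait"] = f[col]
                col += 1
            else:
                person["genotypes"][name] = (f[col], f[col + 1])
                col += 2
        if col != len(f):
            raise ValueError(f"{f[1]}: expected {col} columns, found {len(f)}")
        people.append(person)
    return people


def check_markers(loci, freqs, positions):
    """List markers lacking frequencies or a map position, or with bad sums."""
    problems = []
    for kind, name in loci:
        if kind != "M":
            continue
        if name not in freqs:
            problems.append(f"{name}: no allele frequencies")
        elif abs(sum(freqs[name]) - 1.0) > 1e-6:
            problems.append(f"{name}: allele frequencies do not sum to 1")
        if name not in positions:
            problems.append(f"{name}: no map position")
    return problems


if __name__ == "__main__":
    loci = parse_locus(LOCUS)
    people = parse_pedigree(PEDIGREE, loci)
    problems = check_markers(loci, parse_frequency(FREQUENCY), parse_map(MAP))
    print(len(people), "individuals; problems:", problems or "none")
```

Because the locus file drives how pedigree columns are read, a mismatch between the two shows up as a column-count error, while a marker missing from the frequency or map file is reported by check_markers.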
Kelvin-original also requires a configuration file. Additional details on creation of same and the use of Kelvin-original can be found in the Kelvin-original usage documentation.
Kelvin-LKS may also sometimes require a loop-breaker file. This consists of a list of individuals (specified via family id followed by individual id) that indicate where loops in pedigrees should be broken. This is necessary for any pedigrees with loops. Loops should NOT be broken in advance!
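To judge whether a loop-breaker file is needed, it can help to check pedigrees for loops beforehand. The following Python sketch is one common way to do that and is not taken from Kelvin: it treats individuals and matings as graph nodes and uses a union-find structure to detect cycles. The function names are hypothetical, rows with only one known parent are skipped, and the loop-breaker layout (one family id and individual id pair per line, whitespace-separated) is an assumption based on the description above.

```python
def find_families_with_loops(pedigree_rows):
    """Return the sorted family IDs whose pedigree graph contains a cycle.

    Each row is (family, individual, father, mother); "0" marks an unknown
    parent. Parents link to a mating node and the mating node links to each
    child, so a loop-free pedigree forms a forest.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        """Merge the sets holding a and b; return False if already joined."""
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return False
        parent[root_a] = root_b
        return True

    looped = set()
    seen_matings = set()
    for family, person, father, mother in pedigree_rows:
        if father == "0" or mother == "0":
            continue  # founders (and half-specified parents) add no edges
        mating = (family, "mating", father, mother)
        edges = [(mating, (family, person))]
        if mating not in seen_matings:
            seen_matings.add(mating)
            edges += [((family, father), mating), ((family, mother), mating)]
        for a, b in edges:
            if not union(a, b):
                looped.add(family)  # edge closes a cycle: pedigree has a loop
    return sorted(looped)


def parse_loop_breakers(text):
    """Read (family, individual) pairs, one pair per line (assumed layout)."""
    pairs = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            pairs.add((fields[0], fields[1]))
    return pairs


if __name__ == "__main__":
    rows = [
        ("fam1", "papa", "0", "0"), ("fam1", "mama", "0", "0"),
        ("fam1", "kid1", "papa", "mama"), ("fam1", "kid2", "papa", "mama"),
        # Hypothetical consanguineous family: siblings c and d have child e.
        ("fam2", "a", "0", "0"), ("fam2", "b", "0", "0"),
        ("fam2", "c", "a", "b"), ("fam2", "d", "a", "b"),
        ("fam2", "e", "c", "d"),
    ]
    print("families with loops:", find_families_with_loops(rows))
```

Any family reported here would need at least one of its members listed in the loop-breaker file; which individual to list is a modeling decision this sketch does not make.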
Kelvin-original, in addition to its input files, requires a configuration file for usage. Once the configuration file is created, Kelvin is invoked as Kelvin <configfile>. Additional details on creation of same and the use of Kelvin-original can be found in the Kelvin-original usage documentation.
Kelvin-LKS has a multi-step startup and invocation process:
ready_kelvin_lks_analysis.sh. This will create two new scripts in the folder -
settings.sh- and create a
settings.sh. Some are necessary (INPUT_* parameters are there to indicate where and how to find your data), some are helpful (ANALYSIS will allow you to receive helpful progress emails and warn you when things go wrong), and some are strictly optional (such as the MCMC_* parameters). They're all documented right there in the file, so that should help.
Final results of a Kelvin-LKS analysis will be in the file
Kelvin-formatted PPL output can be easily visualized using our graphing application, Kelviz. Kelviz is distributed separately; information on same can be found in the Kelviz documentation and downloads can be found on the Kelvin website.