SOCS
SOCS is a reference-based, ungapped alignment tool designed for mapping both standard SOLiD data and bisulfite transformed SOLiD data.
For usage instructions, see Usage.html, which is included in the package and posted in Docs and Data. Standard sample data are available at GEO, accession GSE13543. Bisulfite sample data are available from Applied Biosystems.
OS: MacOSX/Unix/Linux
SOCS was developed at the Georgia Institute of Technology and the National Biodefense Analysis & Countermeasures Center with assistance from Life Technologies.
SOCS is licensed under the GNU General Public License.
Features
| Bisulfite capability |
For methylation studies utilizing bisulfite sequencing, alignments can tolerate any bisulfite-induced nucleotide substitutions while still tolerating a user-specified number of additional color-space errors. Although runtimes are significantly longer in this mode, the alignments are free of the bias caused by aligning to reference sequences with simulated conversion. See Variant mode in the Options section of Usage.html. |
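As a rough illustration of what "tolerant of bisulfite-induced substitutions" means, the sketch below counts mismatches in an ungapped comparison while ignoring C-to-T differences (or G-to-A for reads from the opposite strand). It is a simplification in sequence space only; SOCS itself aligns in color space, and the function name `bisulfite_mismatches` and its `strand` parameter are hypothetical, not part of SOCS.

```python
def bisulfite_mismatches(ref, read, strand="+"):
    """Count mismatches between equal-length sequences, ignoring
    substitutions that bisulfite conversion could explain.

    Assumption: unmethylated C reads as T on the "+" strand, which
    appears as G read as A for reads from the "-" strand.
    """
    if len(ref) != len(read):
        raise ValueError("ungapped comparison needs equal lengths")
    tolerated = ("C", "T") if strand == "+" else ("G", "A")
    mismatches = 0
    for ref_base, read_base in zip(ref, read):
        if ref_base == read_base or (ref_base, read_base) == tolerated:
            continue  # exact match or bisulfite-explainable difference
        mismatches += 1
    return mismatches
```

Because the comparison is made against the unconverted reference, no in-silico converted copy of the reference is needed, which is the source of bias the paragraph above refers to.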
| Short variant detection |
Short sequence-space variants of a user-specified length (typically 1, for SNPs) can be inferred from valid color-space variants. See Variant mode in the Options section of Usage.html. |
| Census function |
SOCS can optionally compute read coverage depth per nucleotide of the reference. Census of short variants and bisulfite transformations can also be computed. |
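Per-nucleotide coverage for ungapped alignments can be computed as in the following minimal sketch. The function `coverage_depth` and the `(start, length)` tuple layout are assumptions made for illustration; they do not reflect the SOCS census file format or its internal method.

```python
def coverage_depth(ref_length, alignments):
    """Return read depth at each reference position.

    alignments: iterable of (start, length) pairs, 0-based, ungapped.
    Uses a difference array so each read costs O(1) regardless of length.
    """
    diff = [0] * (ref_length + 1)
    for start, length in alignments:
        if start < 0 or start >= ref_length:
            continue  # skip reads that begin outside the reference
        end = min(start + length, ref_length)  # clip reads at the reference end
        diff[start] += 1
        diff[end] -= 1
    depth = []
    running = 0
    for delta in diff[:ref_length]:
        running += delta
        depth.append(running)
    return depth

depth = coverage_depth(10, [(0, 4), (2, 4), (8, 5)])
```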
| Multiple reference files |
Multiple reference files can be provided, each with multiple sequences. The files can be in FASTA or 2bit format and can be mixed in a single alignment. SOCS will find the optimal alignment for each read across all reference files provided. |
Publications
Ondov BD, Varadarajan A, Passalacqua KD, Bergman NH. Efficient mapping of ABI SOLiD sequence data to a reference genome for functional genomic applications. Bioinformatics 2008 Dec 1;24(23):2776-7. Full text
Ondov BD, Cochran C, Landers M, Meredith GD, Dudas M, Bergman NH. An alignment algorithm for bisulfite sequencing using the Applied Biosystems SOLiD System. Bioinformatics 2010 Aug 1;26(15):1901-2. Full text
Contact
Please contact Nicholas Bergman, PhD at the National Biodefense Analysis & Countermeasures Center with any questions, problems, or suggestions.
bergmann@nbacc.net
| Recent News |
Version 2.1 Nicholas Bergman 2011-04-04
Version 2.1 adds built-in read filtering and fixes 2 important bugs:
- Incorrect reference files were reported when previous versions of SOCS were run on multiple computers, each with local copies of the reference files.
- Previous versions never chose the first ambiguous mapping position in random mode.
Also note that 2.1 reports read names instead of read numbers in the alignment file.... |
Version 2.0 Nicholas Bergman 2010-03-05
Version 2.0 is now available. Major updates include:
- bisulfite mode
- multi-sequence fasta and 2bit support
- 64bit-aware
A complete description of changes can be found in Version_2_guide.html under Docs and Data. |
Crashes with SOLiD 3 data Nicholas Bergman 2009-10-08
Several users have encountered crashes that were traced to uncalled colors, which appear as "." in the csfasta file. A read filtering script has been posted with 1.2.1, and we recommend running it on any SOLiD 3 data before running SOCS.
Scripts and usage information are available in scripts.tar. |
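For readers who want to see what such filtering involves, here is a minimal sketch that drops reads containing an uncalled color. It is not the script distributed in scripts.tar; the function name `filter_csfasta` is hypothetical, and the sketch assumes the usual csfasta layout of a ">" header line followed by one line of colors, with "#" comment lines.

```python
def filter_csfasta(lines):
    """Return (header, colors) pairs for reads with no uncalled colors.

    lines: iterable of csfasta lines. Comment lines starting with "#"
    and blank lines are skipped.
    """
    kept = []
    header = None
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            header = line
        elif header is not None:
            if "." not in line:  # "." marks an uncalled color
                kept.append((header, line))
            header = None
    return kept

sample = [
    "# example comment",
    ">1_1_1_F3", "T0120",
    ">1_1_2_F3", "T01.0",
    ">1_1_3_F3", "T3321",
]
kept = filter_csfasta(sample)
```

In this example the second read is discarded because it contains ".", and the other two are kept in their original order.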
User feedback for version 2 Nicholas Bergman 2009-04-09
We are currently working on version 2 of SOCS and would like to hear how it can be improved. Let us know what kinds of features you would like to see or what kinds of bottlenecks SOCS causes in your workflow.
Here are some features we are already working on:
- bisulfite mode
- multiple contig support
- 64-bit version
To leave feedback, email socs@biology.gatech.edu or post on the thread ... |
Version 1.2.1 - Important fix Nicholas Bergman 2009-01-21
Previous versions contained a bug that affected best match files (.best.txt) for mappings that took more than one round. The read number column restarted at 0 for each round (instead of representing the absolute order in the csfasta file).
Score files (.map, .amb, .imm) were not affected.
.imm files have also been renamed to .vmm (valid mismatch). |