Scripts and Example Code
Here you can find a collection of bioinformatics and fun scripts that I've worked on, as well as several examples of how to do clever things in Mathematica, especially things pertaining to high-performance and parallel computing.
jQuery Performance
April 21st, 2012 | By: Zach
jQuery is a nice tool for [sometimes] simplifying code and enhancing cross-browser compatibility, but it's never as fast as native JS, since it runs on top of the same native DOM methods. Here are some examples of the differences in timing:
| jQuery | Time (µs) | Native | Time (µs) | Differential |
|---|---|---|---|---|
| $("#elem") | 9.9 | document.getElementById("elem") | 0.3 | 33X |
| $(this).attr("id") | 16.9 | this.id | 0.2 | 85X |
| element.click(...) | 29.0 | element.addEventListener(...) | 8.3 | 3X |
Times are in microseconds, averaged over 10,000 calls. Evaluated in Firefox 14a1; times were similar in Chrome.
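The Differential column is the ratio of the jQuery time to the native time. The short Python sketch below is used only for illustration. It recomputes that ratio from the rounded times in the table and gives about 33.0, 84.5 and 3.5. These are close to the 33X, 85X and 3X listed. The small differences are presumably due to the published times being rounded, though that is an assumption.

```python
# Timings from the table, in microseconds: (jQuery, native).
timings = {
    '$("#elem") vs getElementById': (9.9, 0.3),
    '$(this).attr("id") vs this.id': (16.9, 0.2),
    "element.click vs addEventListener": (29.0, 8.3),
}


def slowdown(jquery_us, native_us):
    """How many times slower the jQuery call is than the native one."""
    return jquery_us / native_us


for name, (jq, native) in timings.items():
    print(f"{name}: {slowdown(jq, native):.1f}X")
```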
SNiPnfo
May 11th, 2010 | By: Zach
At first glance, this code doesn't do much. All it does is take an SNP file from cns2snp (from the Maq suite) and make a new table that says which gene, if any, each SNP falls in, and whether the SNP is synonymous or, if not, what the amino acid change is.
At second glance, this code does a lot. In addition to adding the two columns described above, it corrects Maq's read-depth column, which is stored in one byte and is therefore capped at 255. It adds a column for allele frequency. It directly parses GenBank XML files (or at least it tries, when NCBI is nice enough to publish a file with a consistent format), handles multi-exon and minus-strand genes, and makes full use of IUPAC ambiguity codes when determining the amino acid change. It can process SNPs from any creature with a GenBank XML file, not just those in Ensembl or human. It can also easily be customized to output more information about the proteins than just their names. Finally, it lists a BLOSUM62 score for each amino acid substitution.
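The core annotation step can be sketched in Python. This is not the Mathematica code from the notebook. The sketch maps a genomic position onto the spliced coding sequence for a plus- or minus-strand gene. It then translates the reference codon and every codon allowed by the SNP's IUPAC call. It assumes you already have the spliced CDS (in transcript orientation) and 1-based exon coordinates, for example pulled from the GenBank record. The function names, the example gene and the `M1T`-style output format are hypothetical. BLOSUM62 scoring and the read-depth fix are left out.

```python
# Standard genetic code; codons are indexed using the base order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# IUPAC nucleotide ambiguity codes and the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def translate(codon):
    """One-letter amino acid ('*' = stop) for a three-base codon."""
    i, j, k = (BASES.index(b) for b in codon)
    return AMINO_ACIDS[16 * i + 4 * j + k]


def cds_offset(pos, exons, strand):
    """0-based offset of genomic position `pos` in the spliced CDS, or None.

    `exons` are (start, end) pairs, 1-based and inclusive, sorted by start.
    On the minus strand the CDS is read from the highest coordinate down.
    """
    ordered = exons if strand == "+" else list(reversed(exons))
    offset = 0
    for start, end in ordered:
        if start <= pos <= end:
            return offset + (pos - start if strand == "+" else end - pos)
        offset += end - start + 1
    return None  # intron or outside the gene


def classify_snp(pos, call, cds, exons, strand):
    """Describe a SNP whose plus-strand consensus call (IUPAC code) is `call`.

    `cds` is the spliced coding sequence in transcript orientation.
    """
    offset = cds_offset(pos, exons, strand)
    if offset is None:
        return "non-coding"
    codon_start, i = offset - offset % 3, offset % 3
    ref_codon = cds[codon_start:codon_start + 3]
    ref_aa = translate(ref_codon)
    alleles = IUPAC[call]
    if strand == "-":  # the call is on the plus strand; the CDS is not
        alleles = "".join(COMPLEMENT[b] for b in alleles)
    # Translate the codon once for every base the ambiguity code allows.
    alt_aas = {translate(ref_codon[:i] + b + ref_codon[i + 1:]) for b in alleles}
    changed = sorted(alt_aas - {ref_aa})
    if not changed:
        return "synonymous"
    return f"{ref_aa}{offset // 3 + 1}{'/'.join(changed)}"


exons = [(101, 103), (201, 206)]  # hypothetical two-exon gene
print(classify_snp(102, "C", "ATGGCTTGA", exons, "+"))  # ATG -> ACG
print(classify_snp(102, "R", "ATGGCTTGA", exons, "+"))  # ATG -> AAG or AGG
print(classify_snp(203, "Y", "ATGGCTTGA", exons, "+"))  # GCT -> GCC or GCT
print(classify_snp(150, "A", "ATGGCTTGA", exons, "+"))  # falls in the intron
```

Ambiguity codes are expanded rather than collapsed to a single base. A heterozygous call such as `R` can therefore report several possible amino acids at once.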
Due to the nature of GenBank XML files, you will occasionally have to get out and push the code along. That's the nice thing about Mathematica, though: it's oh-so-easy to do that. Please report bugs, issues and comments to bjornson*stanford.edu. The only testing I've done is on viral genomes.
Future versions will parse something other than the XML files to avoid NCBI's file format issues.
Current version: 0.1 5/11/2010 (right click and Save As... to avoid seeing a plain text notebook)
Iterative Elimination Sudoku Solver
January 4th, 2010 | By: Zach
I doubt this is the first time someone has come up with this, but it's not included in Wikipedia's article on Sudoku Algorithmics. It's smarter than brute force, but dumber than finding vertex solutions. It's fast, and it will always find a solution.
How it works: Calculate which numbers are allowed to fill each blank (the numbers that don't already appear in that blank's row, column and 3x3 square). For the easiest puzzles, you'll find at least one blank that has only one possible value. After you fill in each such blank with its one possible value, you go back to the beginning and recalculate which numbers are allowed in each remaining blank. Usually this takes only 6-10 iterations. For more advanced puzzles, there will be no blank with only one possible value. In that case, the algorithm picks the cell with the fewest possible options (usually two for human-solvable puzzles), randomly picks one of those options, and then proceeds as before, seeking unique values. This takes slightly longer, between 6 and 50 iterations, depending on how many cells have more than one possibility and whether the first random choice is correct.
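The procedure above can be sketched in Python, since the Mathematica notebook isn't up yet. The description doesn't say exactly what happens when a random choice turns out to be wrong. This sketch assumes plain backtracking: the other options for that cell are tried in turn. That is one way to make sure a solvable puzzle always gets solved. Blanks are stored as `0`. The function names are hypothetical.

```python
import random


def candidates(grid, r, c):
    """Numbers not already used in cell (r, c)'s row, column or 3x3 square."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [v for v in range(1, 10) if v not in used]


def solve(grid, rng=None):
    """Return a solved copy of grid (0 = blank), or None if no solution exists."""
    if rng is None:
        rng = random.Random()
    grid = [row[:] for row in grid]
    while True:
        blanks = [(r, c) for r in range(9) for c in range(9) if grid[r][c] == 0]
        if not blanks:
            return grid
        options = {cell: candidates(grid, *cell) for cell in blanks}
        if any(not opts for opts in options.values()):
            return None  # a blank has no legal value, so an earlier guess was wrong
        singles = [(cell, opts[0]) for cell, opts in options.items() if len(opts) == 1]
        if singles:
            for (r, c), v in singles:
                # Two singles in one row, column or square may want the same
                # number; re-check before placing so the clash is caught.
                if v not in candidates(grid, r, c):
                    return None
                grid[r][c] = v
            continue  # next iteration: recompute candidates for the remaining blanks
        # No single left: guess among the options of the most constrained blank.
        r, c = min(blanks, key=lambda cell: len(options[cell]))
        choices = list(options[(r, c)])
        rng.shuffle(choices)
        for v in choices:
            trial = [row[:] for row in grid]
            trial[r][c] = v
            result = solve(trial, rng)
            if result is not None:
                return result
        return None  # every option for this blank led to a contradiction


PUZZLE = [
    "53..7....", "6..195...", ".98....6.",
    "8...6...3", "4..8.3..1", "7...2...6",
    ".6....28.", "...419..5", "....8..79",
]
puzzle = [[0 if ch == "." else int(ch) for ch in row] for row in PUZZLE]
solution = solve(puzzle, random.Random(0))
for row in solution:
    print("".join(map(str, row)))
```

Each pass of the `while` loop corresponds to one "iteration" in the description. The recursion only starts once no blank is left with a single possible value.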
Example
Input (an easy puzzle):

One iteration, 4 new numbers gained:

Two iterations, 7 new numbers gained:

...Six iterations, completed:
Notebook to be uploaded.
Parallelization of Common Operations
January 4th, 2010 | By: Zach
Includes: Loading files in parallel, operating file streams in parallel, batch application of any script to files in a directory.
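The notebook isn't up yet. The last item, applying a script to every file in a directory in parallel, can be roughly sketched in Python as a stand-in. This is not the Mathematica code. `apply_to_directory` and `count_lines` are hypothetical names. The sketch uses threads on the assumption that loading files is mostly I/O-bound. For CPU-heavy per-file work in Python, `ProcessPoolExecutor` is usually the better fit because of the GIL.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def apply_to_directory(func, directory, pattern="*", max_workers=4):
    """Apply `func` to every file in `directory` matching `pattern`, in parallel.

    Returns a dict mapping each file name to func's result.
    """
    paths = sorted(p for p in Path(directory).glob(pattern) if p.is_file())
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(func, paths)  # results come back in the order of `paths`
        return {p.name: r for p, r in zip(paths, results)}


def count_lines(path):
    """Example per-file task: load the file and count its lines."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)
```

For example, `apply_to_directory(count_lines, "reads", "*.txt")` would return each matching file name mapped to its line count.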
To be uploaded.
Setting up a Mathematica Compute Cluster
August 4th, 2009 | By: Zach
Ideally you'd use the Wolfram Grid software, but if you're tooling... The Wolfram documentation for setting up your own Mathematica cluster is slim, and it took me a few hours to figure out how to do this. These directions will enable you to use one or more slave computers as an expandable computer cluster. This might not be the easiest method, but it certainly works and performs well. (That is to say, there's probably a way to do it without plink.)
This particular documentation is for a Windows master and Linux slaves, but it is easy to adapt to any platform.
On your slaves, in any order...
1. Establish a static IP address. For me, and for others with a good networking services department, this means requesting a static hostname through that department and configuring Xubuntu to use that static IP address. You could also set this up with a router, but routers are often prohibited at schools and businesses, where they can interfere with the larger network.
2. Enable sshd. At a terminal, type:
sudo apt-get install openssh-server openssh-client
3. Install Mathematica.
On your master, in this order...
1. Install Plink. Download it from
http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
and save it to, e.g., the Program Files directory. You might want PuTTY as well, but you probably won't need it.
2. Open Mathematica and go to Evaluation > Parallel Kernel Configuration. Click on the Remote Kernels tab. Click "Add Host."
3. Type the hostname or IP address (both will work) in the Hostname box.
4. Select "Use custom launch command" and change it to this:
C:\Progra~2\plink.exe username@192.168.0.1 -pw 123456 -x "math -mathlink -linkmode Connect `4` -linkname `2` -subkernel -noinit >& /dev/null &"
where C:\Progra~2\plink.exe is the location of plink.exe (it might be Progra~1 if you're not on a 64-bit machine), username is your username on the slave, 192.168.0.1 is the hostname or IP address of the slave and 123456 is the password for your login on the slave.
5. Restart Mathematica. (This is not strictly necessary; you can just evaluate LaunchKernels[], but sometimes Mathematica balks at your already-open kernels.) All of your kernels should launch. Test them with e.g. ParallelEvaluate[2^2048;//Timing] to see how your kernel speeds roughly compare. If you get an error about your remote kernel(s) not launching, try running the command from step 4 in a command shell. (Open cmd.exe and type C:\Progra~2\plink.exe username@192.168.0.1 -pw 123456 "math" with the replacements described above. You should get an In[1]:= prompt in the shell.) If that doesn't work, either PuTTY or your slave isn't configured properly.
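Two small helpers may make troubleshooting easier. Both are hypothetical and written in Python only for illustration. `ssh_reachable` checks that a slave accepts TCP connections on the SSH port (22 by default). If that fails, the problem is in the network or sshd setup rather than in Mathematica. `launch_command` assembles the step-4 launch command from its parts. The `` `4` `` and `` `2` `` slots are left exactly as written because Mathematica fills them in when it launches the kernel.

```python
import socket


def ssh_reachable(host, port=22, timeout=3.0):
    """True if a TCP connection to host:port succeeds (sshd is probably listening)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def launch_command(user, host, password, plink=r"C:\Progra~2\plink.exe"):
    """Build the custom launch command from step 4, leaving `4` and `2` untouched."""
    remote = ("math -mathlink -linkmode Connect `4` -linkname `2` "
              "-subkernel -noinit >& /dev/null &")
    return f'{plink} {user}@{host} -pw {password} -x "{remote}"'


print(launch_command("username", "192.168.0.1", "123456"))
# ssh_reachable("192.168.0.1")  # run against your own slave's address
```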