I’m happy to announce the release of zeptodb version 1.1. This release adds little new functionality; instead, it pulls in some gnulib modules to make the code more portable. I had used some handy functions provided by the GNU C Library that may not be present on other systems, and the gnulib modules supply that functionality on those systems. If you have GNU libc, you won’t notice any change (aside from a slightly longer configure script).
Head over to the zeptodb website to download it!
Also, to show how useful zeptodb can be, I ran a quick test of the constant-time, O(1), look-up provided by the GDBM library on which zeptodb is based. I had two databases filled with pairs of human genes and their genomic locations. One database had 5 genes; the other had 62192 genes:
$ zdbf --all gene_locations.db | wc -l
5
$ time echo "ENSG00000141194" | zdbf gene_locations.db
17:63133549-63223821:1
real 0m0.002s
user 0m0.000s
sys 0m0.000s
$ zdbf --all all_genes.db | wc -l
62192
$ time echo "ENSG00000141194" | zdbf all_genes.db
17:56232494-56233517:1
real 0m0.002s
user 0m0.000s
sys 0m0.000s
As you can see, even with tens of thousands of records, the record was fetched in the same 2 ms of wall-clock time as from the five-record database. In the middle of a pipeline that is passing around huge amounts of data, these time savings can really add up!
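
If you want to try a similar comparison without zeptodb installed, here is a minimal sketch in Python. It is not how zdbf works internally. It uses the standard-library dbm module, which opens whichever key/value backend is available on your system (dbm.gnu, the GDBM binding, if it is installed). The gene IDs and locations are synthetic placeholders, and build_db, time_lookup and compare are names made up for this example.

```python
import dbm
import os
import tempfile
import time


def build_db(path, n_records):
    """Create a key/value database holding n_records synthetic entries."""
    with dbm.open(path, "n") as db:
        for i in range(n_records):
            # Placeholder IDs and locations, not real Ensembl data.
            key = f"GENE{i:011d}".encode()
            db[key] = f"1:{i}-{i + 1000}:1".encode()


def time_lookup(path, key, repeats=1000):
    """Fetch one key repeatedly; return (value, average seconds per fetch)."""
    with dbm.open(path, "r") as db:
        start = time.perf_counter()
        for _ in range(repeats):
            value = db[key]
        elapsed = time.perf_counter() - start
    return value, elapsed / repeats


def compare(small_n=5, large_n=20000):
    """Look up the same key in a small and a large database."""
    key = b"GENE00000000003"  # present in both databases
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for label, n in (("small", small_n), ("large", large_n)):
            path = os.path.join(tmp, label)
            build_db(path, n)
            results[label] = time_lookup(path, key)
    return results


if __name__ == "__main__":
    for label, (value, avg) in compare().items():
        print(f"{label}: {value.decode()} in {avg * 1e6:.1f} microseconds per fetch")
```

Timing the fetch inside one process keeps process start-up out of the measurement, which otherwise dominates a 2 ms run of a whole command. With a hashed backend, the per-fetch averages for the two databases should be of the same order, though the exact numbers depend on the backend and on file caching.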