Archive for the ‘Databases’ Category

On metagenomics

Tuesday, March 13th, 2007

Konrad was the first this morning to hint at the release of Venter’s effort to provide environmental sequencing samples from the world’s oceans. The data is backed by several papers in PLoS Biology and the new CAMERA database. Other bloggers have followed and the mainstream media will pick it up soon.
What to add on a busy day like today? The results might not be breathtaking, but that was just as true for the release of the human genome project back in 2000. Sequencing the human genome was a necessity - but the environmental samples provide a completely new picture of our planet, even if our initial view is warped and noisy and our ways of understanding the data are limited.

Freebase, another centralized database of the web

Sunday, March 11th, 2007

The newly founded company Metaweb recently announced Freebase, an effort to organize the world’s information by creating meaningful connections between the different data sources. The database will be released under a Creative Commons by-attribution license.
The major difference from Google Base, from what little I can see at this point, is that the company actively takes part in integrating the different data and in creating a meaningful way to navigate rather than search it. Freebase already incorporates Wikipedia and other smaller projects and should allow future content to be connected to this resource in a controlled manner. However, as the database is in early alpha and only open by invitation, it’s premature to discuss its use for hard scientific data.
There’s more on the matter in Konrad’s post and comprehensive coverage on bbgm.

Update: An informative sneak preview and the news that BioMedCentral content will be incorporated.

Dangling on String

Monday, January 29th, 2007

Singling out my favorite amongst the 174 biological information resources in the current database issue of Nucleic Acids Research is easily achieved: String, a protein-protein interaction database primarily developed in the group of Peer Bork at the EMBL, was updated to version 7, introducing many small and a few major improvements, and should finally be covered here.
(more…)

Minimal protein-protein interaction publication standards

Thursday, October 12th, 2006

Nature Biotech has opened a new section, Community Consultation, which aims to involve the scientific community in the development of standards for publications. The first manuscript for review discusses The Minimum Information required for reporting a Molecular Interaction Experiment (MIMIx).

The authors comprise many important people in the interacting proteins field, both experimentalists and bioinformaticians associated with the development of databases such as DIP, BOND and IntAct. One important focus is to enforce unique identifiers for biomolecules; I was pleased to see that the experimental role (such as “bait” in a biochemical purification experiment) is enforced too, as it was missing from many databases and is often neglected in bioinformatics network analysis.

The bioinformatics community will benefit from these standards most. Let’s hope that all publishers will enforce them consistently.
[Via Pedro Beltrao. Nature could really give this broader coverage; they should be proud of it.]

DILS’06: Last day

Saturday, July 22nd, 2006

The concluding day of the DILS’06 workshop started with reviews of several well known projects: Taverna (workflows), BioMoby (webservices) and BioMart (data management/retrieval). If you require any such service, check them out; they are all well established projects with active communities (as most of the people reading this probably know). The main subject of the day - workflows - definitely convinced me to explore tools such as Taverna or Kepler again.

Noteworthy: Simon Mercer from Microsoft Research presented how Microsoft supports bioinformatics research in academia, including projects small and large, e.g. OpenWetWare. IP generated in these projects remains with the academic scientists; Microsoft basically provides financial support and receives insight into current research.

All in all, the workshop exceeded my expectations. The talks delivered much more than buzzwords despite - or maybe because of - being targeted at a small, experienced audience. The venue of the workshop, the Wellcome Trust Conference Center on the Genome Campus, provides the right environment and infrastructure, and the setup allowed for easy mixing with participants.
The field of data integration advances - not solving all problems as quickly as one would hope, but I am convinced that webservices, data marts and workflows will replace Perl hacking in many places. Let’s see.

DILS’06: Day 2

Friday, July 21st, 2006

The auditorium is well populated. The European heat wave might contribute but I hear many rather happy remarks over coffee.

My personal highlight for today was Attempto Controlled English (ACE), a Controlled Natural Language approach that provides descriptions which are both human and computer readable, presented by Tobias Kuhn from the University of Zürich.

(more…)

DILS’06: Opening keynote

Thursday, July 20th, 2006

The workshop started with a strong keynote by Victor Markowitz, who provided an overview of, opinions on and applications of biological data integration. Many scientists in both Computer Science and Biology view the topic as a necessary evil at best, and I might have been observed to support that view on one day or another, but in this convincing keynote most acronyms were replaced by insight.

(more…)

DILS’06: Expectations

Thursday, July 20th, 2006

The 3rd International Workshop on Data Integration in the Life Sciences 2006 (DILS’06), starting today at the European Bioinformatics Institute in Hinxton, UK, is one of those workshops for die-hard bioinformaticians with the vision of structured, interoperable data sources in biology. Peter Norvig’s recent criticism of the semantic web, neatly summarized by Duncan at Nodalpoint, probably affects the majority of the research performed by the attending scientists. While I do share many of Norvig’s gloomy prospects on the lack of compliance with standards (etc.), giving up and letting our current data babel continue feels like folding before picking up the cards. Just proposing data schemes by one consortium or another is certainly insufficient, and we need to change the way we communicate and possibly conduct scientific research too - and changes are underway.
Back to the workshop: the talks won’t be breathtaking. Even well done presentations on ontologies, database schemas and scientific workflows probably make excellent yet tough contributions to a PowerPoint Karaoke line-up. Attending ten of them in a day can be tiring for someone with a biological focus, and one might get more from studying the projects’ websites. Frankly, my main motivation is to meet and discover people with whom to discuss and reflect on some concrete project ideas. Good talks would probably help, but the workshop provides ample opportunity for discussions.

N.B. After re-reading Lem’s Futurological Congress recently, I wanted to reflect on its slander of science tourism in the next conference coverage. However, getting up at 3.45 a.m. and strolling through Cambridgeshire expelled any witticism on my part for now.