Protein binding sites are often represented by "consensus sequences," such as TATA(A/T)A(A/T), which report the most common nucleotide at any given position but eliminate much of the possible variability. In 1990, National Institutes of Health research biologist Tom Schneider developed an alternative, graphical approach, called sequence logos.
In a sequence logo, the total height of the stack of letters at each position measures how well conserved that position is, while the height of each character within the stack reflects its relative frequency. Thus, where a consensus sequence might mark a position as C/T, a sequence logo could indicate that C is actually observed five times more often.
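To make the two heights concrete, here is a minimal Python sketch of the arithmetic behind a DNA logo. It assumes the usual information-content measure, in which a column's stack height in bits is log2(4) minus the column's Shannon entropy, and each letter's height is its frequency times that stack height. The function name `logo_heights` is invented for this illustration, and the sketch leaves out the small-sample correction that real logo software applies.

```python
import math
from collections import Counter


def logo_heights(sequences, alphabet="ACGT"):
    """Return one dict per aligned column, mapping each letter to its height in bits.

    Illustrative only: no small-sample correction is applied.
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")

    max_bits = math.log2(len(alphabet))  # 2 bits for DNA
    columns = []
    for i in range(length):
        counts = Counter(s[i] for s in sequences)
        total = sum(counts[a] for a in alphabet)
        if total == 0:
            raise ValueError(f"column {i} contains no letters from the alphabet")
        freqs = {a: counts[a] / total for a in alphabet}
        # Shannon entropy of the column; letters with zero frequency contribute nothing.
        entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
        info = max_bits - entropy  # total stack height: higher means more conserved
        # Each letter takes a share of the stack proportional to its frequency.
        columns.append({a: f * info for a, f in freqs.items()})
    return columns


# Six aligned two-letter sites: position 1 is always A, position 2 is C five times and T once.
sites = ["AC"] * 5 + ["AT"]
for position, heights in enumerate(logo_heights(sites), start=1):
    print(position, {letter: round(h, 3) for letter, h in heights.items()})
```

In this toy alignment the fully conserved first position reaches the 2-bit maximum, while the second position gets a shorter stack in which C stands five times taller than T. That five-to-one ratio is exactly what a C/T entry in a consensus sequence would hide.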
Steven Brenner, an associate professor at the University of California, Berkeley, says Schneider's logo-generation software "was very, very hard for typical biologists to make use of." So in 1994, while a graduate student at the University of Cambridge, UK, Brenner developed a Web version called WebLogo. It would take another...