The repertoire of transcription factors (TFs) in the human genome (~1500 TFs) was determined only recently and is still unknown for most other sequenced genomes. The main obstacle to determining the exact TF content of other genomes is the insufficient quality of their draft assemblies, which makes it difficult to identify TFs in complex genomic regions. Furthermore, the lack of transcript information (mRNA, cDNA, or EST sequences) makes it difficult to determine the sequence of the transcribed genes, because promoters, open reading frames, and splice sites have to be predicted purely from genomic features and sequence conservation with other species. We take advantage of improved genomic information and the increasing amounts of transcript data provided by RNA-Seq to computationally identify all TFs in primate genomes and to manually curate gene models for TFs in human, chimpanzee, orangutan, rhesus macaque, and marmoset. We are using our high-quality TF gene models to reveal lineage- and species-specific TFs, TFs that have lineage- or species-specific changes in functional domains, and TFs under positive selection.
Only a small proportion of TFs has been functionally characterized. Very little is known about many gene families, and this gap is especially pronounced for the largest TF family in mammalian genomes: the KRAB-ZNFs. The importance of TFs for phenotypic differences and speciation has been established for various examples (e.g. PRDM9, FOXP2, EGR1, BMP4). To experimentally determine their evolutionary impact, we are focusing on human-specific TFs, TFs with human-specific domain changes, and TFs that are connected in gene regulatory networks in a human-specific way. For instance, we perform ChIP-Seq experiments in human and chimpanzee cell lines to identify the binding sites of the TFs in both species. Furthermore, we manipulate the expression levels of the TFs in cell lines of both species (knock-down and overexpression), followed by RNA-Seq to determine downstream targets. These experiments will not only give us insight into the function of the selected TFs but, more importantly, into their functional changes during evolution.
TFs regulate their target genes in a concerted, combinatorial fashion, often forming large and complex gene regulatory networks. Little is known about the evolution of such networks, about the amount of noise or redundancy in them, and about the importance of the gain or loss of nodes (genes) or links (interactions). Based on transcriptome information, we have previously identified a network of TFs that is active in the prefrontal cortex and is characterized by significant link changes between humans and chimpanzees. It appears that this network was involved in shaping some phenotypic differences, such as the larger human brain and its higher energy consumption. We are now investigating this TF network in other primates to reveal its evolutionary history. Furthermore, we are interested in network differences between human populations.
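To make the notion of link changes between two species concrete, here is a minimal Python sketch. It assumes each network is available as a list of undirected gene pairs; the function name `link_changes` and the gene labels are hypothetical placeholders, not the group's actual pipeline or data.

```python
def link_changes(edges_a, edges_b):
    """Compare two undirected networks given as iterables of (gene, gene) pairs."""
    # frozenset makes ("TF1", "TF2") and ("TF2", "TF1") the same link
    a = {frozenset(edge) for edge in edges_a}
    b = {frozenset(edge) for edge in edges_b}
    shared = a & b
    union = a | b
    return {
        "shared": shared,   # links present in both networks
        "only_a": a - b,    # links found only in the first network
        "only_b": b - a,    # links found only in the second network
        # Jaccard index of the two link sets (1.0 if both networks are empty)
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Toy placeholder links, not real data
human = [("TF1", "TF2"), ("TF2", "TF3"), ("TF3", "TF4")]
chimp = [("TF2", "TF1"), ("TF3", "TF4"), ("TF1", "TF4")]
result = link_changes(human, chimp)
```

A set-based comparison like this only counts the presence or absence of links; assessing whether a change is significant would additionally require a statistical model of the underlying expression data.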
Joint project with Professor Peter Stadler (Bioinformatics group Leipzig).
Long non-coding RNAs (lncRNAs) are emerging as key players in the nervous system. Many of the approximately 15,000 human lncRNAs are expressed in the brain, and multiple lines of evidence have linked them to important brain functions, such as neurogenesis and behavior, or have associated them with neurodegenerative and psychiatric diseases. Although several databases for lncRNAs exist, there is still a large gap in the structural and functional annotation of lncRNAs, which hinders a full understanding of their role in the nervous system. Many characteristics of the brain are human-specific. Genes that evolve quickly, as lncRNAs do, are therefore prime candidates for having driven the evolution of these innovations. Since biological function has to be studied in the light of evolution, we aim to establish a full catalog of human lncRNAs, including an annotation of their sequence, structure, expression, and evolutionary changes, by collating and coherently re-analyzing the wealth of already available high-throughput data. We will experimentally determine target genes for one selected lncRNA and, for the other lncRNA genes, use computational methods to provide insights into their function and involvement in gene regulatory networks. These results, together with the custom brain-lncRNA chip we plan to develop, will set the stage for a thorough functional characterization of lncRNAs in the brain during the second funding period.
Joint project with Dr. Rui Faria (Research Center in Biodiversity and Genetic Resources, Porto, Portugal).
Alterations in the spatial organization of the genome, i.e. chromosomal rearrangements (CRs), can cause divergence in coding sequences, in the sequences of regulatory regions, and in gene expression. Our working hypothesis is that these types of changes are correlated with hybrid sterility and/or inviability, and thus can lead to speciation. We are testing this hypothesis by evaluating the relative contributions of CRs and of changes in the composition of gene neighborhoods to gene expression divergence between species. We further analyze whether sequence and expression divergence are correlated. In addition, we aim to investigate how gene regulatory and co-expression networks that contain genes with a changed genomic location differ between species. By exploring these networks and the functions of genes that are located within CRs and display high expression divergence, we aim to gain insight into why hybrids are sterile or inviable.
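One simple way to ask whether sequence and expression divergence are correlated is a rank correlation across genes. The sketch below is illustrative only: the text does not state which statistic is used, so Spearman's rho is an assumption, and the function names and per-gene values are hypothetical placeholders.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Toy placeholder values, one entry per gene; not real measurements
seq_div = [0.01, 0.05, 0.02, 0.10, 0.07]   # e.g. a sequence divergence measure
expr_div = [0.2, 0.9, 0.4, 1.5, 1.1]       # e.g. an expression divergence measure
rho = spearman(seq_div, expr_div)
```

A rank-based measure is used here because it makes no assumption of a linear relationship between the two divergence measures; a real analysis would also need a significance test and a comparison of genes inside and outside CRs.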
Accelerated Evolution in Chromosomal Rearrangements and Speciation in Lacertid Lizards
Joint project with Prof. Martin Schlegel, Prof. Peter Stadler, Dr. Klaus Henle, and Dr. Rui Faria within the German Centre for Integrative Biodiversity Research (iDiv).
During the process of speciation, individuals of two populations acquire genetic differences that lead to reproductive isolation. According to Suppressed Recombination Models (SRMs) of chromosomal speciation, genetic divergence can accumulate quickly within regions of low recombination, such as chromosomal rearrangements (CRs). We are testing the hypothesis that CRs are associated with accelerated evolution driving speciation by studying two species of lizards (Lacerta viridis and L. bilineata) that separated recently, during the Pleistocene. To this end, we are sequencing, assembling, and comparing their genomes to search for accumulated divergence near breakpoints and within rearranged regions. We are also sequencing the transcriptomes of four individuals of each species, as well as four hybrids, for gene annotation and the identification of differential gene expression patterns. We are further developing a new gene assembly method to detect CRs. In the spirit of iDiv, we want to contribute to a better understanding of the mechanisms of speciation and of how biodiversity emerges.