Tunca Doğan, PhD
Adjunct faculty member
Department of Health Informatics,
Graduate School of Informatics,
METU, 06800 Ankara, Turkey
Research fellow
Protein Function Development Team (UniProt database),
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI),
CB10 1SD Cambridge, UK
Work (Turkey): +90(312)2107778
Mobile (Turkey): +90(505)5250011
Mobile (UK): +44 7733 849182

ResearcherID: B-5274-2017
ORCID iD: 0000-0002-1298-9763
Alternative website: www.ebi.ac.uk/about/people/tunca-dogan




Motivation: Conventional similarity-based computational methods are able to annotate only about one third of the non-characterized proteins recorded in the UniProt Knowledgebase (UniProtKB). Here, we propose a novel approach that compares protein domain architectures, classifies proteins based on these similarities and propagates functional annotations using the Gene Ontology (GO). The results indicate that the approach is effective in detecting functional similarity, with an average F-score of 0.85. We applied the method to nearly 55 million uncharacterized proteins in UniProtKB and obtained ~45 million functional predictions. Dr Doğan conceived the methodology, carried out the experiments and wrote the manuscript, and is the corresponding author of this publication.

Motivation: Orphan genes/proteins usually remain completely uncharacterized until costly experimental analyses are carried out. In this paper, we propose a fully unsupervised and automated computational method that identifies evolutionarily conserved regions of functional importance, using a combination of sequence alignment and graph-theoretical approaches. We evaluated the biological relevance of the method and then applied it to a genome-wide dataset of human proteins. Analysis of the resulting conserved regions revealed a strong correspondence with experimentally annotated structural domains, suggesting that the method can be useful for predicting novel domains in protein sequences. Dr Doğan conceived the methodology, carried out the experiments and wrote the manuscript, and is the corresponding author of this publication.

Motivation: In recent years, a plethora of biological databases and tools have been deployed, in technically complex and diverse implementations, across a spectrum of disciplines. The documentation of these resources is fragmented across the Web, with much redundancy, and scientists often struggle to find, understand, compare and use the best resources for the task at hand. Here, we present a community-driven effort, supported by ELIXIR, that aims to build a comprehensive and consistent registry of information about bioinformatics resources through two components: curation and automated efforts. In this large-scale project, Dr Doğan was responsible for the automated efforts, using data and text mining techniques to classify these resources under various schemes and subgroups.




My current research in bioinformatics focuses on developing novel computational methods for biomolecular sequence analysis, protein function prediction and computational drug discovery, using statistical learning, data mining, machine learning and graph-theoretical approaches. As an interdisciplinary researcher, I approach biological problems from different angles, drawing on the fundamentals and techniques of related disciplines in order to propose novel and effective solutions to prevalent problems. My overall research philosophy is to use the knowledge obtained from valuable, hard-to-conduct wet-lab experiments to accurately model biological systems in silico, with the aim of supporting ongoing biomedical research.

To this end, I divide my work into three major parts: (i) integrating complementary biological data from various open-access data repositories to build a broader picture of current biological knowledge; (ii) developing novel in silico methods and applying them to large-scale biological data to predict what is missing from current knowledge; and (iii) further analysing specific parts of the resulting well-annotated biological data (enriched and completed with the in silico predictions) to gain biological insight. I place particular emphasis on publishing the data I produce via open-access repositories, where the whole research community can work on it to further our biological understanding of different subjects. In this respect, I am currently developing my skills in building and maintaining better resources.