Our goal is to collect and organize publicly accessible bibliographic metadata and research information into the Common Map of Academia. We aim to lower the barriers to entry for scientometric studies by performing the onerous task of data preparation and releasing the results to the public under an open license.
The Common Map of Academia is based solely on publicly available sources. In terms of volume, the two largest data sources are web pages processed via CommonCrawl and repositories harvested using the OAI-PMH protocol (both are already ingested). But we have an appetite for more: PDF files linked from the web pages, the Directory of Open Access Journals, arXiv.org, DBLP, NPG Linked Data, the Open Access subset of PubMed Central, and many more.
Do you know other publicly available sources of information? Please contact us!
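For readers unfamiliar with OAI-PMH harvesting, the sketch below shows roughly what one step of it involves: building a ListRecords request URL and pulling identifiers, Dublin Core titles, and the resumption token out of a response. It is only an illustration, not the COMAC harvester. The function names, the example.org endpoint, and the sample response are hypothetical; only the protocol verbs and XML namespaces come from the OAI-PMH specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, next_token) from a ListRecords response body."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        titles = [t.text for t in record.iterfind(".//dc:title", NS) if t.text]
        records.append({"identifier": identifier, "titles": titles})
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    # An empty or missing token means the list is complete.
    return records, (token.strip() or None)


# Made-up response used for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    records, token = parse_list_records(SAMPLE_RESPONSE)
    print(records, token)
    print(list_records_url("https://example.org/oai", token))
```

A real harvester would fetch each URL over HTTP and repeat the request with the returned token until no token remains.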
Under the Hood
We employ state-of-the-art machine learning techniques for tasks such as document deduplication, author name disambiguation, keyword extraction, and document analysis (page segmentation, zone classification). Thanks to Apache Hadoop and a modest cluster, we are able to handle tens of millions of records.
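To give a feel for what document deduplication means in practice, here is a deliberately simple sketch that groups records whose normalized title and year coincide. It is a plain key-based heuristic swapped in for illustration, not the machine learning approach used in COMAC. The record fields (`id`, `title`, `year`) and the function names are assumptions.

```python
import re
import unicodedata
from collections import defaultdict


def title_key(title):
    """Normalize a title: strip accents, lowercase, keep ASCII letters and digits."""
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.findall(r"[a-z0-9]+", stripped.lower()))


def group_duplicates(records):
    """Return lists of record ids that share a normalized title and year."""
    groups = defaultdict(list)
    for rec in records:
        key = (title_key(rec["title"]), rec.get("year"))
        groups[key].append(rec["id"])
    # Only groups with more than one member are duplicate candidates.
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    sample = [
        {"id": "a", "title": "Déjà Vu: A Study", "year": 2010},
        {"id": "b", "title": "Deja vu - a study", "year": 2010},
        {"id": "c", "title": "Another paper", "year": 2011},
    ]
    print(group_duplicates(sample))
```

Exact-key grouping like this misses near-duplicates with typos or truncated titles, which is one reason a learned similarity model is attractive at the scale of tens of millions of records.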
Download and Experience
Feel free to download the compressed RDF file containing COMAC data and use it in your data processing projects.
Explore the data visually with COMAC Navigator to better understand the connections present in COMAC.
Some parts of this software were created with co-financing from European Union funds as part of the European Regional Development Fund.