New approach to sharing data published in Nature Scientific Data
Machine learning methods like Alchemite™ extract value from the available data to support the design and optimisation of materials. But where can you find the data to enable such analysis?
One source is materials databases. A large and growing number of such databases now cover the structures and properties of millions of materials. They represent an explosion in the quantity of data that has been further accelerated by the advent of high-throughput experimental and computational techniques. But in this abundance lies a problem. How can we usefully collate and analyse materials data when every database has a different API (Application Programming Interface), the set of communication protocols and formats for accessing its data?
Intellegens’ CTO Dr Gareth Conduit has been working with other leading experts in an international project, OPTIMADE (Open Databases Integration for Materials Design), which is solving this problem by developing a universal API specification for materials databases. Version 1.0 is now available and was recently published in Nature Scientific Data. The work was also featured as a research highlight in Nature Reviews Materials. The new specification supports unified access to many leading crystal structure databases: AFLOW, COD, TCOD, Materials Cloud, Materials Project, NOMAD, odbx, Open Materials Database (omdb) and OQMD.
As a result, machine learning tools such as Alchemite™ will be able to gain ready access to all relevant data from across a rich set of materials data resources. They can then exploit that data to propose novel materials and guide experimental testing programs towards the most productive pathways.
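To illustrate what a shared API means in practice, here is a minimal Python sketch that builds the same structure query for several databases at once. It assumes the `/v1/structures` endpoint and the `filter` and `page_limit` query parameters described in the OPTIMADE specification. The provider names and base URLs are hypothetical placeholders, not real endpoints, and the helper functions `structures_url` and `build_queries` are invented for this example. The sketch only constructs URLs and makes no network requests.

```python
from urllib.parse import urlencode, quote

# Hypothetical provider base URLs, for illustration only. Real base URLs
# should be taken from each database's own documentation.
PROVIDERS = {
    "provider_a": "https://provider-a.example.org/optimade",
    "provider_b": "https://provider-b.example.org/optimade",
}


def structures_url(base_url, filter_expr, page_limit=10):
    """Build a structures query URL for one provider (hypothetical helper)."""
    # Percent-encode the filter so spaces and quotes are safe in a URL.
    query = urlencode(
        {"filter": filter_expr, "page_limit": page_limit},
        quote_via=quote,
    )
    return f"{base_url.rstrip('/')}/v1/structures?{query}"


def build_queries(filter_expr, providers=PROVIDERS):
    """Return the same query expressed as one URL per provider."""
    return {
        name: structures_url(url, filter_expr)
        for name, url in providers.items()
    }


if __name__ == "__main__":
    # One filter expression, reused unchanged for every database.
    for name, url in build_queries('elements HAS "Si"').items():
        print(name, url)
```

The point of the sketch is that only the base URL changes between databases, while the filter expression stays the same. Without a common specification, each database would need its own query-building code.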
Publication: Nature Scientific Data
Title: OPTIMADE, an API for exchanging materials data
Authors: Casper W. Andersen, Rickard Armiento, Evgeny Blokhin, Gareth J. Conduit, Shyam Dwaraknath, Matthew L. Evans, Ádám Fekete, Abhijith Gopakumar, Saulius Gražulis, Andrius Merkys, Fawzi Mohamed, Corey Oses, Giovanni Pizzi, Gian-Marco Rignanese, Markus Scheidgen, Leopold Talirz, Cormac Toher, Donald Winston, Rossella Aversa, Kamal Choudhary, Pauline Colinet, Stefano Curtarolo, Davide Di Stefano, Claudia Draxl, Suleyman Er, Marco Esters, Marco Fornari, Matteo Giantomassi, Marco Govoni, Geoffroy Hautier, Vinay Hegde, Matthew K. Horton, Patrick Huck, Georg Huhs, Jens Hummelshøj, Ankit Kariryaa, Boris Kozinsky, Snehal Kumbhar, Mohan Liu, Nicola Marzari, Andrew J. Morris, Arash A. Mostofi, Kristin A. Persson, Guido Petretto, Thomas Purcell, Francesco Ricci, Frisco Rose, Matthias Scheffler, Daniel Speckhard, Martin Uhrin, Antanas Vaitkus, Pierre Villars, David Waroquiers, Chris Wolverton, Michael Wu & Xiaoyu Yang
Materials databases provide one route to enable effective materials design where you have limited data available from your own experimental programs. For other ways to approach this ‘small data’ problem, read our white paper.