Whether you are compiling, publishing, or applying databases of numerical data (for example, for scientific, engineering, financial, or business applications), you will have some key objectives. You will want that data to be as clean and complete as possible. And you will want to extract maximum value from it. Machine learning can achieve these goals. But machine learning approaches often fail where data is sparse (i.e., has gaps) or noisy – a common scenario in technical datasets that are continuously evolving.
The unique Alchemite™ technology has been designed for sparse, noisy datasets. It can auto-generate models that help you to understand the data, quantifying uncertainty, spotting outliers, and guiding data acquisition projects. And these models can be applied to use your data for prediction and optimisation, gaining vital insight and aiding decision-making.
View the webinar “Impossible journeys in 15 dimensional space” for an introduction to data analysis with Alchemite™.
Improving materials data quality at Matmatch
Matmatch is a provider of materials property data for use in science and engineering applications – an excellent example of an evolving technical data set that is compiled and curated from numerous data sources, meaning that data is ‘sparse’ (i.e., has gaps). Alchemite™ was used for data checking, and for estimation and gap filling of missing data.
Melissa Albeck, CEO at Matmatch, commented: “We were most impressed by the results we were able to achieve… the outcome exceeded our expectations and we succeeded in predicting and filling significantly more properties than expected across different material categories, despite starting with a very sparse dataset. The Intellegens team was a pleasure to work with and we look forward to continued cooperation.”
Alchemite™for data validation and analysis
- Gap-fill and validate sparse, noisy data
- Identify outliers so that you can understand and improve data quality
- Auto-generate and refine models that identify relationships in the data
- Quantify uncertainty to enable correct use of imputed data
- Identify priorities for additional data acquisition