Cheminformatics-based curation

Chemistry Development Kit-based

Because SPARQL makes it easy to extract data from Wikidata, it also makes it easy to find inconsistencies. For example, we can download all SMILES strings and parse them with a cheminformatics library such as the Chemistry Development Kit [Q30149558], for instance through Bacting [Q107332190].
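As a minimal sketch of the download step, the Python below builds a request URL and unpacks a response in the SPARQL JSON results format. It assumes the public Wikidata Query Service endpoint and the canonical SMILES property P233, neither of which is named in the text above. The function names are hypothetical, no network call is made, and the sample response is hand-written for illustration rather than actual query output.

```python
import urllib.parse

# Assumption: the public Wikidata Query Service endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

# Assumption: P233 is the Wikidata property for canonical SMILES.
QUERY = """
SELECT ?compound ?smiles WHERE {
  ?compound wdt:P233 ?smiles .
}
"""


def build_query_url(query=QUERY, endpoint=ENDPOINT):
    """Return a GET URL that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return endpoint + "?" + params


def extract_smiles(results):
    """Turn a SPARQL JSON results document into (item URI, SMILES) pairs."""
    pairs = []
    for row in results["results"]["bindings"]:
        pairs.append((row["compound"]["value"], row["smiles"]["value"]))
    return pairs


# Hand-written sample in the SPARQL JSON results layout (illustrative only).
SAMPLE = {
    "head": {"vars": ["compound", "smiles"]},
    "results": {
        "bindings": [
            {
                "compound": {"type": "uri", "value": "http://www.wikidata.org/entity/Q153"},
                "smiles": {"type": "literal", "value": "CCO"},
            },
            {
                "compound": {"type": "uri", "value": "http://www.wikidata.org/entity/Q283"},
                "smiles": {"type": "literal", "value": "O"},
            },
        ]
    },
}

pairs = extract_smiles(SAMPLE)
```

The resulting list of pairs is what a parser would then be run over, one SMILES string at a time.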

Unparsable SMILES

Example code is available from checkSMILES.groovy. This script runs a SPARQL query to get all SMILES and then tries to parse each string, which identifies many unparsable SMILES.
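The parse-and-filter step described above can be sketched in a few lines. This is not the code in checkSMILES.groovy; it is a Python illustration in which the parser is passed in as a function, so that a real cheminformatics parser could be plugged in. The stand-in `toy_parse` only checks that the string is non-empty and that brackets and parentheses balance, which is far weaker than a real SMILES parser. All names and item labels are hypothetical.

```python
def toy_parse(smiles):
    """Stand-in for a real SMILES parser (assumption: illustration only).

    Raises ValueError for an empty string or unbalanced () and [].
    """
    if not smiles:
        raise ValueError("empty SMILES")
    closers = {")": "(", "]": "["}
    stack = []
    for ch in smiles:
        if ch in "([":
            stack.append(ch)
        elif ch in closers:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != closers[ch]:
                raise ValueError("unbalanced " + ch)
    if stack:
        raise ValueError("unclosed " + stack[-1])


def find_unparsable(pairs, parse=toy_parse):
    """Return (item, SMILES, error message) for every SMILES the parser rejects."""
    failures = []
    for item, smiles in pairs:
        try:
            parse(smiles)
        except ValueError as err:
            failures.append((item, smiles, str(err)))
    return failures


# Hypothetical input: labels are placeholders, not Wikidata identifiers.
candidates = [("item-1", "CCO"), ("item-2", "CC(C"), ("item-3", "C[N+")]
failures = find_unparsable(candidates)
```

Keeping the parser as a parameter means the same loop works whichever toolkit does the parsing, provided its failures are surfaced as exceptions that the loop catches.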

RDKit-based

...

References