Chemical Registration Made Easy
Possibly the most pressing question that synthetic and medicinal chemists face is “what shall I synthesize next?” and that will be the topic of another blog posting. But once the chemist has synthesized, purified and characterized his or her next compound, the likely next step in the R&D process will be to assign the compound its unique corporate identifier – typically called a registry number – so that the compound and its associated properties, assay results, samples, etc. can be properly and consistently tracked, stored, retrieved and analyzed.
Chemistry Registration Systems & Workflows
The process of registration to assign a registry number to a compound ought to be simple: draw the structure, check it against the existing registry file and see if there is a match. If there isn’t a match, the compound is novel and gets a new registry number; if there is a match, the compound is a remake of a known compound (such a remake is often referred to as a “lot”) and is given a version of the registry number for the known compound.
This apparent simplicity rests on a number of assumptions about the features of the various components of the registration system and workflow, and we will consider each of these in turn.
Draw the Structure
This assumes that the drawing program is equipped to handle all the structural types and variants that the chemists will produce. In addition to the basic building blocks of atoms, bonds, functional groups and ring systems, chemists will need to describe stereochemistry (absolute, racemic, relative, etc.) and to specify other structural variants, for example salts and solvates with their requisite equivalents.
Most contemporary chemical drawing programs have built-in chemical integrity checkers to identify structural errors (e.g. pentavalent carbon atoms), and experienced companies may well have developed chemical business rules that dictate how common functional groups are drawn, since the chosen representation influences structural topology, resonance, and query results. In companies where the chemical registration process is initiated by the chemists, such business rules may have been automated as part of the drawing program to force the chemists to correct any errors and to abide by the agreed conventions before submitting a compound to a laboratory information management system for registration. Some companies still retain registrars who devise and administer these business rules, and run the registration process themselves.
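To make the idea of an integrity checker concrete, here is a minimal sketch in Python of a valence check of the kind that would flag a pentavalent carbon. The `MAX_VALENCE` table and the `find_valence_errors` function are our own illustrative names, not the API of any particular drawing program, and the table deliberately ignores charges, radicals and hypervalent elements.

```python
# Hypothetical maximum valences for a few neutral atoms. A real checker
# would also account for charges, radicals and hypervalent elements.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1}


def find_valence_errors(atoms, bonds):
    """Return (atom_index, element, bond_order_sum) for over-valent atoms.

    atoms: list of element symbols, e.g. ["C", "H"]
    bonds: list of (index_a, index_b, bond_order) tuples
    """
    totals = [0] * len(atoms)
    for a, b, order in bonds:
        totals[a] += order
        totals[b] += order

    errors = []
    for index, (element, total) in enumerate(zip(atoms, totals)):
        limit = MAX_VALENCE.get(element)  # elements not in the table are skipped
        if limit is not None and total > limit:
            errors.append((index, element, total))
    return errors


# A carbon drawn with five single bonds to hydrogen
atoms = ["C", "H", "H", "H", "H", "H"]
bonds = [(0, i, 1) for i in range(1, 6)]
print(find_valence_errors(atoms, bonds))  # [(0, 'C', 5)]
```

A sketcher that enforces business rules would typically run checks like this before the structure is allowed to leave the drawing program.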
In addition to depicting the structure, most modern sketchers can also calculate or estimate properties like molecular formula and weight, LogP, LogD, Topological Polar Surface Area, Hydrogen Bond Donors and Acceptors, etc., as well as generating an IUPAC name and InChIKey for the structure, and these can all be included in the registration database.
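As a simple illustration of one such calculated property, the sketch below derives a molecular weight from a molecular formula. It is a simplification: a real sketcher works from the drawn structure rather than a formula string, and the `ATOMIC_WEIGHT` table here covers only a handful of elements with rounded standard atomic weights. The function names are ours.

```python
import re

# Rounded standard atomic weights; only a few elements are listed.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
                 "S": 32.06, "Cl": 35.45}


def parse_formula(formula):
    """Turn a flat formula such as 'C9H8O4' into {'C': 9, 'H': 8, 'O': 4}.

    Parentheses, charges and isotopes are not handled in this sketch.
    """
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula):
    """Sum atomic weights; raises KeyError for elements not in the table."""
    return sum(ATOMIC_WEIGHT[el] * n for el, n in parse_formula(formula).items())


print(parse_formula("C9H8O4"))               # {'C': 9, 'H': 8, 'O': 4}
print(round(molecular_weight("C9H8O4"), 2))  # 180.16 (aspirin)
```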
Check the Structure Against the Existing Registry File
This presupposes that there is a chemically intelligent laboratory information management system that stores and indexes structures in such a way that it can check for the uniqueness of an incoming structure, and that the drawing program can output a structure file in an appropriate format for storage. Typical chemical database systems use connection tables or canonical SMILES for storage and indexing. These formats have evolved, and in some cases earlier versions can only store a subset of the detailed structural information that is now available from modern chemical sketchers. One example is the well-known molfile, where the earlier V2000 format cannot handle extended per-atom stereochemical descriptors such as absolute and relative, while the more modern V3000 format can. This can complicate both novelty checking and substructure searching.
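One practical consequence is that a registration pipeline may want to know which molfile version it has been handed. The sketch below reads the version tag from the counts line (the fourth line of a molfile) and looks for the V3000 stereo collection lines (`MDLV30/STEABS`, `MDLV30/STERACn`, `MDLV30/STERELn`). The function names are ours, and the two sample molfiles are trimmed down for parsing purposes only; they are not chemically meaningful.

```python
def molfile_version(molfile_text):
    """Return 'V2000', 'V3000' or 'unknown' based on the counts line."""
    lines = molfile_text.splitlines()
    if len(lines) < 4:
        raise ValueError("a molfile needs three header lines and a counts line")
    counts_line = lines[3].rstrip()
    for version in ("V2000", "V3000"):
        if counts_line.endswith(version):
            return version
    return "unknown"  # very old files may omit the version tag


def has_enhanced_stereo(molfile_text):
    """True if a V3000 molfile carries absolute/racemic/relative stereo groups."""
    if molfile_version(molfile_text) != "V3000":
        return False
    return any(line.startswith("M  V30 MDLV30/STE")
               for line in molfile_text.splitlines())


v2000_example = "\n".join([
    "", "  sketch", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
])

v3000_example = "\n".join([
    "", "  sketch", "",
    "  0  0  0     0  0            999 V3000",
    "M  V30 BEGIN CTAB",
    "M  V30 COUNTS 1 0 0 0 0",
    "M  V30 BEGIN ATOM",
    "M  V30 1 C 0 0 0 0",
    "M  V30 END ATOM",
    "M  V30 BEGIN COLLECTION",
    "M  V30 MDLV30/STEABS ATOMS=(1 1)",
    "M  V30 END COLLECTION",
    "M  V30 END CTAB",
    "M  END",
])

print(molfile_version(v2000_example), has_enhanced_stereo(v2000_example))
print(molfile_version(v3000_example), has_enhanced_stereo(v3000_example))
```

A system that stores only V2000 would need a rule for what to do when `has_enhanced_stereo` is true, since that information could otherwise be lost on conversion.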
Assign a Registry Number
For an unadorned single structure, this should be a no-brainer: it’s either in the database or it isn’t. If it isn’t, assign it the next unused registry number; if it’s already there, assign a variant (lot code) of the existing number.
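The simple case can be sketched in a few lines of Python. This assumes each structure has already been reduced to a unique key (for example a canonical SMILES or an InChIKey); the `Registry` class, the `REG` prefix and the number format are hypothetical choices for illustration, not the scheme of any particular system.

```python
class Registry:
    """Toy in-memory registry: one number per structure key, one lot per batch."""

    def __init__(self, prefix="REG"):
        self.prefix = prefix
        self._numbers = {}  # structure key -> registry number
        self._lots = {}     # registry number -> lots issued so far

    def register(self, structure_key):
        number = self._numbers.get(structure_key)
        if number is None:
            # Novel structure: issue the next unused registry number.
            number = f"{self.prefix}-{len(self._numbers) + 1:06d}"
            self._numbers[structure_key] = number
            self._lots[number] = 0
        # Novel or not, every submission becomes a new lot of that number.
        self._lots[number] += 1
        return f"{number}-{self._lots[number]:02d}"


registry = Registry()
print(registry.register("KEY-A"))  # REG-000001-01
print(registry.register("KEY-B"))  # REG-000002-01
print(registry.register("KEY-A"))  # REG-000001-02 (a remake of the first compound)
```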
For adorned or multiple structures things can get more complicated, and business rules may be needed to sort things out. Most biopharma companies share the concept of a “parent molecule”, i.e. the important structure that the registry number represents. If another batch of a parent molecule is made as a monohydrate, and then another as a hydrochloride salt, most companies would assign all three the same registry number, and distinguish them via an additional lot code.
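A rough sketch of the parent-molecule idea follows. It splits a multi-component SMILES on the `.` separator and sets aside components found in a small lookup table of counterions and solvents. The `KNOWN_ADDUCTS` table, the function names and the numbering format are all hypothetical, and comparing raw SMILES strings only works if the input is already canonical; a production system would rely on a cheminformatics toolkit and a much fuller set of business rules.

```python
# Hypothetical lookup: SMILES of a counterion or solvent -> label for the lot.
KNOWN_ADDUCTS = {"Cl": "hydrochloride", "O": "hydrate"}


def split_parent(smiles):
    """Separate the parent component from known salt/solvate components."""
    parts = smiles.split(".")
    adducts = sorted(KNOWN_ADDUCTS[p] for p in parts if p in KNOWN_ADDUCTS)
    parents = [p for p in parts if p not in KNOWN_ADDUCTS]
    if len(parents) != 1:
        # Mixtures and unknown components need a registrar's decision.
        raise ValueError("expected exactly one parent component")
    return parents[0], adducts


batches_by_parent = {}  # parent SMILES -> list of adduct lists, one per lot


def register_batch(smiles):
    parent, adducts = split_parent(smiles)
    lots = batches_by_parent.setdefault(parent, [])
    lots.append(adducts)
    # Dicts keep insertion order, so position gives a stable running number.
    number = list(batches_by_parent).index(parent) + 1
    return f"REG-{number:06d}-{len(lots):02d}", adducts


print(register_batch("NCCc1ccccc1"))     # ('REG-000001-01', [])
print(register_batch("NCCc1ccccc1.O"))   # ('REG-000001-02', ['hydrate'])
print(register_batch("NCCc1ccccc1.Cl"))  # ('REG-000001-03', ['hydrochloride'])
```

All three batches share one registry number and differ only in the lot code and the recorded salt or solvate form, which is the behavior described above.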
But what about mixtures, intentional or not? And what level of impurity (or perhaps of a residual enantiomer in a resolved compound) switches a material from being treated as a batch of an existing compound to being recognized as a new compound in its own right? There is no hard and fast rule here, and most companies have devised structural business rules to provide a degree of consistency, so that novelty checks hold no surprises and chemists find all the correct hits when they run a substructure or similarity search in the registry database.
Bulk Registration of Compounds
The discussion so far has focused on the registration of single compounds, but there will be occasions when a set of compounds will need to be registered in bulk: examples could be a set of compounds purchased from a commercial vendor and preloaded in microtiter plates, a set of potential lead optimization candidates from a chemical CRO, or a compound library from a parallel synthesizer. The most common format for providing structural and related information on a set of compounds is the SDfile, and most registration systems can automatically register compounds in bulk, given an input SDfile with the structures.
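For readers who have not looked inside one, an SDfile is a series of molfiles, each followed by named data fields and terminated by a `$$$$` line. The sketch below splits such a file into records that could then be passed one by one to a registration routine. The function names and the `VENDOR_ID` and `WELL` fields are made up for the example, and the embedded molfile is a trimmed placeholder.

```python
import re


def _parse_record(lines):
    """Split one SDfile record into its molfile block and its data fields."""
    end = next((i for i, line in enumerate(lines) if line.startswith("M  END")),
               len(lines) - 1)
    data, field = {}, None
    for line in lines[end + 1:]:
        if line.startswith(">"):
            match = re.search(r"<([^>]+)>", line)  # header looks like "> <NAME>"
            field = match.group(1) if match else None
            if field is not None:
                data[field] = []
        elif field is not None:
            if line.strip():
                data[field].append(line)
            else:
                field = None  # a blank line ends the current data item
    return {"molblock": "\n".join(lines[:end + 1]),
            "data": {name: "\n".join(values) for name, values in data.items()}}


def parse_sdfile(text):
    """Return one dict per record; records are delimited by '$$$$' lines."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(_parse_record(current))
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):  # tolerate a missing final '$$$$'
        records.append(_parse_record(current))
    return records


MOL = "\n".join([
    "mol", "  sketch", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
])
sample = (MOL + "\n> <VENDOR_ID>\nV-001\n\n> <WELL>\nA01\n\n$$$$\n"
          + MOL + "\n> <VENDOR_ID>\nV-002\n\n> <WELL>\nA02\n\n$$$$\n")

for record in parse_sdfile(sample):
    print(record["data"])
```

In a bulk load, the data fields are where vendor identifiers, plate and well positions, or CRO batch details usually travel alongside each structure.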
The same considerations apply to a growing array of therapeutics beyond “small molecules”, including oligo- and polypeptides and biologics; these will be the topic of a later blog in this series.
Core Informatics has partnered with recognized cheminformatics vendor ChemAxon to use its leading chemistry technology as part of the Core Informatics chemical registration application built on the Platform for Science. This system addresses the requirements outlined above, is integrated with the Core ELN, and is in daily use by multiple biopharma companies.