National Institutes of Health Center for Translational Therapeutics
NCTT is now part of the NIH National Center for Advancing Translational Sciences (NCATS). Content on this site will move to NCATS.NIH.GOV

NCTT PubChem Information

About PubChem

PubChem is a freely accessible database of small organic molecules and their activities against biological assays. It was created by NIH in 2004 and is maintained by the National Center for Biotechnology Information (NCBI), a component of the National Library of Medicine. PubChem is a critical part of the Molecular Libraries initiative of the NIH Roadmap for Medical Research. The database connects chemical information with biomedical research and clinical information, organizing facts from numerous databases into a unified whole.

PubChem consists of three dynamically growing primary databases:

PubChem Compound
Contains pure and characterized chemical compounds.
PubChem Substance
Contains mixtures, extracts, complexes and uncharacterized substances.
PubChem BioAssay
Contains database results from high-throughput screening programs with several million values.
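
As a rough illustration of how the three databases are addressed separately, the sketch below builds record URLs for PubChem's PUG REST web service. It assumes the URL layout that service uses today (domain, identifier namespace, identifier, operation, output format). The helper name `description_url` and the example identifiers are hypothetical and chosen only for illustration. The code constructs URLs and makes no network request.

```python
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Primary database -> (PUG REST domain, identifier namespace).
# CID = compound ID, SID = substance ID, AID = assay ID.
DOMAINS = {
    "Compound": ("compound", "cid"),
    "Substance": ("substance", "sid"),
    "BioAssay": ("assay", "aid"),
}


def description_url(database, identifier, output="JSON"):
    """Build the PUG REST URL for the description of one record (hypothetical helper)."""
    domain, namespace = DOMAINS[database]
    return f"{PUG_REST_BASE}/{domain}/{namespace}/{int(identifier)}/description/{output}"


# Illustrative identifiers only; CID 2244 is aspirin.
compound_url = description_url("Compound", 2244)
assay_url = description_url("BioAssay", 1000)
```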

The integration of these databases makes PubChem a critical new tool that will speed the development of new treatments for America's most important health problems. It brings information about the biological activities of chemical substances to biomedical researchers on a broad scale.

For more information, visit http://pubchem.ncbi.nlm.nih.gov/


Assay Data


PubChem Data Guideline

The qHTS data in PubChem is preliminary. For this reason, and because compound quantities are limited, we do not supply probe compounds to investigators other than those who originally submitted the assay.

The NCGC data listed in PubChem as "qHTS" represents primary quantitative high-throughput screening data. Each sample is tested as a titration series to provide a concentration-response output. While the results accurately describe the effect of the sample on the assay endpoint, the "actives" are not necessarily due to the perturbation of the intended target (i.e., they may be artifactual positives). Despite this, these primary data are provided to allow analysis by cheminformatic algorithms, to guide the selection of compounds for subsequent chemistry optimization, and to populate the 'chemical genomics' database of compound-activity profiles. The value of this database should increase as additional assays and compounds are added.

In interpreting and using qHTS data, the investigator should remain cognizant of the following:

  1. The sample tested is very limited in quantity, so neither the NCGC nor the MLSCN repository can supply screening samples upon request. Some samples are commercially available and inexpensive, and can be purchased directly from vendors. Compounds about which more is known, designated as "probes" by the MLSCN, will be designated as such in PubChem and arrangements for their broader availability to investigators will be made by the MLSCN.
  2. The effect of the sample on the assay described in PubChem may reflect artifacts that result from the sample's physical or spectroscopic properties, such as interference in the assay due to aggregation in aqueous buffer, or absorbance of the fluorescence emitted for signal detection. Flags indicating the propensity of samples in the library for interfering phenomena will be added to the data set as they are determined.
  3. QC information is not necessarily current. The results are reported for "samples", and are labeled as such, because the term "compound" implies a single chemical entity. Subsequent analysis by LC-MS and verification of the activity will be performed for a subset of the actives. This data will be entered into PubChem as it is generated.
  4. The IC50/EC50s (referred to by the NCGC as AC50s) determined from the normalized titration-response data (n = 1) are estimates. Curve-fitting artifacts can occur due to the high-throughput nature of the analysis. A flag indicating whether a curve fit has been verified will be updated over time. In addition, the primary data is available for interpretation by others.
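
To make point 4 concrete, the sketch below shows one way an AC50 might be estimated from a single normalized titration series. It is a minimal illustration under stated assumptions, not the NCGC's fitting procedure. It assumes a four-parameter Hill model, holds the lower plateau, upper plateau and slope fixed, and finds the AC50 by a coarse grid search. The function names, the concentrations and the responses are all made up for the example.

```python
import math


def hill_response(conc, bottom, top, ac50, slope):
    """Hill model: response at a positive concentration `conc`.

    The response equals the midpoint of `bottom` and `top` when conc == ac50.
    """
    return bottom + (top - bottom) / (1.0 + (ac50 / conc) ** slope)


def estimate_ac50(concs, responses, bottom=0.0, top=100.0, slope=1.0, steps=2000):
    """Grid-search the AC50 (log-spaced over the tested range) minimizing squared error.

    Assumption: bottom, top and slope are known and fixed; a real analysis
    would usually fit them as well.
    """
    lo, hi = math.log10(min(concs)), math.log10(max(concs))
    best_ac50, best_sse = None, float("inf")
    for i in range(steps + 1):
        candidate = 10 ** (lo + (hi - lo) * i / steps)
        sse = sum(
            (hill_response(c, bottom, top, candidate, slope) - r) ** 2
            for c, r in zip(concs, responses)
        )
        if sse < best_sse:
            best_ac50, best_sse = candidate, sse
    return best_ac50


# Synthetic, noise-free titration series (concentrations in micromolar, hypothetical).
concs = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0]
true_ac50 = 2.5
responses = [hill_response(c, 0.0, 100.0, true_ac50, 1.0) for c in concs]
fitted_ac50 = estimate_ac50(concs, responses)
```

With only one replicate per concentration, as in the primary screen, noise in a few points can shift the fitted value noticeably, which is one reason such AC50s are best treated as estimates.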


Last Updated: June 23, 2011