PubChem is a database of chemical molecules. The system is maintained by the National Center for Biotechnology Information (NCBI), a component of the National Library of Medicine, which is part of the United States National Institutes of Health (NIH). PubChem can be accessed for free through a web user interface. Millions of compound structures and descriptive datasets can be freely downloaded via FTP. PubChem contains substance descriptions and small molecules with fewer than 1000 atoms and 1000 bonds. The American Chemical Society tried to get the U.S. Congress to restrict the operation of PubChem, because it claims that PubChem competes with its Chemical Abstracts Service.[4] More than 80 database vendors contribute to the growing PubChem database.[5]


PubChem consists of three dynamically growing primary databases:

  • Compounds, 10.9 million entries, contains pure and characterized chemical compounds. The current number of compounds is available online.[6]
  • Substances, 19.5 million entries, also contains mixtures, extracts, complexes and uncharacterized substances. The current number of substances is available online.[7]
  • BioAssay contains bioactivity results from 598 high-throughput screening programs with several million values. The current number of bioassays is available online.[8]


Searching the databases is possible for a broad range of properties including chemical structure, name fragments, chemical formula, molecular weight, XLogP, and hydrogen bond donor and acceptor count.

PubChem contains its own online molecule editor with SMILES/SMARTS and InChI support that allows the import and export of all common chemical file formats to search for structures and fragments.

Each hit provides information about synonyms, chemical properties, chemical structure including SMILES and InChI strings, bioactivity, and links to structurally related compounds and other NCBI databases like PubMed.

In the text search form, a database field can be searched by appending the field name in square brackets to the search term. A numeric range is represented by two numbers separated by a colon. The search terms and field names are case-insensitive. Parentheses and the logical operators AND, OR, and NOT can be used. AND is assumed if no operator is given.

Example (Lipinski's Rule of Five):

0:500[mw] 0:5[hbdc] 0:10[hbac] -5:5[logp]
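
The query above can also be assembled programmatically. The following is a minimal sketch in Python that only builds the query string. It does not contact PubChem, and the helper names range_term and build_query are hypothetical, introduced here for illustration.

```python
def range_term(low, high, field):
    """Format one numeric range term, e.g. 0:500[mw]."""
    return f"{low}:{high}[{field}]"


def build_query(terms, operator=None):
    """Join search terms into one query string.

    With no operator the terms are separated by a single space,
    relying on the rule that AND is assumed between terms.
    """
    separator = f" {operator} " if operator else " "
    return separator.join(terms)


# Lipinski's Rule of Five, using the field names from the example above.
lipinski_terms = [
    range_term(0, 500, "mw"),    # molecular weight
    range_term(0, 5, "hbdc"),    # hydrogen bond donor count
    range_term(0, 10, "hbac"),   # hydrogen bond acceptor count
    range_term(-5, 5, "logp"),   # logP range
]

lipinski_query = build_query(lipinski_terms)
explicit_query = build_query(lipinski_terms, operator="AND")

print(lipinski_query)
print(explicit_query)
```

The first string reproduces the example query. The second spells out the AND operator, which by the rule above should be equivalent.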

Industry Concerns with PubChem

The American Chemical Society has raised serious concerns about the effects of open access on science journals. In 2004, editor Rudy Baum wrote an editorial in Chemical & Engineering News, titled "Socialized Science," against what he called "open access advocates."[1] He wrote: "Their unspoken crusade is to socialize all aspects of science, putting the federal government in charge of funding science, communicating science, and maintaining the archive of scientific knowledge. If that sounds like a good idea to you, then NIH's open-access policy should suit you just fine."

ACS also hired a public relations firm to work against government control of publishing. In January 2007, Nature reported that public relations operative Eric Dezenhall "spoke to employees from Elsevier, Wiley and the American Chemical Society at a meeting arranged last July [2006] by the Association of American Publishers." The publishers were seeking to counter economic threats from open-access journals and public databases.

Nature reported on an email from Dezenhall which suggested that the publishers "focus on simple messages, such as 'Public access equals government censorship.' He hinted that the publishers should attempt to equate traditional publishing models with peer review, and 'paint a picture of what the world would look like without peer-reviewed articles.'" Nature added that "Brian Crawford, a senior vice-president at the American Chemical Society and a member of the AAP executive chair, says that Dezenhall's suggestions have been refined and that the publishers have not to his knowledge sought to work with the Competitive Enterprise Institute."[2]

Scientific American reported that ACS had spent hundreds of thousands of dollars lobbying against open access. "In fact, the ACS paid lobbying firm Hicks Partners LLC at least $100,000 in 2005 to try to persuade congressional members, the NIH, and the Office of Management and Budget (OMB) that a 'PubChem Project' would be a bad idea, according to public lobbying disclosures, and paid an additional $180,000 to the Wexler & Walker Public Policy Associates to promote the 'use of [a] commercial database.' It also reportedly spent a chunk of its 2005 $280,000 internal lobbying budget as well as part of its $270,000 lobbying budget last year to push the issue, according to disclosure documents. The ACS publishes more than 30 journals covering all aspects of chemistry, and the organization did not return phone calls for comment."[3]

Crawford later defended the hiring of Dezenhall in an editorial: "In essence, the premise of a January 24, 2007 article in Nature was that [publishers] should be admonished for seeking advice and assistance from a media consulting firm known for its effectiveness in working with high-profile clients on controversial issues," he wrote. "Peer-reviewed science and medicine should be free of any government intervention or funding agency bias, and we will fulfill our responsibility to communicate that point of view, because doing so is in the best interest of science and society."[4]

ACS apparently took Dezenhall up on his offer, according to New Scientist, which reported that publishers had established a lobbying group called Partnership for Research Integrity in Science & Medicine (PRISM).[9] "Dezenhall's strategy includes linking open access with government censorship and junk science – ideas that to me seem quite bizarre and misleading," wrote the reporter.[5] New Scientist acquired a copy of Dezenhall's strategy document for creating PRISM and released it on its website.[6]

Database fields

Identification numbers
Identification number in current database [UID]
Substance identification number [SID]
Compound identification number [CID]
BioAssay identification number [BAID], [AID]

Any database field [ALL]
Comment [CMT]
Deposition date [DDAT], [DEPDAT]
Depositor's external ID [SRID], [SRCID]
Source name [SRC], [SRCNAM], [SRCNAME]
Source release date [SRD], [SRDAT], [RLSDAT]
Medical Subject Heading (MeSH) term [MSHT], [MESHT]
MeSH tree node [MSHN], [MESHTN]
MeSH pharmacological actions [PHMA], [PHARMA]

Substance properties
Substance synonyms [SYNO]
International Chemical Identifier (InChI) [INCHI]
Molecular weight [MW], [MWT], [MOLWT]
Chemical elements [ELMT], [EL]
Non-hydrogen atom count [HAC], [HACNT]
Isotope count [IAC], [IACNT]
Total formal charge [TFC], [CHG], [CHRG]
Chiral atom count [ACC], [ACCNT]
Defined chiral atom count [ACDC], [ACDCNT]
Undefined chiral atom count [ACUC], [ACUCNT]
Hydrogen bond acceptor count [HBAC], [HBACNT]
Hydrogen bond donor count [HBDC], [HBDCNT]
Tautomer count [TC], [TCNT], [TTMC]
Rotatable bond count [RBC], [RBCNT]

Compound properties
Compound synonyms [CSYN], [CSYNO]
Component count [CC], [CCNT]
Covalent unit (molecule) count [CUC], [CUCNT]
Total bioactivity count [TAC]
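
Because several fields accept more than one tag and field names are case-insensitive, a tool that processes queries may want to map every alias to a single form. The sketch below assumes, purely for illustration, that the first tag listed for a field is the canonical one. The function name canonical_field is hypothetical, and only a few of the fields listed above are included.

```python
# Alias groups copied from the field list above. Treating the first tag in
# each group as canonical is an assumption made for this illustration only.
FIELD_ALIASES = {
    "MW": ("MW", "MWT", "MOLWT"),      # molecular weight
    "HBDC": ("HBDC", "HBDCNT"),        # hydrogen bond donor count
    "HBAC": ("HBAC", "HBACNT"),        # hydrogen bond acceptor count
    "RBC": ("RBC", "RBCNT"),           # rotatable bond count
    "TFC": ("TFC", "CHG", "CHRG"),     # total formal charge
}

# Reverse lookup: every alias points to its canonical tag.
ALIAS_TO_CANONICAL = {
    alias: canonical
    for canonical, aliases in FIELD_ALIASES.items()
    for alias in aliases
}


def canonical_field(tag):
    """Return the canonical tag for a field name, ignoring case and brackets.

    Tags not present in the table are returned upper-cased and unchanged
    otherwise.
    """
    key = tag.strip().strip("[]").upper()
    return ALIAS_TO_CANONICAL.get(key, key)


print(canonical_field("[molwt]"))
print(canonical_field("HbDcNt"))
```

With such a table, queries written with [mwt], [MOLWT] or [mw] can be compared or deduplicated as the same field.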

