Structure
Hepatitis B virus (HBV) is a member of the Hepadnaviridae family. The virus particle (virion) consists of an outer lipid envelope and an icosahedral nucleocapsid core composed of protein. The nucleocapsid encloses the viral DNA and a DNA polymerase that has reverse transcriptase activity. The outer envelope contains embedded proteins that are involved in viral binding to, and entry into, susceptible cells. The virus is one of the smallest enveloped animal viruses, with a virion diameter of 42 nm, but pleomorphic forms also exist, including filamentous and spherical bodies that lack a core. These particles are not infectious. They are composed of the lipid and protein that form part of the surface of the virion; this protein is called the surface antigen (HBsAg) and is produced in excess during the life cycle of the virus.
Genome
The genome of HBV is made of circular DNA, but it is unusual because the DNA is not fully double-stranded. One end of the full-length strand is linked to the viral DNA polymerase. The full-length strand is 3020–3320 nucleotides long and the short strand is 1700–2800 nucleotides long. The negative-sense (non-coding) strand is complementary to the viral mRNA. The viral DNA is found in the nucleus soon after infection of the cell. There, the partially double-stranded DNA is rendered fully double-stranded by completion of the (+) sense strand and by removal of a protein molecule from the (-) sense strand and of a short sequence of RNA from the (+) sense strand. Non-coding bases are removed from the ends of the (-) sense strand and the ends are rejoined.
There are four known genes encoded by the genome, called C, X, P, and S. The core protein (HBcAg) is encoded by gene C, and its start codon is preceded by an upstream in-frame AUG start codon from which the pre-core protein is produced. HBeAg is produced by proteolytic processing of the pre-core protein. The DNA polymerase is encoded by gene P. Gene S codes for the surface antigen (HBsAg). The HBsAg gene is one long open reading frame but contains three in-frame "start" (ATG) codons that divide the gene into three sections: pre-S1, pre-S2, and S. Because of the multiple start codons, polypeptides of three different sizes are produced, called large (pre-S1 + pre-S2 + S), middle (pre-S2 + S), and small (S). The function of the protein encoded by gene X is not fully understood, but it is associated with the development of liver cancer: it stimulates genes that promote cell growth and inactivates growth-regulating molecules.
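How three in-frame start codons in gene S yield three nested polypeptides can be illustrated with a toy reading frame. The sketch below is a minimal Python illustration under stated assumptions: the sequence `toy_s_gene` and the function `nested_products` are hypothetical, each region is only a few codons long rather than its real length, and only the reading frame that begins at the first base is scanned. Each product starts at one ATG and runs to the single shared stop codon, so the small product is contained in the middle one and the middle one in the large one.

```python
# Hypothetical toy sequence, NOT real HBV sequence: the pre-S1, pre-S2 and S
# regions are each only a few codons long, and each begins with an in-frame ATG.
PRE_S1 = "ATG" + "GCC" * 2
PRE_S2 = "ATG" + "GCC"
S_REGION = "ATG" + "GCC" * 2 + "TAA"  # shared stop codon at the end of S
toy_s_gene = PRE_S1 + PRE_S2 + S_REGION

STOP_CODONS = {"TAA", "TAG", "TGA"}


def nested_products(orf: str) -> list[str]:
    """Return one coding sequence per in-frame ATG, each ending at the first in-frame stop.

    Assumes the reading frame starts at position 0 of `orf`.
    """
    codons = [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]
    stop = next((k for k, c in enumerate(codons) if c in STOP_CODONS), None)
    if stop is None:
        raise ValueError("no in-frame stop codon found")
    # Every ATG upstream of the stop codon opens a product that ends at that stop.
    starts = [k for k, c in enumerate(codons[:stop]) if c == "ATG"]
    return ["".join(codons[k:stop + 1]) for k in starts]


products = dict(zip(["large", "middle", "small"], nested_products(toy_s_gene)))
for name, seq in products.items():
    print(name, len(seq) // 3, "codons")  # large 9, middle 6, small 4 (stop included)
```

Because all three products end at the same stop codon, they share the same C-terminal S region and differ only in how far upstream they begin.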
Replication
The life cycle of hepatitis B virus is complex. Hepatitis B is one of a few known non-retroviral viruses that use reverse transcription as part of their replication process. The virus gains entry into the cell by binding to a receptor on the cell surface (later identified as the sodium taurocholate cotransporting polypeptide, NTCP) and enters it by endocytosis. Because the virus multiplies via RNA made by a host enzyme, the viral genomic DNA has to be transferred to the cell nucleus by host proteins called chaperones. The partially double-stranded viral DNA is then made fully double-stranded and transformed into covalently closed circular DNA (cccDNA), which serves as a template for transcription of four viral mRNAs. The largest mRNA (which is longer than the viral genome) is used to make new copies of the genome and to make the capsid core protein and the viral DNA polymerase. This long mRNA is transported to the cytoplasm, where the viral P protein synthesizes DNA from it via its reverse transcriptase activity. The four viral transcripts undergo additional processing, and their products go on to form progeny virions, which are released from the cell or returned to the nucleus and recycled to produce even more copies.
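The base-pairing logic behind the step in which the P protein copies the long mRNA back into DNA, and the reverse step in which a host enzyme makes RNA from DNA, can be sketched in a few lines. This is a toy under stated assumptions: `reverse_transcribe` and `transcribe` are hypothetical names, sequences are plain strings written 5' to 3', and the sketch ignores priming, removal of the RNA template, and synthesis of the (+) strand. Because each new strand is antiparallel to its template, the template is read in reverse while each base is swapped for its partner.

```python
# Base pairing when DNA is copied from an RNA template (reverse transcription)
# and when RNA is copied from a DNA template (transcription by a host enzyme).
RNA_TEMPLATE_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Return the DNA strand complementary to `rna`, written 5'->3'.

    The new strand is antiparallel, so the template is read from its 3' end.
    """
    return "".join(RNA_TEMPLATE_TO_DNA[base] for base in reversed(rna.upper()))


def transcribe(dna_template: str) -> str:
    """Return the RNA complementary to a DNA template strand, written 5'->3'."""
    return "".join(DNA_TEMPLATE_TO_RNA[base] for base in reversed(dna_template.upper()))


rna = "AUGGCCUAA"                # hypothetical short stretch of viral RNA
minus_strand = reverse_transcribe(rna)
print(minus_strand)              # TTAGGCCAT
print(transcribe(minus_strand))  # AUGGCCUAA
```

The round trip recovers the original RNA, consistent with the (-) sense strand being complementary to the viral mRNA.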
Serotypes and genotypes
The virus is divided into four major serotypes (adr, adw, ayr, ayw) based on antigenic epitopes presented on its envelope proteins, and into eight genotypes (A-H) according to overall nucleotide sequence variation of the genome. The genotypes have a distinct geographical distribution and are used in tracing the evolution and transmission of the virus. Differences between genotypes affect the disease severity, course and likelihood of complications, and response to treatment and possibly vaccination.
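The four serotype names follow a combinatorial pattern. Assuming the conventional scheme, which is not spelled out above, every serotype carries the common determinant a plus one member of each of two mutually exclusive pairs, d or y and w or r. The minimal Python sketch below (the names `D_OR_Y`, `W_OR_R` and `hbsag_serotypes` are hypothetical) enumerates those combinations and recovers exactly the four serotypes listed.

```python
from itertools import product

# Assumed scheme: common determinant "a" plus one of d/y and one of w/r.
D_OR_Y = ("d", "y")
W_OR_R = ("w", "r")


def hbsag_serotypes() -> list[str]:
    """Enumerate every combination of the two mutually exclusive determinant pairs."""
    return ["a" + dy + wr for dy, wr in product(D_OR_Y, W_OR_R)]


print(hbsag_serotypes())  # ['adw', 'adr', 'ayw', 'ayr']
```

Finer labels such as ayw1 or adrq subdivide these four serotypes further using additional determinants.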
Genotypes differ from one another by at least 8% of their genome sequence. They were first reported in 1988, when six (A-F) were described; two further genotypes (G and H) have since been identified. Most genotypes are now divided into subgenotypes with distinct properties.
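The 8% criterion can be expressed as a pairwise distance check. The sketch below is a simplification under stated assumptions: it uses the uncorrected proportion of differing sites (p-distance) between two sequences that are assumed to be already aligned to equal length, whereas genotypes are in practice assigned from whole-genome comparisons and phylogenetic analysis. The function names and the short invented sequences are hypothetical.

```python
GENOTYPE_DIVERGENCE = 0.08  # "at least 8%" of the sequence, from the text above


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    mismatches = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return mismatches / len(seq_a)


def likely_different_genotypes(seq_a: str, seq_b: str) -> bool:
    """True if the divergence reaches the genotype-level threshold."""
    return p_distance(seq_a, seq_b) >= GENOTYPE_DIVERGENCE


# Invented 100-nt sequences, NOT real HBV data.
reference = "A" * 100
close = "C" * 5 + "A" * 95     # 5 of 100 sites differ
distant = "C" * 12 + "A" * 88  # 12 of 100 sites differ

print(p_distance(reference, close), likely_different_genotypes(reference, close))      # 0.05 False
print(p_distance(reference, distant), likely_different_genotypes(reference, distant))  # 0.12 True
```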
Genotype A is most commonly found in the Americas, Africa, India and Western Europe. Genotype B is most commonly found in Asia and the United States. Subgenotype B1 dominates in Japan and B2 in China and Vietnam, while B3 is confined to Indonesia and B4 to Vietnam. All these strains specify the serotype ayw1. B5 is most common in the Philippines. Genotype C is most common in Asia and the United States. Subgenotype C1 is common in Japan, Korea and China, C2 in China, South-East Asia and Bangladesh, and C3 in Oceania. All these strains specify the serotype adrq. C4, which specifies serotype ayw3, is found in Aboriginal Australians. Genotype D is most commonly found in Southern Europe, India and the United States, and has been divided into eight subgenotypes (D1-D8). Genotype D is also the most common type in Turkey. A defined geographical distribution is less evident for D1-D4, which are widely spread across Europe, Africa and Asia. This may be because their divergence occurred before that of genotypes B and C. D4 appears to be the oldest split and is still the dominant subgenotype of D in Oceania. Genotype E is most commonly found in West and Southern Africa. Genotype F is most commonly found in Central and South America and has been divided into two subgroups (F1 and F2). Genotype G has an insertion of 36 nucleotides in the core gene and is found in France and the United States. Genotype H is most commonly found in Central and South America and in California in the United States; it probably split off from genotype F within the New World. Africa has five genotypes (A-E), of which the predominant ones are A in Kenya, B and D in Egypt, D in Tunisia, A-D in South Africa and E in Nigeria.