The quality of reports of randomized clinical trials on traditional Chinese medicine treatments: a systematic review of articles indexed in the China National Knowledge Infrastructure database from 2005 to 2012

Background The Consolidated Standards of Reporting Trials (CONSORT) aim to standardize clinical trial reporting. Our objective was to compare the quality of reports of randomized clinical trials (RCTs) of traditional Chinese medicine (TCM) published in 2005–2009 and 2011–2012 according to the current CONSORT statement and the Jadad scale.
Methods Data Sources: Reports on RCTs of TCM in the China National Knowledge Infrastructure (CNKI) database published in 2005–2009 and 2011–2012. Search terms included TCM and clinical trial. Study Selection: Manuscripts that reported RCTs of TCM were included. Data Extraction: Articles were extracted independently by 3 authors. Disagreements were discussed until agreement was reached. According to the CONSORT checklist, an item was scored as 1 when it was described in the paper; otherwise it was scored as 0.
Results A total of 4133 trials in 2005–2009 and 2861 trials in 2011–2012 were identified. There was a significant increase in the proportion of reports that included details of background (24.71% vs 35.20%, P < 0.001), participants (49.79% vs 65.26%, P < 0.001), the methods of random sequence generation (13.77% vs 19.85%, P < 0.001), statistical methods (63.00% vs 72.77%, P < 0.001) and recruitment date (70.14% vs 80.36%, P < 0.001) in 2011–2012 compared to 2005–2009. However, the percentage of reports describing the trial design decreased from 4.45% to 3.25% (P = 0.011). Few reports described the blinding methods, and the proportion decreased (4.77% vs 2.48%, P < 0.001). The reporting of funding showed a similar decrease (6.53% vs 5.00%, P = 0.007). There were no significant differences in the other CONSORT items. In terms of Jadad score, the proportion of reports with a score of 2 increased markedly (15.15% vs 19.71%, P < 0.001).
Conclusions Although the quality of reporting of RCTs of TCM improved in 2011–2012 compared to 2005–2009, the percentage of high-quality reports in terms of Jadad score was very low in both periods. There is a need to improve standards for reporting RCTs in China.


Background
To obtain effective and credible outcomes, randomization and control are essential for clinical trials. Randomized controlled trials (RCTs) provide the most reliable evidence on health care interventions and are the basis for many medical guidelines. However, RCTs are not always reported with sufficient detail or clarity, potentially hindering interpretation of results [1,2]. To accurately evaluate the conclusion of a published report, a reader needs complete, clear, and transparent information on its methodology and findings. Unfortunately, attempted assessments frequently fail because the authors of many trial reports do not describe critical data and only limited information is available [3][4][5].
The CONSORT (Consolidated Standards of Reporting Trials) statement was first published in 1996, revised in 2001 and updated in 2010 by the CONSORT Group. It provides authors and editors with a checklist of minimum recommendations for reporting trial design, analysis and results [6][7][8]. Many studies have shown that the quality of trial reporting improves when authors follow the CONSORT checklist [9][10][11][12]. The Jadad score is considered a valid and reliable tool for assessing the methodological quality of a clinical trial, and has been applied throughout the medical literature [13,14].
Traditional Chinese medicine (TCM), including herbal medicine, is widely used in China to treat a variety of diseases and is increasingly used to complement conventional medical care globally. In a nationally representative U.S. survey conducted in 2002, almost 20% of adults and 75%-100% of Asian-Americans had used herbal therapies in the past year [15]. Users believe that TCM combined with conventional medicine provides better healing than conventional medicine alone [16][17][18]. However, in the era of evidence-based medicine, TCM has been strongly challenged by clinicians because of a shortage of evidence-based efficacy data. Researchers have therefore made a great deal of effort in TCM clinical studies. In the past decade, RCTs of TCM have been advocated and a number have been reported [19][20][21][22][23]. Recently, many TCM researchers have evaluated the quality of TCM RCTs according to the CONSORT checklist [24][25][26][27][28][29]. Their studies show that the quality of TCM RCTs is generally low. However, these studies evaluated only one or several TCM journals, or publications on a specific disease, and thus cannot give a comprehensive view of the overall quality of TCM RCTs.
The purpose of the present study was to compare the change in quality of reporting TCM RCTs prior to and after the publication of the 2010 CONSORT statement. We include all publications of TCM RCTs during this period in the CNKI database, aiming to comprehensively evaluate the overall quality of TCM RCTs.

Search strategy
The China National Knowledge Infrastructure (CNKI) database, the most comprehensive full-text database of journals published in China, was used in the present study [30]. Of its several subdatabases, we used the academic journals' full-text database. We chose manuscripts published in 2005-2009 and 2011-2012, which respectively represent publications before and after the 2010 CONSORT statement. We used an electronic search strategy that combined the subject terms 'traditional Chinese medicine' and 'clinical trial' with the "Fuzzy Search" method so as to capture more potential manuscripts. To evaluate the trend in publication quality, we evaluated the published reports on an annual basis. The titles, index terms, and abstracts of the identified manuscripts were read and rated as "potential manuscript" or "not relevant". We retrieved all potential manuscripts and reviewed their full texts according to the following criteria.
Inclusion criteria were manuscripts reporting TCM RCTs.
Exclusion criteria were (1) reviews, literature analyses, experience reports and case reports; (2) animal experiments; (3) non-randomized clinical trials; (4) duplicate reports; (5) retrospective studies; (6) others. Three reviewers (J L, Z L, R C) reviewed the texts of the manuscripts to identify TCM RCTs. Disagreements regarding inclusion were resolved by discussion. Figure 1 shows the process of collecting and analyzing the materials.

Scoring according to CONSORT
A checklist of 25 items from the updated 2010 CONSORT guidelines was used [31][32][33]. Among the 25 items, 12 have 2 subitems. The score for each item or subitem was either 0 or 1: 0 indicates that the report did not describe the corresponding item/subitem and 1 indicates that it did. We did not include the following subitems in our report because, after analysis of all manuscripts, we found that (1) no reports changed the methods after trial commencement (Subitem 3b); (2) no reports changed trial outcomes after the trial commenced (Subitem 6b); (3) no reports had interim analyses and stopping guidelines (Subitem 7b); (4) no reports were stopped prematurely (Subitem 14b); (5) no reports had additional analyses (Subitem 18). After these 5 subitems were excluded, the maximum score a paper could obtain was 31 points. Each article was assessed for every item according to the checklist [29] by three investigators independently (J L, Z L and R C). When opinions differed among the three investigators, they discussed them until reaching a consensus; otherwise, the final decision was made by L L. The total score of each trial was calculated.
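The binary scoring rule described above can be illustrated with a minimal Python sketch. The function name, the item labels used as dictionary keys, and the example report are hypothetical and cover only a handful of checklist entries; the sketch only assumes that each described item contributes 1 point and that the five excluded subitems are never counted.

```python
# Hypothetical sketch of the 0/1 CONSORT scoring rule described in the text.
# Subitems the authors excluded (3b, 6b, 7b, 14b, 18) are never scored.
EXCLUDED = {"3b", "6b", "7b", "14b", "18"}


def consort_total(described):
    """Sum the 0/1 marks of one report, skipping excluded subitems.

    `described` maps a checklist label (e.g. "8a") to True if the report
    describes that item and to False otherwise.
    """
    return sum(
        1
        for label, present in described.items()
        if present and label not in EXCLUDED
    )


# Made-up report covering only a few checklist entries, for illustration.
example_report = {"1a": False, "1b": True, "2a": True, "8a": False, "18": True}
print(consort_total(example_report))  # 1b and 2a count; 18 is excluded
```

Keeping the excluded subitems in one set makes the exclusion rule explicit and easy to audit against the list given in the text.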

Scoring according to Jadad
The Jadad scale is a 5-point scale for measuring the quality of randomized trials. A score of three points or more indicates high quality [13]. The Jadad scale covers how generation of the random sequence is described (0 = no description; 1 = inadequate description; 2 = adequate description); how blinding is carried out (2 = double-blinding with adequate description; 1 = double-blinding with inadequate description; 0 = wrong usage of double-blinding); and why and how often patients withdraw (1 was recorded when the numbers of and reasons for withdrawals and exits of patients were reported; otherwise, 0 was recorded). As with the CONSORT scoring, the work was done by three investigators (J L, Z L and R C) separately. Disagreements were discussed among the three until agreement was reached; otherwise, the final decision was made by L L.
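The three components above add up to the 0 to 5 Jadad score. The following Python sketch shows the arithmetic; the function and parameter names are illustrative assumptions, not part of the original scoring procedure.

```python
def jadad_score(randomization, blinding, withdrawals_reported):
    """Return the 0-5 Jadad score from the three components in the text.

    randomization: 0 = no description, 1 = inadequate, 2 = adequate
    blinding:      0, 1 or 2, as defined in the text
    withdrawals_reported: True if numbers of and reasons for withdrawals
                          are reported, else False
    """
    if randomization not in (0, 1, 2) or blinding not in (0, 1, 2):
        raise ValueError("randomization and blinding must be 0, 1 or 2")
    return randomization + blinding + (1 if withdrawals_reported else 0)


def is_high_quality(score):
    """A score of three points or more indicates high quality."""
    return score >= 3


# Hypothetical report: adequate randomization, no valid blinding,
# withdrawals not reported.
score = jadad_score(2, 0, False)
print(score, is_high_quality(score))
```

A report with an adequately described random sequence but no blinding and no withdrawal data therefore stays below the high-quality threshold.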

Statistics
Pearson's χ² test was used to test whether differences between the two periods (2005-2009 and 2011-2012) were statistically significant in terms of mean total score of CONSORT. The Wilcoxon rank sum test was used to test differences in Jadad scores between years. The level of significance for all tests was set at 0.05. Data were analyzed using SPSS version 18.0. The total score of each report and the percentage of reports with each score were calculated.
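As an illustration of how a between-period comparison of proportions works, the Python sketch below runs a Pearson χ² test on a 2×2 table. It is not the authors' SPSS procedure: it assumes no continuity correction, the function name is hypothetical, and the counts are approximations reconstructed from the reported percentages for the background item (24.71% of 4133 reports and 35.20% of 2861 reports).

```python
import math


def chi_square_2x2(yes_a, n_a, yes_b, n_b):
    """Pearson chi-square test (no continuity correction) for two proportions.

    Returns the test statistic and the p-value for 1 degree of freedom.
    """
    table = [[yes_a, n_a - yes_a], [yes_b, n_b - yes_b]]
    total = n_a + n_b
    row_sums = [n_a, n_b]
    col_sums = [yes_a + yes_b, total - yes_a - yes_b]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Counts approximated by rounding the reported percentages (an assumption;
# the exact counts are not given in the text).
yes_early = round(0.2471 * 4133)
yes_late = round(0.3520 * 2861)
stat, p = chi_square_2x2(yes_early, 4133, yes_late, 2861)
print(f"chi2 = {stat:.1f}, p = {p:.3g}")
```

Under these assumptions the p-value falls well below 0.001, in line with the significance level reported for this item.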

Characteristics of selected RCTs
After screening the titles, abstracts and texts, we identified a total of 4133 reports in 2005-2009 and 2861 in 2011-2012 in the CNKI database that met the inclusion and exclusion criteria and were included in this analysis. The annual numbers of reports identified in each screening step are shown in Table 1.

CONSORT: title, abstract, background and objectives
The proportion of reports with "randomized" in the title (1a) increased significantly (0.56% vs 1.15%, P = 0.006). However, the percentages were very low for both periods (Table 2 and Figure 2). In 2005-2009, 84.81% of reports had abstracts (1b) that included objective, methods, results and conclusions, more than in 2011-2012 (82.03%). The proportions of reports with detailed description of backgrounds (2a) of studies were low for ... Although the proportions describing interventions, outcomes and the calculated sample size improved, the differences were not significant (Table 2). As shown in Figure 3, the proportions describing these items fluctuated during 2005-2012.

CONSORT: randomization
Description of sequence generation (8a) also increased significantly (13.77% vs 19.85%, P < 0.001). A decrease was observed for description of the detailed implementation process (10) (0.75% vs 0.17%, P = 0.001). Few reports described the allocation concealment mechanism (9) (Table 2). As shown in Figure 4, the proportions describing these items fluctuated from 2005 to 2012.

CONSORT: results
The proportion with detailed statistical methods increased significantly (63.00% vs 72.77%, P < 0.001).

CONSORT: discussion
There was no difference in the proportions of papers reporting harms (19), limitations (20), generalizability (21) and interpretation (22) before and after 2010 (Table 2). As shown in Figure 6, the proportions describing these items fluctuated from 2005 to 2012.

CONSORT: total score of each report
Figure 9 shows that the annual distributions of the reports with a specific score in each year are similar. ... Figure 11).

Discussion
In the present study, we demonstrate that the proportions of reports describing CONSORT items 1a, 2a, 4a, 4b, 8a, 12, 14a, 15 and 17b increased after 2010, while the proportions describing items 1b, 2b, 3a, 10, 11a and 25 decreased. For most items, the proportion of reports describing the item fluctuated from 2005 to 2012. These data indicate that publication of the CONSORT statement has had little, if any, influence on most researchers reporting clinical trials in China.
TCM has been practiced in China for thousands of years. TCM doctors use herbal medicine to treat a variety of diseases. The medical herbs may be used singly or in combination. In the past decades, the effects of TCM have been evaluated in various animal models and the underlying mechanisms have been explored at the cellular, protein or DNA level. Nevertheless, the efficacy of TCM should be demonstrated in RCTs, which provide the top-level evidence for therapy. For example, Chansu, the skin and parotid venom glands of Bufo bufo gargarizans Cantor, is a well-known TCM widely used for the treatment of a variety of tumors in China [34,35]. Experimental studies suggested that Chansu and its active compounds exhibit significant anti-tumor activity by inhibiting cell proliferation, inducing apoptosis and cell arrest, and inhibiting angiogenesis [36]. Further studies demonstrated that bufalin, one compound in Chansu, induced apoptosis of gastric cancer cells by inhibiting the AKT signaling pathway [37] and inhibited proliferation of hepatocellular carcinoma cells by inhibiting the AKT/GSK3β/β-catenin/E-cadherin signaling pathway [38].
RCTs of TCM were first published in the 1980s [39]. Since then, a number of TCM RCTs have been published. However, the quality of the reports of these RCTs was poor [39][40][41][42][43][44]. For example, Fang et al. reported that only 13 of 338 RCT reports described the method of randomization in detail [40]. In the present study, we found that only 8 of 31 CONSORT items improved significantly from 2005-2009 to 2011-2012. A detailed and informative background helps readers understand the purpose of the study. Detailed inclusion and exclusion criteria for patients help avoid selection bias. A clear and definite description of the intervention is critical for the study to be repeated. In particular, whether outcome assessments are blinded has considerable implications for assessment of internal validity [45]. We found that 29.00% of the articles described the background from 2005 to 2012. Sequence generation was described in only 16.26% of the publications, blinding in 3.83% and calculation of sample size in 0.29%. Inadequate description of these items makes the results of a study less credible. Another problem was that only 56 of 6994 reports had the term 'randomize' in their titles. The title is a very important part of an article, because researchers use titles to screen potential studies for meta-analysis.
Figure 7 The distribution of the mean scores before and after 2010.
With regard to methodological items, calculation of sample size was reported in only 20 of 6994 reports. If the sample is too large, time and money are wasted. Too small a sample reduces statistical power and may generate selection bias. Blinding was used in 268 reports, and the proportion of reports describing the blinding method decreased after 2010. Blinding, especially double-blinding, is challenging for studies in which the intervention is being randomized [46]. Inadequate measures to create and conceal the random allocation, selective attrition, and insufficient double-blinding have been theorized to bias the estimates of treatment effects in RCTs [47].
Reports on adverse events were obviously not detailed enough, which may lead to overestimation of the safety of TCM. In fact, the information on TCM herbs recorded in most classical books includes toxicities, incompatibilities between herbs, cautions, precautions and contraindications. Thus, contrary to a general misconception, toxicity data on Chinese herbs exist and are documented through clinical experience [48]. For example, cinnabar, which contains mercury sulfide, has been used in TCM for thousands of years and 40 cinnabar-containing traditional medicines are still used today. Absorbed mercury from cinnabar accumulates mainly in the kidneys, resembling the disposition pattern of inorganic mercury. Following long-term use of cinnabar, renal dysfunction may occur [49].
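The pooled percentages quoted in this Discussion follow directly from the stated counts and the total number of included reports (4133 + 2861 = 6994). The short Python sketch below reproduces that arithmetic; the helper name is a hypothetical choice made for illustration.

```python
TOTAL_REPORTS = 4133 + 2861  # reports from 2005-2009 plus 2011-2012


def pooled_percentage(count, total=TOTAL_REPORTS):
    """Percentage of all included reports, rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Counts stated in the Discussion.
print(pooled_percentage(268))  # reports using a blinding method
print(pooled_percentage(20))   # reports with a sample size calculation
print(pooled_percentage(56))   # reports with 'randomize' in the title
```

The first two values correspond to the 3.83% and 0.29% given in the text for blinding and sample size calculation.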
In addition, the reporting of outcomes and ancillary analyses remained poor. For example, intention-to-treat analysis is advocated because it preserves the randomization process and allows for noncompliance and deviations from policy by clinicians [50]. Only 9 of 6994 papers used intention-to-treat analysis.
The discussion is an important part of a report. There the author(s) can discuss the advantages and generalizability of the treatment, as well as the limitations of the study. As we noticed, few reports had an informative discussion (Table 2). Finally, only one report contained information about registration, and one other report contained information on the protocol.
According to the Jadad scale, 188 reports scored more than 2 points. There was no difference between publications before and after 2010. Thus, the quality of reporting of TCM RCTs improved very slowly. The average Jadad score was 1.25 during 2011-2012, compared to 1.22 during 2005-2009. In the present study we chose the CNKI database to avoid selection bias. The CNKI database is the most comprehensive database in China. It archives the full-text publications of 1217 Chinese medical journals, including 26 journals for TCM and 18 for integrative TCM and modern Western medicine. In addition, three researchers independently assessed the quality of each report by reading its full text. This is in sharp contrast to previous reports, which evaluated only one or several TCM journals, or publications on a specific disease [24][25][26][27][28][29]. Thus the present study is the most comprehensive one on TCM RCTs.
Interestingly, we found that none of the manuscripts described change of the methods after trial commencement (Subitem 3b), change of trial outcomes after the trial commenced (Subitem 6b), interim analyses and stopping guidelines (Subitem 7b), premature discontinuation of the trial (Subitem 14b) and additional analyses (Subitem 18). The underlying reason is unknown.

Conclusions
Although some improvements have been made in reporting TCM RCTs, the pace remains slow and there is considerable room for further improvement. The problems include the design of randomization, the use of blinding, the calculation of sample size, the comparability of baseline information, clear and definite inclusion and exclusion criteria, the use of statistical methods, the withdrawal and follow-up of patients, and the recording of adverse events. Doctors practicing TCM should be trained to write high-quality reports, and active implementation of the CONSORT guidelines by journals is necessary to make reports on TCM RCTs more credible and to allow TCM to be used more widely in the world as an alternative medicine. We also suggest that a bibliographic database of TCM RCTs, similar to Acu-Trials(R), be developed to enhance the accessibility and quality of TCM RCTs [51].