The article under discussion here approaches this problem by seeking to define domains that can be used to assess the degree to which a homeopathy RCT reflects homeopathic practice. This follows the track laid down in evidence-based medicine by applying various “quality criteria” for rating published studies. Whereas this conventional approach attempts to use criteria that are flexible enough to apply to any RCT, the Mathie et al. article specifically takes on the task of providing criteria that are tailored to homeopathic research studies. Both approaches suffer from their dependence on how the RCTs were described in publication, as opposed to how they were carried out in the real world.
In conventional medicine, the rise of quality criteria (which now exist in dozens of versions) was in response to problems encountered in writing systematic reviews. It was perceived to be very difficult to summarize the results from multiple publications when there was so much variability in how the trials were described and how the results were presented. Thus projects like CONSORT were oriented toward formalizing how certain kinds of information should be communicated, to make it easier to compile them into systematic reviews.
It did not take very long, however, for reporting criteria to be interpreted as scientific criteria. “Reporting quality standards” very quickly became “quality standards”. As such, most lists of quality standards contain no substantive evaluation of the scientific quality of the articles to which they are applied. For example, when it comes to statistical analysis of the results, the only quality scales that even mention this aspect contain some vague comment about “appropriateness”, without further specification. (The one exception to this that I have seen was a very specific list, half of which I regard as mistaken.) The issue this raises is: if it is so difficult to reach consensus on what criteria distinguish good from not-so-good science publications, then how are we to have any confidence that we are educating researchers to be able to tell the difference?
From this viewpoint, Mathie et al. can be seen as an attempt to say, in the simplest possible terms, what distinguishes genuine homeopathy research from something that only seems like homeopathy research. I would imagine there are several reasons why this is a good idea. First, a number of homeopathy studies have been designed, funded, and carried out by individuals who had insufficient understanding of how homeopathy is actually practiced. Thus at least three of Mathie et al.’s six domains depend on expert homeopathic judgment, in one way or another. Second, when funding for homeopathy research became available relatively recently, the pool of potential homeopathy-savvy investigators was generally weak in medical research knowledge or experience, leading to the publication of studies that could be impugned on scientific grounds. Thus, the other three of Mathie et al.’s criteria ask for scientific judgments about the appropriateness of the research design (again relying to some extent on an understanding of homeopathy).
Given that the objective of Mathie et al. is worthy, I think one can question whether their work is extensive enough, since it involved only six articles evaluated by eight reviewers. In addition, the reviewers were evidently the same ones who were involved in the consensus development of the criteria, whereas it is generally more realistic to test judgment-based criteria with a fresh set of reviewers. More evidence is needed on the degree of reviewer concordance, and in a more representative pool of articles.
It was of some interest to me that Mathie et al. did not endorse combining their six items into a scale. For some time now it has seemed to me that adding item scores requires considerable justification, which may in fact be absent. For example, if the investigators did not blind subjective outcome assessments, they should get a zero on that item, and that zero should then be multiplied by the sum of the other items. In other words, a critical error should not have the effect of merely diminishing an overall quality score; it should obliterate the score.
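The multiplicative rule described above can be sketched numerically. The item names and scores below are hypothetical, chosen only to illustrate how a zero on a critical item obliterates the total under multiplication, whereas simple addition merely diminishes it:

```python
# Hypothetical quality items for an RCT, each scored 0 or 1.
# "blinding" is treated as the critical item; all names are illustrative,
# not Mathie et al.'s actual domains.

def additive_score(items):
    """Conventional approach: sum all item scores."""
    return sum(items.values())

def multiplicative_score(items, critical="blinding"):
    """Proposed approach: multiply the critical item by the sum of the rest,
    so a zero on the critical item zeroes out the whole score."""
    other_sum = sum(v for k, v in items.items() if k != critical)
    return items[critical] * other_sum

trial = {
    "blinding": 0,               # subjective outcomes were not blinded
    "randomization": 1,
    "allocation_concealment": 1,
    "dropout_reporting": 1,
}

print(additive_score(trial))        # 3: the error merely diminishes the score
print(multiplicative_score(trial))  # 0: the error obliterates the score
```

Had the same trial scored 1 on blinding, both schemes would rank it highly (additive 4, multiplicative 3); the schemes diverge only when a critical item fails.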