Tuesday, August 16, 2011

More on Bayesian statistics.

One of the problems with randomized controlled trials (RCTs) is that they are about populations, not individuals. One of the tyros of RCTs has said, "The individual case is entirely devoid of interest."

For the statistics we usually talk about, there is no doubt that this tyro is correct. Survival curves are compared by their median survivals. I remember that when I first started treating colon cancer, I was struck by the fact that the standard treatment, 5-FU, did not change median survival, yet 30% of patients survived longer than those who received nothing beyond supportive care. Now I find that I am in that 30%.

In my own specialty of IGHV gene mutations, the key observation came not from an RCT but from a retrospective look-back at factors that affected survival. Had two groups not made the same observation in December 1999, it is doubtful that it would have had much impact. The observation has since been confirmed in several RCTs, most notably a trial of stem cell autografts, which showed that even in patients receiving an autograft the most important prognostic factor is the mutational status of the IGHV genes.

The individual, however, does not really want to know the risk for the average member of a population. He wants answers to questions such as: what are my chances of getting lung cancer? What are the chances of my road being flooded? What is the risk of a nuclear accident in my town? How many trucks should I buy for my business? How many workers should I take on? And so on, through thousands of business and investment decisions.

RA Fisher laid much of the foundation of modern statistical theory. When asked about the risk of an armed nuclear bomb being dropped accidentally, he answered zero, because it had never happened. Wiser heads prevailed. They discovered that by the time President Kennedy came to power there had been 60 occasions on which an unarmed bomb had been dropped or involved in an aircraft crash. With so many prior incidents, a more serious explosion was bound to happen one day. This led to many new precautions around the arming of nuclear devices.

In 1950 Doll and Hill were investigating the cause of lung cancer. No one could contemplate running RCTs for chronic diseases, where the observation period might be 50 years. Instead they looked back at risk factors. Lung cancer had appeared suddenly in the pre-war years. Proposed explanations included cases that had previously been misdiagnosed, the availability of chest X-rays, the rapid spread of motor cars and their exhausts, the London smog, industrial pollution, tarmacadam roads and smoking. Whatever the cause, it had to be something that was commoner in men and that produced a disease which became commoner with age. This probably ruled out a genetic factor, but not completely.

Among a large number of lung cancer patients, the stand-out statistic was that only two were non-smokers, at a time when 80% of men smoked. The finding was so shocking that Doll, Hill and their wives immediately gave up smoking.

RA Fisher was a heavy smoker and received some of his grants from the tobacco industry. To him, this search for prior indications of the cause smacked of the Bayesian heresy, and he would have none of it. Correlation does not prove causation, he hollered. It could just as well mean that lung cancer induces people to smoke, or that both have some completely different, unrelated cause. Both cigarette smoking and lung cancer are commoner in married people, for example. Perhaps we should ban marriage, he thundered.

But as the indirect evidence piled up, Fisher was defeated. He would have needed a hypothesis explaining why heavier smoking produced higher rates of lung cancer, why stopping smoking reduced the rate, why pipe and cigar smoking were less dangerous than cigarette smoking, and why cigarette tar placed on the skin of a mouse caused cancer.

I still haven't understood Bayes' theorem, but it seems to have something to do with individual risk (which is why actuaries love it) and also with indirect risk factors when precise ones cannot be obtained.
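For anyone in the same position, here is a minimal sketch of how the theorem turns a population figure into a risk for an individual with a particular characteristic: P(H | E) = P(E | H) × P(H) / P(E). The function name `posterior` is hypothetical, and every number below is a made-up illustration except the 80% smoking rate quoted earlier in this post; none of it is real data from Doll and Hill.

```python
def posterior(prior: float, p_evidence_given_h: float, p_evidence: float) -> float:
    """Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)."""
    return p_evidence_given_h * prior / p_evidence


# Hypothetical illustrative inputs, NOT figures from any study.
P_CANCER = 0.01               # assumed baseline risk of lung cancer among men
P_SMOKER_GIVEN_CANCER = 0.99  # assumed share of lung cancer patients who smoked
P_SMOKER = 0.80               # share of men who smoked, as quoted in the post

# Individual risk for a smoker: P(cancer | smoker)
risk_if_smoker = posterior(P_CANCER, P_SMOKER_GIVEN_CANCER, P_SMOKER)

# Individual risk for a non-smoker: P(cancer | non-smoker)
risk_if_nonsmoker = posterior(P_CANCER, 1 - P_SMOKER_GIVEN_CANCER, 1 - P_SMOKER)

relative_risk = risk_if_smoker / risk_if_nonsmoker

print(f"risk if smoker:     {risk_if_smoker:.4f}")
print(f"risk if non-smoker: {risk_if_nonsmoker:.4f}")
print(f"relative risk:      {relative_risk:.1f}")
```

With these invented inputs the smoker's risk comes out roughly 25 times the non-smoker's. The point is only the mechanics: start from a prior (the population risk) and update it with evidence about the individual.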
