FastML

Machine learning made easy

How a Russian mathematician constructed a decision tree, by hand, to solve a medical problem

Here’s an excerpt from Love and Math, a book by Edward Frenkel. The author writes about mathematics and his career. One of the stories is about how, as a student in the 1980s, he built a decision tree to help doctors decide whether to remove a transplanted kidney that the patient's immune system was rejecting. There was no machine to learn from the data, so a human had to do the work.

The third, and last, medical project I worked on was the most interesting one for me. A young doctor, Sergei Arutyunyan – who also needed help to analyze his data for a thesis – and I had a great rapport. He was working with patients whose immune systems were rejecting transplanted kidneys. In such a situation the doctor has to make a quick decision whether to fight for the kidney or remove it, with far-reaching consequences: if they kept the kidney, the patient could die, but if they removed it, the patient would need another one, which would be very difficult to find.

Sergei wanted to find a way to tell which recommendation was statistically most viable, based on quantitative ultrasound diagnostics. He had much experience in this area and collected a lot of data. He hoped that I could help him to analyze this data and come up with meaningful objective criteria for decision-making that could be useful to other doctors. He told me that no one had yet been able to do this; most doctors thought this was impossible and preferred to rely on their own ad hoc approaches.

I looked at the data. Like in our previous projects, there were about forty different parameters measured for each patient. During our regular meetings, I would ask Sergei pointed questions, trying to figure out which of these data were relevant and which weren’t. But this was hard. Like other doctors, he would give his answers based on specific cases, which was not very helpful.

I decided to use a different approach. I thought, “This man makes these kinds of decisions every day, and obviously he is very good at it. What if I manage to learn to ‘be him’? Even if I don’t know much about the medical aspects of the problem, I could try to learn his methodology following his decision-making process, and then I could use this knowledge to come up with a set of rules.”

I suggested that we play a kind of a game. Sergei had collected data on approximately 270 patients. I chose, randomly, the data for thirty of them and put aside the rest. I would take the history of each of these randomly chosen patients and have Sergei, who was sitting at the opposite corner of the office, ask me questions about the patient, which I would answer by consulting the file. My goal in all this was to try to understand the pattern of his questions (even if I could not possibly know the meaning of these questions as well as he did). For example, sometimes he would ask different questions, or the same questions, but in a different order. In such a case, I would interrupt him: “Last time you did not ask this. Why are you asking it now?”

And he would explain, “Because for the last patient the volume of the kidney was so and so, and this ruled out this scenario. But for the current patient it is so and so, and so this scenario is quite possible.”

I would make notes of all this and try to internalize this information as much as possible. Even so many years later, I can picture it well: Sergei sitting in a chair in the corner of his office, deep in thought, puffing on a cigarette (he was a chain-smoker). It was fascinating to me to try deconstructing the way he thought – it was kind of like trying to undo a jigsaw puzzle to find out what the essential pieces were.

Sergei’s answers gave me extremely valuable information. He would always arrive at the diagnosis after no more than three or four questions. I would then compare it with what actually happened to each patient. He was always spot on.

After a couple dozen cases, I could already make the diagnosis myself, following the simple set of rules that I learned while interrogating him. After half a dozen more, I was practically as good as he was in predicting the outcome. There was in fact a simple algorithm at play that Sergei was following in most cases.

Of course, there were always a handful of cases in which the algorithm would not be useful. But even if one could derive effectively and quickly the diagnosis for 90 to 95 percent of the patients, this would already be quite an achievement. Sergei told me that in the existing literature on the subject of ultrasound diagnostics, there was nothing of this sort.

After completing our “game,” I derived an explicit algorithm that I’ve drawn as a decision tree below. From each node of the tree there are two edges down to other nodes; the answer to a specific question at the first node dictates which of the next two possible nodes the user should go to. For example, the first question is about the index of peripheral resistance (PR) of the blood vessel inside the transplant. This was a parameter Sergei himself had come up with in his research. If its value was greater than 0.79, then it was highly likely that the kidney was being rejected, and the patient required immediate surgery. In this case, we move to the black node on the right. Otherwise, we move to the node on the left and ask the next question: what is the volume (V) of the kidney? And so on. Each patient’s data therefore gives rise to a particular path on this tree. The tree terminates after four or fewer steps (it is not important to us at the moment what the remaining two parameters, TP and MPI, were). The terminal node contains the verdict, as shown in this picture: the black node means “operate” and the white node means “do not operate.”

[Figure: the decision tree]
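The procedure Frenkel describes is an ordinary binary decision tree: each internal node asks one question about a measurement, and each leaf holds a verdict. Here is a minimal Python sketch of that structure, assuming every question has the form "is feature > threshold?" (the text only confirms this for PR). The `Leaf`, `Split` and `classify` names are made up for illustration. Only the root split (PR > 0.79 means "operate") comes from the text; the left subtree, its threshold and its leaf labels are hypothetical placeholders, because the real questions on V, TP and MPI are only in the figure.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class Leaf:
    verdict: str  # e.g. "operate" (black node) or "do not operate" (white node)

@dataclass
class Split:
    feature: str      # which measurement to ask about, e.g. "PR"
    threshold: float  # go right if value > threshold, otherwise go left
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Split]

def classify(tree: Tree, patient: dict) -> Tuple[str, List[str]]:
    """Follow one root-to-leaf path; return the verdict and the questions asked."""
    questions = []
    node = tree
    while isinstance(node, Split):
        questions.append(f"{node.feature} > {node.threshold}?")
        node = node.right if patient[node.feature] > node.threshold else node.left
    return node.verdict, questions

# Only the root question is from the text: PR > 0.79 -> "operate".
# The left subtree is a HYPOTHETICAL placeholder: the real questions on
# V, TP and MPI (and their thresholds) are only in the original figure.
HYPOTHETICAL_V_THRESHOLD = 150.0
tree = Split("PR", 0.79,
             left=Split("V", HYPOTHETICAL_V_THRESHOLD,
                        left=Leaf("placeholder verdict A"),
                        right=Leaf("placeholder verdict B")),
             right=Leaf("operate"))

print(classify(tree, {"PR": 0.85, "V": 120.0}))
# ('operate', ['PR > 0.79?'])
print(classify(tree, {"PR": 0.60, "V": 120.0}))
# ('placeholder verdict A', ['PR > 0.79?', 'V > 150.0?'])
```

Returning the list of questions alongside the verdict mirrors the point in the story that each patient's data traces a particular path through the tree, never more than four questions deep.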

I ran the data of the remaining 240 or so patients, whose files I had put aside, through the algorithm. The agreement was remarkable. In about 95 percent of the cases, it led to an accurate diagnosis.
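What Frenkel did here is what we would now call hold-out validation. He studied a random subset of about thirty files, derived the rules from those, and then measured agreement on the roughly 240 files he had never looked at. (At that scale, 95 percent is roughly 228 of 240 patients.) Below is a minimal Python sketch of the two steps. The function names are made up for illustration. The rule is a hypothetical one-question stand-in using only the PR split quoted above, not the full tree. The four records are toy data invented for the example, not Sergei's patients.

```python
import random
from typing import Callable, Optional, Sequence

def holdout_split(records: Sequence[dict], n_study: int, seed: Optional[int] = None):
    """Randomly choose n_study records to learn from; set the rest aside untouched."""
    rng = random.Random(seed)
    shuffled = list(records)  # copy, so the caller's sequence is not reordered
    rng.shuffle(shuffled)
    return shuffled[:n_study], shuffled[n_study:]

def agreement_rate(rule: Callable[[dict], str], patients: Sequence[dict]) -> float:
    """Fraction of patients whose recorded outcome matches the rule's verdict."""
    if not patients:
        raise ValueError("no patients to evaluate")
    hits = sum(1 for p in patients if rule(p) == p["outcome"])
    return hits / len(patients)

# HYPOTHETICAL one-question stand-in for the full tree (only the root split is known).
def pr_only_rule(patient: dict) -> str:
    return "operate" if patient["PR"] > 0.79 else "do not operate"

# Toy records invented for illustration; they are not real patient data.
toy = [
    {"PR": 0.85, "outcome": "operate"},
    {"PR": 0.60, "outcome": "do not operate"},
    {"PR": 0.70, "outcome": "operate"},
    {"PR": 0.90, "outcome": "operate"},
]

study, held_out = holdout_split(toy, n_study=1, seed=0)  # 1 to study, 3 set aside
score = agreement_rate(pr_only_rule, held_out)  # judged only on unseen records
print(agreement_rate(pr_only_rule, toy))  # 0.75 on all four toy records
```

With about 270 records and `n_study=30`, the split sets aside 240, the same proportions as in the story. The important design choice is that the set-aside records play no part in deriving the rules, so the agreement rate on them is an honest estimate of how the rules would do on new patients.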

The algorithm described in simple terms essential points of the thought process of a doctor making the decision, and it showed which parameters describing the patient’s condition were most relevant to the diagnosis. There were only four of them, narrowing down the initial slate of forty or so. For example, the algorithm showed the importance of the index of peripheral resistance that Sergei had developed, measuring the flow of blood through the kidney. That this parameter played such an important role in the decision-making was, by itself, an important discovery. All of this could be used in further research in this area. Other doctors could apply the algorithm to their patients, test, and perhaps fine-tune it to help make it more efficient.

We wrote a paper about this, which became the basis for Sergei’s doctoral thesis, and applied for a patent that was approved a year later.

Edward Frenkel has a few videos on YouTube.
