Hello everyone.
I have a quick question about visualizing data points, for example in a scatter diagram. Once the collected data points are plotted as a scatter diagram, the outliers are quickly visible. These outliers could be errors, for example caused by a defective measuring device. It is therefore often advised to delete them and compute replacement values by interpolation (here: add the neighbouring data points together and take the average), so that the scatter diagram shows only values that make sense.
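To make the interpolation step concrete, here is a minimal sketch in Python of what I mean. It assumes the outliers have already been spotted by eye and are passed in as indices; the function name `interpolate_outliers` and the sample readings are made up for illustration.

```python
def interpolate_outliers(values, outlier_indices):
    """Replace each flagged point with the average of its direct neighbours.

    Neighbours that are themselves flagged are skipped. A point at the edge
    of the series has only one neighbour, so it takes that neighbour's value.
    """
    cleaned = list(values)
    flagged = set(outlier_indices)
    for i in sorted(flagged):
        neighbours = [
            values[j]
            for j in (i - 1, i + 1)
            if 0 <= j < len(values) and j not in flagged
        ]
        if neighbours:  # leave the point unchanged if no usable neighbour exists
            cleaned[i] = sum(neighbours) / len(neighbours)
    return cleaned


# Hypothetical readings; index 2 is the point that looks wrong in the plot.
readings = [10.0, 11.0, 95.0, 12.0, 13.0]
cleaned = interpolate_outliers(readings, [2])
print(cleaned)  # [10.0, 11.0, 11.5, 12.0, 13.0]
```

Note that the original value 95.0 is gone from the result, which is exactly what worries me below.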
But what if these outliers are important? What if important decisions, such as the diagnosis of a disease, can be based on them? Does it make sense in some cases not to delete the outliers?
I have one example where it makes sense not to delete the outliers: recording brain activity in order to see in which region the patient has malfunctions. The x-axis would be the position in the brain and the y-axis would be the firing frequency of the neurons or the muscle contraction. The plot can then show where there are deviations compared to the other regions of the brain.
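For this kind of case, one possible approach (and I am not sure it is the right one) would be to flag the deviating points instead of deleting them, so that they can be highlighted in the plot. The sketch below uses a modified z-score based on the median and the median absolute deviation; that rule, the threshold of 3.5, the function name `flag_deviations` and the sample values are all my own assumptions, not something taken from a real recording.

```python
from statistics import median


def flag_deviations(values, threshold=3.5):
    """Return one boolean per value: True if it deviates strongly from the rest.

    Uses the modified z-score 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation. The threshold is an assumed, adjustable cut-off.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:  # at least half the values are identical; score is undefined
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


# Hypothetical firing rates, one per brain position along the x-axis.
rates = [20.0, 21.0, 19.0, 22.0, 60.0, 20.0, 21.0]
flags = flag_deviations(rates)
print(flags)  # [False, False, False, False, True, False, False]
```

The flagged point stays in the data, so it can be drawn in a different colour and then checked against the device before anyone decides to drop it.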
So deleting outliers is not always the best solution. On the other hand, it would make sense if the measuring device really was at fault. It is very difficult to judge how to proceed in order to visualize the data correctly.
What is your opinion on this? When should outliers be removed, and how should you decide whether they are really measurement errors? Is there an approach that solves this problem and still allows a correct interpretation of the data?