Sunday, August 5, 2012

Visualize a random forest that classifies digits

My last post used random forest proximity to visualize a set of diamond shapes (the random forest was trained to distinguish diamonds from non-diamonds).

This time I looked at the digits data set that Kaggle uses as the basis of its "getting started" competition. The random forest is trained to classify the digits, and this is an embedding of 1000 digits into 2 dimensions, preserving proximities from the random forest as closely as possible:

The colors of the points show the correct label. The larger points are digits classified incorrectly, and you can see that in general those are ones that the random forest has put in the wrong "region". I've shown some of the digits themselves (instead of colored points) -- the red ones are incorrectly classified.
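
For readers who want to try the idea, here is a minimal Python sketch. It is not the code from this post, and it rests on several assumptions: scikit-learn's small 8x8 `load_digits` set stands in for the Kaggle data, the proximity of two digits is taken to be the fraction of trees in which they land in the same leaf, the distance is `sqrt(1 - proximity)`, and the 2-D embedding uses classical (Torgerson) MDS. The function names are hypothetical.

```python
import numpy as np


def proximity_from_leaves(leaves):
    """Fraction of trees in which each pair of samples shares a leaf.

    leaves: integer array of shape (n_samples, n_trees) holding, for each
    sample, the index of the leaf it reaches in each tree.
    """
    leaves = np.asarray(leaves)
    n_samples, n_trees = leaves.shape
    prox = np.zeros((n_samples, n_samples))
    for t in range(n_trees):
        col = leaves[:, t]
        # True where samples i and j fall in the same leaf of tree t.
        prox += col[:, None] == col[None, :]
    return prox / n_trees


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS of a symmetric distance matrix."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to get a Gram matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by non-Euclidean distances.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


def embed_digits(n_points=1000, n_trees=100, seed=0):
    """Train a forest on digits and embed n_points of them in 2-D."""
    # Imported here so the helpers above only need NumPy.
    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_digits(return_X_y=True)  # assumption: not the Kaggle data
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X, y)
    leaves = forest.apply(X[:n_points])  # shape (n_points, n_trees)
    prox = proximity_from_leaves(leaves)
    coords = classical_mds(np.sqrt(1.0 - prox))
    return coords, y[:n_points]
```

Calling `coords, labels = embed_digits()` gives one 2-D point per digit, which can then be scatter-plotted and colored by `labels`. Other embedding methods (for example, iterative stress-minimizing MDS) would give a somewhat different picture from the same proximities.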

Here's the same but just for the 7's:


The random forest has done a reasonable job putting different types of 7's in different areas, with the most "canonical" 7's toward the middle.

You can see all of the other digits at http://www.learnfromdata.com/media/blog/digits/.

Note that this random forest is different from the one in my last post -- here it's built to classify the digits, not separate digits from non-digits. I wonder what kind of results a random forest to distinguish 7's from non-7's would look like?

Code is on GitHub.

11 comments:

  1. Great job!! Can you explain what "2 dimensions preserving proximities from the random forest as closely as possible" means, like what metric you used to determine that it stays as close as possible?
