Sunday, August 5, 2012

Random forests for visualizing data

Recently I read about using random forests as a way to visualize data. Here's how it works:
  1. Have a data set
  2. Create a set of fake data by shuffling each column independently at random; each column keeps the same distribution, but the relationships between columns are destroyed.
  3. Train a random forest to distinguish the fake data from the original data.
  4. Get a "proximity" measure between each pair of points, based on how often the two points land in the same leaf node across the trees of the forest.
  5. Embed the points in 2D in such a way as to distort these proximities as little as possible.
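Here is a minimal sketch of the five steps in Python. It assumes NumPy and scikit-learn, which is not necessarily what the original code used, and the helper names are hypothetical. The proximity is taken to be the fraction of trees in which two real points share a leaf, and the embedding uses scikit-learn's non-metric MDS on the dissimilarity 1 - proximity.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import MDS


def make_fake_data(X, rng):
    """Step 2: shuffle each column independently. Every column keeps its
    distribution, but the relationships between columns are destroyed."""
    X_fake = X.copy()
    for j in range(X.shape[1]):
        X_fake[:, j] = rng.permutation(X[:, j])
    return X_fake


def rf_proximity(X, n_trees=200, seed=0):
    """Steps 3-4: train a forest to tell real rows (label 1) from fake rows
    (label 0), then return the fraction of trees in which each pair of real
    rows lands in the same leaf."""
    rng = np.random.default_rng(seed)
    X_fake = make_fake_data(X, rng)
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(np.vstack([X, X_fake]),
               np.r_[np.ones(len(X)), np.zeros(len(X_fake))])

    leaves = forest.apply(X)  # shape (n_samples, n_trees): leaf index per tree
    prox = np.zeros((len(X), len(X)))
    for t in range(leaves.shape[1]):
        # Add 1 for every pair of rows that share a leaf in tree t.
        prox += leaves[:, t][:, None] == leaves[:, t][None, :]
    return prox / leaves.shape[1]


def embed_2d(prox, seed=0):
    """Step 5: non-metric MDS on the dissimilarity 1 - proximity."""
    mds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
              random_state=seed)
    return mds.fit_transform(1.0 - prox)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = rng.uniform(0.0, 1.0, 100)
    # Toy data: three columns that all depend on one hidden variable t.
    X = np.column_stack([t, t ** 2, np.sin(3 * t)])
    coords = embed_2d(rf_proximity(X, n_trees=50))
    print(coords.shape)  # (100, 2)
```

The pairwise proximity is accumulated one tree at a time so memory stays at one n x n matrix rather than n x n x n_trees.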
I decided to try this in a case where I would know what the outcome should be, as a way of thinking about how it works. So I generated 931 images of diamonds varying in two dimensions:
  1. Size
  2. Position (only how far left/right)
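A sketch of how such a data set might be generated. It assumes 50 x 50 binary images (consistent with the 2500-dimensional vectors mentioned below), with each diamond drawn as a filled L1 ball centred vertically; the helper names are hypothetical. The radii and positions that produced the original 931 images aren't stated, so the grid in the last lines is only an example.

```python
import numpy as np

SIDE = 50  # assumed image side length: 50 * 50 = 2500 pixels


def diamond_image(radius, center_x, side=SIDE):
    """Binary image of a filled diamond |x - cx| + |y - cy| <= radius,
    vertically centred at row side // 2 and horizontally at column center_x."""
    ys, xs = np.mgrid[0:side, 0:side]
    return (np.abs(xs - center_x) + np.abs(ys - side // 2) <= radius).astype(float)


def diamond_dataset(radii, centers, side=SIDE):
    """Flattened images for every (radius, center) pair whose diamond fits
    entirely inside the frame, plus the generating parameters."""
    rows, params = [], []
    for r in radii:
        for cx in centers:
            fits_x = cx - r >= 0 and cx + r <= side - 1
            fits_y = side // 2 + r <= side - 1
            if fits_x and fits_y:
                rows.append(diamond_image(r, cx, side).ravel())
                params.append((r, cx))
    return np.array(rows), np.array(params)


# Example grid (hypothetical): radii 2..12 at every horizontal position.
X, params = diamond_dataset(range(2, 13), range(SIDE))
print(X.shape)  # (396, 2500)
```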

Then I followed the above procedure, getting this:

Neat! The random forest even picked up on a feature of this space that I wasn't expecting it to: for the same difference in position, small diamonds end up closer to each other in the embedding than large diamonds do. None of my diamonds has a diameter smaller than 4 pixels, but imagine if the sizes got so small that the diamond wasn't even there; then position wouldn't matter at all for those diamonds.
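The pixel-level intuition behind this can be checked directly. This sketch (hypothetical helper names, and measuring raw pixel differences rather than anything the forest itself computes) counts how many pixels change when a diamond moves sideways. The count grows with size, so for the same shift, small diamonds are closer together in pixel space, and for an empty diamond it would drop to zero.

```python
import numpy as np


def diamond(radius, center_x, side=50):
    """Boolean filled diamond (L1 ball) on a side x side grid, centred vertically."""
    ys, xs = np.mgrid[0:side, 0:side]
    return np.abs(xs - center_x) + np.abs(ys - side // 2) <= radius


def differing_pixels(radius, shift, side=50, start=12):
    """Number of pixels that differ between two equal-sized diamonds whose
    centres are `shift` columns apart. For binary images this equals the
    squared Euclidean distance between the flattened vectors."""
    a = diamond(radius, start, side)
    b = diamond(radius, start + shift, side)
    return int((a ^ b).sum())


for r in (2, 5, 10):
    print(r, [differing_pixels(r, d) for d in (1, 2)])
```

For these shapes the count works out to 2(2r + 1) for a one-pixel shift and 8r for a two-pixel shift, where r is the diamond's half-width.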

I set one column of pixels to random values, and the method still worked just as well. (Which makes sense, as the random forest only cares about pixels that help it distinguish between diamonds and non-diamonds.)
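A sketch of how one might check that reasoning, assuming scikit-learn and an array of images of shape (n_images, side, side), such as the diamonds above; the function name is hypothetical. The noise column is independent of everything in the real images and stays independent after shuffling, so it should not help separate real from fake. One would therefore expect its share of the forest's feature importance to be small, though not exactly zero, since fully grown trees also split on noise.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def noise_column_share(images, col, n_trees=100, seed=0):
    """Replace pixel column `col` of every image with uniform noise, train a
    forest to separate real images from column-shuffled fakes, and return the
    share of total feature importance that falls on the noise column.

    images: array of shape (n_images, side, side)."""
    rng = np.random.default_rng(seed)
    noisy = images.astype(float).copy()
    noisy[:, :, col] = rng.uniform(0.0, 1.0, size=noisy[:, :, col].shape)

    n, side, _ = noisy.shape
    X = noisy.reshape(n, -1)  # row-major: feature index = row * side + col
    # Fake data: each pixel shuffled independently across images.
    X_fake = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])

    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(np.vstack([X, X_fake]), np.r_[np.ones(n), np.zeros(n)])
    importance = forest.feature_importances_.reshape(side, side)
    return importance[:, col].sum() / importance.sum()
```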

A cool technique that I'd love to try some more! For one, I'd like to understand better how it differs from various manifold learning methods. One nice feature is that you could easily use this with a mix of continuous and categorical variables.

Note that starting with Euclidean distance between images (as vectors in R^2500) and mapping points to 2D doesn't seem to produce anything useful:
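For comparison, the baseline can be sketched by replacing the forest proximities with plain Euclidean distances between the flattened images, assuming SciPy and scikit-learn and reusing non-metric MDS for the embedding; the function name is hypothetical. One plausible reason it struggles here: once two binary diamonds stop overlapping, their Euclidean distance no longer depends on how far apart they are, so much of the position information is lost.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS


def euclidean_embedding(images_flat, seed=0):
    """Baseline: Euclidean distances between flattened images (one row per
    image), embedded in 2D with non-metric MDS."""
    D = squareform(pdist(images_flat, metric="euclidean"))
    mds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
              random_state=seed)
    return mds.fit_transform(D)
```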

Code available on GitHub.


  1. What method did you use for creating the embedding?

  2. Kruskal's Non-metric Multidimensional Scaling --

  3. Multidimensional scaling is typically a very bad way of visualizing high-dimensional data. I think one of the best algorithms for visualizations like these is t-SNE (

  4. That method is great! Thanks.
