Sunday, August 5, 2012

Random forests for visualizing data

Recently I read about using random forests as a way to visualize data. Here's how it works:
  1. Have a data set
  2. Create a set of fake data by permuting the columns randomly -- each column will still have the same distribution, but the relationships are destroyed.
  3. Train a random forest to distinguish the fake data from the original data.
  4. Get a "proximity" measure between points, based on how often the points are in the same leaf node.
  5. Embed the points in 2D in such a way as to distort these proximities as little as possible.
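Steps 2 and 4 above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code used for this post: the function names `permute_columns` and `proximity_from_leaves` are made up here, and the small `leaves` array is a hand-written stand-in for the leaf ids a trained forest would report (scikit-learn's `RandomForestClassifier.apply`, for instance, returns a matrix of this shape).

```python
import numpy as np

def permute_columns(X, rng):
    """Step 2: shuffle each column independently to build the fake data."""
    fake = np.empty_like(X)
    for j in range(X.shape[1]):
        fake[:, j] = rng.permutation(X[:, j])
    return fake

def proximity_from_leaves(leaves):
    """Step 4: fraction of trees in which each pair of points shares a leaf.

    `leaves` has shape (n_points, n_trees); entry [i, t] is the id of the
    leaf that point i reaches in tree t.
    """
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    return same_leaf.mean(axis=2)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))      # toy stand-in for the real data set
fake = permute_columns(X, rng)

# Training set for step 3: label 1 = original row, label 0 = permuted row.
X_all = np.vstack([X, fake])
y_all = np.concatenate([np.ones(len(X), dtype=int),
                        np.zeros(len(fake), dtype=int)])

# Hypothetical leaf ids for 4 points in 3 trees (not from a real forest).
leaves = np.array([[1, 5, 2],
                   [1, 5, 3],
                   [4, 5, 3],
                   [4, 6, 7]])
prox = proximity_from_leaves(leaves)
```

The broadcasted comparison builds an `n_points x n_points x n_trees` boolean array, so for large data sets or many trees it may be better to accumulate the proximity one tree at a time.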
I decided to try this in a case where I would know what the outcome should be, as a way of thinking about how it works. So I generated 931 images of diamonds varying in two dimensions:
  1. Size
  2. Position (only how far left/right)
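A data set of this kind could be generated roughly as follows. This is a sketch under assumptions: the 50x50 image size is inferred from the R^2500 mentioned later in the post, while the helper `diamond_image` and the particular grid of sizes and positions are hypothetical, so the image count here does not match the 931 images used in the post.

```python
import numpy as np

def diamond_image(center_x, half_width, side=50):
    """Binary image of a diamond centered vertically at column center_x."""
    rows, cols = np.mgrid[0:side, 0:side]
    center_y = side // 2
    # A diamond is the set of pixels within L1 distance half_width of the center.
    mask = np.abs(cols - center_x) + np.abs(rows - center_y) <= half_width
    return mask.astype(np.uint8)

images = []
params = []
for half_width in range(2, 10):                    # hypothetical size range
    # Only use positions that keep the diamond fully inside the image.
    for center_x in range(half_width, 50 - half_width):
        images.append(diamond_image(center_x, half_width).ravel())
        params.append((half_width, center_x))

X = np.array(images)   # one row per image, one column per pixel
```

Each row of `X` is a flattened image, so every pixel becomes one column for the column-permutation step.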


Then I followed the above procedure, getting this:

Neat! The random forest even picked up on a feature of this space that I wasn't expecting it to: for the same difference in position, small diamonds should sit closer to each other than large diamonds do. None of my diamonds have a diameter smaller than 4 pixels, but imagine if the sizes got so small the diamond wasn't even there -- then position wouldn't matter at all for those diamonds.

I set one column of pixels to random values, and the method still worked just as well. (Which makes sense, as the random forest only cares about pixels that help it distinguish between diamonds and non-diamonds.)

A cool technique that I'd love to try some more! For one, I'd like to understand better how it differs from various manifold learning methods. One nice feature is that you could easily use this with a mix of continuous and categorical variables.

Note that starting with Euclidean distance between images (as vectors in R^2500) and mapping points to 2D doesn't seem to produce anything useful:

Code available on github.
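For readers who want to try the Euclidean baseline mentioned above, here is a minimal NumPy sketch (not the code from the repository). Note the swap: it uses classical (Torgerson) MDS because that is easy to write down, whereas the embeddings in this post were made with Kruskal's non-metric MDS, so results will differ. The function names are made up for illustration.

```python
import numpy as np

def pairwise_euclidean(X):
    """Matrix of Euclidean distances between the rows of X."""
    X = np.asarray(X, dtype=float)   # avoid integer overflow on image data
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.sqrt(np.maximum(d2, 0.0))

def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS from a distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.maximum(eigvals[order], 0.0))
    return eigvecs[:, order] * scale

# Sanity check on points that really are two-dimensional:
# the embedding should reproduce their pairwise distances.
rng = np.random.default_rng(0)
points = rng.normal(size=(20, 2))
D = pairwise_euclidean(points)
Y = classical_mds(D)
```

The same `classical_mds` function accepts any symmetric dissimilarity matrix, so it could also be fed a dissimilarity derived from the forest proximities (for example 1 - proximity, which is an assumption here rather than something taken from the post).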

93 comments:

  1. What method did you use for creating the embedding?

  2. Kruskal's Non-metric Multidimensional Scaling -- http://stat.ethz.ch/R-manual/R-patched/library/MASS/html/isoMDS.html

  3. Multidimensional scaling is typically a very bad way of visualizing high-dimensional data. I think one of the best algorithms for visualizations like these is t-SNE (http://homepage.tudelft.nl/19j49/t-SNE.html).
