Thursday, January 8, 2009
François Fleuret, my PhD advisor, recently gave a talk about object detection at Google's Zurich offices.
You can now watch it online:
If you are wondering where my research will try to extend the work done so far, just jump to minute 45:30!
Wednesday, September 24, 2008
Machine Learning Summer School
Held in Ile de Ré (France), 1-15 September, this school featured some famous names from the Machine Learning and Artificial Intelligence communities: Rich Sutton (co-author of the widely adopted book on Reinforcement Learning), Isabelle Guyon (co-author of the first paper on Support Vector Machines) and Yann LeCun (known for convolutional neural networks, energy-based models and the DjVu image compression technique).
You can check the (almost) complete list of lecturers here. I found the course given by Shai Ben-David on the "Theoretical Foundations of Clustering" quite interesting and intriguing. Clustering seems to be *really* lacking solid theoretical support, which is surprising given the importance of the problem. Some attempts are being made to axiomatize it, but there are a lot of open questions: what exactly is the class of clustering algorithms? How can you compare different clustering algorithms? Why is one partition better than another?
Hope to see more developments in this area in the coming years.
Tuesday, July 22, 2008
ICVSS 2008
Last week I attended the International Computer Vision Summer School in Sicily, Italy. The main topics were Reconstruction and Recognition. I think the quality of the lectures, the organization and the location were all quite good, so I would recommend it to other PhD students.
Here is a short summary of some of the things we heard about:
Andrew Zisserman (Oxford, UK) - gave an overview of object recognition and image classification, with focus on methods that use "bag of visual words" models. Quite nice for newcomers like me!
Silvio Savarese (UIUC, USA) - talked about 3D representations for object recognition. There is actually a Special Issue of the journal "Computer Vision and Image Understanding" on the topic at
http://vangogh.ai.uiuc.edu/cviu/home.html
Luc Van Gool (ETH Zurich, Switzerland) - Lots of cool and fancy demos about 3D reconstruction. They are starting to use some recognition to help reconstruction (opposite direction of S. Savarese).
Stefano Soatto (UCLA, USA) - gave an "opinion talk" on the foundations of Computer Vision and how it can be distinguished from Machine Learning. I would have to read his papers to understand it better, but he seems to claim that the existence of non-invertible operations such as occlusions supports the need for image analysis instead of just "brute-force machine learning".
We also had Bill Triggs (CNRS) talking about human detection, Jan Koenderink (Utrecht, Netherlands) on shape-from-shading, and a few tutorials covering topics as diverse as SIFT, object tracking, multi-view stereo, photometric methods for 3D reconstruction and randomized decision forests.
To summarize, I think the message was:
- Traditionally, recognition uses lots of Machine Learning but its models retain little 3D information about objects;
- Traditionally, reconstruction uses ideas from geometry, optics and optimization but not learning;
- The future trend is to merge them: use 3D reconstruction to help in recognition tasks and use recognition to help in 3D reconstruction.
Monday, July 7, 2008
Moved to Switzerland
Since the 1st of July, I have been a PhD student at the Idiap Research Institute and the Ecole Polytechnique Fédérale de Lausanne.
I am working in Machine Learning and Computer Vision under the supervision of Dr. François Fleuret.
Saturday, May 31, 2008
Generating all possible pictures
Think of an image of 800 x 600 pixels with 24 bits of color (8 bits per RGB component). Its trivial binary representation is a sequence of 11,520,000 bits (800 x 600 x 24), so we can think of each picture as being a natural number.
Imagine now that we write a computer program that generates all these pictures one by one, incrementing the natural number by one in each round (a minimal sketch is included at the end of this post).
Running this algorithm for enough time you would eventually get:
- a picture of your face
- a picture of you in the Moon
- a picture of you with Marilyn Monroe and James Dean
- pictures of ancient Earth, with dinosaurs
- pictures of all the paintings of Leonardo da Vinci, Van Gogh or Picasso
- pictures of all the pages of Shakespeare's writings
- pictures of proofs of all relevant mathematical theorems (already proved or not)
- pictures of all great music compositions (already written or not)
- pictures of Microsoft Office and Windows source code
- pictures/screenshots of all pages on the World Wide Web, including all versions of Wikipedia
Warning: don't try this at home unless you can wait a few billion years between each pair of interesting pictures you would get!
Still, it's interesting to realize that you can compress all the world's information into a short and trivial program; all you have to do is add enough useless data to it!
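For fun, here is a minimal Python sketch of the idea (my own illustration, not part of the original post), treating each 800 x 600, 24-bit image as a natural number:

```python
# Hypothetical sketch: enumerate all 800 x 600 images with 24-bit color by
# interpreting every natural number as raw RGB pixel data.
WIDTH, HEIGHT, BYTES_PER_PIXEL = 800, 600, 3
NUM_BYTES = WIDTH * HEIGHT * BYTES_PER_PIXEL   # 1,440,000 bytes = 11,520,000 bits
LIMIT = 2 ** (NUM_BYTES * 8)                   # number of distinct pictures

def image_from_number(n):
    """Interpret the natural number n as the raw bytes of one picture."""
    return n.to_bytes(NUM_BYTES, byteorder="big")

def all_images():
    """Yield every possible picture, one by one. Do not wait for it to finish."""
    n = 0
    while n < LIMIT:
        yield image_from_number(n)
        n += 1

# The very first "picture" (n = 0) is simply an all-black image.
first = next(all_images())
assert first == bytes(NUM_BYTES)
```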
Thursday, May 29, 2008
Monkey with robotic arm
I'm not sure this is recent news, because there is a press release from as far back as 2005, but I just came across this video of a monkey feeding itself using a robotic arm controlled directly by its brain. The researchers are from the University of Pittsburgh.
Really impressive, although probably a bit tough for the monkey.
Tuesday, May 13, 2008
The amazing intelligence of crows
In this 10-minute TED talk, Joshua Klein talks about crows and how they are incredibly good learners.
They seem to have a powerful memory, use vision effectively, have problem-solving skills, use tools and even learn from the example of other crows. I guess AGI would be more than achieved at "crow-level Artificial Intelligence"!
How Ant Colonies Get Things Done
Here you have a nice and very informative Google Tech Talk by Dr. Deborah Gordon on how ant colonies work without any central control:
It seems that ants make most of their decisions based only on the frequency with which they encounter other ants (which have a specific smell according to their role in the colony).
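Just to make the mechanism concrete, here is a toy illustration of that kind of decision rule (my own, entirely hypothetical, and not from the talk): an ant switches tasks purely from how many ants of a given role it has recently encountered.

```python
import random

def should_forage(recent_encounters, threshold=5):
    """Toy rule: go foraging if enough returning foragers were met recently."""
    return sum(1 for role in recent_encounters if role == "forager") >= threshold

# Simulate one ant's last 20 encounters near the nest entrance.
roles = ["forager", "patroller", "nest-worker"]
encounters = [random.choice(roles) for _ in range(20)]
print(should_forage(encounters))
```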
It seems that ants make most of their decisions just based on the frequency they encounter other ants (which have a specific smell according to their role in the colony).
Friday, May 9, 2008
Science in Summer time
If everything goes as planned, I am attending two Summer Schools this year.
The first one, the International Computer Vision Summer School 2008, will be hosted in Sicily, Italy, on 14-19 July. The program seems to be quite good and it will cover topics like object detection, tracking and 3D reconstruction, among others. There's also a reading group on "how to conduct a literature review and discover the context of an idea". The challenge is to see how far back in the past one can track the origins of a scientific idea. For example, AdaBoost is a well-known machine learning meta-algorithm, in which a sequence of classifiers is progressively trained, each focusing on the instances misclassified by the previous ones; the set of classifiers is then combined by a weighted vote. It was introduced by Freund and Schapire in 1996 (a rough sketch of the idea is included below). This is easy to track; the question, however, is: can you find the same or a similar core idea, or intuition, somewhere further back in the past? Possibly from a different domain?
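For reference, here is a minimal sketch of that idea, assuming binary labels in {-1, +1} and scikit-learn decision stumps as weak learners (my own illustration, not from the school materials):

```python
# Rough AdaBoost sketch (Freund & Schapire, 1996) with decision stumps.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, y, n_rounds=50):
    X, y = np.asarray(X), np.asarray(y)      # y must contain -1 / +1 labels
    n = len(y)
    w = np.full(n, 1.0 / n)                  # start with uniform instance weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y])                           # weighted error
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))    # classifier weight
        w *= np.exp(-alpha * y * pred)                       # focus on mistakes
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    # Weighted vote of all the weak classifiers.
    scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(scores)
```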
It's gonna be fun!
The second one is the 10th Machine Learning Summer School, 1-15 September, Ile de Re, France. The program is also quite nice, but I still don't have confirmation that I can attend it.
I would be especially interested in Rich Sutton's lecture on "Reinforcement Learning and Knowledge Representation", although hearing about Active Learning, Bayesian Learning, Clustering, Kernel Methods, etc. also sounds quite appealing.
Looking forward to science in summer time!
Monday, April 14, 2008
How difficult is Vision?
Lately I have been wondering about the problem of Vision and how difficult it should be compared to the problem of Artificial General Intelligence.
It seems to me that, given the order in which things happened in Nature, processing visual input should be much simpler than using language or reasoning. I say this because there are quite simple animals with eyes, say a fish, a frog or a mouse... As I am not a biologist or a neurologist, I am not sure what kind of visual tasks these animals are able to perform. For example, can a mouse tell whether there is a cat in a picture or not?
In any case, I guess that these neuronal systems, much simpler than the human brain, are able to solve tasks that we have not yet managed to solve with Computer Vision algorithms.
If that's the case, I have two questions to my readers, who hopefully can help me clarify these issues:
- What is the "perfect" biological system to understand vision? It should be powerful enough to solve problems that we are interested in, such as distinguishing between different objects, but it should also have a relatively simple brain. Any ideas?
- If animals without human-level intelligence use vision quite effectively, does this mean that Artificial Intelligence will follow the same order of achievements? Or, given the properties of computers, will it turn out to be easier to do reasoning, planning or even language processing?
Looking forward to reading your comments.