Friday, August 30, 2013

ICDL 2013 Conference Digest

As I predicted, this year's ICDL-EpiRob conference was a really interesting event, even though the number of submissions (77 submitted, 54 accepted) was not as high as usual - maybe because traveling to faraway Japan is still difficult (e.g. to fund) for many researchers in the U.S. and Europe.
The conference nevertheless attracted a very international set of authors: 124 attendees in total, 51 from Asia, 60 from Europe, and 13 from the U.S. They gave some really nice talks and keynotes and sparked some quite interesting discussions. Let me give a (very subjective) selection of highlights...

The Award Winners

First and foremost, it is certainly worth mentioning the award winners. The Best Paper Award went to "A generative probabilistic framework for learning spatial language" by Colin Dawson, Jeremy Wright, Antons Rebguns, Marco Valenzuela Escárcega, Daniel Fried and Paul Cohen (University of Arizona). The paper investigates the learning of spatial language expressions such as "the object left of X", where X is some landmark in the scene. On a task with actual human language expressions, the model identified the referenced object among several others with 50% accuracy - not perfect, but really not bad compared to the 62% accuracy of human raters.
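To make the flavor of such an approach concrete, here is a minimal sketch of generative scoring for a spatial relation. This is my own toy construction for illustration, not the authors' actual framework - the direction vectors, the von-Mises-style likelihood, and all names are assumptions:

```python
import numpy as np

# Toy generative scoring of "the object left of X": each candidate is
# scored by how well its displacement from the landmark matches a
# canonical direction vector for the relation word.

RELATION_DIRECTIONS = {          # canonical unit vectors (x right, y up)
    "left of":  np.array([-1.0, 0.0]),
    "right of": np.array([ 1.0, 0.0]),
    "above":    np.array([ 0.0, 1.0]),
    "below":    np.array([ 0.0, -1.0]),
}

def relation_likelihoods(candidates, landmark, relation, kappa=4.0):
    """P(referent | relation, landmark) over candidate positions.

    Likelihood ~ exp(kappa * cos(angle between displacement and
    canonical direction)), i.e. a von-Mises-style distribution.
    """
    direction = RELATION_DIRECTIONS[relation]
    scores = []
    for pos in candidates:
        disp = pos - landmark
        disp = disp / (np.linalg.norm(disp) + 1e-9)
        scores.append(np.exp(kappa * disp @ direction))
    scores = np.array(scores)
    return scores / scores.sum()          # normalize to a distribution

# Example: three objects around a landmark at the origin.
objects  = [np.array([-2.0, 0.3]), np.array([1.5, 0.0]), np.array([0.1, 2.0])]
landmark = np.array([0.0, 0.0])
probs = relation_likelihoods(objects, landmark, "left of")
print("P(referent):", probs, "-> picked object", int(np.argmax(probs)))
```

The appeal of a generative formulation is that the same scoring can be inverted for learning: given scenes and human utterances, one can fit the relation parameters instead of hand-coding them.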

The Best Student Paper Award went to "Epigenetic adaptation through hormone modulation in autonomous robots" by John Lones and Lola Cañamero (University of Hertfordshire). In a comment on that talk, Nicholas pointed out that at this conference on development, learning, and epigenetic robotics, this was the only paper actually about epigenetic robotics. Well, true story.

Constructive Developmental Science - CDS x 2

On the first day of the conference, the special session "Constructive Developmental Science" took place, featuring two current Japanese research projects, both named like the session (not by coincidence, as it turned out in the discussion). Prof. Kumagaya gave interesting insights into autism spectrum disorders and stressed the need to consider the high variability among people with ASD. Prof. Myowa presented work on developmental abnormalities in pre-term infants, followed by Prof. Iwata showing work on the circadian rhythm of neonates being shaped by the time of day of their birth, and Mitsuru Kikuchi showing MEG studies on social mother-child interaction as well as autism. Then, Prof. Asada presented Osaka's CDS project. All of that was interesting stuff.

Finally, Prof. Sandini was invited as discussant. He raised the entire CDS story to the meta-level, which led to a quite deep discussion with the audience about what kind of interdisciplinary work is needed to achieve our long-term research goals, and how to facilitate such interdisciplinarity in research and education. While engineers (like me) sometimes read papers from science (e.g. biology/psychology) and build a computational model plus a corresponding engineering paper, what is often lacking is the feedback from engineering back to science. He therefore stressed the need for engineers involved in CDS to move from descriptive computational models (explaining "how") to explanatory computational models (explaining "why"). Quoting Prof. Asada: "I agree 1000%".

What was underrepresented in this debate - in my view - was the reverse question: how to get scientists to look more into engineering. There is still a lack of incentives to do so... working in an interdisciplinary way (and publishing accordingly) is not always rewarded when people look at your CV. This needs to change.

A matter of perspective

My first highlight in the main sessions was "Understanding Embodied Visual Attention in Child-Parent Interaction" by Sven Bambach, David Crandall and Chen Yu (Indiana University). This study extends the fabulous setup of Chen Yu's lab for investigating mother-child interaction from the child's first-person perspective (head-mounted camera) by adding eye-tracking of the infant's gaze. That may not sound so ground-breaking at first, but it is far from easy to actually do this with young infants, and apparently it has been done here for the first time. The most significant finding of this study (in my view) is a non-finding: pure visual saliency (Itti's model), while it can account for a fair portion of the adult's gaze, does not explain the infants' gaze in these situations. And I mean: not at all! As Jochen Triesch immediately pointed out in the discussion, this is very surprising from a traditional point of view, since young infants' attention is supposed to be mostly bottom-up, just like visual saliency. In fact, I think it is not a missing top-down cognitive component that explains what is happening, but the guidance of cross-modal cues. Really interesting stuff.
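For readers who wonder how one quantifies "does not explain gaze at all": a standard metric for this kind of comparison is the Normalized Scanpath Saliency (NSS), the z-scored saliency value at fixated pixels. The following sketch shows the metric itself; the saliency map below is a random stand-in, not Itti's actual model, and the fixation data are made up:

```python
import numpy as np

# NSS: z-score the saliency map, then average its values at the fixated
# pixels. NSS around 0 means the model predicts gaze no better than
# chance (the infant case reported above); a clearly positive NSS means
# above-chance prediction (closer to the adult case).

def nss(saliency_map, fixations):
    """fixations: list of (row, col) gaze positions in map coordinates."""
    s = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-9)
    return float(np.mean([s[r, c] for r, c in fixations]))

rng = np.random.default_rng(0)
saliency = rng.random((480, 640))                 # stand-in saliency map
fixations = [(240, 320), (100, 500), (400, 50)]   # stand-in gaze samples
print("NSS:", nss(saliency, fixations))           # ~0 for random saliency
```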

Beyond the convenience-sample

I certainly have to say that all four keynote speeches were quite insightful. My favorite, though, was clearly the keynote by Prof. Anne Fernald on infant-directed speech, its impact on language development, and cultural factors thereof. Talking about cultural aspects of language learning and education in general, she referred, for instance, to some African cultures in which parents simply don't have time to "educate" their infants: they are busy all day long making sure that everyone has something to eat, and so on. The child is supposed to learn from observation. Well, those children eventually learn the language, too.
But cultural differences also show up within our highly developed, and apparently rather homogeneous, countries. She mentioned that when they invite people to participate in experiments in their lab (located at Stanford), those who come are normally from the U.S.'s top 5-10% in terms of education and salary. That's the convenience sample. If they make the unusual effort (and they do!) to recruit people not right at their front door, but 2 kilometers down the road in Palo Alto, they get the lowest 20% in terms of education and salary. And the experimental results differ. A lot!
Experimental effects that are preserved across such different groups, however, can be said to have real substance. This is the case for the effect of child-directed speech on infants' language development: infants with much exposure to child-directed speech learn more, and faster. Early exposure (e.g. at the age of 2 years) is even a strong predictor of language skill much later (e.g. at age 10). The effect also stands out against exposure to speech in general (e.g. listening to adults talking to each other, or watching TV), which does not seem to have any (positive) effect on language development. What matters is people actually talking to the child.


Vergence and Action Representations

An interesting study by Luca Lonini (poster presented by Jochen Triesch) concerned binocular vergence as the result of rewarding actions that lead to good "encodability" of sensory stimuli. If the vergence angle of the two eyes is learned simply to improve the joint encoding of the camera images, then both eyes will tend to fixate the same thing, basically because the camera images become more similar. This reduces the entropy of the input and allows for efficient coding. A neat idea... that seems to work well. I only wondered how generally useful "reducing input entropy" is: on other problems it might lead to avoiding any interesting stimuli altogether. An interesting perspective on vergence, though.
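A toy version of the core idea might look like the following. As a stand-in for the actual sparse-coding objective of the paper, I simply use the cost of encoding both eyes' patches with a single shared code, which is minimized when the patches align (the images become similar, so the joint input is cheap to encode). Everything here - the 1-D scene, the cost function, the brute-force search - is my own illustrative assumption:

```python
import numpy as np

# Choose the vergence (horizontal shift) that makes the two eyes' image
# patches most jointly "encodable". A 1-D scene keeps the sketch short.

rng = np.random.default_rng(1)
scene = rng.random(200)                  # 1-D stand-in for the visual scene
true_disparity = 7
left  = scene[20:120]                                     # left-eye patch
right = scene[20 + true_disparity:120 + true_disparity]   # shifted right eye

def joint_coding_cost(left, right, vergence):
    """Cost of encoding both patches with one shared code: the residual
    of explaining both by their common mean after re-aligning."""
    shifted = np.roll(right, vergence)
    mean = 0.5 * (left + shifted)        # one-component shared code
    return np.sum((left - mean) ** 2 + (shifted - mean) ** 2)

costs = {v: joint_coding_cost(left, right, v) for v in range(-15, 16)}
best = min(costs, key=costs.get)
print("recovered vergence shift:", best)   # should match true_disparity
```

In the actual work the encoding stage is learned rather than fixed, and the vergence command is learned by reinforcement rather than brute-force search, but the fixed point is the same: aligned inputs are the cheapest to encode.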

Another paper of interest for me was "Structural bootstrapping at the sensorimotor level for the fast acquisition of action knowledge for cognitive robots" by Eren Aksoy and colleagues. The paper addresses the learning of actions in a table-top scenario from (3D) camera observations, and further aims at their reproduction. The core ingredients, as I understood it, are (i) the representation of spatial relations by means of a compact graph structure of the involved items, and (ii) the representation of temporal transitions by means of "events", in this case referring to topological changes of the items' graph structure. A complex setup like this necessarily comes with some restrictions, but it seems that the representations work quite well. Plus, they seem very compact, which makes their work deserve a deeper look.
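A minimal sketch of that representation idea, as I read it (my own toy reading, not the authors' code): a scene is a graph whose nodes are objects and whose edges are spatial relations (here just "touching"), and an "event" is any change in that graph's topology between consecutive frames. The scene, relation, and function names below are all assumptions:

```python
# Scene graphs as sets of undirected "touching" edges; events are
# edge appearances/disappearances between consecutive frames.

def scene_graph(frame):
    """frame: dict object -> set of objects it touches."""
    return {tuple(sorted((a, b))) for a, nbrs in frame.items() for b in nbrs}

def events(frames):
    """Yield (t, appeared_edges, vanished_edges) when topology changes."""
    for t in range(1, len(frames)):
        g0, g1 = scene_graph(frames[t - 1]), scene_graph(frames[t])
        if g0 != g1:
            yield t, g1 - g0, g0 - g1

# A hand grasps a cup and lifts it off the table:
frames = [
    {"hand": set(),    "cup": {"table"},         "table": {"cup"}},
    {"hand": {"cup"},  "cup": {"table", "hand"}, "table": {"cup"}},
    {"hand": {"cup"},  "cup": {"hand"},          "table": set()},
]
for t, appeared, vanished in events(frames):
    print(f"event at t={t}: +{appeared} -{vanished}")
```

The compactness the authors gain is exactly this: an entire continuous action boils down to a short sequence of discrete graph transitions.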

Goal Babbling

Besides all that, it was a pleasure for me to see many papers using or about goal babbling. Felix presented our study combining goal babbling with his associative neural memory approach for versatile sensorimotor coordination. I presented my direction-sampling approach to goal babbling with totally unknown ranges of the achievable workspace, and was quite happy to receive lots of positive comments on it. But it wasn't only us ...
Pierre-Yves Oudeyer's group was also very productive again this year. Clément presented a probabilistic approach towards a more unifying perspective on exploration schemes. Fabien Benureau presented work on transfer learning that reuses the actions previously explored by goal babbling. Mai presented a study using goal babbling, among other mechanisms, within a complex action-for-perception task.
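For readers new to the term, here is a bare-bones sketch of the goal babbling scheme on a 2-joint arm - a generic illustration of the principle, not any of the specific algorithms presented at the conference. The nearest-neighbor inverse estimate and the noise level are my own simplifying assumptions:

```python
import numpy as np

# Goal babbling: repeatedly pick a goal in task space, query the current
# inverse estimate (here: the action of the nearest previously observed
# outcome), perturb it, execute, and store the new (action, outcome)
# pair. Exploration is organized by goals, not by motor commands.

rng = np.random.default_rng(2)

def forward(q):          # arm kinematics: joint angles -> hand position
    return np.array([np.cos(q[0]) + np.cos(q[0] + q[1]),
                     np.sin(q[0]) + np.sin(q[0] + q[1])])

actions, outcomes = [np.zeros(2)], [forward(np.zeros(2))]
for _ in range(500):
    goal = rng.uniform(-2, 2, size=2)                  # sample a task-space goal
    i = np.argmin([np.linalg.norm(x - goal) for x in outcomes])
    q = actions[i] + rng.normal(0, 0.3, size=2)        # inverse estimate + noise
    actions.append(q)
    outcomes.append(forward(q))                        # observe where we got

goal = np.array([1.2, 0.8])
best = min(outcomes, key=lambda x: np.linalg.norm(x - goal))
print("closest reach to", goal, "->", np.round(best, 3))
```

The point of the scheme is that sampling goals concentrates exploration on the (low-dimensional) task space instead of the (potentially huge) motor space.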

The Lab Tour

Finally, the lab tour at Osaka University must have been a highlight. I say "must have" because I couldn't attend it: I was presenting here at the same time. But although I have already seen most of the stuff here, I would still have joined if I could. It's just pretty cool, and very stimulating, to see all the work done and the robots built in the Hosoda, Asada, and Ishiguro labs.

Next ICDL in Genoa

So that was this year's ICDL-EpiRob conference... The next one, as was announced, will take place in Genoa, Italy. ICDL coming to iCub's birthplace! Let's meet there!
