Miles Brundage

My AI Forecasts--Past, Present, and Future (Supplement)

1/3/2017

Warning: less well-written than main post

Methodology for Past Forecast Review

I downloaded a CSV file of all of my tweets and searched for tweets containing the strings forecast*, predict*, extrapolat*, state of the art*, SOTA*, and expect*. This may have missed a few predictions, and I've made some forecasts in places other than Twitter, but this method probably covers the vast majority, as I'm pretty tweet-prone.
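For the curious, the search amounted to something like the following sketch. The column name "text" matches the Twitter archive CSV format as I recall it, but may differ in other exports, and the trailing * in the search strings is rendered here as a simple substring/prefix match:

```python
import csv
import re

# Case-insensitive patterns corresponding to the search strings above;
# each stem also matches its suffixed forms (forecast*, predict*, etc.).
PATTERNS = re.compile(
    r"forecast|predict|extrapolat|state of the art|SOTA|expect",
    re.IGNORECASE,
)

def find_forecast_tweets(csv_path):
    """Return the text of every archived tweet matching one of the patterns."""
    matches = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if PATTERNS.search(row.get("text", "")):
                matches.append(row["text"])
    return matches
```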

It turns out that there were a lot more than I thought (I forgot about a lot of the less rigorous ones), and the forecasts have different implicit (and sometimes explicit) confidence levels and focuses (e.g. quantifiable technical achievements vs. social adoption of/responses to AI).   

For each of the forecasts below, which are arranged in chronological order, I’ll reproduce the text of the tweet, and then say something about how it fared. I didn’t reproduce every single forecast-y tweet here because some are extremely vague or otherwise uninteresting.

Annotated List of Forecasts 

I expected that CMU, a NASA-related team, and the Institute for Human and Machine Cognition (IHMC) would do well in the first (virtual) round of the DARPA Robotics Challenge:

Looking forward to the first DARPA Robotics Challenge results on Thursday. My bet is CMU, one of the NASA-related teams, and IHMC do well.

— Miles Brundage (@Miles_Brundage) June 26, 2013
This was a decent forecast (much better than chance under some interpretations of what I meant, though I was pretty vague): IHMC took first place out of 28 teams and a JPL-related team took fifth. The DARPA Robotics Challenge website is no longer live, so I'm having trouble verifying how CMU did in this round. I assume they weren't in the top six, based on what I later tweeted:

I was two out of three with my DARPA Robotics Challenge predictions...IHMC did the best - not surprised.

— Miles Brundage (@Miles_Brundage) June 27, 2013
I had previously done an internship at IHMC and had personally seen that they were putting a lot of effort into the DRC, so I probably don’t deserve much credit for this forecast. I also didn’t put much work into making it.

Later, I doubled down on this IHMC-boosterism:

If you're in Florida, consider checking out the DARPA Robotics Challenge live on Dec. 20-21 http://t.co/L41D4Wi5A4 My money is on IHMC!

— Miles Brundage (@Miles_Brundage) December 9, 2013

DARPA Robotics Challenge is Friday and Saturday! My bet is still on IHMC. Anyone else have a favorite?

— Miles Brundage (@Miles_Brundage) December 19, 2013
IHMC got second place, but due to an event I did not predict (SCHAFT, the winner, being bought by Google and later dropping out), the prediction retroactively improved:

So now that Google-SCHAFT is out of the DRC, my prediction of IHMC doing well has retroactively improved, they won rounds 1 and 2! ;-)

— Miles Brundage (@Miles_Brundage) June 26, 2014
Again, I don’t think I get much credit for this.

In early 2015, I said some things about DeepMind's likely work in 2015:

In 2015 I think DeepMind will prob demo some sort of mind blowing learning thing in a 3D world or at least much-richer-than-Atari 2D world.

— Miles Brundage (@Miles_Brundage) January 1, 2015
I don’t know what evidence I based this on, if any, or what counts as a “mind blowing learning thing in a 3D world,” but I think this basically happened a bit later than I expected: the A3C paper showing early impressive results in Labyrinth came out in early 2016. Fortunately, this was within my vague confidence interval:

On error bars: wouldn't be stunned if what I said re: DeepMind demo happened in 2016 not 2015, but if not in 2016 then my model is v. wrong.

— Miles Brundage (@Miles_Brundage) January 1, 2015
In early 2015, I had a vague and pretty incorrect model of what DeepMind and others were trying to do with games – roughly, move forward through time/game complexity space (with newer games generally being harder for AI) and show impressive learning across a wide variety of games for that point in time/complexity space. Based on this, I said:

Mode prediction for where in videogame chronology/complexity space DeepMind will have impressively dominated many hard games in 2016 is 2000

— Miles Brundage (@Miles_Brundage) January 1, 2015
This model of what DeepMind and others are up to turned out to be a bit misguided, since they're still publishing a lot of results with (old) Atari games, and making brand new environments that don't easily map onto the metric above (since the environments have highly variable difficulty/complexity). I was wrong to think they'd move on before more definitively solving Atari and games not well captured by that metric (e.g. Go – thousands of years old, but still pretty hard). Nevertheless, if you wanted to be generous and take “DeepMind” to refer to the broader AI community, you could say that OpenAI’s Universe covers a lot of Flash games from the early 2000s, some of which deep reinforcement learning (RL) works pretty well on. But overall, I’d say this was a misguided and vague forecast. I did caveat it a bit:

DeepMind *could* focus on playing higher fraction of old games w/o input, but they're also simultaneously moving forward in time game-wise.

— Miles Brundage (@Miles_Brundage) January 6, 2015
Regarding non-game stuff, I said:

DeepMind will prob someday (if they haven't already) do non-game stuff, but for now that's their metric, with some reason - it's very hard!

— Miles Brundage (@Miles_Brundage) January 1, 2015
DeepMind has since applied deep RL to data center energy management and deep learning to healthcare. They have also used non-game domains for benchmarks in research (e.g. MuJoCo). But this was a pretty uninteresting/banal prediction (it’s pretty obvious they would have done something non-game-related eventually).

Anti-prediction for DeepMind 2015-2016: them playing Destiny or other current video game. Way too hard/not worth their time except for fun.

— Miles Brundage (@Miles_Brundage) January 6, 2015
As far as I know, this was correct, unless you count StarCraft 2 as a “current” video game.

Another key pt on DeepMind's near-term game stuff: suspect some of the impressive results they show will *not* be fully autonomous learners.

— Miles Brundage (@Miles_Brundage) January 6, 2015
Arguably, this was ultimately true of AlphaGo – its learning was kickstarted with a dataset of human play, though they have said they’ll explore learning from scratch in the future.

Elaboration on previous 2015-2016 DeepMind predictions: simultaneous to video game stuff, they will prob make some big progress on Go. (1/2)

— Miles Brundage (@Miles_Brundage) January 6, 2015
This was based on the early results from Maddison et al. (including some DeepMind authors) in late 2014 that seemed to suggest to me that they might work more on it in the future and that deep learning could help a lot.

My money is on IHMC doing well in, if not winning, DARPA Robotics Challenge finals. Will be v. interesting to see how the Chinese team does.

— Miles Brundage (@Miles_Brundage) March 21, 2015
IHMC got second (same number of points as the winner, KAIST, but with a slower time) and the Chinese team did poorly.

Regarding speech recognition, in late 2015, I said:

2. Think 2016 will be year in which it's pretty clear that speech recognition is now of broad utility. Also, note role hardware played in...

— Miles Brundage (@Miles_Brundage) December 17, 2015
There wasn’t a clear metric for this. There was a lot of coverage of speech recognition in the tech press, and some impressive (nearly) human-level results, but I’m not sure whether 2016 represented any sort of shift in terms of wide adoption. Anecdotally, it seems more widely used in Beijing than in Western countries, but I don’t know for sure.

As part of a longer rant in 2016, I said:

6. And I no longer think massive progress in AI in, say, 10 years is implausible - now seems plausible enough to plan for possibility of it.

— Miles Brundage (@Miles_Brundage) January 8, 2016
And:

8. I expect enough prog that "human-level AI" will be more clearly revealed as a problematic threshold, and in many domains, long surpassed.

— Miles Brundage (@Miles_Brundage) January 8, 2016

10. access to the Internet is allowed, a la https://t.co/Vs9KX98v3p

— Miles Brundage (@Miles_Brundage) January 8, 2016
This was pretty vague, and the timeline in question is still ongoing, so I can’t evaluate it yet.

Regarding hardware and neural network training speeds, I said:

2. This would affect, as prior hardware improvements have affected, three things: attainable performance, speed thereof, and iteration pace.

— Miles Brundage (@Miles_Brundage) January 15, 2016

3. And that's all just from hardware - algorithmic advances have also been rapid in recent years, though I haven't yet quantified that rate.

— Miles Brundage (@Miles_Brundage) January 15, 2016

4. Seems like a not too crazy projection is that in, say, 3 years, neural nets will be 100x faster to train, w/ big impacts on applications.

— Miles Brundage (@Miles_Brundage) January 15, 2016
(sorry for the bad formatting here)

6. These are just rough ideas currently - may do more rigorous calculation with error bars at some point. Point is, expect much NN progress.

— Miles Brundage (@Miles_Brundage) January 15, 2016
I’m still pretty confident that hardware is speeding up and will speed up neural net training a lot, but we’ll have to wait until early 2019 to evaluate the 100x claim. I’ll try to specify it a bit better now: for an expert-suggested set of 10 benchmarks in image recognition and NLP, it will be possible to achieve the same performance, using new hardware and algorithms, in 100x less training time (wall time) than results reported in early 2016, on at least 8 of those benchmarks. This is a rough, intuitive guess, so I have less confidence in it than in some of my more quantitative extrapolations of Atari results discussed below.
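To make the scoring rule above concrete, here is a minimal sketch of how the forecast could be graded in early 2019. The benchmark names and wall times are hypothetical placeholders, not real data; only the thresholds (100x speedup, 8 of 10 benchmarks) come from the forecast itself:

```python
# Thresholds taken from the forecast as specified above.
SPEEDUP_TARGET = 100.0
REQUIRED_HITS = 8

def forecast_holds(wall_times_2016, wall_times_2019):
    """True if at least 8 benchmarks reached the same performance
    at least 100x faster (wall time) than the early-2016 baseline."""
    hits = sum(
        1
        for name, t2016 in wall_times_2016.items()
        if t2016 / wall_times_2019[name] >= SPEEDUP_TARGET
    )
    return hits >= REQUIRED_HITS
```

For example, if 8 of 10 hypothetical benchmarks went from 1000 hours to 10 hours (100x) and 2 only went to 100 hours (10x), the forecast would count as correct; with only 7 such benchmarks it would not.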

Regarding AlphaGo’s success against Lee Sedol, I said in the middle of the match:

Predicted AlphaGo victory w/ 65% confidence and 4-1/5-0 for whichever victor w/ 90% confidence, so not too late for me to be very wrong.. :)

— Miles Brundage (@Miles_Brundage) March 12, 2016
This reference to a prior prediction was based on a Facebook comment I made before the match began, which in turn elaborated on views expressed in a blog post I wrote. The comment (not publicly linkable, unfortunately), on March 1, said:
[Screenshot of the Facebook comment]
I think my reasoning at the time was essentially correct: deep RL is in fact very effective and scalable for well-defined zero-sum games, and my conclusions in the aforementioned blog post (about the importance of hardware in this case, and about humans occupying a small band in the space of possible intelligence levels) still hold. But lots of people thought AlphaGo would win, and I wasn’t extremely confident, so I don’t get much credit for this.

Regarding dialogue systems:

In few years, I expect impressive (by today's standards, though maybe not future revised ones) limited dialogue AIs from Goog, IBM, FB, etc.

— Miles Brundage (@Miles_Brundage) March 12, 2016
I still believe this, but “a few years” haven’t yet passed, so I don’t have much more to say about this right now, other than that it is probably too vague.

Regarding Google’s business model for AI:

@samim my expectation is that they will gradually, over next 10 years, introduce more, better, and more integrated cognition-as-a-service.

— Miles Brundage (@Miles_Brundage) March 24, 2016
Again, it’s early for this, but this seems pretty plausible to me.

Atari Forecasts

See main blog post.



