Miles Brundage

GPT-2-117M The Office outputs

4/13/2019

Below are raw outputs from a GPT-2-117M model that was fine-tuned for a few hours on scripts from The Office. I only lightly edited the outputs for appropriateness, in a few places that are marked. The training data can be found here (note that it includes deleted scenes and has some duplicated episodes): https://www.milesbrundage.com/uploads/2/1/6/8/21681226/officefull.txt

The Colab notebook I used (made by Roadrunner01) can be found here: https://colab.research.google.com/github/ak9250/gpt-2-colab/blob/master/GPT_2.ipynb

Read this Twitter thread for some tips and tricks on fine-tuning GPT-2-117M on your own data: https://twitter.com/Miles_Brundage/status/1107098097186820096

In particular, see the link in that thread to this page on my website, which collects a number of datasets suited to GPT-2 fine-tuning, including all of the ones I've used over the past few weeks (apologies for the ugly layout): https://www.milesbrundage.com/public-gpt-2-fine-tuning-file-repo.html

____________________________________________________

Click here to see the raw outputs. Places where portions were redacted are marked "[REDACTED]". However, the absence of redaction elsewhere should not be taken to mean that no offensive content remains. I wanted to provide enough text for people to get a sense of what the outputs are generally like, but I didn't want to spend all day reading through it carefully, so I decided to loosely edit 20 samples.

____________________________________________________

Added 4/16/2019: outputs from a model fine-tuned on a dataset of 400 philosophical questions (dataset here). NOTE: the outputs contain inappropriate/offensive content.

Added 5/4/2019: outputs from GPT-2-345M fine-tuned on this dataset of Star Wars scripts (original trilogy only). Unfiltered, as with the outputs above.
