The Mobile+AI Lab community is now 392 people strong!
This move to a more open and transparent process has been extremely rewarding: we are collecting great ideas and getting instant feedback on some of the bolder changes we are exploring. (See the detailed changelog below.)
We are also getting a lot of great bug reports. Thank you!
Our focus over the past 10 days was to address those bugs and polish the new experience (including scrolling, navigation, views, mark-as-read options, and night mode) so that we have a solid foundation to invite more people to the Lab and explore new ideas.
Three questions for the community:
Question 1: We did some night mode color tuning. Better? If not, can you please let us know 1/ which specific part of the theme we need to refine, and 2/ what screen brightness level you are using?
Question 2: Are there other bugs you would like us to fix? Please post them in the “bugs” channel in Slack.
Question 3: Is there a missing feature which is not currently in the Trello roadmap that is preventing you from adopting the new app? Please post it in the “general” channel.
We love building the new Feedly in the open and look forward to the next 12 weeks!
Here is a more detailed changelog for this week’s bug hunt:
- Scroll and tap to open conflict (via Jesse Flanagan, John, and Kireet)
- Empty screen when loading an article after switching app (via Ryan)
- Improve scrolling performance (via Chad Hudson and Cyril)
- Double tap on selected bottom tab bar should scroll to the top (via Lior)
- Contextual mark as read copy improvement (mark all as read versus mark current top articles as read)
- Mark as read missing items (via Dan Newman)
- Mark as read and refresh of the All section (via Eric L.)
- Move to next feed and gray out articles after Mark all as read (via Serge Courrier)
- Night mode: Improve the brightness/contrast of the mark-as-read notification (via Dan Newman)
- Night mode: Better read vs unread distinction (via Lee Sprung)
- Night mode: Article separators (via Daniel)
- Read later tab is not selected (via Lior)
- Toggle read later bug (via Lior)
- Do not automatically mark-as-read articles that the user manually kept unread (via Gabe)
- Auto-mark-as-read-on-scroll: improvements to marking the last articles on the page as read
- Refresh automatically when the user launches the app after minutes (via sryo)
- Speed up the close animation (via Aaron M.)
- Make compact view even more compact by inlining the source and date metadata
- Mark-all-as-read button at the end of a list of articles
- Second level left-to-right gesture to trigger the save to board action (via Eiselch)
- Progress circle needs to be reset after changing sort or layout preference (via John)
Thank you for your time and participation.
If you aren’t yet part of the Lab and you would like to participate, you can join here.
In our previous post we showed how we could use CNNs with transfer learning to build a classifier for our own pictures. Today, we present a recent trend of transfer learning in NLP and try it on a classification task, with a dataset of Amazon reviews to be classified as either positive or negative. Have a look at the notebook to reproduce the experiment on your own data!
The ideas of transfer learning in NLP are very well presented in the fast.ai course and we encourage you to have a look at the forum. Our reference paper here is Howard, Ruder, “Universal Language Model Fine-tuning for Text Classification”.
So what is Transfer Learning?
Computer Vision is a field which has seen tremendous improvements because of transfer learning. Highly non-linear models with millions of parameters required massive datasets to train on, and often took days or weeks to train, just to be able to classify an image as containing a dog or a cat!
With the ImageNet challenge, teams competed each year to design the best image classifiers. It has been observed that the hidden layers of such models are able to capture general knowledge about images (edges, certain shapes, style…). Hence, it is not necessary to re-train a model from scratch every time the task changes.
Let’s take the example of the VGG-16 model (Simonyan and Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition”, 2014).
This architecture is relatively complex: it has many layers, and the number of parameters is large. The authors report a training time of 3 weeks using 4 powerful GPUs.
The idea of transfer learning is that, since the intermediate layers are thought to learn general knowledge about images, we can use them as one big featurizer. We download a pre-trained model (trained for weeks on the ImageNet task), remove the last layer of the network (the fully connected layer that projects the features onto the 1,000 classes of the ImageNet challenge), replace it with a classifier of our choice adapted to our task (a binary classifier if we are interested in classifying cats and dogs), and finally train only that classification layer. And because our data may differ from the data the model was previously trained on, we can also add a fine-tuning step, in which we train all layers for a reasonably short amount of time.
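As a concrete illustration, here is a minimal sketch of this recipe using PyTorch and torchvision (the framework choice, the two-class cat/dog head, and the learning rate are our assumptions for the example, not details from the post):

```python
import torch
import torch.nn as nn
from torchvision import models

# Download VGG-16 with weights pre-trained on ImageNet
model = models.vgg16(pretrained=True)

# Freeze the convolutional layers: they act as our generic image featurizer
for param in model.features.parameters():
    param.requires_grad = False

# Replace the final 1000-class ImageNet layer with a 2-class head (cats vs. dogs)
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 2)

# Train only the parameters that still require gradients (the new head);
# a later fine-tuning step could unfreeze everything with a smaller learning rate
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```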
In addition to being quicker to train, transfer learning is particularly interesting because training only the last layer requires far fewer labeled examples than training the full model end-to-end. Labeling data is expensive, so building high-quality models without large labeled datasets is very valuable.
And what about Transfer Learning in NLP?
Advances in deep learning for NLP are less mature than they are in Computer Vision. While it is quite conceivable that a machine is able to learn what edges, circles, squares, etc. are and then use this knowledge to do other things, the parallel is not straightforward with text data.
A first step in this direction was pre-trained word embeddings, such as word2vec. These word vector representations take advantage of the contexts in which words appear to represent them as vectors, so that similar words end up with similar representations.
In this figure from the word2vec paper (Mikolov, Sutskever, Chen, Corrado, Dean. “Distributed Representations of Words and Phrases and their Compositionality” (2013)) we see that the model is able to learn the relation between countries and their capital cities.
Including pre-trained word vectors has been shown to improve metrics on most NLP tasks, and has therefore been widely adopted by the NLP community, which has continued to look for even better word/character/document representations. As in computer vision, pre-trained word vectors can be seen as a featurizer, transforming each word into a set of features.
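For example, pre-trained vectors can be loaded and queried in a few lines; this sketch uses gensim’s downloader with a GloVe model as an arbitrary choice (any set of pre-trained embeddings would do):

```python
import gensim.downloader as api

# Download a set of pre-trained word vectors (the model name is just an example)
word_vectors = api.load("glove-wiki-gigaword-100")

# Each word maps to a fixed dense vector that can be fed to a downstream model
print(word_vectors["paris"].shape)               # (100,)
print(word_vectors.most_similar("paris", topn=3))

# The country/capital regularity from the word2vec figure:
# paris - france + japan ≈ tokyo
print(word_vectors.most_similar(positive=["paris", "japan"],
                                negative=["france"], topn=1))
```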
However, word embeddings represent only the first layer of most NLP models. After that, we still need to train all the RNN/CNN/custom layers from scratch.
Language Model Fine-Tuning for Text Classification
The ULMFiT model was proposed by Howard and Ruder earlier this year as a way to go a step further in transfer learning for NLP.
The idea they are exploring is based on language models. A language model is a model that can predict the next word based on the words already seen (think of your smartphone suggesting the next word while you text). Just as an image classifier gains intrinsic knowledge about images by classifying tons of them, if an NLP model can accurately predict the next word, it seems reasonable to say that it has learned a lot about how natural language is structured. This knowledge should provide a good initialization for training on a custom task!
ULMFiT proposes to train a language model on a very large corpus of text (a Wikipedia dump, for example) and use it as a backbone for any classifier. Because your text data might be written differently than Wikipedia, you first fine-tune the parameters of the language model to take these differences into account. Then, you add a classifier layer on top of this language model and train only that layer! The paper suggests gradually unfreezing the layers, so that every layer eventually gets trained. The authors also build upon previous work on learning rates (cyclical learning rates) to create their slanted triangular learning rate schedule.
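Schematically, the three stages look like this with the fastai v1 text API (a sketch assuming DataFrames with ‘text’ and ‘label’ columns; exact signatures vary between library versions, so this is not the notebook’s exact code):

```python
from fastai.text import *

# Assumption: train_df / valid_df are DataFrames with 'text' and 'label' columns

# Fine-tune the Wikitext-103 pre-trained AWD-LSTM language model on our corpus
data_lm = TextLMDataBunch.from_df('.', train_df, valid_df, text_cols='text')
lm_learner = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
lm_learner.fit_one_cycle(1, 1e-2)
lm_learner.save_encoder('fine_tuned_encoder')

# Put a classifier head on top of the fine-tuned encoder
data_clas = TextClasDataBunch.from_df('.', train_df, valid_df,
                                      text_cols='text', label_cols='label',
                                      vocab=data_lm.vocab)
clas_learner = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
clas_learner.load_encoder('fine_tuned_encoder')

# Gradual unfreezing: train the head first, then progressively deeper layers
clas_learner.fit_one_cycle(1, 1e-2)
clas_learner.freeze_to(-2)
clas_learner.fit_one_cycle(1, slice(1e-3, 1e-2))
clas_learner.unfreeze()
clas_learner.fit_one_cycle(1, slice(1e-4, 1e-3))
```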
Takeaways from the ULMFiT paper
The amazing practical result from this paper is that using such a pre-trained language model enables us to train a classifier on much less labeled data! While unlabeled data is almost infinite on the web, labeled data is very expensive and time-consuming to get.
Here are results they report from the IMDb sentiment analysis task:
With only 100 examples, they are able to reach the same error rate as a model trained from scratch on 20k examples!
Moreover, they provide code to pre-train a language model in the language of your choice. Because Wikipedia exists in so many languages, this makes it easy to move from one language to another using Wikipedia data. Public labeled datasets are known to be harder to find in languages other than English. Here, you could fine-tune the language model on your unlabeled data, spend a few hours manually annotating a few hundred or thousand data points, and attach a classifier head to your pre-trained language model to perform your task!
Playground with Amazon Reviews
To deepen our understanding of this approach, we tried it on a public dataset not reported in the paper. We found this dataset on Kaggle. It contains 4 million reviews of Amazon products, each tagged with a sentiment, either positive or negative. We adapted the fast.ai course material on ULMFiT to the task of classifying Amazon reviews as positive or negative. We find that with only 1,000 examples, the model is able to match the accuracy score obtained by training a FastText model from scratch on the full dataset, as reported on the Kaggle project home page. With only 100 labeled examples, the model still achieves good performance.
To reproduce this experiment you can use this notebook. A GPU is recommended for the fine-tuning and classification steps.
Unsupervised vs. supervised learning in NLP: a discussion of meaning
With ULMFiT, we used both unsupervised and supervised learning. Training the unsupervised language model is “cheap”, since almost unlimited text data is available online. By contrast, the supervised classifier is expensive to train, because the data it needs must be labeled.
While the language model is able to capture a lot of relevant information from how a natural language is structured, it is not clear whether it is able to capture the meaning of the text, which is “the information or concepts that a sender intends to convey, or does convey, in communication with a receiver”.
You might have followed the very interesting Twitter thread on meaning in NLP (If not, take a look at this summary from the Hugging Face team). In this thread, Emily Bender makes her argument against meaning capture with the “Thai room experiment”: “Imagine [you] were given the sum total of all Thai literature in a huge library. (All in Thai, no translations.) Assuming you don’t already know Thai, you won’t learn it from that.”
So we could think that what a language model learns is more about syntax than meaning. However, language models do more than just predict syntactically plausible sentences. For example, the sentences “I ate this computer” and “I hate this computer” are both syntactically correct, but a good language model should know that “I hate this computer” is “more correct” than the hungry alternative. So, while I would not be able to write in Thai even after seeing the whole Thai Wikipedia, it is easy to see that a language model goes beyond simple syntax/structure comprehension. We can therefore think of language models as learning quite a lot about the structure of natural language sentences, helping us in our quest to understand meaning.
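To make this concrete, here is a small sketch (our own illustration, not code from the post) that compares the two sentences under a pre-trained language model, GPT-2, via the Hugging Face transformers library; a lower average next-token loss means the sentence looks “more correct” to the model:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def avg_next_token_loss(sentence: str) -> float:
    # With labels == input_ids, the model returns the average
    # next-token cross-entropy over the sentence
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

print(avg_next_token_loss("I hate this computer"))  # expected: lower loss
print(avg_next_token_loss("I ate this computer"))   # expected: higher loss
```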
We won’t go further into the notion of meaning here (it is an endless and fascinating topic of debate). If you are interested, we recommend Yejin Choi’s talk at ACL 2018 to dig deeper into the subject.
Future of transfer learning in NLP
The progress obtained by ULMFiT has boosted research in transfer learning for NLP. This is an exciting time for NLP, as other fine-tuned language models are starting to emerge, notably the FineTune Transformer LM. With the emergence of better language models, we will be able to improve this transfer of knowledge even further. The flair NLP framework is quite promising for transfer learning from a language model trained at the character level, which makes it very relevant for languages with sub-word structure, like German.
If you enjoyed reading this, subscribe in Feedly to never miss another post.
Books have the power to inspire, connect, and educate. Today in honor of Book Lovers Day, here are some of the books that have inspired the Feedly team as lifelong learners.
What’s on your must-read list right now? What recent read inspired you to see the world in a new way? Tweet at us, or comment below. We always respond.
Ultramarathon Man: Confessions of an All-Night Runner by Dean Karnazes
Petr says, “I liked the story and how passionate one can be about running and endurance and pursuing dreams. It inspired me to run longer distances.”
Grandma Gatewood’s Walk by Ben Montgomery
Emily says, “I felt a connection to this 67-year-old woman who lived and worked on farms all her life before deciding she needed to hike the 2,050-mile Appalachian Trail. The suffering she happily endured on the trail must have been a welcome relief from the darkness of her past.”
Evicted by Matthew Desmond
Victoria says, “This is one of my faves because of the empathy and understanding it creates within you as you experience the loss of eviction through the eyes of the evicted. It’s a powerful piece on how to better take care of your neighbors.”
The Story of a Shipwrecked Sailor by Gabriel García Márquez
Eduardo says, “It’s easily one of my favorite books. The struggle of the guy who was adrift at sea … he never lost hope. You could almost feel what he was feeling. That’s the vividness of the writing.”
Barbarian Days by William Finnegan
Remi says, “Finnegan has a way of pulling his reader into what a life of pursuing their obsession and journeying all over the world really feels like. Bonus points for the years in South Africa which bring it back to a moment in history … beautifully written, permeating passion all the way through.”
Les Fleurs Du Mal (The Flowers of Evil) by Charles Baudelaire
Guillaume says, “It has the best reread value of any book I know. Every piece is incredibly beautiful and well written, and the whole volume oozes a sort of calm melancholy that always gets me.”
Le Mythe de Sisyphe (The Myth of Sisyphus) by Albert Camus
David says, “This was one of the most pivotal books in my life.”
Thanks for reading!