This week’s NLP breakfast meetup in our Redwood City office will focus on Text Classification at scale.
We found a promising paper built on two current trends in NLP: Language Models and Transfer Learning:
An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models
by Alexandra Chronopoulou, Christos Baziotis, Alexandros Potamianos
The solution proposed caught our interest for a couple of reasons:
- It performs well on hard tasks like Irony or Sarcasm detection that require Natural Language Understanding
- It builds on the ideas presented in the ULMFiT paper, but at the same time simplifies the architecture, achieving better results while being more computationally efficient.
Edit: The paper got accepted at the NAACL 2019 conference
Thursday, March 28, 2019, at 9:30 AM PST
We pushed Leo 0.5 to a limited beta in early March and collected lots of interesting feedback. The team is listening and crunching through all that feedback and adapting Leo to improve UI/UX as well as the relevance of the underlying machine learning models.
Here is a summary of the changes we are pushing out today as part of Leo 0.6 Beta
One piece of feedback we collected was that the difference between mentions and topics was not clear. So in 0.6, we merged these two concepts into a single one we call Smart T
Level of Aboutness
Sometimes you are interested in a company, product, or topic and you want to see every article mentioning that topic. Sometimes, for more popular topics, you are only interested in reading an article if the article is truly about that topic or company.
Leo 0.6 exposes a “level of aboutness” knob that gives you more control over the model so that you can cut out low salience matches.
For example, if you are interested in NLP or BERT, you can train Leo to only prioritize research articles that are prominently about those topics (as opposed to articles which only briefly touch on those topics).
This is a particularly powerful feature when combined with Google News Keyword alerts.
Some Leo 0.5 beta customers mentioned that it was critical for them to be able to define priorities that span multiple feeds. For example, you might be doing research about Stablecoin and want to prioritize that topic across your Tech feed, your Business feed, or even all your personal or team feeds.
In Leo 0.6, the priority designer allows you to pick “All Team Feeds” or “All Personal Feeds” as the scope of the priority.
This change reduces the total number of priorities you need to create and manage when researching topics and trends across multiple feeds.
Some users mentioned that they would like to be able to navigate their content by priority. If you are interested in a specific topic like Docker, it makes sense to be able to quickly see if there are new Docker-related articles in your Feedly and easily access those articles.
In Leo 0.6, we added a new Priorities section to the left navigation bar that surfaces all your priorities and gives you quick access to all the articles Leo has flagged as important.
We added two settings in the Leo settings to let you personalize this feature: whether priorities appear in your left navigation at all, and whether to show all your priorities or only the global ones (the default).
Your interests and priorities are continuously evolving. Often, you discover a new company, product, or topic while reading an article and you want to be able to teach Leo about it.
In Leo 0.6, the most prominent topics mentioned in an article are highlighted so that you can quickly prioritize them (or mute them).
As part of Leo’s Cyber Security skill, you will also see highlights of CVE entities. More to come soon.
As with the Quick Access feature, there is a Leo setting that allows you to turn off Inlined Entities if that is your preference.
Like Board Improvements
The ML team is spending time understanding how you are engaging with your priority feeds (which articles are saved to a board, which articles are being Less Like This’ed) and tuning the underlying ML models to improve accuracy. You should expect to see the quality of your priority feeds improve over the next 8 weeks.
A lot of Feedly Pro and Feedly Teams customers rely on Power Search to find specific articles in their feeds and boards. In Leo 0.6, we are expanding Power Search to let you search within your priority feeds.
For teams using Leo to discover and track trends and opportunities across industries, the combination of Leo priorities and Power Search is a powerful way to quickly find the most crucial information.
We want to thank all the beta customers who have been working very closely with us over the last few weeks (and sometimes months). We are very grateful for your time and precious feedback. This open collaboration is not only powerful and efficient but it is also very fun. We look forward to the next 3 months!
Edwin, Remi, and Victoria
Love reading? Love the Web? Join the Leo Beta Program
A few weeks ago, we experimented with making our internal paper discussions open via live-streaming. The first NLP breakfast featured a discussion of the paper Accelerating Neural Transformer via an Average Attention Network, available on our NLP Breakfast YouTube channel.
This week’s discussion is an overview of progress in language modeling; you can find the live-stream video here.
Language modeling is the task of predicting the next word in a sentence, based on the previous words. We already use it in lots of everyday products, from your phone’s auto-complete to search engines:
Formally, the task consists of estimating a probability distribution over the whole vocabulary, conditioned on the words (or tokens) we have seen so far. We are predicting the future based on the past, but we can make the Markov assumption that the future only depends on a few items from the past; that is, we condition on a fixed-size window of past words to predict the probabilities for the next word. Applying the chain rule of probability, we get this modified estimation problem:
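In symbols (reconstructing the equation this setup implies), the chain-rule factorization and the Markov approximation over a window of the previous n-1 words read:

```latex
P(w_1, \dots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1})
\approx \prod_{t=1}^{T} P(w_t \mid w_{t-n+1}, \dots, w_{t-1})
```

The approximation on the right is exactly the fixed-size window: each word is predicted from only the n-1 words before it.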
A possible simplification is to use a quotient of counts: we pre-compute counts for all N-grams (chunks of words, characters, or other symbols of size N) and use those counts to approximate probabilities.
This is a really efficient technique at inference time: once the counts have been computed, probabilities are output in O(1) time using lookup tables. However, it suffers badly from sparsity issues: if the numerator corresponds to a sequence of words that has never been seen before, the output probability will be zero. Worse, if the denominator has never been seen, the output probability is undefined! While there are fixes for these issues (smoothing, backoff), the method also suffers from a scalability problem with the window size N: the number of unique N-grams grows exponentially with N.
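A minimal bigram (N=2) version of this count-based approach, on a toy corpus, makes both the O(1) lookup and the sparsity problem visible:

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ate".split()

# Pre-compute the counts once; lookups afterwards are O(1).
unigrams = Counter(corpus[:-1])             # denominators: count(w_prev)
bigrams = Counter(zip(corpus, corpus[1:]))  # numerators: count(w_prev, w_next)

def p_next(prev, word):
    """Maximum-likelihood bigram estimate P(word | prev) = count(prev, word) / count(prev)."""
    if unigrams[prev] == 0:   # unseen history: the estimate is undefined (we return 0.0)
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

print(p_next("the", "cat"))   # "the cat" occurs 2x, "the" occurs 3x -> 0.666...
print(p_next("cat", "sat"))   # 1 / 2 = 0.5
print(p_next("cat", "dog"))   # unseen bigram -> 0.0: the sparsity issue
```

Smoothing and backoff amount to replacing the hard zeros above with small redistributed mass, or falling back to shorter histories.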
Neural networks for Language Modeling
To adapt the exact same setup to the neural network domain, we can take the (pre-trained) word embeddings from the window, concatenate them, and then use a linear projection layer to output a probability distribution. This solves both the sparsity and scalability issues discussed above. However, we still have a fixed-size window, and concatenating more words would produce embeddings that are too large. Moreover, the projection matrix W treats each position in the window completely independently, whereas we would like some parameter sharing between words.
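As a sketch (dimensions and the random initialization here are illustrative, not taken from any particular paper), the fixed-window neural LM is just: look up, concatenate, project, softmax.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["the", "cat", "sat", "on", "mat"]
V, d, n = len(vocab), 8, 3           # vocab size, embedding dim, window size (illustrative)

E = rng.normal(size=(V, d))          # (pre-trained) word embeddings
W = rng.normal(size=(n * d, V))      # projection from the concatenated window to vocab scores
b = np.zeros(V)

def next_word_probs(window):
    """Concatenate the window's embeddings, project, softmax -> P(next word)."""
    x = np.concatenate([E[vocab.index(w)] for w in window])  # shape (n*d,)
    logits = x @ W + b
    exp = np.exp(logits - logits.max())                      # numerically stable softmax
    return exp / exp.sum()

p = next_word_probs(["the", "cat", "sat"])
assert p.shape == (V,) and abs(p.sum() - 1.0) < 1e-9
```

Note how each of the n window slots gets its own slice of W: that is the missing parameter sharing the paragraph above points out.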
Using Recurrent Neural Networks, we remove the fixed-size context issue, and we can (hope to) learn long-term dependencies. A disadvantage of RNNs, however, is that their inherently sequential nature makes them difficult to parallelize. Also, even with improvements from LSTMs or GRUs, long-term dependencies are still hard to capture.
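A minimal vanilla-RNN recurrence (dimensions illustrative) makes the bottleneck concrete: each hidden state depends on the previous one, so the loop below cannot be parallelized across time steps.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_h, T = 4, 6, 5                 # input dim, hidden dim, sequence length (illustrative)

W_xh = rng.normal(size=(d_in, d_h))    # input-to-hidden weights
W_hh = rng.normal(size=(d_h, d_h))     # hidden-to-hidden weights (shared across time)
xs = rng.normal(size=(T, d_in))        # a sequence of word vectors

h = np.zeros(d_h)
for x in xs:                           # inherently sequential: step t needs h from step t-1
    h = np.tanh(x @ W_xh + h @ W_hh)
```

The same W_hh is reused at every step, which is the parameter sharing the fixed-window model lacked; the price is the sequential dependency chain.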
Language Models for contextualized word embeddings
A limitation of current word embeddings is that they learn embeddings of word types, not of word tokens in context. That is, using word2vec, “jaguar” will have the same embedding in both “I just bought a new Jaguar” and “A Jaguar has attacked them”. A lot of progress has been made in this direction:
In the TagLM system, Peters et al. train a bi-LSTM language model over a huge dataset, and use the concatenation of forward and backward LMs to supplement an NER system with contextualized word representations, extracted from the last layer of the LMs.
The ELMo paper goes a step further, using a combination of all the different layers of the LM to produce contextualized word embeddings:
The motivation behind this successful experiment is that lower layers might contain information useful for low-level NLP tasks (POS tagging, syntactic parsing, …) while higher layers are useful for high-level NLP tasks (sentiment analysis, question answering, …).
Focusing on the text classification task, the ULMFiT paper from Howard and Ruder leverages the same transfer learning ideas:
In this setup, a language model is first trained on a big corpus of text. Then, given a classification dataset, the LM is carefully fine-tuned to the data: different learning rates are set for each layer, and layers are gradually unfrozen to let the LM fine-tune in a top-down fashion.
An important issue with RNN-based models is that they rely on sequential processing and are therefore hard to parallelize. Moreover, even with the LSTM/GRU improvements, they still struggle to learn long-term dependencies. Attention layers are often the solution for keeping long-term dependencies. But if attention layers keep these dependencies, and the sequentiality of RNNs prevents parallelization, why not just use attention layers and forget about sequential models? This is the motivation behind the Transformer architecture:
Using self-attention layers means that words attend to each other. Moreover, to let words attend to each other in a flexible way, the model builds several attention heads: multi-head attention.
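A minimal sketch of single-head self-attention, omitting the learned query/key/value projections and the multi-head machinery for brevity: every token attends to every other token in one matrix multiplication, regardless of distance.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention with identity Q/K/V projections.
    Each output row is a weighted mix of ALL input rows, computed in parallel."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                        # pairwise attention scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over each row
    return weights @ X

X = np.random.default_rng(2).normal(size=(5, 8))         # 5 tokens, dimension 8
out = self_attention(X)
assert out.shape == X.shape
```

Multi-head attention repeats this with several learned projections of X and concatenates the results, letting different heads specialize in different relations.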
At the end of last year, Google AI published BERT, its transformer-based architecture, which modifies the language modeling objective into two training objectives:
- Masked language modeling: mask 15% of the tokens and predict them based on the whole surrounding text.
- Next sentence prediction: predict whether the second sentence actually follows the first in the original text.
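A toy sketch of how a masked-LM training input could be built (the actual BERT recipe is slightly more involved: of the selected positions, 80% become [MASK], 10% are replaced by a random word, and 10% are left unchanged):

```python
import random

random.seed(0)
tokens = "the quick brown fox jumps over the lazy dog".split()

# Pick ~15% of positions; the model must recover the originals
# from the full (bidirectional) context.
n_mask = max(1, round(0.15 * len(tokens)))
masked_positions = random.sample(range(len(tokens)), n_mask)
inputs = [("[MASK]" if i in masked_positions else t) for i, t in enumerate(tokens)]
targets = {i: tokens[i] for i in masked_positions}
print(inputs)
```

Because the target is predicted from both the left and the right context, the objective is bidirectional by construction, unlike a forward (or backward) LM.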
While previous LMs are inherently unidirectional (potentially using a forward and a backward version), BERT claims to bring a bidirectional version of transformers.
Future of Language Models
Language modeling is a very promising pre-training task. It enables models to achieve competitive results in a zero-shot way, that is, without any supervision! This is therefore very useful for the many problems where supervised data is limited.
In the future, we expect language models, combined with multi-task learning, to become universal transfer learning techniques. Ideally, we would like to move from a “narrow expert” paradigm (learning a model that is very good on one dataset, sampled from one task) to a paradigm where a learner can transfer knowledge across different domains, datasets, tasks, and languages. Such a universal learner would provide a narrow expert model with contextualized knowledge and improve performance.
OpenAI has recently raised the issue of ethics in natural language generation with the release of GPT-2, a language model claimed to be so good that the institution refrained from publishing the full version and data. This will lead to lots of interesting discussions about the future of transfer learning in NLP and how we should use it.
Sometimes you want to follow high volume publications like The Verge, NY Times, or VentureBeat because you trust them but you are only interested in narrower topics, trends, or mentions.
Reducing noise and information overload is a problem we care passionately about. We have been working over the last 12 months on a new feature called Leo. You can think of Leo as your non-black-box research assistant – an easy-to-control AI tool which helps you reduce noise in your feeds and never miss important articles.
Here is a quick overview of the Leo 0.5 Beta feature set.
New Priority Tab
If you are part of the Leo 0.5 Beta Program, each of your feeds now has two tabs.
The All Tab includes all the articles published by the sources you follow.
The new Priority Tab includes the subset of articles flagged by Leo as important, based on the priorities you defined for your Leo.
Three Core Prioritization Skills: Mentions, Topics, and Like Board
Leo 0.5 ships with three core skills: mentions, topics, and like board.
The mentions skill allows you to prioritize articles based on mentions of people, companies, or keywords that are important to you.
For example, you can ask Leo to prioritize all the articles that mention “JP Morgan”
The topic skill allows you to prioritize articles which are about a specific topic you are interested in.
For example, you can ask Leo to analyze your tech feed and prioritize articles which are about artificial intelligence, quantum computing, or gaming.
Leo ships with one thousand pre-trained topics. If the topic you are interested in is part of that list, the topic skill is a powerful tool to let you focus your feed on what really matters to you.
Sometimes, the topic you are interested in is very niche. This is where the Like Board skill is very useful and powerful.
For example, if you are in the sports industry, you might be interested in the emerging Smart Venue trend. Leo does not know about Smart Venue out of the box, but if you create a board and save 30-50 articles about Smart Venue, you can use the Like Board skill to teach your Leo a new personalized topic and ask him to prioritize future articles similar to the ones you saved in that board.
Once you have defined the priorities of your Leo, he will continuously read your feed and flag articles which are aligned with those priorities.
The Like Board skill is particularly powerful because the more articles you save to that board, the more accurate Leo’s recommendations will become.
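Purely as an illustration (Feedly has not published Leo’s internals, so this is a hypothetical sketch, not the actual implementation), a like-board skill could compare each new article’s embedding to the centroid of the embeddings of the articles you saved:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy 3-d "article embeddings"; a real system would use a document encoder.
board = [np.array([1.0, 0.1, 0.0]),    # articles saved to the hypothetical "Smart Venue" board
         np.array([0.9, 0.2, 0.1])]
centroid = np.mean(board, axis=0)      # more saved articles -> a more reliable centroid

candidate_on_topic = np.array([1.0, 0.15, 0.05])
candidate_off_topic = np.array([0.0, 0.1, 1.0])

# Prioritize candidates whose similarity to the board centroid is highest.
assert cosine(candidate_on_topic, centroid) > cosine(candidate_off_topic, centroid)
```

Under this kind of scheme, every article saved to the board sharpens the centroid, which matches the observed behavior that more saved articles mean better recommendations.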
Finally, you can easily define more sophisticated priorities by combining multiple skills/layers.
Feedback Loop Via Less Like This
When Leo makes a bad prioritization, you have the control to provide him feedback via the Less Like This button.
There are 5 different classes of feedback you can offer to your Leo:
- The “Not About” feedback allows you to teach Leo that it matched the wrong keyword or topic. For example, you were interested in ICO (Initial Coin Offering) and Leo detected ICO (Information Commissioner’s Office).
- The “duplicated article” feedback allows you to flag articles which are on topic but which you have already read about via a different source.
- The “I’m not interested in” feedback allows you to flag classes of articles you are not interested in. For example, you might not be interested in market research articles. If you flag 10-20 articles as “I am not interested in market research,” Leo is going to learn and start prioritizing fewer market research articles.
- Sometimes (especially for keyword alerts), you might get articles from sources you do not care about. The “mute domain” feedback allows you to train your Leo to mute articles from those domains.
- Finally, sometimes, the reason is more complex. The ‘Something else’ feedback offers you an easy way out.
Control and Transparency
A very important aspect of the Leo promise is that it is a fun, non-black-box AI you fully control and can easily collaborate with.
Transparent because each time Leo makes a prioritization, he will explain why the article was prioritized and give you the opportunity to refine that prioritization.
Control because you explicitly define all the priorities of your Leo, and you can at any time go into the Train Leo section and remove or refine a priority. No black box. No lag.
Goodbye Information Overload
Leo 0.1 Alpha customers saw 40-70% noise reduction on their feeds. More targeted feeds mean that you can save time while reducing the risk of missing important articles, or being the last to know about an important risk or market opportunity.
We look forward to seeing how you will train your Leo!
-Edwin, Remi, and Victoria