Language Trends Analysis
The advent of smartphones and the Internet has brought together a variety of communities and like-minded individuals. This has resulted in the creation of many different subcultures and the emergence of corresponding language patterns. By observing the linguistic diversity of Twitter users, we aim to analyze the language patterns of different groups of people and communities through a mobile keyboard next-word prediction model.
In a collaborative research project as part of ACM Research, my team analyzed language trends with a mobile keyboard next-word prediction model. Our team won the research competition.
We trained a neural network using TensorFlow on a large corpus of tweets (from Sentiment140), and then fine-tuned it on separate datasets of tweets from different Twitter subgroups.
We built a data pipeline with the tf.data API, both to improve training performance and because our dataset could not fit in memory.
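As a rough illustration of how a streaming tf.data pipeline for next-word prediction might look, here is a minimal sketch. It is not the project's actual code: the function name make_dataset, the one-tweet-per-line file format, the context length, and the hashing tokenizer (used here in place of a real vocabulary) are all assumptions made for the example.

```python
import tensorflow as tf


def make_dataset(paths, seq_len=4, batch_size=32, num_buckets=10_000,
                 shuffle_buffer=10_000):
    """Stream tweets from text files and yield (context, next_word) batches.

    Assumes one tweet per line; all default values are illustrative.
    """
    # TextLineDataset reads lines lazily, so the corpus never has to fit in memory.
    lines = tf.data.TextLineDataset(paths)

    def to_examples(line):
        # Whitespace tokenization plus hashing stands in for a real vocabulary.
        words = tf.strings.split(tf.strings.lower(line))
        ids = tf.strings.to_hash_bucket_fast(words, num_buckets)
        # Slide a window of seq_len + 1 tokens over the tweet;
        # tweets shorter than that produce no examples.
        windows = tf.data.Dataset.from_tensor_slices(ids).window(
            seq_len + 1, shift=1, drop_remainder=True)
        return windows.flat_map(
            lambda w: w.batch(seq_len + 1, drop_remainder=True))

    examples = lines.flat_map(to_examples)
    # The first seq_len tokens are the context; the last token is the label.
    examples = examples.map(lambda w: (w[:-1], w[-1]),
                            num_parallel_calls=tf.data.AUTOTUNE)
    # Shuffle, batch, and prefetch so input preparation overlaps with training.
    return (examples.shuffle(shuffle_buffer)
            .batch(batch_size)
            .prefetch(tf.data.AUTOTUNE))
```

Calling make_dataset(["tweets.txt"]) (a hypothetical file name) would return a dataset that can be passed directly to a Keras model's fit method. Prefetching and parallel mapping are the usual sources of the training speedup in a pipeline like this.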
We created a 2-layer bidirectional LSTM model to predict the next word.
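A model of this shape could be expressed in Keras roughly as follows. This is a sketch under stated assumptions, not the project's actual architecture: the vocabulary size, embedding width, LSTM width, optimizer, and the function name build_model are placeholder choices for illustration.

```python
import tensorflow as tf


def build_model(vocab_size=10_000, embed_dim=64, lstm_units=128):
    """Build a 2-layer bidirectional LSTM next-word model (sizes are assumed)."""
    model = tf.keras.Sequential([
        # Map token ids to dense vectors.
        tf.keras.layers.Embedding(vocab_size, embed_dim),
        # The first layer returns the full sequence so the second LSTM can read it.
        tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(lstm_units, return_sequences=True)),
        # The second layer returns only its final state, summarizing the context.
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(lstm_units)),
        # One logit per vocabulary entry for the next word.
        tf.keras.layers.Dense(vocab_size),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
    return model
```

Given a batch of context token ids with shape (batch, context_length), the model outputs logits with shape (batch, vocab_size); the highest-scoring entries are the candidate next words.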
We then used hyperparameter optimization to tune the hyperparameters that define the model's structure.
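The write-up does not say which optimization method was used. As one simple possibility, random search can be sketched in a few lines. The search space, the parameter names, and the evaluate callback below are all hypothetical; in practice evaluate would train a model with the given configuration and return its validation loss.

```python
import random

# Hypothetical search space; the real project's ranges are not stated.
SEARCH_SPACE = {
    "embed_dim": [32, 64, 128],
    "lstm_units": [64, 128, 256],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}


def random_search(evaluate, space=SEARCH_SPACE, trials=10, seed=0):
    """Sample configurations at random and keep the one with the lowest score.

    `evaluate` takes a config dict and returns a number to minimize,
    for example a validation loss.
    """
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(trials):
        # Draw one value for every hyperparameter independently.
        config = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

Random search is only a baseline; libraries such as KerasTuner offer more sample-efficient strategies, and the same evaluate-and-compare loop applies to them.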
Web Demo
You can try out our models via the Web Demo.