This project aims to collect tweets that mention "Taylor Swift" and to analyze them in detail using natural language processing (NLP).
1. Data Collection:
• Install the Twint library for Python.
• Use Twint to scrape tweets containing the keyword "Taylor Swift" and store them in a CSV file.
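The collection step might look roughly like the sketch below. It assumes Twint's Config options Search, Limit, Store_csv, Output and Hide_output; the helper names, the limit of 1000 and the output filename are illustrative choices, not taken from the project. Whether the scrape still succeeds depends on the installed Twint version and on Twitter's current site.

```python
def build_search_settings(keyword, limit, output_path):
    """Collect the Twint options for a keyword search saved to CSV."""
    return {
        "Search": keyword,      # keyword or phrase to search for
        "Limit": limit,         # approximate upper bound on tweets fetched
        "Store_csv": True,      # write results as CSV
        "Output": output_path,  # destination file (hypothetical name below)
        "Hide_output": True,    # do not echo every tweet to the console
    }


def scrape_tweets(settings):
    """Apply the settings to a Twint config and run the search."""
    import twint  # imported lazily; install the Twint package first

    config = twint.Config()
    for name, value in settings.items():
        setattr(config, name, value)
    twint.run.Search(config)


settings = build_search_settings("Taylor Swift", 1000, "taylor_swift_tweets.csv")
# scrape_tweets(settings)  # uncomment to run; needs network access and Twint
```

Keeping the settings in a plain dictionary makes it easy to inspect or log them before any network request is made.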
2. Data Preprocessing:
• Handle null values by dropping the columns and rows that are not needed for the analysis.
• Identify the primary key for the dataset.
• Preprocess tweet text by removing hashtags, URLs, mentions, emojis, punctuation, and digits.
• Tokenize the cleaned tweets and remove stop words.
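A minimal sketch of the text cleaning and tokenizing bullets, using only the Python standard library. Several choices here are assumptions: hashtags and mentions are removed together with their words, emojis are removed by dropping every non-ASCII character (which also drops accented letters), and the stop-word list is a tiny illustrative set rather than a full one such as NLTK's.

```python
import re
import string

# Tiny illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "to", "of", "in", "on",
              "at", "for", "i", "my", "it", "this", "that", "with", "so"}

URL_RE = re.compile(r"(?:https?://|www\.)\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
# Translation table that deletes ASCII punctuation and digits.
PUNCT_DIGIT_TABLE = str.maketrans("", "", string.punctuation + string.digits)


def clean_tweet(text):
    """Strip URLs, mentions, hashtags, emojis, punctuation and digits."""
    text = URL_RE.sub(" ", text)       # URLs first, before punctuation goes
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    # Dropping non-ASCII characters removes emojis (a simplification).
    text = text.encode("ascii", "ignore").decode("ascii")
    text = text.translate(PUNCT_DIGIT_TABLE)
    return " ".join(text.lower().split())  # lowercase, collapse whitespace


def tokenize(cleaned_text):
    """Split on whitespace and drop stop words."""
    return [word for word in cleaned_text.split() if word not in STOP_WORDS]


sample = "Loving the new @taylorswift13 album!!! 😍 #Swifties https://t.co/abc123 10/10"
tokens = tokenize(clean_tweet(sample))
```

URLs are removed before punctuation so that the pattern can still recognize "https://"; reversing the order would leave URL fragments behind as ordinary words.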
3. Save the cleaned data in a CSV file.
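The row-level cleanup from step 2 and the save in step 3 could be sketched as follows, again with the standard library only. The column names ("id", "tweet", "place"), the sample rows and the output filename are made up for illustration, and treating "id" as the primary key is an assumption that the code checks rather than takes for granted.

```python
import csv
import os
import tempfile


def drop_incomplete(rows, keep_columns):
    """Keep only keep_columns and drop rows where any of them is blank."""
    cleaned = []
    for row in rows:
        slim = {col: (row.get(col) or "").strip() for col in keep_columns}
        if all(slim.values()):
            cleaned.append(slim)
    return cleaned


def is_primary_key(rows, column):
    """A column can serve as primary key if its values are all distinct."""
    values = [row[column] for row in rows]
    return len(values) == len(set(values))


def save_csv(rows, columns, path):
    """Write the rows to path with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)


# Made-up rows standing in for the scraped data.
rows = [
    {"id": "1", "tweet": "first tweet", "place": ""},
    {"id": "2", "tweet": "", "place": ""},
    {"id": "3", "tweet": "third tweet", "place": None},
]
columns = ["id", "tweet"]  # "place" is dropped as an unneeded column
cleaned = drop_incomplete(rows, columns)

out_path = os.path.join(tempfile.gettempdir(), "cleaned_tweets.csv")
save_csv(cleaned, columns, out_path)

# Read the file back to confirm the round trip.
with open(out_path, newline="", encoding="utf-8") as handle:
    saved = list(csv.DictReader(handle))
```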
4. Set up HarperDB:
• Create a HarperDB account and instance.
• Launch the instance, create a schema, and define tables.
• Import the cleaned data into HarperDB.
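The steps above may well be done through the HarperDB web interface. As a programmatic alternative, the sketch below builds JSON request bodies in the style of HarperDB's operations API (create_schema, create_table, insert) and posts them with HTTP Basic authentication. The schema name, table name, hash attribute, instance URL and credentials are all hypothetical, and the exact field names may differ between HarperDB versions, so treat this as an outline to check against the documentation for your instance.

```python
import base64
import json
import urllib.request


def create_schema_op(schema):
    return {"operation": "create_schema", "schema": schema}


def create_table_op(schema, table, hash_attribute):
    # hash_attribute plays the role of the primary key identified in step 2.
    return {"operation": "create_table", "schema": schema,
            "table": table, "hash_attribute": hash_attribute}


def insert_op(schema, table, records):
    return {"operation": "insert", "schema": schema,
            "table": table, "records": records}


def send(instance_url, username, password, operation):
    """POST one operation to the instance and return the decoded reply."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(
        instance_url,
        data=json.dumps(operation).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical names; replace with your own schema, table and key column.
operations = [
    create_schema_op("tweets_db"),
    create_table_op("tweets_db", "taylor_swift_tweets", "id"),
    insert_op("tweets_db", "taylor_swift_tweets",
              [{"id": "1", "tweet": "first tweet"}]),
]
# for op in operations:
#     send("https://your-instance.example", "user", "password", op)
```

Building the request bodies separately from sending them lets you print and review each operation before anything touches the database.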
See part two of this series as well.