- Create a HarperDB Account: Sign up on https://harperdb.io/ or sign in at https://studio.harperdb.io/.
- Create a HarperDB Cloud Instance: Follow instructions to create a cloud instance for storing and fetching scraped data.
- Configure HarperDB Schema and Table: Create a schema (e.g., "data_scraping") and a table (e.g., "tweets") with a hash attribute.
- Install Required Packages: Install the "harperdb" Python SDK (pip install harperdb) and "snscrape" (pip install snscrape).
- Import Packages: Import the snscrape Twitter module and the harperdb client for the scraping and storage steps.
- Connect to HarperDB Cloud Instance: Connect to the cloud instance using the instance URL, username, and password.
- Create Function to Record Scraped Tweets: Define a function to insert scraped data into the "tweets" table (a minimal sketch of this and the connection step follows this list).
- Scrape Tweets Using snscrape: Use snscrape to scrape tweets based on a search query and save them to the table (see the scraping sketch after this list).
- View the Tweets Table: Access your HarperDB cloud instance to view the scraped data in the "tweets" table.
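
The snippet below is a minimal sketch of the connection and record-function steps, assuming the "data_scraping" schema and "tweets" table created above. The instance URL and credentials are placeholders, and the client comes from the "harperdb" package installed earlier.

```python
import harperdb

# Placeholder credentials -- substitute your own cloud instance URL and login.
HARPERDB_URL = "https://your-instance.harperdbcloud.com"
HARPERDB_USERNAME = "your_username"
HARPERDB_PASSWORD = "your_password"

# Connect to the HarperDB Cloud instance.
db = harperdb.HarperDB(
    url=HARPERDB_URL,
    username=HARPERDB_USERNAME,
    password=HARPERDB_PASSWORD,
)

SCHEMA = "data_scraping"
TABLE = "tweets"

def record_tweet(tweet_record):
    """Insert one scraped tweet (a dict) into the tweets table."""
    return db.insert(SCHEMA, TABLE, [tweet_record])
```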
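A companion sketch of the scraping step, reusing the record_tweet() helper above. The search query and the 100-tweet cap are arbitrary choices for the example, and the tweet attribute names can differ between snscrape releases.

```python
import snscrape.modules.twitter as sntwitter

SEARCH_QUERY = "harperdb"   # example query -- use any search terms you like
MAX_TWEETS = 100            # arbitrary cap for the demo

# Iterate over the scraper's results and insert each tweet as a row.
for i, tweet in enumerate(sntwitter.TwitterSearchScraper(SEARCH_QUERY).get_items()):
    if i >= MAX_TWEETS:
        break
    record_tweet({
        "id": tweet.id,                  # hash attribute of the tweets table
        "date": str(tweet.date),
        "username": tweet.user.username,
        "content": tweet.content,        # rawContent in newer snscrape versions
    })
```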
Creating Custom Functions with HarperDB (Optional):
- Enable Custom Functions: Enable Custom Functions in HarperDB Studio.
- Create a Project: Create a project with a specified name, generating necessary files.
- Define a Route: Create a route to fetch data from the "tweets" table using SQL.
- Access Data via API Endpoint: Send an API request to the defined route to retrieve the data (an example request follows this list).
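
As a rough illustration of the final step, assuming a hypothetical Custom Functions project named "tweets-api" exposing a GET /tweets route backed by the SQL query from the previous step (the project name, route, and port are placeholders), the endpoint can be called like any HTTP API:

```python
import requests

# Hypothetical project and route names -- replace with your own.
# Custom Functions typically listen on a separate port (9926 by default).
CUSTOM_FUNCTIONS_URL = "https://your-instance.harperdbcloud.com:9926/tweets-api/tweets"

response = requests.get(CUSTOM_FUNCTIONS_URL)
response.raise_for_status()

# The route returns the rows selected from the tweets table as JSON.
for row in response.json():
    print(row)
```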