Indeed Scraper
- indeed_scraper.scrape_indeed_for_title(job_title, sb, db)
Scrape job listings from Indeed for the given job title and store them in MongoDB.
This function performs the following steps:
1. Sets up a MongoDB collection for the specified job title, with a unique index on jobID.
2. Constructs the Indeed search URL for the job title and opens the search page using SeleniumBase.
3. Scrapes job links from a fixed number of search-result pages (see Note).
4. For each job link, extracts detailed job information, including location, benefits, description, and additional data from embedded JSON.
5. Stores the extracted job data in the MongoDB collection, skipping duplicates.
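Steps 2 and 3 amount to building one search URL per results page. The sketch below assumes Indeed's public search endpoint https://www.indeed.com/jobs, with the job title in the q parameter and pagination through a start offset that advances by 10 results per page. These URL details are assumptions about Indeed's current scheme, not taken from the scraper, and build_search_urls is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: Indeed's search endpoint and its "start" offset parameter,
# which is assumed to advance by 10 results per page.
INDEED_SEARCH_URL = "https://www.indeed.com/jobs"
RESULTS_PER_PAGE = 10


def build_search_urls(job_title: str, max_pages: int = 10) -> list[str]:
    """Return one search-results URL per page for the given job title (hypothetical helper)."""
    urls = []
    for page in range(max_pages):
        # urlencode escapes the title, e.g. "data scientist" -> "data+scientist".
        query = {"q": job_title, "start": page * RESULTS_PER_PAGE}
        urls.append(f"{INDEED_SEARCH_URL}?{urlencode(query)}")
    return urls
```

For example, build_search_urls("data scientist", max_pages=2) yields two URLs with start=0 and start=10. Each URL could then be opened with sb.open(url) before the job links on that page are collected; keeping the page count as a parameter makes the 10-page limit mentioned in the Note easy to adjust.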
Parameters:
- job_title (str): The job title to search for.
- sb (seleniumbase.SB): An instance of SeleniumBase for browser automation.
- db (pymongo.database.Database): The MongoDB database in which the scraped data is stored.
Note:
- Scraping is currently limited to 10 pages of search results and 100 job links; adjust these limits for full scraping or for testing.
- Page loading relies on static sleep delays (e.g., sb.sleep(5)); consider explicit waits for production use.
- JSON extraction depends on Indeed’s current page structure and may break if the site changes.
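Because the embedded-JSON step is the most fragile part, one defensive approach is to locate the JavaScript assignment in the page source, parse exactly one JSON object from that point, and read nested fields with defaults, so a layout change produces missing values instead of an exception. A minimal sketch follows. The window._initialData marker is an assumption for illustration, not a confirmed detail of Indeed's markup or of this scraper, and extract_embedded_json and dig are hypothetical helpers. The HTML could come from sb.get_page_source().

```python
import json
import re
from typing import Any, Optional

# Assumption: the job page assigns its data as "window._initialData = {...};"
# inside a <script> tag. This marker is illustrative, not confirmed.
JSON_MARKER = "window._initialData"


def extract_embedded_json(html: str, marker: str = JSON_MARKER) -> Optional[dict]:
    """Return the JSON object assigned to `marker` in html, or None (hypothetical helper)."""
    match = re.search(re.escape(marker) + r"\s*=\s*", html)
    if match is None:
        return None  # marker missing: the page structure may have changed
    try:
        # raw_decode parses one JSON value and ignores whatever follows it,
        # so nested braces work and trailing text like ";</script>" is skipped.
        data, _end = json.JSONDecoder().raw_decode(html[match.end():])
    except json.JSONDecodeError:
        return None  # malformed, truncated, or non-JSON assignment
    return data if isinstance(data, dict) else None


def dig(data: Any, *keys: str, default: Any = None) -> Any:
    """Follow nested dict keys, returning `default` if any level is missing."""
    for key in keys:
        if not isinstance(data, dict) or key not in data:
            return default
        data = data[key]
    return data
```

For instance, dig(data, "jobInfo", "title") returns the nested title when every level exists and None otherwise (the field names here are hypothetical). Returning None rather than raising lets the scraper store a partial record and move on to the next job link.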