Main Function
Both scrapers (Indeed and Stepstone) are run from this module's main() function.
- run_scrapers_parallel.fetch_job_titles_from_mongodb(client)
Fetches the list of job titles from the MongoDB database.
This function connects to the “Jobliste” database and retrieves the “job_titles” field from the document with _id “current_job_titles” in the “job_titles” collection.
If no document is found or the field is missing, an empty list is returned.
Parameters
- client : pymongo.MongoClient
The MongoDB client instance.
Returns
- list[str]
A list of job titles.
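A minimal sketch of how fetch_job_titles_from_mongodb could look, based only on the behaviour described above (database "Jobliste", collection "job_titles", document _id "current_job_titles", field "job_titles", empty list as fallback). It is an illustration, not the module's actual code. It only uses item access and find_one(), so it works with a pymongo.MongoClient and with any stand-in object of the same shape.

```python
from typing import Any


def fetch_job_titles_from_mongodb(client: Any) -> list[str]:
    """Return the job titles stored in Jobliste.job_titles (sketch)."""
    # client is expected to be a pymongo.MongoClient; only client[db][coll]
    # and Collection.find_one() are used here.
    collection = client["Jobliste"]["job_titles"]
    doc = collection.find_one({"_id": "current_job_titles"})

    # No document, or a document without the field -> empty list, as documented.
    if doc is None:
        return []
    return list(doc.get("job_titles", []))
```

With a real connection this would be called as fetch_job_titles_from_mongodb(MongoClient(uri)), where uri is read from the environment.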
- run_scrapers_parallel.main()
Main function to orchestrate the job scraping process.
This function performs the following steps:
1. Connects to the MongoDB database using the URI from the environment.
2. Fetches the list of job titles from the database.
3. For each job title, scrapes job listings from Indeed and Stepstone.
4. Stores the scraped data in the “stepstone_data” database.
5. Closes the MongoDB connection.
Scraping is done with SeleniumBase to avoid bot detection and to run efficiently on servers.
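The five steps of main() can be sketched as below. This is a hedged illustration, not the module's implementation: the scrapers are passed in as plain callables (the Scraper type, scrape_and_store, and storing each source in a collection named after its scraper key are all hypothetical), the environment variable name MONGODB_URI is an assumption, and the SeleniumBase scraping itself is left out. Sources are processed one after another, with the 2-second pause between job titles mentioned in the Notes.

```python
import os
import time
from typing import Any, Callable, Iterable

# Hypothetical scraper type: given a job title, return a list of job postings.
Scraper = Callable[[str], list[dict[str, Any]]]


def scrape_and_store(
    db: Any,
    job_titles: Iterable[str],
    scrapers: dict[str, Scraper],
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, int]:
    """Run every scraper for every job title and store the results (sketch).

    db is expected to be the "stepstone_data" pymongo Database; one collection
    per scraper key is a hypothetical layout. Returns inserted counts per source.
    """
    inserted = {name: 0 for name in scrapers}
    for i, title in enumerate(job_titles):
        if i > 0:
            sleep(delay_s)  # pause between job titles to avoid overwhelming the sites
        for name, scrape in scrapers.items():
            jobs = scrape(title)
            if jobs:  # pymongo's insert_many() rejects an empty list
                db[name].insert_many(jobs)
                inserted[name] += len(jobs)
    return inserted


def main(scrapers: dict[str, Scraper]) -> None:
    # Imported lazily so the helper above can be used without pymongo installed.
    from pymongo import MongoClient

    # Step 1: connect ("MONGODB_URI" is a hypothetical variable name).
    client = MongoClient(os.environ["MONGODB_URI"])
    try:
        # Step 2: fetch job titles (same lookup as fetch_job_titles_from_mongodb).
        doc = client["Jobliste"]["job_titles"].find_one({"_id": "current_job_titles"})
        titles = doc.get("job_titles", []) if doc else []
        # Steps 3 and 4: scrape each title and store the results.
        scrape_and_store(client["stepstone_data"], titles, scrapers)
    finally:
        # Step 5: always close the connection, even if a scraper fails.
        client.close()
```

Injecting the scrapers and the sleep function keeps the loop testable without a browser or a database; the try/finally guarantees step 5 runs even when scraping raises.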
Notes
The script waits 2 seconds between job titles to avoid overwhelming the websites.
Ensure that the MongoDB URI is correctly set in the environment.
The script requires SeleniumBase and the necessary webdrivers.
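To fail early when the MongoDB URI is missing or malformed, a small check like the following could run before connecting. This is a sketch under assumptions: the function name get_mongodb_uri and the variable name MONGODB_URI are hypothetical; the only rule checked is that MongoDB connection strings start with mongodb:// or mongodb+srv://.

```python
import os
from typing import Mapping, Optional


def get_mongodb_uri(env: Optional[Mapping[str, str]] = None, var: str = "MONGODB_URI") -> str:
    """Read and sanity-check the MongoDB URI from the environment (sketch)."""
    env = os.environ if env is None else env
    uri = env.get(var, "").strip()
    if not uri:
        raise RuntimeError(f"{var} is not set; export it before running the scrapers")
    # MongoDB connection strings use one of these two schemes. The URI itself is
    # not echoed in the error because it may contain credentials.
    if not uri.startswith(("mongodb://", "mongodb+srv://")):
        raise ValueError(f"{var} must start with mongodb:// or mongodb+srv://")
    return uri
```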