Web Scraping Template Engineer

[ Job Code: WSTE ]

Manipal Dot Net Pvt Ltd has an opening for a self-motivated, team-oriented, hard-working Web Scraping Template Development Engineer to work on an exciting project extracting and ingesting data from websites using web crawling tools.


Job Responsibilities:

  • Extract structured/unstructured data from multiple retail and e-commerce websites.
  • Identify a diverse and representative set of product pages from various retailer domains for testing and development.
  • Develop and test XPath templates to extract various attributes from product pages.
  • Use labelling tools to generate training data for Machine Learning engines.
  • Fix bugs and maintain already developed templates.
  • Respond to urgent client requirements – bug fixing for priority websites when requested by the client.
  • Maintain documentation, spreadsheets, and daily reports with clean, precise information and statistics about the websites and parameters extracted.
  • Guide and mentor other engineers – applicable for experienced engineers.
  • Perform code reviews and suggest design changes – applicable for experienced engineers.

Desirable Skill Set:

  • Good knowledge of, and experience with, the Linux operating system and Python/Bash scripting.
  • Strong foundation in applying XPath to process XML/HTML.
  • Solid grasp of web technologies and protocols (HTML, XPath, JSON, HTTP, CSS, etc.).
  • Experience with version control and code-sharing repositories, e.g. Git/GitHub.
  • Familiarity with Regular Expressions (regex).
  • Experience with browser-based debuggers, e.g. Chrome DevTools.

Personal Skills:

  • Keen attention to detail.
  • Ability to thoroughly follow written instructions.
  • Excellent written and verbal communication skills.
  • Eager to learn new technical skills and grow.
  • Team-oriented, with good interpersonal skills.
  • Must be self-motivated and demonstrate a 'can do' attitude.

Qualifications:

  • Bachelor's Degree (BSc/BE/BCA) in Computer Science/IT/Electronics.
  • Master's Degree (MSc/MCA/MTech) preferred.
  • 0-3 years of experience in web-based application development.