Résumé parsing

Resume parsing, also known as CV parsing, resume extraction, or CV extraction, allows for the automated storage and analysis of resume data. The resume is imported into parsing software and the information is extracted so that it can be sorted and searched.

Principle
Resume parsers analyze a resume, extract the desired information, and insert the information into a database with a unique entry for each candidate. Once the resume has been analyzed, a recruiter can search the database for keywords and phrases and get a list of relevant candidates. Many parsers support semantic search, which adds context to the search terms and tries to understand intent in order to make the results more reliable and comprehensive.
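
The store-then-search workflow described above can be sketched in a few lines. This is a minimal illustration, not how any particular parser is built: the table layout, the field names (`name`, `skills`, `work_history`) and the functions `build_database` and `search` are assumptions made for the example, and plain substring matching stands in for the semantic search that real products offer.

```python
import sqlite3

def build_database(parsed_resumes):
    """Store one row per candidate; `parsed_resumes` is a list of dicts (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE candidates ("
        "id INTEGER PRIMARY KEY, name TEXT, skills TEXT, work_history TEXT)"
    )
    conn.executemany(
        "INSERT INTO candidates (name, skills, work_history) VALUES (?, ?, ?)",
        [(r["name"], r["skills"], r["work_history"]) for r in parsed_resumes],
    )
    conn.commit()
    return conn

def search(conn, keyword):
    """Return names of candidates whose skills or work history mention the keyword."""
    pattern = f"%{keyword}%"  # substring match; SQLite LIKE ignores ASCII case
    rows = conn.execute(
        "SELECT name FROM candidates "
        "WHERE skills LIKE ? OR work_history LIKE ? ORDER BY id",
        (pattern, pattern),
    )
    return [name for (name,) in rows]

# Example with made-up candidates.
resumes = [
    {"name": "A. Rivera", "skills": "Python, SQL", "work_history": "Data analyst"},
    {"name": "B. Chen", "skills": "Java", "work_history": "Backend engineer, python scripting"},
    {"name": "C. Okafor", "skills": "Recruiting", "work_history": "HR coordinator"},
]
conn = build_database(resumes)
matches = search(conn, "python")
```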

Machine learning
Machine learning plays a central role in resume parsing. Each block of information needs to be labeled and sorted into the correct category, such as education, work history, or contact information. Rule-based parsers use a predefined set of rules to parse the text. This approach works poorly for resumes because the parser needs to "understand the context in which words occur and the relationship between them." For example, the word "Harvey" on a resume could be the name of an applicant, refer to Harvey Mudd College, or reference the company Harvey & Company LLC. The abbreviation MD could mean "Medical Doctor" or "Maryland". A rule-based parser would require extremely complex rules to account for all the ambiguity and would still provide limited coverage.
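
The ambiguity can be made concrete with a toy example. The sketch below is illustrative only: the rule table and both functions are hypothetical. The first function labels a token without looking at its surroundings and therefore mislabels "MD" in an address. The second looks at one piece of context (a following ZIP code), which fixes this case but shows how quickly hand-written rules multiply.

```python
import re

# Hypothetical context-free rules: each token always gets the same label.
CONTEXT_FREE_RULES = {"MD": "degree", "Harvey": "person_name"}

def label_context_free(text):
    """Label known tokens without looking at the surrounding words."""
    tokens = re.findall(r"[A-Za-z&]+|\d+", text)
    return [(t, CONTEXT_FREE_RULES[t]) for t in tokens if t in CONTEXT_FREE_RULES]

def label_md_with_context(text):
    """Label each 'MD' as 'state' when a 5-digit ZIP code follows, else 'degree'."""
    labels = []
    for match in re.finditer(r"\bMD\b", text):
        following = text[match.end():]
        labels.append("state" if re.match(r"\s+\d{5}\b", following) else "degree")
    return labels

address_labels = label_context_free("Baltimore, MD 21201")   # "MD" is wrongly a degree
college_labels = label_context_free("Harvey Mudd College")   # "Harvey" is wrongly a person
context_labels = label_md_with_context("Jane Doe, MD. Office: Baltimore, MD 21201")
```

Each new source of ambiguity needs another special case of this kind, which is the coverage problem that statistical approaches try to avoid.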

Natural language processing (NLP) is a branch of artificial intelligence that uses machine learning to make predictions and to understand content and context. Resume parsers apply several NLP techniques. Acronym normalization and tagging account for the different formats in which an acronym can be written and map them to a single form. Lemmatization reduces words to their root form using a language dictionary, while stemming strips suffixes such as “s” and “ing”. Entity extraction uses regular expressions, dictionaries, statistical analysis and complex pattern-based extraction to identify people, places, companies, phone numbers, email addresses, important phrases and more.
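
A rough sketch of three of these steps follows. It is deliberately simplistic and makes several assumptions: the acronym table is invented for the example, the stemmer only strips a few English suffixes (real stemmers and dictionary-based lemmatizers are far more careful), and the phone pattern only covers common North American formats.

```python
import re

# Simplified patterns for illustration; real-world contact formats vary much more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}")

# Hypothetical lookup table mapping acronym variants to one canonical form.
ACRONYMS = {"bs": "BS", "bsc": "BS", "mba": "MBA"}

def normalize_acronym(token):
    """Drop periods and case differences, then map to the canonical acronym if known."""
    key = token.replace(".", "").lower()
    return ACRONYMS.get(key, token)

def naive_stem(word):
    """Strip a few common suffixes, keeping at least three characters of the word."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def extract_contacts(text):
    """Pull email addresses and phone numbers out of free text with regular expressions."""
    return {"emails": EMAIL_RE.findall(text), "phones": PHONE_RE.findall(text)}

contacts = extract_contacts("Reach me at jane.doe@example.com or (555) 123-4567.")
```

Here `normalize_acronym("B.Sc.")` and `normalize_acronym("BS")` both map to the same form, and `naive_stem("managing")` yields the crude stem "manag", which a dictionary-based lemmatizer would instead reduce to "manage".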

Effectiveness
Resume parsers have achieved up to 87% accuracy, where accuracy refers to entering the data correctly and assigning it to the correct category. Human accuracy is typically not greater than 96%, so resume parsers have achieved "near human accuracy."
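
The figures above do not come with a stated measurement method. One plausible way to compute such a number, shown here purely as an assumption, is field-level accuracy: compare each field the parser produced against a hand-checked reference record and count exact matches. The function name and the example fields are hypothetical.

```python
def field_accuracy(parsed, reference):
    """Fraction of reference fields whose parsed value matches exactly."""
    if not reference:
        return 0.0
    correct = sum(
        1 for field, value in reference.items() if parsed.get(field) == value
    )
    return correct / len(reference)

# Made-up example: the parser misreads one of four fields.
reference = {
    "name": "Jane Doe",
    "email": "jane.doe@example.com",
    "degree": "BS",
    "employer": "Acme Corp",
}
parsed = {
    "name": "Jane Doe",
    "email": "jane.doe@example.com",
    "degree": "BS",
    "employer": "Acme",
}
score = field_accuracy(parsed, reference)
```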

One executive recruiting company tested three resume parsers and humans to compare the accuracy in data entry. They ran 1000 resumes through the resume parsing software and had humans manually parse and enter the data. The company brought in a third party to evaluate how the humans did compared to the software. They found that the results from the resume parsers were more comprehensive and had fewer mistakes. The humans did not enter all the information on the resumes and occasionally misspelled words or wrote incorrect numbers.

In a 2012 experiment, a resume for an ideal candidate was created based on the job description for a clinical scientist position. After the resume went through the parser, one of the candidate's work experiences was completely lost because the date was listed before the employer. The parser also failed to catch several educational degrees. As a result, the candidate received a relevance ranking of only 43%. Had this been a real candidate's resume, they would not have moved on to the next step even though they were qualified for the position. A similar study of current resume parsers would show whether there have been improvements since then.

Benefits

 * A notable resume study was conducted by Marianne Bertrand and Sendhil Mullainathan in 2003. They wanted to observe the effect of White-sounding versus Black-sounding names on resumes in the hiring process. For the same job openings, they sent resumes of varying quality, from low to high, that were identical in qualifications and credentials and differed only in the name of the applicant. One group had stereotypically Caucasian names such as Greg and Emily, and the other group had stereotypically African-American names such as Darnell and Tamika. Bertrand and Mullainathan then recorded how many of the applicants received callbacks for an interview. The results showed that, regardless of resume quality, the resumes of White applicants elicited 50% more callbacks than those of their Black counterparts. The quality of the resume therefore mattered less than the race of the applicant in the selection process. The attitudes of the hiring managers were not measured, so it is unknown whether this reflects implicit or explicit bias. However, companies are continuing to discriminate against Black applicants and have bias built into their hiring processes. Resume parsing can impede the bias that inevitably arises in the hiring process and allow applicants to be ranked on objective information. The software can be programmed to disregard and conceal the elements of a resume that can lead to bias (e.g. name, gender, race, age, address).
 * The technology is extremely cost-effective and saves resources. Rather than asking candidates to enter the information manually, which could discourage them from applying, or spending recruiters' time on it, data entry is done automatically.
 * The contact information, relevant skills, work history, educational background and more specific information about the candidate are easily accessible.
 * The applicant screening process is now significantly faster and more efficient. Instead of having to look at every resume, recruiters can filter them by specific characteristics, sort and search them. This allows recruiters to move through the interview process and fill positions at a faster rate.
 * One of the biggest complaints among job seekers is the length of the application process. With resume parsers, the process is faster and candidates have an improved experience.
 * The technology helps prevent qualified candidates from slipping through the cracks. On average, a recruiter spends 6 seconds looking at a resume. When a recruiter is looking through hundreds or thousands of them, it can be easy to miss or lose track of potential candidates.
 * Once a candidate's resume has been analyzed, their information remains in the database. If a position comes up that they are qualified for, but haven't applied to, the company still has their information and can reach out to them.
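
The concealment of bias-prone elements mentioned in the first point can be sketched as a filter over the parsed record. This is an illustrative assumption about how such a step might look, not a description of any vendor's product: the field names and the `redact` function are hypothetical.

```python
# Fields named in the text as possible sources of bias (hypothetical field names).
SENSITIVE_FIELDS = frozenset({"name", "gender", "race", "age", "address"})

def redact(parsed_resume, sensitive=SENSITIVE_FIELDS):
    """Return a copy of the parsed record without the sensitive fields."""
    return {k: v for k, v in parsed_resume.items() if k not in sensitive}

# Made-up parsed record.
record = {
    "name": "Jane Doe",
    "age": 41,
    "address": "12 Example Street",
    "skills": ["Python", "SQL"],
    "education": "BS, Computer Science",
}
blind_record = redact(record)
```

Dropping structured fields in this way does not remove indirect cues that remain in free text, such as a name repeated inside a work history entry, so a real system would need to handle those separately.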

Challenges
The parsing software has to rely on complex rules and statistical algorithms to correctly capture the desired information in the resumes. There are many variations in writing style, word choice, syntax, etc., and the same word can have multiple meanings. A date alone can be written in hundreds of different ways. It is still a challenge for resume parsers to account for all the ambiguity. Natural language processing and artificial intelligence still have a way to go in understanding context-based information and what humans mean to convey in written language.
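
The date problem gives a sense of the scale involved. The sketch below tries a short, hand-picked list of formats in turn and assumes English month names (the default C locale); the format list and the function name `parse_date` are assumptions for the example, and anything outside the list is simply not recognized.

```python
from datetime import datetime

# A small, hand-picked list of formats; real resumes use many more.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %Y", "%b %Y", "%m/%Y", "%Y")

def parse_date(text):
    """Try each known format in turn; return a date, or None if none matches."""
    cleaned = text.strip().replace(".", "")  # "Mar. 2015" becomes "Mar 2015"
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None

samples = ["2015-03-09", "March 2015", "Mar. 2015", "03/2015", "2015", "Spring 2015"]
parsed_dates = [parse_date(s) for s in samples]
```

When the day or month is missing, `strptime` defaults it to 1, so "March 2015" and "03/2015" both become 1 March 2015. The last sample, "Spring 2015", returns `None`, which illustrates why a fixed list of formats never covers every variation.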

Resume optimization
Resume parsers have become so omnipresent that it is now recommended that candidates focus on writing to the parsing system rather than to the recruiter. The following techniques have been proposed to increase the probability of success:
 * 1) Use keywords from the job description in relevant places on your resume.
 * 2) Don't use headers or footers, since they may confuse the parsing algorithms.
 * 3) Use a simple style for fonts, layouts and formatting.
 * 4) Avoid graphics.
 * 5) Use standard section names such as “Work Experience” and “Education”.
 * 6) Avoid using acronyms unless they're included in the job description.
 * 7) Don't start with dates in the "Work Experience" section.
 * 8) Stay consistent with formatting past work experience.
 * 9) Send the resume in docx, doc or PDF file format.
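
The first recommendation, using keywords from the job description, can be checked mechanically. The following sketch is a rough approximation under stated assumptions: it only handles single-word keywords, uses exact word matching, and the function `keyword_coverage` is hypothetical. It does not reflect how any actual parser or applicant tracking system scores a resume.

```python
import re

def keyword_coverage(resume_text, job_keywords):
    """Return (fraction of keywords found, list of missing keywords)."""
    # Lowercased word set; '+' and '#' are kept so terms like "c++" or "c#" survive.
    words = set(re.findall(r"[a-z0-9+#]+", resume_text.lower()))
    missing = [kw for kw in job_keywords if kw.lower() not in words]
    found = len(job_keywords) - len(missing)
    coverage = found / len(job_keywords) if job_keywords else 0.0
    return coverage, missing

coverage, missing = keyword_coverage(
    "Built Python services and SQL reports.",
    ["Python", "SQL", "Kubernetes", "Java"],
)
```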

Software and vendors
There are many stand-alone resume parsers, including RChilli, Skillate, CandidateZip, Sovren, Daxtra, Textkernel and Hireability. Parsers are also typically bundled with applicant tracking systems (ATS), which companies use to streamline the hiring process.

With recent advancements in machine learning, text mining and analysis processes, which achieve up to 95% accuracy in data processing, many AI technologies have sprung up to help job seekers create application documents. These services focus on creating ATS-friendly resumes, perform resume checks and screening, and help with the whole preparation and application process. Some AI builders, such as Leap.ai and Skillroads, concentrate on resume creation, while others, like Stella, also help with the job hunt itself by matching candidates to appropriate vacancies. In 2017, Google launched Google for Jobs. This extension of the search engine uses Cloud Talent Solution, Google's own iteration of the AI resume builder and matching system.

Future
Resume parsers are already standard in most mid- to large-sized companies and this trend will continue as the parsers become even more affordable.

A qualified candidate's resume can be ignored if it is not formatted in the expected way or does not contain specific keywords or phrases. As machine learning and natural language processing improve, so will the accuracy of resume parsers.

Resume parsing software is also expanding into contextual analysis of the information in the resume, rather than purely extracting it. One employee at a parsing company said “a parser needs to classify data, enrich it with knowledge from other sources, normalize data so it can be used for analysis and allow for better searching.”

Parsing companies are also being asked to expand beyond just resumes or even LinkedIn profiles. They are working on extracting information from industry-specific sites such as GitHub and social media profiles.