Project author: ahmednabil950

Project description:
Project to extract entities from Job Description Articles.
Language:
Project URL: git://github.com/ahmednabil950/JOBS-Information-Extraction.git


JOBS-Information-Extraction

Project to extract entities from Job Description Articles.



Objective

Extracting information from job articles (posted by employers):

  Job type:
  ==========================
  "Full-Time, Part-Time, Contract, Principle, temporary, .. etc"

  Skills:
  ==========================
  Soft-Skills, e.g. "customer-service, verbal communication skills"
  Technical-Skills, e.g. "Java, HTTP, networking, LINUX/UNIX, C++, C#, AngularJs, javascript, bootstrap, .. etc"

  Job title normalization:
  ==========================
  Angular/Javascript Developer ====> Software Developers
  IT Specialist Junior ====> Computer User Support Specialists
  Jr. Electrical Engineer ====> Electrical Engineer
  Sr. Electrical Engineer ====> Electrical Engineer
  Senior/Junior Electrical Engineer ====> Electrical Engineer
  Full-Stack Architect/ Web Developer ====> Web Developers
  Jr. .Net Developer ====> Web Developers
  Junior Business Analyst ====> Business Analyst
  Sr.Java Engineer ====> Software Engineer
  Web Developer - Internship ====> Web Developers
  Senior Android Developer ====> Software Developers
  UX/UI Engineer ====> Software Engineers
  and so on...

  Years of experience:
  ==========================
  At least 5 years
  more than 5 years
  1-2 years
  3+ years
  5 years
  and so on...

DataSet

The ONET dataset was made public by the ONET organization.

Annotation

  • Built an annotation tool to semi-automate the skills annotation. Annotation took 3 weeks and covered around 2000 job description paragraphs posted by different employers.
    Data preprocessing took one week, since many records were wrongly labeled with respect to the ONET normalized job title.

  • The IOB tagging scheme is followed.

  1. I-TAG: Inside the chunk
  2. B-TAG: Beginning of the chunk
  3. O: Outside the chunk

Used TAGS

  1. B-TECH: Technical Skill (Beginning)
  2. I-TECH: Technical skill (Inside)
  3. B-SOFT: Soft Skill (Beginning)
  4. I-SOFT: Soft Skill (Inside)
  5. B-CERT: Certification (Beginning)
  6. I-CERT: Certification (Inside)
  7. B-YEXP: Years of Experience (Beginning)
  8. I-YEXP: Years of Experience (Inside)
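To make the tagging scheme concrete, the following is a minimal sketch of how IOB tags could be grouped back into labeled chunks. The function name `iob_to_spans` is hypothetical and not taken from the repository; the sketch also assumes that a stray I- tag with no preceding B- tag starts a new chunk.

```python
def iob_to_spans(tokens, tags):
    """Group IOB-tagged tokens into (label, text) chunks.

    Hypothetical helper for illustration; not part of the original project.
    """
    spans = []
    current_label, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        # "B-TECH" -> ("B", "-", "TECH"); "O" -> ("O", "", "")
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A B- tag (or an I- tag with a different label) opens a new chunk.
            if current_tokens:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = label, [token]
        elif prefix == "I":
            # Continuation of the chunk that is currently open.
            current_tokens.append(token)
        else:
            # "O" closes any open chunk.
            if current_tokens:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_tokens:
        spans.append((current_label, " ".join(current_tokens)))
    return spans


tokens = ["Proficient", "in", "Ruby", "on", "Rails", "AngularJS", "React"]
tags = ["O", "O", "B-TECH", "I-TECH", "I-TECH", "B-TECH", "B-TECH"]
print(iob_to_spans(tokens, tags))
```

With these tags, "Ruby on Rails" comes out as a single TECH chunk, while "AngularJS" and "React" are separate chunks because each starts with its own B-TECH tag.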

Reported wrong titles:

  1. Mechanical Engineers
  2. Pharmacist Technicians

Relying on the raw job title alone may cause ambiguity problems during training, even though it is computationally cheap with respect to the RNN forward and backward propagation.
To overcome this issue, the raw job description is used instead to construct a sequence-to-sequence relation between the input and the output, so more computation is needed.

Experiments | Models Development

  • LSTM Translator:
    An LSTM architecture is used to translate a sequence of unnormalized job title tokens into a normalized one. Good results were obtained (85% to 89%).
  • LSTM Sequence Tagger: predicts the tag of each token.
  • Dataset size: 2000 annotated job description paragraphs, and 65000 job title pairs for the normalization module.
  • The years of experience extraction module is developed using the spaCy functional API to extract date entities, with some parsing rules (regular expressions) added for the sake of better accuracy.
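The regular expression rules mentioned for the years of experience module are not listed here, so the following is only a sketch of what such a rule could look like, using the Python standard library. The pattern, the name `YEARS_PATTERN`, and the function `extract_years` are assumptions made for illustration. The spaCy step is not reproduced, and the pattern only covers the phrasings listed under the objective ("At least 5 years", "more than 5 years", "1-2 years", "3+ years", "5 years").

```python
import re

# Hypothetical rule, not the project's actual pattern.
YEARS_PATTERN = re.compile(
    r"""
    (?:(?P<qualifier>at\s+least|more\s+than)\s+)?  # optional "at least" / "more than"
    (?P<low>\d+)                                   # first number
    (?:\s*-\s*(?P<high>\d+))?                      # optional upper bound, as in "1-2"
    (?P<plus>\+)?                                  # optional "+", as in "3+"
    \s*years?\b                                    # "year" or "years"
    """,
    re.IGNORECASE | re.VERBOSE,
)


def extract_years(text):
    """Return one dict per years-of-experience mention found in text."""
    found = []
    for match in YEARS_PATTERN.finditer(text):
        found.append({
            # Collapse internal whitespace in the matched phrase.
            "text": " ".join(match.group(0).split()),
            "low": int(match.group("low")),
            "high": int(match.group("high")) if match.group("high") else None,
            # "3+", "at least 5" and "more than 5" have no upper bound.
            "open_ended": bool(match.group("plus") or match.group("qualifier")),
        })
    return found


for hit in extract_years("At least 5 years of Java, 1-2 years of SQL and 3+ years of Linux"):
    print(hit)
```

A rule like this is cheap to run before or after an entity recognizer, but it will miss spelled-out numbers such as "five years", which is one reason a statistical extractor might still be combined with it.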

RESULT

JOB TITLE NORMALIZER

  raw title ====> normalized title
  ==========================
  UX/UI Engineers ====> Software Engineers
  UX/UI Designer ====> Graphic Designers
  .NET Developer ====> Web Developers
  Senior Android Engineer ====> Software Developers
  Jr Java Developer ====> Web Developers
  Sr.Java Engineer ====> Mechanical Engineers
  Senior Linux Developer ====> Software Developers
  Junior Network Engineer ====> Network And Computer Systems Administrators
  Android UI/UX Designer ====> Graphic Designers
  Android UI/UX Designer (Part-time) ====> Graphic Designers
  Part-time Android Developer ====> Software Developers
  Full-Time UI/UX Developer ====> Software Developers
  Android Developer - Internship ====> Software Developers
  Jr. Electrical Engineer ====> Electrical Engineers
  Junior Web Application Developer ====> Software Developers
  Full-stack and Frontend Engineer ====> Mechanical Engineers
  Angular/Javascript Developer ====> Web Developers
  IT Specialist Junior ====> Computer User Support Specialists
  Senior Web UI Developer ====> Software Developers
  Jr. Software Engineer (Java) ====> Software Developers
  Senior UI JavaScript Developer ====> Software Developers
  Junior Business Analyst ====> Management Analysts
  Jr Network Tech w/ Basic Cisco ====> Computer And Support Specialists
  Senior Quality Systems Specialist ====> Computer User Support Specialists
  Administrative Assistant ====> Secretaries And Administrative Assistants Except Legal Medical And Executive
  Jr. .Net Developer ====> Web Developers
  Front End Web Developer ====> Web Developers
  Full-Stack Architect/ Web Developer ====> Software Developers

SKILLS RECOGNITION

  1. |Word |True |Pred |
  2. =============================================================
  3. |Software |B-TECH |O |
  4. |Design |I-TECH |O |
  5. |Engineer |O |O |
  6. |with |O |O |
  7. |7 |O |O |
  8. |years |O |O |
  9. |software |B-TECH |O |
  10. |development |I-TECH |O |
  11. |experience |O |O |
  12. |and |O |O |
  13. |past |O |O |
  14. |success |O |O |
  15. |translating |O |O |
  16. |UI/UX |B-TECH |B-TECH |
  17. |design |I-TECH |I-TECH |
  18. |wireframes |O |O |
  19. |to |O |O |
  20. |actual |O |O |
  21. |code |O |O |
  22. |If |O |O |
  23. |you |O |O |
  24. |have |O |O |
  25. |produced |O |O |
  26. |excellent |O |O |
  27. |user |O |O |
  28. |interfaces |O |O |
  29. |for |O |O |
  30. |a |O |O |
  31. |great |O |O |
  32. |application |O |O |
  33. |we |O |O |
  34. |would |O |O |
  35. |love |O |O |
  36. |the |O |O |
  37. |chance |O |O |
  38. |to |O |O |
  39. |tell |O |O |
  40. |you |O |O |
  41. |more |O |O |
  42. |about |O |O |
  43. |this |O |O |
  44. |exciting |O |O |
  45. |opportunity |O |O |
  46. |•OPEN |O |O |
  47. |TO |O |O |
  48. |REMOTE |O |O |
  49. |CANDIDATES |O |O |
  50. |Top |O |O |
  51. |Reasons |O |O |
  52. |to |O |O |
  53. |Work |O |O |
  54. |with |O |O |
  55. |Us |O |O |
  56. |Work |O |O |
  57. |culture |O |O |
  58. |and |O |O |
  59. |environment |O |O |
  60. |focused |O |O |
  61. |on |O |O |
  62. |employee |O |O |
  63. |well |O |O |
  64. |being |O |O |
  65. |exciting |O |O |
  66. |fun |O |O |
  67. |creative |O |O |
  68. |and |O |O |
  69. |client |O |O |
  70. |focused |O |O |
  71. |Fast |O |O |
  72. |growing |O |O |
  73. |company |O |O |
  74. |with |O |O |
  75. |opportunity |O |O |
  76. |for |O |O |
  77. |growth |O |O |
  78. |Work |O |O |
  79. |with |O |O |
  80. |a |O |O |
  81. |collaborative |O |O |
  82. |group |O |O |
  83. |of |O |O |
  84. |veterans |O |O |
  85. |and |O |O |
  86. |novices |O |O |
  87. |to |O |O |
  88. |better |O |O |
  89. |the |O |O |
  90. |well |O |O |
  91. |being |O |O |
  92. |of |O |O |
  93. |our |O |O |
  94. |customers |O |O |
  95. |What |O |O |
  96. |You |O |O |
  97. |Will |O |O |
  98. |Be |O |O |
  99. |Doing |O |O |
  100. |1 |O |O |
  101. |Combine |O |O |
  102. |the |O |O |
  103. |art |O |O |
  104. |of |O |O |
  105. |design |O |O |
  106. |with |O |O |
  107. |the |O |O |
  108. |art |O |O |
  109. |of |O |O |
  110. |programming |O |O |
  111. |2 |O |O |
  112. |Responsible |O |O |
  113. |for |O |O |
  114. |the |O |O |
  115. |translation |O |O |
  116. |of |O |O |
  117. |the |O |O |
  118. |UI/UX |B-TECH |B-TECH |
  119. |design |I-TECH |I-TECH |
  120. |wireframes |O |O |
  121. |to |O |O |
  122. |actual |O |O |
  123. |code |O |O |
  124. |that |O |O |
  125. |will |O |O |
  126. |produce |O |O |
  127. |visual |O |O |
  128. |elements |O |O |
  129. |of |O |O |
  130. |the |O |O |
  131. |application |O |O |
  132. |3 |O |O |
  133. |Work |O |O |
  134. |with |O |O |
  135. |the |O |O |
  136. |Engineering |O |O |
  137. |team |O |O |
  138. |and |O |O |
  139. |bridge |O |O |
  140. |the |O |O |
  141. |gap |O |O |
  142. |between |O |O |
  143. |graphical |O |O |
  144. |design |O |O |
  145. |and |O |O |
  146. |technical |O |O |
  147. |implementation |O |O |
  148. |4 |O |O |
  149. |Define |O |O |
  150. |how |O |O |
  151. |the |O |O |
  152. |application |O |O |
  153. |looks |O |O |
  154. |as |O |O |
  155. |well |O |O |
  156. |as |O |O |
  157. |how |O |O |
  158. |it |O |O |
  159. |works |O |O |
  160. |5 |O |O |
  161. |Develop |O |O |
  162. |new |O |O |
  163. |user |O |O |
  164. |facing |O |O |
  165. |features |O |O |
  166. |6 |O |O |
  167. |Build |O |O |
  168. |reusable |O |O |
  169. |code |O |O |
  170. |and |O |O |
  171. |libraries |O |O |
  172. |fr |O |O |
  173. |future |O |O |
  174. |use |O |O |
  175. |7 |O |O |
  176. |Ensure |O |O |
  177. |the |O |O |
  178. |technical |O |O |
  179. |feasibility |O |O |
  180. |of |O |O |
  181. |UI/UX |B-TECH |B-TECH |
  182. |designs |I-TECH |O |
  183. |8 |O |O |
  184. |Assure |O |O |
  185. |that |O |O |
  186. |all |O |O |
  187. |user |O |O |
  188. |input |O |O |
  189. |is |O |O |
  190. |validated |O |O |
  191. |before |O |O |
  192. |submitting |O |O |
  193. |to |O |O |
  194. |back |O |O |
  195. |end |O |O |
  196. |What |O |O |
  197. |You |O |O |
  198. |Need |O |O |
  199. |for |O |O |
  200. |this |O |O |
  201. |Position |O |O |
  202. |7 |O |O |
  203. |years |O |O |
  204. |of |O |O |
  205. |software |B-TECH |O |
  206. |development |I-TECH |O |
  207. |experience |O |O |
  208. |Proficient |O |O |
  209. |in |O |O |
  210. |Ruby |B-TECH |B-TECH |
  211. |on |I-TECH |O |
  212. |Rails |I-TECH |B-TECH |
  213. |AngularJS |B-TECH |B-TECH |
  214. |React |B-TECH |B-TECH |
  215. |Proficient |O |O |
  216. |in |O |O |
  217. |HTML |B-TECH |B-TECH |
  218. |CSS |B-TECH |B-TECH |
  219. |Understanding |O |O |
  220. |of |O |O |
  221. |server |O |O |
  222. |side |O |O |
  223. |CSS |B-TECH |B-TECH |
  224. |platforms |O |O |
  225. |LESS |O |O |
  226. |SASS |B-TECH |B-TECH |
  227. |Good |O |O |
  228. |understanding |O |O |
  229. |of |O |O |
  230. |AJAX |B-TECH |B-TECH |
  231. |JSON |B-TECH |B-TECH |
  232. |• |O |O |
  233. |Nice |O |O |
  234. |to |O |O |
  235. |haves |O |O |
  236. |Experience |O |O |
  237. |with |O |O |
  238. |AWS |B-TECH |B-TECH |
  239. |Experience |O |O |
  240. |with |O |O |
  241. |Docker |B-TECH |B-TECH |
  242. |or |O |O |
  243. |other |O |O |
  244. |container |O |O |
  245. |based |O |O |
  246. |platforms |O |O |
  247. |What |O |O |
  248. |s |O |O |
  249. |In |O |O |
  250. |It |O |O |
  251. |for |O |O |
  252. |You |O |O |
  253. |Competitive |O |O |
  254. |salary |O |O |
  255. |Comprehensive |O |O |
  256. |benefit |O |O |
  257. |plans |O |O |
  258. |401 |O |O |
  259. |k |O |O |
  260. |matching |O |O |
  261. |Transportation |O |O |
  262. |and |O |O |
  263. |parking |O |O |
  264. |benefits |O |O |
  265. |Flexible |O |O |
  266. |PTO |O |O |
  267. |and |O |O |
  268. |company |O |O |
  269. |holidays |O |O |
  270. |Professional |O |O |
  271. |development |O |O |
  272. |investment |O |O |
  273. |Pet |O |O |
  274. |friendly |O |O |
  275. |office |O |O |
  276. |Fully |O |O |
  277. |stocked |O |O |
  278. |kitchens |O |O |
  279. |with |O |O |
  280. |drinks |O |O |
  281. |snacks |O |O |
  282. |and |O |O |
  283. |coffee |O |O |
  284. |So |O |O |
  285. |if |O |O |
  286. |you |O |O |
  287. |are |O |O |
  288. |a |O |O |
  289. |Sr |O |O |
  290. |Full |O |O |
  291. |Stack |O |O |
  292. |Engineer |O |O |
  293. |with |O |O |
  294. |7 |O |O |
  295. |years |O |O |
  296. |professional |O |O |
  297. |experience |O |O |
  298. |please |O |O |
  299. |apply |O |O |
  300. |today |O |O |
  301. |Required |O |O |
  302. |Skills |O |O |
  303. |Ruby |B-TECH |B-TECH |
  304. |On |I-TECH |O |
  305. |Rails |I-TECH |B-TECH |
  306. |AngularJS |B-TECH |B-TECH |
  307. |React |B-TECH |B-TECH |
  308. |HTML |B-TECH |B-TECH |
  309. |CSS |B-TECH |B-TECH |
  310. |AJAX |B-TECH |B-TECH |
  311. |JSON |B-TECH |B-TECH |
  312. |LESS |O |B-TECH |
  313. |SASS |B-TECH |B-TECH |
  314. |AWS |B-TECH |B-TECH |
  315. |Docker |B-TECH |B-TECH |
  316. |UI/UX |B-TECH |B-TECH |
  317. |JQuery |B-TECH |B-TECH |
  318. |If |O |O |
  319. |you |O |O |
  320. |are |O |O |
  321. |a |O |O |
  322. |good |O |O |
  323. |fit |O |O |
  324. |for |O |O |
  325. |the |O |O |
  326. |Sr |O |O |
  327. |Full |O |O |
  328. |Stack |O |O |
  329. |Engineer |O |O |
  330. |Ruby |B-TECH |B-TECH |
  331. |on |I-TECH |O |
  332. |Rails |I-TECH |O |
  333. |100 |O |O |
  334. |WORK |O |O |
  335. |FROM |O |O |
  336. |HOME |O |O |
  337. |position |O |O |
  338. |and |O |O |
  339. |have |O |O |
  340. |a |O |O |
  341. |background |O |O |
  342. |that |O |O |
  343. |includes |O |O |
  344. |Ruby |B-TECH |B-TECH |
  345. |On |I-TECH |O |
  346. |Rails |I-TECH |B-TECH |
  347. |AngularJS |B-TECH |B-TECH |
  348. |React |B-TECH |B-TECH |
  349. |HTML |B-TECH |B-TECH |
  350. |CSS |B-TECH |B-TECH |
  351. |AJAX |B-TECH |B-TECH |
  352. |JSON |B-TECH |B-TECH |
  353. |LESS |O |B-TECH |
  354. |SASS |B-TECH |B-TECH |
  355. |AWS |B-TECH |B-TECH |
  356. |Docker |B-TECH |B-TECH |
  357. |UI/UX |B-TECH |B-TECH |
  358. |JQuery |B-TECH |B-TECH |
  359. |and |O |O |
  360. |you |O |O |
  361. |are |O |O |
  362. |interested |O |O |
  363. |in |O |O |
  364. |working |O |O |
  365. |the |O |O |
  366. |following |O |O |
  367. |job |O |O |
  368. |types |O |O |
  369. |Information |O |O |
  370. |Technology |O |O |
  371. |Engineering |O |O |
  372. |Professional |O |O |
  373. |Services |O |O |
  374. |Within |O |O |
  375. |the |O |O |
  376. |following |O |O |
  377. |industries |O |O |
  378. |Computer |O |O |
  379. |Software |O |O |
  380. |Our |O |O |
  381. |privacy |O |O |
  382. |policy |O |O |
  383. |Your |O |O |
  384. |resume |O |O |
  385. |and |O |O |
  386. |information |O |O |
  387. |will |O |O |
  388. |be |O |O |
  389. |kept |O |O |
  390. |completely |O |O |
  391. |confidential |O |O |
  392. |Looking |O |O |
  393. |forward |O |O |
  394. |to |O |O |
  395. |receiving |O |O |
  396. |your |O |O |
  397. |resume |O |O |
  398. |through |O |O |
  399. |our |O |O |
  400. |website |O |O |
  401. |and |O |O |
  402. |going |O |O |
  403. |over |O |O |
  404. |the |O |O |
  405. |job |O |O |
  406. |in |O |O |
  407. |more |O |O |
  408. |detail |O |O |
  409. |with |O |O |
  410. |you |O |O |
  411. |CyberCoders |O |O |
  412. |Inc |O |O |
  413. |is |O |O |
  414. |proud |O |O |
  415. |to |O |O |
  416. |be |O |O |
  417. |an |O |O |
  418. |Equal |O |O |
  419. |Opportunity |O |O |
  420. |Employer |O |O |
  421. |All |O |O |
  422. |qualified |O |O |
  423. |applicants |O |O |
  424. |will |O |O |
  425. |receive |O |O |
  426. |consideration |O |O |
  427. |for |O |O |
  428. |employment |O |O |
  429. |without |O |O |
  430. |regard |O |O |
  431. |to |O |O |
  432. |race |O |O |
  433. |color |O |O |
  434. |religion |O |O |
  435. |sex |O |O |
  436. |national |O |O |
  437. |origin |O |O |
  438. |disability |O |O |
  439. |protected |O |O |
  440. |veteran |O |O |
  441. |status |O |O |
  442. |or |O |O |
  443. |any |O |O |
  444. |other |O |O |
  445. |characteristic |O |O |
  446. |protected |O |O |
  447. |by |O |O |
  448. |law |O |O |
  449. |Your |O |O |
  450. |Right |O |O |
  451. |to |O |O |
  452. |Work |O |O |
  453. |– |O |O |
  454. |In |O |O |
  455. |compliance |O |O |
  456. |with |O |O |
  457. |federal |O |O |
  458. |law |O |O |
  459. |all |O |O |
  460. |persons |O |O |
  461. |hired |O |O |
  462. |will |O |O |
  463. |be |O |O |
  464. |required |O |O |
  465. |to |O |O |
  466. |verify |O |O |
  467. |identity |O |O |
  468. |and |O |O |
  469. |eligibility |O |O |
  470. |to |O |O |
  471. |work |O |O |
  472. |in |O |O |
  473. |the |O |O |
  474. |United |O |O |
  475. |States |O |O |
  476. |and |O |O |
  477. |to |O |O |
  478. |complete |O |O |
  479. |the |O |O |
  480. |required |O |O |
  481. |employment |O |O |
  482. |eligibility |O |O |
  483. |verification |O |O |
  484. |document |O |O |
  485. |form |O |O |
  486. |upon |O |O |
  487. |hire |O |O |
  488. |Ruby |B-TECH |B-TECH |
  489. |On |I-TECH |B-TECH |
  490. |Rails |I-TECH |B-TECH |
  491. |AngularJS |B-TECH |B-TECH |
  492. |React |B-TECH |B-TECH |
  493. |HTML |B-TECH |B-TECH |
  494. |CSS |B-TECH |B-TECH |
  495. |AJAX |B-TECH |B-TECH |
  496. |JSON |B-TECH |B-TECH |
  497. |LESS |O |B-TECH |
  498. |SASS |B-TECH |B-TECH |
  499. |AWS |B-TECH |B-TECH |
  500. |Docker |B-TECH |B-TECH |
  501. |UI/UX |B-TECH |B-TECH |
  502. |JQuery |B-TECH |B-TECH |
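A True/Pred listing like the one above can be summarized with token-level precision and recall. The sketch below is one possible way to do that and is not the project's own evaluation code; the function name `score_tags` is hypothetical. It assumes that a predicted tag counts as correct only when it matches the true tag exactly, so predicting B-TECH where the annotation says I-TECH counts as both a false positive and a false negative. The sample rows are six rows copied from the table (Ruby, On, Rails, AngularJS, LESS, SASS in the "Required Skills" part).

```python
def score_tags(true_tags, pred_tags):
    """Token-level precision, recall and F1 over non-O tags.

    Hypothetical helper for illustration; exact tag match is required.
    """
    tp = fp = fn = 0
    for true, pred in zip(true_tags, pred_tags):
        if pred != "O" and pred == true:
            tp += 1
        else:
            if pred != "O":
                fp += 1  # predicted an entity tag that is not the true tag
            if true != "O":
                fn += 1  # missed (or mislabeled) a true entity tag
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Rows copied from the table: Ruby, On, Rails, AngularJS, LESS, SASS.
true_tags = ["B-TECH", "I-TECH", "I-TECH", "B-TECH", "O", "B-TECH"]
pred_tags = ["B-TECH", "O", "B-TECH", "B-TECH", "B-TECH", "B-TECH"]
precision, recall, f1 = score_tags(true_tags, pred_tags)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Chunk-level scoring, where a whole span such as "Ruby On Rails" must be recovered to count, is stricter than this token-level view and would penalize the split predictions in the table more heavily.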