AI/NLP Engineer – Extract Meaning from Complex Documents (Not a Simple Parsing Task) - Contract to Hire
We are building a system that processes real-world documents (PDFs, emails, reports, and all types of text documents) and extracts meaningful structured signals from unstructured text. This is not a simple parsing or summarization task.

The Problem

The documents we work with are:
- inconsistent in format
- ambiguous in language
- written by different authors with different styles
- often phrased with indirect or implied recommendations

We need to extract signals such as:
- findings
- recommendations
- actions
- key clinical or operational statements

Several approaches have already been attempted:
- rule-based extraction (regex, YAML rules) → too brittle
- strict deterministic pipelines → fail on variability
- basic LLM extraction → inconsistent and not reliable enough

We are looking for someone who can design and implement a robust signal extraction approach that can:
- handle messy, real-world text
- extract relevant signals with high recall
- link extracted signals back to source text
- produce structured outputs that can be used downstream

We are not looking for someone to just wire APIs. We are looking for someone who can:
- think through ambiguity
- design an approach that works in practice
- understand tradeoffs between flexibility and control

Required in Your Proposal

Please answer the following:
1. How would you approach extracting meaningful signals from documents with inconsistent formatting and ambiguous language?
2. What would your pipeline look like at a high level?
3. What are the biggest failure points in this type of system?