
Principal Scientific Data Architect

Work from home Full-time role Hiring

About Xebia

Xebia is a trusted advisor in the modern era of digital transformation, serving hundreds of leading brands worldwide with end-to-end IT solutions. The company has experts specializing in technology consulting, software engineering, AI, digital products and platforms, data, cloud, intelligent automation, agile transformation, and industry digitization. In addition to providing high-quality digital consulting and state-of-the-art software development, Xebia has a host of standardized solutions that substantially reduce the time-to-market for businesses. Xebia also offers a diverse portfolio of training courses to help forward-thinking organizations upskill and educate their workforce to capitalize on the latest digital capabilities. The company has a strong presence across 16 countries, with development centres across the US, Latin America, Western Europe, Poland, the Nordics, the Middle East, and Asia Pacific.

Job Description: Principal Scientific Data Architect (Google Cloud Platform Ecosystem)

Role Overview

We are seeking a highly specialized Principal Scientific Data Architect to bridge the gap between advanced Google Cloud engineering and life sciences discovery. This role will redefine how scientific data is structured, scaled, and consumed across our R&D, Onyx, and CMC (Chemistry, Manufacturing, and Controls) divisions. Operating natively within the Google Cloud Platform (GCP) and Databricks on GCP ecosystem, you will lead the transition toward a fully automated, software-defined data framework by implementing Schema as Code, Data as Code, and metadata-driven Configuration Data Engineering. The ideal candidate combines elite cloud data architecture expertise with deep scientific literacy, enabling the design of data systems that directly power in-silico molecular discovery and autonomous Agentic AI frameworks.

Key Responsibilities

GCP-Native Data Architecture & Paradigm Shifts

Schema as Code: Design and implement version-controlled, programmatically managed data schemas natively integrated with Google BigQuery. Ensure schemas evolve seamlessly using GCP DevOps tools (Cloud Build, Artifact Registry) and Terraform.

Data as Code: Treat data assets with software engineering rigor. Implement data versioning, programmability, and automated quality testing using BigQuery features (such as table snapshots and time travel), dbt, and Delta Lake on GCP.

Configuration Data Engineering: Architect highly optimized, metadata-driven, configuration-led data pipelines using Google Cloud Composer (Airflow) or Dataflow to abstract infrastructure complexity.

Scientific Domain Integration

Translate complex biological and chemical concepts (e.g., molecular modalities, chemical structures, solubility traits) into highly scalable logical and physical data models within BigQuery and Databricks.

Collaborate closely with computational chemists, biologists, and AI engineers to ensure the data architecture natively supports predictive in-silico modeling.

Design robust data layouts that allow autonomous AI agents to easily "dip into" molecular data, extract properties, and explain molecular behavior.

Platform & Ecosystem Strategy

Optimize the interoperability between Databricks on GCP (Lakehouse architecture) and enterprise-wide Google BigQuery storage and analytics.

Inform the integration of semantic web technologies and knowledge graphs (e.g., Stardog) into the overarching Google Cloud data fabric.

Ensure data availability and high-performance querying for downstream multi-agent AI ecosystems (Agentic Hubs built on Google Cloud's AI suite or custom frameworks).

Required Skills & Qualifications

Scientific Domain Knowledge

Mandatory: Strong background or proven experience working inside life sciences, pharmaceuticals, biotech, or scientific research organizations.

Ability to converse fluently with scientists regarding therapeutic modalities, molecular properties, and R&D pipelines without needing to be a wet-lab scientist.

GCP & Technical Architecture Expertise

GCP Data Stack: Mastery of Google BigQuery (including BigLake, Analytics Hub, and nested JSON schemas) and Databricks on GCP.

Software-Defined Data: Proven track record of implementing Schema as Code and Data as Code paradigms using tools like Terraform, dbt, and Git-based CI/CD workflows.

Pipeline Automation: Deep experience with configuration-driven pipeline orchestrators, specifically Google Cloud Composer / Apache Airflow.

Modeling & Semantics: Strong understanding of relational, dimensional, and graph-based data modeling. Familiarity with knowledge graphs (e.g., Stardog) or biomedical ontologies is a major plus.

Soft Skills & Leadership

Abstract Thinking: Ability to conceptualize and suggest complex in-silico data solutions at a high strategic level without getting bogged down by immediate technology limitations.

Communication: Exceptional ability to articulate the business and scientific value of pure data architecture to non-technical executive stakeholders.

Preferred Qualifications

Google Cloud Professional Data Engineer or Professional Cloud Architect certification.

Degree in Computer Science, Data Engineering, Bioinformatics, Computational Chemistry, or a related quantitative field.

Experience setting up GCP data foundations specifically engineered to feed Large Language Models (e.g., Vertex AI / Gemini) and autonomous AI agents.

Location: Not a constraint

Some useful links:

Xebia | Creating Digital Leaders.

https://www.linkedin.com/company/xebia/mycompany/

http://twitter.com/xebiaindia

http://www.youtube.com/XebiaIndia
