Senior Data Scientist (Contract)
Senior Data Scientist (4-month, full-time)
Contract Duration: 16 weeks
Location: Remote (with the ability to take meetings between 9:00 am and 5:00 pm EST)
Summary
Project Overview
The Wikimedia Foundation is undertaking a research initiative to better understand how people reach Wikimedia projects, how different traffic sources relate to on-wiki engagement, and how changes in visibility affect content quality and contributor activity. This work will support future improvements to search visibility, content reuse partnerships, and movement-wide understanding of contributor pathways.
Scope of Work
The contractor will support analytical components of this initiative, focusing on:
- Traffic Health & Visibility Trends: Producing descriptive analyses of human traffic patterns, referrers, and indicators of traffic stability over time.
- Engagement & Attribution Exploration: Supporting preliminary assessments of how traffic sources relate to on-wiki engagement (e.g., likelihood of exploring additional pages or initiating contribution).
- Natural Experiments & Content Quality: Assisting in analyses of cases where sudden changes in visibility allow us to study downstream impacts on editing activity or content quality.
Specific methods and models will be selected based on data feasibility, privacy guidance, and consultation with internal teams.
Deliverables
Analytic Deliverables
- Cleaned/joined datasets, summary tables, and Jupyter notebooks supporting analyses
- Time-series analyses for Traffic Health indicators and content reusers
- Natural experiment analyses with interpretable visuals and written summaries
Documentation Deliverables
- Method documentation describing assumptions, data limitations, and analytical decisions
- Short briefs or memos explaining key findings for internal stakeholders
- Clear handover materials enabling reproducibility
Potential Deliverables (Depending on Time & Feasibility)
- Early prototype views for dashboards (Superset/Turnilo)
- Early specification drafts for indicators or experimental frameworks
Qualifications
Technical Skills
- Advanced SQL (large-scale distributed datasets)
- Python expertise (pandas, numpy, statsmodels, scikit-learn, Jupyter)
- Ability to work collaboratively in GitLab repositories
- Time-series modeling experience
- Applied causal inference (Diff-in-Diff, event studies, lag analysis)
- Experience working with log-level or large behavioral datasets
Analytical Skills
- Ability to evaluate data feasibility and design methodological approaches
- Ability to interpret and communicate analytical uncertainty
- Strong documentation practices and reproducibility mindset
Soft/Collaborative Skills
- Ability to work independently in a fast-moving, ambiguous research environment
- Strong communication with non-technical stakeholders
- Ability to manage competing priorities across multiple research modules
- Knowledge of the Wikimedia movement and ecosystem a plus
Collaboration & Reporting
The contractor will collaborate with the Product & Technology department and work closely with Data Engineering, Research & Decision Science, and relevant program teams. They will report to the project’s Staff Data Scientist.
Timeline
4 months (full-time), with sequencing of work dependent on data access, privacy reviews, and research needs.
About the Wikimedia Foundation
The Wikimedia Foundation is the nonprofit organization that operates Wikipedia and the other Wikimedia free knowledge projects. Our vision is a world in which every single human can freely share in the sum of all knowledge. We believe that everyone has the potential to contribute something to our shared knowledge,...
This job posting has expired and is no longer accepting applications.
We are the nonprofit that hosts Wikipedia. We support the people, technology, and policies that enable reliable information to be shared with the world.