Hello! I'm Guilherme Amorim, a medical doctor and clinical data scientist from Portugal. I am currently working as a Population Health Data Specialist within the Evidence Generation department at AstraZeneca.
I have over 6 years' experience working across clinical care, randomised controlled trials, and epidemiology in the UK and Portugal. I am passionate about using data to answer important questions that contribute to the common good.
You can find below a summary of my CV and a portfolio showcasing some of my work in computer science. My complete CV is
For a complete list of scientific publications please click here
Please note that all opinions and thoughts expressed in my portfolio are my own, and do not represent the views of AstraZeneca or the University of Oxford.
Thanks for visiting!
Main research projects:
1. A Study of Cardiovascular Events in Diabetes (ASCEND)-PLUS (oral semaglutide
for primary cardiovascular prevention in diabetes; ISRCTN76193287), 2024
2. Randomised Evaluation of COVID-19 Therapy (RECOVERY; NCT04381936),
2020-2024
3. Active Monitoring for AtriaL FIbrillation (AMALFI; remote screening for
silent atrial fibrillation in primary care; ISRCTN15544176), 2019-2024
4. A Randomized Trial Assessing the Effects of Inclisiran on Clinical Outcomes Among
People With Cardiovascular Disease (ORION-4; NCT03705234), 2019-2024
Main deliverables:
- Led recruitment of >5,000 participants in the AMALFI trial; contributed to
recruitment of >12,000 UK participants in ORION-4 and >49,000 participants in
RECOVERY
- Main author of the AMALFI statistical analysis plan
- Led NHS data linkage request submissions and linkage data collection from 27 primary
care practices and centralised NHS England datasets (AMALFI and ORION-4)
- Developed data-driven algorithms to identify safety events and medication exposure
(RECOVERY, ORION-4, AMALFI)
- Developed and led a centralised pre-screening procedure using linked hospital lab
data (ORION-4)
- Led or contributed to 24 research papers (6 as first author), total >19,000 citations
(h-index 22), including major contributions to global clinical guidelines on COVID-19
(RECOVERY trial)
- Presented research outputs at several international conferences, including European
Society of Cardiology Congress 2017 and 2021
▪ Assessment of a trial protocol in chronic kidney disease submitted for regulatory approval.
▪ Remote lectures/interviews on conducting clinical trials within primary care, medical device trials, and pilot studies.
▪ Supervised two MSc students and an Academic Foundation doctor.
▪ Admission clerking, ambulatory medical care, ward cover; total >1000 hours.
▪ Large group lecturing for final-year medical students preparing for the national medical specialty ranking exam.
▪ Rotation through multiple clinical specialties as a junior doctor, which included ward work, emergency care, and primary care clinics.
Thesis title:
"Big drug data for big drug trials: development, validation, and implementation of
routinely-collected data on medications within large trials in cardiovascular
disease and COVID-19 in England and Scotland"
Academic Performance:
Final grade 16/20 (top 11%)
93% score in the national medical specialty admission exam (top 7%)
Imperial College London
PgDip in Artificial Intelligence
Module 0 (Induction)
ePortfolios in postgraduate learning
I wrote about my personal perspective on the value and importance of maintaining a personal ePortfolio as a postgraduate student in computer science.
The importance of a postgraduate degree in the Computer Science field
Full essay here
Module 1 (Understanding Artificial Intelligence)
Artificial Intelligence in Business
I wrote about the potential benefits of AI in healthcare:
▪ Initial Post
▪ Summary Post
Here are my contributions to other students' posts:
▪ Peer Response on Rodrigo's Initial Post
▪ Peer Response on Craig's Initial Post
▪ Peer Response on Mateusz's Initial Post
Grade: 74% (Distinction)
Implementation of Machine Learning algorithms
I wrote about the applications of self-supervised learning in medicine and healthcare:
▪ Initial Post
▪ Summary Post
Here are my contributions to other students' posts:
▪ Peer Response on Natali's Initial Post
▪ Peer Response on Craig's Initial Post
▪ Peer Response on Masana's Initial Post
▪ Peer Response on Andrei's Initial Post
Grade: 75% (Distinction)
Artificial Intelligence for finance start-ups – opportunities and challenges
Full essay here
Implementation of an AI-based solution for a small fintech consulting start-up
Full essay here
Module 2 (Numerical Analysis)
Exploration of national statistics on alcohol consumption in England in 2011, using data from the Health Survey for England 2011
Full report here (slides)
Critical reflection on my learning journey throughout this module
Full report here
Coding activities developed throughout the module (with parallel coding in R and Python within the same R notebook)
Units 1-6: Loading data, general data exploration, data transformation (filtering, subsetting, recoding), descriptive statistics, simple visualisations and calculations
Module 3 (Machine Learning)
Collaborative Discussion 1: The 4th Industrial Revolution
I read the Schwab (2016) article from the World Economic Forum and discussed the impact of Industry 4.0 on healthcare.
▪ Initial Post
▪ Summary Post
Here are my contributions to other students' posts:
▪ Peer Response on Martyna's Initial Post
▪ Peer Response on Dinh Khoi Dang's Initial Post
▪ Peer Response on Jafar's Initial Post
Exploratory data analysis
I performed exploratory data analysis and cleaning on the Kaggle auto-mpg dataset, including the following steps:
▪ Identify missing values.
▪ Estimate Skewness and Kurtosis.
▪ Correlation Heat Map.
▪ Scatter plot for different parameters.
▪ Replace categorical values with numerical values (e.g., America = 1, Europe = 2).
Jupyter notebook here:
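To illustrate three of the steps listed above (missing values, skewness and kurtosis, and recoding categories), here is a minimal sketch using only the Python standard library. It is not the code from the notebook: the function names, the toy values, and the "Asia" code are assumptions made for illustration, and the moments are the simple population versions (dividing by n).

```python
from statistics import fmean

def count_missing(values):
    """Number of missing entries, represented here as None."""
    return sum(v is None for v in values)

def central_moment(values, order):
    """Central moment of the given order (divides by n, not n - 1)."""
    mean = fmean(values)
    return fmean([(v - mean) ** order for v in values])

def skewness(values):
    """Third standardised moment: 0 for perfectly symmetric data."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Fourth standardised moment minus 3 (0 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

# Hypothetical coding: only America = 1 and Europe = 2 appear in the text above.
ORIGIN_CODES = {"America": 1, "Europe": 2, "Asia": 3}

def encode_origin(labels):
    """Replace category labels with their numerical codes."""
    return [ORIGIN_CODES[label] for label in labels]

# Made-up values standing in for one numeric column with gaps.
horsepower = [130.0, None, 150.0, 95.0, None, 88.0]
observed = [v for v in horsepower if v is not None]

print("missing:", count_missing(horsepower))
print("skewness:", skewness(observed))
print("excess kurtosis:", excess_kurtosis(observed))
print("encoded:", encode_origin(["America", "Europe", "America"]))
```

In practice a data-frame library would usually handle these steps; the plain-Python version simply makes the underlying calculations visible.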
Practical exploration of Python code for linear and polynomial regression.
Exercise 1: correlation (with some changes to data points made by myself)
Exercise 2: simple linear regression (with some changes to data points made by myself)
Exercise 3: multiple linear regression (no changes made, just read through and explored the code provided)
Exercise 4: polynomial regression (no changes made, just read through and explored the code provided)
Linear regression with Scikit-Learn
In this Unit, I first worked through the Scikit-Learn linear regression coding example provided. Then I used global data to assess the relationship between population and gross domestic product (GDP) per capita, using correlation and linear regression (and performing exploratory data analysis and data cleaning to that end).
Jupyter notebook
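As a rough illustration of the two calculations mentioned above, the sketch below computes a Pearson correlation coefficient and a least-squares line by hand. It deliberately avoids Scikit-Learn so that the arithmetic is visible, it is not the notebook's code, and the numbers are made up rather than real population or GDP figures.

```python
from math import sqrt
from statistics import fmean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data points that lie close to a straight line.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)
print("r =", pearson_r(x, y))
print("y =", slope, "* x +", intercept)
```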
K-means clustering
Wiki entry describing the K-means clustering algorithm, as well as its strengths and pitfalls, based on experimentation with two interactive animations.
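For readers unfamiliar with the algorithm, the following is a minimal sketch of the standard K-means loop (Lloyd's algorithm): assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. It is an illustration only, not taken from the wiki entry; the function names and toy points are assumptions, and the starting centroids are passed in explicitly because the result depends on them, which is one of the well-known pitfalls.

```python
def nearest(point, centroids):
    """Index of the centroid closest to the point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))

def kmeans(points, initial_centroids, max_iter=100):
    """Return the final centroids and the cluster label of each point."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        # Assignment step: label every point with its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return centroids, [nearest(p, centroids) for p in points]

# Toy data with two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
centroids, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)  # [(1.0, 1.5), (9.0, 9.5)]
print(labels)     # [0, 0, 1, 1]
```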
Jaccard coefficients
I calculated Jaccard coefficients (to assess similarity/dissimilarity between observations) using a practical example.
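As a brief illustration, the Jaccard coefficient of two sets is the size of their intersection divided by the size of their union, and the Jaccard distance is one minus that value. The sketch below is not the worked example from the exercise; the symptom sets and function names are made up, and treating two empty sets as identical is a convention assumed here.

```python
def jaccard_similarity(a, b):
    """|A intersect B| / |A union B| for two collections of items."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(union)

def jaccard_distance(a, b):
    """Dissimilarity: 1 minus the Jaccard similarity."""
    return 1.0 - jaccard_similarity(a, b)

# Hypothetical observations described by the features they have.
patient_a = {"fever", "cough", "fatigue"}
patient_b = {"cough", "fatigue", "headache"}

print(jaccard_similarity(patient_a, patient_b))  # 0.5 (2 shared out of 4 distinct)
print(jaccard_distance(patient_a, patient_b))    # 0.5
```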
K-means clustering (practical application)
I read through the example K-means clustering notebook provided, and applied this clustering technique in different settings using the three datasets suggested.
Iris dataset
Wine dataset
Australian weather dataset
Artificial Neural Networks (practical application)
I worked through the exercise notebooks provided and added comments to facilitate interpretation.
Simple perceptron
Perceptron with AND operator
Multilayer perceptron
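To give a flavour of the perceptron exercise, here is a minimal sketch of a single perceptron learning the AND operator with the classic perceptron update rule. It is not the code from the exercise notebooks; the function names, the zero starting weights, the learning rate of 1, and the epoch limit are assumptions chosen for illustration.

```python
def predict(weights, bias, inputs):
    """Step activation: output 1 when the weighted sum is positive, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(samples, learning_rate=1, max_epochs=20):
    """Perceptron learning rule: nudge the weights by (target - prediction) * input."""
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(max_epochs):
        errors = 0
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            if error != 0:
                errors += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if errors == 0:
            break  # every sample is classified correctly
    return weights, bias

# Truth table of the AND operator: (inputs, expected output).
AND_GATE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_GATE)
print([predict(weights, bias, inputs) for inputs, _ in AND_GATE])  # [0, 0, 0, 1]
```

AND is linearly separable, so a single perceptron can learn it; a function such as XOR is not, which is one motivation for multilayer perceptrons.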
Gradient descent (practical application)
I worked through the exercise notebook provided, and added comments and a visualisation of the cost function to facilitate interpretation.
Gradient descent
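As an illustration of the idea (not the notebook's code), the sketch below runs gradient descent on a simple one-dimensional cost function and records the cost at every step, which is the series one would plot to visualise convergence. The cost function, starting point, learning rate, and number of steps are all assumptions chosen for the example.

```python
def gradient_descent(gradient, cost, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, recording the cost after each step."""
    x = start
    history = [cost(x)]
    for _ in range(steps):
        x -= learning_rate * gradient(x)
        history.append(cost(x))
    return x, history

def cost(x):
    """Toy cost function with its minimum at x = 3."""
    return (x - 3.0) ** 2

def gradient(x):
    """Derivative of the toy cost function."""
    return 2.0 * (x - 3.0)

minimum, history = gradient_descent(gradient, cost, start=0.0)
print("estimated minimum:", minimum)
print("first costs:", history[:3])
print("final cost:", history[-1])
```

With this learning rate the distance to the minimum shrinks by a constant factor at each step; a learning rate that is too large would instead make the cost grow, which a plot of `history` shows immediately.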
Emerging research in Artificial Neural Networks
I read the articles by Pruciak (2021) and the Centre for Data Ethics and Innovation (2019), and then wrote down some thoughts on the application of Artificial Neural Networks in healthcare, and my concerns about the use of AI-based technology in the insurance industry.
Risks and benefits of AI writers
I read the article by Hutson (2021) and wrote about the benefits and dangers of large language model (LLM) applications across a range of industries and settings, as well as possible risk-mitigating approaches.
Initial post
Summary post
Peer response on Martyna's initial post
Peer response on Dinh Khoi Dang's initial post
Ethical and social implications of convolutional neural networks
I read the article by Wall (2021) and wrote about my thoughts on the ethical and social implications of convolutional neural networks (CNNs).
I also went through the Jupyter notebook provided on image recognition using a simple CNN, changed the input image and verified the prediction being made (NB the notebook is not my work; I just changed some parameters for practice purposes).
Convolutional neural network architecture
I explored the architecture of CNNs using the CNN Explainer visualisation and associated article, and tested how the algorithm performed with images belonging to classes it does know (e.g. orange) vs classes it doesn't (e.g. motorbike).
Model performance evaluation
I explored the notebook provided with practical examples of performance evaluation metrics and their calculations, wrote some annotations, and changed some data points to see their impact on prediction and performance (the underlying code is not mine).
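To show the kind of calculation involved, here is a minimal sketch of four common classification metrics computed from a confusion matrix. It is not the notebook's code; the function names and the labels are made up, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def safe_divide(numerator, denominator):
    """Return 0.0 instead of failing when the denominator is zero (assumed convention)."""
    return numerator / denominator if denominator else 0.0

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 score as a dictionary."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = safe_divide(tp, tp + fp)
    recall = safe_divide(tp, tp + fn)
    return {
        "accuracy": safe_divide(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_divide(2 * precision * recall, precision + recall),
    }

# Made-up labels: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
```

Flipping a single entry of `y_pred` and re-running the function shows how sensitive each metric is to one misclassification.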
Future of Machine Learning
I read the article by Diez-Olivan et al. (2019) and noted down some thoughts on use cases and challenges of prescriptive machine learning models within healthcare.
Airbnb business analysis using a data science approach
In this group assignment, we used data from the Kaggle New York City Airbnb dataset to produce a business report. Our analysis produced insights into price drivers, and a predictive model which can be used to support pricing strategies and identify over- and underperforming listings.
Business report
Python code and outputs
Collaborative GitHub repository
Grade: 75% (Distinction)
Image classification using convolutional neural networks
In this individual assignment, I developed and trained multiple convolutional neural network models for image classification on the CIFAR-10 dataset and produced a video presentation. My development approach covered diverse architectures, hyperparameters, and training approaches. I explored how to tackle common challenges involved in developing machine learning models, and critically assessed the performance of my final model on test data.
Slide presentation
Audio transcript
Python code and outputs
Grade: 81% (Distinction)
Critical reflection on my learning journey throughout this module
Full report here
Grade: 80% (Distinction)
Module 4 (Intelligent Agents)
Collaborative Discussion 1: Agent Based Systems
I discussed the factors leading to the rise of agent-based systems and possible benefits and risks of this approach to organisations.
▪ Initial Post
▪ Summary Post
Here are my contributions to other students' posts:
▪ Peer Response on Georgios' post
▪ Peer Response on Yemi's post
Collaborative Discussion 2: Agent Communication Languages
I discussed the pros and cons of using specific agent communication languages (such as KQML) vs traditional method invocation in standard languages (such as Python or Java) for communication among agents in multi-agent systems.
▪ Initial Post
▪ Summary Post
Here are my contributions to other students' posts:
▪ Peer Response on Noora's post
▪ Peer Response on Haris' post
Creating agent dialogues
I used KQML and KIF to create a simple dialogue between two agents.
Creating Parsing Trees
In this unit, I created constituency-based parsing trees to parse three different dialogues using natural language processing.
Collaborative Discussion 3: Deep learning
I discussed the potential opportunities offered by emerging trends in deep learning models, especially generative AI content, but also the associated risks and ethical considerations.
▪ Initial Post
▪ Summary Post
Here are my contributions to other students' posts:
▪ Peer Response on Noora's post
▪ Peer Response on Haris' post
Deep learning in action
I explored the potential use of deep learning algorithms in healthcare, especially for predictive medicine approaches, and discussed how the technology can be applied, its possible benefits to society, and the underlying risks and ethical issues.
Online academic research agent
In this group assignment, we produced a concept development report for an intelligent multi-agent academic research assistant, capable of finding results on a website based on search terms (e.g., social media or a search engine), extracting the data, and sending it to an offline location.
Development report
Grade: 86% (Distinction)
Complete Python implementation
In this assignment, I produced a coded implementation in Python of the concept articulated in the team development report. I developed, tested, and showcased a fully functioning software tool composed of multiple agents working in sequence and using a circular control flow system. I learnt how to employ LangChain for streamlined and scalable LLM-based systems, developed a simple graphical user interface, and implemented a structured SQL database with additional vector indexing features. Finally, I critically assessed my implementation and identified some strengths and weaknesses, as well as areas for potential improvement.
Slide presentation
Audio transcript
Jupyter notebook (development and testing)
Grade: 82% (Distinction)
Critical reflection on my learning journey throughout this module
Full report here
Grade: 72% (Distinction)
Module 5 (Knowledge Representation and Reasoning)
Collaborative Discussion 1: Knowledge Representation and Reasoning - historical and semantic considerations
I discussed whether knowledge representation could be considered a novel or ancient concept, and how it relates to knowledge reasoning.
▪ Initial Post
▪ Summary Post
Here are my contributions to other students' posts:
▪ Peer Response on Nikolaos' post
▪ Peer Response on Abdulrahman's post
▪ Peer Response on Georgios' post
Reflection on knowing vs having information about
Practical exercise: sets, set theory, truth tables, and logic
Practical exercise: first-order logic
Practical exercise: introduction to logic programming using Prolog
Reflection on the definition of knowledge bases and knowledge-based economies
Reflection on ontology development approaches
Practical exercise: ontology creation using Protégé
Practical exercise: inference and knowledge modelling using Protégé
Reflection on ontology design principles and evaluation
Collaborative Discussion 2: Ontology definitions and computer languages
I analysed a possible definition of an ontology, and discussed the pros and cons of using different computer languages (such as OWL2) for building ontologies and modelling knowledge bases.
▪ Initial Post
▪ Summary Post
Here are my contributions to other students' posts:
▪ Peer Response on Georgios' post
▪ Peer Response on Nikolaos' post
Analysis and Application of the Intelligence Task Ontology (ITO) in AI Benchmarking
Grade: 72% (Distinction)
Complete Python implementation
In this assignment, I conceptualised, developed, implemented, and evaluated an ontology aimed at supporting an AI-driven job matching algorithm, capable of linking suitable candidates and jobs based on industry, experience, qualifications, and preferred location and salary.
Critical reflection on my learning journey throughout this module
Full report here
You can contact me at guilhermepessoaamorim@gmail.com