Yuki (Yuxin) Chen

Hi, this is Yuki, a data scientist at Microsoft.

Education

University of British Columbia (Canada)
Data Science, M.S. GPA: 4.3/4.3
Shanghai University of International Business and Economics (China)
Data Science and Big Data Technology, B.S. GPA: 88.49/100
University of California, Los Angeles (Online)
Big Data and Business Applications Online Programme GPA: 4.3/4.3

Relevant Courses

Algorithms and Data Structures, Databases and Data Retrieval, Web and Cloud Computing, C++ Programming, Python Programming, Data Mining and Machine Learning, Data Visualization, Natural Language Processing, Computational Statistics, Linear Algebra, Bayesian Statistics, Artificial Intelligence

Technical skills

Programming: C++, Python, SQL, R, Golang, Lingo, Shell, Git, HTML, CSS, C#
Software: AIMMS, Tableau, MATLAB, SPSS, Photoshop, Audition, Excel, Management Scientist, MySQL, Access
Platforms: Linux, Windows, AWS, Google Cloud Platform, Azure
Open-Source Frameworks: Scikit-learn, Keras, TensorFlow, PyTorch, spaCy, NLTK, Neo4j, Pandas, NumPy, Matplotlib, Beautiful Soup, Scrapy, Selenium, Plotly, Dash, Regex, Seaborn, RESTful API, OpenAI

Know more about me

LinkedIn
Instagram
GitHub

What I did

During my studies, I completed multiple projects involving program development and the implementation of machine learning algorithms, and I also interned in operations research algorithm development. My main research interest is natural language processing.

Data Scientist

Microsoft (Centific Inc outsourcing)
• Worked at Core Language Skills Science, Azure Language Pillars (ALPS) at Microsoft Azure Cognitive Services.
• Used Azure Language Studio, Azure Machine Learning Studio, Amulet, and other tools to train natural language processing models for tasks such as sentiment analysis and Prompt2Model; wrote reports analysing model performance, improved model performance, and transferred models to compatible settings to provide large-scale services to customers, including text, document, and dialog services.
• Worked with multiple teams to improve model quality using code (Python) and LLMs (GPT-4), based on client needs.

Data Scientist Capstone Intern

JustPractice Technologies Inc
• Applied natural language processing techniques such as tokenization, POS tagging, and word embedding in SQL and Python to over 400,000 unstructured clinical notes.
• Used techniques and algorithms such as LDA (topic modelling), NER (named entity recognition), TF-IDF, and Multipartite Rank to analyze the clinical notes.
• Built a pipeline to extract the values associated with keywords and discover their patterns, helping healthcare professionals identify diagnoses, procedures, and symptoms more efficiently.

Optimization Algorithm Intern

Optimization Analytics Technology Pte Ltd., Shanghai Office
• Analyzed client needs, formulated optimization problems for network planning and resource allocation, and processed data with SQL; built mathematical models using AIMMS; developed algorithms and calculated optimal results in C++/Java/Python.
• Helped companies such as Sanofi and Rio Tinto to plan production, storage and transport solutions to optimise costs.
• Gained exposure to large datasets and real-world challenges faced by various industries.

Task-oriented Question Answering for Traditional Chinese Medicine (TCM)

• Constructed a TCM knowledge graph based on the Compendium of Materia Medica using Python and Neo4j.
• Designed and implemented an algorithm for joint-model question answering and information retrieval (an NLP dialogue system) for TCM diagnosis and prescription recommendation.

Chinese Text Retelling Algorithm Based on Multi-level Semantic Units

• Proposed a text retelling method for automated semantic augmentation aimed towards dialogue systems.
• Developed an algorithm to replace extracted Chinese words with synonyms determined by context features, sentence templates, and candidate sets, using a bidirectional LSTM model based on the attention mechanism.

Tag-based Hybrid Bayesian Personalised Ranking for Literature Recommendation

• Developed a recommender system using hybrid Bayesian personalized ranking and weight coefficient algorithms, based on user interactions extracted from Wanfang Data with Python.

Community Health Pass App

• Built an app with App Inventor for communities and residences to manage residents' health profiles and entry records, using a web API to generate and read QR codes, to help improve community health during the pandemic.

To be continued...

Get in touch

I am currently looking for a job related to data or machine learning in Canada. Contact me if you have any questions or job recommendations.