Articles for category: AI Tools

Generate training data and cost-effectively train categorical models with Amazon Bedrock

In this post, we explore how you can use Amazon Bedrock to generate high-quality categorical ground truth data, which is crucial for training machine learning (ML) models in a cost-sensitive environment. Generative AI solutions can play an invaluable role during the model development phase by simplifying training and test data creation for multiclass classification supervised learning use cases. We walk through how to use XML tags to structure the prompt and guide Amazon Bedrock toward generating a balanced, accurately labeled dataset. We also showcase a real-world example for predicting the root cause category for
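As a rough illustration of the XML-tag idea mentioned in this excerpt, the sketch below builds a classification prompt whose parts are each wrapped in their own tag, and parses a tagged label out of a model reply. It makes no Amazon Bedrock call; the tag names, the function names `build_prompt` and `parse_label`, and the example categories are all assumptions for illustration, not taken from the original post.

```python
import re


def build_prompt(categories, text):
    """Wrap instructions, allowed categories, and input text in separate XML tags.

    The tag names here are hypothetical; any consistent set of tags would do.
    """
    category_lines = "\n".join(f"<category>{c}</category>" for c in categories)
    return (
        "<instructions>\n"
        "Assign exactly one category from the list to the text. "
        "Reply with the category name inside <label> tags.\n"
        "</instructions>\n"
        f"<categories>\n{category_lines}\n</categories>\n"
        f"<text>\n{text}\n</text>"
    )


def parse_label(response, categories):
    """Return the label inside <label>...</label>, or None if it is
    missing or is not one of the allowed categories."""
    match = re.search(r"<label>(.*?)</label>", response, re.DOTALL)
    if match is None:
        return None
    label = match.group(1).strip()
    return label if label in categories else None


# Hypothetical categories and input, for illustration only.
example_categories = ["Hardware", "Software", "Network"]
example_prompt = build_prompt(example_categories, "The disk stopped responding.")
```

Rejecting any label outside the allowed list is one simple way to keep generated ground truth within the intended classes; the prompt string would be sent to whichever model client you use.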

rOpenSci News Digest, March 2025

Dear rOpenSci friends, it’s time for our monthly news roundup! You can read this post on our blog. Now let’s dive into the activity at and around rOpenSci! rOpenSci HQ rOpenSci Champions Program 2025 In Spanish: Apply before April 30th! We have great news: The call for applications to be part of the new cohort of our 2025 Program is now open! And for the first time it will be in Spanish! Our program seeks to identify, recognize and reward people who are leaders in an open science community, research software engineering and the R programming community. This year’s program

Deephaven v0.6.0 Documentation | Deephaven

Deephaven Community Core’s documentation includes several simple how-tos and a section of comprehensive reference articles to get you started. This month, we published a tutorial to familiarize new users with Deephaven’s primary features. Follow along to learn how to get your data into Deephaven and basic querying techniques. We also added new docs teaching users how to:

- use PyTorch, SciKit, and TensorFlow in Deephaven, which helps you get started building machine learning models.
- use Pandas and Numba in Python queries.
- navigate the user interface, such as the Chart Builder feature, which easily creates data visualizations.
- calculate EMAs, one of

A Production ML system for SEA’s Biggest Hospital Group

I was humbled to be invited by DATAx to share at their conference. They were looking for a hands-on Applied Scientist to talk about how data science and machine learning could be applied in healthcare, and I was happy to help. For this, I shared the case study of how uCare.ai helped develop a machine learning system for Parkway Pantai Group (Southeast Asia’s largest healthcare group) that estimates a patient’s total bill at the point of pre-admission. Doing so provides greater transparency to patients, helping to reduce potential payment challenges at the point of discharge. It also benefits providers where the

GO Full Course – DEV Community

Hey everyone, I’m thrilled to announce the launch of my brand-new Go programming full course on YouTube! 🎉 Go Version: I am using Go version 1.22. About Me: For those who don’t know me, I’m Amir, a passionate software developer and educator who loves diving into new programming languages and sharing my knowledge with the community. Over the past few months, I’ve been working hard on creating a comprehensive Go course that will take you from a complete beginner to a confident Go developer. Why Learn Go? Go (or Golang) is an amazing language known for its simplicity, performance, and

An 11B parameter pretrained language model and VLM, trained on over 5000B tokens and 11 languages

The Falcon 2 Models: TII is launching a new generation of models, Falcon 2, focused on providing the open-source community with a series of smaller models with enhanced performance and multi-modal support. Our goal is to enable cheaper inference and encourage the development of more downstream applications with improved usability. The first generation of Falcon models, featuring Falcon-40B and Falcon-180B, made a significant contribution to the open-source community, promoting the release of advanced LLMs with permissive licenses. More detailed information on the previous generation of Falcon models can be found in the RefinedWeb, Penedo et al., 2023 and The Falcon

Integrating custom dependencies in Amazon SageMaker Canvas workflows

When implementing machine learning (ML) workflows in Amazon SageMaker Canvas, organizations might need to consider external dependencies required for their specific use cases. Although SageMaker Canvas provides powerful no-code and low-code capabilities for rapid experimentation, some projects might require specialized dependencies and libraries that aren’t included by default in SageMaker Canvas. This post provides an example of how to incorporate code that relies on external dependencies into your SageMaker Canvas workflows. Amazon SageMaker Canvas is a low-code no-code (LCNC) ML platform that guides users through every stage of the ML journey, from initial data preparation to final model deployment. Without

Bones or No Bones? More Deephaven predictions

We predicted a Bones day, but Noodle had other plans. Considering Monday was a No Bones day, we shrugged it off and figured it’s ok to be wrong and appeased ourselves with self-care.

    from deephaven import new_table
    from deephaven.column import string_col

    accuracy = new_table([
        string_col("Date", ["11/1/2021", "11/2/2021", "11/3/2021", "11/4/2021", "11/5/2021"]),
        string_col("Noodles_Says", ["No Bones", "No Bones", "Bones", "Bones", "Bones"]),
        string_col("Prediction", ["Bones", "Bones", "Bones", "Bones", "No Bones"])
    ])

Nevertheless, the real question is, how can we better predict Bones days? In a comment on Jonathan’s TikTok (@jongraz), I suggested that perhaps Noodle is so amazing because he bases his behavior on a planetary configuration. Many
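The table in this excerpt pairs what Noodle said with what was predicted for each day, so the week's hit rate can be checked directly. The sketch below does that in plain Python, with the two columns copied from the table; it does not use Deephaven's query API, and the `hit_rate` helper is a hypothetical name introduced here for illustration.

```python
# Columns copied from the table in the excerpt above.
noodle_says = ["No Bones", "No Bones", "Bones", "Bones", "Bones"]
prediction = ["Bones", "Bones", "Bones", "Bones", "No Bones"]


def hit_rate(actual, predicted):
    """Fraction of days on which the prediction matched the actual outcome."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)


week_hit_rate = hit_rate(noodle_says, prediction)
print(f"Correct on {week_hit_rate:.0%} of days")
```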

What does a Data Scientist really do?

As a data scientist, I sometimes get approached by others with questions related to data science. This could be at work, at the meetups I organise and attend, or through questions on my site or LinkedIn. Through these interactions, I realised there is significant misunderstanding about data science. Misunderstandings arise around the skills needed to practice data science, as well as what data scientists actually do. Perception of what is needed and done: Many people believe that deep technical and programming abilities, olympiad level math skills, and a PhD are the minimum requirements, and that having