Articles for category: AI Tools

Thieme E-Journals – Methods of Information in Medicine / Abstract

Abstract — Background: Clinical procedures are often performed in outpatient clinics without prior scheduling at the administrative level, and documentation of the procedure often occurs solely in free-text electronic clinical notes. Natural language processing (NLP), particularly named entity recognition (NER), may provide a solution for extracting procedure data from free-text electronic notes. Methods: Free-text notes from outpatient ophthalmology visits were collected from the electronic clinical records at a single institution over 3 months. The Prodigy low-code annotation tool was used to create an annotation dataset and train a custom NER model for clinical procedures. Clinical procedures were extracted from the entire set of

Benchmarking Text-to-Speech Models in the Wild

Automated measurement of the quality of text-to-speech (TTS) models is very difficult. Assessing the naturalness and inflection of a voice is a trivial task for humans, but it is much more difficult for AI. This is why today, we're thrilled to announce the TTS Arena. Inspired by LMSys's Chatbot Arena for LLMs, we developed a tool that allows anyone to easily compare TTS models side-by-side. Just submit some text, listen to two different models speak it out, and vote on which model you think is best. The results will be organized into a leaderboard that displays the community's highest-rated

8.7 Primitive and infinite streams

Summary: primitive and infinite streams. 1. Primitive Streams — Goal: avoid unnecessary boxing/unboxing, improving performance. Available types: IntStream, LongStream, DoubleStream. Specialized iterators, e.g. PrimitiveIterator.OfInt (returns int via nextInt(), in addition to next() for Integer). Useful methods: range(start, end) generates a sequence of numbers (e.g. IntStream.range(0, 10)); mapToInt, mapToLong, mapToDouble convert a Stream into primitive streams; boxed() converts a primitive stream into a Stream<Integer> (e.g. IntStream.generate(…).boxed()). 2. Infinite Streams — Creation via Supplier: Stream.generate(Supplier) produces elements indefinitely (e.g. random numbers): IntStream.generate(() -> random.nextInt()). Stream.iterate(seed, UnaryOperator) produces sequences from an initial value (e.g. the natural numbers): IntStream.iterate(0, x -> x + 1)
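The methods summarized above can be sketched in one short runnable example (class and variable names are illustrative):

```java
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PrimitiveStreams {
    public static void main(String[] args) {
        // range(0, 10): the numbers 0..9 as primitive ints, no boxing
        int sum = IntStream.range(0, 10).sum();
        System.out.println(sum); // 45

        // iterate + limit: an infinite stream made finite,
        // then boxed() to go from IntStream to Stream<Integer>
        List<Integer> firstFive = IntStream.iterate(0, x -> x + 1)
                .limit(5)
                .boxed()
                .collect(Collectors.toList());
        System.out.println(firstFive); // [0, 1, 2, 3, 4]

        // generate with a Supplier: an infinite stream of random ints
        Random random = new Random();
        IntStream.generate(random::nextInt)
                .limit(3)
                .forEach(System.out::println);
    }
}
```

Note that an infinite stream must always be bounded with an operation such as limit() before a terminal operation like forEach or collect, or it will never finish.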

Compact word vectors with Bloom embeddings · Explosion

A high-coverage word embedding table will usually be quite large. One million 32-bit floats occupy 4MB of memory, so one million 300-dimension vectors will be 1.2GB in size. Such a large model size is at least annoying for many applications, while for others it's completely prohibitive. There are three obvious approaches to reducing the size of the embedding table: Reduce the number of words in the vocabulary. Reduce the number of dimensions per vector. Reduce the number of bits per dimension. While all three of these options can be effective, there's also a less obvious solution: Cheat, using a probabilistic
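The teaser cuts off before describing the trick, but the size arithmetic above, plus one common form of hashed ("Bloom") embedding lookup, can be sketched as follows. The table size, number of hash seeds, and hash mixing below are illustrative assumptions, not the article's exact recipe:

```java
public class BloomEmbedding {
    // Far fewer rows than vocabulary words (assumed sizes for illustration)
    static final int ROWS = 1000;
    static final int DIM = 300;
    static final float[][] table = new float[ROWS][DIM];

    // Hash the word with several seeds and sum the rows they select.
    // Distinct words rarely collide on *all* seeds, so most words still
    // get a near-unique vector despite the tiny table.
    static float[] vectorFor(String word, int numHashes) {
        float[] v = new float[DIM];
        for (int seed = 0; seed < numHashes; seed++) {
            int row = Math.floorMod(word.hashCode() * 31 + seed, ROWS);
            for (int d = 0; d < DIM; d++) {
                v[d] += table[row][d];
            }
        }
        return v;
    }

    public static void main(String[] args) {
        // The size arithmetic from the article:
        long bytes = 1_000_000L * 300 * 4; // 1M words x 300 dims x 4 bytes
        System.out.println(bytes / 1e9 + " GB"); // 1.2 GB

        float[] v = vectorFor("apple", 4);
        System.out.println(v.length); // 300
    }
}
```

With 1,000 rows instead of 1,000,000, this hypothetical table is 1,000x smaller while still assigning every word some vector.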

StarCoder2 and The Stack v2

BigCode is releasing StarCoder2, the next generation of transparently trained open code LLMs. All StarCoder2 variants were trained on The Stack v2, a new large and high-quality code dataset. We release all models and datasets, as well as the processing and training code. Check out the paper for details. What is StarCoder2? StarCoder2 is a family of open LLMs for code and comes in 3 different sizes with 3B, 7B and 15B parameters. The flagship StarCoder2-15B model is trained on over 4 trillion tokens and 600+ programming languages from The Stack v2. All models use Grouped Query Attention, a context

8.8 Practicing what we learned with java.nio.file.Files

1. The java.nio.file.Files class — Goal: manipulate files and directories using Path (Java 7+) and integrate with Streams (Java 8). The Files.list(Path) method returns a Stream<Path> with the entries of the directory. Basic listing example: Files.list(Paths.get("./caminho/do/diretório")).forEach(System.out::println); 2. Filtering files — Using filter to select .java files: Files.list(Paths.get("./caminho/do/diretório")).filter(p -> p.toString().endsWith(".java")).forEach(System.out::println); 3. Reading lines from files — Problem when using Files.lines inside map: Files.lines(Path) throws IOException, which is not handled in lambdas. Compilation error: lambdas cannot throw checked exceptions directly. Solution: create a helper method to encapsulate the exception: static Stream
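The helper-method pattern described in point 3 can be sketched as below. The class name is illustrative; wrapping the checked IOException in the standard UncheckedIOException is one common way to make Files.lines usable inside a lambda:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ListJavaFiles {
    // Wraps the checked IOException so Files.lines can be called from a lambda
    static Stream<String> lines(Path p) {
        try {
            return Files.lines(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Files.list(Paths.get("."))                        // Stream<Path>
             .filter(p -> p.toString().endsWith(".java")) // only .java files
             .flatMap(ListJavaFiles::lines)               // helper keeps the lambda exception-free
             .limit(5)
             .forEach(System.out::println);
    }
}
```

The method reference ListJavaFiles::lines compiles where a bare Files::lines would not, because the helper's signature declares no checked exception.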

Text-Generation Pipeline on Intel® Gaudi® 2 AI Accelerator

With the Generative AI (GenAI) revolution in full swing, text-generation with open-source transformer models like Llama 2 has become the talk of the town. AI enthusiasts as well as developers are looking to leverage the generative abilities of such models for their own use cases and applications. This article shows how easy it is to generate text with the Llama 2 family of models (7b, 13b and 70b) using Optimum Habana and a custom pipeline class – you’ll be able to run the models with just a few lines of code! This custom pipeline class has been designed to offer

8.9 FlatMap – DEV Community

1. Introduction to flatMap — Goal: flatten nested Streams into a single Stream (e.g. Stream<Stream<String>> → Stream<String>). Use case: handy for processing collections of collections, files with multiple lines, or hierarchical structures. 2. Practical example with files — Problem: using map with Files.lines produces a Stream<Stream<String>>, which is not what we want. Solution with flatMap: Stream<String> linhas = Files.list(Paths.get("./caminho/do/diretório")).filter(p -> p.toString().endsWith(".java")).flatMap(p -> lines(p)); // converts Stream<Stream<String>> → Stream<String> Result: all the lines of all the .java files in a single Stream<String>. 3. Example with characters (flatMapToInt) — Goal: obtain all the characters (as int) of all the lines of the files. Code:
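A minimal self-contained illustration of both flatMap and flatMapToInt, using in-memory lists in place of files so it runs anywhere (the data and class name are illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;

public class FlatMapDemo {
    public static void main(String[] args) {
        // Simulates "lines of several files": a List<List<String>>
        List<List<String>> fileLines = List.of(
                List.of("class A {", "}"),
                List.of("class B {", "}"));

        // flatMap: Stream<List<String>> -> Stream<String>
        List<String> allLines = fileLines.stream()
                .flatMap(List::stream)
                .collect(Collectors.toList());
        System.out.println(allLines); // [class A {, }, class B {, }]

        // flatMapToInt: all characters of all lines as an IntStream
        long openBraces = allLines.stream()
                .flatMapToInt(String::chars)
                .filter(c -> c == '{')
                .count();
        System.out.println(openBraces); // 2
    }
}
```

Using map(List::stream) here would yield a Stream<Stream<String>>; flatMap merges those inner streams into one, which is exactly the flattening the section describes.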

How we built a Stack Overflow Community questions analyzer (and you can too)

Being part of the GitLab collective is an opportunity to learn first hand about the challenges facing the community using the DevOps Platform. As a Collective Member, I log in to Stack Overflow two or three times a week to read the questions and discussions posted about GitLab, manually sorting them by 'Recent Activity' and 'Trending' and filtering by date, and I asked myself: how can we leverage this wealth of data to discover feedback and find the most frequent topics where the community has questions? This would be an opportunity to get a quick overview of topics where the community regularly needs help; this would