Articles for category: AI Research

Why use decoder-only models (GPT) when we have the full Transformer architecture?

I was going through the Transformer architecture and then BERT and GPT. BERT uses only the encoder and GPT uses only the decoder part of the Transformer (I know the encoder part is used for classification, NER, and analysis, and the decoder part for generating text), but why not utilize the whole Transformer architecture? Guide me, I am new to this. submitted by /u/VegetableAnnual1839
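
Not from the thread, but a minimal NumPy sketch of the one mechanical difference the question is about: an encoder block (BERT-style) lets every token attend to every other token, while a decoder block (GPT-style) adds a causal mask so each token only sees its predecessors, which is what makes left-to-right generation possible.

```python
import numpy as np

def attention_weights(q, k, causal=False):
    """Scaled dot-product attention weights for a single head.

    causal=False: encoder-style, every position attends everywhere.
    causal=True: decoder-style, position i attends only to j <= i.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                    # (seq, seq) raw scores
    if causal:
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)   # hide future tokens
    scores = scores - scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)                               # softmax over keys
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                   # 4 tokens, 8-dim embeddings
print(attention_weights(x, x, causal=False))  # BERT-style: dense matrix
print(attention_weights(x, x, causal=True))   # GPT-style: lower-triangular
```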

Streamlining hardcoded subtitle extraction

I am trying to extract hard-coded subtitles from a video: take a screenshot of every second of the video, detect the characters in each screenshot, build a timetable in an Excel sheet from the results, and create an SRT file from that sheet. Any ideas for doing this efficiently? submitted by /u/gunslinger1893
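
One way to streamline the whole loop, sketched under assumptions (ffmpeg and Tesseract installed; subtitles in the bottom quarter of the frame; file paths are placeholders): sample one frame per second with ffmpeg, OCR only the cropped subtitle region, and write the SRT directly, skipping the Excel intermediate.

```python
import pathlib
import subprocess
from PIL import Image
import pytesseract

VIDEO, FRAMES = "input.mp4", pathlib.Path("frames")
FRAMES.mkdir(exist_ok=True)

# 1 fps sampling; cropping to the bottom quarter of the frame speeds
# up OCR considerably (crop=w:h:x:y in ffmpeg filter syntax).
subprocess.run(["ffmpeg", "-i", VIDEO,
                "-vf", "fps=1,crop=iw:ih/4:0:3*ih/4",
                str(FRAMES / "%06d.png")], check=True)

def ts(sec):
    """Seconds -> SRT timestamp (HH:MM:SS,mmm)."""
    h, rem = divmod(sec, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d},000"

with open("output.srt", "w", encoding="utf-8") as srt:
    idx = 0
    for sec, frame in enumerate(sorted(FRAMES.glob("*.png"))):
        text = pytesseract.image_to_string(Image.open(frame)).strip()
        if not text:
            continue  # no subtitle detected in this frame
        idx += 1
        srt.write(f"{idx}\n{ts(sec)} --> {ts(sec + 1)}\n{text}\n\n")
```

A further efficiency win would be deduplicating consecutive identical OCR results so that a subtitle shown for several seconds becomes one cue with a longer time range rather than several one-second cues.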

Battle scars to share

Happy Friday. I am looking for examples of failures in implementing AI solutions in businesses, for a presentation. I am happy to credit you by name as the provider of the example. Feel free to remove the business's or person's identity to save them from embarrassment, but I would appreciate knowing the industry and the size of the business. I appreciate the help. Murat submitted by /u/MuratOzturan

Collaborative learning with large language models

Large language models (LLMs) have significantly improved the state of the art for solving tasks specified using natural language, often reaching performance close to that of people. As these models increasingly power assistive agents, it could be beneficial for them to learn effectively from each other, much as people do in social settings, which would allow LLM-based agents to improve each other's performance. In their account of human learning processes, Bandura and Walters described the concept of social learning in 1977, outlining different models of observational learning used by people. One common method of learning from others is through a

Naturally Occurring Equivariance in Neural Networks

This article is part of the Circuits thread, an experimental format collecting invited short articles and critical commentary delving into the inner workings of neural networks. Convolutional neural networks contain a hidden world of symmetries within themselves. This symmetry is a powerful tool for understanding the features and circuits inside neural networks. It also suggests that efforts to design neural networks with additional symmetries baked in may be on a promising track. To see these symmetries, we need to look at the individual neurons inside convolutional neural networks and the circuits that
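
The article's examples concern symmetries that emerge in learned weights, but the simplest instance of a symmetry baked into a convolutional network is the translation equivariance of convolution itself: shifting the input and then convolving gives the same result as convolving and then shifting. A small numerical check of this (my illustration, not the article's):

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
img = rng.normal(size=(16, 16))
kernel = rng.normal(size=(3, 3))

# Circular shift of 2 pixels to the right (wraps around the edge).
shift = lambda a: np.roll(a, 2, axis=1)

# With circular boundary conditions the equivariance is exact:
# conv(shift(x)) == shift(conv(x)).
out1 = convolve2d(shift(img), kernel, mode="same", boundary="wrap")
out2 = shift(convolve2d(img, kernel, mode="same", boundary="wrap"))
print(np.allclose(out1, out2))  # True
```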

Bayesian Multi-Group Gaussian Process Models for Heterogeneous Group-Structured Data

Didong Li, Andrew Jones, Sudipto Banerjee, Barbara E. Engelhardt; 26(30):1−34, 2025.

Abstract: Gaussian processes are pervasive in functional data analysis, machine learning, and spatial statistics for modeling complex dependencies. Scientific data are often heterogeneous in their inputs and contain multiple known discrete groups of samples; thus, it is desirable to leverage the similarity among groups while accounting for heterogeneity across groups. We propose multi-group Gaussian processes (MGGPs) defined over $\mathbb{R}^p \times \mathscr{C}$, where $\mathscr{C}$ is a finite set representing the group label, by developing general classes of valid (positive definite) covariance functions
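
The abstract cuts off before the kernel construction, but the basic requirement is easy to illustrate. Below is a hedged sketch, not the covariance classes from the paper: a generic separable multi-group kernel on $\mathbb{R}^p \times \mathscr{C}$, built as an RBF kernel on the inputs times a positive-semidefinite group-similarity matrix, which yields a valid covariance by the Schur product theorem.

```python
import numpy as np

def rbf(x1, x2, lengthscale=1.0):
    """RBF kernel matrix between two sets of inputs in R^p."""
    d2 = np.sum((x1[:, None, :] - x2[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def multigroup_kernel(x1, g1, x2, g2, B, lengthscale=1.0):
    """Illustrative separable kernel on R^p x C (not the paper's classes).

    x*: (n, p) inputs; g*: (n,) integer group labels; B: (G, G) PSD
    group-similarity matrix. The elementwise product of two PSD Gram
    matrices is PSD (Schur product theorem), so this is a valid kernel.
    """
    return rbf(x1, x2, lengthscale) * B[np.ix_(g1, g2)]

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 2))
g = np.array([0, 0, 1, 1, 2, 2])
A = rng.normal(size=(3, 3))
B = A @ A.T + 3 * np.eye(3)          # a PSD group-similarity matrix
K = multigroup_kernel(x, g, x, g, B)
print(np.all(np.linalg.eigvalsh(K) > -1e-9))  # True: K is PSD
```

Off-diagonal entries of B control how much statistical strength is shared between groups, which is the intuition behind borrowing similarity across groups while allowing heterogeneity.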

Deep representation learning for clustering longitudinal survival data from electronic health records

EHR dataset from UK Biobank. This study was conducted using the UK Biobank resource, which has ethical approval and its own ethics committee (https://www.ukbiobank.ac.uk/ethics/). This research has been conducted using UK Biobank resources under Application Number 57952. For our analyses, we used both the primary care and hospital inpatient care diagnosis records made available via the UK Biobank study [64]. We started from 451,265 patients with available hospital inpatient care data. For each patient, a diagnosis sequence was constructed by interleaving the hospital inpatient care data with any available primary care data based on their timestamps. We then mapped all resulting diagnosis
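
A hedged sketch of the interleaving step described above; the record fields are invented for illustration and the actual UK Biobank tables differ. Merging the two pre-sorted, timestamped streams yields one chronological diagnosis sequence per patient:

```python
from datetime import date
import heapq

# Toy records as (timestamp, diagnosis_code, source); both lists are
# already sorted by timestamp, as the per-source tables would be.
primary_care = [(date(2010, 3, 1), "E11", "gp"),
                (date(2015, 7, 9), "I10", "gp")]
inpatient    = [(date(2012, 1, 5), "N18", "hes"),
                (date(2016, 2, 2), "I25", "hes")]

# heapq.merge interleaves the sorted streams by timestamp in O(n).
sequence = list(heapq.merge(primary_care, inpatient, key=lambda r: r[0]))
codes = [code for _, code, _ in sequence]
print(codes)  # ['E11', 'N18', 'I10', 'I25']
```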

Pile-T5 | EleutherAI Blog

The T5 model (Raffel et al., 2019) is widely used in the NLP community. Its base model has been downloaded from Hugging Face millions of times, leaving no doubt that these models are a favorite of the community. However, T5's tokenizer omits important code-related tokens, and subsequent pretraining datasets have been released with higher-quality filtering and more diverse domains. In this blog post, we introduce a new version of T5 intended to address those weaknesses: Pile-T5, trained on the Pile (Gao et al., 2020) and using the LLaMA tokenizer (Touvron et al., 2023).

Model Description

Our alternative version replaces
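
For readers who want to try the released checkpoints, a minimal loading snippet; the repo id EleutherAI/pile-t5-base is my assumption of the published naming and may differ, and the Auto* classes resolve the architecture from the checkpoint config.

```python
# Minimal sketch; "EleutherAI/pile-t5-base" is an assumed repo id.
# Note: the raw pretrained model does span-infilling denoising, not
# instruction following, so generations will look T5-like, not chatty.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "EleutherAI/pile-t5-base"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

inputs = tokenizer("The Pile is a large, diverse pretraining dataset.",
                   return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```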