Articles for category: AI Research

[2402.14327] Subobject-level Image Tokenization

[Submitted on 22 Feb 2024 (v1), last revised 12 Mar 2025 (this version, v3)] By Delong Chen and 4 other authors. Abstract: Patch-based image tokenization ignores the morphology of the visual world, limiting effective and efficient learning of image understanding. Inspired by subword tokenization, we introduce subobject-level adaptive token segmentation and explore several approaches, including superpixel, SAM, and a proposed Efficient and PanOptiC (EPOC) image tokenizer. Our EPOC combines boundary detection (a simple task that can be handled well by a compact model) with watershed

[2411.02948] Grounding Natural Language to SQL Translation with Data-Based Self-Explanations

[Submitted on 5 Nov 2024 (v1), last revised 13 Mar 2025 (this version, v2)] By Yuankai Fan and 4 other authors. Abstract: Natural Language Interfaces for Databases empower non-technical users to interact with data using natural language (NL). Advanced approaches, utilizing either neural sequence-to-sequence or more recent sophisticated large-scale language models, typically implement NL to SQL (NL2SQL) translation in an end-to-end fashion. However, like humans, these end-to-end translation models may not always generate the best SQL output on their first try.

[2409.11697] Monomial Matrix Group Equivariant Neural Functional Networks

[Submitted on 18 Sep 2024 (v1), last revised 13 Mar 2025 (this version, v3)] By Viet-Hoang Tran, Thieu N. Vo, Tho H. Tran, An T. Nguyen, and Tan M. Nguyen. Abstract: Neural functional networks (NFNs) have recently gained significant attention due to their diverse applications, ranging from predicting network generalization and network editing to classifying implicit neural representation. Previous NFN designs often depend on permutation symmetries in neural networks’ weights, which traditionally arise from the unordered arrangement of neurons in

New creative updates to help advertisers generate lifestyle imagery

Richer, more engaging lifestyle images

Since we first announced asset generation in Performance Max, we’ve continued to grow and improve our technology so that you can generate high-quality assets that resonate with your customers and deliver strong results. Last year, we expanded asset generation to six new languages, introduced it to new campaign types, including Demand Gen, and upgraded our image generation model. With the help of Imagen 3, we’re now rolling out the ability to use text prompts to generate images that contain adult people and faces across Performance Max, Demand Gen, Display and Apps campaigns. We’ve conducted extensive

Why use decoders only (gpt) when we have full transformers architecture?

I was going through the transformer architecture and then looked at BERT and GPT. BERT uses only the encoder part of the transformer and GPT uses only the decoder part. (I know the encoder part is used for classification, NER, and analysis, and the decoder part is used for generating text.) But why not use the whole transformer architecture? Please guide me, I am new to this. submitted by /u/VegetableAnnual1839
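One concrete difference between the two halves mentioned in the post is which positions each token may attend to: an encoder-style model such as BERT attends in both directions, while a decoder-style model such as GPT uses a causal mask so a token sees only itself and earlier tokens. The sketch below only illustrates that masking difference; `attention_mask` is a hypothetical helper written for this example, not a function from any library.

```python
def attention_mask(n_tokens, causal):
    """Return an n_tokens x n_tokens grid of booleans.

    mask[query][key] is True when the token at position `query`
    is allowed to attend to the token at position `key`.
    Hypothetical helper for illustration only.
    """
    return [
        # Bidirectional: every key is visible.
        # Causal: only keys at or before the query position are visible.
        [(not causal) or (key <= query) for key in range(n_tokens)]
        for query in range(n_tokens)
    ]


def render(mask):
    """Draw the mask as text: 'x' = may attend, '.' = blocked."""
    return "\n".join("".join("x" if ok else "." for ok in row) for row in mask)


if __name__ == "__main__":
    print("encoder-style (bidirectional):")
    print(render(attention_mask(4, causal=False)))
    print("decoder-style (causal):")
    print(render(attention_mask(4, causal=True)))
```

The causal mask is what lets a decoder-only model be trained to predict the next token at every position at once, which is one reason it suits text generation.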

Streamlining hardcoded subtitle extraction

I am trying to extract hardcoded subtitles from a video. My current plan is to create a timetable in Excel, take a screenshot for every second of the video, detect the characters in each screenshot, and then generate an SRT file from the timetable. Any ideas for making this more efficient? submitted by /u/gunslinger1893
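The last step of the pipeline described above (turning a per-second timetable of detected text into an SRT file) can be sketched without a spreadsheet. This is a minimal sketch under stated assumptions: the OCR step has already produced one string per sampled second (empty when no subtitle is visible), and `format_timestamp` and `frames_to_srt` are hypothetical names chosen for this example. Consecutive seconds with identical text are merged into a single cue.

```python
from typing import Sequence


def format_timestamp(seconds: int) -> str:
    """Format whole seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},000"


def frames_to_srt(frame_texts: Sequence[str]) -> str:
    """Build SRT text from per-second OCR results.

    Assumption: frame_texts[i] is the text detected in the frame
    sampled at second i, or "" when no subtitle was detected.
    """
    cues = []  # each cue is (start_second, end_second, text)
    for second, text in enumerate(frame_texts):
        text = text.strip()
        if not text:
            continue
        # Extend the previous cue when the same text continues
        # without a gap; otherwise start a new cue.
        if cues and cues[-1][2] == text and cues[-1][1] == second:
            start, _, _ = cues[-1]
            cues[-1] = (start, second + 1, text)
        else:
            cues.append((second, second + 1, text))

    blocks = [
        f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}"
        for index, (start, end, text) in enumerate(cues, start=1)
    ]
    return "\n\n".join(blocks) + ("\n" if blocks else "")


if __name__ == "__main__":
    print(frames_to_srt(["", "Hello", "Hello", "", "Bye"]))
```

Real OCR output tends to vary slightly from frame to frame, so an exact string comparison is probably too strict in practice; a fuzzy comparison would be a natural refinement.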

Battle scars to share

Happy Friday. I am looking for examples of failures in implementing AI solutions in businesses for a presentation. I am happy to credit you by name as the provider of the example. Feel free to remove the business's or person's identity to save them from embarrassment, but I would appreciate knowing the industry and the size of the business. I appreciate the help. Murat submitted by /u/MuratOzturan
