
Making a web app generator with open ML models


Julian Bilcke



As more code generation models become publicly available, it is now possible to do text-to-web and even text-to-app in ways that we couldn’t imagine before.

This tutorial presents a direct approach to AI web content generation by streaming and rendering the content all in one go.

Try the live demo here: Webapp Factory

[Demo animation: main_demo.gif]



Using LLMs in Node apps

While we usually think of Python for everything related to AI and ML, the web development community relies heavily on JavaScript and Node.

Here are some ways you can use large language models on this platform.



By running a model locally

Various approaches exist to run LLMs in JavaScript, from using ONNX to converting code to WASM and calling external processes written in other languages.

Some of those techniques are now available as ready-to-use NPM libraries, such as transformers.js (which runs ONNX models in Node and the browser) and llama-node (bindings to llama.cpp).

However, running large language models in such an environment can be pretty resource-intensive, especially if you are not able to use hardware acceleration.
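As an illustration, here is a minimal sketch of local text generation with transformers.js; the model id (Xenova/gpt2) is only a small example, and any compatible text-generation checkpoint could be swapped in:

import { pipeline } from '@xenova/transformers'

// download (and cache) a small text-generation model, then run it locally
const generator = await pipeline('text-generation', 'Xenova/gpt2')

const result = await generator('a simple "hello world" html page:', {
  max_new_tokens: 64,
})

console.log(result[0].generated_text)

Expect this to be noticeably slower than a hosted, GPU-backed API for anything beyond small models.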



By using an API

Today, various cloud providers offer commercial APIs for using language models. Here is the current Hugging Face offering:

  • The free Inference API, which allows anyone to use small to medium-sized models from the community.

  • The more advanced and production-ready Inference Endpoints API, for those who require larger models or custom inference code.

Both APIs can be used from Node via the @huggingface/inference library on NPM.
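For example, here is a minimal sketch of calling the free Inference API from Node with this client; the model id (gpt2) is only a placeholder, and any text-generation model hosted on the Hub could be used instead:

import { HfInference } from '@huggingface/inference'

const hf = new HfInference('** YOUR TOKEN **')

// with the free Inference API, you pass the id of a model hosted on the Hub
const { generated_text } = await hf.textGeneration({
  model: 'gpt2',
  inputs: 'a simple "hello world" html page: ',
  parameters: { max_new_tokens: 200 },
})

console.log(generated_text)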

💡 Top performing models generally require a lot of memory (32 GB, 64 GB or more) and hardware acceleration to get good latency (see the benchmarks). But we are also seeing a trend of models shrinking in size while keeping relatively good results on some tasks, with requirements as low as 16 GB or even 8 GB of memory.



Architecture

We are going to use Node.js to create our generative AI web server.

The model will be WizardCoder-15B running on the Inference Endpoints API, but feel free to try with another model and stack.

If you are interested in other solutions, here are some pointers to alternative implementations:

  • Using the Inference API: code and space
  • Using a Python module from Node: code and space
  • Using llama-node (llama cpp): code



Initializing the project

First, we need to set up a new Node project (you can clone this template if you want to).

git clone https://github.com/jbilcke-hf/template-node-express tutorial
cd tutorial
nvm use
npm install

Then, we can install the Hugging Face Inference client:

npm install @huggingface/inference

And set it up in `src/index.mts`:

import { HfInference } from '@huggingface/inference'

// initialize the client with your Hugging Face API token
const hfi = new HfInference('** YOUR TOKEN **')
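Hardcoding the token is only done here for brevity; in practice you would typically read it from an environment variable (the name HF_API_TOKEN below is just a convention used in this sketch):

// read the token from the environment instead of hardcoding it
const hfi = new HfInference(process.env.HF_API_TOKEN)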



Configuring the Inference Endpoint

💡 Note: If you don’t want to pay for an Endpoint instance to do this tutorial, you can skip this step and look at this free Inference API example instead. Please note that this will only work with smaller models, which may not be as powerful.

To deploy a new Endpoint you can go to the Endpoint creation page.

You will have to select WizardCoder in the Model Repository dropdown and make sure that a large enough GPU instance is selected:

[Screenshot: new_endpoint.jpg — the Endpoint creation page]

Once your endpoint is created, you can copy the URL from this page:

[Screenshot: deployed_endpoints.jpg — the deployed Endpoint page with its URL]

Configure the client to use it:

const hf = hfi.endpoint('** URL TO YOUR ENDPOINT **')

We can now tell the inference client to use our private endpoint and call our model:

const { generated_text } = await hf.textGeneration({
  inputs: 'a simple "hello world" html page: '
});



Generating the HTML stream

It’s now time to return some HTML to web clients when they visit a URL, say /app.

We will create an endpoint with Express.js to stream the results from the Hugging Face Inference API.

import express from 'express'

import { HfInference } from '@huggingface/inference'

const hfi = new HfInference('** YOUR TOKEN **')
const hf = hfi.endpoint('** URL TO YOUR ENDPOINT **')

const app = express()

As we do not have any UI for the moment, the interface will be a simple URL parameter for the prompt:

app.get('/', async (req, res) => {

  // send the beginning of the page to the browser (the rest will be streamed)
  res.write('<html><head></head><body>')

  const inputs = `# Task
Generate ${req.query.prompt}
# Out
<html><head></head><body>`

  for await (const output of hf.textGenerationStream({
    inputs,
    parameters: {
      max_new_tokens: 1000,
      return_full_text: false,
    }
  })) {
    // stream each new token to the web client as soon as it arrives
    res.write(output.token.text)

    // also print it to the console, which is handy for debugging
    process.stdout.write(output.token.text)
  }

  res.end()
})

app.listen(3000, () => { console.log('server started') })

Start your web server:

npm run start

and open http://localhost:3000/?prompt=some%20prompt. You should see some primitive HTML content after a few moments.
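You can also test the endpoint from the command line, for example with curl (the prompt value here is just an illustration):

curl "http://localhost:3000/?prompt=a%20simple%20pricing%20page"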



Tuning the prompt

Each language model reacts differently to prompting. For WizardCoder, simple instructions often work best:

const inputs = `# Task
Generate ${req.query.prompt}
# Orders
Write application logic inside a JS <script></script> tag