I'm working on an LLM harness to evaluate local models, specifically to evaluate quantized models exhaustively and compare them to flagship models i have experience using on OpenRouter. I have a pretty complete corpus of code, I came to a stop and wondered "This is probably the best implementation I've made so far..". So i had an idea. I wonder what AI would say about my code? Do a quick code review? well.. it was brutal.

Gemini: “Skill Level: Intermediate Scripter / Automation Hobbyist”

Claude: “Intermediate Beginner (3-4/10)“

Grok-4: “Developer Skill Level: Intermediate.“

GPT-5: “Skill Assessment: Intermediate/Advanced Hobbyist“

Ouch, Claude 🫠

The improvements suggested

Here’s all the improvements suggested by my panel of LLMs, tally’s represent an LLM making that a main point for improvement:

Hard-coding Configuration: |||
Error Handling Incorrect: |||
Lack of Object Oriented Polymorphism: |
Use Safe Process Management (Context Managers): ||
Modern Typing, Documentation, & Function Signatures: ||||
Security (Shell Injection): |||
Proper Logging: ||
Add Data Validation: ||
Request Timeouts & Retry Logic: |
PEP-8 Compliance: |

"fail fast with exceptions," "never trust input", and "explicit over implicit"

Priorities

I’ve always hacked code together, so i never expected to be ‘right’ on anything as long as it worked. I can start here though, a list of priorities:

Typing, In-Script Documentation, Function Signatures
“Pythonic” Configuration Implementation
Error Handling
Security

These are in order of occurrence between my panel of LLMs. #1 was mentioned by ALL of them, typing and function signatures, and it makes sense. as my code base grows, this is extremely important for readability and clarity.

Typing & Documentation

So, there was some confusion on the terminology on this one. Type hints, type annotations, type comments? Type hinting is the general umbrella term for adding type information to code, meaning to explicitly label the code as a certain data type/return type. Good to know.

This is a basic example from the python PEP docs, slightly modified:

x = 'Greg' # type: str
x: str = 'Greg'
def greeting(name: str) -> str:
    return 'Hello ' + name
greeting(x)

A few things here that matter. First off, I’ve seen this arrow from the function before, couldn’t pinpoint what it was meant to do, but it makes a lot more sense now. None of the types are actually enforceable at runtime. Not evaluated, at least by the python interpreter. So my intuition that they were doing something is where most of my stressors were. So, line one and two are examples of a type comment and type hint, respectively. It can be done either way, I honestly prefer the latter because the former has the possibility of muddying my comments. It’s also consistent with the argument hinting.

def generate(self):
    try:
        if 'localhost' in self.host:
            self._update_status()
            if self.json_payload['model'] not in self.model_status['loaded']:
                self.load(self.json_payload['model'])
        self._rebuild_history()
        self.json_payload['messages'] = self.full_history
        response = requests.post(
                    url = f'{self.host}v1/chat/completions',
                    headers = self.headers,
                    json = self.json_payload
                )
        if response.status_code != 200:
            return {'error':f'Could not generate:\n{response.status_code} - {response.reason}\nheaders: {self.headers}\njson: {self.json_payload}'}
        response = response.json()
        if 'localhost' in self.host:
            return {
                    'response':response['choices'][0]['message']['content'],
                    'reasoning':response['choices'][0]['message']['reasoning_content'],
                    'finish_reason':response['choices'][0]['finish_reason'],
                    'input_tokens':response['usage']['prompt_tokens'],
                    'output_tokens':response['usage']['completion_tokens']
                    }
        return {
                'response':response['choices'][0]['message']['content'],
                'reasoning':response['choices'][0]['message']['reasoning'],
                'finish_reason':response['choices'][0]['finish_reason'],
                'input_tokens':response['usage']['prompt_tokens'],
                'output_tokens':response['usage']['completion_tokens']
                }

    except Exception as e:
        return {'error' : f'Error upon model generation:\n{e}'}

This. This function is an absolute mess. So much that can be cleaned up. Also applicable to most of my optimizations above, so ill implement everything here first.

from typing import Literal, TypedDict

class GenerationResult(TypedDict):
    response:str
    reasoning:str|None
    finish_reason: Literal["stop","length","content_filter","null"]
    input_tokens:int
    output_tokens:int

...

def generate(self) -> GenerationResult:

After some thought, due to my returns being.. variable, I’ve decided to annotate my return type. This will standardize the returns. DEMANDING COMPLIANCE instead of asking politely. This will be the only return type, so all of my except Exception as e: lines and other exceptions will be replaced/removed later. This improvement though, clarifies my function’s signature immensely.

def generate(self) -> GenerationResult:
    """Runs inference on self.history"""
    try:
        if 'localhost' in self.host:
...

This function is simple. No real arguments (except the needed self). Therefore a single doc-string will suffice, otherwise, if it had more arguments, i would definitely use a multi-line to briefly explain the args.

Now with that out of the way, onto fixing my hard-coded values!

Configuration

This took some time. I have never made a real config file. And i assume the one i would’ve made (txt file), wouldn’t be very proper 🙊.

## Create/Load Config File
if os.path.isfile('config.toml'):
    with open('config.toml', 'r') as doc:
        db = tomlkit.parse(doc.read())
else:
    with open('config.toml', 'w') as doc:
        toml_defaults = """[settings]
        local_port = '8090'
        model_dir = '/home/reggie/AI/Models/'
        preset_dir = '/home/reggie/AI/Models/config.ini'
        server_exec = '/home/reggie/llama-cpp/llama.cpp/build/bin/llama-server'
        server_args = '-t -1 -b 512 --mlock --verbose'

        [[models]]
        name = 'GLM-4.7-Flash-IQ3_XXS'
        host = 'http://localhost'

        [[models]]
        name = 'GLM-4.7-Flash-Q4_K_M'
        host = 'http://localhost'

        [[models]]
        name = 'GLM-4.7-Flash-Q6_K_L'
        host = 'http://localhost'

        [[models]]
        name = 'GLM-4.7-Flash-Q8_0'
        host = 'http://localhost'

        [[models]]
        name = 'Nemotron-3-Nano-30B-A3B-IQ4_XS'
        host = 'http://localhost'

        [[models]]
        name = 'Qwen3-14B-Q6_K'
        host = 'http://localhost'

        [[models]]
        name = 'Qwen3-30B-A3B-Thinking-2507-IQ3_XXS'
        host = 'http://localhost'

        [[models]]
        name = 'Qwen3-30B-A3B-Thinking-2507-IQ4_XS'
        host = 'http://localhost'

        [[models]]
        name = 'Qwen3-30B-A3B-Thinking-2507-Q6_K_L'
        host = 'http://localhost'

        [[models]]
        name = 'anthropic/claude-opus-4.5'
        host = 'https://openrouter.ai/api/'

        [[models]]
        name = 'moonshotai/kimi-k2.5'
        host = 'https://openrouter.ai/api/'
        """
        doc.write(toml_defaults)
        db = tomlkit.parse(toml_defaults)

db["settings"]["local_port"] = '8090'
db["settings"]["model_dir"] = '/home/reggie/AI/Models/'
db["settings"]["preset_dir"] = '/home/reggie/AI/Models/config.ini'
db["settings"]["server_exec"] = '/home/reggie/llama-cpp/llama.cpp/build/bin/llama-server'
db["settings"]["server_args"] = '-t -1 -b 512 --mlock --verbose'
server_cmd = (f'{db["settings"]["server_exec"]} '
              f'--models-dir {db["settings"]["model_dir"]} '
              f'--models-preset {db["settings"]["preset_dir"]} '
              f'--host 0.0.0.0 --port {db["settings"]["local_port"]} '
              f'{db["settings"]["server_args"]}')

After a bit of research, i found that there are two commonly agreed upon formats for config files. TOML and YAML. From what i read, YAML was used widely, but is now increasingly being replaced with TOML. The main reasons are because of its simplicity, parsing predictability, and general security improvements over YAML. And if its the preferred way, and a lot of people use it, I should get on-board now.
So what this does is either loads the TOML config, or generates it if it does not exist. this is a great start, but i can see my self abstracting this further into another python file, import it, and just have all of this logic out of the way. But for right now, I want to keep everything in this one file.

You know what…

## Create/Load Config File
if os.path.isfile('config.toml'):
    with open('config.toml', 'r') as doc:
        db = tomlkit.parse(doc.read())
else:
    with open('config.toml', 'w') as doc:
        toml_defaults = """[settings]
        local_port = '8090'
        model_dir = '/home/reggie/AI/Models/'
        preset_dir = '/home/reggie/AI/Models/config.ini'
        server_exec = '/home/reggie/llama-cpp/llama.cpp/build/bin/llama-server'
        server_args = '-t -1 -b 512 --mlock --verbose'

        [[models]]
        name = 'anthropic/claude-opus-4.5'
        host = 'https://openrouter.ai/api/'

        [[models]]
        name = 'moonshotai/kimi-k2.5'
        host = 'https://openrouter.ai/api/'
        """
        llm_model_path = '/home/reggie/AI/Models/'
        for llm_gguf in os.listdir(llm_model_path):
            if '.gguf' in llm_gguf:
                llm_gguf_parsed = llm_gguf.split('.gg')[0]
                toml_defaults += f"""
                [[models]]
                name = {llm_gguf_parsed}
                host = http://localhost
                """

        doc.write(toml_defaults)
        db = tomlkit.parse(toml_defaults)

That’s much better. Now when it generates defaults, it will only configure the models that are in the model directory.

Now, error handling.

Error Handling

If you look back at my function, you might notice some patterns with my exception handling. I thought i was SOO clever for returning a ‘standardized’ error message upon ANY exception. Thinking that I’m providing generous context. You may be have winced when you saw it. I know I am now. SO LISTEN. “Fail fast with exceptions”. That is something i kept seeing over and over again when looking up error handling. I’m a stranger to this, because if my code fails, ill just fix the bug and rerun. Why handle the error?
A few things i found out about exceptions. First, you want to propagate them up only as far as you need to be in the context to properly act on it and recover. Second, if there’s multiple failure modes, starting with the most specific exceptions first, then making exception blocks for the more generalized ones that can occur is good practice. Third, if you cant fix the error, there’s no need to catch it. I have to analyze this function to determine where it’s common failure points are, and so I was able to find the error states of different modules with:

import requests
for meth in dir(requests):
    if 'Error' in meth:
        print(meth)

ConnectionError → If network connection to the server is lost
HTTPError → When receiving a 4xx or 5xx status code
JSONDecodeError → Malformed JSON data received

Realistically, the only variable part in this function is what is put into, and what’s returned from the requests, everything else is locked solid. apparently requests has a .raise_for_status() method, which is a lot cleaner. It raises for any 4xx or 5xx response. I am also removing this giant baby try:except around the whole function. wow. was i that scared it could fail?

class MalformedResponseError(Exception):
    def __init__(self, message, response_raw):
        super().__init__(message)
        self.response_raw = response_raw

...

def generate(self) -> GenerationResult:
    if 'localhost' in self.host:
        self._update_status()
        if self.json_payload['model'] not in self.model_status['loaded']:
            self.load(self.json_payload['model'])
    self._rebuild_history()
    self.json_payload['messages'] = self.full_history
    response = requests.post(
                url = f'{self.host}v1/chat/completions',
                headers = self.headers,
                json = self.json_payload
            )
    response.raise_for_status()
    response = response.json()
    try:
        choices = response['choices'][0]
        message = choices['message']
        usage = response['usage']
        if 'localhost' in self.host:
            return GenerationResult(
                    response=message['content'],
                    reasoning=message['reasoning_content'],
                    finish_reason=choices['finish_reason'],
                    input_tokens=usage['prompt_tokens'],
                    output_tokens=usage['completion_tokens'],
                    )
        return GenerationResult(
                response=message['content'],
                reasoning=message['reasoning'],
                finish_reason=choices['finish_reason'],
                input_tokens=usage['prompt_tokens'],
                output_tokens=usage['completion_tokens'],
                )
    except (KeyError, IndexError) as ex:
        raise MalformedResponseError(f'Missing expected keys.\n{json.dumps(response, indent=4)}', response) from ex

This one was a doozy. So many different ideas for how you could handle exceptions. I have learned a bit though. When the program can run on defaults, or recover and run on a safer value, substitution can suffice. Recovery, to be clear. Substituting ‘erroneous’ values or substituting something close is usually just masking an error state. So, looking at my function and the analysis i made before, i did catch most of the places where it could fail, but realistically, any failure in the response cannot be recovered through an overly-complex exploratory algorithm though the JSON response. The only thing to do is handle the KeyErrors and IndexErrors potentially introduced from generating the return values. This can happen if any models have a slightly different output schema, or if OpenRouter changes their output JSON (like they apparently did before). For this, I decided to make a custom exception. It inherits everything from the Exception class, with the addition of a response_raw variable, which allows us to pass the raw response JSON to the caller if they want to handle it, but we also print the JSON in a human-readable format upon exception in the final except block, for easier debugging.

One thing I found out from this, is that trying to learn through the official docs is a death sentence. Just find a couple books that explain everything in a digestible manner. Lesson earned.

This next one is relatively small, hopefully not too serious, end it on something simple.

Security

This is a very broad topic. so I’m not going to research it directly, but I do know one thing right off the bat, i have a hard-coded api key sitting RIGHT IN HERE. I know it’s never considered okay, because you can accidentally ship the api key in your public code base, but at my current level, I don’t ship 🤷‍♂️. I do however understand that i should get in the habit of this. It was as easy as adding the declaration of an environment variable in my .bashrc, and replacing the api key with os.getenv("OPENROUTER_API_KEY") . Perfect.
Now, my LLM panel cried and begged me to stop using shell=True for my subprocess calls.. and upon looking it up, very true. like i said before. I don’t ship, but I want to develop good habits for when I am in the position where security matters. The only subprocess calls i have are:

subprocess.run('kill -9 $(pgrep llama-server)',shell=True)
subprocess.Popen(server_cmd,shell=True)

To tighten security, set shell=False. Now in order to execute, i need to pass a list containing the command and arguments. But what do i do with kill -9 $(pgrep llama-server)? Luckily i don’t need to worry about it, apparently, pkill can take the -9 argument (which is why i was using kill). sop that makes the line:

subprocess.run([‘pkill‘, ’-9’, ’llama-server’], shell=False)

and my startup command ends up as a long list of arguments:

server_cmd = [
    '/home/reggie/llama-cpp/llama.cpp/build/bin/llama-server',
    '--models-dir', '/home/reggie/AI/Models/',
    '--models-preset', '/home/reggie/AI/Models/config.ini',
    '--host', '0.0.0.0',
    '--port', '8090', 
    '-t', '-1', 
    '--mlock'
    ]
subprocess.Popen(server_cmd, shell=False, stdout=subprocess.DEVNULL)

Also reintroduced the silencing of the output, but leaving errors to show in the console.

FIN

This is just a small example of me trying to just code better in general, and a lot was learned here. Error handling has always been a weak spot, and i feel stronger having a more stable foundation on it,

highly recommended 10/10, would get roasted again

LLMs Roasted My Code. Here's What I Fixed.

The improvements suggested

Priorities

Typing & Documentation

Configuration

Error Handling

Security

FIN

Comments

Command Palette

The improvements suggested

Priorities

Typing & Documentation

Configuration

Error Handling

Security

FIN

Comments