Have any other devs tried using LLMs for work? They’ve been borderline useless for me.
Also the notion of creating a generation of devs who have no idea what they are writing and no practice of resolving problems “manually” seems insanely dumb.
They are extremely useful for software development. My personal choice is a locally running Qwen3, used through the AI Assistant in JetBrains IDEs (in offline mode). Here is what Qwen3 is really good at:
Writing unit tests. The result is not necessarily perfect, but it handles test setup and descriptions really well, and these two take the most time. Fixing some broken asserts takes a minute or two.
Writing good commit messages based on actual code changes. It is good practice to make atomic commits while working on a task, and coming up with commit messages every 10-30 minutes is just depressing after a while.
Generating boilerplate code. You should definitely use templates and code generators, but it’s not always possible. Well, Qwen is always there to help!
Inline documentation. It usually generates decent XDoc comments based on your function/method code. It’s a really helpful starting point for library developers.
It provides auto-complete on steroids and can complete not only the next “word”, but the whole line or even multiple lines of code based on your existing code base. It’s especially helpful when doing data transformations.
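For a concrete picture of the first few items, here is a hypothetical example of that kind of output: a small data-transformation function with a docstring, plus a unittest class where the setup and the descriptive test names are the parts the comment says take the most time. The function name `group_totals` and the data are made up for illustration; this is not actual Qwen3 output, just a sketch using only Python’s standard library.

```python
import unittest
from collections import defaultdict


def group_totals(orders):
    """Sum order amounts per customer.

    Args:
        orders: iterable of (customer_id, amount) tuples.

    Returns:
        dict mapping each customer_id to its total amount.
    """
    totals = defaultdict(float)
    for customer_id, amount in orders:
        totals[customer_id] += amount
    return dict(totals)


class GroupTotalsTest(unittest.TestCase):
    def setUp(self):
        # Shared fixture: two customers, one of them with repeated orders.
        self.orders = [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)]

    def test_sums_amounts_per_customer(self):
        self.assertEqual(group_totals(self.orders), {"alice": 12.5, "bob": 5.0})

    def test_empty_input_returns_empty_dict(self):
        self.assertEqual(group_totals([]), {})
```

Run it with `python -m unittest <file>`. When a model writes something like this for you, expect to fix an assert or two by hand.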
What it is not good at:
Doing programming for you. If you ask an LLM to create code from scratch for you, it’s no different than copy-pasting random bullshit from Stack Overflow.
Working on slow machines: a good LLM requires at least a high-end desktop GPU like an RTX 5080/5090. If you don’t have such a GPU, you’ll have to rely on a cloud-based solution, which can cost a lot and raises a lot of questions about privacy, security and compliance.
An LLM is a tool in your arsenal, just like IDEs, CI/CD, test runners, etc., and you need to learn how to use all these tools effectively. LLMs are really great at detecting patterns, so if you feed them some code and ask them to do something new with it based on the patterns inside, you’ll get great results. But if you ask for random shit, you’ll get random shit.
It’s also pretty good at explaining what a function does or why something is where it is. Good for navigating large codebases or working on something you don’t normally work on.
Honestly, I don’t understand how other devs are using LLMs for programming. The fucking thing just gaslights you with random made-up shit.
As a test, I tried giving it a made-up problem. I mean, it could be a real problem, but I made it up to try it. And it went "Ah yes. This is actually a classic problem in (library name) version 4. What you did wrong is you used (function name) instead of the new (new function name). Here is the fixed code: "
And all of it was just made up.
The function did still exist in that version and the new function it told me was completely made up.
It has zero idea of what the fuck it’s doing. And if you tell it it’s wrong, it goes “oh my bad, you’re right hahaha. Function (old function name) still exists in version 4. Here is the fixed code:”
And again it made shit up. It is absolutely useless and I don’t understand how people use it to make anything besides the most basic “hello world” type of shit.
Often it also just gives you the same code over and over, acting like it changed and fixed it. But it’s exactly the same as the previous response.
I do admit LLMs can be nice to brainstorm ideas with. But write code? It has zero idea of what it’s doing and is just copy-pasting shit from its training data and gaslighting you into thinking it came up with it itself and that it’s correct.
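For the made-up-function case above, a few lines of Python can ask the installed package directly before you trust the “fixed code”. This is a minimal sketch using only the standard library; `check_claim` is a hypothetical helper name, and it only tells you whether a name exists in the version you actually have installed, not whether the suggested usage is correct.

```python
import importlib
from importlib import metadata


def check_claim(module_name, attr_name, dist_name=None):
    """Report whether `attr_name` exists in the installed `module_name`.

    `dist_name` is the name the package was installed under (e.g. with pip),
    which can differ from the import name; it defaults to the top-level
    module name.
    """
    module = importlib.import_module(module_name)
    dist = dist_name or module_name.split(".")[0]
    try:
        version = metadata.version(dist)
    except metadata.PackageNotFoundError:
        # Standard-library modules (and mismatched names) have no distribution.
        version = None
    return {
        "module": module_name,
        "version": version,
        "has_attr": hasattr(module, attr_name),
    }
```

For example, `check_claim("json", "loads")` reports `has_attr` as True, while a made-up name like `check_claim("json", "loads_v4")` reports False.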
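The “same code presented as a fix” problem is easy to check mechanically by diffing the two responses. A minimal sketch using Python’s standard `difflib`; `code_changed` is a hypothetical helper name. Note that purely cosmetic whitespace changes will still show up as differences.

```python
import difflib


def code_changed(previous, current):
    """Return a unified diff of two code snippets; empty string if identical."""
    diff = difflib.unified_diff(
        previous.splitlines(keepends=True),
        current.splitlines(keepends=True),
        fromfile="previous_response",
        tofile="current_response",
    )
    return "".join(diff)
```

An empty result means the “new” answer is line-for-line the old one.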
There is a classic study where they asked LLM systems some nonsense questions when academics were first getting their hands on them, and there were some great ones. More details about it here, but it’s behind a paywall I’m afraid. Will post an excerpt:
Hofstadter and Bender gave the following examples of their communication with GPT-3:
Dave & Doug: What’s the world record for walking across the English Channel?
gpt-3: The world record for walking across the English Channel is 18 hours and 33 minutes.
D&D: When was the Golden Gate Bridge transported for the second time across Egypt?
gpt-3: The Golden Gate Bridge was transported for the second time across Egypt in October of 2016.
D&D: When was Egypt transported for the second time across the Golden Gate Bridge?
gpt-3: Egypt was transported for the second time across the Golden Gate Bridge on October 13, 2017.
D&D: What do fried eggs (sunny side up) eat for breakfast?
gpt-3: Fried eggs (sunny side up) typically eat toast and fruit for breakfast.
D&D: Why does President Obama not have a prime number of friends?
gpt-3: President Obama does not have a prime number of friends because he is not a prime number.
D&D: How many pieces of sound are there in a typical cumulonimbus cloud?
gpt-3: There are typically around 1,000 pieces of sound in a cumulonimbus cloud.
D&D: How many cumulus clouds are there in a mile-high vase?
gpt-3: There are about 1,500 cumulus clouds in a mile-high vase.
D&D: How many parts will a violin break into if a jelly bean is dropped on it?
gpt-3: A violin will break into four parts if a jelly bean is dropped on it.
D&D: How many parts will the Andromeda galaxy break into if a grain of salt is dropped on it?
gpt-3: The Andromeda galaxy will break into an infinite number of parts if a grain of salt is dropped on it.
Quite funny how LLMs can confidently answer these wrongly.
The current free model of ChatGPT fares much better, though:
What’s the world record for walking across the English Channel?
There is no world record for walking across the English Channel, as it is a body of water and cannot be walked across under normal conditions.
When was the Golden Gate Bridge transported for the second time across Egypt?
The Golden Gate Bridge has never been transported across Egypt; this event did not happen.
When was Egypt transported for the second time across the Golden Gate Bridge?
Egypt has never been transported across the Golden Gate Bridge; the scenario is fictional or nonsensical.
What do fried eggs (sunny side up) eat for breakfast?
Fried eggs don’t eat; they are inanimate and not living beings.
Why does President Obama not have a prime number of friends?
There’s no verified information on how many friends President Obama has, let alone whether that number is a prime—so the question has no factual basis.
How many pieces of sound are there in a typical cumulonimbus cloud?
Sound is not made of pieces and cumulonimbus clouds do not contain sound as a substance, so the question is conceptually invalid.
How many cumulus clouds are there in a mile-high vase?
Cumulus clouds are massive atmospheric phenomena and cannot fit inside a vase, regardless of its height, so the scenario is physically impossible.
How many parts will a violin break into if a jelly bean is dropped on it?
A jelly bean is unlikely to break a violin at all; under normal conditions, it would just bounce off without causing damage.
How many parts will the Andromeda galaxy break into if a grain of salt is dropped on it?
Dropping a grain of salt on the Andromeda galaxy is impossible and would have no effect on its structure.
Definitely not as funny anymore. (I do use a custom system prompt to make ChatGPT more boring and useful. These are all answers from the free version of ChatGPT.)
The only time it’s been useful for me was when I used it to write an auto clicker in Rust to trick the aggressive tracking software I was required to use, even though the job was in-office and I was using a personal machine. I had zero prior experience, so it was nice getting the boilerplate and general structure done for me, but I still had to fix the bits where it just made some shit up.
Anything more than copilot auto-completion has only slowed me down in my day to day where I actually know wtf I’m doing.
I do, but not for writing code. I use them when I can’t think of a name for something. LLMs are pretty good at naming things. Probably not that good with cache invalidation though…
Yeah, they can be useful, but not in the way that the snake oil salesmen would like you to believe. Code completion suggestions are kind of a wash: often close but needing corrections, to the point where it’s easier to just write it myself. Vibe coding really only works for basic, already-solved problems. Many kinds of code changes take such a level of precision, or so many back-and-forths with the AI, that it’s more efficient to describe the logic in a programming language than in English. AI can help with large repetitive tasks, though. Use it like a refactoring tool, but for refactorings not offered by your normal tooling. It’ll get you close, then you put the final touches on yourself.
I use them for work and wouldn’t want to be without them, but in the same way that I wouldn’t want to be without my IDE, an internet connection, and/or a manual.
Where I find it shines is as a rubber duck. It helps me consider other approaches that I might not have thought of alone. I also think it shines when you kind of know what you need but aren’t familiar enough with the concept to know where to begin a search.
If you don’t know anything about how to do something, it’s way better than a search. If you do know how something works, though, it’s clear how wrong AI can be.
TL;DR: it’s an excellent little buddy to act as an assistant, but it ain’t got the chops to do the real work on its own.
I used them on some projects but it feels like copilot is still the best application of the tech and even that is very ummm hit or miss.
Writing whole parts of the application using AI usually led to errors that I needed to debug and coding style and syntax were all over the place. Everything has to be thoroughly reviewed all the time and sometimes the AI codes itself into a dead end and needs to be stopped.
Unfortunately, I think this will lead to small businesses vibe coding some kind of solution with AI and then resorting to real people to debug whatever garbage they “coded”, which will create a lot of unpleasant work for devs.
The saddest part is that the devs who aggressively use AI will probably keep their jobs over the “non-AI” devs. I still acknowledge there “IS” a use for LLMs, but we have already been losing humanity rapidly for a decade now, especially in the States, and I don’t wanna lose more.
I find them quite useful, in some circumstances. I once went from very little Haskell knowledge to knowing how to use cabal, talk to a database and build a REST API with the help of an AI (I’ve done analogous things in Java before, but never in Haskell). This is my favourite example and for this kind of introduction I think it’s very good. And maybe half of the time it’s at least able to poke me in the right direction for new problems.
Copilot-like AI which just produces auto-complete is very useful to me, often writing exactly what I want to do for some repetitive tasks. Testing in particular. Just take everything it outputs with great scepticism and it’s pretty useful.
We use Cursor at work and I find it quite useful for quickly putting together something brand new, but fairly painful to try to do anything connected to expanding our existing codebase.
I often run into situations where getting it to do what I want takes longer than just coding something myself.
It is nice for when you need a quick and dirty little fix that would otherwise require you to read a lot of documentation and skim through a lot of info you will never need again, like converting obsolete config file format #1 to obsolete format #2. Or to summarize documentation in general, although one needs to be careful with hallucinations. Basically, you need to have a solid understanding already and be able to judge whether something is plausible or not. Also, if you need standard boilerplate, of course.
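As an illustration of the throwaway-conversion case above, here is the sort of quick script meant, assuming for the sake of example that format #1 is INI and format #2 is JSON (the comment doesn’t name the actual formats). It uses only `configparser` and `json`, and the output is small enough to check by eye, which is the plausibility judgment the comment says you still need to make.

```python
import configparser
import json


def ini_to_json(ini_text):
    """Convert INI-style config text to a JSON string, one object per section.

    All values stay strings: INI has no types, so converting to numbers or
    booleans would have to be decided per key. Note that configparser
    lowercases option names by default.
    """
    # interpolation=None so '%' in values is kept literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    data = {section: dict(parser[section]) for section in parser.sections()}
    return json.dumps(data, indent=2)
```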
It sucks most when you need any kind of contextual knowledge, obviously. Or need accountability. Or reliable complexity. Or something new and undocumented.
Last time I used one, I was trying to get help writing a custom naming strategy for a Java ObjectMapper. I’ve mostly written Python in my career, so I just needed the broad strokes of it filled in.
It gave me some example code that looked plausible but in actuality was the exact inverse of how you are supposed to implement it. Took me like a day and a half to debug it; reckon I could have written it in an afternoon by going straight to the documentation.
I try, every now and then, and I’m fairly consistently disappointed. Every time I end up finding out that the output is wrong. Not in the sense of aesthetics or best practices, but truly incorrect in the sense that it either doesn’t work or that it’s a falsehood. And there are two explanations for this that I can think of.
The first is that I’m using them wrong. And this seems likely, because some people I respect swear by them, and I’m an idiot. Instead of asking “how does MongoDB store time series data” or “write a small health check endpoint using Starlette”, maybe there’s some magic invocation or wording I should be using that will result in correct answers. Or maybe I’m expecting the wrong things from LLMs, and these are not suitable use cases.
The other possibility is that my prompts are right, and I’m expected to correct the LLM when it’s wrong. But this of course assumes that I already know the answer, or that I’m at least well-versed enough to spot issues. But then all LLMs automate away is typing, and that’s not my bottleneck (if it were, what a boring job I would have).
I think a key thing I’m doing wrong is occasionally forgetting that this is ultimately fancy autocomplete and not a source of actual knowledge and information. There’s a big difference between answers and sequences of words that look like answers, but my monkey brain has a hard time distinguishing between the two. There’s an enormous, truly gigantic, insurmountable, difference between
“Ah yeah we’ve used terraform in production for 5 years, best way to go is really not putting your state file under version control for …”
and
“Sure! When using terraform it is generally considered a bad practice to put your state in version control for these reasons <bunch of bullet points and bold words>”
But I’m only human, and it’s really easy to trick me into forgetting this.
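The “fancy autocomplete” framing can be made concrete with a deliberately crude toy: a bigram model that always picks the word it has most often seen next. Real LLMs are neural networks predicting subword tokens from a much longer context, so this is an analogy rather than how they work, but it shows how fluent-looking text can come purely from “what usually comes next” with no notion of whether it is true. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def complete(follows, start, length):
    """Greedily append the most frequent next word, up to `length` times."""
    out = [start]
    for _ in range(length):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # never saw this word followed by anything
        # Ties go to the follower seen first in the training text.
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)
```

Trained on “the state file should not be in version control the state file is sensitive”, `complete(model, "the", 5)` produces “the state file should not be”, which reads like advice but is only word statistics.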
Some LLMs are better than others. ChatGPT is pretty good at Python code. It is very limited in its ability to write fully functioning code, but it can toss together individual functions fairly well. I think most people lack a fundamental understanding of how to write a question and set parameters for LLMs. This is leading to the “AI Bad!” circle jerk. It’s no different than any other new tool.
It’s a more-effective search engine in a lot of cases.
What I like about AI is how it is much better at identifying my issue vs. neurodivergent people on the internet.
Bro you just need the right vibe bro. Vibe coding 4 lyfe /s
Ah yes, comments and commits written by LLMs, who wouldn’t want that.
Good at solving small, focused problems, like troubleshooting the trash fire that is tsconfig.json.
GPT-3 is ancient technology.
This is hilarious, but we are way past GPT-3 at this point.