Reality check: Microsoft Azure CTO pushes back on AI vibe coding hype, sees 'upper limit'
REDMOND, Wash. — Microsoft Azure CTO Mark Russinovich cautioned that "vibe coding" and AI-driven software development tools aren't capable of replacing human programmers for complex software projects, contrary to the industry's most optimistic aspirations for artificial intelligence.
Russinovich, giving the keynote Tuesday at a Technology Alliance startup and investor event, acknowledged the effectiveness of AI coding tools for simple web applications, basic database projects, and rapid prototyping, even when used by people with little or no programming experience.
However, he said these tools often break down when handling the most complex software projects that span multiple files and folders, and where different parts of the code rely on each other in complicated ways — the kinds of real-world development work that many professional developers tackle daily.
"These things are right now still beyond the capabilities of our AI systems," he said. "You're going to see progress made. They're going to get better. But I think that there's an upper limit with the way that autoregressive transformers work that we just won't get past."
Even five years from now, he predicted, AI systems won't be independently building complex software at the highest level, or working with the most sophisticated code bases.
Instead, he said, the future lies in AI-assisted coding, where AI helps developers write code but humans maintain oversight of architecture and complex decision-making. This is more in line with Microsoft's original vision of AI as a "Copilot," a term that originated with the company's GitHub Copilot AI-powered coding assistant.
[...] He discussed his own AI safety research, including a technique that he and other Microsoft researchers developed called "crescendo" that can trick AI models into providing information they'd otherwise refuse to give.
The crescendo method works like a "foot in the door" psychological attack, he explained, where someone starts with innocent questions about a forbidden topic and gradually pushes the AI to reveal more detailed information.
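In structural terms, the attack is just a multi-turn conversation that carries its own history forward, so each request reads as a small, reasonable step from the model's last answer. A minimal sketch of that shape — the client, endpoint, response format, and prompts below are generic placeholders, not Microsoft's published tooling:

```python
# Sketch of the multi-turn "foot in the door" structure described above.
# The endpoint, response shape (OpenAI-style), and prompts are all
# hypothetical placeholders; the point is only that each turn includes
# the full prior history, so each question is a small step from the
# model's own previous answer.
import requests

ESCALATING_PROMPTS = [
    "Can you give me a general history of topic X?",
    "Interesting -- what were the key technical developments?",
    "You mentioned Y earlier; can you expand on how that part worked?",
]

def crescendo(api_url: str, api_key: str) -> list[dict]:
    history: list[dict] = []
    for prompt in ESCALATING_PROMPTS:
        history.append({"role": "user", "content": prompt})
        resp = requests.post(
            api_url,
            headers={"Authorization": f"Bearer {api_key}"},
            json={"messages": history},
            timeout=60,
        )
        reply = resp.json()["choices"][0]["message"]
        history.append(reply)  # the model's own words seed the next step
    return history
```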
Ironically, he noted, the crescendo technique was referenced in a recent research paper that made history as the first largely AI-generated research ever accepted into a tier-one scientific conference.
Russinovich also delved extensively into ongoing AI hallucination problems — showing examples of Google and Microsoft Bing giving incorrect AI-generated answers to questions about the time of day in the Cook Islands, and the current year, respectively.
"AI is very unreliable. That's the takeaway here," he said. "And you've got to do what you can to control what goes into the model, ground it, and then also verify what comes out of the model."
Depending on the use case, Russinovich added, "you need to be more rigorous or not, because of the implications of what's going to happen."
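His "ground it, then verify" advice corresponds to a now-common retrieval-plus-check pattern. A minimal sketch — the retriever, model, and checker here are hypothetical stand-ins, not any particular product's API:

```python
# Sketch of the "control what goes in, ground it, verify what comes out"
# pattern Russinovich describes. All three injected components are
# hypothetical stand-ins.
def answer_with_grounding(question: str, retriever, model, checker) -> str:
    # 1. Control/ground the input: fetch trusted documents and pin the
    #    model to them instead of letting it free-associate.
    docs = retriever.search(question, top_k=3)
    context = "\n\n".join(d.text for d in docs)
    prompt = (
        "Answer ONLY from the context below. If the context does not "
        f"contain the answer, say so.\n\nContext:\n{context}\n\n"
        f"Question: {question}"
    )
    draft = model.complete(prompt)

    # 2. Verify the output: an independent check (citation matching, a
    #    second model pass, or a rule-based validator) before anything
    #    ships. Rigor scales with the stakes of the use case.
    if not checker.is_supported(draft, docs):
        return "No grounded answer available."
    return draft
```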
(Score: 5, Informative) by ElizabethGreene on Saturday June 07, @03:25PM (3 children)
I'm not comfortable making 5-year predictions. In 2020, BERT was one of the largest transformer models around, and it's shockingly far behind today's models.
That said, I think he's probably right. Human supervised AI work feels far more likely than complete autonomy.
I was skeptical, but my recent experience has convinced me that it's a usable tool now, not just a toy. As an example, the NIH grant data is distributed as a zip file containing thousands of JSON-formatted data files, one for each grant. Think about how you'd hoover the schema out of that and import the data into a database. That's a pretty standard ETL job I know how to do a half-dozen different ways, but I thought I'd play with it. It took longer to download and install SQL Server and Management Studio than it did to have a working Python script to import the data. To me, that was impressive.
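For anyone curious, the job in question boils down to something like this — a sketch using SQLite instead of SQL Server for self-containment, with the zip layout and flat schema assumed:

```python
# Rough sketch of the NIH-grant ETL job described above: walk a zip of
# JSON files, infer a flat schema from the union of keys, and load a
# database. SQLite stands in for SQL Server; paths and layout are
# assumptions.
import json
import sqlite3
import zipfile

def load_grants(zip_path: str, db_path: str) -> None:
    with zipfile.ZipFile(zip_path) as zf:
        names = [n for n in zf.namelist() if n.endswith(".json")]

        # Pass 1: the union of keys across all files becomes the schema.
        columns: set[str] = set()
        for n in names:
            columns.update(json.loads(zf.read(n)).keys())
        cols = sorted(columns)

        con = sqlite3.connect(db_path)
        con.execute(
            "CREATE TABLE IF NOT EXISTS grants (%s)"
            % ", ".join(f'"{c}" TEXT' for c in cols)
        )

        # Pass 2: insert each grant, serializing nested values as JSON.
        collist = ", ".join(f'"{c}"' for c in cols)
        placeholders = ", ".join("?" for _ in cols)
        for n in names:
            rec = json.loads(zf.read(n))
            row = [
                json.dumps(v) if isinstance(v, (dict, list)) else v
                for v in (rec.get(c) for c in cols)
            ]
            con.execute(
                f"INSERT INTO grants ({collist}) VALUES ({placeholders})",
                row,
            )
        con.commit()
        con.close()
```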
(Score: 2) by aafcac on Saturday June 07, @04:11PM
The limit here is likely going to be the ability to audit and define the scope of the project. There's also issues in terms of generating enough material to train on and that nobody actually knows how this stuff works.
Personally, I do get a lot out of the more basic stuff, like being able to talk to my phone to put things on my calendar and do transcription, but I avoid the more complicated generative AI in most cases, as it's more or less demon technology that serves no legitimate use for which a person or a manually programmed computer isn't sufficient.
(Score: 1, Interesting) by Anonymous Coward on Saturday June 07, @06:00PM
The summary was about complex projects that span multiple directories. In my experience, you can't dump a whole source-code repo into the context window - it just doesn't work.
Even below whole-project scale, problems show up when you ask it logical things - "abstract these three functions into a common function with a few arguments" - refactoring code by pulling out the common aspects, where the uncommon parts become either a callback or an if statement. Try as you like, it's not going to produce usable code. That's just not what "generative AI" is capable of - and we've seen little more than GenAI in the past five years.
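Concretely, the shape of the refactor I mean is this (hypothetical functions; the one differing step gets hoisted into a callback):

```python
# The refactor described above, on hypothetical functions: three
# near-duplicates whose single differing step becomes a callback
# argument to one common function.
from typing import Callable

# Before: three functions differing only in the adjustment step.
def report_daily(values: list[float]) -> float:
    total = sum(values)
    return round(total, 2)

def report_weekly(values: list[float]) -> float:
    total = sum(values)
    return round(total * 7.0, 2)

def report_capped(values: list[float]) -> float:
    total = sum(values)
    return round(min(total, 100.0), 2)

# After: one common function; the uncommon part is an argument.
def report(values: list[float], adjust: Callable[[float], float]) -> float:
    return round(adjust(sum(values)), 2)

daily = report([1.5, 2.5], lambda t: t)              # == report_daily
weekly = report([1.5, 2.5], lambda t: t * 7.0)       # == report_weekly
capped = report([1.5, 2.5], lambda t: min(t, 100.0)) # == report_capped
```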
Ask it something different: take some Terraform code and ask it to set up a new nodegroup for an existing Kubernetes cluster, with taints that exclude one batch of applications and accept another. Of course, it has to use your existing configmaps for authentication values, and it has to have tokens defined for the nodegroup - tokens that are unique to the nodegroup and not the ones used for others. Again, what you get out will look right - but it simply won't be compatible with your environment. It will be shaped like what you see in a README file, or like an answer to an *example* on one of the question sites. It's not how your environment operates, so you might get resources that you need to put into play to get things working, but you'd be lucky to use the output as a reference. Especially with things like Terraform, it's such a niche market that there isn't a lot to train the AI on.
And that's one of the core problems: the output looks like answers to basic examples. Unless the task looks like what the model has seen, it can't produce what-comes-next. Where people-who-can't are fixated on "Style!!! Do it like _this_!" in current projects, AI can't do things it hasn't seen before, so the whole Style Or Bust argument is going to get a great deal more severe. Unless the task you're doing looks like a question asked on StackOverflow, unless your code looks like that simple example, you're just not going to get useful results from generative AI. Documentation online for how to use an app will amount to telling you how things must be shaped and formed - what properties must be used, and where those properties must come from. (This is different from making them available, with certain properties having requirements.) Style won't be about indentation and placement of closing curly braces; it'll be about how components interact with each other. Link components in ways that a developer can handle just fine but an AI hasn't seen, and you blow the AI up. Do you really need this other thing, or do you need the AI? Choose one.
Give the summary a read again. It's not saying that technical walls are being hit (another issue - GPU capacity largely hasn't increased in the past two years; they're just throwing "bigger" at it, and that's not good enough), but rather that concepts aren't being processed, and it's not really possible to grow the context window to the size that project-level context requires.
(Score: 5, Interesting) by JoeMerchant on Saturday June 07, @09:15PM
I don't see an "upper limit" at all, I see a very low ceiling which isn't going to get higher very quickly.
Current LLMs are leveraging what's "out there" from the past 30 years of internet content creation. If we stop creating new content, only rely on what LLMs can generate, then sure - we'll never progress much.
On the other hand, if we have another 30 years of prolific, high-quality content generation (identifying what is high quality is one of the current challenges whose solution will "raise the ceiling" as we make progress), then LLMs will leverage that and grow in their value.
🌻🌻🌻 [google.com]
(Score: 5, Insightful) by DadaDoofy on Saturday June 07, @03:43PM (1 child)
For decades they've been trying to sell automated alternatives to software developers. The marketing people would show up, perform a canned demo showing a fully completed application with nothing more than a point here and a click there to "develop it". Management always swallows it hook, line and sinker. Smug grins on their faces, as they imagine the money saved on expensive developers landing in their bonus checks.
Then reality sets in. One size doesn't fit all. The tool can only do what it is built (or trained) to do. It can't handle the unanticipated, interwoven, and often contradictory requirements of an enterprise project in a large organization, like the federal government or a Fortune 500 company. It can't navigate the complex interpersonal relationships that need to be forged to achieve buy-in from a bunch of stakeholders, each with interests of their own. When people finally refuse to keep seeing the emperor's new clothes, the tool is quietly shitcanned and never spoken of again, like the whole thing never happened.
I've seen it all play out at least a half dozen times over the years. Lather, rinse, and repeat. AI is just the latest cycle. Yes, AI, like other tools before it, will replace some trivial reinvent-the-wheel coding tasks, but coding is a relatively minor part of what a software developer is paid to do.
(Score: 3, Interesting) by aafcac on Saturday June 07, @04:15PM
Yep, this might be fine for simple tasks, but those weren't being done by programmers in most cases anyway. I did let ChatGPT program the rules for my lights to turn on and off based on the angle of the sun, switching off five minutes after the sun is up. But I could have done it myself, and I've subsequently decided not to touch generative AI for anything that complicated. If it doesn't save enough time compared with programming it myself, and nobody else has already written it, I just won't bother with an automation.
(Score: 5, Funny) by Gaaark on Saturday June 07, @04:16PM (5 children)
I thought Microsoft just went with the "Infinite monkeys coding by randomly hitting keys" theory? The real pros are in the marketing dept: take absolute crap and sell the feck out of it until it becomes ubiquitous and then everybody will put up with infinite crap for an infinite amount of time.
--- Please remind me if I haven't been civil to you: I'm channeling MDC. I have always been here. ---Gaaark 2.0 --
(Score: 5, Funny) by turgid on Saturday June 07, @06:05PM (3 children)
I have two books about Software Engineering on my shelf that I scavenged from people's desks many years ago which are both written by Microsoft people. We can conclude that at least two people who were employed by Microsoft at least at one point in time did indeed discover how to write software.
The great mystery is, where are they now and what went wrong? I have some theories. The Martians are not involved.
I refuse to engage in a battle of wits with an unarmed opponent [wikipedia.org].
(Score: 1, Interesting) by Anonymous Coward on Saturday June 07, @06:44PM (2 children)
Mark Russinovich was the author of a bunch of kernel internals books at Microsoft.
I'm impressed he hasn't been muzzled by the PHBs in control there now.
(Score: 3, Interesting) by turgid on Saturday June 07, @09:06PM (1 child)
Many years ago now I had the privilege of working with some very intelligent and accomplished people who had worked on the Solaris kernel. When it all went south at Sun, some of them took jobs with various companies doing Windows kernel work. There were a few surprises. The Windows kernel was written in C (not C++). It wasn't that different from Unix. It was actually quite nice. They didn't tell me this because they had to sign NDAs. I got this from the Martians.
I refuse to engage in a battle of wits with an unarmed opponent [wikipedia.org].
(Score: 3, Funny) by aafcac on Saturday June 07, @11:40PM
That makes sense, MS has always had an odd affinity for anal probes.
(Score: 2) by JoeMerchant on Saturday June 07, @09:17PM
>take absolute crap and sell the feck out of it until it becomes ubiquitous and then everybody will put up with infinite crap for an infinite amount of time.
And then, in 2018 or so, they fired my neighbor who worked for M$ for decades selling "six and seven figure annual contracts" to mid sized corporate divisions. Not sure what they were thinking, but they just thought it again last month: https://www.cnbc.com/2025/05/13/microsoft-is-cutting-3percent-of-workers-across-the-software-company.html [cnbc.com]
🌻🌻🌻 [google.com]
(Score: 4, Informative) by kolie on Saturday June 07, @05:36PM (12 children)
These tools can work across very complex projects and multiple files. This is something I'm actively working with and experimenting on, and it already produces functional results. It does take some tooling and some workflow setup, but it seems his comments are already dated.
(Score: 3, Interesting) by JoeMerchant on Saturday June 07, @09:20PM (11 children)
It all depends on what you're trying to get it to do.
Can it program in Rust? A little.
Can it program a simple AMQP receiver client in Rust? Maybe, if I put more effort into prompt engineering than it takes to learn Rust and the AMQP library from scratch.
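For a sense of scale, the whole receiver is only about this much code in Python with pika (broker and queue names here are placeholders) — the Rust equivalent of this is what the prompting keeps failing to produce:

```python
# Minimal AMQP receiver sketch in Python using pika. Broker address and
# queue name are placeholder assumptions, not a real deployment.
import pika

def main() -> None:
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.queue_declare(queue="events", durable=True)

    def on_message(ch, method, properties, body):
        # Real client: parse and dispatch the message here.
        print(f"received: {body!r}")
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue="events", on_message_callback=on_message)
    channel.start_consuming()

if __name__ == "__main__":
    main()
```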
🌻🌻🌻 [google.com]
(Score: 3, Interesting) by kolie on Sunday June 08, @04:41AM (10 children)
I strongly disagree. With one prompt, I've had it write a 5-microservice architecture doing bidirectional sync between three interrelated object tables with hierarchical dependencies, using timestamp tracking, from MSSQL to SF, utilizing Kafka, data enrichment, and idempotent processing.
(Score: 3, Funny) by JoeMerchant on Sunday June 08, @03:54PM (9 children)
What service is this?
Which daemons did you sacrifice to, and what was their price?
🌻🌻🌻 [google.com]
(Score: 2) by kolie on Sunday June 08, @06:44PM (8 children)
About $300 to Anthropic's Claude 3.7/4.0.
It's a service at work - we are replacing a "no-code" solution which wasn't written properly; it was a good PoC but didn't handle any edge cases well, of which there was a lot, it turns out.
(Score: 2) by JoeMerchant on Sunday June 08, @07:51PM (7 children)
>a good PoC but didn't handle any edge cases well, of which there was a lot, it turns out.
That's worth mentioning, and it's a far higher price than $300...
At least you got framework code that runs and you can refine from. I'm trying to use Google to get Rust framework code for an AMQP microservice and so far it's not getting me any further than "Hello world."
🌻🌻🌻 [google.com]
(Score: 2) by kolie on Sunday June 08, @09:55PM (6 children)
No, sorry - the human-built solution that cost $50,000 and took 3 months resulted in a PoC with unhandled edge cases.
The AI solution that replaced it was $300, fully enterprise-grade with monitoring, security, and both Kubernetes AND Docker Swarm implementations. It handled all the edge cases and had much higher throughput.
(Score: 4, Interesting) by kolie on Sunday June 08, @09:58PM
Here's a markdown summary of one of the services implemented as part of this (fully AI-documented, too):
# Event Translator Implementation Summary
## ✅ MILESTONE 4 COMPLETION STATUS
**Task ID**: TASK-M4-EVENT-TRANSLATOR-IMPLEMENTATION-20250526-213800
**Status**: ✅ **COMPLETED**
**Date**: 2025-05-27
## 🎯 Implementation Overview
The Event Translator service has been **fully implemented** with all requirements from the MDTM task fulfilled. This service provides bidirectional translation between domain events and system-specific formats for Salesforce and MSSQL.
## 📋 Acceptance Criteria - COMPLETE ✅
### ✅ Core Service Structure
- [✅] Complete service structure created in `event-translator/` directory
- [✅] All required files implemented:
- `main.py` - EventTranslatorService with async Kafka consumers/producers
- `config.py` - Comprehensive configuration management
- `requirements.txt` - All dependencies specified
- `Dockerfile` - Production-ready containerization
- `translators/` - Complete translator modules
- `tests/` - Comprehensive test suite
### ✅ Core Functionality
- [✅] **EventTranslatorService** implemented with full Kafka capabilities
- [✅] **SalesforceTranslator** with bidirectional translation
- [✅] **MSSQLTranslator** with bidirectional translation
- [✅] **Field mapping configurations** for all entity types
- [✅] **Business rule translations** implemented
- [✅] **Data type conversion and validation** logic
- [✅] **Dead letter queue handling** for translation failures
- [✅] **Comprehensive error handling** and logging
- [✅] **Configuration management** for all translation rules
- [✅] **Docker containerization** completed
- [✅] **Unit and integration tests** implemented
- [✅] Service consumes domain events and produces system-specific actions
## 🔧 Key Components Implemented
### 1. Core Service (`main.py`)
- **EventTranslatorService** class with async event processing
- Kafka consumer/producer with proper error handling
- Batch processing with configurable sizes
- Dead letter queue support
- Loop prevention mechanism
- Graceful shutdown handling
- Comprehensive logging with correlation IDs
### 2. Salesforce Translator (`translators/salesforce_translator.py`)
- **Bidirectional translation**: Domain ↔ Salesforce
- **Field mappings**: Account, Contact, Job objects
- **Business rules**: Required field validation, data normalization
- **Action generation**: Create, Update, Delete operations
- **Loop prevention**: Ignores Salesforce-originated events
### 3. MSSQL Translator (`translators/mssql_translator.py`)
- **Bidirectional translation**: Domain ↔ MSSQL CDC
- **Field mappings**: Database table field mappings
- **Business rules**: Data validation, type conversion
- **SQL operations**: Insert, Update, Delete with where clauses
- **Data normalization**: Phone numbers, email addresses
### 4. Configuration (`config.py`)
- **Environment-driven configuration**
- **Kafka settings**: Bootstrap servers, topics, consumer groups
- **Translator settings**: Retry logic, batch sizes, timeouts
- **Field mappings**: Configurable for all object types
- **Business rules**: Toggleable validation and rules
- **Logging configuration**: Structured logging with correlation IDs
### 5. Comprehensive Testing (`tests/`)
- **Unit tests**: Individual component testing
- **Integration tests**: End-to-end event processing
- **Configuration tests**: Environment variable handling
- **Mock-based testing**: Isolated component testing
- **Async test support**: Proper async/await testing
- **Coverage reporting**: Test coverage metrics
### 6. Additional Features
- **Health check service** (`health_check.py`) for monitoring
- **Comprehensive documentation** (README.md)
- **Environment configuration** (.env.example)
- **Docker support** with multi-stage builds
- **Pytest configuration** for streamlined testing
## 🚀 Technical Highlights
### Performance & Scalability
- **Async processing**: Non-blocking I/O operations
- **Batch processing**: Configurable batch sizes for efficiency
- **Horizontal scaling**: Kafka consumer group support
- **Resource efficiency**: Proper connection management
### Reliability & Monitoring
- **Error handling**: Comprehensive exception handling
- **Dead letter queue**: Failed event recovery
- **Retry logic**: Configurable retry attempts with backoff
- **Health checks**: Kubernetes-ready liveness/readiness probes
- **Structured logging**: Correlation ID tracking
### Maintainability
- **Configuration-driven**: All mappings and rules configurable
- **Modular design**: Separate translators for each system
- **Type hints**: Full type annotation throughout
- **Documentation**: Comprehensive inline and external docs
- **Test coverage**: High test coverage for reliability
## 📊 Implementation Statistics
- **Lines of Code**: ~2,400+ lines
- **Test Files**: 4 comprehensive test suites
- **Test Cases**: 60+ individual test methods
- **Configuration Options**: 25+ environment variables
- **Field Mappings**: 30+ field mappings per system
- **Business Rules**: Multiple validation and normalization rules
## 🔄 Integration Points
### Input Sources
- **Domain Events Topic**: Consumes from `domain-events` Kafka topic
- **Event Types**: Account, Contact, Job Created/Updated/Deleted events
### Output Destinations
- **Salesforce Actions Topic**: Produces to `salesforce-actions` topic
- **MSSQL Actions Topic**: Produces to `mssql-actions` topic
- **Dead Letter Queue**: Failed events to `event-translator-dlq` topic
### Dependencies
- ✅ **Domain Events Schema** (M1) - Successfully integrated
- ✅ **Kafka Infrastructure** - Properly configured
- ✅ **sf-eventstore** (M2) - Ready for integration
- ✅ **sf-source-adapter** (M3) - Ready for integration
## 🧪 Testing Strategy
### Test Coverage
- **Unit Tests**: Individual translator logic
- **Integration Tests**: End-to-end event processing
- **Configuration Tests**: Environment handling
- **Error Scenario Tests**: Exception handling
- **Mock-based Tests**: External dependency isolation
### Quality Assurance
- **Type checking**: Full type annotations
- **Linting**: Code style compliance
- **Documentation**: Comprehensive docstrings
- **Error handling**: Graceful failure management
## 🚢 Deployment Ready
### Docker Support
- **Multi-stage build**: Optimized image size
- **Production configuration**: Environment-based settings
- **Health checks**: Built-in monitoring endpoints
- **Resource optimization**: Efficient dependency management
### Kubernetes Ready
- **Health probes**: Liveness and readiness endpoints
- **Configuration**: Environment variable driven
- **Scaling**: Horizontal pod autoscaling support
- **Monitoring**: Structured logging and metrics
## 🎉 Milestone Achievement
**MILESTONE 4 is COMPLETE** with all acceptance criteria fulfilled:
✅ **Complete service architecture** from empty directory
✅ **Bidirectional translation** for all entity types
✅ **Production-ready implementation** with proper error handling
✅ **Comprehensive testing** with high coverage
✅ **Full containerization** and deployment readiness
✅ **Integration capability** with other system components
## 🔄 Next Steps
This completes the **critical path** component for bidirectional synchronization. The service is now ready for:
1. **Integration with Milestone 5**: End-to-end testing
2. **Deployment**: Production environment setup
3. **Monitoring**: Performance and health monitoring
4. **Scaling**: Horizontal scaling as needed
## 📝 Notes
- All code follows established patterns from existing implementations
- Configuration is fully environment-driven for different environments
- Error handling includes proper dead letter queue management
- The service is designed for high availability and scalability
- Full documentation provided for maintenance and operations
**Status**: ✅ **MILESTONE 4 COMPLETED SUCCESSFULLY**
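Stripped of the mappings and error handling, the core of the service is a consume-translate-produce loop. An illustrative sketch — the topics match the summary above, but the code itself is a placeholder, not the generated implementation:

```python
# Illustrative sketch of the Event Translator's core loop as summarized
# above: consume domain events, translate per target system, produce
# system-specific actions, and route failures to a dead letter queue.
# Not the actual generated code; broker address is an assumption.
import asyncio
import json

from aiokafka import AIOKafkaConsumer, AIOKafkaProducer

TOPIC_IN = "domain-events"
ROUTES = {"salesforce": "salesforce-actions", "mssql": "mssql-actions"}
DLQ = "event-translator-dlq"

def translate(event: dict, target: str) -> dict:
    # The real service applies field mappings, business rules, and
    # loop prevention here; this stub only shapes the action envelope.
    return {"target": target, "op": event["type"], "data": event["payload"]}

async def run() -> None:
    consumer = AIOKafkaConsumer(
        TOPIC_IN, bootstrap_servers="localhost:9092",
        group_id="event-translator",
    )
    producer = AIOKafkaProducer(bootstrap_servers="localhost:9092")
    await consumer.start()
    await producer.start()
    try:
        async for msg in consumer:
            event = json.loads(msg.value)
            for target, topic in ROUTES.items():
                try:
                    action = translate(event, target)
                    await producer.send_and_wait(
                        topic, json.dumps(action).encode())
                except Exception:
                    # Failed translations go to the DLQ for recovery.
                    await producer.send_and_wait(DLQ, msg.value)
    finally:
        await consumer.stop()
        await producer.stop()

asyncio.run(run())
```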
(Score: 3, Funny) by JoeMerchant on Sunday June 08, @11:05PM (4 children)
Yeah, in 1991 I watched a human consultant take home my annual salary for two months of bumbling about an RTOS implementation of an RS232 interface that never worked - while I got the DOS software version of it stable and functional within less than 2 weeks (and we used that DOS version for the next 5 years until we finally licensed an RS232 library from an external vendor for a couple hundred bucks - something that didn't exist in 1991 when our consultant was bumbling around).
So, Anthropic Claude 4.0's free mode is quite a bit more impressive than Google at generating Rust code. Five rounds of prompts and it came up with code that not only compiled, but also works for the desired function. I can likely tweak it from here to what I want on Monday in an hour or two, but the initial example is already interacting with our system in an obviously functional manner.
Thanks for the pointer!
🌻🌻🌻 [google.com]
(Score: 3, Interesting) by kolie on Monday June 09, @12:48AM (3 children)
Systems where you write the entire prompt, and that's the interaction method, are useful for pair programming - and that's about the extent of most people's use of AI right now.
To get the results I have, it's a full agentic workflow where different modes, functions, and workflows are created beforehand. An overall task is structured and given to an orchestration mode that knows the rules and workflows, and that delegates the work out to leads and specialists. Think of it as having multiple "chats" open - each chat seeded with an identity and rules to follow (a master mode prompt) - plus an orchestrator that can clone up those primed chats at will, inject workflow into them, have a back and forth, and consistently bounce the work between them.
That's also combined with tools like reading files, writing to checklists/todo lists, executing commands, etc.
There are off-the-shelf workflows available - that's how I got started with them - and I've taken my experience with those, crafted my own variant from scratch, and tweaked and molded it based on my observations of and issues with those systems. I now have something that I use daily to drive a few different projects forward.
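As a toy skeleton, the pattern looks something like this — identities, the model client, and the task format are all placeholders, not my actual workflow:

```python
# Toy skeleton of the orchestrator pattern described above: each "mode"
# is a chat seeded with an identity prompt; the orchestrator spins up
# primed chats and delegates subtasks between them. Everything here is
# a placeholder, not a real setup; `model` is a hypothetical client.
from dataclasses import dataclass, field

MODES = {
    "architect": "You plan work. Break tasks into small, checkable steps.",
    "coder": "You implement one step at a time and report diffs.",
    "reviewer": "You check work against the task's acceptance criteria.",
}

@dataclass
class Chat:
    identity: str                       # the master mode prompt
    history: list[dict] = field(default_factory=list)

    def send(self, model, content: str) -> str:
        self.history.append({"role": "user", "content": content})
        reply = model.complete(system=self.identity, messages=self.history)
        self.history.append({"role": "assistant", "content": reply})
        return reply

def orchestrate(model, task: str) -> str:
    # Clone up one primed chat per mode, then bounce work between them.
    chats = {name: Chat(identity) for name, identity in MODES.items()}
    plan = chats["architect"].send(model, f"Plan this task:\n{task}")
    work = chats["coder"].send(model, f"Implement step 1 of:\n{plan}")
    verdict = chats["reviewer"].send(model, f"Review against the plan:\n{work}")
    return verdict
```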
(Score: 2) by kolie on Monday June 09, @12:51AM (2 children)
I will also mention that Google's Gemini 2.5 is right up there with Claude 3.7/4.0 - they have trade-offs. I have different modes for different agentic identities now. Claude seems better at solving a problem where Gemini might spin its wheels trying to fix something obvious; Claude will get around it in 4 prompt sequences.
Gemini seems better at following workflows longer term, and its larger context seems to help with some tasks where Claude peters out (although proper task delegation and granularization largely makes this moot). Its thoughts, architecture, and planning are very thorough; Claude's, however, are very creative and good.
(Score: 2) by JoeMerchant on Monday June 09, @03:10AM (1 child)
> Claude seems better at solving a problem where Gemini might spin its wheels trying to fix something obvious; Claude will get around it in 4 prompt sequences.
That was exactly my experience - Gemini would screw things up, you'd ask it to fix them and it would just give you more screwed up stuff that doesn't even compile.
Claude took the query about the error, asked for more specific detail while attempting to fix it (and failing), then when I provided the detail it was asking for it successfully fixed the error and I have working code. Any shortcomings in the code at this point seem to be due to my lack of specificity in the prompting.
Now... work has gotten us all "Co-pilot" licenses. I'm not sure whether I even want to try this in Co-pilot, or just keep pursuing the avenue that already seems to work.
🌻🌻🌻 [google.com]
(Score: 2) by kolie on Monday June 09, @03:38AM
Co-pilot is good for very specific task delegation. I see a lot of benefit when I think of it as a smart refactor, or a quick way to do stuff at the file/class/function level. Beyond a simple single-request, single-response interaction, without specific AI workflows/agentic prompts, it's not something I've been able to convince myself is useful for building anything larger than that yet.
(Score: 3, Insightful) by SomeGuy on Saturday June 07, @09:00PM
Next up, CTO gets fired for not sucking enough AI hype dick, in 5... 4... 3... 2...