I used to trigger a Perl script on Friday evening, which would run for about 60 hours. It controlled a test tool that made phone calls, changed the configuration about once per minute to vary the nature of those calls, and accumulated a file reporting results for each configuration. That sounds like an automated test, but local custom classified it as manual because it was not run using the standard test framework, which among other things reported results into the test management system.
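The shape of that script was nothing fancy. Here is a compressed sketch of the idea, not the original: the tool name (`calltool`), its options, and the configuration knobs are all invented, and the real run cycled through far more configurations over the weekend.

```perl
#!/usr/bin/perl
# Sketch of a weekend driver: step through configurations, let the call
# generator run for a while on each, and append a result line per configuration.
# "calltool" and its flags are placeholders, not a real tool.
use strict;
use warnings;

my @configs = (
    { codec => 'g711', calls_per_min => 50  },
    { codec => 'g729', calls_per_min => 50  },
    { codec => 'g711', calls_per_min => 200 },
    { codec => 'g729', calls_per_min => 200 },
);

open my $report, '>>', 'weekend_results.txt' or die "report file: $!";

foreach my $cfg (@configs) {
    # Point the (hypothetical) call generator at the next configuration.
    system('calltool', '--codec', $cfg->{codec},
           '--calls-per-min', $cfg->{calls_per_min}) == 0
        or warn "calltool failed for $cfg->{codec}/$cfg->{calls_per_min}\n";

    sleep 60;    # let this configuration run before moving on

    # Append whatever summary the tool produced for this configuration.
    my $summary = `calltool --summary`;
    defined $summary or $summary = "no summary\n";
    print {$report} scalar(localtime), " $cfg->{codec} $cfg->{calls_per_min}: $summary";
}
close $report;
```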
When testing a telephony server, the question isn’t “automated or not”, it is “what, and how much, are you automating”. You have to test with tools, and those usually are not actual phones. OK, if the product is some sort of Interactive Voice Response system (“If you know your party’s extension, please enter it now. For appointments, press 1 …”), you can do a lot of testing of that IVR UI with a phone. But if the test target is something like “does our product correctly identify a particular protocol header, process valid data, and detect and handle invalid data without crashing the application or allowing a security violation”, you’re going to be writing test tool scripts. Depending on exactly what you’re doing and why, you might stop at writing and running those tool scripts, or you might write other scripts to figure out whether the right thing happened after each test tool run, and another script to run all of those scripts. Then you get into testing framework territory, and have one-touch execution to configure your SUT first, then run each tool script and its results checker in turn, and maybe write test case results to your test management system; a sketch of that layer follows.
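To make the layering concrete, here is a minimal sketch of that “one-touch” runner, assuming a very simple convention: every script name (`configure_sut.pl`, the `run_*.pl` tool scripts, the `check_*.pl` results checkers) is invented, and a real framework would also push each verdict into the test management system rather than just printing it.

```perl
#!/usr/bin/perl
# Rough sketch of a one-touch runner: configure the SUT, then run each
# test tool script and its results checker in turn, recording a verdict.
# All script names are placeholders for site-specific scripts.
use strict;
use warnings;

my @cases = (
    { name => 'valid_header',   tool => './run_valid_header.pl',   check => './check_valid_header.pl' },
    { name => 'invalid_header', tool => './run_invalid_header.pl', check => './check_invalid_header.pl' },
);

system('./configure_sut.pl') == 0 or die "SUT configuration failed\n";

foreach my $case (@cases) {
    system($case->{tool});                        # drive the test tool
    my $verdict = system($case->{check}) == 0     # checker's exit status decides pass/fail
                ? 'PASS' : 'FAIL';
    print "$case->{name}: $verdict\n";
    # A fuller framework would report $verdict to the test management system here.
}
```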
Automated results checking is generally much harder to get right than automated test execution. If for whatever reason you can’t write a good automated results checker, consider the consequences of providing a bad one. Reporting failures that aren’t failures will likely get your test junked pretty quickly, and reporting a pass when it shouldn’t, such as “got the expected log message, but didn’t notice that the application then dumped core” – well, that’s really bad. Maybe you need help to write good results checking. Expert advice, or at least someone else to take a look? A tool you don’t have? More time allowed to work on it? Maybe your test is valuable enough to make automated execution with manual verification worth doing, at least for a while.
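As an illustration of guarding against that “passed, but dumped core” trap, here is a sketch of a checker that looks for the expected log line and also confirms the application survived. The log pattern, file paths (`app.log`, `app.pid`, `/var/crash`), and the idea of a pidfile are all assumptions for the example, not details from any particular product.

```perl
#!/usr/bin/perl
# Sketch of a results checker that does more than grep for the expected
# log message: it also verifies the application process is still alive
# and that no core file appeared. Paths and the log pattern are invented.
use strict;
use warnings;

sub check_results {
    my ($logfile, $pidfile, $coredir) = @_;

    open my $log, '<', $logfile or return (0, "cannot read $logfile: $!");
    my $saw_expected = grep { /call completed: 200 OK/ } <$log>;
    close $log;
    return (0, 'expected log message not found') unless $saw_expected;

    # The expected message alone is not enough: confirm the process survived.
    open my $pf, '<', $pidfile or return (0, "cannot read $pidfile: $!");
    chomp(my $pid = <$pf>);
    close $pf;
    return (0, "application (pid $pid) is no longer running") unless kill 0, $pid;

    my @cores = glob("$coredir/core*");
    return (0, "core file(s) found: @cores") if @cores;

    return (1, 'ok');
}

my ($ok, $why) = check_results('app.log', 'app.pid', '/var/crash');
print $ok ? "PASS\n" : "FAIL: $why\n";
exit($ok ? 0 : 1);
```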