Most demand generation agency evaluations end with the wrong decision.
Not because the buyer was careless. Because they asked the wrong questions. They evaluated the capabilities deck, the case study design, the founder's LinkedIn presence, and whether the account team seemed like people they would enjoy working with. None of those inputs predict whether the agency will produce pipeline.
This listicle is for demand generation leaders, marketing operations executives, and the teams building mid-market B2B tech, enterprise ABM, and Fortune 1000 programs. It gives you the 11 questions that actually separate credible agencies from expensive ones. Each question includes what a strong answer looks like and the red flags that signal you should keep looking.
Use these questions in the first two calls. Do not wait for a proposal. The answers you need are available in conversation, and the strongest agencies will welcome the directness.
How to Use This List
Each question is designed to reveal one specific thing about how the agency operates. The red flags are not hypothetical. They are the responses that routinely precede failed engagements.
A note on context: the right answer varies slightly depending on whether you are evaluating an agency for a mid-market SaaS program, an enterprise ABM motion, or a Fortune 1000 buying committee engagement. Where the context changes the answer, that is noted.
Question 1: "What is your primary success metric for a demand generation engagement?"
What you are testing: Whether the agency measures its own performance against revenue outcomes or activity outputs.
What a strong answer looks like: "Marketing-sourced pipeline and pipeline influenced." The agency should be able to name the specific metrics it tracks, describe how attribution is captured, and explain how those metrics are reported to the client on a defined cadence. Bonus if they distinguish between pipeline sourced (marketing as first touch) and pipeline influenced (marketing touchpoint during an active deal).
What a weak answer looks like: Any answer that centers MQL volume, lead delivery, email engagement rates, or campaign impressions as the primary success metric. These are inputs. They are not outcomes. An agency that defines success as "we delivered 300 MQLs" is not aligned with your pipeline target.
Red flag: "It depends on what metrics matter to you." This response is designed to sound client-centric. It is actually a signal that the agency has no native outcome orientation. You want an agency that arrives with a point of view on what matters. Not one that mirrors your preferences back to you.
Question 2: "Walk me through a recent engagement where you can show pipeline contribution numbers."
What you are testing: Whether the agency can cite actual pipeline data from client work, not just campaign performance metrics.
What a strong answer looks like: A specific engagement with a specific pipeline contribution number, a description of the attribution methodology used, the time frame, and what the client's pipeline looked like before and after. The answer does not need to name the client. It needs to name the number.
What a weak answer looks like: "We ran a campaign for a B2B software company that generated over 400 leads and drove significant engagement." Engagement is not pipeline. Leads are not pipeline. Significant is not a number.
Red flag: The agency redirects to awards, certifications, or a case study that describes program execution without citing pipeline outcomes. If the best evidence of their work is a Drum Award and a description of creative strategy, you are looking at the wrong agency.
Question 3: "How do you handle the transition from marketing-qualified to sales-accepted?"
What you are testing: Whether the agency has operational depth in lead handoff, or whether their work stops at the MQL and leaves the conversion problem to the client.
What a strong answer looks like: A defined process for MQL routing, SLA standards for follow-up timing, and either a track record of working with the client's sales team to calibrate scoring models, or a protocol for surfacing MQL rejection data and iterating on scoring. The agency should have an opinion on what a good MQL rejection rate looks like and how they adjust programs when rejection runs high.
What a weak answer looks like: "We deliver the leads to your CRM and then it is up to your sales team." This is where most demand generation agency engagements fail silently. If the agency has no involvement or opinion about what happens after the MQL, they are running campaigns. They are not running demand generation programs.
Red flag for Fortune 1000 buyers: If the agency cannot describe experience with complex routing logic, territory assignments, or multi-BU handoff protocols, they have not operated at enterprise scale.
Question 4: "What does your ABM program design process look like, and how do you build and maintain the target account list?"
What you are testing: Whether the agency has genuine ABM operational capability or is running account-targeted campaigns with an ABM label.
What a strong answer looks like: A defined process for ICP development, account tiering (Tier 1, 2, 3 with different investment levels and tactics per tier), intent data integration for account prioritization, and a maintenance protocol for the account list including how accounts enter, move between tiers, and exit. The agency should name the intent data platforms they work with.
What a weak answer looks like: "We use LinkedIn to target specific companies and job titles." That is account-targeted advertising. It is not ABM. The distinction matters because ABM requires buying committee mapping, multi-channel orchestration, and account-level performance measurement that LinkedIn targeting alone cannot provide.
Red flag for enterprise ABM buyers: If the agency cannot describe their 6sense, Bombora, or G2 integration process, or if they claim ABM experience but have no intent data platform partnerships, they are not operating at enterprise ABM maturity.
Question 5: "How do you approach buying committee coverage for enterprise deals?"
What you are testing: Whether the agency understands that Fortune 1000 deals involve 8 to 12 stakeholders and has programs designed to reach multiple personas simultaneously, or whether their programs are built around a single decision-maker.
What a strong answer looks like: A description of persona mapping across the buying committee (economic buyer, technical buyer, champion, end user, procurement), content mapping by persona and stage, and channel strategy that reaches different committee members through different channels. The agency should have a view on how buying committee coverage affects deal velocity and conversion rates.
What a weak answer looks like: "We focus on the CMO and then let sales handle the rest of the organization." This model may work for a simple mid-market SaaS sale. It does not work for a Fortune 1000 buying process with a 9-month sales cycle and a committee that includes IT, Legal, Finance, and three levels of marketing leadership.
Red flag: The agency's case studies only mention one persona. Buying committee coverage leaves a visible trace in the work. If it is not there, they are not doing it.
Question 6: "What MarTech platforms do you work with, and how do you integrate your programs with our existing stack?"
What you are testing: Whether the agency is vendor-neutral and can work inside your existing MAP and CRM, or whether they require you to adopt their preferred tools to run programs.
What a strong answer looks like: Demonstrated experience with your specific MAP (Marketo, HubSpot, Pardot, Eloqua), your CRM (Salesforce, HubSpot CRM), and your intent data platform. The agency should describe a specific integration process: how campaign data flows into your CRM, how touchpoints are captured for attribution, and how reporting connects to your existing dashboards.
What a weak answer looks like: "We have our own demand generation platform that your team would use to access campaign performance." This means your data lives in their system, not yours. When the engagement ends, the data does not transfer cleanly. Your attribution history disappears.
Red flag: Any agency that requires you to adopt a new tool as a condition of engagement without a compelling reason tied to your specific constraint is optimizing for their operational convenience, not yours.
Question 7: "What is your minimum viable program scope, and what would you recommend for a company at our stage?"
What you are testing: Whether the agency will right-size the engagement to your actual situation, or whether their minimum scope is designed around their revenue model.
What a strong answer looks like: A specific description of what they consider the minimum viable program for your company size, revenue stage, and primary constraint. A mid-market SaaS company at $30 million ARR needs a different program than a Fortune 1000 division running a $50 million quota. A credible agency knows the difference and will tell you if you are too early for their full program.
What a weak answer looks like: A single scope tier regardless of company size, or a proposal that looks identical to what they sold to a client twice your size. Generic scoping is a signal that the agency is selling a product, not designing a program.
Red flag: If the agency cannot tell you what a program at your stage typically produces in pipeline within 90 days, they are not operating with enough outcome data to calibrate recommendations. They are guessing.
Question 8: "How do you measure and report on Fortune 1000 account engagement, and what does a weekly reporting cadence look like?"
What you are testing: Whether the agency's reporting infrastructure is built for enterprise accountability, or whether reporting is a monthly PDF that tells you what happened without explaining why.
What a strong answer looks like: A weekly pipeline contribution summary (opportunities sourced and influenced, by program and channel), an account engagement report showing buying committee penetration at the account level, and a defined escalation process when performance falls below benchmarks. The agency should describe who delivers the report, how quickly questions are answered, and what the escalation path looks like.
What a weak answer looks like: "We send a monthly dashboard with all your campaign metrics." Monthly reporting for a Fortune 1000 program means you discover problems 30 days after they start. By the time the report arrives, you have lost a month of pipeline.
Red flag: The agency cannot show you a sample reporting template before you sign. If they cannot show you what the reports look like, you do not know what you are buying.
Question 9: "Who specifically will be working on our account, and what is their experience level?"
What you are testing: Whether the senior practitioners who pitched the engagement will be working it, or whether the account will be staffed by junior team members after the contract is signed.
What a strong answer looks like: Named individuals with titles, experience summaries, and relevant client references at a comparable company size and industry. The agency should be willing to put named consultant assignment in the contract with a provision requiring your approval for any change.
What a weak answer looks like: "Our team will be assembled once the engagement begins." Or: "You will be working with our enterprise pod, which includes strategists, campaign managers, and analysts." These are descriptions of an org structure, not of the people who will be accountable for your pipeline.
Red flag: Any agency that resists naming the lead account manager before the contract is signed has staffing uncertainty, capacity issues, or a model that relies on junior execution regardless of what the pitch team implied.
Question 10: "How do you handle underperformance, and what remedies exist if pipeline targets are not met?"
What you are testing: Whether the agency has a defined performance governance process, or whether the contract protects them from accountability while you absorb the risk of underperformance.
What a strong answer looks like: A defined escalation process with specific triggers (pipeline below X percent of target for Y consecutive weeks), a root cause analysis protocol, a formal program adjustment process, and a contract structure that includes performance review milestones with defined remedies.
What a weak answer looks like: "We are committed to your success and will work hard to optimize performance." Commitment language is not a governance structure. It is a way of avoiding one.
Red flag for mid-market SaaS buyers: If the contract contains no performance milestones and the only exit mechanism is a 60- or 90-day notice clause, the agency's downside risk is zero. Yours is the full contract value.
Question 11: "Can you describe an engagement that did not work and what you learned from it?"
What you are testing: Whether the agency has the self-awareness and intellectual honesty to describe a failure, or whether every engagement in their history was a success.
What a strong answer looks like: A specific engagement where results fell short, a clear-eyed explanation of why (not a version where external factors were entirely responsible), and a description of what changed in their process as a result. Agencies that can tell this story with specificity are operating with the kind of accountability culture that produces good engagements.
What a weak answer looks like: "We have been very fortunate that our engagements have generally performed well." Or a failure story where every contributing factor was outside the agency's control: the client did not have the right sales team, the market was difficult, the timing was wrong.
Red flag: Any agency that cannot cite a single failure in their history is either too new to have a track record, or is performing for the evaluation rather than engaging with it honestly. Both are disqualifying signals.
The Shortlisting Decision Matrix
Use these 11 questions across three shortlisted agencies. Score each answer on a simple scale: strong answer (2 points), acceptable answer (1 point), weak answer or red flag (0 points). Maximum score is 22.
An agency scoring below 14 should not advance to proposal stage regardless of other factors. An agency that scores well on Questions 1, 2, 9, and 10 but poorly on the others is operationally accountable but strategically thin. An agency that scores well on Questions 4, 5, and 6 but poorly on Questions 1, 2, and 10 is tactically capable but not outcome-oriented.
The highest-value agencies score 18 or above and have no red flags on Questions 1, 2, 9, or 10.
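As a rough illustration, the scoring rubric above can be expressed as a short function. This is a sketch, not part of any tool: the function name, the `red_flags` parameter, and the verdict labels are ours, while the point values and thresholds come straight from the matrix.

```python
# Sketch of the shortlisting decision matrix.
# Per-question score: 2 = strong, 1 = acceptable, 0 = weak answer or red flag.

ACCOUNTABILITY_QUESTIONS = {1, 2, 9, 10}  # red flags here disqualify outright


def evaluate_agency(scores, red_flags=()):
    """Return (total, verdict) for one shortlisted agency.

    scores: dict mapping question number (1-11) to 0, 1, or 2.
    red_flags: question numbers that drew a red-flag response.
    """
    assert set(scores) == set(range(1, 12)), "score all 11 questions"
    total = sum(scores.values())  # maximum possible is 22

    if total < 14:
        verdict = "do not advance to proposal stage"
    elif total >= 18 and not (ACCOUNTABILITY_QUESTIONS & set(red_flags)):
        verdict = "highest-value candidate"
    else:
        verdict = "advance with caution"
    return total, verdict
```

For example, an agency with strong answers across the board scores 22 and qualifies as a highest-value candidate; one with uniformly acceptable answers scores 11 and does not advance.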
FAQ
What is the difference between a demand generation agency and a lead generation agency? A lead generation agency delivers contacts who meet a defined profile. A demand generation agency builds and runs the programs that move those contacts through the revenue funnel from awareness to pipeline contribution. Lead generation is a component of demand generation. An agency that describes its core output as leads is a lead generation agency regardless of what they call themselves. Require pipeline and revenue metrics to verify which one you are actually talking to.
How many demand generation agencies should I shortlist? Three. A five- or seven-firm evaluation produces proposals optimized for RFP compliance rather than program design. Three firms, selected using the question framework above, produce sharper proposals, better conversations, and a clearer eventual decision.
What should a demand generation agency engagement cost for mid-market SaaS? Retainers for mid-market B2B SaaS demand generation programs typically run $8,000 to $25,000 per month depending on program complexity, channel mix, and whether the agency manages media spend. Engagements that include ABM infrastructure build, intent data integration, and full-funnel content creation sit at the higher end. Engagements limited to campaign execution on an existing infrastructure sit at the lower end. Do not anchor the investment conversation to a percentage of marketing budget. Anchor it to the value of the pipeline gap you are trying to close.
What should a Fortune 1000 demand generation agency engagement cost? Fortune 1000 ABM and demand generation programs typically run $20,000 to $60,000 per month, with media spend managed on top of the retainer. Multi-BU programs with buying committee coverage across multiple geographies sit at the higher end. Programs scoped to a single product line or geography sit at the lower end.
How long before a demand generation agency engagement produces pipeline results? A well-scoped engagement with clean CRM data, an operational MAP, and a defined ICP typically produces first pipeline attribution within 60 to 90 days. Full program maturity, where marketing is consistently contributing 20 to 30 percent of qualified pipeline, takes 6 to 9 months. Agencies that promise meaningful pipeline results in 30 days are either working from a pre-existing warm database you do not know about, or they are describing lead delivery rather than pipeline contribution.
What is the most important contract term in a demand generation agency engagement? Named account team assignment with client approval rights over any change. The quality of a demand generation engagement is directly proportional to the quality of the people working it. Senior practitioners close deals. Junior staff run accounts. The gap between those two outcomes is the most common source of demand generation agency disappointment. Require named individuals before you sign.
How do I know if my organization is ready to work with a demand generation agency? Three requirements must be in place. First, a MAP and CRM that can capture campaign touchpoints and route leads to sales. Second, a defined ICP that marketing and sales have agreed on. Third, a sales team with the capacity and process to follow up on marketing-generated opportunities within a defined SLA. An agency cannot produce pipeline results without all three of these in place. If any are missing, fix them before engaging an agency. Spending on demand generation without the infrastructure to convert it is waste, not investment.
The Pedowitz Group has helped B2B organizations generate over $25 billion in marketing-sourced revenue since 2006. Learn more at pedowitzgroup.com.