INFEDU Casebook
The INFEDU Casebook is a curated library of generalized recurring editorial patterns. It does not replace the formal author rules on How to Write for INFEDU. Instead, it shows how INFEDU applies those rules when manuscripts fall into edge configurations that often create overclaiming, underreporting, or evidence-chain confusion.
Important: These cases are generalized and depersonalized. They are educational examples, not a decision predictor. They do not reveal confidential reviewer or editor material and should not be read as a hidden acceptance/rejection formula.
Use this page when your manuscript matches one of the recurring evidence configurations described in the cases below and you want to see how the formal author rules apply to that configuration.
Use the core page, How to Write for INFEDU, when you need the stable author rules that apply to every submission.
Fixed case template used below
Every case is presented in the same structure: When this case applies → What INFEDU expects → What is not enough → Minimal repair path → Mini self-check.
Cases included in this starter version
- Framework papers via expert review / walkthrough / feasibility
- Hybrid empirical + measurement papers
- Multi-source assessment reliability / evidence-access studies
- AI/NLP output-quality papers
- Entry-diagnostic / baseline-profile studies
- AI-assisted programming case studies / small-scale mixed-method papers
- Verbal protocol / think-aloud studies
- Foundational proxy-artifact studies
- Computing-education reviews using adjacent literature comparatively
- Narrow-topic reviews with layered evidence bases
How authors should use the Casebook
- Identify the case that is closest to your manuscript’s evidence configuration, not just its topic.
- Read the What INFEDU expects and What is not enough parts together.
- Use the Minimal repair path as a revision checklist before submission.
- Return to How to Write for INFEDU for the stable rules.
Design, measurement, and evaluation-pattern cases
Framework papers evaluated through expert review, cognitive walkthrough, or feasibility testing
Case ID: C01_expert_review_formative_evaluation | Typical subtype(s): Methodological / measurement; Theoretical / conceptual; Design & evaluation | Typical risk: Formative evidence is presented as if it established effectiveness, full validation, or deployment readiness.
When this case applies
- The manuscript introduces a framework, rubric, method, system, or design rationale and evaluates it mainly through expert judgment, walkthrough, heuristic evaluation, or feasibility testing.
- The contribution is early-stage and the evidence is primarily formative rather than learner-outcome based.
What INFEDU expects
- State the correct subtype and describe the contribution as formative, methodological, conceptual, or feasibility-oriented.
- Map each criterion or construct to the evaluation method, participants, protocol, and the exact level of inference it justifies.
- Provide reviewable materials: rubric, checklist, walkthrough protocol, coding rules, prompts, or task materials.
- Use an agreement or reliability approach that matches the scale level and report its limitations (a minimal sketch follows this list).
- Describe qualitative analysis transparently if comments, interviews, or think-aloud data are used.
- Report ethics basis, consent, and privacy safeguards when feedback from people is used as research data.
- If an AI component is evaluated, report model/provider/version, access mode, prompts, and run conditions.
- End with bounded next-step validation: what is shown now and what still requires later empirical study.
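To make the agreement item above concrete, the sketch below contrasts plain Cohen's kappa (suitable for nominal codes) with a quadratic-weighted variant (which respects the ordering of rubric levels). It is a minimal illustration written from scratch in Python; the expert ratings and rubric levels are hypothetical, and any standard statistics package could be used instead.

```python
import numpy as np

def weighted_kappa(r1, r2, categories, weights="quadratic"):
    """Cohen's kappa between two raters.

    weights=None        -> plain kappa (nominal codes)
    weights="quadratic" -> quadratic-weighted kappa (ordered rubric levels)
    """
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}

    # Observed joint distribution of the two raters' categories.
    obs = np.zeros((k, k))
    for a, b in zip(r1, r2):
        obs[idx[a], idx[b]] += 1
    obs /= obs.sum()

    # Expected joint distribution under independent marginals.
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))

    i, j = np.indices((k, k))
    if weights == "quadratic":
        w = ((i - j) ** 2) / (k - 1) ** 2   # larger penalty for distant levels
    else:
        w = (i != j).astype(float)          # every disagreement weighted equally

    return 1 - (w * obs).sum() / (w * exp).sum()

# Hypothetical ordinal rubric scores (levels 1-4) from two experts.
expert_a = [1, 2, 2, 3, 4, 3, 2, 1, 4, 3]
expert_b = [1, 2, 3, 3, 4, 2, 2, 1, 3, 3]
levels = [1, 2, 3, 4]

print("plain kappa:    ", round(weighted_kappa(expert_a, expert_b, levels, weights=None), 3))
print("quadratic kappa:", round(weighted_kappa(expert_a, expert_b, levels, "quadratic"), 3))
```

Reporting which variant was used, and why it matches the scale level, is usually enough for reviewers to judge the agreement evidence.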
What is not enough
- Expert agreement alone is not evidence of learning effectiveness, educational impact, psychometric validity, or broad deployment readiness.
- Threshold-passing ratings are not equivalent to full validation.
- “Ethics: not applicable” is not appropriate when identifiable human feedback or recordings were used.
Minimal repair path
- Reframe claims around coherence, implementability, usability, transparency, or feasibility.
- Attach or append the full evaluation materials.
- Move stronger impact or validity claims into future-work language unless later-stage evidence exists.
Mini self-check
- Have we stated exactly what kind of claim this evidence can support?
- Can a reviewer audit the full rubric/protocol/materials?
- Does the conclusion distinguish present evidence from later validation needs?
Hybrid empirical + measurement papers
Case ID: C02_hybrid_empirical_measurement | Typical subtype(s): Empirical study; Methodological / measurement | Typical risk: A paper mixes an intervention claim with an instrument-validation claim without making the primary contribution or evidence chain explicit.
When this case applies
- The paper combines a classroom/intervention or comparison study with the introduction, adaptation, or validation of a scale, rubric, or similar instrument.
- The manuscript makes both measurement claims and empirical/intervention claims.
What INFEDU expects
- State whether the paper is primarily empirical or primarily methodological / measurement.
- Separate the measurement evidence chain from the empirical/intervention evidence chain.
- Name the outcome precisely: self-reported competence, self-efficacy, beliefs, or attitudes should not be described as direct performance unless performance was measured.
- Provide construct-level results if claims are made about subscales or components.
- Report language of administration, adaptation/translation procedures, and verification steps where applicable.
- Keep intervention implications bounded to what the design can justify.
What is not enough
- A pooled/global model is not enough for construct-specific claims.
- Self-reported competence is not the same as actual teaching skill or objective performance.
- A quasi-experimental design does not automatically justify strong causal language.
Minimal repair path
- Declare a primary subtype and rewrite the abstract and conclusion accordingly.
- Split the results and discussion into two explicit evidence chains.
- Add missing construct-level tables, appendices, or translation details.
Mini self-check
- Could a reviewer see immediately which contribution is primary?
- Have we reported the exact evidence behind each subscale/component claim?
- Have we prevented self-report results from drifting into performance claims?
Multi-source assessment reliability and evidence-access comparison studies
Case ID: C03_multi_source_assessment | Typical subtype(s): Empirical study; Methodological / measurement | Typical risk: A confounded comparison between evidence conditions is interpreted as if one single factor had been isolated.
When this case applies
- The study compares assessment results across raters, evidence-access conditions, scoring modalities, or AI-assisted versus human-assisted conditions.
- The paper focuses on reliability, agreement, evidence completeness, or score differences across design conditions.
What INFEDU expects
- State exactly what is being compared: modality, information access, rater role, rubric, score scale, or another design condition.
- Provide the full design matrix: who rated whom, whether raters were crossed or nested, and whether ratings were aggregated.
- Report the full scoring protocol, rubric, examples, and any prompts or summary documents used by evaluators.
- If a new construct is introduced, distinguish the construct from this study’s operationalization.
- If more than one parameter changed at once, label the result as a composite condition effect unless the design isolates a single factor.
- Explain whether multi-rater statistics describe individual raters or an aggregated rater system.
- If Generalizability Theory is used, report variance components and D-study assumptions/results.
- Report ethics and consent safeguards when minors or school students are involved.
What is not enough
- Near-zero rater variance after aggregation is not proof that individual raters agree independently (see the simulation sketch after this list).
- A design that changes modality, information access, and scale format at the same time cannot support a clean single-factor claim.
- Reliability evidence does not by itself establish learning effects or pedagogical superiority.
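A small simulation makes the first point above visible: when panel means are compared, most individual-rater noise averages out, so aggregated scores can agree almost perfectly even though single raters agree only moderately. The sketch below assumes hypothetical rater counts, noise levels, and sample sizes; the Spearman-Brown formula predicts the same pattern analytically.

```python
import numpy as np

rng = np.random.default_rng(0)
n_students, k_raters = 200, 6

true_score = rng.normal(0, 1, n_students)   # latent quality of each performance
noise_sd = 1.0                              # sizable individual-rater disagreement

def panel(k):
    """Scores from k independent raters: truth plus rater-specific noise."""
    return true_score[:, None] + rng.normal(0, noise_sd, (n_students, k))

panel_a, panel_b = panel(k_raters), panel(k_raters)

# One rater compared with one rater: only moderate agreement.
single_r = np.corrcoef(panel_a[:, 0], panel_b[:, 0])[0, 1]
# Mean of six raters compared with an independent panel mean: far higher agreement.
aggregate_r = np.corrcoef(panel_a.mean(axis=1), panel_b.mean(axis=1))[0, 1]

print(f"single-rater agreement:        r = {single_r:.2f}")
print(f"aggregated 6-rater agreement:  r = {aggregate_r:.2f}")
# Spearman-Brown prediction of the aggregated value from the single-rater value.
print(f"Spearman-Brown prediction:     r = {k_raters * single_r / (1 + (k_raters - 1) * single_r):.2f}")
```

The same logic is why a Generalizability Theory analysis should report the rater variance component itself, not only the reliability of the aggregated score.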
Minimal repair path
- Publish the full design matrix and scoring materials.
- Reframe findings as composite-condition effects when the design is confounded.
- Add missing variance-component or aggregation information.
Mini self-check
- Can a reviewer reconstruct exactly who rated what under which evidence condition?
- Have we separated individual-rater results from aggregated-system results?
- Does the conclusion avoid treating a confounded comparison as a clean modality effect?
AI/NLP educational-content generation papers evaluated mainly by output quality
Case ID: C04_ai_output_quality_proxy | Typical subtype(s): Design & evaluation; Methodological / measurement | Typical risk: Improved generated outputs are presented as evidence of learner comprehension or educational impact without learner-based evidence.
When this case applies
- The manuscript designs, adapts, or compares an AI/NLP system for educational use and evaluates it mainly through expert/rubric-based assessment of outputs.
- Learner outcomes are not measured directly or are secondary.
What INFEDU expects
- State the correct subtype and define the educational problem first.
- Explain why output quality is a justified proxy for the intended educational use.
- Report the full system and comparison protocol: model/provider/version, adaptation method, prompt template, control inputs, and inference conditions.
- Make comparison fairness explicit across systems.
- Provide reviewable evaluation materials: rubric anchors, task/prompt set, prompt-level scores, and representative outputs.
- If manual scoring is used, report scorer roles, blinding/independence where relevant, and reliability limits.
- Check arithmetic consistency across tables, charts, and appendices (a reconciliation sketch follows this list).
- End with bounded next-step validation that states what still requires learner or classroom study.
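One way to satisfy the arithmetic-consistency item above is to recompute the main-table summaries directly from the appendix data before submission. A minimal sketch using pandas, with entirely hypothetical scores and a hypothetical tolerance of 0.05:

```python
import pandas as pd

# Hypothetical prompt-level appendix scores (system, prompt id, rubric score).
appendix = pd.DataFrame({
    "system": ["A"] * 5 + ["B"] * 5,
    "prompt": list(range(1, 6)) * 2,
    "score":  [3.0, 4.0, 3.5, 4.5, 4.0,   2.5, 3.0, 3.5, 3.0, 2.0],
})

# Means as reported in the (hypothetical) main results table.
reported = pd.Series({"A": 3.8, "B": 2.9})

recomputed = appendix.groupby("system")["score"].mean()
diff = (recomputed - reported).abs()

print(pd.DataFrame({"recomputed": recomputed, "reported": reported, "abs_diff": diff}))
print("Inconsistent rows:", list(diff[diff > 0.05].index))  # flags system B: 2.8 recomputed vs 2.9 reported
```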
What is not enough
- Better generated outputs do not by themselves prove learning effectiveness, classroom benefit, or student comprehension.
- A tool description without a well-defined educational problem is not enough.
- Unmatched prompts or system conditions make comparisons uninterpretable.
Minimal repair path
- Reframe the manuscript as output-quality or design-evaluation work.
- Publish the exact evaluation protocol and prompt-level materials.
- Move educational-effectiveness claims into future validation language unless measured directly.
Mini self-check
- Have we justified output quality as a proxy rather than as direct educational effect?
- Could a reviewer see whether the compared systems were tested fairly?
- Do the appendices reconcile exactly with the main results?
Empirical, exploratory, and qualitative-pattern cases
Entry-diagnostic / baseline-profile descriptive studies
Case ID: C05_entry_diagnostic_baseline | Typical subtype(s): Empirical study | Typical risk: A dataset is labeled “baseline” or “entry” even though some measured exposure may already include the current course or programme.
When this case applies
- The paper describes what learners bring into a computing/informatics course or programme at entry.
- The evidence is primarily descriptive, diagnostic, or associational.
What INFEDU expects
- State exactly when the instrument was administered relative to course start, orientation, or any university instruction.
- Separate clearly what was learned before university from what may have been encountered within the current programme.
- If only a subset of a broader intake questionnaire is analysed, justify the subset and provide the relevant item wording and formats.
- If open-ended responses are coded, report the coding categories, who coded, and how disagreements or audit procedures were handled.
- Keep implications bounded to heterogeneity, readiness, or support needs traceable to the measured entry variables.
What is not enough
- A broad intake survey cannot be selectively reported without making the analysed subset transparent.
- Descriptive baseline evidence does not by itself establish that a proposed pedagogy or curriculum change is effective.
- Open-ended coding used as evidence without a visible coding procedure is not enough.
Minimal repair path
- Clarify timing and relabel the study if the dataset is not a true pre-entry baseline.
- Make the analysed item subset and coding scheme auditable.
- Shift strong pedagogical prescriptions into design hypotheses unless tested directly.
Mini self-check
- Could some measured “prior exposure” actually include the present course or programme?
- Can a reviewer see the exact analysed items or coding categories?
- Are the recommendations tied only to variables we actually measured?
AI-assisted programming case studies and small-scale mixed-method papers
Case ID: C06_ai_programming_exploratory | Typical subtype(s): Empirical study; Design & evaluation; Methodological / measurement | Typical risk: Learners all had access to AI, but the paper uses causal or impact language as if an experimental or comparison design existed.
When this case applies
- The study investigates programming learning activities in which generative AI tools were available to learners.
- The design combines code artefacts, logs, surveys, interviews, or other mixed evidence sources and is often small-scale or exploratory.
What INFEDU expects
- State the subtype clearly and frame the study as exploratory, pilot, descriptive, observational, or feasibility-oriented when no comparator exists.
- Report AI tool transparency: provider, product, model/version, access mode, and task/prompt/protocol detail (a minimal disclosure sketch follows this list).
- Account for all data sources by strand: participants, artefact sets, logs/chats/prompts, survey respondents, interviewees, and how they relate.
- Explain the mixed-method integration logic: why both strands are needed and how final interpretation combines them.
- Describe artefact-quality evaluation transparently: rubric, scoring rules, evaluator roles, and consistency/audit procedures.
- Report minor/school safeguards when applicable.
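The transparency item above can be reported in prose, but a structured record makes missing fields obvious. The sketch below shows one possible shape for such a record; it is not an INFEDU-mandated format, and every field value is illustrative only.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class AIToolDisclosure:
    provider: str         # company or hosting service
    product: str          # consumer-facing product name
    model_version: str    # exact model identifier and version or date
    access_mode: str      # web UI, API, IDE plug-in, local deployment, ...
    prompt_protocol: str  # how tasks/prompts were given to learners or to the tool
    usage_window: str     # when in the course the tool was available
    data_shared: str      # what learner data, if any, was sent to the tool

# Hypothetical example; every value below is illustrative only.
disclosure = AIToolDisclosure(
    provider="ExampleAI (hypothetical)",
    product="ExampleChat",
    model_version="examplechat-2024-05",
    access_mode="web UI, free tier",
    prompt_protocol="learners prompted freely; prompt logs exported weekly",
    usage_window="weeks 3-8 of a 10-week minicourse",
    data_shared="no personal data; code snippets only",
)

print(json.dumps(asdict(disclosure), indent=2))
```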
What is not enough
- Without a non-AI comparator or credible identification strategy, the study cannot by itself establish that AI improved learning or code quality.
- Qualitative data should not be used merely to decorate an underpowered quantitative strand.
- A short pilot or single minicourse does not justify broad generalization.
Minimal repair path
- Replace impact language with bounded descriptive or exploratory language.
- Publish clear accounting of all strands and missing data.
- State design hypotheses for future comparison studies rather than effect conclusions.
Mini self-check
- Did all participants have access to AI, and if so have we avoided causal language?
- Can a reviewer see how the qualitative and quantitative strands actually integrate?
- Have we described the AI tool and task conditions precisely enough for review?
Verbal protocol and think-aloud studies in computing education
Case ID: C07_verbal_protocol_think_aloud | Typical subtype(s): Qualitative empirical study; Mixed-method study; Methodological / measurement | Typical risk: Verbal data are treated as self-explanatory evidence even though the protocol, coding, transcript volume, translation, or trustworthiness safeguards are underreported.
When this case applies
- The manuscript uses concurrent think-aloud, immediate retrospective, or delayed retrospective verbalization to study a computing-education problem.
- The contribution concerns themes, reasoning patterns, mental models, mechanisms, or transferable design principles.
What INFEDU expects
- State whether the protocol is concurrent or retrospective and justify that choice.
- Define case boundaries, participant-selection logic, task design, and the amount of usable protocol data.
- Describe transcription, segmentation, and coding procedures in enough detail for review.
- If AI-assisted transcription, translation, or coding tools were used, report the tool, what data were uploaded, and how outputs were verified.
- If excerpts were translated, report who translated them and how translation quality was checked.
- Provide a reviewable codebook or coding-rule appendix.
- Report trustworthiness safeguards such as agreement on a subset, consensus process, or audit trail.
What is not enough
- Raw frequency comparisons are weak when transcript lengths or task opportunities differ greatly across participants or groups.
- A single expert may provide illustrative contrast but does not support broad expert-novice generalization.
- Pedagogical recommendations still need to be tied directly to analysed evidence.
Minimal repair path
- Publish the codebook and protocol details.
- Normalize or bound transcript-frequency comparisons when denominators differ (see the sketch after this list).
- Reframe expert-novice contrasts as illustrative when the expert sample is extremely small.
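The normalization step above can be as simple as reporting rates per unit of transcript instead of raw code counts. A minimal sketch with hypothetical participants, code counts, and transcript lengths:

```python
import pandas as pd

# Hypothetical coded think-aloud data: code counts and usable transcript length per participant.
protocols = pd.DataFrame({
    "participant":   ["P1", "P2", "P3", "E1"],
    "group":         ["novice", "novice", "novice", "expert"],
    "tracing_codes": [12, 4, 9, 6],            # occurrences of a "tracing" code
    "words":         [4800, 1500, 3600, 900],  # usable transcript length in words
})

# Raw counts favour whoever simply talked more; a per-1000-words rate does not.
protocols["tracing_per_1000_words"] = 1000 * protocols["tracing_codes"] / protocols["words"]

print(protocols)
print(protocols.groupby("group")["tracing_per_1000_words"].mean())
```

In this illustration the expert produces fewer raw tracing codes than one novice but a much higher rate, which is exactly the distortion that uneven transcript volume can introduce.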
Mini self-check
- Have we described the protocol and transcription/coding workflow transparently?
- Are translated excerpts auditable?
- Have we prevented uneven transcript volume from turning into overclaiming?
Foundational studies using non-computing proxy artifacts
Case ID: C08_proxy_artifact_foundational | Typical subtype(s): Empirical study; Conceptual / theoretical; Design & evaluation | Typical risk: A general cognition or explanation study is presented as if it were direct evidence about computing education without an explicit bridge argument.
When this case applies
- The paper studies a mechanism relevant to computing education or AI explanation but tests it first with a non-computing artifact used as a controlled proxy.
- The manuscript argues that something about the proxy result transfers to computing/informatics contexts.
What INFEDU expects
- Put the computing/informatics education question first.
- Explain why the chosen artifact is a justified model system for the mechanism under study.
- State explicitly what is claimed to transfer from the proxy setting and why.
- Distinguish clearly between what the current study shows about the proxy task and what it only suggests about computing artifacts or classroom learning.
- Discuss ecological validity, authenticity, artifact complexity, and social/contextual limits on transfer.
What is not enough
- Adding computing relevance only in the final implications is not enough.
- A proxy result is not direct evidence of classroom effectiveness, learner transfer, or design superiority in computing education.
- A convenience artifact without a bridge argument does not establish INFEDU relevance.
Minimal repair path
- Strengthen the bridge argument and transfer logic.
- Bound the claims to foundational mechanism evidence.
- Move curriculum or pedagogy claims into future-hypothesis language unless direct computing-context evidence exists.
Mini self-check
- Is the core problem genuinely about computing/informatics education?
- Have we justified why this proxy artifact models the mechanism of interest?
- Does the conclusion separate direct findings from transfer speculation?
Review and synthesis-pattern cases
Review papers that centre computing education but use broader adjacent literature comparatively
Case ID: C09_adjacent_literature_review | Typical subtype(s): Review | Typical risk: The review title, abstract, or implications imply stronger direct computing-education evidence than the actual corpus contains.
When this case applies
- The review focuses on a computing/informatics education problem but includes broader STEM or adjacent educational literature to clarify convergent or contrasting patterns.
- The paper relies on a mixed corpus with both direct and comparative evidence.
What INFEDU expects
- State the primary review focus clearly so that the central contribution is still to informatics/computing education.
- Separate direct and comparative evidence explicitly.
- Explain why adjacent studies were retained and what they can and cannot support.
- Show the synthesis trace: which design principles, taxonomy elements, or claims are grounded mainly in direct evidence and which are comparative extensions.
- Provide a study-level table or evidence map that shows domain, context, study type, and the role each study plays.
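A study-level evidence map can live in an appendix spreadsheet; the sketch below only illustrates the columns and the evidence-role tagging. All study names, domains, and contexts are hypothetical.

```python
import pandas as pd

# Hypothetical evidence map: every entry below is illustrative, not a real study.
evidence_map = pd.DataFrame([
    {"study": "StudyA2021", "domain": "computing education",   "context": "CS1, university",
     "study_type": "quasi-experiment", "evidence_role": "direct"},
    {"study": "StudyB2019", "domain": "computing education",   "context": "K-12 informatics",
     "study_type": "case study",       "evidence_role": "direct"},
    {"study": "StudyC2020", "domain": "mathematics education", "context": "secondary school",
     "study_type": "classroom study",  "evidence_role": "comparative"},
    {"study": "StudyD2022", "domain": "engineering education", "context": "university lab",
     "study_type": "survey",           "evidence_role": "comparative"},
])

print(evidence_map.to_string(index=False))
# Makes the direct/comparative balance explicit before the conclusions are written.
print(evidence_map["evidence_role"].value_counts())
```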
What is not enough
- A mixed corpus cannot be presented as if all studies were direct computing-education evidence.
- Broad educational-technology commentary is not enough; the review must close with implications for computing/informatics education.
Minimal repair path
- Tag studies by evidence role and rewrite the abstract/conclusion accordingly.
- Add a transparent evidence map.
- Reduce directness of claims when much of the corpus is comparative.
Mini self-check
- Can a reader tell which studies are direct evidence and which are comparative support?
- Does the abstract overstate the direct computing-education evidence?
- Are the closing implications specific to informatics/computing education?
Narrow-topic reviews with layered evidence bases
Case ID: C10_layered_evidence_review | Typical subtype(s): Review | Typical risk: A sparse direct evidence base is widened with supporting literature, but the review blurs the boundary between direct findings and interpretive extensions.
When this case applies
- The review topic is narrowly scoped and the direct educational literature is sparse, so authors broaden the corpus to include adjacent mechanism-focused or contextual studies.
- The manuscript needs an explicit layered evidence structure.
What INFEDU expects
- State the exact focal construct or practice in computing/informatics education.
- Separate the evidence base into explicit layers such as direct studies, adjacent mechanism studies, and supporting/contextual studies.
- Provide study-level traceability with citation, year, context, learner level, method, topical focus, and evidence role.
- Keep the synthesis trace explicit so readers can see which conclusions are direct and which are interpretive.
- Report exact search dates, full search strings per database, inclusion/exclusion criteria, and screening workflow (an example record follows this list).
- Explain how quality appraisal or evidence-strength handling was done.
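Search reporting becomes reproducible when each database-specific query is archived as a structured record alongside the manuscript. A minimal sketch in which every value (database, date, query, filters, hit count, file name) is purely illustrative:

```python
import json

# Hypothetical record of one database-specific search; every value is illustrative.
search_record = {
    "database": "Scopus",
    "search_date": "2024-06-15",
    "search_string": (
        'TITLE-ABS-KEY ( "code comprehension" OR "code reading" ) '
        'AND TITLE-ABS-KEY ( novice* OR student* OR learner* )'
    ),
    "filters": {"language": "English", "years": "2010-2024", "document_type": "article OR conference paper"},
    "hits": 87,
    "exported_file": "scopus_export_2024-06-15.csv",
}

print(json.dumps(search_record, indent=2))
```

One such record per database, plus the screening workflow, lets a reviewer rerun the search rather than trusting a truncated string in a figure.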
What is not enough
- Counting supporting studies in the main totals without tagging their role is not enough.
- Sparse direct evidence cannot justify strong pedagogical prescriptions unless clearly marked as interpretive.
- Visually truncated or unreproducible search strings are not enough.
Minimal repair path
- Add evidence-role tagging and layered synthesis language throughout the paper.
- Publish full database-specific search strings.
- Rewrite conclusions to reflect the strongest evidence actually available.
Mini self-check
- Could a reviewer identify every study and its evidence role?
- Would the conclusion still be defensible if only the direct evidence layer were considered?
- Are the search strings fully reproducible from the submitted files?