diff --git a/.Rbuildignore b/.Rbuildignore new file mode 100644 index 0000000..6c04d75 --- /dev/null +++ b/.Rbuildignore @@ -0,0 +1,32 @@ +\.Rcheck$ +\.Rout$ +\.Rproj$ +\.tar\.gz$ +^GPATH$ +^GRTAGS$ +^GTAGS$ +^LICENSE$ +^Makefile$ +^README\.Rmd$ +^README\.html$ +^README_cache$ +^TAGS$ +^TODO\.org$ +^\# +^\.Rhistory$ +^\.Rproj\.user$ +^\.\# +^\.clang_complete$ +^\.clangd$ +^\.git$ +^\.github$ +^\.gitlab-ci.yml$ +^\.travis\.yml$ +^_pkgdown\.yml$ +^appveyor\.yml$ +^docs$ +^misc$ +^revdep$ +^test$ +^working$ +~$ diff --git a/DESCRIPTION b/DESCRIPTION new file mode 100644 index 0000000..33871e1 --- /dev/null +++ b/DESCRIPTION @@ -0,0 +1,28 @@ +Package: ibist +Title: Data and Functions for Introduction to Biostatistics with R +Version: 0.1-0 +Authors@R: c( + person(given = "Elizabeth", family = "Schifano", + role = c("aut"), + comment = c(ORCID = "0000-0002-9793-332X")), + person(given = "Jun", family = "Yan", + email = "jun.yan@uconn.edu", + role = c("aut", "cre"), + comment = c(ORCID = "0000-0003-4401-7296")) + ) +Description: Provides datasets and supporting functions for the book + Introduction to Biostatistics with R by Schifano and Yan (2026+), + published by Taylor & Francis. The package is intended for teaching + introductory biostatistics and for reproducing examples in the text. +Depends: + R (>= 4.4.0) +VignetteBuilder: knitr +License: GPL (>= 3) +URL: https://github.com/statds/ibist-R +BugReports: https://github.com/statds/ibist-R/issues +Imports: stats, rlang, ggplot2 +Suggests: knitr, testthat (>= 3.0.0) +LazyData: true +RoxygenNote: 7.3.2 +Encoding: UTF-8 +Config/testthat/edition: 3 diff --git a/LICENSE b/LICENSE deleted file mode 100644 index 0ad25db..0000000 --- a/LICENSE +++ /dev/null @@ -1,661 +0,0 @@ - GNU AFFERO GENERAL PUBLIC LICENSE - Version 3, 19 November 2007 - - Copyright (C) 2007 Free Software Foundation, Inc. - Everyone is permitted to copy and distribute verbatim copies - of this license document, but changing it is not allowed. - - Preamble - - The GNU Affero General Public License is a free, copyleft license for -software and other kinds of works, specifically designed to ensure -cooperation with the community in the case of network server software. - - The licenses for most software and other practical works are designed -to take away your freedom to share and change the works. By contrast, -our General Public Licenses are intended to guarantee your freedom to -share and change all versions of a program--to make sure it remains free -software for all its users. - - When we speak of free software, we are referring to freedom, not -price. Our General Public Licenses are designed to make sure that you -have the freedom to distribute copies of free software (and charge for -them if you wish), that you receive source code or can get it if you -want it, that you can change the software or use pieces of it in new -free programs, and that you know you can do these things. - - Developers that use our General Public Licenses protect your rights -with two steps: (1) assert copyright on the software, and (2) offer -you this License which gives you legal permission to copy, distribute -and/or modify the software. - - A secondary benefit of defending all users' freedom is that -improvements made in alternate versions of the program, if they -receive widespread use, become available for other developers to -incorporate. Many developers of free software are heartened and -encouraged by the resulting cooperation. However, in the case of -software used on network servers, this result may fail to come about. -The GNU General Public License permits making a modified version and -letting the public access it on a server without ever releasing its -source code to the public. - - The GNU Affero General Public License is designed specifically to -ensure that, in such cases, the modified source code becomes available -to the community. It requires the operator of a network server to -provide the source code of the modified version running there to the -users of that server. Therefore, public use of a modified version, on -a publicly accessible server, gives the public access to the source -code of the modified version. - - An older license, called the Affero General Public License and -published by Affero, was designed to accomplish similar goals. This is -a different license, not a version of the Affero GPL, but Affero has -released a new version of the Affero GPL which permits relicensing under -this license. - - The precise terms and conditions for copying, distribution and -modification follow. - - TERMS AND CONDITIONS - - 0. Definitions. - - "This License" refers to version 3 of the GNU Affero General Public License. - - "Copyright" also means copyright-like laws that apply to other kinds of -works, such as semiconductor masks. - - "The Program" refers to any copyrightable work licensed under this -License. Each licensee is addressed as "you". "Licensees" and -"recipients" may be individuals or organizations. - - To "modify" a work means to copy from or adapt all or part of the work -in a fashion requiring copyright permission, other than the making of an -exact copy. The resulting work is called a "modified version" of the -earlier work or a work "based on" the earlier work. - - A "covered work" means either the unmodified Program or a work based -on the Program. - - To "propagate" a work means to do anything with it that, without -permission, would make you directly or secondarily liable for -infringement under applicable copyright law, except executing it on a -computer or modifying a private copy. Propagation includes copying, -distribution (with or without modification), making available to the -public, and in some countries other activities as well. - - To "convey" a work means any kind of propagation that enables other -parties to make or receive copies. Mere interaction with a user through -a computer network, with no transfer of a copy, is not conveying. - - An interactive user interface displays "Appropriate Legal Notices" -to the extent that it includes a convenient and prominently visible -feature that (1) displays an appropriate copyright notice, and (2) -tells the user that there is no warranty for the work (except to the -extent that warranties are provided), that licensees may convey the -work under this License, and how to view a copy of this License. If -the interface presents a list of user commands or options, such as a -menu, a prominent item in the list meets this criterion. - - 1. Source Code. - - The "source code" for a work means the preferred form of the work -for making modifications to it. "Object code" means any non-source -form of a work. - - A "Standard Interface" means an interface that either is an official -standard defined by a recognized standards body, or, in the case of -interfaces specified for a particular programming language, one that -is widely used among developers working in that language. - - The "System Libraries" of an executable work include anything, other -than the work as a whole, that (a) is included in the normal form of -packaging a Major Component, but which is not part of that Major -Component, and (b) serves only to enable use of the work with that -Major Component, or to implement a Standard Interface for which an -implementation is available to the public in source code form. A -"Major Component", in this context, means a major essential component -(kernel, window system, and so on) of the specific operating system -(if any) on which the executable work runs, or a compiler used to -produce the work, or an object code interpreter used to run it. - - The "Corresponding Source" for a work in object code form means all -the source code needed to generate, install, and (for an executable -work) run the object code and to modify the work, including scripts to -control those activities. However, it does not include the work's -System Libraries, or general-purpose tools or generally available free -programs which are used unmodified in performing those activities but -which are not part of the work. For example, Corresponding Source -includes interface definition files associated with source files for -the work, and the source code for shared libraries and dynamically -linked subprograms that the work is specifically designed to require, -such as by intimate data communication or control flow between those -subprograms and other parts of the work. - - The Corresponding Source need not include anything that users -can regenerate automatically from other parts of the Corresponding -Source. - - The Corresponding Source for a work in source code form is that -same work. - - 2. Basic Permissions. - - All rights granted under this License are granted for the term of -copyright on the Program, and are irrevocable provided the stated -conditions are met. This License explicitly affirms your unlimited -permission to run the unmodified Program. The output from running a -covered work is covered by this License only if the output, given its -content, constitutes a covered work. This License acknowledges your -rights of fair use or other equivalent, as provided by copyright law. - - You may make, run and propagate covered works that you do not -convey, without conditions so long as your license otherwise remains -in force. You may convey covered works to others for the sole purpose -of having them make modifications exclusively for you, or provide you -with facilities for running those works, provided that you comply with -the terms of this License in conveying all material for which you do -not control copyright. Those thus making or running the covered works -for you must do so exclusively on your behalf, under your direction -and control, on terms that prohibit them from making any copies of -your copyrighted material outside their relationship with you. - - Conveying under any other circumstances is permitted solely under -the conditions stated below. Sublicensing is not allowed; section 10 -makes it unnecessary. - - 3. Protecting Users' Legal Rights From Anti-Circumvention Law. - - No covered work shall be deemed part of an effective technological -measure under any applicable law fulfilling obligations under article -11 of the WIPO copyright treaty adopted on 20 December 1996, or -similar laws prohibiting or restricting circumvention of such -measures. - - When you convey a covered work, you waive any legal power to forbid -circumvention of technological measures to the extent such circumvention -is effected by exercising rights under this License with respect to -the covered work, and you disclaim any intention to limit operation or -modification of the work as a means of enforcing, against the work's -users, your or third parties' legal rights to forbid circumvention of -technological measures. - - 4. Conveying Verbatim Copies. - - You may convey verbatim copies of the Program's source code as you -receive it, in any medium, provided that you conspicuously and -appropriately publish on each copy an appropriate copyright notice; -keep intact all notices stating that this License and any -non-permissive terms added in accord with section 7 apply to the code; -keep intact all notices of the absence of any warranty; and give all -recipients a copy of this License along with the Program. - - You may charge any price or no price for each copy that you convey, -and you may offer support or warranty protection for a fee. - - 5. Conveying Modified Source Versions. - - You may convey a work based on the Program, or the modifications to -produce it from the Program, in the form of source code under the -terms of section 4, provided that you also meet all of these conditions: - - a) The work must carry prominent notices stating that you modified - it, and giving a relevant date. - - b) The work must carry prominent notices stating that it is - released under this License and any conditions added under section - 7. This requirement modifies the requirement in section 4 to - "keep intact all notices". - - c) You must license the entire work, as a whole, under this - License to anyone who comes into possession of a copy. This - License will therefore apply, along with any applicable section 7 - additional terms, to the whole of the work, and all its parts, - regardless of how they are packaged. This License gives no - permission to license the work in any other way, but it does not - invalidate such permission if you have separately received it. - - d) If the work has interactive user interfaces, each must display - Appropriate Legal Notices; however, if the Program has interactive - interfaces that do not display Appropriate Legal Notices, your - work need not make them do so. - - A compilation of a covered work with other separate and independent -works, which are not by their nature extensions of the covered work, -and which are not combined with it such as to form a larger program, -in or on a volume of a storage or distribution medium, is called an -"aggregate" if the compilation and its resulting copyright are not -used to limit the access or legal rights of the compilation's users -beyond what the individual works permit. Inclusion of a covered work -in an aggregate does not cause this License to apply to the other -parts of the aggregate. - - 6. Conveying Non-Source Forms. - - You may convey a covered work in object code form under the terms -of sections 4 and 5, provided that you also convey the -machine-readable Corresponding Source under the terms of this License, -in one of these ways: - - a) Convey the object code in, or embodied in, a physical product - (including a physical distribution medium), accompanied by the - Corresponding Source fixed on a durable physical medium - customarily used for software interchange. - - b) Convey the object code in, or embodied in, a physical product - (including a physical distribution medium), accompanied by a - written offer, valid for at least three years and valid for as - long as you offer spare parts or customer support for that product - model, to give anyone who possesses the object code either (1) a - copy of the Corresponding Source for all the software in the - product that is covered by this License, on a durable physical - medium customarily used for software interchange, for a price no - more than your reasonable cost of physically performing this - conveying of source, or (2) access to copy the - Corresponding Source from a network server at no charge. - - c) Convey individual copies of the object code with a copy of the - written offer to provide the Corresponding Source. This - alternative is allowed only occasionally and noncommercially, and - only if you received the object code with such an offer, in accord - with subsection 6b. - - d) Convey the object code by offering access from a designated - place (gratis or for a charge), and offer equivalent access to the - Corresponding Source in the same way through the same place at no - further charge. You need not require recipients to copy the - Corresponding Source along with the object code. If the place to - copy the object code is a network server, the Corresponding Source - may be on a different server (operated by you or a third party) - that supports equivalent copying facilities, provided you maintain - clear directions next to the object code saying where to find the - Corresponding Source. Regardless of what server hosts the - Corresponding Source, you remain obligated to ensure that it is - available for as long as needed to satisfy these requirements. - - e) Convey the object code using peer-to-peer transmission, provided - you inform other peers where the object code and Corresponding - Source of the work are being offered to the general public at no - charge under subsection 6d. - - A separable portion of the object code, whose source code is excluded -from the Corresponding Source as a System Library, need not be -included in conveying the object code work. - - A "User Product" is either (1) a "consumer product", which means any -tangible personal property which is normally used for personal, family, -or household purposes, or (2) anything designed or sold for incorporation -into a dwelling. In determining whether a product is a consumer product, -doubtful cases shall be resolved in favor of coverage. For a particular -product received by a particular user, "normally used" refers to a -typical or common use of that class of product, regardless of the status -of the particular user or of the way in which the particular user -actually uses, or expects or is expected to use, the product. A product -is a consumer product regardless of whether the product has substantial -commercial, industrial or non-consumer uses, unless such uses represent -the only significant mode of use of the product. - - "Installation Information" for a User Product means any methods, -procedures, authorization keys, or other information required to install -and execute modified versions of a covered work in that User Product from -a modified version of its Corresponding Source. The information must -suffice to ensure that the continued functioning of the modified object -code is in no case prevented or interfered with solely because -modification has been made. - - If you convey an object code work under this section in, or with, or -specifically for use in, a User Product, and the conveying occurs as -part of a transaction in which the right of possession and use of the -User Product is transferred to the recipient in perpetuity or for a -fixed term (regardless of how the transaction is characterized), the -Corresponding Source conveyed under this section must be accompanied -by the Installation Information. But this requirement does not apply -if neither you nor any third party retains the ability to install -modified object code on the User Product (for example, the work has -been installed in ROM). - - The requirement to provide Installation Information does not include a -requirement to continue to provide support service, warranty, or updates -for a work that has been modified or installed by the recipient, or for -the User Product in which it has been modified or installed. Access to a -network may be denied when the modification itself materially and -adversely affects the operation of the network or violates the rules and -protocols for communication across the network. - - Corresponding Source conveyed, and Installation Information provided, -in accord with this section must be in a format that is publicly -documented (and with an implementation available to the public in -source code form), and must require no special password or key for -unpacking, reading or copying. - - 7. Additional Terms. - - "Additional permissions" are terms that supplement the terms of this -License by making exceptions from one or more of its conditions. -Additional permissions that are applicable to the entire Program shall -be treated as though they were included in this License, to the extent -that they are valid under applicable law. If additional permissions -apply only to part of the Program, that part may be used separately -under those permissions, but the entire Program remains governed by -this License without regard to the additional permissions. - - When you convey a copy of a covered work, you may at your option -remove any additional permissions from that copy, or from any part of -it. (Additional permissions may be written to require their own -removal in certain cases when you modify the work.) You may place -additional permissions on material, added by you to a covered work, -for which you have or can give appropriate copyright permission. - - Notwithstanding any other provision of this License, for material you -add to a covered work, you may (if authorized by the copyright holders of -that material) supplement the terms of this License with terms: - - a) Disclaiming warranty or limiting liability differently from the - terms of sections 15 and 16 of this License; or - - b) Requiring preservation of specified reasonable legal notices or - author attributions in that material or in the Appropriate Legal - Notices displayed by works containing it; or - - c) Prohibiting misrepresentation of the origin of that material, or - requiring that modified versions of such material be marked in - reasonable ways as different from the original version; or - - d) Limiting the use for publicity purposes of names of licensors or - authors of the material; or - - e) Declining to grant rights under trademark law for use of some - trade names, trademarks, or service marks; or - - f) Requiring indemnification of licensors and authors of that - material by anyone who conveys the material (or modified versions of - it) with contractual assumptions of liability to the recipient, for - any liability that these contractual assumptions directly impose on - those licensors and authors. - - All other non-permissive additional terms are considered "further -restrictions" within the meaning of section 10. If the Program as you -received it, or any part of it, contains a notice stating that it is -governed by this License along with a term that is a further -restriction, you may remove that term. If a license document contains -a further restriction but permits relicensing or conveying under this -License, you may add to a covered work material governed by the terms -of that license document, provided that the further restriction does -not survive such relicensing or conveying. - - If you add terms to a covered work in accord with this section, you -must place, in the relevant source files, a statement of the -additional terms that apply to those files, or a notice indicating -where to find the applicable terms. - - Additional terms, permissive or non-permissive, may be stated in the -form of a separately written license, or stated as exceptions; -the above requirements apply either way. - - 8. Termination. - - You may not propagate or modify a covered work except as expressly -provided under this License. Any attempt otherwise to propagate or -modify it is void, and will automatically terminate your rights under -this License (including any patent licenses granted under the third -paragraph of section 11). - - However, if you cease all violation of this License, then your -license from a particular copyright holder is reinstated (a) -provisionally, unless and until the copyright holder explicitly and -finally terminates your license, and (b) permanently, if the copyright -holder fails to notify you of the violation by some reasonable means -prior to 60 days after the cessation. - - Moreover, your license from a particular copyright holder is -reinstated permanently if the copyright holder notifies you of the -violation by some reasonable means, this is the first time you have -received notice of violation of this License (for any work) from that -copyright holder, and you cure the violation prior to 30 days after -your receipt of the notice. - - Termination of your rights under this section does not terminate the -licenses of parties who have received copies or rights from you under -this License. If your rights have been terminated and not permanently -reinstated, you do not qualify to receive new licenses for the same -material under section 10. - - 9. Acceptance Not Required for Having Copies. - - You are not required to accept this License in order to receive or -run a copy of the Program. Ancillary propagation of a covered work -occurring solely as a consequence of using peer-to-peer transmission -to receive a copy likewise does not require acceptance. However, -nothing other than this License grants you permission to propagate or -modify any covered work. These actions infringe copyright if you do -not accept this License. Therefore, by modifying or propagating a -covered work, you indicate your acceptance of this License to do so. - - 10. Automatic Licensing of Downstream Recipients. - - Each time you convey a covered work, the recipient automatically -receives a license from the original licensors, to run, modify and -propagate that work, subject to this License. You are not responsible -for enforcing compliance by third parties with this License. - - An "entity transaction" is a transaction transferring control of an -organization, or substantially all assets of one, or subdividing an -organization, or merging organizations. If propagation of a covered -work results from an entity transaction, each party to that -transaction who receives a copy of the work also receives whatever -licenses to the work the party's predecessor in interest had or could -give under the previous paragraph, plus a right to possession of the -Corresponding Source of the work from the predecessor in interest, if -the predecessor has it or can get it with reasonable efforts. - - You may not impose any further restrictions on the exercise of the -rights granted or affirmed under this License. For example, you may -not impose a license fee, royalty, or other charge for exercise of -rights granted under this License, and you may not initiate litigation -(including a cross-claim or counterclaim in a lawsuit) alleging that -any patent claim is infringed by making, using, selling, offering for -sale, or importing the Program or any portion of it. - - 11. Patents. - - A "contributor" is a copyright holder who authorizes use under this -License of the Program or a work on which the Program is based. The -work thus licensed is called the contributor's "contributor version". - - A contributor's "essential patent claims" are all patent claims -owned or controlled by the contributor, whether already acquired or -hereafter acquired, that would be infringed by some manner, permitted -by this License, of making, using, or selling its contributor version, -but do not include claims that would be infringed only as a -consequence of further modification of the contributor version. For -purposes of this definition, "control" includes the right to grant -patent sublicenses in a manner consistent with the requirements of -this License. - - Each contributor grants you a non-exclusive, worldwide, royalty-free -patent license under the contributor's essential patent claims, to -make, use, sell, offer for sale, import and otherwise run, modify and -propagate the contents of its contributor version. - - In the following three paragraphs, a "patent license" is any express -agreement or commitment, however denominated, not to enforce a patent -(such as an express permission to practice a patent or covenant not to -sue for patent infringement). To "grant" such a patent license to a -party means to make such an agreement or commitment not to enforce a -patent against the party. - - If you convey a covered work, knowingly relying on a patent license, -and the Corresponding Source of the work is not available for anyone -to copy, free of charge and under the terms of this License, through a -publicly available network server or other readily accessible means, -then you must either (1) cause the Corresponding Source to be so -available, or (2) arrange to deprive yourself of the benefit of the -patent license for this particular work, or (3) arrange, in a manner -consistent with the requirements of this License, to extend the patent -license to downstream recipients. "Knowingly relying" means you have -actual knowledge that, but for the patent license, your conveying the -covered work in a country, or your recipient's use of the covered work -in a country, would infringe one or more identifiable patents in that -country that you have reason to believe are valid. - - If, pursuant to or in connection with a single transaction or -arrangement, you convey, or propagate by procuring conveyance of, a -covered work, and grant a patent license to some of the parties -receiving the covered work authorizing them to use, propagate, modify -or convey a specific copy of the covered work, then the patent license -you grant is automatically extended to all recipients of the covered -work and works based on it. - - A patent license is "discriminatory" if it does not include within -the scope of its coverage, prohibits the exercise of, or is -conditioned on the non-exercise of one or more of the rights that are -specifically granted under this License. You may not convey a covered -work if you are a party to an arrangement with a third party that is -in the business of distributing software, under which you make payment -to the third party based on the extent of your activity of conveying -the work, and under which the third party grants, to any of the -parties who would receive the covered work from you, a discriminatory -patent license (a) in connection with copies of the covered work -conveyed by you (or copies made from those copies), or (b) primarily -for and in connection with specific products or compilations that -contain the covered work, unless you entered into that arrangement, -or that patent license was granted, prior to 28 March 2007. - - Nothing in this License shall be construed as excluding or limiting -any implied license or other defenses to infringement that may -otherwise be available to you under applicable patent law. - - 12. No Surrender of Others' Freedom. - - If conditions are imposed on you (whether by court order, agreement or -otherwise) that contradict the conditions of this License, they do not -excuse you from the conditions of this License. If you cannot convey a -covered work so as to satisfy simultaneously your obligations under this -License and any other pertinent obligations, then as a consequence you may -not convey it at all. For example, if you agree to terms that obligate you -to collect a royalty for further conveying from those to whom you convey -the Program, the only way you could satisfy both those terms and this -License would be to refrain entirely from conveying the Program. - - 13. Remote Network Interaction; Use with the GNU General Public License. - - Notwithstanding any other provision of this License, if you modify the -Program, your modified version must prominently offer all users -interacting with it remotely through a computer network (if your version -supports such interaction) an opportunity to receive the Corresponding -Source of your version by providing access to the Corresponding Source -from a network server at no charge, through some standard or customary -means of facilitating copying of software. This Corresponding Source -shall include the Corresponding Source for any work covered by version 3 -of the GNU General Public License that is incorporated pursuant to the -following paragraph. - - Notwithstanding any other provision of this License, you have -permission to link or combine any covered work with a work licensed -under version 3 of the GNU General Public License into a single -combined work, and to convey the resulting work. The terms of this -License will continue to apply to the part which is the covered work, -but the work with which it is combined will remain governed by version -3 of the GNU General Public License. - - 14. Revised Versions of this License. - - The Free Software Foundation may publish revised and/or new versions of -the GNU Affero General Public License from time to time. Such new versions -will be similar in spirit to the present version, but may differ in detail to -address new problems or concerns. - - Each version is given a distinguishing version number. If the -Program specifies that a certain numbered version of the GNU Affero General -Public License "or any later version" applies to it, you have the -option of following the terms and conditions either of that numbered -version or of any later version published by the Free Software -Foundation. If the Program does not specify a version number of the -GNU Affero General Public License, you may choose any version ever published -by the Free Software Foundation. - - If the Program specifies that a proxy can decide which future -versions of the GNU Affero General Public License can be used, that proxy's -public statement of acceptance of a version permanently authorizes you -to choose that version for the Program. - - Later license versions may give you additional or different -permissions. However, no additional obligations are imposed on any -author or copyright holder as a result of your choosing to follow a -later version. - - 15. Disclaimer of Warranty. - - THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY -APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT -HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY -OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, -THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR -PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM -IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF -ALL NECESSARY SERVICING, REPAIR OR CORRECTION. - - 16. Limitation of Liability. - - IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING -WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS -THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY -GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE -USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF -DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD -PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), -EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF -SUCH DAMAGES. - - 17. Interpretation of Sections 15 and 16. - - If the disclaimer of warranty and limitation of liability provided -above cannot be given local legal effect according to their terms, -reviewing courts shall apply local law that most closely approximates -an absolute waiver of all civil liability in connection with the -Program, unless a warranty or assumption of liability accompanies a -copy of the Program in return for a fee. - - END OF TERMS AND CONDITIONS - - How to Apply These Terms to Your New Programs - - If you develop a new program, and you want it to be of the greatest -possible use to the public, the best way to achieve this is to make it -free software which everyone can redistribute and change under these terms. - - To do so, attach the following notices to the program. It is safest -to attach them to the start of each source file to most effectively -state the exclusion of warranty; and each file should have at least -the "copyright" line and a pointer to where the full notice is found. - - - Copyright (C) - - This program is free software: you can redistribute it and/or modify - it under the terms of the GNU Affero General Public License as published - by the Free Software Foundation, either version 3 of the License, or - (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU Affero General Public License for more details. - - You should have received a copy of the GNU Affero General Public License - along with this program. If not, see . - -Also add information on how to contact you by electronic and paper mail. - - If your software can interact with users remotely through a computer -network, you should also make sure that it provides a way for users to -get its source. For example, if your program is a web application, its -interface could display a "Source" link that leads users to an archive -of the code. There are many ways you could offer source, and different -solutions will be better for different programs; see section 13 for the -specific requirements. - - You should also get your employer (if you work as a programmer) or school, -if any, to sign a "copyright disclaimer" for the program, if necessary. -For more information on this, and how to apply and follow the GNU AGPL, see -. diff --git a/Makefile b/Makefile new file mode 100644 index 0000000..7ed9975 --- /dev/null +++ b/Makefile @@ -0,0 +1,74 @@ +objects := $(wildcard R/*.R) DESCRIPTION +version := $(shell grep -E "^Version:" DESCRIPTION | awk '{print $$NF}') +pkg := $(shell grep -E "^Package:" DESCRIPTION | awk '{print $$NF}') +tar := $(pkg)_$(version).tar.gz +tinytest := $(wildcard inst/tinytest/*.R) +checkLog := $(pkg).Rcheck/00check.log +rmd := $(wildcard vignettes/*.Rmd) +vignettes := $(patsubst %.Rmd,%.html,$(rmd)) + + +.PHONY: check +check: $(checkLog) + +.PHONY: build +build: $(tar) + +.PHONY: install +install: + R CMD build . + R CMD INSTALL $(tar) + +.PHONY: preview +preview: $(vignettes) + +.PHONY: pkgdown +pkgdown: + Rscript -e "library(methods); pkgdown::build_site();" + +.PHONY: deploy-pkgdown +deploy-pkgdown: + @bash misc/deploy_docs.sh + +.PHONY: check-rcpp +check-rcpp: $(tar) + R CMD INSTALL $(tar) + Rscript inst/run_rcpp_test.R > check-rcpp.Rout & + +.PHONY: check-revdep +check-revdep: $(tar) + @mkdir -p revdep + @rm -rf revdep/{*.Rcheck,*.tar.gz} + @cp $(tar) revdep + nohup R CMD BATCH --no-save --no-restore misc/revdep_check.R & + +$(tar): $(objects) + @Rscript -e "library(methods);" \ + -e "devtools::document();"; + @$(MAKE) update-timestamp + R CMD build . + +$(checkLog): $(tar) $(tinytest) + R CMD check --as-cran $(tar) + +vignettes/%.html: vignettes/%.Rmd + Rscript -e "library(methods); rmarkdown::render('$?')" + +.PHONY: readme +readme: README.md +README.md: README.Rmd + @Rscript -e "rmarkdown::render('$<')" + +## update copyright year +.PHONY: update-timestamp +update-timestamp: + @bash misc/update_timestamp.sh + +.PHONY: tags +tags: + Rscript -e "utils::rtags(path = 'R', ofile = 'TAGS')" + +.PHONY: clean +clean: + @$(RM) -r *~ */*~ *.Rhistroy *.tar.gz src/*.so src/*.o \ + *.Rcheck/ *.Rout .\#* *_cache diff --git a/NAMESPACE b/NAMESPACE new file mode 100644 index 0000000..f7d3888 --- /dev/null +++ b/NAMESPACE @@ -0,0 +1,22 @@ +# Generated by roxygen2: do not edit by hand + +export(demo_clt) +export(power.p1s.test) +export(rate.test) +importFrom(ggplot2,aes) +importFrom(ggplot2,after_stat) +importFrom(ggplot2,facet_wrap) +importFrom(ggplot2,geom_histogram) +importFrom(ggplot2,geom_line) +importFrom(ggplot2,ggplot) +importFrom(ggplot2,labs) +importFrom(ggplot2,theme_minimal) +importFrom(rlang,.data) +importFrom(stats,dbinom) +importFrom(stats,density) +importFrom(stats,dnorm) +importFrom(stats,pbinom) +importFrom(stats,pnorm) +importFrom(stats,qbinom) +importFrom(stats,qnorm) +importFrom(stats,uniroot) diff --git a/R/data-nrs.R b/R/data-nrs.R new file mode 100644 index 0000000..a17d783 --- /dev/null +++ b/R/data-nrs.R @@ -0,0 +1,43 @@ +#' Non-restorative sleep and physical activity (Japan cohort study) +#' +#' A large observational dataset from a cohort study conducted in Japan +#' to examine the association between non-restorative sleep (NRS) and +#' physical activity, gender, and age. The data are used to illustrate +#' logistic regression modeling for a binary outcome in a large-sample +#' setting. +#' +#' @format +#' A data frame with 90,122 observations on the following variables: +#' \describe{ +#' \item{id}{Subject identifier.} +#' \item{Gender}{Gender of the subject (integer-coded).} +#' \item{Age_2013}{Age in years in 2013.} +#' \item{EX_2013}{Indicator of regular exercise in 2013 +#' (integer-coded).} +#' \item{PA_2013}{Physical activity measure in 2013 +#' (integer-coded).} +#' \item{NRS_2013}{Indicator of non-restorative sleep in 2013 +#' (1 = presence, 0 = absence).} +#' \item{AgeGroup_2013}{Categorical age group in 2013 +#' (integer-coded).} +#' \item{EXPA_classification}{Combined classification of exercise and +#' physical activity status (integer-coded).} +#' } +#' +#' @details +#' Non-restorative sleep (NRS) is defined as a subjective feeling of lack +#' of refreshment on awakening and reflects qualitative aspects of sleep. +#' Hidaka et al. (2019) analyzed these data using logistic regression to +#' assess whether the probability of NRS is associated with physical +#' activity, gender, and age in a large cohort of adult subjects in +#' Japan. Within this package, the dataset is provided for methodological +#' illustration of binary regression models rather than for substantive +#' epidemiological inference. +#' +#' All variables are stored as integer codes. Missing values are +#' represented as \code{NA}. +#' +#' @source +#' Hidaka et al. (2019). +#' +"nrs" diff --git a/R/demo_clt.R b/R/demo_clt.R new file mode 100644 index 0000000..9fe0f64 --- /dev/null +++ b/R/demo_clt.R @@ -0,0 +1,120 @@ +#' Demonstrate the Central Limit Theorem +#' +#' The \code{demo_clt()} function generates plots to illustrate the +#' Central Limit Theorem (CLT) using a specified random number generator. +#' The function displays standardized sampling distributions for +#' different sample sizes and overlays the standard normal density. +#' +#' @param rng A random number generator function taking the sample size +#' as its first argument (e.g., \code{runif}, \code{rnorm}, +#' \code{rgamma}). +#' @param n A numeric vector of sample sizes (e.g., \code{c(5, 10, 20, +#' 40)}). +#' @param nrep The number of repetitions for generating sample means +#' (default is 10000). +#' @param ... Additional arguments passed to the random number generator +#' (e.g., \code{shape} and \code{rate} for \code{rgamma}). +#' @param pmean The population mean of the distribution. If \code{NULL}, +#' it is estimated from a large Monte Carlo sample. +#' @param psd The population standard deviation of the distribution. +#' If \code{NULL}, it is estimated from a large Monte Carlo sample. +#' +#' @return A \code{ggplot2} object showing the standardized sampling +#' distributions for different sample sizes, compared against the +#' standard normal curve. +#' +#' @examples +#' set.seed(123) +#' demo_clt(runif, n = c(5, 10, 20, 40), min = 0, max = 1) +#' +#' demo_clt(rgamma, n = c(5, 10, 20, 40), shape = 2, rate = 1, +#' pmean = 2, psd = sqrt(2) +#' ) +#' +#' @importFrom rlang .data +#' @importFrom ggplot2 ggplot geom_histogram geom_line aes +#' @importFrom ggplot2 facet_wrap labs theme_minimal after_stat +#' @export +demo_clt <- function( + rng, + n, + nrep = 10000, + ..., + pmean = NULL, + psd = NULL +) { + ## ---- basic validation ---- + if (!is.function(rng)) { + stop("The argument 'rng' must be a function.", call. = FALSE) + } + + if (!is.numeric(n) || any(n <= 0)) { + stop( + "The argument 'n' must be a numeric vector of positive values.", + call. = FALSE + ) + } + + if (!is.numeric(nrep) || length(nrep) != 1L || nrep <= 0) { + stop( + "The argument 'nrep' must be a positive integer.", + call. = FALSE + ) + } + + ## ---- estimate pmean and psd if needed ---- + if (is.null(pmean) || is.null(psd)) { + sample_data <- rng(100000, ...) + if (is.null(pmean)) pmean <- base::mean(sample_data) + if (is.null(psd)) psd <- stats::sd(sample_data) + } + + ## ---- generate standardized sample means ---- + results <- vector("list", length(n)) + names(results) <- as.character(n) + + for (size in n) { + rng_local <- function() rng(size, ...) + sample_means <- replicate(nrep, base::mean(rng_local())) + + results[[as.character(size)]] <- data.frame( + StdMean = (sample_means - pmean) / (psd / sqrt(size)), + SampleSize = size + ) + } + + data <- do.call(rbind, results) + + ## ---- standard normal reference ---- + x_vals <- seq(-4, 4, length.out = 200) + normal_data <- data.frame( + x = x_vals, + y = stats::dnorm(x_vals) + ) + + ## ---- plot ---- + ggplot2::ggplot( + data, + ggplot2::aes(x = .data$StdMean) + ) + + ggplot2::geom_histogram( + ggplot2::aes(y = ggplot2::after_stat(density)), + bins = 30, + color = "black", + fill = "skyblue" + ) + + ggplot2::geom_line( + data = normal_data, + ggplot2::aes(x = .data$x, y = .data$y), + color = "red", + linetype = "dashed", + linewidth = 0.8 + ) + + ggplot2::facet_wrap(~ SampleSize) + + ggplot2::labs( + title = "Demonstrating the Central Limit Theorem", + x = "Standardized sample mean", + y = "Density" + ) + + ggplot2::theme_minimal() +} diff --git a/R/power.p1s.test.R b/R/power.p1s.test.R new file mode 100644 index 0000000..d1f1c0a --- /dev/null +++ b/R/power.p1s.test.R @@ -0,0 +1,307 @@ +#' One-sample proportion power and sample size calculation +#' +#' Computes power, sample size, or detectable effect size for a +#' one-sample binomial test. The interface mirrors +#' \code{stats::power.prop.test()}, but for the one-sample setting. +#' +#' Power can be computed using a normal approximation or exact binomial +#' methods. When \code{exact = TRUE}, the exact test is determined by +#' \code{exact.method}. +#' +#' Exactly one of \code{n}, \code{p0}, \code{p1}, \code{power}, or +#' \code{sig.level} must be \code{NULL}; the missing quantity is solved +#' numerically. +#' +#' @param n Sample size for the single group. +#' @param p0 Null hypothesis proportion. +#' @param p1 Alternative hypothesis proportion. +#' @param power Desired power. +#' @param sig.level Significance level. +#' @param alternative Character string specifying the alternative +#' hypothesis; one of \code{"two.sided"}, \code{"less"}, or +#' \code{"greater"}. +#' @param cc Logical; if \code{TRUE}, apply continuity correction in the +#' normal approximation. +#' @param exact Logical; if \code{TRUE}, use an exact binomial method. +#' @param exact.method Method used for exact binomial power calculation. +#' \code{"quantile"} uses fixed binomial rejection regions for stable +#' power and sample-size inversion; \code{"midp"} applies the mid-p +#' adjustment; \code{"cp"} inverts Clopper--Pearson acceptance regions +#' to guarantee size not exceeding \code{sig.level}. +#' @param tol Numerical tolerance used in root finding. +#' @param max_n Maximum allowable sample size when solving for \code{n}. +#' +#' @return An object of class \code{"power.htest"} containing the +#' computed quantity and test specifications. +#' +#' @examples +#' ## Normal approximation (default) +#' power.p1s.test(n = 50, p0 = 0.1, p1 = 0.25) +#' +#' ## Exact binomial power (quantile-based, default exact method) +#' power.p1s.test(n = 50, p0 = 0.1, p1 = 0.25, exact = TRUE) +#' +#' ## Exact mid-p power +#' power.p1s.test( +#' n = 50, p0 = 0.1, p1 = 0.25, +#' exact = TRUE, exact.method = "midp" +#' ) +#' +#' ## Exact Clopper--Pearson power (guaranteed size control) +#' power.p1s.test( +#' n = 50, p0 = 0.1, p1 = 0.25, +#' exact = TRUE, exact.method = "cp" +#' ) +#' +#' @importFrom stats dbinom pbinom qbinom +#' @importFrom stats dnorm pnorm qnorm +#' @importFrom stats uniroot density +# +#' @export +power.p1s.test <- function( + n = NULL, + p0 = NULL, + p1 = NULL, + power = NULL, + sig.level = 0.05, + alternative = c("two.sided", "less", "greater"), + cc = FALSE, + exact = FALSE, + exact.method = c("quantile", "midp", "cp"), + tol = .Machine$double.eps^0.5, + max_n = 1e7 +) { + alternative <- match.arg(alternative) + exact.method <- match.arg(exact.method) + + ## ---- argument geometry ---- + is_null <- c(is.null(n), is.null(p0), is.null(p1), + is.null(power), is.null(sig.level)) + if (sum(is_null) != 1L) + stop("Exactly one of 'n', 'p0', 'p1', 'power', 'sig.level' must be NULL", + call. = FALSE) + + ## ---- basic validation ---- + if (!is.null(n)) { + if (!is.finite(n) || n <= 0) + stop("'n' must be positive", call. = FALSE) + n <- as.integer(ceiling(n)) + } + if (!is.null(p0) && !(p0 > 0 && p0 < 1)) + stop("'p0' must be in (0,1)", call. = FALSE) + if (!is.null(p1) && !(p1 > 0 && p1 < 1)) + stop("'p1' must be in (0,1)", call. = FALSE) + if (!is.null(sig.level) && !(sig.level > 0 && sig.level < 1)) + stop("'sig.level' must be in (0,1)", call. = FALSE) + if (!is.null(power) && !(power > 0 && power < 1)) + stop("'power' must be in (0,1)", call. = FALSE) + + if (!exact && exact.method != "quantile") { + warning( + "'exact.method' ignored when exact = FALSE", + call. = FALSE + ) + } + + midp = exact.method == "midp" + + ## ---- exact quantile rejection region cache ---- + rr_cache <- new.env(parent = emptyenv()) + + get_rr <- function(n, p0, sig.level) { + key <- paste(n, p0, sig.level, alternative, sep = "|") + if (exists(key, envir = rr_cache, inherits = FALSE)) + return(get(key, envir = rr_cache, inherits = FALSE)) + + if (alternative == "greater") { + rr <- list(type = "greater", + k = qbinom(1 - sig.level, n, p0) + 1L) + } else if (alternative == "less") { + rr <- list(type = "less", + k = qbinom(sig.level, n, p0) - 1L) + } else { + rr <- list( + type = "two.sided", + k_lo = qbinom(sig.level / 2, n, p0) - 1L, + k_hi = qbinom(1 - sig.level / 2, n, p0) + 1L + ) + } + + assign(key, rr, envir = rr_cache) + rr + } + + ## ---- power bodies ---- + + approx_power_body <- quote({ + se0 <- sqrt(p0 * (1 - p0) / n) + se1 <- sqrt(p1 * (1 - p1) / n) + delta <- if (cc) 0.5 / n else 0 + + if (alternative == "greater") { + z <- qnorm(1 - sig.level) + 1 - pnorm((p0 + z * se0 + delta - p1) / se1) + } else if (alternative == "less") { + z <- qnorm(1 - sig.level) + pnorm((p0 - z * se0 - delta - p1) / se1) + } else { + z <- qnorm(1 - sig.level / 2) + pnorm((p0 - z * se0 - delta - p1) / se1) + + 1 - pnorm((p0 + z * se0 + delta - p1) / se1) + } + }) + + exact_quantile_power_body <- quote({ + rr <- get_rr(n, p0, sig.level) + + if (rr$type == "greater") { + k <- rr$k + if (midp) + pbinom(k - 1, n, p1, lower.tail = FALSE) + + 0.5 * dbinom(k, n, p1) + else + pbinom(k - 1, n, p1, lower.tail = FALSE) + } else if (rr$type == "less") { + k <- rr$k + if (midp) + pbinom(k - 1, n, p1) + 0.5 * dbinom(k, n, p1) + else + pbinom(k, n, p1) + } else { + k_lo <- rr$k_lo + k_hi <- rr$k_hi + if (midp) { + pbinom(k_lo - 1, n, p1) + + 0.5 * dbinom(k_lo, n, p1) + + pbinom(k_hi - 1, n, p1, lower.tail = FALSE) + + 0.5 * dbinom(k_hi, n, p1) + } else { + pbinom(k_lo, n, p1) + + pbinom(k_hi - 1, n, p1, lower.tail = FALSE) + } + } + }) + + cp_power_body <- quote({ + x <- 0:n + reject <- if (alternative == "greater") { + pbinom(x - 1, n, p0, lower.tail = FALSE) <= sig.level + } else if (alternative == "less") { + pbinom(x, n, p0) <= sig.level + } else { + 2 * pmin( + pbinom(x, n, p0), + pbinom(x - 1, n, p0, lower.tail = FALSE) + ) <= sig.level + } + sum(dbinom(x[reject], n, p1)) + }) + + ## ---- dispatch ---- + if (!exact) { + power_body <- approx_power_body + midp <- NA + } else if (exact.method == "cp") { + power_body <- cp_power_body + midp <- NA + } else { + power_body <- exact_quantile_power_body + } + + ## ---- solve for missing parameter ---- + if (is.null(power)) { + power <- eval(power_body) + } + + if (is.null(n)) { + if (!exact) { + n <- uniroot(function(n) eval(power_body) - power, + c(1, max_n), tol = tol, + extendInt = "upX")$root + } else { + power_at_n <- function(nn) { + n <- as.integer(nn) + eval(power_body) + } + + alpha_at_n <- function(nn) { + n <- as.integer(nn) + p1 <- p0 + eval(power_body) + } + + if (exact.method == "cp") { + feasible <- function(nn) { + (power_at_n(nn) >= power) + } + + } else { + ## quantile / mid-p: enforce size <= nominal (optionally also add + ## a tol.alpha rule if you want parity across exact methods). + feasible <- function(nn) { + (alpha_at_n(nn) <= sig.level) && + (power_at_n(nn) >= power) + } + } + + ## ---- exponential bracketing + integer bisection ---- + n_lo <- 1L + if (feasible(n_lo)) { + n <- n_lo + } else { + n_hi <- 2L + while (!feasible(n_hi)) { + n_lo <- n_hi + n_hi <- n_hi * 2L + if (n_hi > max_n) + stop("Required n exceeds 'max_n'", call. = FALSE) + } + while (n_hi - n_lo > 1L) { + n_mid <- (n_lo + n_hi) %/% 2L + if (feasible(n_mid)) n_hi <- n_mid else n_lo <- n_mid + } + n <- n_hi + } + power <- power_at_n(n) + } + } + + + if (is.null(p1)) { + p1 <- uniroot(function(pp) eval(power_body) - power, + c(tol, 1 - tol), tol = tol)$root + } + + if (is.null(p0)) { + p0 <- uniroot(function(pp) eval(power_body) - power, + c(tol, 1 - tol), tol = tol)$root + } + + if (is.null(sig.level)) { + sig.level <- uniroot(function(a) eval(power_body) - power, + c(tol, 1 - tol), tol = tol)$root + } + + ## ---- return object ---- + method <- if (!exact) + "One-sample proportion power calculation (normal approximation)" + else if (exact.method == "cp") + "One-sample proportion power calculation (exact Clopper--Pearson)" + else if (exact.method == "midp") + "One-sample proportion power calculation (exact binomial, mid-p)" + else + "One-sample proportion power calculation (exact binomial)" + + structure( + list( + n = n, + p0 = p0, + p1 = p1, + sig.level = sig.level, + power = power, + alternative = alternative, + method = method + ), + class = "power.htest" + ) +} diff --git a/R/rate_test.R b/R/rate_test.R new file mode 100644 index 0000000..61c6f81 --- /dev/null +++ b/R/rate_test.R @@ -0,0 +1,218 @@ +#' Test of Poisson Rates Using Normal Approximation +#' +#' Performs a large-sample (normal) test for one or two Poisson rates with known +#' exposures. The test is carried out on the observed count scale and then +#' translated to rates for reporting. +#' +#' @param x a vector of event counts. A single value specifies a one-sample +#' test; a vector of length two specifies a two-sample comparison. +#' @param T a vector of exposures corresponding to \code{x} (e.g., person-time). +#' Must have the same length as \code{x}. +#' @param r a positive number specifying the null rate per unit exposure, +#' \eqn{\lambda_0}, for a one-sample test. Ignored for two-sample tests. +#' @param alternative a character string specifying the alternative hypothesis, +#' one of \code{"two.sided"}, \code{"less"}, or \code{"greater"}. +#' @param conf.level confidence level for the confidence interval. +#' @param correct logical; if \code{TRUE}, applies a continuity correction on +#' the count scale. For one-sample tests, a \eqn{\pm 0.5} correction is applied +#' to \eqn{x} in the direction determined by \code{alternative}. For +#' \code{"two.sided"}, the correction uses the sign of \eqn{x-\mu_0}. +#' +#' @details +#' One-sample test: +#' Assume \eqn{X \sim \mathrm{Pois}(\mu)} with \eqn{\mu = \lambda T}. The null +#' hypothesis is \eqn{H_0:\lambda=\lambda_0}, where \eqn{\lambda_0=r}. Let +#' \eqn{\mu_0=\lambda_0 T}. The normal approximation gives +#' \deqn{Z = \frac{(x - c) - \mu_0}{\sqrt{\mu_0}} \approx N(0,1),} +#' where \eqn{c} is 0 if \code{correct=FALSE} and is a continuity correction on +#' the count scale if \code{correct=TRUE}. Two-sided p-values use +#' \eqn{2\{1-\Phi(|Z|)\}}. +#' +#' #' Two-sample test: +#' Assume independent \eqn{X_1 \sim \mathrm{Pois}(\lambda_1 T_1)} and +#' \eqn{X_2 \sim \mathrm{Pois}(\lambda_2 T_2)} with known exposures +#' \eqn{T_1} and \eqn{T_2}. The null hypothesis is +#' \eqn{H_0:\lambda_1=\lambda_2}. +#' +#' For hypothesis testing, the function uses the exact conditional +#' representation under \eqn{H_0}: +#' \deqn{ +#' X_1 \mid (X_1+X_2=n) +#' \sim +#' \mathrm{Binom}\!\left(n,\; \frac{T_1}{T_1+T_2}\right), +#' } +#' and applies the same large-sample normal approximation as +#' \code{\link[stats]{prop.test}} (with optional Yates continuity correction) +#' to obtain the test statistic and p-value. +#' +#' For estimation and confidence intervals, the inference target is the +#' difference between the two rates, +#' \eqn{\Delta = \lambda_1 - \lambda_2}. +#' The point estimate is +#' \eqn{\hat\Delta = X_1/T_1 - X_2/T_2}, and the confidence interval is +#' constructed using a normal approximation with estimated standard error +#' \deqn{ +#' \sqrt{X_1/T_1^2 + X_2/T_2^2}. +#' } +#' The continuity correction affects the hypothesis test but not the +#' confidence interval for \eqn{\Delta}. +#' +#' Confidence intervals: +#' For one-sample tests, the confidence interval is for \eqn{\lambda} and is +#' obtained by inverting the same normal approximation on the count scale for +#' \eqn{\mu=\lambda T}, then dividing by \eqn{T}. +#' For two-sample tests, the confidence interval is for the difference in rates +#' \eqn{\lambda_1-\lambda_2}, analogous to the difference in proportions in +#' \code{\link[stats]{prop.test}}. +#' +#' @return +#' An object of class \code{"htest"} containing: +#' \item{statistic}{the standardized normal statistic \code{z}.} +#' \item{parameter}{degrees of freedom (\code{df = 1}).} +#' \item{p.value}{the p-value.} +#' \item{conf.int}{a confidence interval for the difference between the two +#' rates, \eqn{\lambda_1-\lambda_2}, for two-sample tests.} +#' \item{estimate}{estimated rate (one-sample) or estimated rates (two-sample).} +#' \item{null.value}{the null rate (one-sample) or the null rate difference +#' \eqn{\lambda_1-\lambda_2=0} (two-sample).} +#' \item{alternative}{the alternative hypothesis.} +#' \item{method}{a character string describing the test.} +#' \item{data.name}{a character string describing the data.} +#' +#' @seealso +#' \code{\link[stats]{prop.test}}, \code{\link[stats]{poisson.test}} +#' +#' @examples +#' ## One-sample test: compare observed rate to a reference unit rate +#' rate.test(x = 411, T = 25800, r = 0.0119) +#' rate.test(x = 411, T = 25800, r = 0.0119, correct = FALSE) +#' +#' ## Two-sample test: compare two Poisson rates +#' rate.test(x = c(12, 5), T = c(100, 80)) +#' rate.test(x = c(12, 5), T = c(100, 80), correct = FALSE) +#' +#' ## One-sided alternative +#' rate.test(x = 411, T = 25800, r = 0.0119, alternative = "greater") +#' +#' @export +rate.test <- function(x, T, r = NULL, + alternative = c("two.sided", "less", "greater"), + conf.level = 0.95, + correct = TRUE) +{ + alternative <- match.arg(alternative) + + if (length(x) != length(T)) stop("'x' and 'T' must have the same length") + if (!length(x) %in% c(1L, 2L)) stop("'x' must have length 1 or 2") + if (any(!is.finite(x)) || any(!is.finite(T))) stop("'x' and 'T' must be finite") + if (any(x < 0) || any(abs(x - round(x)) > 0)) stop("'x' must be nonnegative integers") + if (any(T <= 0)) stop("'T' must be positive") + if (!is.numeric(conf.level) || length(conf.level) != 1L || + conf.level <= 0 || conf.level >= 1) { + stop("'conf.level' must be a single number in (0, 1)") + } + + zcrit <- qnorm(1 - (1 - conf.level) / 2) + + if (length(x) == 1L) { + ## --- One-sample: H0: lambda = r (unit rate) + if (is.null(r) || length(r) != 1L || !is.finite(r) || r <= 0) { + stop("a single positive null unit rate 'r' must be specified") + } + + mu0 <- r * T + + ## CC on count scale + cc <- 0 + if (isTRUE(correct)) { + cc <- switch(alternative, + "greater" = 0.5, + "less" = -0.5, + "two.sided" = if (x == mu0) 0 else 0.5 * sign(x - mu0) + ) + } + + ## Standardize observed count against null mean (score-style) + z <- (x - cc - mu0) / sqrt(mu0) + + p.value <- switch(alternative, + "two.sided" = 2 * pnorm(-abs(z)), + "less" = pnorm(z), + "greater" = pnorm(z, lower.tail = FALSE) + ) + + ## CI for lambda by inverting |(x-c)-mu|/sqrt(mu) <= zcrit, then /T + x_adj <- x - cc + disc <- zcrit^2 + 4 * x_adj + mu_lo <- 0.5 * (zcrit^2 + 2 * x_adj - zcrit * sqrt(disc)) + mu_hi <- 0.5 * (zcrit^2 + 2 * x_adj + zcrit * sqrt(disc)) + mu_lo <- max(0, mu_lo) + conf.int <- c(mu_lo, mu_hi) / T + + estimate <- c(rate = x / T) + null.value <- c(rate = r) + + ## Ingredients (shared names) + statistic <- c(z = as.numeric(z)) + parameter <- c(df = 1) + method <- "Normal approximation test for a Poisson rate" + data.name <- paste0("x = ", x, ", T = ", T) + + } else { + + ## --- Two-sample: H0: lambda1 = lambda2 + x1 <- x[1]; x2 <- x[2] + T1 <- T[1]; T2 <- T[2] + + if (x1 + x2 == 0) { + stop("at least one event is required for the 2-sample test") + } + + ## Conditional binomial test under H0 via prop.test() + n <- x1 + x2 + p0 <- T1 / (T1 + T2) + + pt <- stats::prop.test(x = x1, n = n,p = p0, alternative = alternative, + conf.level = conf.level, correct = correct) + + ## Estimation target: difference in rates + rate1 <- x1 / T1 + rate2 <- x2 / T2 + diff_hat <- rate1 - rate2 + + se_diff <- sqrt(x1 / T1^2 + x2 / T2^2) + conf.int <- diff_hat + c(-1, 1) * zcrit * se_diff + + estimate <- c("rate 1" = rate1, "rate 2" = rate2) + null.value <- c("rate difference" = 0) + + ## Ingredients (shared names) + statistic <- pt$statistic + parameter <- pt$parameter + p.value <- pt$p.value + method <- paste( + "2-sample test for equality of Poisson rates", + "(conditional binomial normal approximation);", + "CI for rate difference by normal approximation" + ) + data.name <- paste0( + "x = c(", x1, ", ", x2, "), ", + "T = c(", T1, ", ", T2, ")" + ) + } + + ## --- Shared return block (single place) + rval <- list( + statistic = statistic, + parameter = parameter, + p.value = as.numeric(p.value), + conf.int = structure(conf.int, conf.level = conf.level), + estimate = estimate, + null.value = null.value, + alternative = alternative, + method = method, + data.name = data.name + ) + class(rval) <- "htest" + rval +} diff --git a/README.md b/README.md index 26cc5b3..0c2b914 100644 --- a/README.md +++ b/README.md @@ -1 +1,26 @@ -# ibist \ No newline at end of file +# R Package **ibist** + +The R package **ibist** is a companion to the book +"Introduction to Biostatistical Analysis of Proportions and Rates" +by Elizabeth D. Schifano and Jun Yan + +## Features + +The package **splines2** provides functions, datasets, and demos +used in the book. + +## Development + +The latest version of the package is under development at +[GitHub](https://github.com/wenjie2wang/splines2). If it is able to pass +the automated package checks, one may install it by + +``` r +if (! require(remotes)) install.packages("remotes") +remotes::install_github("statds/ibist-R", upgrade = "never") +``` + +## License + +[GNU General Public License](https://www.gnu.org/licenses/) (≥ 3) + diff --git a/data-raw/nrs.R b/data-raw/nrs.R new file mode 100644 index 0000000..efa0f00 --- /dev/null +++ b/data-raw/nrs.R @@ -0,0 +1,5 @@ +## data-raw/nrs.R + +nrs <- read.csv("data-raw/nrs.csv", header = TRUE) + +usethis::use_data(nrs, overwrite = TRUE) diff --git a/data/nrs.rda b/data/nrs.rda new file mode 100644 index 0000000..019dc9d Binary files /dev/null and b/data/nrs.rda differ diff --git a/inst/bib/refs.bib b/inst/bib/refs.bib new file mode 100644 index 0000000..5b7901a --- /dev/null +++ b/inst/bib/refs.bib @@ -0,0 +1,17 @@ + + +@article{wang2021shape, + author = {Wang, Wenjie and Yan, Jun}, + title = {Shape-Restricted Regression Splines with {R} Package + {splines2}}, + journal = {Journal of Data Science}, + volume = 19, + number = 3, + year = 2021, + pages = {498--517}, + doi = {10.6339/21-JDS1020}, + issn = {1680-743X}, + publisher = {School of Statistics, Renmin University of China}, +} + + diff --git a/man/demo_clt.Rd b/man/demo_clt.Rd new file mode 100644 index 0000000..8ec2dfa --- /dev/null +++ b/man/demo_clt.Rd @@ -0,0 +1,48 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/demo_clt.R +\name{demo_clt} +\alias{demo_clt} +\title{Demonstrate the Central Limit Theorem} +\usage{ +demo_clt(rng, n, nrep = 10000, ..., pmean = NULL, psd = NULL) +} +\arguments{ +\item{rng}{A random number generator function taking the sample size +as its first argument (e.g., \code{runif}, \code{rnorm}, +\code{rgamma}).} + +\item{n}{A numeric vector of sample sizes (e.g., \code{c(5, 10, 20, +40)}).} + +\item{nrep}{The number of repetitions for generating sample means +(default is 10000).} + +\item{...}{Additional arguments passed to the random number generator +(e.g., \code{shape} and \code{rate} for \code{rgamma}).} + +\item{pmean}{The population mean of the distribution. If \code{NULL}, +it is estimated from a large Monte Carlo sample.} + +\item{psd}{The population standard deviation of the distribution. +If \code{NULL}, it is estimated from a large Monte Carlo sample.} +} +\value{ +A \code{ggplot2} object showing the standardized sampling + distributions for different sample sizes, compared against the + standard normal curve. +} +\description{ +The \code{demo_clt()} function generates plots to illustrate the +Central Limit Theorem (CLT) using a specified random number generator. +The function displays standardized sampling distributions for +different sample sizes and overlays the standard normal density. +} +\examples{ +set.seed(123) +demo_clt(runif, n = c(5, 10, 20, 40), min = 0, max = 1) + +demo_clt(rgamma, n = c(5, 10, 20, 40), shape = 2, rate = 1, + pmean = 2, psd = sqrt(2) +) + +} diff --git a/man/nrs.Rd b/man/nrs.Rd new file mode 100644 index 0000000..575a0d6 --- /dev/null +++ b/man/nrs.Rd @@ -0,0 +1,51 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/data-nrs.R +\docType{data} +\name{nrs} +\alias{nrs} +\title{Non-restorative sleep and physical activity (Japan cohort study)} +\format{ +A data frame with 90,122 observations on the following variables: +\describe{ + \item{id}{Subject identifier.} + \item{Gender}{Gender of the subject (integer-coded).} + \item{Age_2013}{Age in years in 2013.} + \item{EX_2013}{Indicator of regular exercise in 2013 + (integer-coded).} + \item{PA_2013}{Physical activity measure in 2013 + (integer-coded).} + \item{NRS_2013}{Indicator of non-restorative sleep in 2013 + (1 = presence, 0 = absence).} + \item{AgeGroup_2013}{Categorical age group in 2013 + (integer-coded).} + \item{EXPA_classification}{Combined classification of exercise and + physical activity status (integer-coded).} +} +} +\source{ +Hidaka et al. (2019). +} +\usage{ +nrs +} +\description{ +A large observational dataset from a cohort study conducted in Japan +to examine the association between non-restorative sleep (NRS) and +physical activity, gender, and age. The data are used to illustrate +logistic regression modeling for a binary outcome in a large-sample +setting. +} +\details{ +Non-restorative sleep (NRS) is defined as a subjective feeling of lack +of refreshment on awakening and reflects qualitative aspects of sleep. +Hidaka et al. (2019) analyzed these data using logistic regression to +assess whether the probability of NRS is associated with physical +activity, gender, and age in a large cohort of adult subjects in +Japan. Within this package, the dataset is provided for methodological +illustration of binary regression models rather than for substantive +epidemiological inference. + +All variables are stored as integer codes. Missing values are +represented as \code{NA}. +} +\keyword{datasets} diff --git a/man/power.p1s.test.Rd b/man/power.p1s.test.Rd new file mode 100644 index 0000000..7e032c7 --- /dev/null +++ b/man/power.p1s.test.Rd @@ -0,0 +1,88 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/power.p1s.test.R +\name{power.p1s.test} +\alias{power.p1s.test} +\title{One-sample proportion power and sample size calculation} +\usage{ +power.p1s.test( + n = NULL, + p0 = NULL, + p1 = NULL, + power = NULL, + sig.level = 0.05, + alternative = c("two.sided", "less", "greater"), + cc = FALSE, + exact = FALSE, + exact.method = c("quantile", "midp", "cp"), + tol = .Machine$double.eps^0.5, + max_n = 1e+07 +) +} +\arguments{ +\item{n}{Sample size for the single group.} + +\item{p0}{Null hypothesis proportion.} + +\item{p1}{Alternative hypothesis proportion.} + +\item{power}{Desired power.} + +\item{sig.level}{Significance level.} + +\item{alternative}{Character string specifying the alternative +hypothesis; one of \code{"two.sided"}, \code{"less"}, or +\code{"greater"}.} + +\item{cc}{Logical; if \code{TRUE}, apply continuity correction in the +normal approximation.} + +\item{exact}{Logical; if \code{TRUE}, use an exact binomial method.} + +\item{exact.method}{Method used for exact binomial power calculation. +\code{"quantile"} uses fixed binomial rejection regions for stable +power and sample-size inversion; \code{"midp"} applies the mid-p +adjustment; \code{"cp"} inverts Clopper--Pearson acceptance regions +to guarantee size not exceeding \code{sig.level}.} + +\item{tol}{Numerical tolerance used in root finding.} + +\item{max_n}{Maximum allowable sample size when solving for \code{n}.} +} +\value{ +An object of class \code{"power.htest"} containing the + computed quantity and test specifications. +} +\description{ +Computes power, sample size, or detectable effect size for a +one-sample binomial test. The interface mirrors +\code{stats::power.prop.test()}, but for the one-sample setting. +} +\details{ +Power can be computed using a normal approximation or exact binomial +methods. When \code{exact = TRUE}, the exact test is determined by +\code{exact.method}. + +Exactly one of \code{n}, \code{p0}, \code{p1}, \code{power}, or +\code{sig.level} must be \code{NULL}; the missing quantity is solved +numerically. +} +\examples{ +## Normal approximation (default) +power.p1s.test(n = 50, p0 = 0.1, p1 = 0.25) + +## Exact binomial power (quantile-based, default exact method) +power.p1s.test(n = 50, p0 = 0.1, p1 = 0.25, exact = TRUE) + +## Exact mid-p power +power.p1s.test( + n = 50, p0 = 0.1, p1 = 0.25, + exact = TRUE, exact.method = "midp" +) + +## Exact Clopper--Pearson power (guaranteed size control) +power.p1s.test( + n = 50, p0 = 0.1, p1 = 0.25, + exact = TRUE, exact.method = "cp" +) + +} diff --git a/man/rate.test.Rd b/man/rate.test.Rd new file mode 100644 index 0000000..bcb58c3 --- /dev/null +++ b/man/rate.test.Rd @@ -0,0 +1,117 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/rate_test.R +\name{rate.test} +\alias{rate.test} +\title{Test of Poisson Rates Using Normal Approximation} +\usage{ +rate.test( + x, + T, + r = NULL, + alternative = c("two.sided", "less", "greater"), + conf.level = 0.95, + correct = TRUE +) +} +\arguments{ +\item{x}{a vector of event counts. A single value specifies a one-sample +test; a vector of length two specifies a two-sample comparison.} + +\item{T}{a vector of exposures corresponding to \code{x} (e.g., person-time). +Must have the same length as \code{x}.} + +\item{r}{a positive number specifying the null rate per unit exposure, +\eqn{\lambda_0}, for a one-sample test. Ignored for two-sample tests.} + +\item{alternative}{a character string specifying the alternative hypothesis, +one of \code{"two.sided"}, \code{"less"}, or \code{"greater"}.} + +\item{conf.level}{confidence level for the confidence interval.} + +\item{correct}{logical; if \code{TRUE}, applies a continuity correction on +the count scale. For one-sample tests, a \eqn{\pm 0.5} correction is applied +to \eqn{x} in the direction determined by \code{alternative}. For +\code{"two.sided"}, the correction uses the sign of \eqn{x-\mu_0}.} +} +\value{ +An object of class \code{"htest"} containing: +\item{statistic}{the standardized normal statistic \code{z}.} +\item{parameter}{degrees of freedom (\code{df = 1}).} +\item{p.value}{the p-value.} +\item{conf.int}{a confidence interval for the difference between the two + rates, \eqn{\lambda_1-\lambda_2}, for two-sample tests.} +\item{estimate}{estimated rate (one-sample) or estimated rates (two-sample).} +\item{null.value}{the null rate (one-sample) or the null rate difference + \eqn{\lambda_1-\lambda_2=0} (two-sample).} +\item{alternative}{the alternative hypothesis.} +\item{method}{a character string describing the test.} +\item{data.name}{a character string describing the data.} +} +\description{ +Performs a large-sample (normal) test for one or two Poisson rates with known +exposures. The test is carried out on the observed count scale and then +translated to rates for reporting. +} +\details{ +One-sample test: +Assume \eqn{X \sim \mathrm{Pois}(\mu)} with \eqn{\mu = \lambda T}. The null +hypothesis is \eqn{H_0:\lambda=\lambda_0}, where \eqn{\lambda_0=r}. Let +\eqn{\mu_0=\lambda_0 T}. The normal approximation gives +\deqn{Z = \frac{(x - c) - \mu_0}{\sqrt{\mu_0}} \approx N(0,1),} +where \eqn{c} is 0 if \code{correct=FALSE} and is a continuity correction on +the count scale if \code{correct=TRUE}. Two-sided p-values use +\eqn{2\{1-\Phi(|Z|)\}}. + +#' Two-sample test: +Assume independent \eqn{X_1 \sim \mathrm{Pois}(\lambda_1 T_1)} and +\eqn{X_2 \sim \mathrm{Pois}(\lambda_2 T_2)} with known exposures +\eqn{T_1} and \eqn{T_2}. The null hypothesis is +\eqn{H_0:\lambda_1=\lambda_2}. + +For hypothesis testing, the function uses the exact conditional +representation under \eqn{H_0}: +\deqn{ +X_1 \mid (X_1+X_2=n) +\sim +\mathrm{Binom}\!\left(n,\; \frac{T_1}{T_1+T_2}\right), +} +and applies the same large-sample normal approximation as +\code{\link[stats]{prop.test}} (with optional Yates continuity correction) +to obtain the test statistic and p-value. + +For estimation and confidence intervals, the inference target is the +difference between the two rates, +\eqn{\Delta = \lambda_1 - \lambda_2}. +The point estimate is +\eqn{\hat\Delta = X_1/T_1 - X_2/T_2}, and the confidence interval is +constructed using a normal approximation with estimated standard error +\deqn{ +\sqrt{X_1/T_1^2 + X_2/T_2^2}. +} +The continuity correction affects the hypothesis test but not the +confidence interval for \eqn{\Delta}. + +Confidence intervals: +For one-sample tests, the confidence interval is for \eqn{\lambda} and is +obtained by inverting the same normal approximation on the count scale for +\eqn{\mu=\lambda T}, then dividing by \eqn{T}. +For two-sample tests, the confidence interval is for the difference in rates +\eqn{\lambda_1-\lambda_2}, analogous to the difference in proportions in +\code{\link[stats]{prop.test}}. +} +\examples{ +## One-sample test: compare observed rate to a reference unit rate +rate.test(x = 411, T = 25800, r = 0.0119) +rate.test(x = 411, T = 25800, r = 0.0119, correct = FALSE) + +## Two-sample test: compare two Poisson rates +rate.test(x = c(12, 5), T = c(100, 80)) +rate.test(x = c(12, 5), T = c(100, 80), correct = FALSE) + +## One-sided alternative +rate.test(x = 411, T = 25800, r = 0.0119, alternative = "greater") + +} +\seealso{ +\code{\link[stats]{prop.test}}, \code{\link[stats]{poisson.test}} +} diff --git a/misc/copyright.R b/misc/copyright.R new file mode 100644 index 0000000..2aca1be --- /dev/null +++ b/misc/copyright.R @@ -0,0 +1,17 @@ +## +## R package ibist by Elizabeth D. Schifano and Jun Yan +## Copyright (C) 2024-2025 +## +## This file is part of the R package ibist. +## +## The R package ibist is free software: You can redistribute it and/or +## modify it under the terms of the GNU General Public License as published by +## the Free Software Foundation, either version 3 of the License, or any later +## version (at your option). See the GNU General Public License at +## for details. +## +## The R package ibist is distributed in the hope that it will be useful, +## but WITHOUT ANY WARRANTY without even the implied warranty of +## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. +## + diff --git a/misc/deploy_docs.sh b/misc/deploy_docs.sh new file mode 100755 index 0000000..b5732e9 --- /dev/null +++ b/misc/deploy_docs.sh @@ -0,0 +1,29 @@ +#!/bin/bash + +set -e + +## run locally +PKG=$(grep "Package" DESCRIPTION | awk '{print $NF}') +BUILD_DIR=$(pwd) +DOCS_REPO=$HOME/git/wwenjie.org +TARGET_DIR=$DOCS_REPO/static/$PKG +GIT_COMMIT=$(git rev-parse --short HEAD) + +## update docs by pkgdown +make pkgdown + +# go to the repository for wwenjie.org +cd $DOCS_REPO +git checkout -f +git checkout main +git pull gitlab main +mkdir -p $TARGET_DIR +cp -r $BUILD_DIR/docs/* $TARGET_DIR +git add static/$PKG/ +if [ -n "$(git diff --cached --exit-code)" ] +then + git commit -m "deploy pkgdown for $PKG $GIT_COMMIT" + git push gitlab main +else + printf "The docs was not updated.\n" +fi diff --git a/misc/revdep_check.R b/misc/revdep_check.R new file mode 100644 index 0000000..6495008 --- /dev/null +++ b/misc/revdep_check.R @@ -0,0 +1,5 @@ +revdep_dir <- "revdep" +res <- tools::check_packages_in_dir(revdep_dir, + clean = FALSE, + reverse = list(recursively = FALSE)) +summary(res) diff --git a/misc/update_timestamp.sh b/misc/update_timestamp.sh new file mode 100755 index 0000000..3003f1e --- /dev/null +++ b/misc/update_timestamp.sh @@ -0,0 +1,67 @@ +#!/bin/bash + +# Note: this script is should be sourced from the project root directory + +set -e + +if [ "$(uname)" != "Linux" ]; then + printf "Remeber to update date and version number.\n" +else + printf "Updating date, version, and copyright year.\n" + + # define some variables + yr=$(date +%Y) + dt=$(date +%Y-%m-%d) + cprt_R=misc/copyright.R + cprt_cpp=misc/copyright.cpp + citation=inst/CITATION + version=$(grep "Version" DESCRIPTION | awk '{print $NF}') + + # update copyright year in the template headers + regexp1="s/Copyright \(C\) 2016-[0-9]+/Copyright \(C\) 2016-$yr/" + sed -i -E "$regexp1" $cprt_R + sed "s_#_/_g" $cprt_R > $cprt_cpp + + # update copyright year in all R scripts + for Rfile in R/*.R + do + if ! grep -q 'Copyright (C)' $Rfile; + then + if [ $Rfile != "R/RcppExports.R" ]; + then + cat $cprt_R $Rfile > .tmp + mv .tmp $Rfile + fi + fi + sed -i -E "$regexp1" $Rfile + done + + # update copyright year in all C++ scripts + for cppfile in src/*.cpp inst/include/*.h inst/include/*/*.h + do + if ! grep -q 'Copyright (C)' $cppfile; + then + if [ $cppfile != "src/RcppExports.cpp" ]; + then + cat $cprt_cpp $cppfile > .tmp + mv .tmp $cppfile + fi + fi + sed -i -E "$regexp1" $cppfile + done + rm $cprt_cpp + + # update date in DESCRIPTION + # regexp2="s/Date: [0-9]{4}-[0-9]{1,2}-[0-9]{1,2}/Date: $dt/" + # sed -i -E "$regexp2" DESCRIPTION + + # update version and year in citation + regexp3="s/version ([0-9]+\.*)+/version $version/" + sed -i -E "$regexp3" $citation + # restrict the search and only update the year of package + regexp4="/splines2-package/,/^\)$/ s/20[0-9]{2}/$yr/" + sed -i -E "$regexp4" $citation + + # done + printf "All updated.\n" +fi diff --git a/vignettes/clt.Rmd b/vignettes/clt.Rmd new file mode 100644 index 0000000..820de8d --- /dev/null +++ b/vignettes/clt.Rmd @@ -0,0 +1,98 @@ +mean--- +title: "Demonstrating the Central Limit Theorem" +author: "Companion Package for Biostatistical Analysis of Proportions and Rates" +date: "`r Sys.Date()`" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Central Limit Theorem Demonstration} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +# Introduction + +In this vignette, we demonstrate the Central Limit Theorem (CLT) +using the `cltdemo()` function. The CLT states that, regardless +of the original population distribution, the distribution of the +sample mean approaches a normal distribution as the sample size +increases. + +Here, we use the Gamma distribution with varying skewness, which +is controlled by the `shape` parameter: + +- **Shape = 0.5**: Highly skewed distribution +- **Shape = 1**: Exponential distribution (moderately skewed) +- **Shape = 2**: Less skewed distribution + +We explore the behavior of the sample mean for different sample +sizes (`n = 5, 10, 20, 40`) to illustrate the convergence towards +the normal distribution. + +# Load Required Packages + +```{r setup, include = FALSE} +set.seed(123) +library("ibist") +``` + +# Gamma Distribution with Shape = 0.5 + +The Gamma distribution with `shape = 0.5` is highly skewed. We +expect to see the sample mean distribution becoming more normal +as the sample size increases. + +```{r gamma-shape-0.5} +demo_clt(rgamma, n = c(5, 10, 20, 40), nrep = 10000, + shape = 0.5, rate = 1, pmean = 0.5, psd = sqrt(0.5)) +``` + +# Gamma Distribution with Shape = 1 + +The Gamma distribution with `shape = 1` is equivalent to the +Exponential distribution. This example has moderate skewness, +and we observe the effect of increasing the sample size. + +```{r gamma-shape-1} +demo_clt(rgamma, n = c(5, 10, 20, 40), nrep = 10000, + shape = 1, rate = 1, pmean = 1, psd = sqrt(1)) +``` + +# Gamma Distribution with Shape = 2 + +The Gamma distribution with `shape = 2` has less skewness. Here, +the sample mean distribution converges more quickly to a normal +distribution even for smaller sample sizes. + +```{r gamma-shape-2} +demo_clt(rgamma, n = c(5, 10, 20, 40), nrep = 10000, + shape = 2, rate = 1, pmean = 2, psd = sqrt(2)) +``` + +# A General Distribution Constructed from a Mixture + +When the population `mean` and `sd` are unspecified, the will be +approximated by a large sample (10,000) and then used in +standardization. For instance, consider sampling from the following +mixture distribution. + +```{r mixture} +mymix_rng <- function(n, rate = 0.5) { + ifelse(runif(n) < rate, + rgamma(n, shape = 0.5), rgamma(n, shape = 4)) +} + +demo_clt(mymix_rng, n = c(5, 10, 20, 40)) + +``` + +# Conclusion + +The plots above demonstrate the Central Limit Theorem in action. +As the sample size increases, the distribution of the sample mean +approaches a normal distribution, even for highly skewed +underlying distributions like the Gamma distribution with +`shape = 0.5`. + +This vignette illustrates the robustness of the CLT and its +importance in statistical analysis, especially when dealing with +non-normal data in biostatistical contexts.