Sometimes one might want to search for so-called #hashtags or @mentions in one's texts (in a broader sense) and store them for later retrieval.
This package offers that facility.
It provides the THashTags class which can be used to parse texts for the occurrence of both #hashtags and @mentions and store the hits in an internal list for later lookup; that list can be stored in a file and later loaded from that file.
You can use Go to install this package for you:
go get -u github.com/mwat56/hashtags@latest
For each #hashtag or @mention a list of IDs is maintained.
These IDs can be any (int64) data that identifies the text in which the #hashtag or @mention was found, e.g. some database record reference or article ID.
The only condition is that it must be unique as far as the program using this package is concerned.
Note that both #hashtag and @mention are stored lower-cased to allow for case-insensitive searches.
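The storage idea described above can be pictured with a small sketch. It is not the package's implementation: the type tagIndex and its methods are hypothetical names used for illustration only. The sketch keeps, for each lower-cased tag, a sorted list of unique int64 IDs, so lookups ignore case.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// tagIndex is a hypothetical stand-in for the package's internal list:
// it maps a lower-cased #hashtag/@mention to the IDs of the texts using it.
type tagIndex map[string][]int64

// add stores aID under the lower-cased aTag and reports whether
// the index changed (i.e. the ID was not yet present for that tag).
func (ti tagIndex) add(aTag string, aID int64) bool {
	key := strings.ToLower(aTag)
	ids := ti[key]
	pos := sort.Search(len(ids), func(i int) bool { return ids[i] >= aID })
	if pos < len(ids) && ids[pos] == aID {
		return false // ID already stored for this tag
	}
	// Insert aID at its sorted position.
	ids = append(ids, 0)
	copy(ids[pos+1:], ids[pos:])
	ids[pos] = aID
	ti[key] = ids
	return true
}

// list returns the IDs stored for aTag, ignoring case.
func (ti tagIndex) list(aTag string) []int64 {
	return ti[strings.ToLower(aTag)]
}

func main() {
	ti := tagIndex{}
	ti.add("#GoLang", 42)
	ti.add("#golang", 7)
	ti.add("#GOLANG", 42) // same tag and ID again: nothing changes
	fmt.Println(ti.list("#Golang")) // [7 42]
}
```

Because every key is lower-cased on the way in and on the way out, "#GoLang", "#golang" and "#GOLANG" all refer to the same entry.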
To get a THashTags instance there's a simple way:
fName := "mytags.lst" // possibly read from some config file
ht, err := hashtags.New(fName)
if nil != err {
	log.Printf("Problem loading file %q: %v", fName, err)
}
// ...
// do something with the list
// ...
if _, err = ht.Store(); nil != err {
	log.Printf("Problem storing file %q: %v", fName, err)
}
The constructor function New() takes a single argument: a string specifying the name of the file to use for loading/storing the list's data. If that is an empty string, no loading/storing of data will happen.
The package provides a global boolean configuration variable called UseBinaryStorage which is true by default. It determines whether the data written by Store() and read by Load() use plain text (i.e. hashtags.UseBinaryStorage = false) or a binary data format.
The advantage of the plain text format is that it can be inspected with any text-related tool (e.g. grep or diff).
The advantage of the binary format is that it is about three to four times as fast when loading/storing data and it uses less disk space than the text format.
For these reasons it's used by default (i.e. hashtags.UseBinaryStorage == true). During development of your own application using this package, however, you might want to change to text format for diagnostic purposes.
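To make the trade-off between the two formats more tangible, here is a minimal sketch that writes the same data once as sorted plain text and once as binary. It does not reproduce the package's actual file formats, which are not described here: tagData, toText, toBinary and fromBinary are hypothetical names, and encoding/gob is merely an assumed stand-in for a binary format.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"sort"
	"strings"
)

// tagData is a hypothetical stand-in for the list's contents.
type tagData map[string][]int64

// toText renders the data one tag per line (e.g. "#go: 1, 2"),
// a form that tools like grep or diff can work with.
func toText(aData tagData) string {
	keys := make([]string, 0, len(aData))
	for key := range aData {
		keys = append(keys, key)
	}
	sort.Strings(keys) // deterministic order keeps diffs meaningful

	var sb strings.Builder
	for _, key := range keys {
		ids := make([]string, len(aData[key]))
		for i, id := range aData[key] {
			ids[i] = fmt.Sprint(id)
		}
		fmt.Fprintf(&sb, "%s: %s\n", key, strings.Join(ids, ", "))
	}
	return sb.String()
}

// toBinary encodes the data with encoding/gob
// (an assumption, not the package's real binary format).
func toBinary(aData tagData) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(aData); nil != err {
		return nil, err
	}
	return buf.Bytes(), nil
}

// fromBinary decodes data previously written by toBinary.
func fromBinary(aRaw []byte) (tagData, error) {
	result := tagData{}
	err := gob.NewDecoder(bytes.NewReader(aRaw)).Decode(&result)
	return result, err
}

func main() {
	data := tagData{"#go": {1, 2}, "@alice": {2}}
	fmt.Print(toText(data)) // human-readable lines

	raw, err := toBinary(data)
	if nil != err {
		fmt.Println("encoding failed:", err)
		return
	}
	back, err := fromBinary(raw)
	if nil != err {
		fmt.Println("decoding failed:", err)
		return
	}
	fmt.Println(len(back), "entries restored from binary data")
}
```

The text output can be read and compared directly, while the binary output is opaque to such tools but needs no text parsing when read back.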
For more details please refer to the package documentation.
There are several kinds of methods provided:
The following methods can be used to handle hashtags:
- HashAdd(aHash string, aID int64) bool: inserts aHash as used by document aID, returning whether anything changed.
- HashCount() int: returns the number of hashtags currently handled.
- HashLen(aHash string) int: returns the number of documents using aHash.
- HashList(aHash string) []int64: returns a list of all document IDs using aHash.
- HashRemove(aHash string, aID int64) bool: removes the document aID from the aHash list, returning whether anything changed.
The following methods can be used to handle the document IDs of the list entries.
- IDlist(aID int64) []string: returns a list of hashtags and mentions occurring in the document identified by aID.
- IDparse(aID int64, aText []byte) bool: parses the given aText for hashtags and mentions and stores aID in the respective hashtag/mention lists, returning whether anything changed.
- IDremove(aID int64) bool: deletes the given aID from all hashtag/mention lists, returning whether anything changed.
- IDrename(aOldID, aNewID int64) bool: changes the given aOldID to aNewID in the rare case that a document's ID changed, returning whether anything changed.
- IDupdate(aID int64, aText []byte) bool: replaces the current hashtags/mentions stored for aID with those found in aText, returning whether anything changed.
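What parsing a text for hashtags and mentions involves can be sketched as follows. This is a simplified illustration under stated assumptions, not the package's parser: tagRE and parseTags are hypothetical names, and the pattern is deliberately simple (a # or @ at the start of the text or after whitespace, followed by letters, digits or underscores). The rules the package actually applies may well differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// tagRE is an assumed, simplistic pattern: `#` or `@` at the start
// of the text or after whitespace, followed by letters, digits
// or underscores.
var tagRE = regexp.MustCompile(`(?:^|\s)([#@][\p{L}\p{N}_]+)`)

// parseTags returns the lower-cased, de-duplicated #hashtags and
// @mentions found in aText, in order of first appearance.
func parseTags(aText []byte) []string {
	var result []string
	seen := make(map[string]bool)
	for _, match := range tagRE.FindAllSubmatch(aText, -1) {
		tag := strings.ToLower(string(match[1]))
		if !seen[tag] {
			seen[tag] = true
			result = append(result, tag)
		}
	}
	return result
}

func main() {
	text := []byte("Thanks @Alice for the #GoLang tips! More #golang soon.")
	fmt.Println(parseTags(text)) // [@alice #golang]
}
```

Requiring whitespace (or the start of the text) before the marker keeps e-mail addresses such as bob@example.com from being mistaken for mentions.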
The following methods can be used to handle mentions:
- MentionAdd(aMention string, aID int64) bool: inserts aMention as used by document aID, returning whether anything changed.
- MentionCount() int: returns the number of mentions currently handled.
- MentionLen(aMention string) int: returns the number of documents using aMention.
- MentionList(aMention string) []int64: returns a list of all document IDs using aMention.
- MentionRemove(aMention string, aID int64) bool: removes the document aID from the aMention list, returning whether anything changed.
The following methods apply to the list as a whole:
- Clear() *THashTags: empties the internal data structures, i.e. all #hashtags and @mentions and their respective IDs are deleted.
- Filename() string: returns the filename given to the initial New() call for reading/storing the list's contents.
- Len() int: returns the current length of the list, i.e. how many #hashtags and @mentions are currently stored in the list.
- LenTotal() int: returns the length of all #hashtag/@mention lists and their respective number of source IDs stored in the list.
- List() TCountList: returns a list of #hashtags/@mentions with their respective count of associated IDs.
- Load() (*THashTags, error): reads the configured file, returning the data structure read from the file given with the New() call and a possible error condition.
- SetFilename(aFilename string) *THashTags: sets the filename for loading/storing the hashtags, returning the updated list instance.
- Store() (int, error): writes the whole list to the configured file, returning the number of bytes written and a possible error.
- String() string: returns the whole list as a linefeed-separated string.
Although there are a lot of options (methods) available, basically the module is quite straightforward to use.
- Create a new instance (New() also returns an error value, which should be checked as shown earlier):
  myList, err := hashtags.New("myFile.db")
- Whenever your application receives a new document, retrieve or create its ID and text, then call
  ok := myList.IDparse(docID, docText)
  This will associate all hashtags and mentions found in docText with the provided docID.
The following external libraries were used building HashTags:
Copyright © 2019, 2025 M.Watermann, 10247 Berlin, Germany
All rights reserved
EMail : <support@mwat.de>
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.
This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
You should have received a copy of the GNU General Public License along with this program. If not, see the GNU General Public License for details.
