03:43:57 .Im  UsinG. H4cKeRzE
10:03:09 hi guys
10:04:01 i am considering using happstack as a server for a blog server / community, and was wondering how best to go about this
10:06:12 there are some things that are not clear to me: how can i archive out things from the persistence layer? for example i wouldn't want to keep older / less frequently accessed blog posts in my ram if i'm running out of memory
10:07:04 and, are the entries in the persistence layer mutable? can i use it for a user/pass list for example?
10:42:03 hi rdtsc, do you use happstack?
10:42:17 cheater: hi, I do
10:42:34 want to answer some questions from a new user? :-)
10:43:20 cheater: I am a new user too, but I'll try =)
10:43:27 :)
10:43:38 i just posted some Q's before you joined in, can I paste them in msg?
10:44:02 sure
10:45:10 done
10:49:11 cheater: All entries in macid are mutable as in all State monads. As for the first question - I don't know
10:50:14 gotcha
13:28:38 how does happstack's persistence layer share across separate servers?
14:01:21 http://tutorial.happstack.com/tutorial/multimaster
14:02:24 The multimaster code is no more.
14:03:27 multimaster code is no more? huh? :)
14:12:06 rdtsc: nice link, thanks
14:12:25 i am reading that tutorial but didn't get that far yet
14:19:16 cheater: the persistence layer is mutable
14:20:03 cool stepcut
14:20:10 what about putting it on the disk?
14:20:22 and/or some central specialized storage?
14:20:44 cheater: there is currently no explicit support for archiving things to disk.
14:20:47 is the physical behavior of the persistence layer changeable?
14:20:50 hm
14:21:01 that's not good, is it?
14:21:04 for centralized storage there is experimental support for amazon web services now
14:21:15 physical behavior?
14:21:39 as in, how and where stuff is stored
14:22:19 whether it gets stored in memory, on the local disk, on some central server on the disk, on some specialized server, ...
14:23:04 depends on what you mean by 'stored'. The data is stored in memory, but the events and checkpoints are also written to a log so you can recover from a crash/restart/etc.
14:23:41 the support for writing logs uses 'plugins' of sorts that allow you to change how/where things are stored
14:23:48 but there's only so much memory you have.
14:23:59 yes
14:24:06 hence, not everything will fit.
14:24:28 only if *everything* is more than the amount of RAM you have..
14:24:35 it is.
14:24:53 most servers will take 32GB of RAM these days.. which is more than what a lot of people have for data sets :)
14:25:20 the long term plan is to support sharding so that the dataset can be split across multiple servers
14:25:21 why on earth would i store an item in ram that is only accessed one time per month?
14:26:08 i do not see this as an efficient way of using hardware or money
14:26:20 i'm sure you know what i mean :)
14:26:28 cheater: because it is perhaps cheaper to buy extra RAM than to manage the complexity of figuring out when an item is going to be accessed during the month and making sure it can be readily accessed?
14:26:42 it's very simple for me
14:26:48 cheater: well, keep in mind that over 80% of facebook's working data set is stored in RAM :)
14:27:42 facebook is not very well known for good decisions
14:27:58 oh?
14:28:08 they seem to be doing ok..
14:28:53 anyway, you can store things on disk, there is just no mechanism to do it automatically
14:29:12 http://glinden.blogspot.com/2009/11/put-that-database-in-memory.html
14:29:12 is there a way to hook into this somewhere so that i could do it in a clean manner?
14:30:13 I think the first step is to become clear on what 'it' is
14:30:25 how do you decide when to store things on disk?
14:31:15 to simplify, when there are 11 or more posts on someone's page, only the first 10 are shown, and the rest is not, and probably will not be, so we put that stuff on disk.
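[editor's sketch] The archival rule in the last message (show the 10 newest posts on a page, treat the rest as candidates for disk) might look roughly like this in Haskell. The `Post` type, its fields, and `partitionForArchive` are hypothetical names made up for illustration; none of this is a happstack API, and writing the archived half to disk is left out entirely.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing, Down(..))

-- Hypothetical post type, invented for this sketch (not from happstack).
data Post = Post
  { postId   :: Int
  , postTime :: Int      -- assumed: posting time, larger means newer
  , postBody :: String
  } deriving (Eq, Show)

-- | Split one page's posts into (posts to keep in RAM, posts to archive).
-- The first component holds at most 'keep' posts, newest first.
partitionForArchive :: Int -> [Post] -> ([Post], [Post])
partitionForArchive keep =
  splitAt keep . sortBy (comparing (Down . postTime))
```

With `keep` set to 10, a page of 12 posts would yield the 10 newest in the first list and the 2 oldest in the second. How and when the second list is actually written out is exactly the open question in the conversation.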
14:31:59 'They go on to argue that a system designed around in-memory storage with disk just used for archival purposes would be much simpler, more efficient, and faster.' <<< they're just reiterating what i said :-)
14:32:16 yes
14:32:36 but I think that archival purposes is very application specific
14:32:52 cheater: use rdbms
14:32:53 in your case, you want to archive all but the 10 most recent blog posts to disk, right?
14:33:37 yes
14:33:49 but still access the ones on the disk the same way as the ones in memory - just slower
14:35:38 so, let's say you want to search for all the posts that occurred in 2010.. that means you still have to have some of the information from those posts stored in RAM right?
14:36:26 i won't want to do that
14:36:43 oh ?
14:36:49 yes
14:37:11 if i do, i will implement it, but i do not see this as a requirement in the foreseeable future
14:38:11 i see
14:38:36 hell.. i don't think facebook allows that, either.
14:38:51 how can macid handle such things http://www.big-boards.com/ ? =)
14:41:10 cheater: perhaps you should just immediately save the posts to disk. Your state will just be a big list of the paths to all the posts. That is all that will be stored in the persistence layer. If you want the 10 most recent posts cached in RAM, it would be trivial to implement that with a simple IO thread and a cache. But that is just a cache of things that are on disk.. it does not need to be stored in the persistence layer
14:41:34 in other words, treat the blog posts in much the same way that you would image files
14:41:51 stepcut: that's not the right way to do this.
14:41:56 cheater: why not?
14:42:05 i can store a LOT of blog posts in ram
14:42:07 but, not all of them
14:42:21 cheater: so, make your cache as big as you want?
14:42:46 stepcut: i do not understand what you mean there
14:42:47 cheater: make it hold 1000 posts if you want.. and when the cache gets full it ejects the least recently accessed post
14:42:56 stepcut: no, that's a backwards approach
14:43:01 it should be ram->disk
14:43:09 not ram->disk->ram
14:43:39 there's no point in putting data through the hdd before it becomes accessible
14:45:03 so you want an indexable set that is stored in RAM, but some of the elements of that set can also be stored on disk ?
14:46:45 yes.
14:47:12 and what happens when someone does, elem `isElem` mySet
14:47:42 is it going to have to temporarily load each saved item from disk to see if the elements are equal?
14:50:17 i am not sure why i would like to do this
14:50:25 maybe i don't want this sort of functionality
14:50:28 the system you seem to be describing, ram -> disk -> ram, seems very much like the way RDBMS systems work
14:50:50 i have described ram->disk
14:51:03 sorry
14:51:34 I am still unclear how your approach would actually work
14:51:34 or rather, ram->ram->disk where the data looks like this: ram (inside our app, not accessible generally) -> ram (accessible in the store) -> disk (accessible in the store)
14:52:02 whereas you have described ram (inside our app, not accessible generally) -> disk (accessible in the store) -> ram (accessible in the store)
14:52:34 the problem is that before the item i need high performance on, i.e. the first blog post, is accessible in the ram, it has to go into the hdd, and out of it, and that's a huge overhead
14:53:14 and most likely, before the ram cache propagates, thousands/millions/billions of views will hammer the disk first
14:53:50 well, if you want durability, the data has to be logged to the disk first anyway, right ?
14:56:49 no, durability can be ensured otherwise
14:57:35 durability has not got much to do with the medium data is on
14:58:12 so, in your application code, when you write the code that retrieves and displays a blog post, are you going to have to distinguish between getting a post that is in RAM, and one that is archived to disk?
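[editor's sketch] The cache suggested earlier in this exchange (hold a fixed number of posts, eject the least recently accessed one when full) could be sketched as a small pure data structure. This is only an illustration under stated assumptions: `Cache`, `emptyCache`, `lookupCache`, and `insertCache` are hypothetical names, not part of happstack or macid; the capacity is assumed to be at least 1; and eviction scans the whole map, which is simple rather than fast.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Hypothetical LRU cache, invented for this sketch (not a happstack API).
data Cache k v = Cache
  { capacity :: Int                 -- assumed to be >= 1
  , clock    :: Int                 -- logical time, bumped on every access
  , entries  :: Map.Map k (Int, v)  -- key -> (last access tick, value)
  }

emptyCache :: Int -> Cache k v
emptyCache n = Cache n 0 Map.empty

-- | Look up a key; a hit marks the entry as most recently used.
lookupCache :: Ord k => k -> Cache k v -> (Maybe v, Cache k v)
lookupCache k c = case Map.lookup k (entries c) of
  Nothing     -> (Nothing, c)
  Just (_, v) -> (Just v, touch k v c)

-- | Insert a value, ejecting the least recently accessed entry when full.
insertCache :: Ord k => k -> v -> Cache k v -> Cache k v
insertCache k v c
  | Map.member k (entries c) || Map.size (entries c) < capacity c = touch k v c
  | otherwise = touch k v c { entries = evictOldest (entries c) }

-- Store the value under the current tick and advance the clock.
touch :: Ord k => k -> v -> Cache k v -> Cache k v
touch k v c = c { clock   = clock c + 1
                , entries = Map.insert k (clock c, v) (entries c) }

-- Remove the entry with the smallest access tick (O(n) scan).
evictOldest :: Ord k => Map.Map k (Int, v) -> Map.Map k (Int, v)
evictOldest m
  | Map.null m = m
  | otherwise  = Map.delete oldest m
  where
    oldest = fst (minimumBy (comparing (fst . snd)) (Map.toList m))
```

In a real server the cache value would presumably live behind an `MVar` or similar and be filled from disk on a miss; that wiring, and whether the cache sits in front of or behind the persistence layer, is the point the two speakers disagree on.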
14:58:27 maybe I'm completely wrong and off topic here as a real happstack newbie, but what seems the point here is that you let macid do the work with sessions and such for you, and manually implement storing and loading blog posts?
14:58:34 stepcut: no, i would like happstack to do it
14:58:51 cheater: and how will happstack know what data it ought to archive to disk?
15:00:19 stepcut: via some code i will write that will provide a metric that explicitly decides one option or the other. the action of putting stuff on the disk should be code i write but 'in happstack'. the retrieval mechanism could also be code i write, but it will be 'in happstack'.
15:00:45 i.e. if the metric is provided, but the disk io code is not there, it will be all in memory and it will work just normally
15:00:54 well, being 'in happstack' doesn't mean much
15:01:01 whereas if the disk io is provided and the metric is not, then the same thing will happen
15:01:34 stepcut: it means a lot, because 1. then this sort of approach will be reusable for other people who want to do it 2. i will be able to switch to a different implementation because my application is separated from the way happstack works 'internally'
15:02:08 i.e. maybe in the future i would like to store the data in a NAS rather than on local disk, or something equally weird, who knows
15:02:39 i wouldn't like my architecture options to be limited by a badly written application, so i would like to make this invisible behind the happstack api
15:06:38 well, the happstack-state just stores a single haskell value. Not even a list of haskell values, or a table, or anything. Just a single haskell value. Fortunately, that value can be any type which it is possible to serialize (so, it can't be a type that contains functions, etc).
15:07:54 so, the happstack-state mechanism itself can not really know how to arbitrarily serialize some parts of some arbitrary value to disk. But, you can use the state mechanism in combination with a specialized type and some helper functions to store some data on disk and some in ram
15:08:21 so it is certainly quite possible to build a blog post storage mechanism that behaves the way you want
15:08:37 and it could be implemented in a fairly general way so that you could use it for other things as well
15:10:26 for something like a blog post, there is often a bunch of meta data associated with it.. when was it posted, who is the poster, how many times has it been viewed, etc. One thing to decide is if you want to keep that data in RAM, so that you can quickly and easily search it, etc.
15:12:18 if you look at the scaling problems that companies like facebook, amazon, etc, have had, it seems like a lot of their difficulties have come from not really knowing if the things they want to work with are really in RAM or not.
15:12:23 you can do granular archival control by splitting into separate lists: one list for the meta data, one for the content.
15:12:38 the content list would go to disk fairly quickly, the meta data one could happen much later.
15:13:23 their problems come from not being able to scale RDBMS storage
15:13:34 my reading of, 'They go on to argue that a system designed around in-memory storage with disk just used for archival purposes would be much simpler, more efficient, and faster.', is that 'archival' data should be a distinct barrier. Not something that is transparently accessed...
15:13:52 yes
15:13:56 well
15:14:15 yeah. for one thing, with modern web apps you can have lazy search results
15:14:40 i.e. 'slow media' report search results later, and those get loaded in via ajax, or via pagination
15:14:54 sure
15:16:28 archiving things to the local machine is also, perhaps, not a very good idea
15:16:45 let's say your site is so popular you need 5 servers to host it
15:16:49 i don't see why not but i don't see why yes
15:16:57 so i am partial to agreeing
15:17:30 do you want the archived data on all the machines? Perhaps the archived data should only reside on one machine that acts as a web service provider
15:18:24 certainly much slower, but it is centralized, seldom accessed, and perhaps that is better than all 5 of the other servers having their own copies of the old archived data?
15:19:12 the servers could just have a local blog post cache
15:19:51 so if an old article suddenly got popular, it would automatically end up in the RAM cache on the main servers
15:21:07 if you use centralized logging for durability, then your front end servers would not have any data stored on them .. they may not even have disks at all
15:21:28 and if your site gets a ton of traffic you could just fire up some extra servers?
15:30:13 i think i would go for sharding with borders shared across nodes
15:30:20 and the shard would contain its own archive
15:30:32 i can do the sharding in my application, that's fine with me
15:30:54 yes
15:31:56 everything is ultimately, 'in your application'. Right now you would have to write your own sharding library, someday we will actually provide one :)
15:35:26 i see happstack as external to my application
15:35:38 i.e. if i want to change something in happstack, i probably want everyone to be able to use it
15:36:45 sure
15:36:49 btw, stepcut: how does happstack work on windows? i would like to check it out on my home pc, i don't have a linux box that i could dedicate to it right now
15:38:40 cheater: windows support is considered essential. 0.4.1 was successfully built on windows, and no one has reported any windows bugs yet
15:38:52 great
15:39:22 if you use cabal install you may need to install happstack-data with the -O0 flag if you get linker errors against syb-with-class
15:39:31 that is a cabal bug though
15:39:37 happstack looks really interesting for me. i'm glad i decided to learn haskell. i'm still reading through the first tutorials, but it looks easy enough.
15:39:51 ok
15:40:05 happstack is neat stuff
15:40:20 =)
15:40:39 though pinning down what exactly is happstack and what is not can be a bit tricky :)
15:41:04 so happstack has its own web server. do haskell/happstack contain some useful tools for my every day web needs, like say, templating, views, or whatnot?
15:41:18 for example, happstack-ixset is just a Set with multiple indexes. It is not really happstack specific in any way... it's just very useful in happstack applications
15:41:54 happstack itself does not contain any templating. But it integrates nicely with Text.XHtml, HStringTemplate, and HSP
15:42:29 I prefer HSP myself, but not everyone does.. in happstack/happstack/templates/project there is an example of using HSP
15:43:50 i'm a bit anxious to get going with haskell
15:43:56 :)
15:43:58 i'll need to keep reading the tutorials
15:44:13 i'm reading 'learn you a haskell'
15:44:19 but it seems humongous
15:44:22 I heard that one is good
15:44:28 it's jolly
15:44:33 which makes it a breeze to read
15:44:37 :)
15:44:56 in comparison reading the mysql manual is the most terrible experience ever
15:45:10 it's the most inaccessible, dry text i have ever read
15:46:08 :)
15:46:35 stepcut, do you know some big projects that have used happstack?
15:46:43 big as in, high concurrency, lots of visitors, etc
15:47:31 cheater: no
15:48:10 ow
15:48:24 it would be nice to be able to refer to success stories, see how people did things
15:48:50 yes it would :)
15:49:39 I think people are having more trouble with the 'getting lots of visitors' part than the handling them part ;)
15:50:20 even if you get 1000 unique page views per day.. that is hardly high concurrency
15:50:32 yes
15:51:20 even 100,000 hits per day is barely more than 1 per second
15:52:28 yes
15:53:53 maybe happstack.com should build a dating site
15:54:07 to demo the technology, and to generate some revenue to fund development ;)
15:54:58 lol
16:30:11 stepcut: if you want to demo the technology, first of all you should fix the package in hackage =)
16:30:45 dating site?
16:30:47 better a porn site
16:31:11 ☺
16:32:34 burp: a dating site would probably pay better
16:34:09 burp: every aspect of running a porn site sucks.. the legal aspects, the billing aspects, the fraud aspects, getting content, getting traffic.
16:35:00 and seems less interesting from a technology showcase.. mostly you just need raw bandwidth and sendfile()
16:36:32 with a dating site you get user generated content, you can avoid billing altogether, more opportunities to show off technology, integration with asterisk, xmpp, etc.
16:36:50 still have to get traffic and deal with spammers though
16:37:15 what about user generated content on porn sites
16:37:17 :-)
16:37:28 cheater: USC 2257a
16:37:50 what's that now :-)
16:38:07 ahh, record keeping
16:38:16 well, nobody forces you to store this in america
16:38:19 cheater: you have to keep records on file that the FBI can investigate any time they want without prior notice for every photo on the site
16:38:34 you can keep that in germany
16:38:42 or denmark
16:38:46 or where ever :-)
16:39:28 stepcut, do you know anything about function-level programming?
16:40:29 cheater: the problem is you have to collect them.. which is difficult for user generated content
16:40:37 function-level programming?
16:40:41 yeah
16:40:48 I am not sure what you mean by that..
16:41:02 http://en.wikipedia.org/wiki/Function-level_programming
16:42:30 no
16:42:38 aside from what I just read..
16:45:38 i'm wondering if this is similar to writing fully lambda lifted programs
16:48:14 cheater, did you try reading http://book.realworldhaskell.org (saw you talking about tutorials earlier on)
16:50:45 Muad_Dibber_: nope!
16:50:53 Muad_Dibber_: i'll check it out, thanks a lot
16:55:11 and probably I'm just old news again, but I noticed blog.happstack.com isn't working :-)
16:56:57 Muad_Dibber_: yeah, mae decided not to pay to renew the vanity name for the livejournal account associated with blog.happstack.com
16:57:09 but we didn't come up with a plan as to what to do instead
16:57:14 I guess I could remove the link for now :p
16:57:38 yeah dead links like that look so abandoned :-P
16:59:25 maybe you should build a blog about happstack using happstack?
17:00:32 or take an existing one
17:00:45 wordpress rather
17:03:14 02 Feb 17:01 - update link to happstack blog (Jeremy Shaw)
17:20:30 Thanks for the link to the guestbook project stepcut, it's quite clarifying :-)
17:21:29 no problem
17:23:00 ok, the link is fixed now :)
17:26:53 nice
18:22:58 hi again
18:23:47 hello